2 Introduction
This section introduces several concepts
relating to names and bindings of those names.
Of particular note is scope, the “visibility” of names
throughout a program.
2.1 What's a binding?
“Binding” is a general term, referring to
any association between an entity and an attribute.
- This is a very vague definition, which allows us to apply it
to a great many instances.
The entities are “syntactic units” of the language.
- That is, some piece of the language syntax.
An attribute may happen to also be an entity of the language,
or may be a more “external” concept.
For instance,
- a variable (entity) will usually have a name (attribute)
- and often a type (attribute),
- a type (entity) should also have a name (attribute.)
In these notes, we primarily discuss bindings of types, lifetimes and scopes.
Other bindings may come up later in the course.
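For instance, a single Scala declaration already involves several bindings
(the name here is purely illustrative):
val limit: Int = 100
// The entity introduced here has the name “limit” bound to it (a name binding),
// the type Int bound to it (a type binding), and, as later sections discuss,
// a scope and a lifetime bound to it as well.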
2.2 Uses of names
Many entities may have names bound to them;
- variables, constants, parameters/arguments,
- functions, methods, procedures, subroutines,
- modules, packages, contexts, typeclasses, records, and
- types,
to name a few.
2.3 Persistence of names
Take note!
- Some namings are “global”; they persist for “all time and space”.
- Such as the namings of builtin language methods,
- (assuming the language does not allow their names to be reused
or shadowed.)
- But many namings are temporary!
- And one instance of a name may shadow another.
- (Such as when you declare a local variable/argument
with the same name as one already in scope.)
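As a small sketch of shadowing in Scala (the names here are illustrative):
val x = 1                  // an outer binding of the name x
def f(x: Int): Int = x + 1 // the parameter x shadows the outer x within f
// f(10) evaluates to 11; the outer binding x = 1 is unaffected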
2.4 A note about time and space
Throughout these notes, we will discuss when and where bindings
are in effect.
These discussions of “time” and “space” are relative to the program.
- Our notion of “time” is in terms of the program's runtime.
- Or occasionally, its design time.
- Our notion of “space” is in terms of the program's text.
2.5 Static and dynamic bindings (or “checking of bindings”)
Bindings which are decided (or known) before runtime
are called static.
Conversely, bindings which are not decided (or known) until during runtime
are called dynamic.
We say “or known” because some bindings, such as the types of values,
are not so much “decided” as “discovered”.
- All values have a type, whether that type is checked
statically or dynamically.
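For instance, in Scala the type bound to a name may be known statically,
while the type of the value it refers to is only discovered dynamically
(a small sketch; the names are illustrative):
val a: Any = "hello"    // the name a is statically bound to the type Any
a match {               // the value's own type is discovered at runtime
  case s: String => println("a String of length " + s.length)
  case n: Int    => println("an Int: " + n)
  case _         => println("something else")
}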
3 Type checking and polymorphism
Type checking is the process of determining
the binding between a name of an entity and a type.
- Usually, only variables, arguments, functions and similar entities
have types.
- There is at least one exception to this;
sometimes, types may have a type!
- We will discuss types in more detail in the next section.
3.1 “Dynamically checked” instead of “dynamically typed”
As with many bindings, languages can be categorised as
being either statically or dynamically typed.
In “Types and Programming Languages”, though, Pierce argues
that the term “dynamically typed language” is a misnomer.
- It would be better to say “dynamically (type) checked”.
- All values have a type, even before type checking.
3.2 Static and dynamic type checking
It's somewhat natural for interpreted languages to be dynamically typed,
and compiled languages to be statically typed.
- The process of type checking is an upfront cost.
- Compilation is a natural time to carry out that costly process.
- Consider the (primarily) interpreted “scripting” languages;
Python, Ruby, Javascript, Lua, Perl, PHP, etc.
- But it's far from universally true that interpreters use
dynamic typing!
- Haskell has an interpreter, and is definitely
statically typed.
3.3 Polymorphism
Assigning each value in a program a single type, and then
enforcing type correctness (preventing type clashes)
introduces a (solvable) problem; it prohibits code reuse!
- Subroutines can only be used on arguments of one particular type.
Subtyping and polymorphism provide solutions to this.
- With subtyping, there is a sub/super relation between types.
- One classic example involves a type of “shapes”, which has
a subtype “polygon”, which in turn has subtypes “triangle”
and “rectangle”, etc. (See the sketch after this list.)
- Subtypes closely resemble the notion of subsets.
- Values of a subtype can be used anywhere
a value of their supertype is expected.
- Sub-classing is one instance of subtyping.
- And is usually just called subtyping.
- Subranges and, sometimes, enumeration types are other examples.
- With polymorphism, a subroutine can have several types.
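Here is a sketch of the shape example in Scala (the class names and the
area method are invented for illustration):
abstract class Shape { def area: Double }
abstract class Polygon extends Shape
class Rectangle(w: Double, h: Double) extends Polygon { def area: Double = w * h }
class Triangle(b: Double, h: Double) extends Polygon  { def area: Double = b * h / 2 }

// A subroutine expecting the supertype accepts values of any of its subtypes.
def describe(s: Shape): String = "a shape with area " + s.area

describe(new Rectangle(2, 3))  // "a shape with area 6.0"
describe(new Triangle(2, 3))   // "a shape with area 3.0"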
3.4 Ad hoc polymorphism (overloading)
One notion of polymorphism is ad hoc polymorphism,
also called overloading.
- With overloading, subroutine names
can be reused as long as the types of arguments
differ (in some specified way.)
- The downside is the programmer must define
the “same” subroutine many times; once for each type.
Here is an example of ad hoc polymorphism in Scala.
def sum(a: Int, b: Int): Int = a + b
def sum(a: Boolean, b: Boolean): Boolean = a || b
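The type checker selects whichever definition matches the argument types;
for instance:
sum(1, 2)        // uses the Int version; evaluates to 3
sum(true, false) // uses the Boolean version; evaluates to true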
3.5 Parametric polymorphism
With parametric polymorphism, subroutines have a
most general type, based on the shape or form of their arguments.
- This involves the use of type variables.
- The subroutine's behaviour can only be based on the form,
not the specific types.
- This is commonly used in functional languages.
- Several object-oriented languages have generics, which are
essentially the same concept.
Here is an example of parametric polymorphism in Scala.
def swap[A](p: Tuple2[A,A]): Tuple2[A,A] = p match {
  case (p1, p2) => (p2, p1)
}
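The one definition then works at any element type; for instance:
swap((1, 2))          // (2, 1); A is inferred to be Int
swap(("fst", "snd"))  // ("snd", "fst"); A is inferred to be String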
3.6 Duck typing
Duck typing is more formally known as row polymorphism.
- Duck typing does not actually check the types of values.
- It is used when calling a method on a value;
the only check is that that method is defined for that value.
- “If it walks like a duck, and quacks like a duck, it's a duck!”
Here is an example of duck typing in Ruby.
class Thing1
  def initialize(v,w); @v = v; @w = w end
  def do_the_thing() "I'll implement this later." end
end

class Thing2
  def initialize(v,w); @v = v; @w = w end
  def do_the_thing() @v + @w end
end
def doit(x) x.do_the_thing end
x = Thing1.new(1,2)
y = Thing2.new(1,2)
puts doit(x) # Outputs a string
puts doit(y) # Outputs an integer
4 Lifetime
The lifetime of an entity is the portion of the runtime
(not the code) during which that entity has allocated memory.
To discuss lifetimes, we first need to discuss
the layout of memory and the process of memory allocation.
4.1 The data segment, stack, and heap
In the context of running a program (or a thread of a program):
- The data segment is a portion of memory used to store
static/global variables. (Strictly, the program's instructions themselves
sit in a similar fixed region, the text or code segment,
which is often grouped with it.)
- The stack is an organised portion of memory allocated for the program.
- Blocks at the “top” of the stack are reserved when entering
a scope.
- These blocks are called stack frames.
- Stack frames are popped off the stack when leaving the unit of code.
- The heap is an unorganised portion of memory allocated for the program.
- Allocation/deallocation may be independent of units of code.
- Disorganisation can lead to inefficiency and other problems.
- Cache misses, page faults, etc.
- Running out of memory or appropriately sized contiguous blocks.
- Garbage, dangling references, etc.
4.3 Kinds of memory allocation
We can distinguish four kinds of memory allocation;
- static,
- stack dynamic,
- implicit heap dynamic, and
- explicit heap dynamic.
4.4 Static and stack dynamic allocation
- Static
- Allocation is done before runtime (static), in the data segment.
- At load time, when the program is loaded into memory.
(Not compile time.)
- Stack dynamic
- Allocation is dynamic, automatic, and on the stack.
- Amount of memory needed must be known when entering their scope.
- This is “the usual method” in most imperative languages.
4.5 Implicit and explicit heap dynamic allocation
- Implicit heap dynamic
- Allocation is dynamic, automatic, and on the heap.
- Memory is simply allocated when the variable is assigned;
the programmer does not need to “do anything”.
- Explicit heap dynamic
- Allocation is dynamic, manual, and on the heap.
- The programmer must use a command to allocate memory
before assignments can be made.
- In fact, these blocks of memory can be considered
nameless variables; we need separate
reference variables to point to their addresses.
4.6 Exercise: Memory allocation methods
Investigate:
- Which of the four kinds of memory allocation are available in
Scala and in Ruby.
- How each kind of memory allocation is denoted.
4.7 The lifetime
To repeat,
the lifetime of an entity is the portion of the runtime
(not the code) during which that entity has allocated memory.
- A single variable in the code may have several instances at runtime
(for example, in recursive calls), each with its own lifetime.
- The lifetime of a variable depends upon the memory allocation
strategy used for it.
- Static: lifetime is the whole of runtime.
- Stack dynamic: lifetime is the portion of runtime between
when the scope containing the variable is entered
and when control returns to the invoker of that scope.
- In the case of functions/procedures, local variables
end their lifetime when the function/procedure returns.
- Note that a variable may go out of scope, but still be alive.
- Implicit heap-dynamic: lifetime begins when assigned.
The end of lifetime depends upon garbage collection.
- Explicit heap-dynamic: lifetime begins when the allocation
command is used. The end of lifetime may be
- when the programmer issues a deallocation command, or
- depend upon garbage collection.
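To see the scope/lifetime distinction concretely, here is a small Scala
sketch (illustrative names): the local variable below goes out of scope when
makeCounter returns, but the returned closure keeps it alive.
def makeCounter(): () => Int = {
  var count = 0               // scope: the body of makeCounter
  () => { count += 1; count }
}

val next = makeCounter()      // count is now out of scope...
next()                        // ...but still alive: returns 1
next()                        // returns 2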
4.8 Garbage collection
Garbage is allocated memory which is no longer accessible.
Garbage collection is the process of deallocating garbage.
- A complex activity.
- We will discuss the basics here, and then likely not
touch on it again.
- Has a large influence on the efficiency of
compiled/interpreted code.
- There are two main categories of garbage collection algorithms;
- tracing,
- which includes the “mark & sweep” algorithm, and
- reference counting.
- Additionally, language implementations may include compile time
“escape analysis”.
- Memory may be allocated on the stack instead of the heap
if no references to the variable are available outside its scope
(if it doesn't “escape”.)
4.8.1 Tracing garbage collection (mark & sweep)
- Usually triggered when a certain memory threshold is reached.
- Starting from some set of base references
(usually those in the stack and data segment),
“trace” references, somehow marking those that are reachable.
- Starting from the set of base references,
“determine the transitive closure of reachability”.
- Once done, free any memory that is not marked.
- “Naive” mark & sweep:
- Each object in memory has a bitflag for this “marking”,
kept clear except during garbage collection.
- When activated, traverse all reachable memory,
“marking” what's reached, then “sweep” away all unmarked memory
(and reset the flags of marked memory.)
- With this naive approach, we have to pause the program to do this;
- better methods do not require pausing.
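As a rough sketch of naive mark & sweep over a toy object graph in Scala
(the representation here is invented for illustration; a real collector
works on actual memory blocks):
class Obj(var marked: Boolean = false, var refs: List[Obj] = Nil)

// Mark: traverse everything reachable from the base (root) references.
def mark(roots: List[Obj]): Unit = {
  def visit(o: Obj): Unit =
    if (!o.marked) { o.marked = true; o.refs.foreach(visit) }
  roots.foreach(visit)
}

// Sweep: keep only marked objects (the rest is freed), then clear the flags.
def sweep(heap: List[Obj]): List[Obj] = {
  val live = heap.filter(_.marked)
  live.foreach(_.marked = false)
  live
}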
4.8.2 Reference counting garbage collection
- An “eager” approach.
- As the program runs, keep a count of the number of references
to every object in memory.
- So we must perform some tallying on every assignment!
- When the number of references reaches zero, we can free the memory
for that object.
- This approach needs to account for cycles, though!
- Every object in a cyclic (portion of a) data structure
will always have at least one pointer to it,
even if the structure as a whole is unreachable.
- This necessity increases the overhead.
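And here is a minimal reference-counting sketch in Scala (again a toy model,
not a real allocator); note how a cycle keeps every count above zero even
when the structure as a whole becomes unreachable:
class Counted(var count: Int = 0, var refs: List[Counted] = Nil)

// Tallying on assignment: adding a reference bumps the target's count.
def addRef(from: Counted, to: Counted): Unit = {
  from.refs = to :: from.refs
  to.count += 1
}

// Removing a reference decrements the count; at zero the object is garbage,
// and its own outgoing references are dropped in turn.
def dropRef(from: Counted, to: Counted): Unit = {
  from.refs = from.refs.filterNot(_ eq to)
  to.count -= 1
  if (to.count == 0) to.refs.foreach(r => dropRef(to, r))
}

// A cycle a -> b -> a: even if nothing else refers to a or b,
// both counts stay at 1, so counting alone never frees them.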
4.9 Dangling/wild references
Garbage is not the only problem that can occur with
references to memory.
A dangling or wild reference is a reference to memory
which has already been freed.
- Presumably, by a programmer's action.
- After being freed, memory may be left as is until it
is allocated again, or may be wiped in some way.
- Accessing memory that is no longer reserved
usually leads to undefined behaviour.
We will discuss methods of preventing use of wild/dangling references
when we discuss reference types later in the course.
5 Scope
A scope (singular) is a portion of the program to which the visibility
of entities may be limited.
- This is a design decision: what constructs introduce scopes?
- Almost certainly subroutines do.
- Do conditional branches? Loop bodies?
- As a general term, within this section we say block for a construct which
introduces a scope.
- Outside of this section, we have used and will continue to use the
singular scope for this, but here that would conflict with
the attribute sense of scope we are about to define.
5.1 Static scoping
The scope (attribute) of an entity is the portion of the program
in which it is “visible”.
- Usually scope is statically determined.
- It is usually the block in which it is defined,
and all subblocks of that block,
- unless it is shadowed by another entity of the same name.
- Static scoping is so pervasive, you have likely
never come across a language where scope works differently,
outside of differences in what constructs introduce scope.
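A small Scala sketch of static scoping and shadowing (illustrative names):
val x = "outer"
def f(): String = x    // statically, this x refers to the outer binding

def g(): String = {
  val x = "inner"      // shadows the outer x within g's block...
  f() + " / " + x      // ...but f() still sees "outer": its scope is fixed by the text
}
// g() evaluates to "outer / inner"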
5.2 Dynamic scoping
With dynamic scoping, the scope of entities depends upon the
“execution trace”, i.e., the “path” through the program.
- Dynamic scoping is rarely used.
- The detriments strongly outweigh the benefits in most cases.
- The one major benefit is relief from having to pass arguments.
5.3 Exercise: On dynamic scoping
- Historically, Lisps (languages directly descended from Lisp)
were dynamically scoped.
- Presently, the only widely used Lisp which uses dynamic scoping
by default is Emacs Lisp (eLisp.)
Try out this snippet of Emacs lisp code,
either in Emacs (if you have it) or using an
online REPL.
(let ((x 2) (y 5))            ; "global" variables x and y
  (progn
    (defun mult-x-y ()
      (* x y))                ; returns x * y
    (defun A ()
      (let ((x 3))            ; local variable x
        (mult-x-y)))
    (defun B ()
      (let ((y 10))           ; local variable y
        (A)))
    (message "mult-x-y returns %d, A returns %d and B returns %d"
             (mult-x-y) (A) (B))))
- Ensure you understand the results.
- Using this understanding, formulate some advantages and disadvantages
of dynamic scoping.
5.4 Exercise: Entities which introduce scope
In a few different programming languages, investigate whether
condition branches and loop bodies introduce scopes.
I.e., test out code such as
if B then
    int x = 0
endif
y = x    % Is x still in scope?
Specifically, investigate languages which have iterating loops
(usually called for loops).
What is the scope of an iterator for such loops?