Google Groupes

Re: Closures


Max Hailperin 15 janv. 2006 06:32
Envoyé au groupe : comp.lang.scheme
A closure is what you might call "a procedure object".  It is the
value that is created by evaluating a lambda expression.  For example,
when you evaluate (lambda (x) (+ x 2)), a closure results, which your
particular Scheme system might print out as something like
#<procedure>.

Now, having dispensed with the easy part, I need to explain why this
is called a closure; in particular, I need to explain what is "closed"
as opposed to "open".

An expression can have variable names in it; the above example has
two, namely x and +.  (Many beginners forget that + is just a name in
Scheme.)  Those names can be either "bound" or "free"; in this
example, x is bound and + is free.  A variable is bound in a
particular context if the language structure (such as lambda or let)
that binds it is included, and free otherwise.  A variable that is
free in a narrowly focused context might be bound in a broader
context.  So, for example, although + is free in the above lambda
expression, it is bound in (let ((+ *)) (lambda (x) (+ x 2))).

An expression with no free variables is "closed", whereas one with
free variables is "open".  Thus, the lambda expression and even the
larger let expression are open. (The let expression has * free in it.)
An example of a closed expression would be (lambda (f) (f 2)).

The procedure object created by evaluating a closed lambda expression
can be represented in a way that depends only on the lambda expression,
without any information about the context in which the lambda
expression was evaluated.  For example, any evaluation of (lambda (f)
(f 2)) will produce essentially the same procedure object.

The procedure object created by evaluating an open lambda expression,
on the other hand, depends on the context, because the free variables
are bound in the context.  Evaluating (lambda (x) (+ x 2)) in one
context might create a procedure for adding two, whereas evaluating it
in another context might create a procedure for multiplying by two, as
the earlier examples suggest.

The relevant context may be dynamic in nature; consider for example

(define cat
  (lambda (x)
    (lambda (f)
      (f x))))

(define at-two (cat 2))

(define at-four (cat 4))

(at-two sqrt) => 1.4142135623730951

(at-four sqrt) => 2

In this example, the lambda expression (lambda (f) (f x)) is evaluated
twice.  The static context is the same in both cases: there is only
one copy of the (lambda (f) (f x)), embedded within (lambda (x) ...).
However, there are two different dynamic contexts, one with x bound to
2 (or a memory location containing 2), the other with x bound to 4 (or
a memory location containing 4).  Thus, we wind up with two different
procedure objects, one for applying a procedure to 2 and one for
applying a procedure to 4, which are just as different from each other
as the procedure for adding 2 is from the procedure for multiplying by
2.

The preceding paragraph contains parenthetical remarks about memory
locations.  Those remarks are relevant if you want to start thinking
about how set! works.  Let's put that off and continue with the
simpler fiction that x is bound directly to the number 2 or 4.

There are two possible ways in which the procedure objects can be
created from evaluating an open lambda expression.  One is to first
perform substitution, so that the lambda expression becomes closed.
In the above examples, we would substitute in the value of x (either 2
of 4) and arrive at (lambda (f) (f 2)) or (lambda (f) (f 4)).  This
substitution process can be used even when the values in question
aren't numbers, but doing so may go outside the expressive ability of
the Scheme language.  For example, substituting the addition procedure
for + in (lambda (x) (+ x 2)) would produce something that you could
almost write as (lambda (x) ('#<primitive:+> x 2)) except that this
isn't legal Scheme; there is no way in Scheme to write a constant that
is a primitive procedure.  Nonetheless, internally to a Scheme system
something like this could happen, even if the user doesn't have a way
to write it externally.

The second approach is to represent each procedure object as a pair of
two things: the code of the procedure (which may contain free
variables) and an environment, that is, a record of what those free
variables are bound to.  Informally, I like to think of this as a
"where clause," in the sense that we would have objects like the
following lines of text:

(lambda (x) (+ x 2)), where + = #<primitive:+>

(lambda (x) (+ x 2)), where + = #<primitive:*>

(lambda (f) (f x)), where x = 2

(lambda (f) (f x)), where x = 4.

In each of these examples, the procedure object is the entire line of
text, including both the lambda expression and the where clause.
Notice that although the lambda expression is open, the line as a
whole is closed, because the where clause provides the bindings for
the variables that appear free in the lambda expression.

Because these pairs of code plus environment wrap an open lambda
expression into a closed procedure object, they are called "closures".
This is the representation that is normally used in Scheme
implementations -- or practical implementations of any programming
language where similar issues arise.  The substitution method,
although elegant, is not efficient enough to be used in practice.

The same two approaches can be used even if the names are bound to
mutable objects rather than immutable values.  You just need to be
careful about what substitution means: to substitute a mutable object in
for each occurrence of x does not mean to create multiple copies of the
mutable object.  Instead, the object remains off to the side, and
references to the object get substituted in.  This technique can also
be used to explain Scheme's variables, which are mutable using set!.
What gets substituted in for x, or bound to x in an environment, is a
reference to a memory location.

SICP is misleading in this regard.  It indicates that the substitution
model only works in absence of set!.  Moreover, it suggests that even
with set!, the environments can bind names directly to values, rather
than to memory locations containing values.  The only reason why that
works is because the SICP environments contain memory locations on a
semi-hidden basis; the environments themselves are mutable. The
consequence is that the representation of the environments becomes
very important; the sharing of frames within a tree is necessary in
order that the underlying memory locations are shared.  By contrast,
if the notion of setable memory location were separated out from the
notion of naming, then not only would the substitution model still
work, but also the representation of environments would become
irrelevant; they could be shared or copied at will.

I hope this clarified more than it confused, at least up until the
last couple paragraphs.  Those I kept too brief to be really very
illuminating, because I had wandered so far from the original
question.  -max