At first, I kind of thought of a closure as a block of code with its
own scope. Then I read the wikipedia definition of closure, and it said
that closures store information across function calls, so I thought of
static variables in these blocks of code.
Recently I read a closure was a "generic representation of a callback"
(http://www.le-hacker.org/papers/gobject/ch03.html)
When I think "callback", the example that comes to mind is one in
JavaScript. There is a built in sort method which has default
alphabetical behavior. However it can be modified to work in a
different way (e.g. numerically), or even on different types, like:
// Assume an array of Person objects might need to be sorted by age or
name
// That's easy. Just create two callbacks.
function ageCallback(a,b)
{
return a.age - b.age;
}
function nameCallback(a,b)
{
if (name.a < name.b)
return -1;
else if (name.b > name.a)
return 1;
else return 0;
}
arrayOfPersonObjects.sort(ageCallback); // sort by age
arrayOfPersonObjects.sort(nameCallback); // sort by name
This is a nifty ability, I think. But, um, I can't figure out what it
has to do with closures! And I'm getting kind of confused trying to
put all of these "closure" definitions together. Yeah, so any help
would be great.
> So I'm trying to get my head around exactly what a "closure" is. I've
> read what SICP says about closures via all of the references listed in
> the index. I understand the general concept of closures, but not the
> full implications.
> Part of my problem is that I have a lot of other
> terms running through my head from other languages (namely, JavaScript,
> Java, C++), and they seem to be getting in the way of understanding,
> instead of facilitating it!
>
Well, in Java and C++, an object is some kind of lame closure. If you have
closures in your languages, you probably can make objects as well.
By the way, according to this page: http://www.paulgraham.com/icad.html ,
javascript do support closures:
function foo(n) {
return function (i) {
return n += i } }
...which in scheme is:
(define (foo n)
(lambda (i)
(+ i n)))
In C, you could make a manual closure by doing something like this:
static int innerfunction(closure *c,int i){
return c->n+i:
}
closure* foo(int n){
closure* ret=calloc(1,sizeof(closure));
closure->func=innerfunction;
closure->n=n;
return closure;
}
> At first, I kind of thought of a closure as a block of code with its
> own scope.
Hmm, yeah. Something like that.
> Then I read the wikipedia definition of closure, and it said
> that closures store information across function calls, so I thought of
> static variables in these blocks of code.
No, thats not what it means.
> Recently I read a closure was a "generic representation of a callback"
That sounds like a very confusing definition... :-)
--
> closure* foo(int n){
> closure* ret=calloc(1,sizeof(closure));
> closure->func=innerfunction;
> closure->n=n;
> return closure;
> }
>
Oops. Last line in that function should be "return ret;" :-)
--
> On Sat, 14 Jan 2006, H. wrote:
>
>> So I'm trying to get my head around exactly what a "closure" is. I've
>> read what SICP says about closures via all of the references listed in
>> the index. I understand the general concept of closures, but not the
>> full implications.
>> Part of my problem is that I have a lot of other
>> terms running through my head from other languages (namely, JavaScript,
>> Java, C++), and they seem to be getting in the way of understanding,
>> instead of facilitating it!
>>
>
> Well, in Java and C++, an object is some kind of lame closure. If you have
> closures in your languages, you probably can make objects as well.
>
> By the way, according to this page: http://www.paulgraham.com/icad.html ,
> javascript do support closures:
>
> function foo(n) {
> return function (i) {
> return n += i } }
>
> ...which in scheme is:
>
> (define (foo n)
> (lambda (i)
> (+ i n)))
>
And that last one should have been like this:
(define (foo n)
(lambda (i)
(set! n (+ n i))
n))
Well, time for bed...
--
Now, having dispensed with the easy part, I need to explain why this
is called a closure; in particular, I need to explain what is "closed"
as opposed to "open".
An expression can have variable names in it; the above example has
two, namely x and +. (Many beginners forget that + is just a name in
Scheme.) Those names can be either "bound" or "free"; in this
example, x is bound and + is free. A variable is bound in a
particular context if the language structure (such as lambda or let)
that binds it is included, and free otherwise. A variable that is
free in a narrowly focused context might be bound in a broader
context. So, for example, although + is free in the above lambda
expression, it is bound in (let ((+ *)) (lambda (x) (+ x 2))).
An expression with no free variables is "closed", whereas one with
free variables is "open". Thus, the lambda expression and even the
larger let expression are open. (The let expression has * free in it.)
An example of a closed expression would be (lambda (f) (f 2)).
The procedure object created by evaluating a closed lambda expression
can be represented in a way that depends only on the lambda expression,
without any information about the context in which the lambda
expression was evaluated. For example, any evaluation of (lambda (f)
(f 2)) will produce essentially the same procedure object.
The procedure object created by evaluating an open lambda expression,
on the other hand, depends on the context, because the free variables
are bound in the context. Evaluating (lambda (x) (+ x 2)) in one
context might create a procedure for adding two, whereas evaluating it
in another context might create a procedure for multiplying by two, as
the earlier examples suggest.
The relevant context may be dynamic in nature; consider for example
(define cat
(lambda (x)
(lambda (f)
(f x))))
(define at-two (cat 2))
(define at-four (cat 4))
(at-two sqrt) => 1.4142135623730951
(at-four sqrt) => 2
In this example, the lambda expression (lambda (f) (f x)) is evaluated
twice. The static context is the same in both cases: there is only
one copy of the (lambda (f) (f x)), embedded within (lambda (x) ...).
However, there are two different dynamic contexts, one with x bound to
2 (or a memory location containing 2), the other with x bound to 4 (or
a memory location containing 4). Thus, we wind up with two different
procedure objects, one for applying a procedure to 2 and one for
applying a procedure to 4, which are just as different from each other
as the procedure for adding 2 is from the procedure for multiplying by
2.
The preceding paragraph contains parenthetical remarks about memory
locations. Those remarks are relevant if you want to start thinking
about how set! works. Let's put that off and continue with the
simpler fiction that x is bound directly to the number 2 or 4.
There are two possible ways in which the procedure objects can be
created from evaluating an open lambda expression. One is to first
perform substitution, so that the lambda expression becomes closed.
In the above examples, we would substitute in the value of x (either 2
of 4) and arrive at (lambda (f) (f 2)) or (lambda (f) (f 4)). This
substitution process can be used even when the values in question
aren't numbers, but doing so may go outside the expressive ability of
the Scheme language. For example, substituting the addition procedure
for + in (lambda (x) (+ x 2)) would produce something that you could
almost write as (lambda (x) ('#<primitive:+> x 2)) except that this
isn't legal Scheme; there is no way in Scheme to write a constant that
is a primitive procedure. Nonetheless, internally to a Scheme system
something like this could happen, even if the user doesn't have a way
to write it externally.
The second approach is to represent each procedure object as a pair of
two things: the code of the procedure (which may contain free
variables) and an environment, that is, a record of what those free
variables are bound to. Informally, I like to think of this as a
"where clause," in the sense that we would have objects like the
following lines of text:
(lambda (x) (+ x 2)), where + = #<primitive:+>
(lambda (x) (+ x 2)), where + = #<primitive:*>
(lambda (f) (f x)), where x = 2
(lambda (f) (f x)), where x = 4.
In each of these examples, the procedure object is the entire line of
text, including both the lambda expression and the where clause.
Notice that although the lambda expression is open, the line as a
whole is closed, because the where clause provides the bindings for
the variables that appear free in the lambda expression.
Because these pairs of code plus environment wrap an open lambda
expression into a closed procedure object, they are called "closures".
This is the representation that is normally used in Scheme
implementations -- or practical implementations of any programming
language where similar issues arise. The substitution method,
although elegant, is not efficient enough to be used in practice.
The same two approaches can be used even if the names are bound to
mutable objects rather than immutable values. You just need to be
careful about what substitution means: to substitute a mutable object in
for each occurrence of x does not mean to create multiple copies of the
mutable object. Instead, the object remains off to the side, and
references to the object get substituted in. This technique can also
be used to explain Scheme's variables, which are mutable using set!.
What gets substituted in for x, or bound to x in an environment, is a
reference to a memory location.
SICP is misleading in this regard. It indicates that the substitution
model only works in absence of set!. Moreover, it suggests that even
with set!, the environments can bind names directly to values, rather
than to memory locations containing values. The only reason why that
works is because the SICP environments contain memory locations on a
semi-hidden basis; the environments themselves are mutable. The
consequence is that the representation of the environments becomes
very important; the sharing of frames within a tree is necessary in
order that the underlying memory locations are shared. By contrast,
if the notion of setable memory location were separated out from the
notion of naming, then not only would the substitution model still
work, but also the representation of environments would become
irrelevant; they could be shared or copied at will.
I hope this clarified more than it confused, at least up until the
last couple paragraphs. Those I kept too brief to be really very
illuminating, because I had wandered so far from the original
question. -max
The way I think of closures is that they are a chunk of code with
perhaps *some* but not necessarily *all* the variables bound.
> At first, I kind of thought of a closure as a block of code with its
> own scope.
In general, you want the bound variables in the closure to be
more-or-less private (nested closures have nested scope, so its not
*exacty* private). The Lisp machine allowed you to close over the
binding cells of special-bound variables, but that was a *real* hack.
> Then I read the wikipedia definition of closure, and it said
> that closures store information across function calls, so I thought of
> static variables in these blocks of code.
They are quasi-static. If I make a closure, nothing says I can only
use it once. The closed-over bindings will persist and be present on
each call to the closure.
> Recently I read a closure was a "generic representation of a callback"
> (http://www.le-hacker.org/papers/gobject/ch03.html)
Heh, heh. Not `a callback is a primitive, half-working implementation
of a closure'?
Again, we have code we will invoke, *some* of the bindings are
established (when we create the callback) and the remaining are
established when the callback is invoked.
The difference with a callback is that it is expected that the closure
will have a dynamic extent not much larger than the procedure called.
(So you can allocate the closure on the stack and destroy it when the
called procedure returns rather than leaving it up to the GC.
> When I think "callback", the example that comes to mind is one in
> JavaScript. There is a built in sort method which has default
> alphabetical behavior. However it can be modified to work in a
> different way (e.g. numerically), or even on different types, like:
>
> // Assume an array of Person objects might need to be sorted by age or
> name
> // That's easy. Just create two callbacks.
> function ageCallback(a,b)
> {
> return a.age - b.age;
> }
>
> function nameCallback(a,b)
> {
> if (name.a < name.b)
> return -1;
> else if (name.b > name.a)
> return 1;
> else return 0;
> }
>
> arrayOfPersonObjects.sort(ageCallback); // sort by age
> arrayOfPersonObjects.sort(nameCallback); // sort by name
>
> This is a nifty ability, I think. But, um, I can't figure out what it
> has to do with closures! And I'm getting kind of confused trying to
> put all of these "closure" definitions together. Yeah, so any help
> would be great.
In this example, you aren't really closing over anything in the
callback. Let me make it a bit more complicated. Let's suppose you
have defined your own special array-like data structure that you'd like
to sort. But your data structure allows the user to specify the sort
key at the time the structure is created. I'll write this in Scheme
because I don't want to spend the afternoon debugging JavaScript.
(define make-my-structure cons)
(define sort-key car)
(define underlying-sequence cdr)
(define (make-sort-callback sort-key)
(lambda (l r)
(< (sort-key l) (sort-key r))))
(define (sort-my-sequence ms)
(sort (underlying-sequence ms) (make-sort-callback (sort-key ms))))
The callback in sort needs to know the sort key, but we can't write a
specific callback for each sort key because we don't know what the user
is going to choose. Instead, we contruct the sort callback by closing
over the sort key and hand this callback to the sort procedure.
It's more obvious what this has to do with closures when you rewrite it
like this:
function cmpBy(field) {
return function (a,b) {
if (a[field] < b[field) return -1;
else if (a[field] > b[field]) return 1;
else return 0;
}
}
arrayOfPersonObjects.sort(cmpBy('age'));
arrayOfPersonObjects.sort(cmpBy('name'));
When someone says "callback" in C, they mean a callback to a
statically-defined function, but if you can create functions at
run-time, and those functions can "close over" their run-time
environment, callbacks are much more practical.
True. The better C API designers realize this and take both a function
pointer and a pointer to custom data the function might want. The
Macintosh operating systemm & toolbox were full of this from day #1, for
example (look for all the arguments called "refCon"), but in most
languages you need to use mucky void pointers and unsafe casting.
I came to this thread late and I'm wondering why you changed the syntax
of the field access in your closure? i.e. at least in the langauge I
normally use it can be:
define function cmpBy(field)
method(a, b)
if (a.field < b.field)
-1
elseif (a.field > b.field)
1
else
0
end
end method
end cmpBy
sort!(arrayOfPersonObjects, test: cmpBy(age));
sort!(arrayOfPersonObjects, test: cmpBy(name));
(Note: this is written for consistency with the previous code but is not
actually correct for the built-in sort! in Dylan -- the closure should
be just:
method(a, b) a.field < b.field end
)
--
Bruce | 41.1670S | \ spoken | -+-
Hoult | 174.8263E | /\ here. | ----------O----------
Hmpf. Let me get this right: :-)
typedef struct _closure{
int (*func)(struct _closure*,int);
int n;
}closure;
static int innerfunction(closure *c,int i){
c->n+=i;
return c->n:
}
closure* foo(int n){
closure* ret=calloc(1,sizeof(closure));
ret->func=innerfunction;
ret->n=n;
return ret;
}
Or in scheme:
(define (foo n) (lambda (i) (set! n (+ i n)) n))
--
Wow. Thanks. That's exceedingly helpful. If only I had found *your*
explanation before all those other ones, I could have saved a bit of
time. ;-).
Hmm, I think you are doing this the wrong way. A closure is something
that is supposed to be totally obvious because you have used its concept
hundreds of times already while programming. Do some assignments or make
an application. :-)
--
Right. Well. Actually I had a paying job and made several extensions to
a commercial product, some of which shipped with the product and some
of which are freeware which have seen about 200,000 downloads.
But I was self-taught, and in languages that aren't Scheme, and
closures work differently in other languages, and I wasn't approaching
things very formally anyway. The study of Scheme requires a more formal
approach than I'm used to, which is not a bad thing.
Maybe closures are really obvious if you program hundreds of time in
Scheme. I'm just taking a shortcut and deciding that knowing the
definition before I discover it through making hundreds of programs
might not be a bad thing. :-)
http://www-128.ibm.com/developerworks/linux/library/l-highfunc.html
Jon
----
Learn to program using Linux assembly language
http://www.cafeshops.com/bartlettpublish.8640017
In Javascript, a.field means the field literally named "field" -- i.e.
the field-accessor symbol is implicitly quoted, just like C or Java.
Unlike in C or Java, objects can be used as hashes, so a[field] gets the
property named by evaluating field normally.
Hey, that's a really good point -- I think that's even closer to how I
tend to mentally model a closure when I'm thinking about it. It's also a
nice way of explaining why Java has some weird limitations on closures
(only final objects); it operates by substitution so it has to be able
to substitute in literal values (heap addresses for mutable values, or
immutable values directly).
> So I'm trying to get my head around exactly what a "closure" is.
Closures are an implementation detail. You shouldn't care about them
unless/until you're ready to think about how higher-order functions are
implemented. You need to have HOFs down first. They're the motivation
for closures.
-thant
--
"I do not need to explain why I say things. That's the interesting thing
about being president. Maybe somebody needs to explain to me why they
say something, but I don't feel like I owe anybody an explanation."
-- George W. Bush (Bob Woodward, "Bush at War", pp. 145-46)
Yes, I'm ready to think about all that...
(Until tomorrow at least, when school starts, and someone else's idea
of what they think I should learn trumps my own and I play the game so
that I that I can get decent grades and get into a good cs program and
learn the fun stuff.)
Hmm. It certainly seems much more of a hack job in C!
[totalSideQuery]
Do you happen to know if any implementation details changed between C
and C++ that address closure issues ? I would think "no" because C is a
subset of C++, and I would think "yes" because C++ is object-oriented
and C is not.[/totalSideQuery]
[...]
> [totalSideQuery]
> Do you happen to know if any implementation details changed between C
> and C++ that address closure issues ? I would think "no" because C is a
> subset of C++, and I would think "yes" because C++ is object-oriented
> and C is not.[/totalSideQuery]
Not especially. Objects help, and in C++ one can overload the
function-calling operator. So it's all more convenient in C++ than in
C:
class adder {
int i;
public:
adder(int n): i(n) {}
operator()(int n) {return n+i;}
};
...
adder plus_7(7);
std::cout << plus_7(42) << std::endl << plus_7(18) << std::endl;
Still clumsy compared to something that really has closures like Lisp,
ECMAscript, etc.
>
> class adder {
> int i;
> public:
> adder(int n): i(n) {}
> operator()(int n) {return n+i;}
> };
>
> ...
>
> adder plus_7(7);
>
> std::cout << plus_7(42) << std::endl << plus_7(18) << std::endl;
>
>
> Still clumsy compared to something that really has closures like Lisp,
> ECMAscript, etc.
When I try to compile this, I get an "ISO C++ forbids declaration of
'operator()' with no type" error.
It seems like a good example I'd like to play around with more, if it's
a valid one and I can get it to work.
The return type is missing. Without actually having tried to compile
something, I would guess it should be:
int operator()(int n){return n+i;}
You can actually build quite sophisticated function object systems in
C++ by leveraging templates and many other tricks, but they always fall
short of being satisfactory. Issues with memory management, argument
calling conventions, etc. never really go away in C++.
-thant
--
We have now sunk to a depth at which restatement of the obvious is the
first duty of intelligent men. -- George Orwell (attributed)
Not really, AFAIK.
[...]
> When I try to compile this, I get an "ISO C++ forbids declaration of
> 'operator()' with no type" error.
Oops, sorry. Should be int operator()..., obviously.
> It seems like a good example I'd like to play around with more, if
> it's a valid one and I can get it to work.
This kind of thing (creating objects which provide an operator()) is
well known in C++, and there's some convenience functions in the
standard library. More in boost, of course:
<http://www.boost.org/libs/libraries.htm#Function-objects>.
It's not really lisp, or scheme, or closures. But if you're using C++
for some other reason (because you're maintaining a system already
written in it, for example), it can be useful to know what wacky
things are possible, even if you decide not to use many of them.
>
>[totalSideQuery]
>Do you happen to know if any implementation details changed between C
>and C++ that address closure issues ? I would think "no" because C is a
>subset of C++, and I would think "yes" because C++ is object-oriented
>and C is not.[/totalSideQuery]
Technically C and C++ share a [large] common subset ... C itself is
not a subset of C++.
George
--
for email reply remove "/" from address