Closures, compile time, pad protos

Yuval Kogman

Nov 22, 2006, 9:01:43 AM
to perl6-l...@perl.org
Hi,

Anatoly and I don't know what this bit of code prints:

foo();
foo();
for 1..3 {
my $x ::= 3;
sub foo { say ++$x };
say ++$x
};

Is it 4, 5, 6, 6, 6 or 4, 5, 3, 3, 3? It's almost definitely not 4,
5, 6, 7, 8.


I can't rationalize 4, 5, 6, 7, 8 while maintaining the notion that
$x is actually lexical.


To rationalize the other examples:

4, 5, 6, 6, 6 means that the foo declaration does not capture an
instance of the $x variable, but the actual value in the pad proto
itself (the value that will be the default value of newly allocated
$x variables).
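
As a rough illustration of this reading, here is a minimal Python sketch (a model, not Perl 6). The `ProtoPad` class and the `printed` list are made-up names; the only assumption is that foo modifies the default value stored in the pad proto, and that each loop iteration copies that default into a fresh variable.

```python
class ProtoPad:
    """Compile-time template: holds the default value for each new $x."""
    def __init__(self, default):
        self.default = default

proto = ProtoPad(3)          # models `my $x ::= 3`, set at compile time
printed = []

def foo():
    # foo captured the proto slot itself, so ++$x changes the default
    proto.default += 1
    printed.append(proto.default)

foo()
foo()
for _ in range(3):
    x = proto.default        # fresh $x per iteration, copied from the proto
    x += 1                   # ++$x touches only this iteration's copy
    printed.append(x)

print(printed)  # [4, 5, 6, 6, 6]
```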

4, 5, 3, 3, 3 means that at compile time all variables are
instantiated once for BEGIN time captures. Observe:

foo();
bar();
for 1..3 {
my $x;
sub foo { say ++$x }
sub bar { say ++$x }
say ++$x;
}

prints 1, 2, 1, 1, 1 because $x is allocated once at compile time
and captured into both foo and bar, and then separately allocated
once more for each iteration of the loop.
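
The same allocation story can be sketched in Python (a model under stated assumptions, not Perl 6). `Cell`, `increment`, and `compile_time_x` are hypothetical names: one cell is allocated once "at compile time" and shared by foo and bar, and a new cell is allocated per iteration.

```python
class Cell:
    """One allocated instance of a scalar variable; None models undef."""
    def __init__(self, value=None):
        self.value = value

def increment(cell):
    # ++$x on undef yields 1, as in Perl
    cell.value = (cell.value or 0) + 1
    return cell.value

compile_time_x = Cell()      # allocated once while compiling the loop body
printed = []

def foo():
    printed.append(increment(compile_time_x))

def bar():
    printed.append(increment(compile_time_x))

foo()
bar()
for _ in range(3):
    runtime_x = Cell()       # a new $x for each iteration
    printed.append(increment(runtime_x))

print(printed)  # [1, 2, 1, 1, 1]
```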

If this is indeed the case, then there is a semantics problem:

foo();
foo();
for 1..3 {
my $x; BEGIN { $x = 3 };
sub foo { say ++$x };
say ++$x
};

Must be 4, 5, 1, 1, 1. This is because BEGIN { } and the
foo share the same compile time allocated copy of $x, but this is
not the copy in the loop.


A related issue is:

foo();
foo();
for 1..3 {
my $x = 10;
sub foo { say ++$x };
say ++$x;
}

Is that 11, 12, 10, 10, 10, or 11, 12, 13, 13, 13, or 1, 2, 10, 10, 10?


Lastly,

sub foo {
my $x;
sub { sub { say ++$x } }
};

my $bar = foo();

my $gorch = $bar.();

$gorch.();
$gorch.();

my $quxx = $bar.();

$quxx.();
$quxx.();

obviously results in the sequence 0, 1, but does the second call to
$bar create a new sequence in $quxx, or is that instance of $x
shared between $gorch and $quxx? Intuitively I'd say it is shared,
which means that the outer sub declaration implicitly captures $x as
well. Can anyone confirm?

Obviously

my $zot = foo().();
$zot.();
$zot.();

does create a new sequence.
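
For comparison only, this is how the same shape behaves in Python, whose closure rules are well defined; it says nothing authoritative about Perl 6. The names `middle` and `inner` are made up for the two anonymous subs, and `x` starts at 0 rather than undef, so the values shown are the results of the pre-increment.

```python
def foo():
    x = 0                    # stands in for `my $x` (0 instead of undef)
    def middle():            # the outer anonymous sub
        def inner():         # the inner anonymous sub
            nonlocal x
            x += 1
            return x
        return inner
    return middle

bar = foo()
gorch = bar()
first = [gorch(), gorch()]
quxx = bar()
second = [quxx(), quxx()]    # continues the sequence: same x as gorch
zot = foo()()
third = [zot(), zot()]       # a new call to foo gives a new x
print(first, second, third)  # [1, 2] [3, 4] [1, 2]
```

In Python the intermediate function implicitly carries `x` along because a nested function refers to it, which is the "shared" behaviour guessed at above.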

--
Yuval Kogman <nothi...@woobling.org>
http://nothingmuch.woobling.org 0xEBD27418

Yuval Kogman

Nov 22, 2006, 9:56:08 AM
to perl6-l...@perl.org
And what about:

foo();

for 1..3 {
my $x ::= 3;
sub foo { say ++$x };
say ++$x
};

BEGIN {
foo();
foo();
}


or worse:

sub moose {
    my $x = 3;
    sub foo { say ++$x };
}

BEGIN {
foo();
moose();
foo();
}

foo();
moose();
foo();


*foam oozes out of ears*

Juerd

Nov 22, 2006, 12:55:15 PM
to perl6-l...@perl.org
Yuval Kogman wrote 2006-11-22 16:01 (+0200):

> my $x ::= 3;
> sub foo { say ++$x };

Why would you be allowed to ++ this $x? It's bound to an rvalue!
--
Kind regards,

juerd waalboer: perl hacker <ju...@juerd.nl> <http://juerd.nl/sig>
convolution: ict solutions and consultancy <sa...@convolution.nl>

I don't trust voting computers.
See <http://www.wijvertrouwenstemcomputersniet.nl/>.

Yuval Kogman

Nov 22, 2006, 1:35:36 PM
to Juerd, perl6-l...@perl.org
On Wed, Nov 22, 2006 at 18:55:15 +0100, Juerd wrote:
> Yuval Kogman wrote 2006-11-22 16:01 (+0200):
> > my $x ::= 3;
> > sub foo { say ++$x };
>
> Why would you be allowed to ++ this $x? It's bound to an rvalue!

Perhaps my $x ::= BEGIN { Scalar.new( :value(3) ) }

What we meant to be doing was to pre-set this value at compile time
to 3.

That doesn't really matter though.

Anatoly Vorobey

Nov 22, 2006, 5:20:50 PM
to perl6-l...@perl.org
To add some more confusion to what Yuval wrote:

In general, it doesn't seem to be very clear how inner (lexically)
subs see their enclosing lexical environments. Possibly I'm simply
very confused, in which case un-confusing me would be much appreciated.
Here're some code snippets.

{
my $x = something();
if $x==1 {
...code...
}
}

In this case, when the inner block is entered from the outer as part
of execution, it's pretty clear how the inner block can see $x. There
can be many separate pads (local lexical environments) for the outer
block (for instance, if the outer block's a sub that recursed on itself)
with different values of $x, but only one of them was current just now
before we entered the inner block, and we can definitely arrange that
it be found as the inner block's ::OUTER.

With a closure, that works fine, too:

{
my $x = something();
return { $x++; }
}

As the inner block is cloned, the right pad for the outer block is
at hand, and $x can be copied from it (as a reference) to the
snapshot being built for the inner block.

But what about inner named subs?

{
my $x = something();
sub foo { $x; }
}

&foo is visible outside the outer block because it's a package variable.
Presumably at compile time the block of foo was compiled to a Code
object, and &foo in the current package now points to that Code object
(or actually the Routine object storing the Code object inside it, but
the difference doesn't seem to be relevant here?). All this happens
before runtime starts. Now we call foo() from somewhere far away. What's
it going to see as $x? Certainly not anything meaningful, because the
assignment to $x might not even have run by now (if it's a part of a sub
that was never called), or it may have been run in many versions.

So it seems safe to say that foo() should see undef here as the value
of $x (perhaps because its idea of ::OUTER is really the compile-time
version of the outer pad). Now how about this:

{
my $x = something();
sub foo { $x; }
foo();
}

Presumably in this case we do want foo() to see the right $x, current
at the time. How can it find that pad as its ::OUTER?

More importantly, if in the example when foo() is called from far away
it sees $x as undef, how can the following work at all (I'm assuming
it should work)? At package-level:

my $x = 1;
sub foo {
$x;
}

Now we call foo() from a different package and we expect it to see
$x==1. But how is the situation different?

I thought I had an argument as to why it could be different: we could
say that a named sub's block is a closure like any other block, and it's
closed already at compile-time; at that time $x is captured by foo();
and since the file-level code is only run once, its runtime pad is the
same as its compile-time pad, therefore when $x is later assigned 1 at
runtime, it's the same Scalar object it was (with value undef) at
compile-time, and foo() sees that change.

But now after re-reading S04 and S06 I see that's hopelessly muddled
and wrong in at least two ways. First, S04 says that named subs don't
clone anything until a reference is taken to them - until then they're
just blocks of code. Second, S06 also says that the (file|package)-level
code isn't necessarily run once, it's implicitly part of a &MAIN sub
("The outermost routine at a file-scoped compilation unit is always
named &MAIN in the file's package."), unless &MAIN had been redefined;
and could in principle be called again. So that
$x doesn't necessarily exist, at runtime, in just one pad. It seems that
the above is really something like

sub &MAIN {
my $x = 1;
sub foo {
$x;
}
}

But then it becomes _exactly_ identical to the case before where it
seemed inevitable that calling foo() from a faraway place doesn't let
it see $x==1. So how is this different here and how can it work?

Thanks,
Anatoly.

Buddha Buck

Nov 22, 2006, 6:43:12 PM
to perl6-l...@perl.org
Keep in mind that I am only an egg, and I am putting my intuition and
experience with similar languages to mind. Perl6 might be doing
things differently than I expect.

On 11/22/06, Anatoly Vorobey <avor...@pobox.com> wrote:
> To add some more confusion to what Yuval wrote:
>
> In general, it doesn't seem to be very clear how inner (lexically)
> subs see their enclosing lexical environments. Possibly I'm simply
> very confused, in which case un-confusing me would be much appreciated.
> Here're some code snippets.
>
> {
> my $x = something();
> if $x==1 {
> ...code...
> }
> }
>
> In this case, when the inner block is entered from the outer as part
> of execution, it's pretty clear how the inner block can see $x. There
> can be many separate pads (local lexical environments) for the outer
> block (for instance, if the outer block's a sub that recursed on itself)
> with different values of $x, but only one of them was current just now
> before we entered the inner block, and we can definitely arrange that
> it be found as the inner block's ::OUTER.

My experience with other statically typed but extremely flexible
languages is that the pads tend to be arranged in (possibly
interconnected) linked lists. In this example, I see potentially
three pads linked by the time ...code... is called: One containing
the local variables defined in ...code..., one containing the visibly
defined $x, and one visible outside that scope. A reference to $x in
...code... will traverse the linked list until it finds an $x,
presumably finding the one defined in the sample code.
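
A minimal Python sketch of that linked-list lookup, assuming nothing about how Perl 6 actually stores pads. The `Pad` class and the variable names `$y` and `$tmp` are hypothetical; only `$x` comes from the example.

```python
class Pad:
    """One lexical scope: its own variables plus a link to the outer pad."""
    def __init__(self, variables, outer=None):
        self.variables = dict(variables)
        self.outer = outer

    def lookup(self, name):
        # walk outward until some pad declares the name
        pad = self
        while pad is not None:
            if name in pad.variables:
                return pad.variables[name]
            pad = pad.outer
        raise NameError(name)

enclosing = Pad({"$y": "outside"})        # whatever is visible around the block
block = Pad({"$x": 1}, outer=enclosing)   # the block that declares $x
inner = Pad({"$tmp": 42}, outer=block)    # the pad of ...code...

print(inner.lookup("$x"))   # 1
print(inner.lookup("$y"))   # outside
```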


>
> With a closure, that works fine, too:
>
> {
> my $x = something();
> return { $x++; }
> }
>
> As the inner block is cloned, the right pad for the outer block is
> at hand, and $x can be copied from it (as a reference) to the
> snapshot being built for the inner block.

Er, that's not how I see it.

When {$x++;} is evaluated as a closure, it is for all intents and
purposes a function, with its own linked-list of pads. The head pad
in the list contains nothing, and the next pad (the outer pad
belonging to the function) contains $x. Since the head pad survives
the call, and it has a reference on the outer pad containing $x, that
outer pad survives as well. However, since nothing else points to it,
the value of that particular $x is only visible to invokers of the
closure returned.


>
> But what about inner named subs?
>
> {
> my $x = something();
> sub foo { $x; }
> }
>
> &foo is visible outside the outer block because it's a package variable.
> Presumably at compile time the block of foo was compiled to a Code
> object, and &foo in the current package now points to that Code object
> (or actually the Routine object storing the Code object inside it, but
> the difference doesn't seem to be relevant here?). All this happens
> before runtime starts. Now we call foo() from somewhere far away. What's
> it going to see as $x? Certainly not anything meaningful, because the
> assignment to $x might not even have run by now (if it's a part of a sub
> that was never called), or it may have been run in many versions.

If I understand things, the sub foo {$x;} is not actually compiled
into a callable function until run time. At which time, a pad
containing $x exists, which can be referenced by sub when converting
{$x;} into a Code object bound to the package variable foo.

My suspicion, without testing, is either (a) sub foo {$x;} in that
context doesn't actually define a package variable; (b) running that
block twice will redefine the package variable foo to the new
definition (i.e., the new pad list); or (c) the redefinition will give
an error for redefining a function.


>
> So it seems safe to say that foo() should see undef here as the value
> of $x (perhaps because its idea of ::OUTER is really the compile-time
> version of the outer pad). Now how about this:

I disagree. foo should either be undefined (because the sub foo {$x;}
hasn't been run yet) or foo() should have the value of something(),
since it is pointing to an $x that was given that value.

>
> {
> my $x = something();
> sub foo { $x; }
> foo();
> }
>
> Presumably in this case we do want foo() to see the right $x, current
> at the time. How can it find that pad as its ::OUTER?
>
> More importantly, if in the example when foo() is called from far away
> it sees $x as undef, how can the following work at all (I'm assuming
> it should work)? At package-level:
>
> my $x = 1;
> sub foo {
> $x;
> }
>
> Now we call foo() from a different package and we expect it to see
> $x==1. But how is the situation different?

It's the scope. The basic scoping rule I expect, regardless of actual
implementation, for a lexically scoped language, is that a variable
should refer to the closest lexically scoped variable declaration of
that name. In all these cases so far, that would be to the $x
declared in the lexical scope the definition of foo is in.

I would expect that this would work:

my $x = 0;
sub foo {
    ++$x;
}
sub bar {
    my $x = 0;
    return { ++$x; }
}
sub baz {
    my $x = 0;
    sub quux { ++$x; }
}

print foo();       # 1
print foo();       # 2
print bar()();     # 1
print bar()();     # 1
my $quuux = bar();
print $quuux();    # 1
print $quuux();    # 2
baz();
print quux();      # 1
print quux();      # 2

I figure one of us is wrong, and we've stated exactly what we think
should happen, so either way, someone will correct one of us.

Anatoly Vorobey

Nov 22, 2006, 8:38:44 PM
to perl6-l...@perl.org
First of all, thanks a lot for your comments.

On Wed, Nov 22, 2006 at 06:43:12PM -0500, Buddha Buck wrote:
> >{
> > my $x = something();
> > if $x==1 {
> > ...code...
> > }
> >}
> >

> My experience with other statically typed but extremely flexible
> languages is that the pads tend to be arranged in (possibly
> interconnected) linked lists. In this example, I see potentially
> three pads linked by the time ...code... is called: One containing
> the local variables defined in ...code..., one containing the visibly
> defined $x, and one visible outside that scope. A reference to $x in
> ...code... will traverse the linked list until it finds an $x,
> presumably finding the one defined in the sample code.

Agreed. By the way, can you offer a perspective on how the pads get
linked up, at runtime? I see each block as having a compile-time pad,
or proto-pad, filled with values known at compile-time; and every time
the block is entered, a new pad is cloned from the proto-pad. At that
point its OUTER reference leads to the proto-pad of the outer block,
and we want to link it up to the "real" pad of the outer block.

One way to do it is to simply say: when we enter the inner block from
the outer block, at that point we can re-link the inner block from the
outer proto-pad to the outer pad we entered from. That by itself works,
but I'm having trouble understanding what happens during a sub call
rather than entering the block "normally". For example:

{

my $x = 1;
sub foo { $x; }

bar();
}

sub bar() { foo(); }

Here we definitely want foo() to see $x==1 (I think), but we get to
foo() via criss-crossing through bar(), and so how would foo() know
where to find the right pad as its outer reference?

Which leads to the natural idea of maintaining a runtime global stack
of dynamically entered scopes, both scopes entered via sub calls and
entered via just going into an inner block. Then, any time we enter
a block, we can search back through the stack and find the most recent
pad on it that is _a_ pad of our outer lexical block, and call that our
OUTER. Is that how this is usually done?
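
The stack-search idea can be sketched in Python as follows. This is a model of the proposal only, not a description of any Perl 6 implementation; `enter`, `leave`, `lookup`, and the block labels are hypothetical names. It replays the criss-crossing example: foo is reached through bar while the block that declares $x is still active.

```python
class Pad:
    def __init__(self, block, variables, outer=None):
        self.block = block          # label of the block this pad belongs to
        self.variables = variables
        self.outer = outer

stack = []  # runtime stack of dynamically entered pads

def enter(block, lexical_parent, variables=None):
    outer = None
    if lexical_parent is not None:
        # most recent pad on the stack that belongs to the lexical parent
        outer = next((p for p in reversed(stack) if p.block == lexical_parent), None)
    pad = Pad(block, dict(variables or {}), outer)
    stack.append(pad)
    return pad

def leave():
    stack.pop()

def lookup(pad, name):
    while pad is not None:
        if name in pad.variables:
            return pad.variables[name]
        pad = pad.outer
    return None  # models undef

def foo():
    pad = enter("foo", lexical_parent="block")
    try:
        return lookup(pad, "$x")
    finally:
        leave()

def bar():
    enter("bar", lexical_parent=None)
    try:
        return foo()
    finally:
        leave()

def block():
    enter("block", lexical_parent=None, variables={"$x": 1})
    try:
        return bar()
    finally:
        leave()

result_inside = block()   # foo reached through bar while block is active
result_outside = foo()    # foo called with no block pad on the stack
print(result_inside, result_outside)  # 1 None
```

Under this scheme a call from "far away" finds no pad for the enclosing block and sees undef, which matches the behaviour described earlier for the standalone case.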

This way takes care of the "criss-crossing" example above, but I still
don't quite understand what to do about calls deeply up and down the
lexical hierarchy; consider a contrived example like

{
my $x = 1;
{
{
{
sub bar() {$x;}
}
}
}
sub foo() {
{ { { { { sub baz { $x; } } } } } }
bar(); baz();
}
}

Here baz() is a few levels below foo(), lexical-wise, while bar() is
on a different branch (in all cases the intermediate levels can be made
nontrivial). But what they all have in common with foo() is
that the block that has $x in its pad is an ancestor to all of them.
So I think we'd want the calls to bar() and baz() to see the value of
$x visible to foo(), but I'm not quite sure how they would find it.
Neither of them seems to have any "real" immediate lexical-parent pad
to link to, that would eventually lead them to $x. But I guess this
takes us right back to the rest of the discussion you addressed:

> >But what about inner named subs?
> >
> >{
> > my $x = something();
> > sub foo { $x; }
> >}
> >

> If I understand things, the sub foo {$x;} is not actually compiled
> into a callable function until run time. At which time, a pad
> containing $x exists, which can be referenced by sub when converting
> {$x;} into a Code object bound to the package variable foo.

I'm pretty sure that's wrong. "sub" is a compile-time macro that will
always run at compile-time and force a compilation of its block,
whatever that means in the context of its enclosing lexical environment
(that is, I'm precisely unsure of what that means). In fact, I believe
a compiled Perl6 program should never compile anything at runtime unless
you do an explicit eval() call. But I'll be glad to have myself
corrected on this if I'm wrong.

Finally, on closures:

> When {$x++;} is evaluated as a closure, it is for all intents and
> purposes a function, with its own linked-list of pads. The head pad
> in the list contains nothing, and the next pad (the outer pad
> belonging to the function) contains $x. Since the head pad survives
> the call, and it has a reference on the outer pad containing $x, that
> outer pad survives as well. However, since nothing else points to it,
> the value of that particular $x is only visible to invokers of the
> closure returned.

Ah, so you're saying that pads aren't explicitly cloned, they're just
referenced so they wouldn't go away when the blocks that created them
exit. Hmm, that's pretty nice (and the easiest thing in the world to
implement), but isn't that a little wasteful? I mean, those pads may
have 100 lexical variables in them, but my closure is only ever going to
look at 3 of them (and I know that at compile-time, by parsing its
lexical variable/function/operator references), but the other 97 values
stick around, too?
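
The trade-off can be made concrete with a small Python sketch. Both strategies are illustrative assumptions, and `Cell`, `make_pad`, `close_over_pad`, and `close_over_cells` are made-up names: one closure keeps the whole outer pad alive, the other copies references to only the cells its body mentions.

```python
class Cell:
    """Container for one variable, so captures share it by reference."""
    def __init__(self, value=None):
        self.value = value

def make_pad(n):
    # a pad with n lexicals, each in its own cell
    return {f"$v{i}": Cell(i) for i in range(n)}

def close_over_pad(pad):
    # strategy A: the closure holds the whole pad
    return {"outer": pad}

def close_over_cells(pad, used):
    # strategy B: copy references to only the cells the body mentions
    return {"captured": {name: pad[name] for name in used}}

pad = make_pad(100)
used = ["$v0", "$v1", "$v2"]
a = close_over_pad(pad)
b = close_over_cells(pad, used)

pad["$v1"].value = "changed"   # a later assignment in the outer block
print(len(a["outer"]), len(b["captured"]))   # 100 3
print(b["captured"]["$v1"].value)            # changed
```

Strategy B keeps 3 cells reachable instead of 100 and still sees later assignments, because it copies references rather than values.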

--
avva

Buddha Buck

Nov 23, 2006, 5:09:17 PM
to perl6-l...@perl.org
On 11/22/06, Anatoly Vorobey <avor...@pobox.com> wrote:
> First of all, thanks a lot for your comments.
>
> On Wed, Nov 22, 2006 at 06:43:12PM -0500, Buddha Buck wrote:
> > >{
> > > my $x = something();
> > > if $x==1 {
> > > ...code...
> > > }
> > >}
> > >
> > My experience with other statically typed but extremely flexible
> > languages is that the pads tend to be arranged in (possibly
> > interconnected) linked lists. In this example, I see potentially
> > three pads linked by the time ...code... is called: One containing
> > the local variables defined in ...code..., one containing the visibly
> > defined $x, and one visible outside that scope. A reference to $x in
> > ...code... will traverse the linked list until it finds an $x,
> > presumably finding the one defined in the sample code.
>
> Agreed. By the way, can you offer a perspective on how the pads get
> linked up, at runtime? I see each block as having a compile-time pad,
> or proto-pad, filled with values known at compile-time; and every time
> the block is entered, a new pad is cloned from the proto-pad. At that
> point its OUTER reference leads to the proto-pad of the outer block,
> and we want to link it up to the "real" pad of the outer block.

The way I see it, everything which defines a separate lexical scope (a
block, a function, a closure. I forget if in "my $a; ... ; my $b" $b
is visible in the ellipsis. If not, then a "my" statement also
defines a separate lexical scope) effectively creates a separate pad,
at run-time, when it is entered. The pad contains all the variables
defined in that lexical scope, and a link to the pad for the
surrounding lexical scope. The search for a variable is done by
looking up the variable in the current pad, and if not found,
recursively searching all linked pads until it is found or you run out
of pads.

There are reasonable optimizations that can be made. If a lexical
scope doesn't create any variables, it can reuse the same pad as its
enclosing lexical scope. If a lexical scope uses only part of an
enclosing pad, the enclosing pad could be broken into two pieces,
linked together, such that only part of it has to be searched or
survives with the enclosed scope, etc.

I haven't read any implementation details as to how Perl6 handles it,
so I'm going to use the following notation: if $p is a pad, then
$p.lookup('$var') returns the value of the variable $var in $p,
$p.myvars is a hash containing the local variables defined in $p, and
$p.outer is the pad of the lexically enclosing scope.

I think in p6 notation, that would be...

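class Pad {
    has %!myvars;      # variables declared in this scope
    has Pad $.outer;   # pad of the lexically enclosing scope

    method lookup(Str $var) {
        return %!myvars{$var} if exists %!myvars{$var};
        return $.outer.lookup($var);
    }
    method set(Str $var, $val) {
        # assign here if this pad declares $var, otherwise delegate outward
        return %!myvars{$var} = $val if exists %!myvars{$var};
        return $.outer.set($var, $val);
    }
    ...
}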

> One way to do it is to simply say: when we enter the inner block from
> the outer block, at that point we can re-link the inner block from the
> outer proto-pad to the outer pad we entered from. That by itself works,
> but I'm having trouble understanding what happens during a sub call
> rather than entering the block "normally". For example:
>
> {
> my $x = 1;
> sub foo { $x; }
> bar();
> }
>
> sub bar() { foo(); }
>
> Here we definitely want foo() to see $x==1 (I think), but we get to
> foo() via criss-crossing through bar(), and so how would foo() know
> where to find the right pad as its outer reference?

I did some experiments with pugs based on explicitly separating what
is visible at compile time from what is visible at run time.
Specifically, I used the following code:

my $x = 25;
sub bar {
    my $x = 1;
    sub foo {
        print ++$x;
    }
    print $x;
}
print $x; # 25
foo();    # 1
foo();    # 2
bar();    # 1
foo();    # 2
foo();    # 3
bar();    # 1
foo();    # 4

Let me call the protopads of foo and bar foo0 and bar0, respectively.

From what I see, foo is visible before bar is run (which was sort of
unexpected to me, but reasonable). Let's see what happens...

The statement "sub bar{...}" appears to set up a protopad $bar0 which
contains an $x, but doesn't put in any values until bar is run.
Everything is "undef".

The statement "sub foo{...}" also sets up a proto-pad $foo0 which is
empty. It is linked, however, to the protopad for bar. ($foo0.outer
= $bar0)

Running "foo();" before the "bar()" instantiates a pad $foo1 (=
copy($foo0)) for this invocation of foo, a copy of its proto-pad.
Since this links to $bar0, when ++$x is done, it modifies the $x in
$bar0 to 1. At the end of the call, $foo1 is garbage, waiting on
collection.

The next call of "foo();" does something similar... $foo2 =
copy($foo0), $x in $bar0 gets accessed and incremented to 2, and $foo2
goes poof.

Running "bar();" instantiates a pad $bar1=copy($bar0) for this
invocation of bar. In theory, the $x in this instantiation is 2, but
the my statement sets it to 1. More importantly, finally the sub
foo{...} is encountered at run-time, and there is a current lexical
scope available for it. Since the code is already compiled, the main
effect is $foo0.outer = $bar1.

Now there is a reference to $bar1 that survives the execution of
bar(), so it isn't GCed. It doesn't become garbage.

Running "foo();" now does $foo3=copy($foo0) as before, but now since
$foo0.outer==$bar1, $foo3.outer==$bar1. A lookup of $x in $foo3 now
yields $bar1.myvars{'$x'}, which is 1, so it gets incremented to 2.
$foo3 becomes garbage.

Running "foo();" again creates $foo4 pointing to $bar1, and the $x in
foo goes from 2 to 3. And again, $foo4 becomes garbage.

Running "bar();" now creates a $bar2, but the sub foo {...} has
already been executed once, and now does nothing. $foo0.outer remains
$bar1. $bar2 is garbage.

Running "foo()" creates a $foo5 pointing to $bar1, and $x goes from 3 to 4.
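
The walkthrough above can be replayed with a small Python simulation. It is a sketch of the explanation given here, not of how pugs actually works internally; the `Pad` class, `find`, `foo_installed`, and `out` are hypothetical names. The key assumptions are that foo's proto-pad initially links to bar's proto-pad, and that the first run of bar relinks it to that invocation's pad, once.

```python
class Pad:
    def __init__(self, variables=None, outer=None):
        self.variables = dict(variables or {})
        self.outer = outer

    def copy(self):
        # instantiate a pad from a proto-pad: own variables, same outer link
        return Pad(self.variables, self.outer)

    def find(self, name):
        # return the pad that declares the name, walking outward
        pad = self
        while name not in pad.variables:
            pad = pad.outer
        return pad

main = Pad({"$x": 25})
bar0 = Pad({"$x": None}, outer=main)   # proto-pad of bar; $x is undef
foo0 = Pad({}, outer=bar0)             # proto-pad of foo, linked to bar0
foo_installed = False
out = []

def foo():
    pad = foo0.copy()                  # $foo1, $foo2, ...
    owner = pad.find("$x")
    owner.variables["$x"] = (owner.variables["$x"] or 0) + 1
    out.append(owner.variables["$x"])  # print ++$x

def bar():
    global foo_installed
    pad = bar0.copy()                  # $bar1, $bar2, ...
    pad.variables["$x"] = 1            # my $x = 1
    if not foo_installed:              # `sub foo` takes effect only once
        foo0.outer = pad
        foo_installed = True
    out.append(pad.variables["$x"])    # print $x

out.append(main.variables["$x"])       # print $x at file level
foo(); foo(); bar(); foo(); foo(); bar(); foo()
print(out)  # [25, 1, 2, 1, 2, 3, 1, 4]
```

The output matches the sequence reported for the experiment, which suggests the explanation is at least self-consistent.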

In your sample code...

> {
> my $x = 1;
> sub foo { $x; }
> bar();
> }
>
> sub bar() { foo(); }

... I will discuss later, as I am late for Thanksgiving dinner.

Anatoly Vorobey

Nov 26, 2006, 3:15:37 AM
to perl6-l...@perl.org
On Thu, Nov 23, 2006 at 05:09:17PM -0500, Buddha Buck wrote:
> The way I see it, everything which defines a separate lexical scope (a
> block, a function, a closure. I forget if in "my $a; ... ; my $b" $b
> is visible in the ellipsis. If not, then a "my" statement also
> defines a separate lexical scope) effectively creates a separate pad,
> at run-time, when it is entered.

"my $b" should not be visible in the ellipsis (it is in pugs, and that's
a bug). However, there's no need for each "my" to define a separate
scope, because the invisibility of $b in the block prior to its
declaration is only required for compile time processing. If you hide
a reference to $b in an eval string, the spec explicitly allows the
runtime to use the lexical $b defined later in the block.
