On implementing generators

14 views
Skip to first unread message

Scott Graham

unread,
Sep 10, 2009, 3:13:56 AM9/10/09
to sku...@googlegroups.com
I'd have to wholeheartedly agree with Justyn's earlier post lamenting
the complexity of implementing python generators without depending on
recent JS language additions!

Here's my thoughts for those that are interested. (Haven't written the
Skulpt code yet, just trying to figure out what to write...)

-----

In the back of my mind I was hoping to use 'switch' to deal with
generators, so that something like:

def f(n):
num = 0
while num < n:
yield num
num += 1

would translate into something like this in C (of course with
iteration protocol, parameter capture, closure over variables, etc.
added):


typedef struct State
{
int location;
int num;
} State;

int f(State* state, int n)
{
switch (state->location)
{
case 0:
state->num = 0;
while (state->num < n)
{
state->location = 1;
return state->num;
case 1:
state->num += 1;
}

state->location = -1;
}
return -1;
}


The sneaky part here being that "case 1" is inside the body of the
while loop which lets us resume control at the correct location while
emitting pretty 'normal' code otherwise.

Sadly, I guess due to jihad against goto, it seems this form of
control isn't valid in JS. (There's also no goto with which to emulate
this, of course).

This makes things quite a lot more cumbersome, or exposes me as a
closet C programmer at the very least. Instead, we need to transform
the generated code much more heavily. Without 'sneaky' switchs and
just using straight gotos we want something like:


int f2(State* state, int n)
{
if (state->location == 0) goto state0;
if (state->location == 1) goto state1;
return -1;

state0:
state->num = 0;
while (state->num < n)
{
state->location = 1;
return state->num;
state1:
state->num += 1;
}
state->location = -1;
return -1;
}


Now, we can do some nasty transformation on this to remove the gotos:

First, get rid of 'state0'. Eliminate the first forward goto by
skipping what it skips if it's true using an if (this is eliminating a
forward-jumping-goto at the same nesting level):


int f3(State* state, int n)
{
if (!(state->location == 0))
{
if (state->location == 1) goto state1;
return -1;
}

state->num = 0;
while (state->num < n)
{
state->location = 1;
return state->num;
state1:
state->num += 1;
}
state->location = -1;
return -1;
}


Finally, get rid of 'state1'. First, we have to move the goto out of
its containing 'if', and then we have to move it back 'into' the while
loop. This is accomplished by modifying the loop condition, and the
statements above the state1 label to get execution to the correct
location. Then, we make sure to reset the condition so the next around
the loop, the while condition will be executed normally.


int f4(State* state, int n)
{
int _t0 = 0;
if (!(state->location == 0))
{
_t0 = state->location == 1;
if (!_t0)
{
return -1;
}
}
if (!_t0)
{
state->num = 0;
}
while (_t0 || state->num < n)
{
if (!_t0)
{
state->location = 1;
return state->num;
}
_t0 = 0;
state->num += 1;
}
state->location = -1;
return -1;
}


Ugh!

This works in general, I think, but there's a lot of cases to handle
(moving inwards into loops and ifs, outwards, etc.). It's certainly
more complicated than I'd like, and makes me yearn for goto or
generators to be widely deployed in JS.

It's also going to be hairy to test which worries me.

The transformation from goto's to if/while/switch is well known from
the olden days when researchers seem to have been hell-bent on
removing every trace of anything that looked like goto (see f.e.
http://www.sable.mcgill.ca/~hendren/ftp/erosa/thesis.ps.gz) but it's
kind of messy. This is a somewhat reduced version because we know the
gotos are really just a 'switch' at the top of the function to get to
the correct location, but the inwards transformations are still fairly
involved.

There's still the issue of rewriting locals to access state data
variables rather than being local accesses, but I don't think
that will be too hard.

Anyway, if anyone made it this far, I'd appreciate hearing any better
ideas before I start mashing up the compiler.

scott

Jonathan Schemoul

unread,
Sep 10, 2009, 11:11:14 AM9/10/09
to Skulpt
Hi,

I'm very interested in skulpt, as I'm a Python developer and a
Mootools fan (and mootools uses classes similar to python's new-style
classes and a lot of pythonic stuff).

Use of generators would be great to be able to do "dataflow"
programming in JS (lower memory consumption, pseudo-parallelism
without higly complex sliced stuff...).

What I've been doing so far in js with mootools to replicate the kind
of stuff I can achieve in python with yields is to use events in
classes.

Can't we use a registry of function triggers that would send data to
events on yield, and on the other part (the caller, which iterates on
the yield result) something like "wait for event reply" (without being
blocking, maybe transparent callbacks ?) ? (and of course we should
yield some kind of "end of data" event that is equivalent to the
python raise to tell that the iteration is over)

Jonathan
> removing every trace of anything that looked like goto (see f.e.http://www.sable.mcgill.ca/~hendren/ftp/erosa/thesis.ps.gz) but it's

Justyn

unread,
Sep 10, 2009, 3:29:37 PM9/10/09
to Skulpt
I'm struggling a little to visualise whats going on, so I've uploaded
a PDF that lays out how I think the code and loops could be rewritten
into stackable, nestable blocks. On the left are bits of python, on
the right they're rewritten with indicators showing how they could be
split up.

http://x.justyn.eu/generators1.pdf
(Temporary URL. To avoid differing fonts screwing up alignment all
fonts in the PDF should have been converted to paths).

I'm assuming that any optimisations should be suppressed until the
code is working and passing tests, for the sake of clarity.

Essentially a yield statement acts like a marker and the code is diced
into chunks between the markers. So in this way the start/end of the
generator are also markers, as are the start/end of the while loop.

Each chunk of code is reached via a switch statement (there may be
more switch statements inside the chunk, and more in them, etc).
When the end of the chunk is reached it does it will increase the
placeholder value for the switch statement (for that switch statement
only) and return with the relevant value if the chunk ended with a
yield statement.

Each switch statement needs to have its own placeholder associated
with it (maybe it should be a class?)
That way they are easily and infinitely nestable.

So every time generator.next() is called, the code should just drop
through switch statements into the correct place, right?

I'm also assuming that all "for" loops should be converted to while
loops in the standard manner.

Obviously this doesn't address the details and actual javascript
specific challenges, like closures (aargh, my mind!).
But is this more or less what you're planning?

Justyn.
> removing every trace of anything that looked like goto (see f.e.http://www.sable.mcgill.ca/~hendren/ftp/erosa/thesis.ps.gz) but it's

Scott Graham

unread,
Sep 11, 2009, 11:57:06 AM9/11/09
to sku...@googlegroups.com
Hi

> What I've been doing so far in js with mootools to replicate the kind
> of stuff I can achieve in python with yields is to use events in
> classes.
>
> Can't we use a registry of function triggers that would send data to
> events on yield, and on the other part (the caller, which iterates on
> the yield result) something like "wait for event reply" (without being
> blocking, maybe transparent callbacks ?) ? (and of course we should
> yield some kind of "end of data" event that is equivalent to the
> python raise to tell that the iteration is over)

I'm not sure I understand what you're getting at. Generally callbacks
seem like they could work, but I'm not quite sure *how* they'd work.

The two main requirements are that:

1) the iterator object that's returned from a generator function has
to be "drive"able from a .next() method (that's the part I don't see
how to do with callbacks), and
2) there has to be a mechanical translation from python to javascript
of course because it's a compiler that's doing it, not a person :)

Could you give an example of the method you're proposing?

scott

Scott Graham

unread,
Sep 11, 2009, 12:06:47 PM9/11/09
to sku...@googlegroups.com
> http://x.justyn.eu/generators1.pdf
> (Temporary URL. To avoid differing fonts screwing up alignment all
> fonts in the PDF should have been converted to paths).
>
> I'm assuming that any optimisations should be suppressed until the
> code is working and passing tests, for the sake of clarity.
>
> Essentially a yield statement acts like a marker and the code is diced
> into chunks between the markers. So in this way the start/end of the
> generator are also markers, as are the start/end of the while loop.
>
> Each chunk of code is reached via a switch statement (there may be
> more switch statements inside the chunk, and more in them, etc).
> When the end of the chunk is reached it does it will increase the
> placeholder value for the switch statement (for that switch statement
> only) and return with the relevant value if the chunk ended with a
> yield statement.

I think what you describe is what I thought I could do initially (and
works well in C), but I don't think it's possible in javascript. The
problem is that there's no way to jump back "inside" the loops or if
statements on the next execution.

For example, in "Simple loop generator" in your pdf, you can't jump to
"0,1" in the JS code.

In order to get "in" to those loops, the control flow has to be
modified so that it gets there naturally on the next time the function
is called. That's what all my crazy goto elimination code was about in
the long previous email.

It's basically adding extra booleans as required at each if/loop so
that the flow can be controlled when the function is re-entered based
on the previous saved location. Once it's fallen through based on
those booleans, the booleans are all turned "off" and execution
continues normally until a return.

Clear as mud?

scott

Justyn

unread,
Sep 11, 2009, 2:35:27 PM9/11/09
to Skulpt
On Sep 11, 5:06 pm, Scott Graham <sgra...@gmail.com> wrote:
> >http://x.justyn.eu/generators1.pdf
> I think what you describe is what I thought I could do initially (and
> works well in C), but I don't think it's possible in javascript. The
> problem is that there's no way to jump back "inside" the loops or if
> statements on the next execution.
>
> For example, in "Simple loop generator" in your pdf, you can't jump to
> "0,1" in the JS code.

I don't see why not...perhaps I'm missing something obvious (often
happens).

I'll try to break it down.

1) A while loop is essentially an if statement that keeps repeating
until it is no longer true.
2) A "case" block in a switch statement in JS will automatically run
into any following code (including another switch statement) unless it
breaks/returns.

Here is some code of exactly the same form as the "single loop
generator" in the pdf on the left, modified slightly for clarity:

def c(n):
i=0
while (i < n):
print i
yield i
print i / n
yield i / n
i +=1
print "end"

Okay, two yield statements in a while loop.

I'll now rewrite it, still in exactly the same layout as the "single
loop generator" in the pdf, now on the right where blocks in a switch
statement are signified with curly braces, but this time I'll write
real javascript.

All I'm doing is splitting the code up at the yield statements within
the body of the function (generator) - and treating the inside of any
while loop the same way, but independently.

This code assumes a placeholder has been created for each switch
statement, called s0 and s1, and initialised to 0.

function c(n)
{
switch(s0)
{
case(0):
i = 0;
s0 = 1;
case(1):
while(i < n)
{
switch(s1)
{
case(0):
console.log(i);
s1 = 1;
return i;
case(1):
console.log(i / n);
s1 = 2;
return i / n;
case(2):
i = i + 1;
s1 = 0; // back to zero
}
}
s0 = 2
case(2):
console.log("end");
}
// mark "generator" as finished so it cannot be run any more.
}

So every block of code in the switch statement ends by moving the
placeholder on one.

The while loop still works if you break out of it and come back,
acting like an if statement except that it will loop back round if the
last block of code inside it didn't end with a yield/break/return.
The switch statement must be inside the while loop, of course.

In this particular code at least, it performs like c.next() every time
you run it, doesn't it?
I just tried running it (using global variables) and it appears to
work.

I don't see why it shouldn't work no matter how much nesting etc
occurs.

> In order to get "in" to those loops, the control flow has to be
> modified so that it gets there naturally on the next time the function
> is called. That's what all my crazy goto elimination code was about in
> the long previous email.
>
> It's basically adding extra booleans as required at each if/loop so
> that the flow can be controlled when the function is re-entered based
> on the previous saved location. Once it's fallen through based on
> those booleans, the booleans are all turned "off" and execution
> continues normally until a return.

I kind of feel like you might be describing what I am except that the
use of booleans, because the switch statement might contain more than
two chunks of code.

Do you think we're converging on some understanding of each other or
veering apart?

Justyn.

Scott Graham

unread,
Sep 11, 2009, 8:31:48 PM9/11/09
to sku...@googlegroups.com
> Do you think we're converging on some understanding of each other or
> veering apart?

I think we're converging.

I was convinced at first, but I don't think it's quite workable like
that. For example, consider compiling:

def f(n):
i = 0
while i < n:
yield i
i = 100
yield i
i += 1

(Ignoring that it's stupid code...)

If that was compiled with a switch, it wouldn't get back into the
while loop if n was < 100, after the first yield, on reentry, it would
switch to just before the while, but wouldn't enter the while, so
wouldn't be able to get to the second yield. That's why all those
booleans are required to be added to the flow.

Does that make sense, or am I missing something obvious now? :)

scott

Justyn

unread,
Sep 12, 2009, 7:58:04 AM9/12/09
to Skulpt
2009/9/12 Scott Graham <sgr...@gmail.com>:
> def f(n):
> i = 0
> while i < n:
> yield i
> i = 100
> yield i
> i += 1
>
> (Ignoring that it's stupid code...)
>
> If that was compiled with a switch, it wouldn't get back into the
> while loop if n was < 100, after the first yield, on reentry, it would
> switch to just before the while, but wouldn't enter the while, so
> wouldn't be able to get to the second yield. That's why all those
> booleans are required to be added to the flow.

Ah, yes I see what you mean now.

Okay, with a tiny modification the process works - allow the while
loop condition to be true whenever the placeholder is not at its
initial value.
By adding only "|| s1 != 0" in the while condition to the compiled
code in my previous email:

function c(n)
{
switch(s0)
{
case(0):
i = 0;
s0 = 1;
case(1):
while(i < n || s1 != 0)
{
switch(s1)
{
case(0):
console.log(i);
s1 = 1;
return i;
case(1):
console.log(i / n);
s1 = 2;
return i / n;
case(2):
i = i + 1;
s1 = 0; // back to zero
}
}
s0 = 2
case(2):
console.log("end");
}
// mark "generator" as finished so it cannot be run any more.
}


As the final code case in the loop+switch statement will always reset
the placeholder at the end, this extra condition will only come into
play whenever the while loop is only partially completed.

If I compile the code in the example you gave above (again the
placeholders s0 and s1 have been initialised to 0):

function f(n)
{
switch(s0)
{
case(0):
i = 0;
s0 = 1;
case(1):
while (i < n || s1 != 0)
{
switch(s1)
{
case(0):
s1 = 1;
return i;
case(1):
i = 100;
s1 = 2;
return i;
case(2):
i += 1;
s1 = 0; // reset.
}
}
}
// mark generator as finished
}

The code now works - I think it's essentially the same as what you
wrote in c in your first email of this thread, except that the
condition is combined into the while loop instead of an if statement
and a switch statement is still used.

Justyn.

Scott Graham

unread,
Sep 12, 2009, 11:44:53 AM9/12/09
to sku...@googlegroups.com
On Sat, Sep 12, 2009 at 4:58 AM, Justyn <justyn...@googlemail.com> wrote:
>
> [snip]

>
> If I compile the code in the example you gave above (again the
> placeholders s0 and s1 have been initialised to 0):
>
> ...

>
>        while (i < n || s1 != 0)
>        {
>            switch(s1)
>            {
>            case(0):
>                s1 = 1;
>                return i;
>            case(1):
>                i = 100;
>                s1 = 2;
>                return i;
>            case(2):
>                i += 1;
>                s1 = 0; // reset.
>            }
>        }
>    }
> // mark generator as finished
> }
>

Yup, I think that works now. I wonder about some perverse cases with
nested loops now though once we start adding conditions.

while i < n:


while i < n:
yield i
i = 10

yield i

The outer loop would also need to allow re-entry based on the
placeholder of loops inside of it. Not too bad for something like
above, just another "|| s2 != 0". Is there a case where that gets more
complicated than just or'ing in all children? I can't think of one off
the top of my head, but I'm not sure.

Supporting 'break' becomes tricky too. Currently break is compiled to
a JS break, but that'll break out of the switch rather than the while.
Interestingly 'continue' seems like it'll be fine though.

Anyway, I think I might start trying to implement this now and see how
things go. Doing the actual transformation in the compiler is going to
be a whole separate problem!

Thanks for your feedback in thinking through this.

scott

Justyn Butler

unread,
Sep 12, 2009, 1:22:15 PM9/12/09
to sku...@googlegroups.com
2009/9/12 Scott Graham <sgr...@gmail.com>:

> Yup, I think that works now. I wonder about some perverse cases with
> nested loops now though once we start adding conditions.
>
>    while i < n:
>        while i < n:
>            yield i
>            i = 10
>            yield i
>
> The outer loop would also need to allow re-entry based on the
> placeholder of loops inside of it. Not too bad for something like
> above, just another "|| s2 != 0". Is there a case where that gets more
> complicated than just or'ing in all children? I can't think of one off
> the top of my head, but I'm not sure.

Oh, you're right, the outer loop would need to allow re-entry based on
inner loop placeholders.

The alternative is to use a separate variable (ie w1), rather than the
placeholder, for each while loop, set to 1 as soon as the while loop
is entered and back to 0 just before it is left. I think this is back
to what you were writing about originally...we have convergence!

s0 = 0;
s1 = 0;
w1 = 0;

function f(n)
{
switch(s0)
{
case(0):
i = 0;
s0 = 1;
case(1):

while (i < n || w1 == 1)
{
w1 = 1; // signify we're now in the loop


switch(s1)
{
case(0):
s1 = 1;
return i;
case(1):
i = 100;
s1 = 2;
return i;
case(2):
i += 1;
s1 = 0; // reset.
}

w0 = 0; // about to leave loop so reset loop holder


}
}
// mark generator as finished
}

This does prevent ever-lengthening while loop conditions caused by
tracking any inner loop placeholders, no matter how much nesting
occurs.

> Supporting 'break' becomes tricky too. Currently break is compiled to
> a JS break, but that'll break out of the switch rather than the while.
> Interestingly 'continue' seems like it'll be fine though.
>
> Anyway, I think I might start trying to implement this now and see how
> things go. Doing the actual transformation in the compiler is going to
> be a whole separate problem!

Good luck!

Justyn.

lkcl

unread,
Sep 13, 2009, 2:43:46 PM9/13/09
to Skulpt


On Sep 10, 7:13 am, Scott Graham <sgra...@gmail.com> wrote:
> This makes things quite a lot more cumbersome, or exposes me as a
> closet C programmer at the very least. Instead, we need to transform
> the generated code much more heavily.

in pyjamas, bernd added an "introspection" branch, which we didn't
get a chance to merge in (fortunately, because it relies on python 2.6
features of the internal ast code).

in python 2.6 compiler code, there's an AST "walker" function (which
would be easy to replicate).

the plan is to add in "translators" of the AST tree - to do exactly
these kinds of things: transform the AST into another AST.

in this way, it would be much much simpler to transform e.g. +=
operators into "var = var + ..." and much more.

a "yield" transform could be done in the same way.

l.

lkcl

unread,
Sep 13, 2009, 3:42:44 PM9/13/09
to Skulpt, pyjam...@googlegroups.com


On Sep 11, 4:06 pm, Scott Graham <sgra...@gmail.com> wrote:
> I think what you describe is what I thought I could do initially (and
> works well in C), but I don't think it's possible in javascript. The
> problem is that there's no way to jump back "inside" the loops or if
> statements on the next execution.

ahh... this reminds me of an execution-engine idea i had.

basically, you emulate / follow-the-style-of the python "FORTH-like"
interpreter engine, in javascript.

so, instead of executing the javascript directly, statement-following-
statement, you execute javascript fragments (even down to one per
line) ON-DEMAND.

you store the local variables on an emulated stack (NOT in the
standard javascript function stack). you store the exception state on
an emulated stack. you store EVERYTHING on an emulated stack.

in this way, you can do whatever you like. you can even implement
"yield" because you can now "jump about" - on the _emulated_ execution
stack.

the performance i anticipate to be truly dreadful, _but_ it would be
python-syntactically-correct.

i reaallly hope that you can come up with something that doesn't need
this!

lkcl

unread,
Sep 13, 2009, 4:35:59 PM9/13/09
to Skulpt

> This does prevent ever-lengthening while loop conditions caused by
> tracking any inner loop placeholders, no matter how much nesting
> occurs.

how do you cope with the yield being in a (arbitrarily-large) series
of nested function calls?

do you envisage adding in state-tracking variables to the function
call parameters, or to an implementation of a stack, of some kind (in
order not to pollute the function call parameters), or... other?


def fn99(vars)
...
yield x

def fn98....

....
....

def fn2...

def fn1(vars):
...
fn2(vars...)

while ...
...
fn1(vars)
...

Scott Graham

unread,
Sep 13, 2009, 7:08:38 PM9/13/09
to sku...@googlegroups.com

I'm not clear on the problem you're pointing to here. The shadowing of
'vars'? Or the depth of the callstack? The depth will (I think) be the
same python's (or possibly * a constant factor if there needs to be
some helper trampolining going on)

The calling discipline is the same as real python's (as I understand
it at least): the function becomes a constructor for
iterator-generator, which has a next() method that returns the next
value in the iteration.

The sneaky switch business is just to save the location of where the
yield happened during the previous call. During compilation,
function-local references are modified to reference into a state
object that's part of the iterator-generator. Function arguments are
closed over during the initial call.

That said, I've only just started the implementation so far, and not
tested it much at all, so there could easily be some Big Problem
that's not come up yet.

scott

lkcl

unread,
Sep 13, 2009, 8:13:44 PM9/13/09
to Skulpt


On Sep 13, 11:08 pm, Scott Graham <sgra...@gmail.com> wrote:
> I'm not clear on the problem you're pointing to here. The shadowing of
> 'vars'?

no.

> Or the depth of the callstack?

depth of the callstack. how do you ensure that s0, s1 (in the sneaky-
switch business) etc. don't get lost / thrown away, as the yield is
actually in a function that is several levels of function call _away_
from the function that becomes an iterator?

is this in fact even _possible_ in python? i.e. if you do a yield
several layers down the python call stack, does python store the
entire function call stack (in between the function-that's-now-an-
iterator and the point where yield was done) as part of the iterator?

l.

Kees Bos

unread,
Sep 16, 2009, 2:22:21 AM9/16/09
to Skulpt
I'm just adding this to this yield thread for in case somebody gets
bored and needs a challenge...

def foo(value = None):
for i in [-1,0,1,2,3,4]:
if i < 0:
continue
elif i == 0:
yield 0
elif i == 1:
yield 1
yield value
yield 2
else:
try:
yield i/value
except:
yield i

I'm very interested though, how this could be translated to
javascript. Gives me an headache.
Reply all
Reply to author
Forward
0 new messages