Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Generator (re-)definition within a loop

2 views
Skip to first unread message

Pierre Reinbold

unread,
Jun 18, 2010, 3:57:48 PM6/18/10
to pytho...@python.org
Hi all,

This is my first post on the list. I'm mainly a sysadmin and no expert
in programming languages, so this may be a stupid question but it
puzzles me.

I was pondering on the documentation of the function product(*args,
**kwds) in the itertools module. It is said that:

This function is equivalent to the following code, except that the
actual implementation does not build up intermediate results in memory:

def product(*args, **kwds):
# product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
# product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
pools = map(tuple, args) * kwds.get('repeat', 1)
result = [[]]
for pool in pools:
result = [x+[y] for x in result for y in pool]
for prod in result:
yield tuple(prod)

Ok, then what about using a generator expression instead of a list
comprehension to avoid generating the intermediate results in memory?

Letting the **kwds trick aside, it may give:

def genexp_product(*args):
pools = map(tuple, args)
result = [[]]
for pool in pools:
result = (x+[y] for x in result for y in pool)
for prod in result:
yield tuple(prod)

but this do not work as expected:

>>> print list(product("ABC", "xy"))
[('A', 'x'), ('A', 'y'), ('B', 'x'), ('B', 'y'), ('C', 'x'), ('C', 'y')]
>>> print list(genexp_product("ABC", "xy"))
[('x', 'x'), ('x', 'y'), ('y', 'x'), ('y', 'y')]

My question is why ? What is the final generator "result" at the end of
the main loop in genexp_product ? How is it build exactly ? Is this an
effet of some sort of "lazy evaluation" ?

Thanks for your help in understanding this,


3.14r

Terry Reedy

unread,
Jun 18, 2010, 5:48:32 PM6/18/10
to pytho...@python.org
On 6/18/2010 3:57 PM, Pierre Reinbold wrote:
> Hi all,
>
> This is my first post on the list. I'm mainly a sysadmin and no expert
> in programming languages, so this may be a stupid question

no

> but it puzzles me.
>
> I was pondering on the documentation of the function product(*args,
> **kwds) in the itertools module. It is said that:
>
> This function is equivalent to the following code, except that the
> actual implementation does not build up intermediate results in memory:

I suspect the actual algorithm is an iterative algorithm that simulates
nested for-loops to produce one item at a time, much like the working
nested generator version I give below.


>
> def product(*args, **kwds):
> # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
> # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
> pools = map(tuple, args) * kwds.get('repeat', 1)
> result = [[]]
> for pool in pools:
> result = [x+[y] for x in result for y in pool]

This runs iter(result) to completion *before* rebinding the result to
'result'.

> for prod in result:
> yield tuple(prod)
>
> Ok, then what about using a generator expression instead of a list
> comprehension to avoid generating the intermediate results in memory?
>
> Letting the **kwds trick aside, it may give:
>
> def genexp_product(*args):
> pools = map(tuple, args)
> result = [[]]
> for pool in pools:
> result = (x+[y] for x in result for y in pool)

The name binding in the first for-clause in a genexp is handled slightly
differently from that in subsequent for-clauses. I suspect that this is
relevant here. This code rebinds 'result' to a new generator *without*
running running the previously bound 'result' generator.

> for prod in result:
> yield tuple(prod)

This runs N nested generators, but which?

> but this do not work as expected:
>
>>>> print list(product("ABC", "xy"))
> [('A', 'x'), ('A', 'y'), ('B', 'x'), ('B', 'y'), ('C', 'x'), ('C', 'y')]
>>>> print list(genexp_product("ABC", "xy"))
> [('x', 'x'), ('x', 'y'), ('y', 'x'), ('y', 'y')]
>
> My question is why ? What is the final generator "result" at the end of
> the main loop in genexp_product ? How is it build exactly ? Is this an
> effet of some sort of "lazy evaluation" ?

That and/or a namespace issue.

Let's apply Reedy's Rule: when you have trouble understanding a function
expression, replace it with the (near) equivalent def statement. (Among
other advantages, one can insert print calls!)

Genexps, like lambdas, are specialized function expressions.

def augment(s_of_s, s):
for x in s_of_s:
for y in s:
yield x+[y]

def gen_product(*args):


pools = map(tuple, args)
result = [[]]
for pool in pools:

result = augment(result,pool)


for prod in result:
yield tuple(prod)

print(list(gen_product("ABC", "xy")))

>>> #3.1


[('A', 'x'), ('A', 'y'), ('B', 'x'), ('B', 'y'), ('C', 'x'), ('C', 'y')]

Terry Jan Reedy

Pierre Reinbold

unread,
Jun 21, 2010, 3:29:55 AM6/21/10
to pytho...@python.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Very instructive post ! Thank you !

Just trying to understand, I have apply the Reedy's Rule in an attempt
to reproduce the same behaviour as the generator expression. The idea, I
guess, is to (re-)define the generator function inside the loop.

My first try gives this:

def badgen_product1(*args, **kwds):


pools = map(tuple, args)
result = [[]]
for pool in pools:

def result():
for x in result():
for y in pool:
yield x+[y]
for prod in result():
yield tuple(prod)

But this does not reproduce the generator expression, it leads naturally
to an infinite recursion (which is what I expected first for the
generator expression btw)

for x in result():
RuntimeError: maximum recursion depth exceeded

Another try to avoid infinite recursion:

def badgen_product2(*args, **kwds):


pools = map(tuple, args)
result = [[]]
for pool in pools:

def augments():
for x in result:
for y in pool:
yield x+[y]
result = augments()


for prod in result:
yield tuple(prod)

And this one gives the infamous:

for x in result:
ValueError: generator already executing

Which seems to indicate that the lazy evaluation leads to eventually
bind everything to the same generator.

So, neither of my attempts reproduce the behaviour of the generator
expression. What do I miss here ?

Thank for your help,


3.14r
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwfFNIACgkQ9D25xYOIvisX3ACgu3BjwvepFDcbuUbtvaUc10eD
pJkAnA3cqPEM60kYfrcNLI6qmFzHvtRe
=LxCp
-----END PGP SIGNATURE-----

Thomas Jollans

unread,
Jun 21, 2010, 2:06:30 PM6/21/10
to pytho...@python.org
On 06/21/2010 09:29 AM, Pierre Reinbold wrote:
> [snip]

>
> Another try to avoid infinite recursion:
>
> def badgen_product2(*args, **kwds):
> pools = map(tuple, args)
> result = [[]]
> for pool in pools:
> def augments():
> for x in result:
This line is executed when you start iterating over the generator,
(because generators are lazy...). To get "result", which is not defined
in the inner function, Python looks in the namespace of the outer
function, where it's already been re-bound. So the generator basically
tries to iterate over itself (I think)

> for y in pool:
> yield x+[y]
> result = augments()
> for prod in result:
> yield tuple(prod)
>
> And this one gives the infamous:
>
> for x in result:
> ValueError: generator already executing
>
> Which seems to indicate that the lazy evaluation leads to eventually
> bind everything to the same generator.
>
> So, neither of my attempts reproduce the behaviour of the generator
> expression. What do I miss here ?

Bind "result" within the inner function. Either of these concepts should
work (not tested):

...
for pool in pools:
def augments(result):
for x in result:
...
result = augments(result)
....

#or
...
result = lambda: [[]]
for pool in pools:
def result(oldresult=result):
for x in oldresult():
...

With the second option, Python takes "result" and saves it in the
definition of the new "result".


-- Thomas

Terry Reedy

unread,
Jun 21, 2010, 2:17:52 PM6/21/10
to pytho...@python.org
On 6/21/2010 3:29 AM, Pierre Reinbold wrote:

> On 06/18/2010 11:48 PM, Terry Reedy wrote:

>> Let's apply Reedy's Rule: when you have trouble understanding a function
>> expression, replace it with the (near) equivalent def statement. (Among
>> other advantages, one can insert print calls!)
>>
>> Genexps, like lambdas, are specialized function expressions.
>>
>> def augment(s_of_s, s):
>> for x in s_of_s:
>> for y in s:
>> yield x+[y]
>>
>> def gen_product(*args):

>> pools = map(tuple, args)
>> result = [[]]
>> for pool in pools:

>> result = augment(result,pool)


>> for prod in result:
>> yield tuple(prod)
>>

>> print(list(gen_product("ABC", "xy")))
>>
>>>>> #3.1
>> [('A', 'x'), ('A', 'y'), ('B', 'x'), ('B', 'y'), ('C', 'x'), ('C', 'y')]
>
> Very instructive post ! Thank you !

> Just trying to understand, I have apply the Reedy's Rule in an attempt
> to reproduce the same behaviour as the generator expression.

I intentionally said '(near) equivalent' because, as you seem to have
found, exactly reproducing a bug in a function expression with a def
statement may be difficult to impossible. The form I applied above was
'equivalent to what one *thought* one was writing'. If one just wants to
fix the bug without completely understanding it, that should be enough.

> The idea, I
> guess, is to (re-)define the generator function inside the loop.

To produce a similar bug, in this case, yes.


>
> My first try gives this:
>

> def badgen_product1(*args, **kwds):


> pools = map(tuple, args)
> result = [[]]
> for pool in pools:

> def result():
> for x in result():

> for y in pool:
> yield x+[y]

> for prod in result():
> yield tuple(prod)
>
> But this does not reproduce the generator expression, it leads naturally
> to an infinite recursion (which is what I expected first for the
> generator expression btw)
>
> for x in result():
> RuntimeError: maximum recursion depth exceeded
>

> Another try to avoid infinite recursion:
>
> def badgen_product2(*args, **kwds):
> pools = map(tuple, args)
> result = [[]]
> for pool in pools:
> def augments():
> for x in result:

> for y in pool:
> yield x+[y]
> result = augments()
> for prod in result:
> yield tuple(prod)
>
> And this one gives the infamous:
>
> for x in result:
> ValueError: generator already executing
>
> Which seems to indicate that the lazy evaluation leads to eventually
> bind everything to the same generator.
>
> So, neither of my attempts reproduce the behaviour of the generator
> expression. What do I miss here ?

See above. Exact reproduction is not always possible.
>
> Thank for your help,

You are welcome. I hope you learned something. Wrestling with code is a
good way to do that. I love Python because it makes it so easy to do that.

--
Terry Jan Reedy

0 new messages