Odd name shadowing in comprehension

Chris Angelico

Oct 22, 2016, 7:58:00 PM
This surprised me.

Python 3.4.2 (default, Oct 8 2014, 10:45:20)
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> y=6
>>> [(x,y) for x in range(y) for y in range(3)]
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2,
2), (3, 0), (3, 1), (3, 2), (4, 0), (4, 1), (4, 2), (5, 0), (5, 1),
(5, 2)]
>>> [(x,y) for x in range(3) for z in range(y) for y in range(3)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
UnboundLocalError: local variable 'y' referenced before assignment

Normally, a comprehension is described as being equivalent to an
unrolled loop, inside a nested function. That would be like this:

def temp():
    ret = []
    for x in range(y):
        for y in range(3):
            ret.append((x,y))
    return ret
temp()

But it seems that the first iterator (and only that one) is evaluated
in the parent context:

def temp(iter):
    ret = []
    for x in iter:
        for y in range(3):
            ret.append((x, y))
    return ret
temp(iter(range(y)))

Why is this? It seems rather curious.
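
For anyone who wants to check the two expansions side by side, here is a minimal sketch. The wrapper `compare_desugarings` and the helper names `naive` and `with_param` are made up for illustration, and the outer `y` is a function local here rather than a module global, which should not change the outcome.

```python
def compare_desugarings():
    y = 6  # stands in for the outer-scope y from the session above

    # The comprehension itself.
    actual = [(x, y) for x in range(y) for y in range(3)]

    # Naive expansion: every 'for' clause lives inside the helper.
    # y is a loop target in naive(), hence local, so range(y) fails.
    def naive():
        ret = []
        for x in range(y):
            for y in range(3):
                ret.append((x, y))
        return ret

    # Expansion with the first iterable evaluated by the caller.
    def with_param(it):
        ret = []
        for x in it:
            for y in range(3):
                ret.append((x, y))
        return ret

    try:
        naive()
        naive_error = None
    except UnboundLocalError as exc:
        naive_error = exc

    return actual, with_param(iter(range(y))), naive_error


actual, expanded, naive_error = compare_desugarings()
print(actual == expanded)
print(type(naive_error).__name__)
```

Only the second expansion reproduces what the comprehension returns; the first one fails before it produces anything.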

ChrisA

Terry Reedy

Oct 22, 2016, 8:44:49 PM
On 10/22/2016 7:57 PM, Chris Angelico wrote:
> This surprised me.
>
> Python 3.4.2 (default, Oct 8 2014, 10:45:20)
> [GCC 4.9.1] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> y=6
> >>> [(x,y) for x in range(y) for y in range(3)]
> [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2,
> 2), (3, 0), (3, 1), (3, 2), (4, 0), (4, 1), (4, 2), (5, 0), (5, 1),
> (5, 2)]
> >>> [(x,y) for x in range(3) for z in range(y) for y in range(3)]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<stdin>", line 1, in <listcomp>
> UnboundLocalError: local variable 'y' referenced before assignment
>
> Normally, a comprehension is described as being equivalent to an
> unrolled loop, inside a nested function. That would be like this:
>
> def temp():
>     ret = []
>     for x in range(y):
>         for y in range(3):
>             ret.append((x,y))
>     return ret
> temp()

This would make the first example fail, which would not be nice.


> But it seems that the first iterator (and only that one) is evaluated
> in the parent context:

Because the first iterator *can* always be evaluated.

> def temp(iter):
>     ret = []
>     for x in iter:
>         for y in range(3):
>             ret.append((x, y))
>     return ret
> temp(iter(range(y)))
>
> Why is this? It seems rather curious.

Guido explained this somewhere some time ago.
Not sure it is documented very well.
In general, subordinate clauses depend on the initial loop variable,
hence cannot be evaluated ahead of time.


--
Terry Jan Reedy

eryk sun

Oct 22, 2016, 8:52:51 PM
On Sat, Oct 22, 2016 at 11:57 PM, Chris Angelico <ros...@gmail.com> wrote:
>
> Normally, a comprehension is described as being equivalent to an
> unrolled loop, inside a nested function.
> ...
> But it seems that the first iterator (and only that one) is evaluated
> in the parent context:
>
> Why is this? It seems rather curious.

It matches the behavior of generator expressions, for which Guido
gives the following example, as quoted in PEP 289:

Consider sum(x for x in foo()). Now suppose there's a bug in foo()
that raises an exception, and a bug in sum() that raises an
exception before it starts iterating over its argument. Which
exception would you expect to see? I'd be surprised if the one in
sum() was raised rather the one in foo(), since the call to foo()
is part of the argument to sum(), and I expect arguments to be
processed before the function is called.
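
The scenario in that quote can be tried out with a small sketch. `foo` and `buggy_sum` below are hypothetical stand-ins for the two buggy functions (the real `sum` is not involved), and `events` just records which of them actually ran.

```python
def demo_argument_order():
    events = []

    def foo():
        events.append("foo called")
        raise ValueError("bug in foo")

    def buggy_sum(iterable):
        # Stand-in for a sum() that fails before iterating its argument.
        events.append("buggy_sum called")
        raise TypeError("bug in sum")

    try:
        # foo() is the outermost iterable, so it runs while the generator
        # expression is being created, before buggy_sum is ever called.
        buggy_sum(x for x in foo())
    except (ValueError, TypeError) as exc:
        return events, type(exc).__name__


events, raised = demo_argument_order()
print(events, raised)
```

The exception from `foo` is the one that surfaces, and `buggy_sum` is never entered, which is the ordering the quote argues for.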

Chris Angelico

Oct 22, 2016, 9:26:28 PM
On Sun, Oct 23, 2016 at 11:43 AM, Terry Reedy <tjr...@udel.edu> wrote:
>> Normally, a comprehension is described as being equivalent to an
>> unrolled loop, inside a nested function. That would be like this:
>>
>> def temp():
>>     ret = []
>>     for x in range(y):
>>         for y in range(3):
>>             ret.append((x,y))
>>     return ret
>> temp()
>
>
> This would make the first example fail, which would not be nice.
>

Actually, I discovered this while trying to prove that order of the
loops in a comprehension was an easily-detected problem - that if you
got them wrong, you'd get a quick UnboundLocalError. While this is
true if you have three loops, it's NOT true of the one that's
evaluated first. (This in relation to the python-ideas thread about
wanting to be able to switch the order. Why bother, if getting it
wrong gives instant feedback?)

It's a bit of an edge case, as created by the nested function. Lambda
functions don't have this issue, because they can't assign to anything
other than the parameters (which are up front); genexp and
comprehension functions, however, have internal assignments caused by
the 'for' loops. As such, it's possible to trigger UnboundLocalError -
*except* in that very first loop iterable, which is now evaluated as a
parameter.

Needs to be documented somewhere, methinks.

ChrisA

Chris Angelico

Oct 22, 2016, 9:28:20 PM
On Sun, Oct 23, 2016 at 11:51 AM, eryk sun <ery...@gmail.com> wrote:
> It matches the behavior of generator expressions, for which Guido
> gives the following example, as quoted in PEP 289:
>
> Consider sum(x for x in foo()). Now suppose there's a bug in foo()
> that raises an exception, and a bug in sum() that raises an
> exception before it starts iterating over its argument. Which
> exception would you expect to see? I'd be surprised if the one in
> sum() was raised rather the one in foo(), since the call to foo()
> is part of the argument to sum(), and I expect arguments to be
> processed before the function is called.

Fair enough, except that a generator expression is syntactic sugar for
a generator function, and the return value of a generator function is
a generator object that hasn't yet been started. So where the boundary
is, then, is a bit of a fuzzy line.

Thanks for digging that up, at least.

ChrisA

eryk sun

Oct 22, 2016, 10:16:28 PM
On Sun, Oct 23, 2016 at 1:28 AM, Chris Angelico <ros...@gmail.com> wrote:
>
> Fair enough, except that a generator expression is syntactic sugar for
> a generator function, and the return value of a generator function is
> a generator object that hasn't yet been started. So where the boundary
> is, then, is a bit of a fuzzy line.

I meant the behavior seems to have been copied to align with generator
expressions, even though the cited rationale doesn't apply. I'm not
saying this is wrong. It's useful that the expression for the outer
iterator is evaluated in the defining scope. However, it's only
documented for generator expressions, in 6.2.8. The documentation for
comprehensions in 6.2.4 makes no mention of it. Actually, it states
without qualification that a comprehension is evaluated in a separate
scope, which could be misleading:

>>> class A:
...     a = [x for x in range(locals().setdefault('y', 2))]
...
>>> A.y
2
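
A related way to see the same split, sketched under the assumption that no global named `class_limit` exists (the class `Demo` and its attribute names are invented for this illustration): a class attribute is visible to the outermost iterable but not to an inner one.

```python
class Demo:
    class_limit = 3

    # Outermost iterable: evaluated in the class body, so class_limit
    # is found in the class namespace.
    outer_ok = [i for i in range(class_limit)]

    # Inner iterable: evaluated in the comprehension's own scope, which
    # skips the class namespace and falls through to globals/builtins.
    try:
        inner = [i for _ in range(1) for i in range(class_limit)]
    except NameError as exc:
        inner = exc


print(Demo.outer_ok)
print(type(Demo.inner).__name__)
```

The first comprehension builds its list normally, while the second one ends up with a `NameError` stored in `Demo.inner`.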

Steve D'Aprano

Oct 22, 2016, 10:42:27 PM
On Sun, 23 Oct 2016 10:57 am, Chris Angelico wrote:

> This surprised me.
>
> Python 3.4.2 (default, Oct 8 2014, 10:45:20)
> [GCC 4.9.1] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> y=6
> >>> [(x,y) for x in range(y) for y in range(3)]
> [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2,
> 2), (3, 0), (3, 1), (3, 2), (4, 0), (4, 1), (4, 2), (5, 0), (5, 1),
> (5, 2)]

That surprises me too. I wouldn't have expected that y is both global and
local to the comprehension at the same time.

I think that this is a bug in list comprehensions and should at least give a
warning that y is being used as both local and non-local.


This is what happens if you try to access a local before it exists:

py> [(y, x) for x in (1, y) for y in (10, 20)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'y' is not defined


Swap the order, and it works:

py> [(y, x) for y in (10, 20) for x in (1, y)]
[(10, 1), (10, 10), (20, 1), (20, 20)]


But if y happens to exist as a global, the first version mysteriously works!

py> y = 999
py> [(y, x) for x in (1, y) for y in (10, 20)]
[(10, 1), (20, 1), (10, 999), (20, 999)]


Let's expand the list comp to a function:

y = 999
def list_comp():
    result = []
    for x in (1, y):
        for y in (10, 20):
            result.append((y, x))
    return result


Calling that function gives a NameError, specifically:

UnboundLocalError: local variable 'y' referenced before assignment


> Normally, a comprehension is described as being equivalent to an
> unrolled loop, inside a nested function.

I don't think that's quite the right description, but something like that.

> That would be like this:
>
> def temp():
>     ret = []
>     for x in range(y):
>         for y in range(3):
>             ret.append((x,y))
>     return ret
> temp()

Indeed.




--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

Steve D'Aprano

Oct 22, 2016, 11:14:50 PM
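Terry Reedy wrote:

>> But it seems that the first iterator (and only that one) is evaluated
>> in the parent context:
>
> Because the first iterator *can* always be evaluated.

I don't understand what you mean by that. If I take you literally, it is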
obviously not true:

py> [x for x in garglebarblewarble]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'garglebarblewarble' is not defined

but I'm sure you know that, so I don't understand what you mean by "always".


According to the normal Python scoping rules[1], variables only come from a
single scope at a time. Lua has different rules: translating into Python,
Lua functions work like this:

x = 'global'
def foo():
    print(x)  # here, x will be the global x
    x = 'local'
    print(x)  # but now it is the local x

and foo() will print "global" then "local". But according to Python's
scoping rules, foo must raise NameError, specifically

UnboundLocalError: local variable 'x' referenced before assignment

So it seems strange that a little bit of Lua's behaviour has crept into list
comprehensions. I doubt that's intentional.
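
The Python side of that comparison is easy to confirm with a minimal sketch: the compiler classifies `x` as a local of `foo` for the whole function body, because of the assignment, before any of it runs. The names `is_local` and `error` are only there to hold the results.

```python
x = "global"


def foo():
    print(x)      # x is local to foo because of the assignment below
    x = "local"
    print(x)


# The compiler has already listed x among foo's local variables.
is_local = "x" in foo.__code__.co_varnames

try:
    foo()
    error = None
except UnboundLocalError as exc:
    error = exc

print(is_local, type(error).__name__)
```

The first `print(x)` never reaches the global: it fails with `UnboundLocalError`, which is a subclass of `NameError`.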


There's definitely something strange going on. Compare what happens when
the semi-global variable is in the first loop iterable versus the second
loop iterable. In this first example, y refers to both the global and the
local, yet strangely there's no error:


py> y = 999
py> [(y, z, x) for x in (1, y) for z in (10, 20) for y in (100,)]
[(100, 10, 1), (100, 20, 1), (100, 10, 999), (100, 20, 999)]


but if we move the reference to y into the second loop, the usual rule about
undefined local variables is used:

py> [(y, z, x) for x in (1, 2) for z in (10, y) for y in (100,)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
UnboundLocalError: local variable 'y' referenced before assignment


Of course there's no problem with accessing globals in the second loop, so
long as the name doesn't clash with a local:

py> Y = 999
py> [(y, z, x) for x in (1, 2) for z in (10, Y) for y in (100,)]
[(100, 10, 1), (100, 999, 1), (100, 10, 2), (100, 999, 2)]





[1] Function declarations are *slightly* different, so we can write this:

def func(a, b=b): ...

to define a parameter (local variable) "b" that takes its default value from
b in the surrounding scope. But that's a declaration, not an expression.

Chris Angelico

Oct 22, 2016, 11:24:57 PM
On Sun, Oct 23, 2016 at 2:14 PM, Steve D'Aprano
<steve+...@pearwood.info> wrote:
>>> But it seems that the first iterator (and only that one) is evaluated
>>> in the parent context:
>>
>> Because the first iterator *can* always be evaluated.
>
> I don't understand what you mean by that. If I take you literally, it is
> obviously not true:
>
> py> [x for x in garglebarblewarble]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> NameError: name 'garglebarblewarble' is not defined
>
> but I'm sure you know that, so I don't understand what you mean by "always".

AIUI, what he means is this:

1) Sometimes, all the iterables can be evaluated in advance.

dice_2d6 = [a+b for a in range(1,7) for b in range(1,7)]

2) But sometimes, subsequent iterables depend on the outer loop.

triangle = [a+b for a in range(1, 7) for b in range(1, a+1)]

So in case #2, you cannot evaluate the second range until you're
actually in the loop - but in both cases, the first one can be
pre-evaluated. Also, the most-outside loop's iterable gets evaluated
exactly once, whereas inner loops might be evaluated more often (or
less, for that matter).
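
The evaluation counts can be made visible with a small sketch. The helpers `outer` and `inner` are hypothetical wrappers around the two `range` calls from the triangle example, added only so the calls can be counted.

```python
def count_iterable_evaluations():
    calls = {"outer": 0, "inner": 0}

    def outer():
        calls["outer"] += 1
        return range(1, 7)

    def inner(a):
        calls["inner"] += 1
        return range(1, a + 1)

    # Same shape as the triangle example, with counted iterables.
    triangle = [a + b for a in outer() for b in inner(a)]
    return triangle, calls


triangle, calls = count_iterable_evaluations()
print(calls)
```

The outermost iterable expression runs once, while the inner one runs once per value of `a`.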

Only in extreme edge cases involving name scoping can the evaluation
of the first iterable depend on whether it's inside or outside the
invisible function.

ChrisA

Chris Angelico

Oct 22, 2016, 11:27:17 PM
On Sun, Oct 23, 2016 at 2:14 PM, Steve D'Aprano
<steve+...@pearwood.info> wrote:
> There's definitely something strange going on. Compare what happens when
> the semi-global variable is in the first loop iterable versus the second
> loop iterable. In this first example, y refers to both the global and the
> local, yet strangely there's no error:
>

Right. Disassembly shows that it's something like this:

y = 6
val1 = [(x,y) for x in range(y) for y in range(3)]

def temp(iter):
    ret = []
    for x in iter:
        for y in range(3):
            ret.append((x, y))
    return ret
val2 = temp(iter(range(y)))

That first iterable is evaluated in the outer context, then passed as
a parameter.

ChrisA

Steve D'Aprano

Oct 22, 2016, 11:44:32 PM
I don't think that this is relevant. Or rather, it might explain why the
behaviour is the way it is, but it isn't justification for it.

Here's a link to the PEP, showing the quote in context:

https://www.python.org/dev/peps/pep-0289/#early-binding-versus-late-binding

But this isn't a question about early or late binding, it is asking why the
variable y is treated as both global and local in the same comprehension.
This may be an unexpected side-effect of other choices, but I don't see any
discussion or consideration of this specific issue.

[Aside: Guido's quote in the PEP is unsourced; there's a reference given,
but it goes to a different email from Guido, not one that includes the
claimed explanation.]

eryk sun

Oct 23, 2016, 1:09:05 AM
On Sun, Oct 23, 2016 at 3:44 AM, Steve D'Aprano
<steve+...@pearwood.info> wrote:
>
> https://www.python.org/dev/peps/pep-0289/#early-binding-versus-late-binding
>
> But this isn't a question about early or late binding, it is asking why the
> variable y is treated as both global and local in the same comprehension.
> This may be an unexpected side-effect of other choices, but I don't see any
> discussion or consideration of this specific issue.

For generator expressions, it's about early binding of the outer
iterator. This makes the result of the expression for the outer
iterator behave like a function parameter. Obviously it has to be
evaluated in the defining scope. Comprehensions have adopted the same
behavior, and it's not a bug or a flawed design. However it does need
to be documented better in section 6.2.4.

If the expression for the outer iterator is `range(y)`, the compiler
emits whatever operation is required to load y in the defining scope.
At the module or class level it emits a LOAD_NAME. At function scope,
it uses LOAD_FAST when y is local, LOAD_DEREF when y is a free
variable, and otherwise LOAD_GLOBAL. The interpreter evaluates the
expression (e.g. it calls range) and executes GET_ITER on the result.
The resulting iterator is passed as parameter ".0" to the [generator]
function that implements the comprehension or generator expression.
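
The ".0" parameter can be observed without reading bytecode. This is a minimal sketch that inspects the code object of a generator expression; a generator expression is used rather than a list comprehension on the assumption that it is compiled to its own code object on every Python 3 release, which may not hold for list comprehensions on newer interpreters. `inspect_genexp` is a made-up helper name.

```python
def inspect_genexp():
    y = 6
    gen = ((x, y) for x in range(y) for y in range(3))
    code = gen.gi_code
    # co_argcount counts positional parameters; co_varnames lists the
    # parameters first, followed by the other local variables.
    return code.co_argcount, code.co_varnames


argcount, varnames = inspect_genexp()
print(argcount, varnames)
```

The single parameter is the pre-built iterator for the outermost `for`, and `y` shows up as an ordinary local of the generator's code.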

Terry Reedy

Oct 23, 2016, 1:17:10 AM
On 10/22/2016 11:24 PM, Chris Angelico wrote:
> On Sun, Oct 23, 2016 at 2:14 PM, Steve D'Aprano
> <steve+...@pearwood.info> wrote:
>>>> But it seems that the first iterator (and only that one) is evaluated
>>>> in the parent context:
>>>
>>> Because the first iterator *can* always be evaluated.
>>
>> I don't understand what you mean by that. If I take you literally, it is
>> obviously not true:
>>
>> py> [x for x in garglebarblewarble]
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> NameError: name 'garglebarblewarble' is not defined
>>
>> but I'm sure you know that, so I don't understand what you mean by "always".
>
> AIUI, what he means is this:

Right. You put it better than I did.

> 1) Sometimes, all the iterables can be evaluated in advance.
>
> dice_2d6 = [a+b for a in range(1,7) for b in range(1,7)]
>
> 2) But sometimes, subsequent iterables depend on the outer loop.
>
> triangle = [a+b for a in range(1, 7) for b in range(1, a+1)]
>
> So in case #2, you cannot evaluate the second range until you're
> actually in the loop - but in both cases, the first one can be
> pre-evaluated. Also, the most-outside loop's iterable gets evaluated
> exactly once, whereas inner loops might be evaluated more often (or
> less, for that matter).
>
> Only in extreme edge cases involving name scoping can the evaluation
> of the first iterable depend on whether it's inside or outside the
> invisible function.
>
> ChrisA
>


--
Terry Jan Reedy

Chris Angelico

Oct 23, 2016, 1:47:58 AM
On Sun, Oct 23, 2016 at 4:08 PM, eryk sun <ery...@gmail.com> wrote:
> For generator expressions, it's about early binding of the outer
> iterator. This makes the result of the expression for the outer
> iterator behave like a function parameter. Obviously it has to be
> evaluated in the defining scope. Comprehensions have adopted the same
> behavior, and it's not a bug or a flawed design. However it does need
> to be documented better in section 6.2.4.

It may not be flawed design, but it's certainly unintuitive, given the
normal explanation:

gen = ((x,y) for x in range(y) for y in range(z))

def g():
    for x in range(y):
        for y in range(z):
            yield (x,y)
gen = g()

It's actually this, which is definitely unintuitive:

def g(it):
    for x in it:
        for y in range(z):
            yield (x,y)
gen = g(iter(range(y)))

Do you ever see them explained like that? Me neither. So if it's
intentional, it needs to be explained somewhere, unless the assumption
is that it'll almost never matter.

ChrisA

Steve D'Aprano

Oct 23, 2016, 2:37:30 AM
On Sun, 23 Oct 2016 02:24 pm, Chris Angelico wrote:

> 1) Sometimes, all the iterables can be evaluated in advance.
>
> dice_2d6 = [a+b for a in range(1,7) for b in range(1,7)]
>
> 2) But sometimes, subsequent iterables depend on the outer loop.
>
> triangle = [a+b for a in range(1, 7) for b in range(1, a+1)]
>
> So in case #2, you cannot evaluate the second range until you're
> actually in the loop

Obviously not.


> - but in both cases, the first one can be pre-evaluated.

That doesn't follow. Consider:

gen = (x for x in [time.time()])

Should next(gen) return the time that the generator expression was created,
or the time when you first call next()? The answer to that depends on
whether you want early binding or late binding. If that's not clear,
consider this instead:

y = 'first'
gen = (x for x in [y])
y = 'second'

What will next(gen) return? With *early binding*, it will return "first".
This is how function parameter defaults work. With *late binding*, it will
return "second", which is how access to globals normally works.
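
What CPython actually does can be checked with a minimal sketch. It runs inside a made-up function `binding_demo`, so `y` is a closure variable rather than a global, and the second generator adds a dummy outer loop purely to push `[y]` into an inner clause.

```python
def binding_demo():
    y = "first"
    outermost = (x for x in [y])               # [y] is built right here
    nested = (x for _ in [None] for x in [y])  # [y] is built on first next()
    y = "second"
    return next(outermost), next(nested)


early, late = binding_demo()
print(early, late)
```

The outermost iterable is bound early and still holds the value from creation time, while the inner clause looks `y` up late and sees the rebound value.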

But that's a separate issue to the question of what happens if y is a local
variable:

gen = (x for x in [y] for y in [999])

I can see how it happens and why it happens, but I think it is still weird
and deserves at least a warning.



> Also, the most-outside loop's iterable gets evaluated
> exactly once, whereas inner loops might be evaluated more often (or
> less, for that matter).

Again, that's quite straightforward.


> Only in extreme edge cases involving name scoping can the evaluation
> of the first iterable depend on whether it's inside or outside the
> invisible function.

I don't think that this is an extreme edge case.

Steve D'Aprano

Oct 23, 2016, 2:38:07 AM
On Sun, 23 Oct 2016 01:15 pm, eryk sun wrote:

> I meant the behavior seems to have been copied to align with generator
> expressions, even though the cited rationale doesn't apply. I'm not
> saying this is wrong. It's useful that the expression for the outer
> iterator is evaluated in the defining scope. However, it's only
> documented for generator expressions, in 6.2.8. The documentation for
> comprehensions in 6.2.4 makes no mention of it.

6.2.8? 6.2.4? What are these references to?

eryk sun

Oct 23, 2016, 5:58:01 AM
On Sun, Oct 23, 2016 at 6:15 AM, Steve D'Aprano
<steve+...@pearwood.info> wrote:
> On Sun, 23 Oct 2016 01:15 pm, eryk sun wrote:
>
>> I meant the behavior seems to have been copied to align with generator
>> expressions, even though the cited rationale doesn't apply. I'm not
>> saying this is wrong. It's useful that the expression for the outer
>> iterator is evaluated in the defining scope. However, it's only
>> documented for generator expressions, in 6.2.8. The documentation for
>> comprehensions in 6.2.4 makes no mention of it.
>
> 6.2.8? 6.2.4? What are these references to?

They're section numbers in the Python 3 language reference. Chapter 6
covers expressions.

https://docs.python.org/3/reference/expressions