Dangerous behavior of list(generator)

Tom Machinski

Dec 12, 2009, 7:15:23 PM

to Python-list

In most cases, `list(generator)` works as expected. Thus,
`list(<generator expression>)` is generally equivalent to `[<generator
expression>]`.

Here's a minimal case where this equivalence breaks, causing a serious
and hard-to-detect bug in a program:

>>> def sit(): raise StopIteration()
...
>>> [f() for f in (lambda:1, sit, lambda:2)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in sit
StopIteration
>>> list(f() for f in (lambda:1, sit, lambda:2))
[1]

I was bitten hard by this inconsistency when sit() was returning the
idiom `(foo for foo in bar if foo.is_baz()).next()`. The nonexistence
of a foo with is_baz() True in that query raises an exception as
designed, which expresses itself when I use the list comprehension
version of the code above; the generator version muffles the error and
silently introduces a subtle, confusing bug: `lambda:2` is never
reached, and a truncated list of 1 element (instead of 3) is
"successfully" generated.
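
For anyone trying this on Python 3, here is a minimal sketch of the same comparison. It assumes Python 3.7 or later, where PEP 479 converts a StopIteration escaping a generator body into RuntimeError, so the generator-expression version raises instead of returning the truncated [1] that Python 2 produced above. The `outcome` helper is hypothetical and exists only to report what each form does.

```python
def sit():
    # Helper from the original post: leaks StopIteration to its caller.
    raise StopIteration()


FUNCS = (lambda: 1, sit, lambda: 2)


def outcome(build):
    """Hypothetical helper: return build()'s result, or the name of the exception it raised."""
    try:
        return build()
    except Exception as exc:  # StopIteration and RuntimeError both derive from Exception
        return type(exc).__name__


# List comprehension: the StopIteration from sit() propagates unchanged.
print(outcome(lambda: [f() for f in FUNCS]))      # StopIteration
# list() over a generator expression: on Python 3.7+ the generator turns the
# leaked StopIteration into RuntimeError instead of ending quietly.
print(outcome(lambda: list(f() for f in FUNCS)))  # RuntimeError
```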

Just wondered what you guys think,

-- Tom

Benjamin Kaplan

Dec 12, 2009, 7:53:57 PM

to pytho...@python.org

On Sat, Dec 12, 2009 at 7:15 PM, Tom Machinski <tom.ma...@gmail.com> wrote:
> In most cases, `list(generator)` works as expected. Thus,
> `list(<generator expression>)` is generally equivalent to `[<generator
> expression>]`.
>

Actually, it's list(generator) vs. a list comprehension. I agree that
it can be confusing, but Python considers them to be two different
constructs.

>>> list(xrange(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> [xrange(10)]
[xrange(10)]

> Here's a minimal case where this equivalence breaks, causing a serious
> and hard-to-detect bug in a program:
>
> >>> def sit(): raise StopIteration()
> ...
> >>> [f() for f in (lambda:1, sit, lambda:2)]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<stdin>", line 1, in sit
> StopIteration
> >>> list(f() for f in (lambda:1, sit, lambda:2))
> [1]
>
> I was bitten hard by this inconsistency when sit() was returning the
> idiom `(foo for foo in bar if foo.is_baz()).next()`. The nonexistence
> of a foo with is_baz() True in that query raises an exception as
> designed, which expresses itself when I use the list comprehension
> version of the code above; the generator version muffles the error and
> silently introduces a subtle, confusing bug: `lambda:2` is never
> reached, and a truncated list of 1 element (instead of 3) is
> "successfully" generated.
>
> Just wondered what you guys think,
>
> -- Tom

Ned Deily

Dec 12, 2009, 9:01:24 PM

to pytho...@python.org

In article
<ec96e1390912121653w56c...@mail.gmail.com>,
Benjamin Kaplan <benjami...@case.edu> wrote:
> On Sat, Dec 12, 2009 at 7:15 PM, Tom Machinski <tom.ma...@gmail.com>
> wrote:
> > In most cases, `list(generator)` works as expected. Thus,
> > `list(<generator expression>)` is generally equivalent to `[<generator
> > expression>]`.
> Actually, it's list(generator) vs. a list comprehension. I agree that
> it can be confusing, but Python considers them to be two different
> constructs.
>
> >>> list(xrange(10))
> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
> >>> [xrange(10)]
> [xrange(10)]

That's not a list comprehension, that's a list with one element.

>>> [x for x in xrange(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

<CrocodileDundee> Now *that's* a list comprehension. </CrocodileDundee>

--
Ned Deily,
n...@acm.org

Benjamin Kaplan

Dec 12, 2009, 9:16:36 PM

to pytho...@python.org

I know. But the OP was wondering why list(<generator expression>) was
behaving differently than [<generator expression>], and I was pointing
out that list comprehensions are considered their own syntax: the list
comprehension [x for x in xrange(10)] is different from [(x for x in
xrange(10))].

Ned Deily

Dec 12, 2009, 9:43:20 PM

to pytho...@python.org

In article <nad-8CDB63.1...@news.gmane.org>,
Ned Deily <n...@acm.org> wrote:

> In article
> <ec96e1390912121653w56c...@mail.gmail.com>,
> Benjamin Kaplan <benjami...@case.edu> wrote:
> > On Sat, Dec 12, 2009 at 7:15 PM, Tom Machinski <tom.ma...@gmail.com>
> > wrote:
> > > In most cases, `list(generator)` works as expected. Thus,
> > > `list(<generator expression>)` is generally equivalent to `[<generator
> > > expression>]`.
> > Actually, it's list(generator) vs. a list comprehension. I agree that
> > it can be confusing, but Python considers them to be two different
> > constructs.
> >
> > >>> list(xrange(10))
> > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
> > >>> [xrange(10)]
> > [xrange(10)]
>
> That's not a list comprehension, that's a list with one element.
>
> >>> [x for x in xrange(10)]
> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>
> <CrocodileDundee> Now *that's* a list comprehension. </CrocodileDundee>

Which is not quite the point Benjamin was trying to make - sorry!

Consulting the adjacent sections on "List displays" and "Generator
expressions" in the Language Reference:

http://docs.python.org/reference/expressions.html#list-displays

for generator expressions "the parentheses can be omitted on calls with
only one argument " but the expressions in a list_comprehension are not
in a call context. So there is no ambiguity: [<generator expression>]
requires parens around the generator expression and that list display
produces a list with one element as Benjamin points out.
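
A short sketch of the distinction Ned draws, written for Python 3 (`range` in place of `xrange`). The helper name `is_syntax_error` is made up for the demonstration; it only checks whether a string compiles as an expression.

```python
squares_lc = [x * x for x in range(4)]      # list comprehension
squares_ge = list(x * x for x in range(4))  # list() over a generator expression
one_element = [(x * x for x in range(4))]   # list display holding one generator

print(squares_lc == squares_ge)  # True
print(len(one_element))          # 1


def is_syntax_error(source):
    """Hypothetical helper: True if `source` fails to compile as an expression."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError:
        return True
    return False


# The parentheses may be dropped only when the generator expression is the
# sole argument of a call.
print(is_syntax_error("sum(x for x in range(4))"))        # False
print(is_syntax_error("sum(x for x in range(4), 10)"))    # True
print(is_syntax_error("sum((x for x in range(4)), 10)"))  # False
```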

--
Ned Deily,
n...@acm.org

Terry Reedy

Dec 13, 2009, 3:45:39 AM

to pytho...@python.org

Tom Machinski wrote:
> In most cases, `list(generator)` works as expected. Thus,
> `list(<generator expression>)` is generally equivalent to `[<generator
> expression>]`.
>

> Here's a minimal case where this equivalence breaks, causing a serious
> and hard-to-detect bug in a program:
>
> >>> def sit(): raise StopIteration()

StopIteration is intended to be used only within the .__next__ method of
iterators. The devs know that other 'off-label' use results in the
inconsistency you noted, but their and my view is 'don't do that'.
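
For contrast with the off-label use, a minimal sketch of the intended one: a hand-written iterator whose `__next__` raises StopIteration to announce exhaustion, which is the signal `for` loops and `list()` act on. `Countdown` is a hypothetical example class, written for Python 3.

```python
class Countdown:
    """Hypothetical iterator yielding start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            # The intended use: signal exhaustion from inside __next__.
            raise StopIteration
        self.current -= 1
        return self.current + 1


print(list(Countdown(3)))  # [3, 2, 1]
```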

Gabriel Genellina

Dec 13, 2009, 3:53:33 AM

to pytho...@python.org

On Sat, 12 Dec 2009 23:43:20 -0300, Ned Deily <n...@acm.org> wrote:

> In article <nad-8CDB63.1...@news.gmane.org>,
> Ned Deily <n...@acm.org> wrote:
>> In article
>> <ec96e1390912121653w56c...@mail.gmail.com>,
>> Benjamin Kaplan <benjami...@case.edu> wrote:
>> > On Sat, Dec 12, 2009 at 7:15 PM, Tom Machinski <tom.ma...@gmail.com>
>> > wrote:

>> > > >>> def sit(): raise StopIteration()
>> > > ...
>> > > >>> [f() for f in (lambda:1, sit, lambda:2)]
>> > > Traceback (most recent call last):
>> > >   File "<stdin>", line 1, in <module>
>> > >   File "<stdin>", line 1, in sit
>> > > StopIteration
>> > > >>> list(f() for f in (lambda:1, sit, lambda:2))
>> > > [1]

>> > > In most cases, `list(generator)` works as expected. Thus,

>> > > `list(<generator expression>)` is generally equivalent to
>> > > `[<generator expression>]`.

>> > Actually, it's list(generator) vs. a list comprehension. I agree that
>> > it can be confusing, but Python considers them to be two different
>> > constructs.

I think nobody has addressed the OP's arguments (as I understand them).

First, except the obvious outer delimiters (and some corner cases in 2.x,
fixed in Python 3), a list comprehension and a generator expression share
the same syntax: (x for x in some_values) vs [x for x in some_values].

Also, *almost* always, both list(<comprehension>) and [<comprehension>],
when evaluated, yield the same result [1]. *Almost* because StopIteration
is handled differently as the OP discovered: the list comprehension
propagates a StopIteration exception to its caller; the list constructor
swallows the exception and the caller never sees it.

Despite a promise in PEP 289, the semantics of generator expressions
aren't explained in detail in the language reference. I can't tell if the
difference is intentional, accidental, undocumented behavior, an
implementation accident, a bug, or what...

[1] <comprehension> being a syntactic construct like:
x**2 for x in range(5)
or:
f() for f in [lambda:1, sit, lambda:2]
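
A quick check of the "almost always" claim, using the footnote's first example and an expression that raises nothing; the variable names are illustrative, and the check says nothing about the StopIteration case discussed in this thread.

```python
values = range(5)

# Footnote [1]'s first example, with both outer delimiters.
as_listcomp = [x**2 for x in values]
as_list_call = list(x**2 for x in values)
print(as_listcomp)                  # [0, 1, 4, 9, 16]
print(as_listcomp == as_list_call)  # True

# The shared syntax extends to filters and nested for clauses.
pairs_lc = [(x, c) for x in values if x % 2 for c in "ab"]
pairs_ge = list((x, c) for x in values if x % 2 for c in "ab")
print(pairs_lc == pairs_ge)         # True
```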

--
Gabriel Genellina

exa...@twistedmatrix.com

Dec 13, 2009, 9:35:21 AM

to Terry Reedy, pytho...@python.org

On 08:45 am, tjr...@udel.edu wrote:

>Tom Machinski wrote:
>>In most cases, `list(generator)` works as expected. Thus,
>>`list(<generator expression>)` is generally equivalent to `[<generator
>>expression>]`.
>>

>>Here's a minimal case where this equivalence breaks, causing a serious
>>and hard-to-detect bug in a program:
>>
>> >>> def sit(): raise StopIteration()
>
>StopIteration is intended to be used only within the .__next__ method
>of iterators. The devs know that other 'off-label' use results in the
>inconsistency you noted, but their and my view is 'don't do that'.

Which is unfortunate, because it's not that hard to get StopIteration
without explicitly raising it yourself and this behavior makes it
difficult to debug such situations.

What's with this view, exactly? Is it just that it's hard to implement
the more desirable behavior?

Jean-Paul

Steven D'Aprano

Dec 13, 2009, 3:18:47 PM

On Sun, 13 Dec 2009 14:35:21 +0000, exarkun wrote:

>>StopIteration is intended to be used only within the .__next__ method of
>>iterators. The devs know that other 'off-label' use results in the
>>inconsistency you noted, but their and my view is 'don't do that'.
>
> Which is unfortunate, because it's not that hard to get StopIteration
> without explicitly raising it yourself and this behavior makes it
> difficult to debug such situations.

I can't think of any way to get StopIteration without explicitly raising
it yourself. It's not like built-ins or common data structures routinely
raise StopIteration. I don't think I've *ever* seen a StopIteration that
I didn't raise myself.

> What's with this view, exactly? Is it just that it's hard to implement
> the more desirable behavior?

What is that "more desirable behaviour"? That StopIteration is used to
signal that Python should stop iterating except when you want it to be
ignored? Unfortunately, yes, it's quite hard to implement "do what the
caller actually wants, not what he asked for" behaviour -- and even if it
were possible, it goes against the grain of the Zen of Python.

If you've ever had to debug faulty "Do What I Mean" software, you'd see
this as a good thing.

--
Steven

exa...@twistedmatrix.com

Dec 13, 2009, 5:45:58 PM

to pytho...@python.org

On 08:18 pm, st...@remove-this-cybersource.com.au wrote:
>On Sun, 13 Dec 2009 14:35:21 +0000, exarkun wrote:
>>>StopIteration is intended to be used only within the .__next__ method of
>>>iterators. The devs know that other 'off-label' use results in the
>>>inconsistency you noted, but their and my view is 'don't do that'.
>>
>>Which is unfortunate, because it's not that hard to get StopIteration
>>without explicitly raising it yourself and this behavior makes it
>>difficult to debug such situations.
>
>I can't think of any way to get StopIteration without explicitly raising
>it yourself. It's not like built-ins or common data structures routinely
>raise StopIteration. I don't think I've *ever* seen a StopIteration that
>I didn't raise myself.

Call next on an iterator. For example: iter(()).next()
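
A small Python 3 sketch of this point, using `next(...)` in place of `.next()`. `Item` and `first_baz` are hypothetical stand-ins modeled on the idiom from the original post; `first_baz` contains no `raise StopIteration`, yet it can still produce one.

```python
class Item:
    """Hypothetical stand-in for the `foo` objects in the original post."""

    def __init__(self, baz):
        self.baz = baz

    def is_baz(self):
        return self.baz


def first_baz(bar):
    # Python 3 spelling of `(foo for foo in bar if foo.is_baz()).next()`.
    # When nothing matches, next() raises StopIteration on our behalf.
    return next(foo for foo in bar if foo.is_baz())


for label, thunk in [("next(iter(()))", lambda: next(iter(()))),
                     ("first_baz(...)", lambda: first_baz([Item(False)]))]:
    try:
        thunk()
    except StopIteration:
        print(label, "raised StopIteration")
```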

>
>>What's with this view, exactly? Is it just that it's hard to implement
>>the more desirable behavior?
>
>What is that "more desirable behaviour"? That StopIteration is used to
>signal that Python should stop iterating except when you want it to be
>ignored? Unfortunately, yes, it's quite hard to implement "do what the
>caller actually wants, not what he asked for" behaviour -- and even if it
>were possible, it goes against the grain of the Zen of Python.
>
>If you've ever had to debug faulty "Do What I Mean" software, you'd see
>this as a good thing.

I have plenty of experience developing and debugging software, Steven.
Your argument is specious, as it presupposes that only two possibilities
exist: the current behavior or some kind of magical faerie land
behavior.

I'm surprised to hear you say that the magical faerie land behavior
isn't desirable either, though. I'd love a tool that did what I wanted,
not what I asked. The only serious argument against this, I think, is
that it is beyond our current ability to create (and so anyone claiming
to be able to do it is probably mistaken).

You chopped out all the sections of this thread which discussed the more
desirable behavior. You can go back and read them in earlier messages
if you need to be reminded. I'm not talking about anything beyond
what's already been raised.

I'm pretty sure I know the answer to my question, though - it's hard to
implement, so it's not implemented.

Jean-Paul

Lie Ryan

Dec 13, 2009, 9:50:10 PM

On 12/14/2009 9:45 AM, exa...@twistedmatrix.com wrote:
> On 08:18 pm, st...@remove-this-cybersource.com.au wrote:
>> On Sun, 13 Dec 2009 14:35:21 +0000, exarkun wrote:
>>>> StopIteration is intended to be used only within the .__next__
>>>> method of
>>>> iterators. The devs know that other 'off-label' use results in the
>>>> inconsistency you noted, but their and my view is 'don't do that'.
>>>
>>> Which is unfortunate, because it's not that hard to get StopIteration
>>> without explicitly raising it yourself and this behavior makes it
>>> difficult to debug such situations.
>>
>> I can't think of any way to get StopIteration without explicitly raising
>> it yourself. It's not like built-ins or common data structures routinely
>> raise StopIteration. I don't think I've *ever* seen a StopIteration that
>> I didn't raise myself.
>
> Call next on an iterator. For example: iter(()).next()

.next() is not meant to be called directly, that's why it's renamed to
.__next__() in python 3. Just like .__add__() is never meant to be
called directly since you'll then have to handle NotImplemented.

If you really need to call .__next__(), call the next() builtin
function instead; it takes an optional second argument, a sentinel value
returned instead of raising StopIteration. IMNSHO next()'s sentinel should
always be specified except if you're implementing __next__() or if the
iterator is infinite.
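
A minimal sketch of passing a default to `next()`. `first_match` and the `_MISSING` sentinel are hypothetical names; raising `LookupError` on a miss is one choice among several (returning `None` would also work).

```python
_MISSING = object()  # private sentinel, distinct from any real item


def first_match(items, predicate):
    """Hypothetical helper: return the first item satisfying predicate.

    Raises LookupError (not StopIteration) when nothing matches, so the
    failure cannot be mistaken for the end of an enclosing iteration.
    """
    found = next((item for item in items if predicate(item)), _MISSING)
    if found is _MISSING:
        raise LookupError("no matching item")
    return found


# Inside a generator expression, a miss now surfaces as LookupError
# instead of silently ending the list.
groups = [[1, 2], [0]]
try:
    list(first_match(g, lambda n: n > 1) for g in groups)
except LookupError as exc:
    print("LookupError:", exc)
```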

>>> What's with this view, exactly? Is it just that it's hard to implement
>>> the more desirable behavior?
>>
>> What is that "more desirable behaviour"? That StopIteration is used to
>> signal that Python should stop iterating except when you want it to be
>> ignored? Unfortunately, yes, it's quite hard to implement "do what the
>> caller actually wants, not what he asked for" behaviour -- and even if it
>> were possible, it goes against the grain of the Zen of Python.
>>
>> If you've ever had to debug faulty "Do What I Mean" software, you'd see
>> this as a good thing.
>
> I have plenty of experience developing and debugging software, Steven.
> Your argument is specious, as it presupposes that only two possibilities
> exist: the current behavior or some kind of magical faerie land behavior.
>
> I'm surprised to hear you say that the magical faerie land behavior
> isn't desirable either, though. I'd love a tool that did what I wanted,
> not what I asked. The only serious argument against this, I think, is
> that it is beyond our current ability to create (and so anyone claiming
> to be able to do it is probably mistaken).

In your world, this is what happens:
>>> list = [a, b, c]
>>> # print list
>>> print list
["a", "b", "c"]
>>> # make a copy of list
>>> alist = list(llst) # oops a mistype
>>> alist = alist - "]" + ", "d"]"
>>> print alist
["a", "b", "c", "d"]
>>> alist[:6] + "i", + alist[6:]
>>> print alist
["a", "i", "b", "c", "d"]
>>> print alist
>>> # hearing the sound of my deskjet printer...
>>> C:\fikle.text.write(alist)
>>> print open("C:\file.txt").read()
<h1>a</h1>
<ul>
<li>i</li>
<li>b</li>
<li>c d</li>
>>> # great, exactly what I needed

exa...@twistedmatrix.com

Dec 13, 2009, 10:29:38 PM

to Lie Ryan, pytho...@python.org

On 02:50 am, lie....@gmail.com wrote:
>On 12/14/2009 9:45 AM, exa...@twistedmatrix.com wrote:
>>On 08:18 pm, st...@remove-this-cybersource.com.au wrote:
>>>On Sun, 13 Dec 2009 14:35:21 +0000, exarkun wrote:
>>>>>StopIteration is intended to be used only within the .__next__
>>>>>method of
>>>>>iterators. The devs know that other 'off-label' use results in the
>>>>>inconsistency you noted, but their and my view is 'don't do that'.
>>>>
>>>>Which is unfortunate, because it's not that hard to get StopIteration
>>>>without explicitly raising it yourself and this behavior makes it
>>>>difficult to debug such situations.
>>>
>>>I can't think of any way to get StopIteration without explicitly raising
>>>it yourself. It's not like built-ins or common data structures routinely
>>>raise StopIteration. I don't think I've *ever* seen a StopIteration that
>>>I didn't raise myself.
>>
>>Call next on an iterator. For example: iter(()).next()
>
>.next() is not meant to be called directly

Doesn't matter. Sometimes it makes sense to call it directly. And I
was just giving an example of a way to get StopIteration raised without
doing it yourself - which is what Steve said he couldn't think of.

I don't understand the point of this code listing, sorry. I suspect you
didn't completely understand the magical faerie land I was describing -
where all your programs would work, no matter what mistakes you made
while writing them.

Jean-Paul

Steven D'Aprano

Dec 13, 2009, 11:11:28 PM

On Sun, 13 Dec 2009 22:45:58 +0000, exarkun wrote:

> On 08:18 pm, st...@remove-this-cybersource.com.au wrote:
>>On Sun, 13 Dec 2009 14:35:21 +0000, exarkun wrote:
>>>>StopIteration is intended to be used only within the .__next__ method of
>>>>iterators. The devs know that other 'off-label' use results in the
>>>>inconsistency you noted, but their and my view is 'don't do that'.
>>>
>>>Which is unfortunate, because it's not that hard to get StopIteration
>>>without explicitly raising it yourself and this behavior makes it
>>>difficult to debug such situations.
>>
>>I can't think of any way to get StopIteration without explicitly raising
>>it yourself. It's not like built-ins or common data structures routinely
>>raise StopIteration. I don't think I've *ever* seen a StopIteration that
>>I didn't raise myself.
>
> Call next on an iterator. For example: iter(()).next()

Or in more recent versions of Python, next(iter(())).

Good example. But next() is a special case, and since next() is
documented as raising StopIteration if you call it and it raises
StopIteration, you have raised it yourself. Just not explicitly.

>>>What's with this view, exactly? Is it just that it's hard to implement
>>>the more desirable behavior?
>>
>>What is that "more desirable behaviour"? That StopIteration is used to
>>signal that Python should stop iterating except when you want it to be
>>ignored? Unfortunately, yes, it's quite hard to implement "do what the
>>caller actually wants, not what he asked for" behaviour -- and even if
>>it were possible, it goes against the grain of the Zen of Python.
>>
>>If you've ever had to debug faulty "Do What I Mean" software, you'd see
>>this as a good thing.
>
> I have plenty of experience developing and debugging software, Steven.
> Your argument is specious, as it presupposes that only two possibilities
> exist: the current behavior or some kind of magical faerie land
> behavior.
>
> I'm surprised to hear you say that the magical faerie land behavior
> isn't desirable either, though. I'd love a tool that did what I wanted,
> not what I asked. The only serious argument against this, I think, is
> that it is beyond our current ability to create (and so anyone claiming
> to be able to do it is probably mistaken).

I'd argue that tools that do what you want rather than what you ask for
are not just currently impossible, but always will be -- no matter how
good the state of the art of artificially intelligent mind-reading software
becomes.

> You chopped out all the sections of this thread which discussed the more
> desirable behavior. You can go back and read them in earlier messages
> if you need to be reminded. I'm not talking about anything beyond
> what's already been raised.

I'm glad for you. But would you mind explaining, for those of us who
aren't mind-readers, what YOU consider the "more desirable behaviour"?

If you're talking about the list constructor and list comprehensions treating
StopIteration the same, then I don't think it is at all self-evident that
the current behaviour is a bad thing, nor that the only reason for it is
that to do otherwise would be hard.

(I don't think it would be hard to have list comps swallow a
StopIteration.)

> I'm pretty sure I know the answer to my question, though - it's hard to
> implement, so it's not implemented.
>
> Jean-Paul

--
Steven

exa...@twistedmatrix.com

Dec 13, 2009, 11:33:04 PM

to pytho...@python.org

But if you mistakenly don't catch it, and you're trying to debug your
code to find this mistake, you probably won't be aided in this pursuit
by the exception-swallowing behavior of generator expressions.

That may be true. I won't try to make any predictions about the
arbitrarily distant future, though.

>>You chopped out all the sections of this thread which discussed the more
>>desirable behavior. You can go back and read them in earlier messages
>>if you need to be reminded. I'm not talking about anything beyond
>>what's already been raised.
>
>I'm glad for you. But would you mind explaining, for those of us who
>aren't mind-readers, what YOU consider the "more desirable behaviour"?

The behavior of list comprehensions is pretty good. The behavior of
constructing a list out of a generator expression isn't as good. The
behavior which is more desirable is for a StopIteration raised out of
the `expression` part of a `generator_expression` to not be treated
identically to the way a StopIteration raised out of the `genexpr_for`
part is. This could provide behavior roughly equivalent to the behavior
of a list comprehension.

>
>If you're talking about the list constructor and list comprehensions treating
>StopIteration the same, then I don't think it is at all self-evident that
>the current behaviour is a bad thing, nor that the only reason for it is
>that to do otherwise would be hard.

I don't expect it to be self-evident. I wasn't even trying to convince
anyone that it's desirable (although I did claim it, so I won't fault
anyone for making counter-arguments). The only thing I asked was what
the motivation for the current behavior is. If the motivation is that
it is self-evident that the current behavior is the best possible
behavior, then someone just needs to say that and my question is
answered. :)

Jean-Paul

Terry Reedy

Dec 14, 2009, 1:46:52 AM

to pytho...@python.org

On 12/13/2009 10:29 PM, exa...@twistedmatrix.com wrote:

> Doesn't matter. Sometimes it makes sense to call it directly.

It only makes sense to call next (or .__next__) when you are prepared to
explicitly catch StopIteration within a try..except construct.
You did not catch it, so it stopped execution.
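
A sketch of the pattern described here: call `next()` inside `try`/`except StopIteration` and translate the exception into one with no special meaning to loops. `pop_first` is a hypothetical helper, and `ValueError` is an arbitrary choice of replacement exception.

```python
def pop_first(iterator):
    """Hypothetical helper that calls next() directly."""
    try:
        return next(iterator)
    except StopIteration as exc:
        # Re-raise as an ordinary exception, chaining the original as
        # __cause__, so an enclosing loop or list() cannot mistake it
        # for exhaustion.
        raise ValueError("iterator was empty") from exc


print(pop_first(iter([5, 6])))  # 5
```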

Let me repeat: StopIteration is intended only for stopping iteration.
Outside that use, it is a normal exception with no special meaning.

Terry Jan Reedy

Terry Reedy

Dec 14, 2009, 2:08:08 AM

to pytho...@python.org

On 12/13/2009 11:33 PM, exa...@twistedmatrix.com wrote:
> But if you mistakenly don't catch it, and you're trying to debug your
> code to find this mistake, you probably won't be aided in this pursuit
> by the exception-swallowing behavior of generator expressions.

As I remember, it was the call to list that swallowed the exception, not
the generator expression. List() takes an iterable as arg and stopping
on StopIteration is what it does and how it knows to stop and return the
new list.

> The behavior of list comprehensions is pretty good. The behavior of
> constructing a list out of a generator expression isn't as good.

I think you are confused. A generator expression is a shorthand for a
def statement that defines a generator function followed by a call to
the generator function to get a generator followed by deletion of the
function. When you call list() to make a list, it constructs the list
from the generator, not from the expression itself. List has no idea
that you used a generator expression or even that it was passed a
generator. Leaving error checks out, it operates something like

def list(it):
    res = []
    it = iter(it)
    for item in it:  # stops whenever it raises StopIteration
        res.append(item)
    return res

> The
> behavior which is more desirable is for a StopIteration raised out of
> the `expression` part of a `generator_expression` to not be treated
> identically to the way a StopIteration raised out of the `genexpr_for`
> part is.

It is not. StopIteration in for part stops the for loop in the
generator. StopIteration in the expression part stops the loop in the
list() call (sooner than it would have been otherwise). When the
generator raises StopIteration, list() has no idea what statement within
the body raised it. It MUST stop.
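
The point that `list()` only sees the iterator, never the expression, can still be reproduced on current Python 3 with a hand-written iterator class (the hypothetical `CallEach` below). Since Python 3.7, PEP 479 turns a StopIteration escaping a *generator* into RuntimeError, but a plain `__next__` method is not covered, so the truncation from the original post still happens here.

```python
def sit():
    # Leaks StopIteration, as in the original post.
    raise StopIteration()


class CallEach:
    """Hypothetical hand-written iterator: yields f() for each f in funcs."""

    def __init__(self, funcs):
        self._funcs = iter(funcs)

    def __iter__(self):
        return self

    def __next__(self):
        f = next(self._funcs)  # StopIteration here: funcs exhausted
        return f()             # StopIteration here: leaked from f()


# list() cannot tell the two sources apart, so it stops after the first item.
print(list(CallEach([lambda: 1, sit, lambda: 2])))  # [1]
```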

> This could provide behavior roughly equivalent to the behavior
> of a list comprehension.

Impossible. The only serious option for consistency is to special case
list comps to also trap StopIteration raised in the expression part, but
the devs decided not to do this as doing so is arguably a bug.

Terry Jan Reedy

Lie Ryan

Dec 14, 2009, 5:26:49 AM

to exa...@twistedmatrix.com, pytho...@python.org

On 12/14/09, exa...@twistedmatrix.com <exa...@twistedmatrix.com> wrote:
> On 02:50 am, lie....@gmail.com wrote:

>>On 12/14/2009 9:45 AM, exa...@twistedmatrix.com wrote:
>>>On 08:18 pm, st...@remove-this-cybersource.com.au wrote:
>>>>On Sun, 13 Dec 2009 14:35:21 +0000, exarkun wrote:
>>>>>>StopIteration is intended to be used only within the .__next__
>>>>>>method of
>>>>>>iterators. The devs know that other 'off-label' use results in the
>>>>>>inconsistency you noted, but their and my view is 'don't do that'.
>>>>>
>>>>>Which is unfortunate, because it's not that hard to get StopIteration
>>>>>without explicitly raising it yourself and this behavior makes it
>>>>>difficult to debug such situations.
>>>>
>>>>I can't think of any way to get StopIteration without explicitly raising
>>>>it yourself. It's not like built-ins or common data structures routinely
>>>>raise StopIteration. I don't think I've *ever* seen a StopIteration that
>>>>I didn't raise myself.
>>>
>>>Call next on an iterator. For example: iter(()).next()
>>
>>.next() is not meant to be called directly
>

> Doesn't matter. Sometimes it makes sense to call it directly. And I
> was just giving an example of a way to get StopIteration raised without
> doing it yourself - which is what Steve said he couldn't think of.
>>>

> I don't understand the point of this code listing, sorry. I suspect you
> didn't completely understand the magical faerie land I was describing -
> where all your programs would work, no matter what mistakes you made
> while writing them.

Exactly, that's what's happening. It just works. It knows that when I
said alist[:6] + "i", + alist[6:] I wanted to insert "i" after the
sixth character of the textual representation of the list. It knows to
find the correct variable when I make a typo. It correctly guesses that
I want to print to paper instead of to the screen. It knows that when I
wrote to C:\path.write(), I wanted HTML output in that specific format.
It just works (TM), whatever mistakes I made. That's what you wanted,
right?

Peter Otten

Dec 14, 2009, 5:35:11 AM

Terry Reedy wrote:

A viable option might be to introduce a different exception type and
translate

(expr(v) for v in items if cond(v))

into

def gen(items, expr, cond):
    for v in items:
        try:
            if cond(v):
                yield expr(v)
        except StopIteration:
            raise TypeError("StopIteration raised in "
                            "'expr' or 'cond' part of "
                            "a generator expression")
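
Peter's translation can be tried directly. The sketch below restates it as runnable Python 3 and adds a usage example; `sit` and the lambdas are illustrative inputs, not part of the proposal.

```python
def gen(items, expr, cond):
    # Peter's proposed expansion of (expr(v) for v in items if cond(v)).
    for v in items:
        try:
            if cond(v):
                yield expr(v)
        except StopIteration:
            # StopIteration from expr/cond is reported as an error, while the
            # for-loop over `items` still ends normally on exhaustion.
            raise TypeError("StopIteration raised in "
                            "'expr' or 'cond' part of "
                            "a generator expression")


def sit(_):
    # Illustrative expression part that leaks StopIteration.
    raise StopIteration()


print(list(gen([1, 2, 3], lambda v: v * 10, lambda v: v != 2)))  # [10, 30]

try:
    list(gen([1, 2, 3], sit, lambda v: True))
except TypeError as exc:
    print("TypeError:", exc)
```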

Peter

exa...@twistedmatrix.com

Dec 14, 2009, 9:31:44 AM

to pytho...@python.org

On 06:46 am, tjr...@udel.edu wrote:
>On 12/13/2009 10:29 PM, exa...@twistedmatrix.com wrote:

>>Doesn't matter. Sometimes it makes sense to call it directly.
>

>It only makes sense to call next (or .__next__) when you are prepared
>to explicitly catch StopIteration within a try..except construct.
>You did not catch it, so it stopped execution.
>
>Let me repeat: StopIteration is intended only for stopping iteration.
>Outside that use, it is a normal exception with no special meaning.

You cut out the part of my message where I wrote that one might have
forgotten the exception handling code that you posit is required, and
that the current behavior makes debugging this situation unnecessarily
challenging.

Jean-Paul

M.-A. Lemburg

Dec 14, 2009, 9:58:51 AM

to exa...@twistedmatrix.com, pytho...@python.org

exa...@twistedmatrix.com wrote:
> On 08:45 am, tjr...@udel.edu wrote:
>> Tom Machinski wrote:
>>> In most cases, `list(generator)` works as expected. Thus,
>>> `list(<generator expression>)` is generally equivalent to `[<generator
>>> expression>]`.
>>>
>>> Here's a minimal case where this equivalence breaks, causing a serious
>>> and hard-to-detect bug in a program:
>>>
>>> >>> def sit(): raise StopIteration()
>>

>> StopIteration is intended to be used only within the .__next__ method
>> of iterators. The devs know that other 'off-label' use results in the
>> inconsistency you noted, but their and my view is 'don't do that'.
>
> Which is unfortunate, because it's not that hard to get StopIteration
> without explicitly raising it yourself and this behavior makes it
> difficult to debug such situations.
>

> What's with this view, exactly? Is it just that it's hard to implement
> the more desirable behavior?

I'm not exactly sure what you're asking for.

The StopIteration exception originated as part of the for-loop
protocol. Later on it was generalized to apply to generators
as well.

The reason for using an exception is simple: raising and catching
exceptions is fast at C level and since the machinery for
communicating exceptions up the call stack was already there
(and doesn't interfere with the regular return values), this
was a convenient method to let the upper call levels know
that an iteration has ended (e.g. a for-loop 4 levels up the
stack).
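
To make the for-loop protocol concrete, here is a rough sketch of it spelled out by hand. `run_for_loop` is a hypothetical helper and ignores details such as `else:` clauses; note that only the call fetching the next item is guarded.

```python
def run_for_loop(iterable, body):
    """Rough equivalent of:  for item in iterable: body(item)"""
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break      # only exhaustion of `iterator` ends the loop
        body(item)     # exceptions from the body propagate unchanged


seen = []
run_for_loop("abc", seen.append)
print(seen)  # ['a', 'b', 'c']
```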

I'm not sure whether that answers your question, but it's the
reason for things being as they are :-)

--
Marc-Andre Lemburg
eGenix.com

exa...@twistedmatrix.com

Dec 14, 2009, 10:21:09 AM

to pytho...@python.org

I'm asking about why the behavior of a StopIteration exception being
handled from the `expression` of a generator expression to mean "stop
the loop" is accepted by "the devs" as acceptable. To continue your
comparison to for loops, it's as if a loop like this:

for a in b:
    c

actually meant this:

for a in b:
    try:
        c
    except StopIteration:
        break

Note, I know *why* the implementation leads to this behavior. I'm
asking why "the devs" *accept* this.

Jean-Paul

Mel

Dec 14, 2009, 11:09:19 AM

exa...@twistedmatrix.com wrote:
[ ... ]

> it's as if a loop like this:
>
> for a in b:
>     c
>
> actually meant this:
>
> for a in b:
>     try:
>         c
>     except StopIteration:
>         break
>
> Note, I know *why* the implementation leads to this behavior. I'm
> asking why "the devs" *accept* this.

It's part of the price Python pays for letting people get their hands on the
controls. Consider also:

Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class dict2(dict):
...     def __getitem__(self, key):
...         if key == 'fatal':
...             raise KeyError
...
>>> d = dict2()
>>> d['fatal'] = 'Hello, world!'
>>> print d['fatal']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __getitem__
KeyError
>>>

"KeyError when we just put the item into the dict?"
"Yep."

Mel.

>
> Jean-Paul

Terry Reedy

Dec 14, 2009, 1:00:07 PM

to pytho...@python.org

On 12/14/2009 10:21 AM, exa...@twistedmatrix.com wrote:

> I'm asking about why the behavior of a StopIteration exception being
> handled from the `expression` of a generator expression to mean "stop
> the loop" is accepted by "the devs" as acceptable.

Any unhandled exception within a loop stops the loop,
and the exception is passed to the surrounding code.

> To continue your
> comparison to for loops, it's as if a loop like this:
>
> for a in b:
>     c
>
> actually meant this:
>
> for a in b:
>     try:
>         c
>     except StopIteration:
>         break

No it does not. If c raises any exception, the loop stops *and* the
exception is passed up to the surrounding code.
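A minimal Python 3 sketch of Terry's point follows. It reuses `sit()` from the original post. When the loop *body* raises StopIteration, a plain for-loop does not treat that as the end of iteration; the exception propagates to the surrounding code.

```python
def sit():
    raise StopIteration()


results = []
try:
    for f in (lambda: 1, sit, lambda: 2):
        results.append(f())  # sit() raises inside the body, not in the iterator
except StopIteration:
    outcome = "StopIteration propagated out of the loop"
else:
    outcome = "loop finished normally"

print(results, outcome)  # [1] StopIteration propagated out of the loop
```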

> Note, I know *why* the implementation leads to this behavior.

You do not seem to know what the behavior is.
Read what I wrote last night.

Terry Jan Reedy

exa...@twistedmatrix.com

Dec 14, 2009, 2:05:29 PM

to pytho...@python.org

On 06:00 pm, tjr...@udel.edu wrote:
>On 12/14/2009 10:21 AM, exa...@twistedmatrix.com wrote:
>>I'm asking about why the behavior of a StopIteration exception being
>>handled from the `expression` of a generator expression to mean "stop
>>the loop" is accepted by "the devs" as acceptable.
>
>Any unhandled exception within a loop stops the loop,
>and the exception is passed to the surrounding code.
>>To continue your
>>comparison to for loops, it's as if a loop like this:
>>
>>for a in b:
>>    c
>>
>>actually meant this:
>>
>>for a in b:
>>    try:
>>        c
>>    except StopIteration:
>>        break
>
>No it does not.

No what does not? I said "It is as if". This is a hypothetical. I'm
not claiming this is the actual behavior of anything.

>>Note, I know *why* the implementation leads to this behavior.
>
>You do not seem to know what the behavior is.
>Read what I wrote last night.

Well, I'm a bit tired of this thread. Please disregard my question
above. I'm done here. Sorry for the confusion. Have a nice day.

Jean-Paul

Antoine Pitrou

Dec 14, 2009, 4:30:00 PM

to pytho...@python.org

Le Mon, 14 Dec 2009 15:21:09 +0000, exarkun a écrit :
>
> I'm asking about why the behavior of a StopIteration exception being
> handled from the `expression` of a generator expression to mean "stop
> the loop" is accepted by "the devs" as acceptable.

It's not "accepted as acceptable", it's just a side effect of how various
means of iterating (including for loops and generators) are implemented
in CPython.
Seeing how it doesn't seem to prevent or promote any useful programming
idiom, there was no incentive to either 1) codify it as official spec or
2) change it.

In other words, it should be considered undefined behaviour, and perhaps
other Python implementations behave differently.
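Later Python versions did settle this for generators. Since Python 3.7 (PEP 479), a StopIteration that escapes a generator's body is converted to RuntimeError. The sketch below assumes Python 3.7 or later and re-runs the original example. It also shows that the truncation can still happen through a non-generator iterator such as `map()`, which PEP 479 does not cover.

```python
def sit():
    raise StopIteration()


funcs = (lambda: 1, sit, lambda: 2)

# Generator expression: on Python 3.7+ a StopIteration escaping the
# expression is converted to RuntimeError instead of quietly ending the loop.
try:
    genexp_result = list(f() for f in funcs)
except RuntimeError as exc:
    genexp_result = "RuntimeError: %s" % exc

# map() is not a generator, so PEP 479 does not apply: list() still takes
# the leaked StopIteration as "iterator exhausted" and silently truncates.
map_result = list(map(lambda f: f(), funcs))

print(genexp_result)  # a RuntimeError message on Python 3.7+
print(map_result)     # [1]
```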

Regards

Antoine.

Carl Banks

Dec 14, 2009, 5:05:24 PM

On Dec 14, 7:21 am, exar...@twistedmatrix.com wrote:
> Note, I know *why* the implementation leads to this behavior. I'm
> asking why "the devs" *accept* this.

As noted, the problem isn't with generators but with iteration
protocol. The devs "allowed" this because it was a necessary evil for
correct functionality. As the system is set up it can't discriminate
between a legitimate and a spurious StopIteration. (Which is why
Steven and others called you out for suggesting that the compiler has
to read your mind.)

However, as far as I'm concerned there is no reasonable argument that
this behavior is good. So how, hypothetically, would one go about
fixing it, short of ripping out and replacing the existing machinery?

The first argument is that StopIteration has no place within a
generator expression. Therefore a generator expression (but not a
generator function) could deliberately catch StopIteration and raise a
different exception.
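Carl's first idea can be approximated in user code. The sketch below is in Python 3, and `checked_map` and `LeakedStopIteration` are hypothetical names. It wraps an explicit per-item function call rather than changing how generator expressions themselves behave. Any StopIteration raised by that call is re-raised as a different exception, so `list()` cannot swallow it.

```python
class LeakedStopIteration(RuntimeError):
    """Hypothetical exception for a StopIteration escaping the per-item expression."""


def checked_map(func, iterable):
    """Yield func(item) for each item; StopIteration from func is not allowed to end the loop."""
    for item in iterable:
        try:
            value = func(item)
        except StopIteration as exc:
            # Convert the signal into an error that list() and friends won't swallow.
            raise LeakedStopIteration("expression raised StopIteration") from exc
        yield value


def sit():
    raise StopIteration()


try:
    list(checked_map(lambda f: f(), (lambda: 1, sit, lambda: 2)))
except LeakedStopIteration as exc:
    print("caught:", exc)
```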

I don't like it, though: who says that StopIteration has no place
within a generator expression? Currently it's possible to do
something like this to terminate a genexp early, and I won't be the one
saying you shouldn't do it.

def stop(): raise StopIteration

list(x or stop() for x in stream)

(Though personally I'd bite the bullet and write it as a generator
function).
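On Python 3.7 and later, the `stop()` trick above raises RuntimeError, because the StopIteration escapes a generator expression. The sketch below shows two ways to get the same early termination in Python 3. One uses `itertools.takewhile`; the other is the generator function Carl mentions. The `stream` data and the name `until_falsy` are made up for illustration.

```python
from itertools import takewhile

stream = [3, 1, 4, 0, 5, 9]

# Same effect as `x or stop() for x in stream`: keep truthy items,
# stop at the first falsy one.
prefix = list(takewhile(bool, stream))


def until_falsy(items):
    """Generator-function version: `return` ends the iteration explicitly."""
    for x in items:
        if not x:
            return
        yield x


print(prefix)                     # [3, 1, 4]
print(list(until_falsy(stream)))  # [3, 1, 4]
```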

What else? The way I see it, when you throw StopIteration you are
trying to stop a specific generator. Therefore StopIteration should
(somehow) contain a reference to the generator that it's being applied
to. Perhaps it can be obtained by crawling the stack, which shouldn't
be a significant penalty (it'd be only called once per iteration, and
most of the time it'd be only one or two frames up). The looping
logic within Python should check whether the reference matches the
object it's iterating over; if not it raises a LeakyLoopException or
something like that.

I haven't thought this out though, I'm just kind of throwing this out
there. Any issues?

But to answer your question, I think "simple is better than complex"
rules the day. Right now StopIteration stops an iteration, simple as
that. Any fix would add complexity.

Carl Banks

Steven D'Aprano

Dec 14, 2009, 5:48:34 PM

I don't see why you think this is any more challenging to debug than any
other equivalent bug. If anything I would think it was easier to debug:
if the problem is that you get a StopIteration traceback, well that's
easy and straightforward, and if the problem is that you don't (and
consequently you end up with fewer items in the list than you expect),
the obvious debugging technique is to build the list by hand and inspect
each item before adding it to the list:

L = []
for i, item in enumerate(iterable):
    print i, item,
    value = item()  # raises StopIteration
    print value
    L.append(value)

That will expose the StopIteration exception and reveal the problem.

But even if I have missed something, and it is a challenging problem to
debug, oh well. It should be a quite unusual situation to come across.

--
Steven

Carl Banks

Dec 14, 2009, 6:26:25 PM

On Dec 14, 2:48 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
> On Mon, 14 Dec 2009 14:31:44 +0000, exarkun wrote:
> > On 06:46 am, tjre...@udel.edu wrote:
> >>On 12/13/2009 10:29 PM, exar...@twistedmatrix.com wrote:
> >>>Doesn't matter. Sometimes it makes sense to call it directly.
>
> >>It only makes sense to call next (or .__next__) when you are prepared to
> >>explicitly catch StopIteration within a try..except construct. You did
> >>not catch it, so it stopped execution.
>
> >>Let me repeat: StopIteration is intended only for stopping iteration.
> >>Outside that use, it is a normal exception with no special meaning.
>
> > You cut out the part of my message where I wrote that one might have
> > forgotten the exception handling code that you posit is required, and
> > that the current behavior makes debugging this situation unnecessarily
> > challenging.
>
> I don't see why you think this is any more challenging to debug than any
> other equivalent bug.

"Errors should never pass silently."

I'm not saying it's necessarily difficult to debug--although building
a list by hand to test it is a lot more work than reading an exception
traceback--but it's a stark violation of the Zen and of common sense, so it
is more serious than other sorts of errors.

Carl Banks

Steven D'Aprano

Dec 15, 2009, 1:26:52 AM

On Mon, 14 Dec 2009 15:26:25 -0800, Carl Banks wrote:

> On Dec 14, 2:48 pm, Steven D'Aprano <st...@REMOVE-THIS-
> cybersource.com.au> wrote:
>> On Mon, 14 Dec 2009 14:31:44 +0000, exarkun wrote:
>> > On 06:46 am, tjre...@udel.edu wrote:
>> >>On 12/13/2009 10:29 PM, exar...@twistedmatrix.com wrote:
>> >>>Doesn't matter. Sometimes it makes sense to call it directly.
>>
>> >>It only makes sense to call next (or .__next__) when you are prepared
>> >>to explicitly catch StopIteration within a try..except construct. You
>> >>did not catch it, so it stopped execution.
>>
>> >>Let me repeat: StopIteration is intended only for stopping iteration.
>> >>Outside that use, it is a normal exception with no special meaning.
>>
>> > You cut out the part of my message where I wrote that one might have
>> > forgotten the exception handling code that you posit is required, and
>> > that the current behavior makes debugging this situation
>> > unnecessarily challenging.
>>
>> I don't see why you think this is any more challenging to debug than
>> any other equivalent bug.
>
> "Errors should never pass silently."

StopIteration isn't an error, it's a signal. The error is *misusing*
StopIteration, and the above Zen no more applies than it would if I did
x-y instead of y-x and complained that I got no traceback. Some errors
are programming mistakes, and they are the deadliest error because they
can and do pass silently. There's nothing you can do about that except
Don't Make Mistakes.

> I'm not saying it's necessarily difficult to debug--although building a
> list by hand to test it is a lot more work than reading an exception
> traceback

Of course. Nobody said the life of a programmer was all beer and
skittles :)

> --but it's a stark violation of the Zen and of common sense, so it is
> more serious than other sorts of errors.

I'm happy to accept it is a Gotcha, but a bug? I'm not convinced.

--
Steven

Michele Simionato

Dec 16, 2009, 5:31:57 AM

On Dec 14, 11:05 pm, Carl Banks <pavlovevide...@gmail.com> wrote:
> But to answer your question, I think "simple is better than complex"
> rules the day. Right now StopIteration stops an iteration, simple as
> that. Any fix would add complexity.

+1

Gregory Ewing

Dec 16, 2009, 6:48:22 PM

exa...@twistedmatrix.com wrote:

> Which is unfortunate, because it's not that hard to get StopIteration
> without explicitly raising it yourself and this behavior makes it
> difficult to debug such situations.

It might not be hard if you set out to do it, but in
my experience it's pretty rare to end up getting a
StopIteration raised accidentally in an unexpected
place.

--
Greg

Albert van der Horst

Dec 18, 2009, 12:58:21 PM

In article <mailman.1818.1260694...@python.org>,
Gabriel Genellina <gags...@yahoo.com.ar> wrote:
<SNIP>

>
>Despite a promise in PEP 289, generator expressions semantics isn't
>explained in detail in the language reference. I can't tell if the
>difference is intentional, accidental, undocumented behavior, an
>implementation accident, a bug, or what...

Philosophically speaking ...
An important feature that is not documented is a severe defect.
(important maps to severe).
Before it is documented, there can be no discrepancy between
specification and implementation so other defects are formally
not present in relation to this situation.

>--
>Gabriel Genellina
>

Groetjes Albert.

--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Gregory Ewing

Dec 19, 2009, 12:15:18 AM

Albert van der Horst wrote:

> An important feature that is not documented is a severe defect.

This isn't something that I would expect to find documented
under the heading of generator expressions, because it doesn't
have anything to do with them. It's an interaction between
the iterator protocol and the list() constructor. Any other
iterable that leaked a StopIteration exception would cause
the same effect.
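To illustrate Greg's point, here is a sketch in Python 3 of a plain iterator class that reproduces the same truncation. No generator expression is wrapped by `list()` here. The class `Lookups` and its data are made up. The inner `next()` call is the same first-match idiom Tom described.

```python
class Lookups:
    """Hypothetical iterator whose __next__ uses a first-match idiom that can leak StopIteration."""

    def __init__(self, keys, table):
        self._keys = iter(keys)
        self._table = table

    def __iter__(self):
        return self

    def __next__(self):
        key = next(self._keys)  # legitimate end-of-iteration signal
        # Bug: when no row matches, next() raises StopIteration here too,
        # and list() cannot tell it apart from the legitimate one above.
        return next(v for k, v in self._table if k == key)


table = [("a", 1), ("c", 3)]
print(list(Lookups(["a", "b", "c"], table)))  # [1] ("c" is never reached)
```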

--
Greg

Tom Machinski

Dec 30, 2009, 6:18:11 PM

to pytho...@python.org

Thanks for the comment and discussion guys.

Bottom line, I'm going to have to remove this pattern from my code:

foo = (foo for foo in foos if foo.bar).next()

I used to have that a lot in cases where not finding at least one
valid foo is an actual fatal error. But using StopIteration to signal
a fatal condition becomes a bug when interacting with list() as shown
in the original post.

It would be nice if there was a builtin for "get the first element in
a genexp, or raise an exception (which isn't StopIteration)", sort of
like:

from itertools import islice

def first_or_raise(genexp):
    L = list(islice(genexp, 1))
    if not L:
        raise RuntimeError('no elements found')
    return L[0]

I also think Jean-Paul had a good point about how the problems in
the list/genexp interaction could be addressed.

Thank you,

-- Tom

Steven D'Aprano

Dec 30, 2009, 7:01:39 PM

On Wed, 30 Dec 2009 15:18:11 -0800, Tom Machinski wrote:

> Thanks for the comment and discussion guys.
>
> Bottom line, I'm going to have to remove this pattern from my code:
>
> foo = (foo for foo in foos if foo.bar).next()

I don't see why. What's wrong with it? Unless you embed it in a call to
list, or similar, it will explicitly raise StopIteration as expected.

> I used to have that a lot in cases where not finding at least one valid
> foo is an actual fatal error.

What's wrong with the obvious solution?

if not any(foo for foo in foos if foo.bar):
    raise ValueError('need at least one valid foo')

> But using StopIteration to signal a fatal
> condition becomes a bug when interacting with list() as shown in the
> original post.

You shouldn't use StopIteration to signal fatal conditions, because
that's not what it is for. It's acceptable to catch it when *directly*
calling next, but otherwise you should expect that StopIteration will be
caught and suppressed by just about anything.

> It would be nice if there was a builtin for "get the first element in a
> genexp, or raise an exception (which isn't StopIteration)",

Not everything needs to be a built-in.

def get_first_or_fail(iterable_or_sequence):
    it = iter(iterable_or_sequence)
    try:
        return it.next()  # use next(it) in Python 3
    except StopIteration:
        raise ValueError('empty iterable')

This is perfectly usable as a helper function, or it's short enough to be
used in-line if you prefer.

--
Steven

Benjamin Kaplan

Dec 30, 2009, 11:20:06 PM

to pytho...@python.org

On Wed, Dec 30, 2009 at 7:01 PM, Steven D'Aprano
<st...@remove-this-cybersource.com.au> wrote:
>
> I don't see why. What's wrong with it? Unless you embed it in a call to
> list, or similar, it will explicitly raise StopIteration as expected.
>
>
>> I used to have that a lot in cases where not finding at least one valid
>> foo is an actual fatal error.
>
> What's wrong with the obvious solution?
>
> if not any(foo for foo in foos if foo.bar):
>     raise ValueError('need at least one valid foo')

That would require 2 iterations through foos: once in the test, once
for the assignment if successful. If foos takes a long time to iterate
through, it might be faster to put a try-except around the original
statement, catch the StopIteration, and raise a ValueError in its
place. Which I agree is much better practice than letting the
StopIteration signal the fatal error.

Peter Otten

Dec 31, 2009, 4:54:59 AM

Tom Machinski wrote:

> It would be nice if there was a builtin for "get the first element in
> a genexp, or raise an exception (which isn't StopIteration)", sort of
> like:
>
> from itertools import islice
>
> def first_or_raise(genexp):
>     L = list(islice(genexp, 1))
>     if not L:
>         raise RuntimeError('no elements found')
>     return L[0]

Somewhat related: in 2.6 there's the next() built-in which accepts a default
value. You can provide a sentinel and test for that instead of using
try...except:

>>> from random import randrange
>>> from functools import partial
>>> def g():
...     return iter(partial(randrange, 3), 2)
...
>>> next(g(), "empty")
1
>>> next(g(), "empty")
1
>>> next(g(), "empty")
'empty'
>>> next(g(), "empty")
'empty'
>>> next(g(), "empty")
'empty'
>>> next(g(), "empty")
0
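Peter's session is random by design. Below is a deterministic sketch in Python 3 of the same `next(iterator, default)` built-in applied to a generator expression. The `items` list is made up for illustration.

```python
items = [0, 5, 10, 15]

# First item greater than 7, or the default when no item qualifies.
print(next((x for x in items if x > 7), "empty"))   # 10
print(next((x for x in items if x > 99), "empty"))  # empty
```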

Peter

Steven D'Aprano

Dec 31, 2009, 6:44:29 AM

On Wed, 30 Dec 2009 23:20:06 -0500, Benjamin Kaplan wrote:

>>> I used to have that a lot in cases where not finding at least one
>>> valid foo is an actual fatal error.
>>
>> What's wrong with the obvious solution?
>>
>> if not any(foo for foo in foos if foo.bar):
>>     raise ValueError('need at least one valid foo')
>
> That would require 2 iterations through foos: once in the test, once for
> the assignment if successful.

Remember though that any is a lazy test: it returns as soon as it gets a
result. In the case of an empty list, it returns immediately with False,
and in the case of a non-empty list, it returns as soon as it reaches a
true item. It doesn't matter if there are twenty thousand items, it will
only look at the first so long as it is true.

Which of course answers my own question... what's wrong with using any is
that it fails if the objects are all considered false in a boolean
context, or if they might be. That means it will work for some objects
(e.g. the re module's MatchObject instances which are always true), but
not for arbitrary objects which may be false.
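Here is a sketch of the pitfall Steven describes. It is in Python 3 and uses a hypothetical `Foo` class whose instances can be false in a boolean context. Testing `True`, or the flag itself, instead of the selected item avoids the problem.

```python
class Foo:
    """Hypothetical record: `value` decides truthiness, `bar` is the flag we filter on."""

    def __init__(self, value, bar):
        self.value = value
        self.bar = bar

    def __bool__(self):  # Python 3 truth hook (__nonzero__ in Python 2)
        return bool(self.value)


foos = [Foo(0, True), Foo(0, False)]

# Tests the truth of each *selected* foo: Foo(0, True) is falsy,
# so this is False even though one foo has bar set.
print(any(foo for foo in foos if foo.bar))   # False

# Tests only whether a matching foo exists.
print(any(True for foo in foos if foo.bar))  # True
print(any(foo.bar for foo in foos))          # True
```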

--
Steven

Tom Machinski

Dec 31, 2009, 2:34:39 PM

to Steven D'Aprano, pytho...@python.org

On Wed, Dec 30, 2009 at 4:01 PM, Steven D'Aprano
<st...@remove-this-cybersource.com.au> wrote:
> On Wed, 30 Dec 2009 15:18:11 -0800, Tom Machinski wrote:
>> Bottom line, I'm going to have to remove this pattern from my code:
>>
>> foo = (foo for foo in foos if foo.bar).next()
>
> I don't see why. What's wrong with it? Unless you embed it in a call to
> list, or similar, it will explicitly raise StopIteration as expected.

Exactly; this seems innocuous, but if some caller of this code uses it
in a list() constructor, a very subtle and dangerous bug is introduced
(see OP). This is the entire point of this post.

In a large, non-trivial application, you simply cannot afford the
assumption that no caller will ever do that. Even if you have perfect
memory, some of your other developers or library users may not.

As for what's wrong with the "if not any" solution, Benjamin Kaplan's
post hits the nail on its head. This is a bioinformatics application,
so the iterable "foos" tends to be very large, so saving half the
runtime makes a big difference.

-- Tom

Tom Machinski

Dec 31, 2009, 2:42:37 PM

to Peter Otten, pytho...@python.org

On Thu, Dec 31, 2009 at 1:54 AM, Peter Otten <__pet...@web.de> wrote:
> Somewhat related: in 2.6 there's the next() built-in which accepts a default
> value. You can provide a sentinel and test for that instead of using
> try...except:

Thanks. This can be useful in some of the simpler cases. As you surely
realize, to be perfectly safe, especially when the iterable can
contain any value (including your sentinel), we must use an
out-of-band return value, hence an exception is the only truly safe
solution.

-- Tom

Tom Machinski

Dec 31, 2009, 7:28:09 PM

to Stephen Hansen, pytho...@python.org

On Thu, Dec 31, 2009 at 12:18 PM, Stephen Hansen <apt.s...@gmail.com> wrote:
> Hmm? Just use a sentinel which /can't/ exist in the list: then it's truly
> safe. If the list can contain all the usual sort of sentinels (False, None,
> 0, -1, whatever), then just make a unique one all your own.
> sentinel = object()
> if next(g(), sentinel) is sentinel:
>     ...
> It's impossible to get a false-positive then, as nothing g() can ever produce
> would ever be precisely "sentinel" (which would usually for me be some
> global const if I need to do such things in multiple places).
> --S

That's not a bad idea. Another nice feature is support for callable
"default" values; it would make several new things easier, including
raising an exception when you really want that (i.e. if not finding a
single element is truly exceptional).
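The sketch below combines Stephen's unique-sentinel idea with Tom's wish for a callable default. It is written in Python 3. The helper name `first` and its `on_empty` parameter are hypothetical; the built-in `next()` accepts only a plain default value, not a callable. LookupError is an arbitrary choice; any exception other than StopIteration would do.

```python
_MISSING = object()  # private sentinel: no iterable can yield this exact object


def first(iterable, on_empty=None):
    """Return the first item of iterable.

    If the iterable is empty, return on_empty() when a callable is given,
    otherwise raise LookupError (never StopIteration).
    """
    item = next(iter(iterable), _MISSING)
    if item is not _MISSING:
        return item
    if on_empty is None:
        raise LookupError("no elements found")
    return on_empty()


print(first([None, 0]))                       # None (a falsy first item is fine)
print(first([], on_empty=lambda: "default"))  # default
```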

-- Tom

Steven D'Aprano

Dec 31, 2009, 8:47:17 PM

On Thu, 31 Dec 2009 11:34:39 -0800, Tom Machinski wrote:

> On Wed, Dec 30, 2009 at 4:01 PM, Steven D'Aprano
> <st...@remove-this-cybersource.com.au> wrote:
>> On Wed, 30 Dec 2009 15:18:11 -0800, Tom Machinski wrote:
>>> Bottom line, I'm going to have to remove this pattern from my code:
>>>
>>> foo = (foo for foo in foos if foo.bar).next()
>>
>> I don't see why. What's wrong with it? Unless you embed it in a call to
>> list, or similar, it will explicitly raise StopIteration as expected.
>
> Exactly; this seems innocuous, but if some caller of this code uses it
> in a list() constructor, a very subtle and dangerous bug is introduced
> (see OP). This is the entire point of this post.

Then don't use it in a list() constructor.

That's a glib answer, of course. A better answer is to point out that the
problem is not with the above expression, but with letting StopIteration
bubble up as an error exception instead of dealing with it immediately.
That's not what it's for, and you can't trust it not to be captured by
something. If StopIteration represents an error condition, you need to
deal with it immediately and convert it to an exception which isn't
likely to disappear.

> In a large, non-trivial application, you simply cannot afford the
> assumption that no caller will ever do that. Even if you have perfect
> memory, some of your other developers or library users may not.

You shouldn't put the responsibility of dealing with the StopIteration on
the caller, because StopIteration is a signal not an error condition,
and you can't tell when that signal will disappear. The responsibility
lies on the writer of the function containing the line (that is, the
Original Poster of this thread).

So you need something like this:

def my_function():
    try:
        foo = (foo for foo in foos if foo.bar).next()
    except StopIteration:
        handle_empty_foos()
    else:
        handle_at_least_one_foo()

handle_empty_foos may be as simple as raising a new exception and letting
that bubble up to whatever layer of the application is expected to deal
with it.

> As for what's wrong with the "if not any" solution, Benjamin Kaplan's
> post hits the nail on its head. This is a bioinformatics application, so
> the iterable "foos" tends to be very large, so saving half the runtime
> makes a big difference.

Possibly you haven't seen my reply to Benjamin, so I'll paraphrase:
that's incorrect, because any() is lazy and will return as soon as it
hits a non-false item. See the docs:

http://docs.python.org/library/functions.html#any

If the foo items are considered true (e.g. non-empty strings), then you
can guarantee that any() will return on the very first item.

If the foo items are arbitrary objects which have an equal chance of
being considered true or false, then on average it will have to look at
half the list, which is O(N) and may be a tad expensive for large N. But
how likely is that? One has to be realistic here, and consider the type
of data you realistically need to deal with and not pathological cases.
There's no limit to the problems you may have with sufficiently
pathological data:

class Evil(object):
    @property
    def bar(self):
        import time
        time.sleep(1e8)
        return True

foos = [Evil(), "a", "b", "c", "d"]

foo = (foo for foo in foos if foo.bar).next()

any() is the standard, idiomatic solution for solving this sort of
problem. Before rejecting it on the basis of slowness, you need to
determine that long runs of false items ahead of the first true item is a
realistic scenario, and that calling any() really is a bottleneck.
Anything less is premature optimization.

--
Steven

Wolfram Hinderer

Jan 1, 2010, 8:19:02 AM

On 1 Jan., 02:47, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
[...]

Tom's point is that
if not any(foo for foo in foos if foo.bar):
foo = (foo for foo in foos if foo.bar).next()

iterates twice over (the same first few elements of) foos, which
should take about twice as long as iterating once. The laziness of
"any" does not seem to matter here.
Of course, you're right that the iteration might or might not be the
bottleneck. On the other hand, foos might not even be reiterable.
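The sketch below is a small Python 3 example that counts attribute reads, using a hypothetical `Foo` class and made-up data, to make both points concrete. `any()` does stop at the first match, so the last item is never read. Even so, the look-before-you-leap version scans that prefix twice.

```python
class Foo:
    """Hypothetical item that counts how often its `bar` flag is read."""

    reads = 0

    def __init__(self, flag):
        self._flag = flag

    @property
    def bar(self):
        Foo.reads += 1
        return self._flag


foos = [Foo(False), Foo(False), Foo(True), Foo(False)]

# Look before you leap: any() stops at the third item, then next() rescans it.
Foo.reads = 0
if not any(foo for foo in foos if foo.bar):
    raise ValueError("need at least one valid foo")
foo = next(foo for foo in foos if foo.bar)
two_pass_reads = Foo.reads

# Single pass: next() with a default, then check for the default.
Foo.reads = 0
foo = next((foo for foo in foos if foo.bar), None)
if foo is None:
    raise ValueError("need at least one valid foo")
one_pass_reads = Foo.reads

print(two_pass_reads, one_pass_reads)  # 6 3
```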

> If the foo items are arbitrary objects which have an equal chance of
> being considered true or false, then on average it will have to look at
> half the list,

By which definition of chance? :-)

Wolfram

Steven D'Aprano

Jan 1, 2010, 9:42:29 AM

On Fri, 01 Jan 2010 05:19:02 -0800, Wolfram Hinderer wrote:

> On 1 Jan., 02:47, Steven D'Aprano <st...@REMOVE-THIS-
> cybersource.com.au> wrote:
>> On Thu, 31 Dec 2009 11:34:39 -0800, Tom Machinski wrote:

[...]

>> > As for what's wrong with the "if not any" solution, Benjamin Kaplan's
>> > post hits the nail on its head. This is a bioinformatics application,
>> > so the iterable "foos" tends to be very large, so saving half the
>> > runtime makes a big difference.
>>
>> Possibly you haven't seen my reply to Benjamin, so I'll paraphrase:
>> that's incorrect, because any() is lazy and will return as soon as it
>> hits a non-false item.
>
> Tom's point is that
> if not any(foo for foo in foos if foo.bar):
> foo = (foo for foo in foos if foo.bar).next()
> iterates twice over (the same first few elements of) foos, which should
> take about twice as long as iterating once. The laziness of "any" does
> not seem to matter here.

That's no different from any "Look Before You Leap" idiom. If you do this:

if key in dict:
    x = dict[key]

you search the dict twice, once to see if the key is there, and the
second time to fetch the value. Whether that is better or faster than the
alternative:

try:
    x = dict[key]
except KeyError:
    pass

depends on how often you expect the lookup to fail.

In any case, I would claim that Tom's argument is a classic example of
premature optimization: by his own admission:

'the iterable "foos" tends to be very large'

which implies that whatever happens to the foos after this test, it will
probably be very time consuming. If it takes (for the sake of the
argument) 10 milliseconds to process the entire iterable, who cares
whether it takes 0.01 or 0.02 ms to check that the iterable is valid?

> Of course, you're right that the iteration might or might not be the
> bottleneck. On the other hand, foos might not even be reiterable.

If that's the case, then the existing solution potentially throws away
the first value of foos every time the caller tests to see if it is empty.

Dealing with non-reiterable iterators can be a nuisance. In such a case,
it may be best to avoid Look Before You Leap altogether:

empty = True
for foo in foos:
    if foo.bar:
        empty = False
        process(foo)
if empty:
    handle_error_condition()

--
Steven

Martin v. Loewis

Jan 2, 2010, 3:17:59 PM

to Tom Machinski

>> Bottom line, I'm going to have to remove this pattern from my code:
>>
>> foo = (foo for foo in foos if foo.bar).next()

I recommend to rewrite this like so:

def first(gen):
    try:
        return gen.next()
    except StopIteration:
        raise ValueError("No first value")

foo = first(foo for foo in foos if foo.bar)

As others have said: don't let StopIteration appear unexpectedly;
IOW, consume generators right away in a loop construct (where
this first function is a loop construct as well). A different
way of writing it would be

def first(gen):
    for value in gen:
        return value
    raise ValueError("empty collection")

Regards,
Martin

Martin v. Loewis

Jan 2, 2010, 3:27:54 PM

to exa...@twistedmatrix.com

> I'm asking about why the behavior of a StopIteration exception being
> handled from the `expression` of a generator expression to mean "stop
> the loop" is accepted by "the devs" as acceptable.

I may be late to this discussion, but the answer is "most definitely
yes". *Any* exception leads to termination of the iterator, and
StopIteration is no different:

py> def stop(e):
...     def f():
...         raise e
...     return f
...
py> g = (f() for f in (lambda:1,stop(StopIteration),lambda:2))
py> g.next
<method-wrapper 'next' of generator object at 0xb7960fac>
py> g.next()
1
py> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
  File "<stdin>", line 3, in f
StopIteration
py> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
py> g = (f() for f in (lambda:1,stop(ValueError),lambda:2))
py> g.next()
1
py> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
  File "<stdin>", line 3, in f
ValueError
py> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
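Here is a Python 3 re-run of the session above, assuming Python 3.7 or later. `run` is a hypothetical helper that records what three successive `next()` calls produce. In both cases the generator is finished once an exception has escaped it. The one difference from the 2.x session is that an escaping StopIteration now surfaces as RuntimeError (PEP 479).

```python
def stop(exc_type):
    """Return a function that raises exc_type (mirrors the helper above)."""
    def f():
        raise exc_type
    return f


def run(exc_type):
    g = (f() for f in (lambda: 1, stop(exc_type), lambda: 2))
    outcomes = [next(g)]
    try:
        next(g)
    except Exception as exc:  # record which exception escaped the generator
        outcomes.append(type(exc).__name__)
    outcomes.append(next(g, "exhausted"))  # the generator is now finished
    return outcomes


print(run(ValueError))     # [1, 'ValueError', 'exhausted']
print(run(StopIteration))  # [1, 'RuntimeError', 'exhausted'] on Python 3.7+
```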

Regards,
Martin