bool evaluations of generators vs lists

Josh Dukes

Feb 10, 2009, 2:15:14 PM
to pytho...@python.org
quite simply...what???

In [108]: bool([ x for x in range(10) if False ])
Out[108]: False

In [109]: bool( x for x in range(10) if False )
Out[109]: True

Why do these two evaluate differently? I was expecting that they would
evaluate the same but the generator would return true *as soon as the
first value is detected*. I'd really expect it to act more like...

def has_values(g):
    for i in g:
        return True
    return False

So what's going on here? Am I using the wrong function or is this
actually just a bug?

--

Josh Dukes
MicroVu IT Department

Chris Rebert

Feb 10, 2009, 2:38:42 PM
to Josh Dukes, pytho...@python.org
On Tue, Feb 10, 2009 at 11:15 AM, Josh Dukes <josh....@microvu.com> wrote:
> quite simply...what???
>
> In [108]: bool([ x for x in range(10) if False ])
> Out[108]: False

This evaluates the list comprehension and creates an empty list, which
is considered boolean False by Python.

> In [109]: bool( x for x in range(10) if False )
> Out[109]: True

Whereas this creates a /generator object/, whose inner expression is
*not evaluated until specifically required* (e.g. by for-looping over
the generator object). Generators don't have a specially defined
truthiness criterion (it's impossible to define generally -- consider
something like `x for x in all_integers if f(x)`, for a complicated
f(x), which would require a solution to the halting problem to know in
advance if it will have a non-zero length), so they end up using the
default behavior for objects with otherwise undefined boolean truth,
which is to consider them True.
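
The difference described above can be checked directly. This is only a minimal sketch in Python 3 syntax (the thread itself uses Python 2), and the variable names are made up for illustration:

```python
# An empty list is falsy because list defines __len__ and its length is 0.
empty_list = [x for x in range(10) if False]

# A generator expression builds a generator object without running its body.
# Generator objects define neither __bool__ nor __len__, so they are truthy.
lazy = (x for x in range(10) if False)

list_truth = bool(empty_list)
gen_truth = bool(lazy)
remaining = list(lazy)   # only now is the generator actually run

print(list_truth, gen_truth, remaining)
```

Note that bool(lazy) never runs the generator at all; the values only exist once something iterates over it.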

Cheers,
Chris

--
Follow the path of the Iguana...
http://rebertia.com

Albert Hopkins

Feb 10, 2009, 2:39:30 PM
to pytho...@python.org
On Tue, 2009-02-10 at 11:15 -0800, Josh Dukes wrote:
> quite simply...what???
>
> In [108]: bool([ x for x in range(10) if False ])
> Out[108]: False
>
> In [109]: bool( x for x in range(10) if False )
> Out[109]: True
>
> Why do these two evaluate differently? I was expecting that they would
> evaluate the same but the generator would return true *as soon as the
> first value is detected*. I'd really expect it to act more like...

The first example is a list. A list of length 0 evaluates to False.

The second example returns a generator object. A generator object
apparently evaluates to true. Your example is not iterating over the
values of the generator, but evaluating bool(generator_object) itself.
My feeling is that bool(generator_object) is ambiguous so shouldn't be
used to begin with.

In both examples, bool() doesn't actually iterate over the arguments.
Maybe that's what confused you.

Instead look at this:

>>> type([x for x in range(10) if False ])
<type 'list'>
>>> type((x for x in range(10) if False ))
<type 'generator'>

> def has_values(g):
>     for i in g:
>         return True
>     return False
>
> So what's going on here? Am I using the wrong function or is this
> actually just a bug?

bool != has_values. Check python.org for how Python determines the
"truthiness" of an object. Generally speaking the following evaluate to
False:

* None
* False
* zero of any numeric type, for example, 0, 0L, 0.0, 0j.
* any empty sequence, for example, '', (), [].
* any empty mapping, for example, {}.
* instances of user-defined classes, if the class defines a
__nonzero__() or __len__() method, when that method returns the
integer zero or bool value False.

All other values are considered true -- so objects of many types are
always true.
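
For readers on Python 3, the rules listed above can be sketched roughly as follows. The classes Empty, AlwaysFalse and Plain are hypothetical and exist only for this illustration; in Python 3 the method is spelled __bool__ rather than __nonzero__, and the 0L literal no longer exists:

```python
class Empty:
    """Hypothetical class: falsy because __len__ returns 0."""
    def __len__(self):
        return 0

class AlwaysFalse:
    """Hypothetical class: falsy because __bool__ returns False."""
    def __bool__(self):
        return False

class Plain:
    """Hypothetical class with neither method: instances are always truthy."""

falsy = [None, False, 0, 0.0, 0j, '', (), [], {}, Empty(), AlwaysFalse()]
truthy = [Plain(), (x for x in ()), [0], ' ']

falsy_results = [bool(v) for v in falsy]
truthy_results = [bool(v) for v in truthy]
```

The empty generator expression lands in the truthy list for the same reason Plain() does: it defines neither method.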


Josh Dukes

Feb 10, 2009, 3:50:02 PM
to pytho...@python.org
> The first example is a list. A list of length 0 evaluates to False.
>
> The second example returns a generator object. A generator object
> apparently evaluates to true. Your example is not iterating over the
> values of the generator, but evaluating bool(generator_object) itself.
> My feeling is that bool(generator_object) is ambiguous so shouldn't be
> used to begin with.

I was actually aware of that (thank you, though, for trying to help).
What I was not clear on was if the boolean evaluation is a method of an
object that can be modified via operator overloading (in the same way
+ is actually .__add__()) or not. Clearly __nonzero__ is the operator I
was curious about. Thanks for that info.

> bool != has_values. Check python.org for how Python determines the
> "truthiness" of an object. Generally speaking the following evaluate
> to False:
>
> * None
> * False
> * zero of any numeric type, for example, 0, 0L, 0.0, 0j.
> * any empty sequence, for example, '', (), [].
> * any empty mapping, for example, {}.
> * instances of user-defined classes, if the class defines a
> __nonzero__() or __len__() method, when that method returns
> the integer zero or bool value False.
>
> All other values are considered true -- so objects of many types are
> always true.

The thing I don't understand is why a generator that has no iterable
values is different from an empty list. Why shouldn't bool ==
has_values?? Technically a list, a tuple, and a string are also objects
but if they lack values they're evaluated as False. It seems to me that
a generator is an object that intends to replace lists where lazy
evaluation would be more efficient. Here is one place where that's
definitely true.

The main reason I'm interested in this is that it improves performance
immensely over boolean evaluation of large lists (as the attached code
shows). It seems to me that if I could find a good use for it in my
experimentation, someone else might also want to do the same thing
in real-world code.

Is there another list I should be asking these questions on?

prime.py

Albert Hopkins

Feb 10, 2009, 4:25:47 PM
to pytho...@python.org
On Tue, 2009-02-10 at 12:50 -0800, Josh Dukes wrote:

> The thing I don't understand is why a generator that has no iterable
> values is different from an empty list. Why shouldn't bool ==
> has_values?? Technically a list, a tuple, and a string are also objects
> but if they lack values they're evaluated as False. It seems to me that
> a generator is an object that intends to replace lists where lazy
> evaluation would be more efficient. Here is one place where that's
> definitely true.

Well, I did not implement generators in python, but my guess would be
that lists and tuples can be checked with len() to see if they are
non-empty. Generators don't have length. You would at least need to
call .next() which "changes" the generator so every time you'd want to
evaluate the boolean of the generator you'd potentially lose the next
item.

Generators are meant to replace lists where you don't want/can't put the
entire "list" in memory or for which there is no (known) end to the
list. You don't know the next value a generator will return (if any)
until you evaluate it. Don't think of generators as containers like
lists, tuples and strings are. Generators don't "contain" values.
Generators are objects that return the "next" value. A generator has no
idea how many values it contains (it's not a container). It only knows
the ".next()" value when it's called. It forgets the value once it's
returned. And it has no idea how far it is in the iteration until it's
finished (StopIteration).
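
A small sketch of the points above, written in Python 3 syntax where .next() is spelled next(gen). It is only an illustration, not code from the thread:

```python
gen = (n * n for n in range(3))

try:
    len(gen)
    has_len = True
except TypeError:
    has_len = False      # generators do not implement __len__

first = next(gen)        # the generator has now moved past this value
rest = list(gen)         # the first value is not produced again

try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True     # only now do we know there is nothing left
```

Here first is 0 and rest is [1, 4]: once a value has been handed out, the generator keeps no record of it.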

> The main reason I'm interested in this is that it improves performance
> immensely over boolean evaluation of large lists (as the attached code
> shows). It seems to me that if I could find a good use for it in my
> experimentation, someone else might also want to do the same thing
> in real-world code.

I don't understand what you mean by this. But if you really want to
know if a generator is "non-empty":

def non_empty(virgin_generator):
    try:
        virgin_generator.next() # note you just lost the first value
        return True
    except StopIteration:
        return False

The only way to get around this is to put all the values of a generator
inside a container (e.g. a list):

l = list(generator_object)

but in doing so you've (obviously) lost the advantages of the generator.
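
One possible middle ground, offered only as a sketch and not something proposed in this message, is to pull a single value and then chain it back in front of the rest with itertools.chain. The helper name peek_non_empty is hypothetical, and the code is in Python 3 syntax:

```python
import itertools

def peek_non_empty(iterable):
    """Hypothetical helper: return (non_empty, iterator_with_all_values)."""
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        return False, iter(())
    # Put the consumed value back in front of the remaining ones.
    return True, itertools.chain([first], it)

found, values = peek_non_empty(x for x in range(5) if x % 2)
odd_values = list(values)

found_none, no_values = peek_non_empty(x for x in range(5) if x > 10)
leftover = list(no_values)
```

The caller has to keep using the returned iterator rather than the original generator, since the original has already given up its first value.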


> Is there another list I should be asking these questions on?

I don't know. Sorry I wasn't able to help you.

-a


Steven D'Aprano

Feb 10, 2009, 4:57:56 PM
On Tue, 10 Feb 2009 12:50:02 -0800, Josh Dukes wrote:

> The thing I don't understand is why a generator that has no iterable
> values is different from an empty list.

How do you know it has no iterable values until you call next() on it and
get StopIteration?

By the way, your "has_values" function is just a slower version of the
built-in any().


--
Steven

Chris Rebert

Feb 10, 2009, 5:05:19 PM
to Steven D'Aprano, pytho...@python.org

<nitpick>
Not quite: if the generator produces one or more elements but those
elements happen to be boolean false according to Python, then any()
will be false but has_values() will be true. The functions serve
different purposes (produces at least 1 value vs. has at least one
true value).
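
The distinction can be seen with a list of values that are all boolean false. This sketch reuses the has_values function from the first message, in Python 3 syntax; the other names are illustrative:

```python
def has_values(g):
    """The function from the start of the thread: True if g yields anything."""
    for _ in g:
        return True
    return False

falsy_items = [0, '', None]

# True: at least one value was produced, even though it is falsy.
produces_something = has_values(x for x in falsy_items)

# False: none of the produced values is truthy.
has_true_value = any(x for x in falsy_items)

# Both agree when nothing is produced at all.
empty_case = (has_values(iter([])), any(iter([])))
```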

Josh Dukes

Feb 10, 2009, 5:09:48 PM
to pytho...@python.org
ahhh any! ok, yeah, I guess that's what I was looking for. Thanks.


On 10 Feb 2009 21:57:56 GMT


--

Josh Dukes
MicroVu IT Department

Terry Reedy

Feb 10, 2009, 5:29:23 PM
to pytho...@python.org
Josh Dukes wrote:

>
> I was actually aware of that (thank you, though, for trying to help).
> What I was not clear on was if the boolean evaluation is a method of an
> object that can be modified via operator overloading (in the same way
> + is actually .__add__()) or not. Clearly __nonzero__ is the operator I
> was curious about. Thanks for that info.

.__bool__ in 3.0.


> The thing I don't understand is why a generator that has no iterable
> values is different from an empty list. Why shouldn't bool ==
> has_values?? Technically a list, a tuple, and a string are also objects
> but if they lack values they're evaluated as False. It seems to me that
> a generator is an object that intends to replace lists where lazy
> evaluation would be more efficient. Here is one place where that's
> definitely true.

Generator functions are abbreviated iterator classes. If you want
iterators with more functionality, write an iterator class. In
particular, you can add a .__bool__ method for empty or not or even a
.__len__ method if you can accurately calculate the number of items
remaining.
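
As a rough illustration of that suggestion, here is a sketch of an iterator class that can report how many items remain. The Countdown class is hypothetical and written for Python 3, where the methods are spelled __next__ and __bool__:

```python
class Countdown:
    """Hypothetical iterator that knows how many items it has left."""

    def __init__(self, start):
        self.remaining = max(start, 0)

    def __iter__(self):
        return self

    def __next__(self):            # spelled .next() in Python 2
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining + 1

    def __len__(self):             # number of items still to come
        return self.remaining

    def __bool__(self):            # spelled __nonzero__ in Python 2
        return self.remaining > 0

c = Countdown(3)
before = (bool(c), len(c))
items = list(c)
after = (bool(c), len(c))
```

This only works because the class can compute what is left without running anything, which a general generator cannot do.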

> The main reason I'm interested in this is that it improves performance
> immensely over boolean evaluation of large lists (as the attached code
> shows). It seems to me that if I could find a good use for it in my
> experimentation, someone else might also want to do the same thing
> in real-world code.

Terry Jan Reedy

Gabriel Genellina

Feb 11, 2009, 12:07:31 AM
to pytho...@python.org
On Tue, 2009-02-10 at 12:50 -0800, Josh Dukes wrote:

>> The thing I don't understand is why a generator that has no iterable
>> values is different from an empty list. Why shouldn't bool ==
>> has_values?? Technically a list, a tuple, and a string are also objects
>> but if they lack values they're evaluated as False. It seems to me that
>> a generator is an object that intends to replace lists where lazy
>> evaluation would be more efficient. Here is one place where that's
>> definitely true.

Just in case it's not perfectly clear: until you call next() there is no
way to know whether the generator will yield any value or not -- and once
it does, it's lost until you explicitly save it.

This generator doesn't yield any value - but you have to wait for a while
if you call .next() on it, until it eventually raises StopIteration:
(x for x in xrange(2000000000) if x>100000000000)
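
A scaled-down sketch of the same effect, in Python 3 syntax. The is_big predicate and the calls counter are hypothetical and only there to make the work visible:

```python
calls = {"count": 0}   # mutable counter so the predicate can record its calls

def is_big(x):
    """Hypothetical predicate that is never true for the inputs below."""
    calls["count"] += 1
    return x > 10_000

gen = (x for x in range(1000) if is_big(x))
calls_before = calls["count"]     # nothing has run yet

result = next(gen, None)          # tests every candidate before giving up
calls_after = calls["count"]
```

Creating the generator costs nothing, but asking it for a first value means running the filter over every candidate before it can report that there is none.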


On Tue, 10 Feb 2009 19:25:47 -0200, Albert Hopkins
<mar...@letterboxes.org> wrote:

>> The main reason I'm interested in this is that it improves performance
>> immensely over boolean evaluation of large lists (as the attached code
>> shows). It seems to me that if I could find a good use for it in my
>> experimentation, someone else might also want to do the same thing
>> in real-world code.
>
> I don't understand what you mean by this. But if you really want to
> know if a generator is "non-empty":
>
> def non_empty(virgin_generator):
>     try:
>         virgin_generator.next() # note you just lost the first value
>         return True
>     except StopIteration:
>         return False
>
> The only way to get around this is to put all the values of a generator
> inside a container (e.g. a list):

For a long generator you may not want to do that, and you may not want to
lose the next element either. A variation of your function above is useful
in such cases:

py> def end_of_gen(g):
...     """returns (False, next_element) when it exists or (True, None) when it's empty"""
...     try: return False, g.next()
...     except StopIteration: return True, None
...
py> g = (c for c in "Python" if c in "aeiou")
py> eog, c = end_of_gen(g)
py> eog
False
py> c
'o'
py> eog, c = end_of_gen(g)
py> eog
True
py> c
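
For Python 3, where g.next() no longer exists, the same idea might look like the sketch below. The _MISSING sentinel is an addition for this illustration; it keeps a legitimately yielded None from being mistaken for "empty" inside the function:

```python
_MISSING = object()   # sentinel: distinguishes "no value" from a yielded None

def end_of_gen(g):
    """Return (False, next_element) if g yields a value, else (True, None)."""
    item = next(g, _MISSING)
    if item is _MISSING:
        return True, None
    return False, item

g = (c for c in "Python" if c in "aeiou")
first_call = end_of_gen(g)
second_call = end_of_gen(g)
```

As in the session above, the first call gives (False, 'o') and the second gives (True, None).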


--
Gabriel Genellina

Antoon Pardon

Feb 24, 2009, 2:54:07 AM

Well he could always use a wrapper like the following:

nothing = object()

class BoolIterator (object):
    def __init__(self, iter):
        self.iter = iter
        self.value = nothing
        self.item = nothing

    def __iter__(self):
        return self

    def __nonzero__(self):
        if self.value is nothing:
            try:
                self.item = self.iter.next()
                self.value = True
                return True
            except StopIteration:
                self.value = False
                return False
        else:
            return self.value

    def next(self):
        if self.item is not nothing:
            result = self.item
            self.item = nothing
            return result
        if self.value is not nothing:
            return self.iter.next()
        else:
            try:
                result = self.iter.next()
                self.value = True
                return result
            except StopIteration:
                self.value = False
                raise

it1 = BoolIterator(i for i in xrange(10))

for i in it1:
    print i

it2 = BoolIterator(i for i in xrange(10))

if it2:
    print "item found"
    for i in it2:
        print i
else:
    print "problem"

it3 = BoolIterator(i for i in xrange(0))

if it3:
    print "problem"
else:
    print "it3 is empty"
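
A Python 3 take on the same wrapper might look like the sketch below. It is a simplification rather than a line-by-line port: it assumes it is acceptable to re-check the underlying iterator on every truth test instead of caching the first answer, so it reports False once the values have been used up.

```python
_NOTHING = object()   # sentinel marking "no buffered item"

class BoolIterator:
    """Python 3 sketch: truth-testing buffers one item from the iterator."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._item = _NOTHING

    def __iter__(self):
        return self

    def __bool__(self):
        if self._item is _NOTHING:
            # Pull one item ahead and keep it for the next __next__ call.
            self._item = next(self._it, _NOTHING)
        return self._item is not _NOTHING

    def __next__(self):
        if self._item is not _NOTHING:
            item, self._item = self._item, _NOTHING
            return item
        return next(self._it)

it2 = BoolIterator(i for i in range(3))
truth_before = bool(it2)
values = list(it2)
truth_after = bool(it2)

it3 = BoolIterator(i for i in range(0))
empty_truth = bool(it3)
```

Testing it2 for truth does not lose the first value, because the buffered item is handed back by the next call to __next__.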
