Consume an iterable

16 views
Skip to first unread message

Muhammad Alkarouri

unread,
Jan 22, 2010, 9:13:30 AM1/22/10
to
In the python help for itertools, the following function is provided:

def consume(iterator, n):
"Advance the iterator n-steps ahead. If n is none, consume
entirely."
collections.deque(islice(iterator, n), maxlen=0)

What is the advantage of using a collections.deque against, say, the
following code?

def consume(iterator, n):
for _ in islice(iterator, n): pass


Regards,

Muhammad Alkarouri

Arnaud Delobelle

unread,
Jan 22, 2010, 10:07:19 AM1/22/10
to
Muhammad Alkarouri <malka...@gmail.com> writes:

deque is written in C, and so is islice, so it's less work for the
bytecode interpreter?

--
Arnaud

Arnaud Delobelle

unread,
Jan 22, 2010, 10:24:33 AM1/22/10
to
Muhammad Alkarouri <malka...@gmail.com> writes:

I remember suggesting this function (under the name of 'exhaust'!) a
while ago - although I don't think there was an optional parameter for
the number of elements consumed. I hadn't realised it had been deemed
useful by others!

--
Arnaud

Jan Kaliszewski

unread,
Jan 22, 2010, 10:25:33 AM1/22/10
to pytho...@python.org

Probably the former is faster. But I haven't check it. If you are curious,
use timeit module...

Regards,
*j

--
Jan Kaliszewski (zuo) <z...@chopin.edu.pl>

Raymond Hettinger

unread,
Jan 22, 2010, 2:11:21 PM1/22/10
to

The deque version is faster.


Raymond


Muhammad Alkarouri

unread,
Jan 23, 2010, 6:23:17 AM1/23/10
to
Thanks everyone, but not on my machine (Python 2.6.1, OS X 10.6) it's
not:


In [1]: from itertools import count, islice

In [2]: from collections import deque

In [3]: i1=count()

In [4]: def consume1(iterator, n):
...: deque(islice(iterator, n), maxlen=0)
...:
...:

In [5]: i2=count()

In [6]: def consume2(iterator, n):
...: for _ in islice(iterator, n): pass
...:
...:

In [7]: timeit consume1(i1, 10)
1000000 loops, best of 3: 1.63 us per loop

In [8]: timeit consume2(i2, 10)
1000000 loops, best of 3: 846 ns per loop


Can somebody please test whether it is only my machine or is this
reproducible?

(Thanks Jan for making me actually carry the test)

Peter Otten

unread,
Jan 23, 2010, 7:45:43 AM1/23/10
to
Muhammad Alkarouri wrote:

I can reproduce it. The deque-based approach has a bigger constant overhead
but better per-item performance. Its asymptotical behaviour is therefore
better.

$ python consume_timeit.py
consume_deque
10: 1.77500414848
100: 3.73333001137
1000: 24.7235469818

consume_forloop
10: 1.22008490562
100: 5.86271500587
1000: 52.2449371815

consume_islice
10: 0.897439956665
100: 1.51542806625
1000: 7.70061397552

$ cat consume_timeit.py
from collections import deque
from itertools import islice, repeat

def consume_deque(n, items):
deque(islice(items, n), maxlen=0)

def consume_forloop(n, items):
for _ in islice(items, n):
pass

def consume_islice(n, items):
next(islice(items, n-1, None), None)

def check(fs):
for consume in fs:
items = iter(range(10))
consume(3, items)
rest = list(items)
assert rest == range(3, 10), consume.__name__

if __name__ == "__main__":
fs = consume_deque, consume_forloop, consume_islice
check(fs)


items = repeat(None)

from timeit import Timer
for consume in fs:
print consume.__name__
for n in (10, 100, 1000):
print "%6d:" % n,
print Timer("consume(%s, items)" % n,
"from __main__ import consume, items").timeit()
print
$

With next(islice(...), None) I seem to have found a variant that beats both
competitors.

Peter

Duncan Booth

unread,
Jan 23, 2010, 7:58:44 AM1/23/10
to
Peter Otten <__pet...@web.de> wrote:

> With next(islice(...), None) I seem to have found a variant that beats
> both competitors.
>

It has different behaviour for n==0 but I'm sure that's easily fixed.

Muhammad Alkarouri

unread,
Jan 23, 2010, 8:14:34 AM1/23/10
to

Thanks Peter, I got more or less the same result on my machine (Python
2.6.1, x86_64, OS X 10.6):

~/tmp> python consume_timeit.py
consume_deque
10: 1.3138859272
100: 3.54495286942
1000: 24.9603481293

consume_forloop
10: 0.658113002777
100: 2.85697007179
1000: 24.6610429287

consume_islice
10: 0.637741088867
100: 1.09042882919
1000: 5.44473600388

The next function performs much better. It is also much more direct
for the purposes of consume and much more understandable (at least for
me) as it doesn't require a specialized data structure which is
subsequently not used as such.
I am thus inclined to report it as a python documentation enhancement
(bug) request. Any comments?

Cheers,

Muhammad

Peter Otten

unread,
Jan 23, 2010, 8:13:37 AM1/23/10
to pytho...@python.org
Duncan Booth wrote:

"Different behaviour" being a euphemism for broken ;)

def consume_islice(n, items):
if n == 0:
return
next(islice(items, n-1, None), None)

Peter

Peter Otten

unread,
Jan 23, 2010, 8:15:25 AM1/23/10
to
Duncan Booth wrote:

"Different behaviour" being a euphemism for broken ;)

Peter Otten

unread,
Jan 23, 2010, 8:32:37 AM1/23/10
to
Muhammad Alkarouri wrote:

> The next function performs much better. It is also much more direct
> for the purposes of consume and much more understandable (at least for
> me) as it doesn't require a specialized data structure which is
> subsequently not used as such.
> I am thus inclined to report it as a python documentation enhancement
> (bug) request. Any comments?

I would support that as a the deque(..., maxlen=0) trick is a bit too clever
for my taste.

Peter

PS: Remember to include the bug fix for n=0 if you proceed.

Peter Otten

unread,
Jan 23, 2010, 8:46:16 AM1/23/10
to
Peter Otten wrote:

Even better:

def consume_islice(n, items):
next(islice(items, n, n), None)

Peter

Muhammad Alkarouri

unread,
Jan 23, 2010, 8:55:12 AM1/23/10
to

Done. http://bugs.python.org/issue7764

Thanks for all the effort.

Muhammad

Muhammad Alkarouri

unread,
Jan 23, 2010, 9:06:32 AM1/23/10
to
On 23 Jan, 13:46, Peter Otten <__pete...@web.de> wrote:
> Peter Otten wrote:
> > Duncan Booth wrote:
>
> >> Peter Otten <__pete...@web.de> wrote:
>
> >>> With next(islice(...), None) I seem to have found a variant that beats
> >>> both  competitors.
>
> >> It has different behaviour for n==0 but I'm sure that's easily fixed.
>
> > "Different behaviour" being a euphemism for broken ;)
>
> > def consume_islice(n, items):
> >     if n == 0:
> >         return
> >     next(islice(items, n-1, None), None)
>
> Even better:
>
> def consume_islice(n, items):
>     next(islice(items, n, n), None)
>
> Peter

I submitted the bug report before considering this alternative, which
is better. I may add this later in the day, but I have to wait a
little as it seems you are going to optimize/improve the function
almost out of existence:)
If you are happy with this one, and can add the comment on the issue
(http://bugs.python.org/issue7764) yourself, please do so.

What I can say is, I am definitely very happy that I asked the
question. You live and learn:)

Cheers,

Muhammad

Peter Otten

unread,
Jan 23, 2010, 9:19:56 AM1/23/10
to
Muhammad Alkarouri wrote:

> On 23 Jan, 13:46, Peter Otten <__pete...@web.de> wrote:

>> def consume_islice(n, items):
>> next(islice(items, n, n), None)

> I submitted the bug report before considering this alternative, which


> is better. I may add this later in the day, but I have to wait a
> little as it seems you are going to optimize/improve the function
> almost out of existence:)

One problem: the above function doesn't consume the entire iterator like the
original example does for n=None. Passing sys.maxint instead is not pretty.

Peter

Jan Kaliszewski

unread,
Jan 24, 2010, 10:34:47 AM1/24/10
to pytho...@python.org
Dnia 23-01-2010 o 15:19:56 Peter Otten <__pet...@web.de> napisał(a):

>>> def consume_islice(n, items):
>>> next(islice(items, n, n), None)
>

> One problem: the above function doesn't consume the entire iterator like
> the original example does for n=None. Passing sys.maxint instead is not
> pretty.

Not very pretty, but noticeably (though not dramatically) faster for
n=None. Consider a modified version of the script from
http://bugs.python.org/issue7764:

import collections, sys


from itertools import islice, repeat

def consume0(iterator, n): # the old one
collections.deque(islice(iterator, n), maxlen=0)

def consume1(iterator, n): # similar to the primary proposal
if n is None:
collections.deque(iterator, maxlen=0)
elif n != 0:
next(islice(iterator, n-1, None), None)

def consume2(iterator, n): # the approved proposal (see #7764)
if n is None:
collections.deque(iterator, maxlen=0)
else:
next(islice(iterator, n, n), None)

def consume3(iterator, n): # with sys.maxint
if n is None:
n = sys.maxint # (maybe should be sys.maxsize instead?)
next(islice(iterator, n, n), None)

def test(fs):
for consume in fs:
iterator = iter(range(10))
consume(iterator, 3)
rest = list(iterator)


assert rest == range(3, 10), consume.__name__

iterator = iter(range(10))
consume(iterator, 0)
rest = list(iterator)
assert rest == range(10), consume.__name__

iterator = iter(range(10))
consume(iterator, None)
rest = list(iterator)
assert rest == [], consume.__name__

if __name__ == "__main__":
from timeit import Timer

fs = (consume0, consume1,
consume2, consume3)
test(fs)

iterator = repeat(None, 1000)


for consume in fs:
print consume.__name__

for n in (10, 100, 1000, None):
print "%6s:" % n,
print Timer("consume(iterator, %s)" % n,
"import collections, sys\n"
"from __main__ import consume,
iterator").timeit()
print


Results [Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) [GCC 4.3.3]
on linux2 pentium4 2.4 GHz]:

consume0
10: 2.94313001633
100: 2.91833305359
1000: 2.93242096901
None: 2.90090417862

consume1
10: 1.80793309212
100: 1.7936270237
1000: 1.83439803123
None: 2.37652015686

consume2
10: 1.58784389496
100: 1.5890610218
1000: 1.58557391167
None: 2.37005710602

consume3
10: 1.6071870327
100: 1.61109304428
1000: 1.60717701912
None: 1.81885385513

Peter Otten

unread,
Jan 24, 2010, 11:05:58 AM1/24/10
to
Jan Kaliszewski wrote:

Don't the results look suspicious to you? Try measuring with

iterator = iter([])

I'm sure you'll get the same result. An "easy" fix which introduces some
constant overhead but keeps the results comparable:

for consume in fs:
print consume.__name__
for n in (10, 100, 1000, None):
print "%6s:" % n,

print Timer("consume(repeat(None, 1000), %s)" % n,
"import collections, sys\n"
"from __main__ import consume, repeat").timeit()
print

Just for fun, here's a variant of consume3 for the paranoid:

_sentinel = object()
def consume4(iterator, n):


if n is None:
n = sys.maxint

while next(islice(iterator, n, n), _sentinel) is not _sentinel:
pass

Peter

Jan Kaliszewski

unread,
Jan 24, 2010, 11:49:32 AM1/24/10
to pytho...@python.org
> Don't the results look suspicious to you? Try measuring with
>
> iterator = iter([])

You are obviously right, my brain doesn't work well today :-(

But the corret results even more distinctly support my thesis -- that for
n=None using n=sys.maxint (consume3) is noticeably faster than 0-maxlen
deque (consume1 and consume2):

consume0
10: 4.06217813492
100: 8.45529103279
1000: 53.237226963
None: 54.0063519478

consume1
10: 2.59927105904
100: 3.47109603882
1000: 12.7196500301
None: 26.0995740891

consume2
10: 2.36225390434
100: 3.22979712486
1000: 12.5794699192
None: 28.5096430779

consume3
10: 2.39173388481
100: 3.43043398857
1000: 14.3361399174
None: 14.8560190201

Peter Otten

unread,
Jan 24, 2010, 12:24:49 PM1/24/10
to
Jan Kaliszewski wrote:

> But the corret results even more distinctly support my thesis -- that for
> n=None using n=sys.maxint (consume3) is noticeably faster than 0-maxlen
> deque (consume1 and consume2):

That advantage may not survive the next release:

http://svn.python.org/view/python/trunk/Modules/_collectionsmodule.c?r1=68145&r2=70296&pathrev=70296

Peter

Jan Kaliszewski

unread,
Jan 24, 2010, 1:01:59 PM1/24/10
to pytho...@python.org
24-01-2010, 18:24:49 Peter Otten <__pet...@web.de> wrote:

>> n=None using n=sys.maxint (consume3) is noticeably faster than 0-maxlen
>> deque (consume1 and consume2):
>
> That advantage may not survive the next release:
>
> http://svn.python.org/view/python/trunk/Modules/_collectionsmodule.c?r1=68145&r2=70296&pathrev=70296

Nice. :)

Raymond Hettinger

unread,
Jan 24, 2010, 2:57:01 PM1/24/10
to
>      def consume2(iterator, n):  # the approved proposal (see #7764)
>          if n is None:
>              collections.deque(iterator, maxlen=0)
>          else:
>              next(islice(iterator, n, n), None)

FWIW, the deque() approach becomes even faster in Py2.7 and Py3.1
which has a high-speed path for the case where maxlen is zero.
Here's a snippet from Modules/_collectionsmodule.c:

/* Run an iterator to exhaustion. Shortcut for
the extend/extendleft methods when maxlen == 0. */
static PyObject*
consume_iterator(PyObject *it)
{
PyObject *item;

while ((item = PyIter_Next(it)) != NULL) {
Py_DECREF(item);
}
Py_DECREF(it);
if (PyErr_Occurred())
return NULL;
Py_RETURN_NONE;
}


This code consumes an iterator to exhaustion.
It is short, sweet, and hard to beat.


Raymond

Paul Rubin

unread,
Jan 24, 2010, 3:45:19 PM1/24/10
to
Raymond Hettinger <pyt...@rcn.com> writes:
> This code consumes an iterator to exhaustion.
> It is short, sweet, and hard to beat.

I've always used sum(1 for x in iterator) or some such.

Peter Otten

unread,
Jan 24, 2010, 6:21:36 PM1/24/10
to
Raymond Hettinger wrote:

> FWIW, the deque() approach becomes even faster in Py2.7 and Py3.1
> which has a high-speed path for the case where maxlen is zero.
> Here's a snippet from Modules/_collectionsmodule.c:
>
> /* Run an iterator to exhaustion. Shortcut for
> the extend/extendleft methods when maxlen == 0. */
> static PyObject*
> consume_iterator(PyObject *it)
> {
> PyObject *item;
>
> while ((item = PyIter_Next(it)) != NULL) {
> Py_DECREF(item);
> }
> Py_DECREF(it);
> if (PyErr_Occurred())
> return NULL;
> Py_RETURN_NONE;
> }
>
>
> This code consumes an iterator to exhaustion.
> It is short, sweet, and hard to beat.

islice() is still a tad faster. A possible optimization:

static PyObject*
consume_iterator(PyObject *it)
{
PyObject *item;

PyObject *(*iternext)(PyObject *);

iternext = *Py_TYPE(it)->tp_iternext;

while ((item = iternext(it)) != NULL) {
Py_DECREF(item);
}
Py_DECREF(it);
if (PyErr_Occurred()) {
if(PyErr_ExceptionMatches(PyExc_StopIteration))
PyErr_Clear();
else
return NULL;
}
Py_RETURN_NONE;
}

Before:

$ ./python -m timeit -s"from itertools import repeat, islice; from
collections import deque; from sys import maxint" "next(islice(repeat(None,
1000), maxint, maxint), None)"
100000 loops, best of 3: 6.49 usec per loop

$ ./python -m timeit -s"from itertools import repeat, islice; from
collections import deque; from sys import maxint" "deque(repeat(None, 1000),
maxlen=0)"
100000 loops, best of 3: 9.93 usec per loop

After:

$ ./python -m timeit -s"from itertools import repeat, islice; from
collections import deque; from sys import maxint" "deque(repeat(None, 1000),
maxlen=0)"
100000 loops, best of 3: 6.31 usec per loop

Peter

PS: Two more, for Paul Rubin:

$ ./python -m timeit -s"from itertools import repeat" "sum(0 for _ in
repeat(None, 1000))"
10000 loops, best of 3: 125 usec per loop
$ ./python -m timeit -s"from itertools import repeat" "sum(0 for _ in
repeat(None, 1000) if 0)"
10000 loops, best of 3: 68.3 usec per loop


Reply all
Reply to author
Forward
0 new messages