def consume(iterator, n):
"Advance the iterator n-steps ahead. If n is none, consume
entirely."
collections.deque(islice(iterator, n), maxlen=0)
What is the advantage of using a collections.deque against, say, the
following code?
def consume(iterator, n):
for _ in islice(iterator, n): pass
Regards,
Muhammad Alkarouri
deque is written in C, and so is islice, so it's less work for the
bytecode interpreter?
--
Arnaud
I remember suggesting this function (under the name of 'exhaust'!) a
while ago - although I don't think there was an optional parameter for
the number of elements consumed. I hadn't realised it had been deemed
useful by others!
--
Arnaud
Probably the former is faster. But I haven't check it. If you are curious,
use timeit module...
Regards,
*j
--
Jan Kaliszewski (zuo) <z...@chopin.edu.pl>
The deque version is faster.
Raymond
In [1]: from itertools import count, islice
In [2]: from collections import deque
In [3]: i1=count()
In [4]: def consume1(iterator, n):
...: deque(islice(iterator, n), maxlen=0)
...:
...:
In [5]: i2=count()
In [6]: def consume2(iterator, n):
...: for _ in islice(iterator, n): pass
...:
...:
In [7]: timeit consume1(i1, 10)
1000000 loops, best of 3: 1.63 us per loop
In [8]: timeit consume2(i2, 10)
1000000 loops, best of 3: 846 ns per loop
Can somebody please test whether it is only my machine or is this
reproducible?
(Thanks Jan for making me actually carry the test)
I can reproduce it. The deque-based approach has a bigger constant overhead
but better per-item performance. Its asymptotical behaviour is therefore
better.
$ python consume_timeit.py
consume_deque
10: 1.77500414848
100: 3.73333001137
1000: 24.7235469818
consume_forloop
10: 1.22008490562
100: 5.86271500587
1000: 52.2449371815
consume_islice
10: 0.897439956665
100: 1.51542806625
1000: 7.70061397552
$ cat consume_timeit.py
from collections import deque
from itertools import islice, repeat
def consume_deque(n, items):
deque(islice(items, n), maxlen=0)
def consume_forloop(n, items):
for _ in islice(items, n):
pass
def consume_islice(n, items):
next(islice(items, n-1, None), None)
def check(fs):
for consume in fs:
items = iter(range(10))
consume(3, items)
rest = list(items)
assert rest == range(3, 10), consume.__name__
if __name__ == "__main__":
fs = consume_deque, consume_forloop, consume_islice
check(fs)
items = repeat(None)
from timeit import Timer
for consume in fs:
print consume.__name__
for n in (10, 100, 1000):
print "%6d:" % n,
print Timer("consume(%s, items)" % n,
"from __main__ import consume, items").timeit()
print
$
With next(islice(...), None) I seem to have found a variant that beats both
competitors.
Peter
> With next(islice(...), None) I seem to have found a variant that beats
> both competitors.
>
It has different behaviour for n==0 but I'm sure that's easily fixed.
Thanks Peter, I got more or less the same result on my machine (Python
2.6.1, x86_64, OS X 10.6):
~/tmp> python consume_timeit.py
consume_deque
10: 1.3138859272
100: 3.54495286942
1000: 24.9603481293
consume_forloop
10: 0.658113002777
100: 2.85697007179
1000: 24.6610429287
consume_islice
10: 0.637741088867
100: 1.09042882919
1000: 5.44473600388
The next function performs much better. It is also much more direct
for the purposes of consume and much more understandable (at least for
me) as it doesn't require a specialized data structure which is
subsequently not used as such.
I am thus inclined to report it as a python documentation enhancement
(bug) request. Any comments?
Cheers,
Muhammad
"Different behaviour" being a euphemism for broken ;)
def consume_islice(n, items):
if n == 0:
return
next(islice(items, n-1, None), None)
Peter
"Different behaviour" being a euphemism for broken ;)
> The next function performs much better. It is also much more direct
> for the purposes of consume and much more understandable (at least for
> me) as it doesn't require a specialized data structure which is
> subsequently not used as such.
> I am thus inclined to report it as a python documentation enhancement
> (bug) request. Any comments?
I would support that as a the deque(..., maxlen=0) trick is a bit too clever
for my taste.
Peter
PS: Remember to include the bug fix for n=0 if you proceed.
Even better:
def consume_islice(n, items):
next(islice(items, n, n), None)
Peter
I submitted the bug report before considering this alternative, which
is better. I may add this later in the day, but I have to wait a
little as it seems you are going to optimize/improve the function
almost out of existence:)
If you are happy with this one, and can add the comment on the issue
(http://bugs.python.org/issue7764) yourself, please do so.
What I can say is, I am definitely very happy that I asked the
question. You live and learn:)
Cheers,
Muhammad
> On 23 Jan, 13:46, Peter Otten <__pete...@web.de> wrote:
>> def consume_islice(n, items):
>> next(islice(items, n, n), None)
> I submitted the bug report before considering this alternative, which
> is better. I may add this later in the day, but I have to wait a
> little as it seems you are going to optimize/improve the function
> almost out of existence:)
One problem: the above function doesn't consume the entire iterator like the
original example does for n=None. Passing sys.maxint instead is not pretty.
Peter
>>> def consume_islice(n, items):
>>> next(islice(items, n, n), None)
>
> One problem: the above function doesn't consume the entire iterator like
> the original example does for n=None. Passing sys.maxint instead is not
> pretty.
Not very pretty, but noticeably (though not dramatically) faster for
n=None. Consider a modified version of the script from
http://bugs.python.org/issue7764:
import collections, sys
from itertools import islice, repeat
def consume0(iterator, n): # the old one
collections.deque(islice(iterator, n), maxlen=0)
def consume1(iterator, n): # similar to the primary proposal
if n is None:
collections.deque(iterator, maxlen=0)
elif n != 0:
next(islice(iterator, n-1, None), None)
def consume2(iterator, n): # the approved proposal (see #7764)
if n is None:
collections.deque(iterator, maxlen=0)
else:
next(islice(iterator, n, n), None)
def consume3(iterator, n): # with sys.maxint
if n is None:
n = sys.maxint # (maybe should be sys.maxsize instead?)
next(islice(iterator, n, n), None)
def test(fs):
for consume in fs:
iterator = iter(range(10))
consume(iterator, 3)
rest = list(iterator)
assert rest == range(3, 10), consume.__name__
iterator = iter(range(10))
consume(iterator, 0)
rest = list(iterator)
assert rest == range(10), consume.__name__
iterator = iter(range(10))
consume(iterator, None)
rest = list(iterator)
assert rest == [], consume.__name__
if __name__ == "__main__":
from timeit import Timer
fs = (consume0, consume1,
consume2, consume3)
test(fs)
iterator = repeat(None, 1000)
for consume in fs:
print consume.__name__
for n in (10, 100, 1000, None):
print "%6s:" % n,
print Timer("consume(iterator, %s)" % n,
"import collections, sys\n"
"from __main__ import consume,
iterator").timeit()
print
Results [Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) [GCC 4.3.3]
on linux2 pentium4 2.4 GHz]:
consume0
10: 2.94313001633
100: 2.91833305359
1000: 2.93242096901
None: 2.90090417862
consume1
10: 1.80793309212
100: 1.7936270237
1000: 1.83439803123
None: 2.37652015686
consume2
10: 1.58784389496
100: 1.5890610218
1000: 1.58557391167
None: 2.37005710602
consume3
10: 1.6071870327
100: 1.61109304428
1000: 1.60717701912
None: 1.81885385513
Don't the results look suspicious to you? Try measuring with
iterator = iter([])
I'm sure you'll get the same result. An "easy" fix which introduces some
constant overhead but keeps the results comparable:
for consume in fs:
print consume.__name__
for n in (10, 100, 1000, None):
print "%6s:" % n,
print Timer("consume(repeat(None, 1000), %s)" % n,
"import collections, sys\n"
"from __main__ import consume, repeat").timeit()
print
Just for fun, here's a variant of consume3 for the paranoid:
_sentinel = object()
def consume4(iterator, n):
if n is None:
n = sys.maxint
while next(islice(iterator, n, n), _sentinel) is not _sentinel:
pass
Peter
You are obviously right, my brain doesn't work well today :-(
But the corret results even more distinctly support my thesis -- that for
n=None using n=sys.maxint (consume3) is noticeably faster than 0-maxlen
deque (consume1 and consume2):
consume0
10: 4.06217813492
100: 8.45529103279
1000: 53.237226963
None: 54.0063519478
consume1
10: 2.59927105904
100: 3.47109603882
1000: 12.7196500301
None: 26.0995740891
consume2
10: 2.36225390434
100: 3.22979712486
1000: 12.5794699192
None: 28.5096430779
consume3
10: 2.39173388481
100: 3.43043398857
1000: 14.3361399174
None: 14.8560190201
> But the corret results even more distinctly support my thesis -- that for
> n=None using n=sys.maxint (consume3) is noticeably faster than 0-maxlen
> deque (consume1 and consume2):
That advantage may not survive the next release:
http://svn.python.org/view/python/trunk/Modules/_collectionsmodule.c?r1=68145&r2=70296&pathrev=70296
Peter
>> n=None using n=sys.maxint (consume3) is noticeably faster than 0-maxlen
>> deque (consume1 and consume2):
>
> That advantage may not survive the next release:
>
> http://svn.python.org/view/python/trunk/Modules/_collectionsmodule.c?r1=68145&r2=70296&pathrev=70296
Nice. :)
FWIW, the deque() approach becomes even faster in Py2.7 and Py3.1
which has a high-speed path for the case where maxlen is zero.
Here's a snippet from Modules/_collectionsmodule.c:
/* Run an iterator to exhaustion. Shortcut for
the extend/extendleft methods when maxlen == 0. */
static PyObject*
consume_iterator(PyObject *it)
{
PyObject *item;
while ((item = PyIter_Next(it)) != NULL) {
Py_DECREF(item);
}
Py_DECREF(it);
if (PyErr_Occurred())
return NULL;
Py_RETURN_NONE;
}
This code consumes an iterator to exhaustion.
It is short, sweet, and hard to beat.
Raymond
I've always used sum(1 for x in iterator) or some such.
> FWIW, the deque() approach becomes even faster in Py2.7 and Py3.1
> which has a high-speed path for the case where maxlen is zero.
> Here's a snippet from Modules/_collectionsmodule.c:
>
> /* Run an iterator to exhaustion. Shortcut for
> the extend/extendleft methods when maxlen == 0. */
> static PyObject*
> consume_iterator(PyObject *it)
> {
> PyObject *item;
>
> while ((item = PyIter_Next(it)) != NULL) {
> Py_DECREF(item);
> }
> Py_DECREF(it);
> if (PyErr_Occurred())
> return NULL;
> Py_RETURN_NONE;
> }
>
>
> This code consumes an iterator to exhaustion.
> It is short, sweet, and hard to beat.
islice() is still a tad faster. A possible optimization:
static PyObject*
consume_iterator(PyObject *it)
{
PyObject *item;
PyObject *(*iternext)(PyObject *);
iternext = *Py_TYPE(it)->tp_iternext;
while ((item = iternext(it)) != NULL) {
Py_DECREF(item);
}
Py_DECREF(it);
if (PyErr_Occurred()) {
if(PyErr_ExceptionMatches(PyExc_StopIteration))
PyErr_Clear();
else
return NULL;
}
Py_RETURN_NONE;
}
Before:
$ ./python -m timeit -s"from itertools import repeat, islice; from
collections import deque; from sys import maxint" "next(islice(repeat(None,
1000), maxint, maxint), None)"
100000 loops, best of 3: 6.49 usec per loop
$ ./python -m timeit -s"from itertools import repeat, islice; from
collections import deque; from sys import maxint" "deque(repeat(None, 1000),
maxlen=0)"
100000 loops, best of 3: 9.93 usec per loop
After:
$ ./python -m timeit -s"from itertools import repeat, islice; from
collections import deque; from sys import maxint" "deque(repeat(None, 1000),
maxlen=0)"
100000 loops, best of 3: 6.31 usec per loop
Peter
PS: Two more, for Paul Rubin:
$ ./python -m timeit -s"from itertools import repeat" "sum(0 for _ in
repeat(None, 1000))"
10000 loops, best of 3: 125 usec per loop
$ ./python -m timeit -s"from itertools import repeat" "sum(0 for _ in
repeat(None, 1000) if 0)"
10000 loops, best of 3: 68.3 usec per loop