-----------------------
>>> ================================ RESTART ================================
>>>
reduce with named function: 37.9864357062
reduce with nested, named function: 39.4710288598
reduce with lambda: 39.2463927678
sum comprehension: 25.9530121845
>>> ================================ RESTART ================================
>>>
reduce with named function: 36.4529584067
reduce with nested, named function: 37.6278529813
reduce with lambda: 38.2629448715
sum comprehension: 26.0197561422
>>>
from timeit import Timer
def add(x, y):
    return x + y

def rednamed(lst):
    return reduce(add, lst)

def rednn(lst):
    def add2(x, y):
        return x + y
    return reduce(add2, lst)

def redlambda(lst):
    return reduce(lambda x, y: x + y, lst)

def com(lst):
    return sum(x for x in lst)
s = xrange(101)
t1 = Timer('rednamed(s)', 'from __main__ import rednamed, s')
t2 = Timer('rednn(s)', 'from __main__ import rednn, s')
t3 = Timer('redlambda(s)', 'from __main__ import redlambda, s')
t4 = Timer('com(s)', 'from __main__ import com, s')
print "reduce with named function: ", t1.timeit()
print "reduce with nested, named function: ", t2.timeit()
print "reduce with lambda: ", t3.timeit()
print "sum comprehension: ", t4.timeit()
---------------------------------------
Also, using range instead of xrange doesn't seem to incur a
performance penalty:
>>>
reduce with named function: 36.7560729087
reduce with nested, named function: 38.5393266463
reduce with lambda: 38.3852953378
sum comprehension: 27.9001007111
>>>
> This must be because of the implementation, right? Shouldn't reduce be
> faster, since it iterates once over the list? Doesn't sum first construct
> the list and then sum it?
No, it doesn't. Why should it?
> Also, using range instead of xrange doesn't seem to incur a
> performance penalty:
(De)Allocating a list of length 100 isn't very slow. Try some million
elements. And watch the memory consumption too.
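Marc's point about memory can be illustrated by comparing the footprint of a materialized list with that of a lazy sequence object (a sketch in modern Python 3, where range plays the role xrange played in Python 2; exact byte counts vary by platform):

```python
import sys

n = 10**6

# A full list of a million ints must allocate a slot for every element.
as_list = list(range(n))

# A range object stores only start/stop/step, no matter how large n is.
as_range = range(n)

list_size = sys.getsizeof(as_list)    # the container only, not the ints inside
range_size = sys.getsizeof(as_range)

print(list_size, range_size)
# On a 64-bit CPython, the list container alone is several megabytes,
# while the range object is a few dozen bytes.
assert list_size > 4 * n
assert range_size < 100
```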
Ciao,
Marc 'BlackJack' Rintsch
> def com(lst):
> return sum(x for x in lst)
You construct a generator over an existing list in your code.
Try sum([x for x in lst]) to see the effect of additional list
construction. And while you're at it, try the simple sum(lst).
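harold's suggestion can be tried directly. A sketch in modern Python (in the original code lst was an xrange object; timings vary by machine, but the relative ordering is the point):

```python
from timeit import Timer

setup = 'lst = list(range(101))'

# sum over a generator expression wrapping the existing sequence
t_ge = Timer('sum(x for x in lst)', setup).timeit(number=10000)
# sum over a fresh list comprehension -- pays for building a second list
t_lc = Timer('sum([x for x in lst])', setup).timeit(number=10000)
# plain sum over the sequence itself -- no wrapper at all
t_plain = Timer('sum(lst)', setup).timeit(number=10000)

print(t_ge, t_lc, t_plain)

# All three compute the same value; only the overhead differs.
lst = list(range(101))
assert sum(x for x in lst) == sum([x for x in lst]) == sum(lst) == 5050
```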
Cheers,
- harold -
No, sum also iterates over the sequence just once and doesn't create a
list. It is probably implemented similarly to

def sum(sequence, start=0):
    it = iter(sequence)
    total = start
    for i in it:
        total += i
    return total

but then implemented in C for speed. reduce is probably implemented
much the same way, but with an arbitrary function in place of addition.
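That sketch can be checked against the built-ins directly. A runnable version (importing reduce from functools, as required since Python 3; it was a builtin in Python 2):

```python
from functools import reduce

def my_sum(sequence, start=0):
    # Single pass over an iterator; no intermediate list is built.
    total = start
    for i in iter(sequence):
        total += i
    return total

# reduce generalizes the same loop to an arbitrary binary function.
assert my_sum(range(101)) == sum(range(101)) == 5050
assert reduce(lambda x, y: x + y, range(101)) == 5050
```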
Make sure that you understand the difference between a generator
expression and a list comprehension, and that [f(x) for x in something]
is (almost) equal to list(f(x) for x in something), so you can emulate
a LC by applying the list constructor to the equivalent GE.
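The (almost-)equivalence Bas describes can be demonstrated with a small sketch (f here is just an arbitrary example function, not from the thread):

```python
def f(x):
    return x * x  # arbitrary example function

something = range(10)

# List comprehension: builds the list eagerly.
lc = [f(x) for x in something]

# list() applied to the equivalent generator expression.
ge = list(f(x) for x in something)

assert lc == ge == [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```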
HTH,
Bas
> This must be because of the implementation, right? Shouldn't reduce be
> faster, since it iterates once over the list? Doesn't sum first construct
> the list and then sum it?
What makes you think that?
Given the speed of sum(), it sure doesn't look like it's generating a
full list before summing. Why would it?
> reduce with named function: 37.9864357062
> reduce with nested, named function: 39.4710288598
> reduce with lambda: 39.2463927678
> sum comprehension: 25.9530121845
If you want to see reduce really shine, time it with a C-based function
rather than one written in pure Python:
>>> Timer('reduce(add, xrange(10000))',
... 'from operator import add').repeat(number=10000)
[19.724750995635986, 19.410486936569214, 19.614511013031006]
>>>
>>> Timer('reduce(add, xrange(10000))',
... 'def add(x, y): return x+y').repeat(number=10000)
[45.210143089294434, 44.814558982849121, 46.906874895095825]
You probably won't see much (if any) benefit for small lists, so make
sure your test is on a significantly-sized input list.
Of course, sum() is even faster than reduce:
>>> Timer('sum(xrange(10000))').repeat(number=10000)
[9.814924955368042, 8.7169640064239502, 9.5062401294708252]
--
Steven
Steven D'Aprano wrote:
> If you want to see reduce really shine, time it with a C-based function
> rather than one written in pure Python:
>
>>>> Timer('reduce(add, xrange(10000))',
> ... 'from operator import add').repeat(number=10000)
> [19.724750995635986, 19.410486936569214, 19.614511013031006]
>>>> Timer('reduce(add, xrange(10000))',
> ... 'def add(x, y): return x+y').repeat(number=10000)
> [45.210143089294434, 44.814558982849121, 46.906874895095825]
...
> Of course, sum() is even faster than reduce:
>
>>>> Timer('sum(xrange(10000))').repeat(number=10000)
> [9.814924955368042, 8.7169640064239502, 9.5062401294708252]
'Of course', because the irreducible difference between reduce(add, seq)
and sum(seq) is that reduce has to call an add function while sum has
the operation built-in in place of a call.
>> Of course, sum() is even faster than reduce:
>>
>>>>> Timer('sum(xrange(10000))').repeat(number=10000)
>> [9.814924955368042, 8.7169640064239502, 9.5062401294708252]
>
> 'Of course', because the irreducible difference between
> reduce(add, seq) and sum(seq) is that reduce has to call an add
> function while sum has the operation built-in in place of a call.
Note that, despite appearances, it's not as built-in as one might
wish. sum(seq) is still completely generic and works on all
number-like objects (in fact on all objects that define an __add__
operation except strings, which are explicitly forbidden). This means
that it can't just go ahead and execute a C addition, it must properly
call PyNumber_Add (the C API call equivalent to Python's "+"
operator), which will then inspect the objects and invoke the
appropriate implementation of addition. On the other hand,
operator.add is defined to do exactly that -- call PyNumber_Add on its
arguments and return the result.
It's not entirely obvious why summing numbers using the same method,
repeatedly calling PyNumber_Add, would be significantly slower for
reduce than for sum. Admittedly reduce has to execute a Python-level
function call which requires allocating and filling out a tuple, only
to reach PyNumber_Add, which sum calls immediately. But
builtin_reduce() is actually careful to optimize those repeated
function calls, for example by reusing the same tuple across the loop.
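The call-overhead gap described above can be measured directly. A sketch in modern Python (where reduce lives in functools); the absolute numbers are machine-dependent, but the pure-Python add typically loses to operator.add by a wide margin:

```python
from timeit import Timer

setup_c = 'from functools import reduce; from operator import add'
setup_py = ('from functools import reduce\n'
            'def add(x, y): return x + y')

# operator.add is a thin C wrapper that goes straight to PyNumber_Add ...
t_c = Timer('reduce(add, range(10000))', setup_c).timeit(number=100)
# ... while a pure-Python add pays a full Python-level call per element.
t_py = Timer('reduce(add, range(10000))', setup_py).timeit(number=100)
# sum skips the callable entirely.
t_sum = Timer('sum(range(10000))', 'pass').timeit(number=100)

print(t_c, t_py, t_sum)

# Whatever the timings, all variants agree on the result.
from functools import reduce
from operator import add
assert reduce(add, range(10000)) == sum(range(10000)) == 49995000
```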
The time machine strikes again! Try using Py2.6.
The built-in sum() function is much smarter and faster.
It does in fact use C addition.
Raymond