
sum for sequences?


kj
Mar 24, 2010, 11:29:07 AM

Is there a sequence-oriented equivalent to the sum built-in? E.g.:

seq_sum(((1, 2), (5, 6))) --> (1, 2) + (5, 6) --> (1, 2, 5, 6)

?

(By "sequence" I'm referring primarily to lists and tuples, and
excluding strings, since for these there is ''.join()).

TIA!

~K

Glazner
Mar 24, 2010, 11:39:10 AM

try itertools.chain
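A minimal sketch of that suggestion (the variable names here are just illustrative):

```python
import itertools

seqs = ((1, 2), (5, 6))
# chain yields the items of each input sequence in turn; wrap it in
# tuple() or list() if you need a concrete sequence, since chain
# returns an iterator.
flat = tuple(itertools.chain(*seqs))
print(flat)  # (1, 2, 5, 6)
```

For many input sequences, `itertools.chain.from_iterable(seqs)` avoids unpacking them all as call arguments.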

Neil Cerutti
Mar 24, 2010, 11:42:02 AM

reduce, or functools.reduce in Python 3.1.

>>> functools.reduce(operator.add, ((1, 2), (5, 6)))
(1, 2, 5, 6)

--
Neil Cerutti
"It's not fun to build walls. But it's even less fun to live
without walls in a world full of zombies." --Greedy Goblin

Steve Holden
Mar 24, 2010, 11:42:52 AM
Do you mean you want to flatten a list structure? There have been
several discussions about this, which Google will find for you quite easily.
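For the one-level flattening those discussions usually settle on, a nested comprehension is a common sketch (names illustrative):

```python
seqs = [(1, 2), (5, 6)]
# One level deep only: each item of each inner sequence is emitted
# in order; deeper nesting is left intact.
flat = [x for seq in seqs for x in seq]
print(flat)  # [1, 2, 5, 6]
```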

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
See PyCon Talks from Atlanta 2010 http://pycon.blip.tv/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS: http://holdenweb.eventbrite.com/

Duncan Booth
Mar 24, 2010, 12:20:17 PM
kj <no.e...@please.post> wrote:

> Is there a sequence-oriented equivalent to the sum built-in? E.g.:
>
> seq_sum(((1, 2), (5, 6))) --> (1, 2) + (5, 6) --> (1, 2, 5, 6)
>
> ?
>

Apart from the suggestions for Google for general list flattening, for this
specific example you could just use the 'sum' built-in:

>>> sum(((1, 2), (5, 6)), ())
(1, 2, 5, 6)

Just give it an empty tuple as the starting value.

Steven D'Aprano
Mar 24, 2010, 5:07:24 PM
On Wed, 24 Mar 2010 15:29:07 +0000, kj wrote:

> Is there a sequence-oriented equivalent to the sum built-in? E.g.:
>
> seq_sum(((1, 2), (5, 6))) --> (1, 2) + (5, 6) --> (1, 2, 5, 6)
>
> ?

Yes, sum.

help(sum) is your friend.

>>> a = range(2)
>>> b = range(3)
>>> c = range(4)
>>> sum((a, b, c), [])
[0, 1, 0, 1, 2, 0, 1, 2, 3]


Beware though that sum on lists and tuples will be fairly inefficient if
you have lots of them. You may find that this will be much more efficient:

result = []
for seq in sequences:
    result.extend(seq)

--
Steven

Paul Rubin
Mar 24, 2010, 7:19:19 PM
kj <no.e...@please.post> writes:
> Is there a sequence-oriented equivalent to the sum built-in? E.g.:
> seq_sum(((1, 2), (5, 6))) --> (1, 2) + (5, 6) --> (1, 2, 5, 6)

use itertools.chain for this. A few people have mentioned that sum will
also work, but I think for that purpose it could have O(n**2)
complexity.

TomF
Mar 25, 2010, 2:50:23 AM
On 2010-03-24 14:07:24 -0700, Steven D'Aprano
<ste...@REMOVE.THIS.cybersource.com.au> said:
> On Wed, 24 Mar 2010 15:29:07 +0000, kj wrote:
>
>> Is there a sequence-oriented equivalent to the sum built-in? E.g.:
>>
>> seq_sum(((1, 2), (5, 6))) --> (1, 2) + (5, 6) --> (1, 2, 5, 6)
>>
>> ?
>
> Yes, sum.
>
> help(sum) is your friend.

You might not want to be so glib. The sum doc sure doesn't sound like
it should work on lists.

Returns the sum of a sequence of numbers (NOT strings) plus the value
of parameter 'start' (which defaults to 0).

-Tom

Steven D'Aprano
Mar 25, 2010, 3:34:51 AM


What part of that suggested to you that sum might not be polymorphic?
Sure, it says numbers (which should be changed, in my opinion), but it
doesn't specify what sort of numbers -- ints, floats, or custom types
that have an __add__ method. It also singles out strings as excluded. Why
would you need to explicitly exclude strings, since they're not numbers,
if sum *only* works with numbers?

E.g. help(math.sin) could have said this, but doesn't:

Return the sine of x (NOT a dictionary)

It doesn't need to, because dicts aren't exceptional: sin doesn't work on
anything *but* numbers. There's no __sin__ method to call on arbitrary
types.

The fact that sum does single out strings is a clear sign that strings
are treated as exceptional and suggests strongly that summing arbitrary
types should work. I'm not saying that help(sum) explicitly states that
it works with lists (it clearly doesn't), but it does suggest the
possibility and makes the experiment worth trying.

I'll also note that the Fine Manual makes it even more clear that sum is
polymorphic:

http://docs.python.org/library/functions.html#sum


--
Steven

Neil Cerutti
Mar 25, 2010, 8:37:14 AM
On 2010-03-25, Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> wrote:
>> You might not want to be so glib. The sum doc sure doesn't
>> sound like it should work on lists.
>>
>> Returns the sum of a sequence of numbers (NOT strings) plus the
>> value of parameter 'start' (which defaults to 0).
>
> What part of that suggested to you that sum might not be polymorphic?
> Sure, it says numbers (which should be changed, in my opinion), but it
> doesn't specify what sort of numbers -- ints, floats, or custom types
> that have an __add__ method.

WTF.

Stefan Behnel
Mar 25, 2010, 8:44:35 AM
Neil Cerutti, 25.03.2010 13:37:

> On 2010-03-25, Steven D'Aprano wrote:
>>> You might not want to be so glib. The sum doc sure doesn't
>>> sound like it should work on lists.
>>>
>>> Returns the sum of a sequence of numbers (NOT strings) plus the
>>> value of parameter 'start' (which defaults to 0).
>>
>> What part of that suggested to you that sum might not be polymorphic?
>> Sure, it says numbers (which should be changed, in my opinion), but it
>> doesn't specify what sort of numbers -- ints, floats, or custom types
>> that have an __add__ method.
>
> WTF.

Warning: truth found!

Stefan

Alf P. Steinbach
Mar 25, 2010, 9:02:05 AM
* Neil Cerutti:

> On 2010-03-25, Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> wrote:
>>> You might not want to be so glib. The sum doc sure doesn't
>>> sound like it should work on lists.
>>>
>>> Returns the sum of a sequence of numbers (NOT strings) plus the
>>> value of parameter 'start' (which defaults to 0).
>> What part of that suggested to you that sum might not be polymorphic?
>> Sure, it says numbers (which should be changed, in my opinion), but it
>> doesn't specify what sort of numbers -- ints, floats, or custom types
>> that have an __add__ method.
>
> WTF.

I think Steven's argument is that it would be pointless for 'sum' to distinguish
between user-defined numerical types and other types that happen to support '+'.

It could make such a distinction since there's a type hierarchy for numbers, but
then that should IMHO be more clearly documented.

However, given that it isn't restricted to numbers, the restriction wrt. strings
is a bit perplexing in the context of modern CPython. But for Python
implementations that don't offer the '+=' optimization it might help to avoid
gross inefficiencies, namely quadratic time string concatenation. E.g., here's a
natural implementation of sum -- with unoptimized '+=' yielding quadratic time
for the string concatenation (with modern CPython it's linear time, though):

<example>
>>> def sum_all( values, start = 0 ):
...     s = start
...     for v in values: s += v
...     return s
...
>>> sum_all( (1, 2, 3, 4) )
10
>>> sum_all( ("a", "b", "c", "d"), "" )
'abcd'
>>> sum_all( ((1, 2), (3, 4), (5, 6)), () )
(1, 2, 3, 4, 5, 6)
>>> _
</example>

However, if that hypothesis about the rationale is correct, then 'sum' should
also be restricted so as not to handle tuples, lists and so forth, but at
least the CPython implementation does handle them.

So perhaps the documentation needs to be more clear?


Cheers,

- Alf

Steven D'Aprano
Mar 25, 2010, 5:11:50 PM
On Thu, 25 Mar 2010 14:02:05 +0100, Alf P. Steinbach wrote:

> * Neil Cerutti:
>> On 2010-03-25, Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au>
>> wrote:
>>>> You might not want to be so glib. The sum doc sure doesn't sound
>>>> like it should work on lists.
>>>>
>>>> Returns the sum of a sequence of numbers (NOT strings) plus the
>>>> value of parameter 'start' (which defaults to 0).
>>> What part of that suggested to you that sum might not be polymorphic?
>>> Sure, it says numbers (which should be changed, in my opinion), but it
>>> doesn't specify what sort of numbers -- ints, floats, or custom types
>>> that have an __add__ method.
>>
>> WTF.
>
> I think Steven's argument is that it would be pointless for 'sum' to
> distinguish between user-defined numerical types and other types that
> happen to support '+'.

Before Python2.6, which introduced a numeric tower, Python *couldn't*
reliably distinguish between numeric types and other types that
overloaded +. Since Python discourages type-checking in favour of duck-
typing and try...except, this is seen as a good thing.
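The numeric tower Steven mentions lives in the `numbers` module, so from 2.6 onward a reliable check is at least possible (not that sum performs one):

```python
import numbers
from fractions import Fraction

# Types registered in the numeric tower answer to numbers.Number...
assert isinstance(3, numbers.Number)
assert isinstance(1.5, numbers.Number)
assert isinstance(Fraction(1, 3), numbers.Number)
# ...while objects that merely overload + do not.
assert not isinstance([1, 2], numbers.Number)
assert not isinstance('abc', numbers.Number)
```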

My argument is that sum isn't hard-coded to only work on the built-ins
ints or floats, but it supports any object that you can use the +
operator on. The *sole* exceptions are str and unicode (not even
UserString), and even there it is very simple to overcome the restriction:

>>> sum(['a', 'b'], '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]
>>> class S:
...     def __add__(self, other):
...         return other
...
>>> sum(['a', 'b'], S())
'ab'


[...]


> However, given that it isn't restricted to numbers, the restriction wrt.
> strings is a bit perplexing in the context of modern CPython. But for
> Python implementations that don't offer the '+=' optimization it might
> help to avoid gross inefficiencies, namely quadratic time string
> concatenation.

I agree -- the Python philosophy is to allow the user to shoot themselves
in the foot if they wish to. You're responsible for the Big Oh behaviour
of your code, not the compiler.


[...]


> However, if that hypothesis about the rationale is correct, then 'sum'
> should also be restricted to not handle tuples or lists, so forth, but
> at least the CPython implementation does.

The reasoning is that naive users are far, far more likely to try summing
a large list of strings than to try summing a large list of lists, and
therefore in practical terms the consequences of allowing sum on lists is
slight enough and rare enough to not be worth the check.

I suspect that this is just an after the fact rationalisation, and that
the real reason is that those responsible for the hand-holding in sum
merely forgot, or didn't know, that repeated addition of lists and tuples
is also O(N**2). But I've never cared enough to dig through the archives
to find out.

--
Steven

Steve Howell
Mar 26, 2010, 10:31:17 AM
On Mar 24, 4:19 pm, Paul Rubin <no.em...@nospam.invalid> wrote:

I agree on the practical matter that itertools.chain and other
solutions are usually the way to go for most tasks that involve
iterating through several lists.

From a purely academic standpoint, I'm not convinced that sum() is
inefficient in terms of big-O complexity, though.

showell@showell-laptop:~$ python
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
>>> class StupidList:
...     def __init__(self, lst):
...         print 'creating', lst
...         self.lst = lst
...     def __add__(self, other):
...         self.lst += '|'
...         self.lst.extend(other.lst)
...         return self
...
>>> result = sum([StupidList([1, 2]), StupidList([3, 4]),
...     StupidList([5, 6])], StupidList([0]))
creating [1, 2]
creating [3, 4]
creating [5, 6]
creating [0]
>>> result.lst
[0, '|', 1, 2, '|', 3, 4, '|', 5, 6]

If I'm interpreting the above program correctly, then sum() is doing
the most efficient thing under the hood--it appears to do the
equivalent of += without creating unnecessary objects for intermediate
sums.

I think the special-case error message might be a case where
practicality simply beats out purity. It would be nice if sum() were
completely duck-typed-let-you-shoot-yourself-in-foot-if-you-know-what-
you-are-doing, but maybe this was such a pitfall at one time, that
extra safeguards were put into sum(). I wonder how severely sum(),
without the restriction, would underperform join() on modern versions
of Python, though.

>>> sum('1', '2')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]

Note that you can easily fake out sum() to get duck typing.

>>> class EmptyStringStarter:
...     def __add__(self, other): return other
...
>>> empty = EmptyStringStarter()
>>> sum(['hello ', 'world'], empty)
'hello world'

Steve Howell
Mar 26, 2010, 10:56:48 AM

Looking at the code answers my own questions:

http://svn.python.org/view/python/trunk/Python/bltinmodule.c?view=markup

Look for builtin_sum().

First you see the guard against strings:

/* reject string values for 'start' parameter */
if (PyObject_TypeCheck(result, &PyBaseString_Type)) {
    PyErr_SetString(PyExc_TypeError,
        "sum() can't sum strings [use ''.join(seq) instead]");
    Py_DECREF(iter);
    return NULL;
}
Py_INCREF(result);


Also, Paul's suspicion that sum() works in O(N squared) for lists is
confirmed by this comment:

/* It's tempting to use PyNumber_InPlaceAdd instead of
   PyNumber_Add here, to avoid quadratic running time
   when doing 'sum(list_of_lists, [])'.  However, this
   would produce a change in behaviour: a snippet like

       empty = []
       sum([[x] for x in range(10)], empty)

   would change the value of empty. */

It's interesting, though, that you can construct classes pretty
easily, as I did above, that avoid the quadratic behavior, as long as
you do not mind mutating the start value, which I think is usually
fine, since the original start value usually is not useful afterward
anyway.
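The behaviour change that the CPython comment warns about is easy to demonstrate by hand: `+=` on a list mutates the left operand, while `+` does not. A sketch (the function name `sum_inplace` is made up, and this is not the actual builtin's code):

```python
def sum_inplace(seqs, start):
    # Hypothetical variant using += throughout: faster for lists,
    # but it mutates the caller's start value.
    acc = start
    for s in seqs:
        acc += s
    return acc

empty = []
result = sum_inplace([[x] for x in range(3)], empty)
print(result)  # [0, 1, 2]
print(empty)   # [0, 1, 2] -- the start list was mutated

empty2 = []
sum([[x] for x in range(3)], empty2)  # the builtin uses +, so...
print(empty2)  # []        -- ...the start list is left untouched
```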


Steven D'Aprano
Mar 27, 2010, 6:18:35 AM
On Fri, 26 Mar 2010 07:31:17 -0700, Steve Howell wrote:

> From a purely academic standpoint, I'm not convinced that sum() is
> inefficient in terms of big-O complexity, though.
>
> showell@showell-laptop:~$ python
> Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) [GCC 4.3.3] on
> linux2
> >>> class StupidList:

[...]

But it's not *sum* that is inefficient, it is sum *with a particular data
structure*.

Sure, if you create your own data structure, you can make it as efficient
as you like. Obviously the sum algorithm itself has to perform one
addition per item, or O(N), which scales tolerably well. But each
addition has a cost. If the cost is constant, then sum() as a whole
remains O(N). But if the cost of addition varies with N, sum() degrades
badly.

We can compare the performance of sum with different data structures,
starting with plain integers versus long integers:

>>> from timeit import Timer
>>> setup = 'data = [%d]*%d'
>>> for i in range(6):
...     t1 = Timer('sum(data, 0)', setup % (1, 10**i))
...     t2 = Timer('sum(data, 0)', setup % (10**50, 10**i))
...     print min(t1.repeat(number=1000)),
...     print min(t2.repeat(number=1000))
...
...
0.00179290771484 0.00263810157776
0.00340414047241 0.00854396820068
0.0190401077271 0.0502791404724
0.155302047729 0.645124912262
0.794432878494 2.55748295784
7.97877693176 25.3812758923

Both scale about as well, but the cost varies significantly: arithmetic
on very large longints is expensive.

Strings, with a trick to fool sum into accepting them, versus lists. Note
that I changed the number of iterations from 6 down to 5. The reason why
will become obvious:

>>> class EmptyStringStarter:
...     def __add__(self, other):
...         return other
...
>>> empty = EmptyStringStarter()
>>> setup = """from __main__ import empty; data = [%r]*%d"""
>>>
>>> for i in range(5):
...     t1 = Timer('sum(data, empty)', setup % ('a', 10**i))
...     t2 = Timer('sum(data, [])', setup % ([1], 10**i))
...     print min(t1.repeat(number=1000)),
...     print min(t2.repeat(number=1000))
...
0.00849103927612 0.00226998329163
0.0121459960938 0.0082700252533
0.0489149093628 0.186735153198
0.428920030594 5.28623914719
14.6552250385 589.102822065


Strings perform tolerably well, up to a point, but lists perform
terribly. And in fact, the relatively good performance of strings is an
artifact of recent versions of CPython. In Jython and IronPython, and
older versions of CPython, it will behave as poorly as lists.


> I wonder how severely sum(), without
> the restriction, would underperform join() on modern versions of Python,
> though.


Take note that, in order to get an answer in reasonable time, I've
reduced the number of timing iterations drastically:

>>> for i in range(6):
...     t1 = Timer('sum(data, empty)', setup % ('a', 10**i))
...     t2 = Timer('"".join(data)', setup % ('a', 10**i))
...     print min(t1.repeat(number=10)),
...     print min(t2.repeat(number=10))
...
8.89301300049e-05 1.09672546387e-05
0.000131845474243 2.19345092773e-05
0.000591993331909 9.29832458496e-05
0.0101289749146 0.00082802772522
0.365957021713 0.00884819030762
24.2072279453 0.0421011447906

Note the performance degradation of sum. It gets worse. Much worse:

>>> for i in range(4, 7):
...     t1 = Timer('sum(data, empty)', setup % ('a', 10**i))
...     t2 = Timer('"".join(data)', setup % ('a', 10**i))
...     print min(t1.repeat(number=1)),  # note fewer iterations
...     print min(t2.repeat(number=1))
...
0.031229019165 0.000817060470581
2.45445990562 0.00365781784058
1024.79705095 0.0398509502411

This is absolutely catastrophic performance degradation.

--
Steven

Steve Howell
Mar 28, 2010, 10:16:10 AM
On Mar 27, 3:18 am, Steven D'Aprano <st...@REMOVE-THIS-

cybersource.com.au> wrote:
> On Fri, 26 Mar 2010 07:31:17 -0700, Steve Howell wrote:
> > From a purely academic standpoint, I'm not convinced that sum() is
> > inefficient in terms of big-O complexity, though.
>
> >  showell@showell-laptop:~$ python
> >  Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) [GCC 4.3.3] on
> >  linux2
> >  >>> class StupidList:
>
> [...]
>
> But it's not *sum* that is inefficient, it is sum *with a particular data
> structure*.
>

Yep, the implied context was with particular data structures.

> Sure, if you create your own data structure, you can make it as efficient
> as you like. Obviously the sum algorithm itself has to perform one
> addition per item, or O(N), which scales tolerably well. But each
> addition has a cost. If the cost is constant, then sum() as a whole
> remains O(N). But if the cost of addition varies with N, sum() degrades
> badly.
>

The surprising part of sum() is not that the outer loop to do the sums
is O(N). It is hard to imagine any other implementation (without
parallelizing it).

The mildly surprising part of sum() is that it does add vs. add-in-
place, which leads to O(N) vs. O(1) for the inner loop calls, for
certain data structures, notably lists, even though none of the
intermediate results get used by the caller. For lists, you could
make a more efficient variant of sum() that clones the start value and
does add-in-place.
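A sketch of the variant described here, cloning the start value and then extending in place (the name `sum_lists` and its signature are made up for illustration):

```python
def sum_lists(sublists, start=()):
    # Clone the start value so the caller's object is never mutated,
    # then do amortized O(1) appends instead of O(n) copies per step.
    acc = list(start)
    for sub in sublists:
        acc.extend(sub)
    return acc

start = [0]
print(sum_lists([[1, 2], [3, 4]], start))  # [0, 1, 2, 3, 4]
print(start)                               # [0] -- caller's list unchanged
```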

I could guess pretty confidently that the reason this optimization was
never tried is that sum() has always been intended to be used on
numerics, since other alternatives exist for strings (join), lists
(chain), and hand-coded data classes that support add-in-place (roll-
your-own loop).

The documentation is pretty clear on the intention that sum() is
intended for numbers:

http://docs.python.org/library/functions.html#sum

Except for strings, the docs are not explicit about efficiency
concerns for other data structures, or the fact that the reference
implementation does add vs. add-in-place under the hood.



Steven D'Aprano
Mar 28, 2010, 10:57:08 AM
On Sun, 28 Mar 2010 07:16:10 -0700, Steve Howell wrote:

> The mildly surprising part of sum() is that is does add vs. add-in-
> place, which leads to O(N) vs. O(1) for the inner loop calls, for
> certain data structures, notably lists, even though none of the
> intermediate results get used by the caller. For lists, you could make
> a more efficient variant of sum() that clones the start value and does
> add-in-place.

I have no doubt that you could make a version of sum for lists which is
more efficient than the existing one. After all, join more or less does
the same sort of thing, and it's very efficient. But don't think that add-
in-place is necessarily cheap. List appends are amortized O(1) each; if
you are adding M lists of N items each, that gives you O(M*N).

It's possible to improve the performance a tad if you can make multiple
appends in roughly constant time, which is what list.extend (probably?)
does, but only up to a point. Lists are over-allocated, but if you try to
add more items than there is room for, you need to make the list bigger,
and that means reallocating memory, which could easily be O(N**2) or
worse, depending on how good your OS's memory management is. Under Linux,
at least by default, malloc will never fail, but there's no guarantee how
long it will take to return. If the OOM killer has to start shutting down
other applications, and paging more and more memory to disk, eventually
malloc will return (or the system will core dump), but it could take a
while...

--
Steven

Duncan Booth
Mar 28, 2010, 11:17:46 AM
Steve Howell <show...@yahoo.com> wrote:

> The mildly surprising part of sum() is that is does add vs. add-in-
> place, which leads to O(N) vs. O(1) for the inner loop calls, for
> certain data structures, notably lists, even though none of the
> intermediate results get used by the caller. For lists, you could
> make a more efficient variant of sum() that clones the start value and
> does add-in-place.

Doing add-in-place isn't the only way to make sum more efficient: if you
assume that addition is associative (which of course the builtin sum can't)
then you can form partial sums. e.g. instead of calculating:

(((((((a + b) + c) + d) + e) + f) + g) + h)

you calculate:

(((a + b) + (c + d)) + ((e + f) + (g + h)))

Obviously this requires more space than the naive sum, but not as much as
you might at first expect: you only need to hold log(n) intermediates
values at any time.

> I could guess pretty confidently that the reason this optimization was
> never tried is that sum() has always been intended to be used on
> numerics, since other alternatives exist for strings (join), lists
> (chain), and hand-coded data classes that support add-in-place (roll-
> your-own loop).

Doing it this way helps summing lists or strings (though not as much as
str.join), but it also helps if you need to sum a long list of similarly
sized floats as you'll get a more accurate answer.

See
http://groups.google.com/group/comp.lang.python/browse_thread/thread/9eda29faf92f532e/027cef7d4429aa3a
for an earlier discussion of this, or just Google comp.lang.python for
'pairwise sum'.

Here's the code I posted in that thread:

def sumpairs(seq):
    tmp = []
    for i, v in enumerate(seq):
        if i & 1:
            tmp[-1] = tmp[-1] + v
            i = i + 1
            n = i & -i
            while n > 2:
                t = tmp.pop(-1)
                tmp[-1] = tmp[-1] + t
                n >>= 1
        else:
            tmp.append(v)
    while len(tmp) > 1:
        t = tmp.pop(-1)
        tmp[-1] = tmp[-1] + t
    return tmp[0]

and I claimed that my Python coded sumpairs function was faster than the
builtin sum on a list of lists once you had more than about 210 items.
I never did get round to rewriting it in C for a more realistic speed
comparison: summing integers my Python version is about 60 times slower
than the builtin.
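As a quick sanity check (not part of the original post), the pairwise version agrees with the builtin on both numbers and lists; the definition is repeated here so the example is self-contained:

```python
def sumpairs(seq):
    # Pairwise summation: combine adjacent partial sums so at most
    # log(n) intermediates are alive at any time.
    tmp = []
    for i, v in enumerate(seq):
        if i & 1:
            tmp[-1] = tmp[-1] + v
            i = i + 1
            n = i & -i
            while n > 2:
                t = tmp.pop(-1)
                tmp[-1] = tmp[-1] + t
                n >>= 1
        else:
            tmp.append(v)
    while len(tmp) > 1:
        t = tmp.pop(-1)
        tmp[-1] = tmp[-1] + t
    return tmp[0]

print(sumpairs(list(range(1, 101))))   # 5050, same as sum(range(1, 101))
print(sumpairs([[1], [2], [3], [4]]))  # [1, 2, 3, 4]
```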

Steve Howell
Mar 28, 2010, 11:55:05 AM
On Mar 28, 8:17 am, Duncan Booth <duncan.bo...@invalid.invalid> wrote:

> Steve Howell <showel...@yahoo.com> wrote:
> > The mildly surprising part of sum() is that is does add vs. add-in-
> > place, which leads to O(N) vs. O(1) for the inner loop calls, for
> > certain data structures, notably lists, even though none of the
> > intermediate results get used by the caller.  For lists, you could
> > make a more efficient variant of sum() that clones the start value and
> > does add-in-place.
>
> Doing add-in-place isn't the only way to make sum more efficient: if you
> assume that addition is associative (which of course the builtin sum can't)
> then you can form partial sums. e.g. instead of calculating:
>
>   (((((((a + b) + c) + d) + e) + f) + g) + h)
>
> you calculate:
>
>   (((a + b) + (c + d)) + ((e + f) + (g + h)))
>
> Obviously this requires more space than the naive sum, but not as much as
> you might at first expect: you only need to hold log(n) intermediates
> values at any time.
>

Yep, I did mention in my post that the outer loop does not *have* to
be O(N), if you can parallelize it. Apart from reducing
intermediates, the divide-and-conquer method does not reduce overall
computation time unless you have multiple processors, correct?

> > I could guess pretty confidently that the reason this optimization was
> > never tried is that sum() has always been intended to be used on
> > numerics, since other alternatives exist for strings (join), lists
> > (chain), and hand-coded data classes that support add-in-place (roll-
> > your-own loop).
>
> Doing it this way helps summing lists or strings (though not as much as
> str.join), but it also helps if you need to sum a long list of similarly
> sized floats as you'll get a more accurate answer.
>

Interesting! That makes sense. The docs for math.fsum() suggest that
partial sums are used to maintain precision.

http://docs.python.org/library/math.html#math.fsum
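The precision point is easy to see with the classic 0.1 example (this is standard IEEE-754 double behaviour in CPython; the exact digits printed may vary by platform):

```python
import math

vals = [0.1] * 10
# Naive left-to-right addition accumulates rounding error...
print(sum(vals))        # 0.9999999999999999 on typical IEEE-754 doubles
# ...while fsum tracks exact partial sums and loses no precision.
print(math.fsum(vals))  # 1.0
```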

> See http://groups.google.com/group/comp.lang.python/browse_thread/thread/...

Steve Howell
Mar 28, 2010, 12:21:59 PM
On Mar 28, 7:57 am, Steven D'Aprano <st...@REMOVE-THIS-

cybersource.com.au> wrote:
> On Sun, 28 Mar 2010 07:16:10 -0700, Steve Howell wrote:
> > The mildly surprising part of sum() is that is does add vs. add-in-
> > place, which leads to O(N) vs. O(1) for the inner loop calls, for
> > certain data structures, notably lists, even though none of the
> > intermediate results get used by the caller.  For lists, you could make
> > a more efficient variant of sum() that clones the start value and does
> > add-in-place.
>
> I have no doubt that you could make a version of sum for lists which is
> more efficient than the existing one. After all, join more or less does
> the same sort of thing, and it's very efficient. But don't think that add-
> in-place is necessarily cheap. List appends are amortized O(1) each; if
> you are adding M lists of N items each, that gives you O(M*N).
>

O(M*N) is still cheaper than O(M*N*N).

> It's possible to improve the performance a tad if you can make multiple
> appends in roughly constant time, which is what list.extend (probably?)
> does, but only up to a point. Lists are over-allocated, but if you try to
> add more items than there is room for, you need to make the list bigger,
> and that means reallocating memory, which could easily be O(N**2) or
> worse, depending on how good your OS's memory management is. Under Linux,
> at least by default, malloc will never fail, but there's no guarantee how
> long it will take to return. If the OOM killer has to start shutting down
> other applications, and paging more and more memory to disk, eventually
> malloc will return (or the system will core dump), but it could take a
> while...
>

Even though extend() obviously has to do memory allocations along the
way itself, it is still more efficient than the alternative. No need
to speculate, you can measure these methods on your platform:

M = 10
N = 1000

def in_place(
    start = [],
    sublists = ([[None] * M]) * N
    ):
    accum = start[:]
    for sublist in sublists:
        accum.extend(sublist)
    return accum

def with_intermediates(
    start = [],
    sublists = ([[None] * M]) * N
    ):
    accum = start
    for sublist in sublists:
        accum = accum + sublist
    return accum

Patrick Maupin
Mar 28, 2010, 12:46:07 PM
On Mar 28, 10:17 am, Duncan Booth <duncan.bo...@invalid.invalid>
wrote:

> Doing add-in-place isn't the only way to make sum more efficient: if you
> assume that addition is associative (which of course the builtin sum can't)
> then you can form partial sums. e.g. instead of calculating:
...

>
> Doing it this way helps summing lists or strings (though not as much as
> str.join), but it also helps if you need to sum a long list of similarly
> sized floats as you'll get a more accurate answer.

Also, partial sums would be a clear winner over add-in-place if
someone were dumb^H^H^H^Hnaive enough to use sum() on a long list of
tuples :-)

Alf P. Steinbach
Mar 28, 2010, 12:56:26 PM
* Steve Howell:

> On Mar 28, 8:17 am, Duncan Booth <duncan.bo...@invalid.invalid> wrote:
>> Steve Howell <showel...@yahoo.com> wrote:
>>> The mildly surprising part of sum() is that is does add vs. add-in-
>>> place, which leads to O(N) vs. O(1) for the inner loop calls, for
>>> certain data structures, notably lists, even though none of the
>>> intermediate results get used by the caller. For lists, you could
>>> make a more efficient variant of sum() that clones the start value and
>>> does add-in-place.
>> Doing add-in-place isn't the only way to make sum more efficient: if you
>> assume that addition is associative (which of course the builtin sum can't)
>> then you can form partial sums. e.g. instead of calculating:
>>
>> (((((((a + b) + c) + d) + e) + f) + g) + h)
>>
>> you calculate:
>>
>> (((a + b) + (c + d)) + ((e + f) + (g + h)))
>>
>> Obviously this requires more space than the naive sum, but not as much as
>> you might at first expect: you only need to hold log(n) intermediates
>> values at any time.
>>
>
> Yep, I did mention in my post that the outer loop does not *have* to
> be O(N), if you can parallelize it. Apart from reducing
> intermediates, the divide-and-conquer method does not reduce overall
> computation time unless you have multiple processors, correct?
>

With a single thread of execution it can reduce the time for large integers and
custom numerical types, with n the number of final bits, from O(n^2) to O(n log n).

I don't think the parallelism here is very relevant for Python.

From a more practical point of view, the sum efficiency could be improved by
doing the first addition using '+' and the rest using '+=', without changing the
behavior.


Cheers & hth.,

- Alf

Steve Howell
Mar 28, 2010, 1:21:08 PM
On Mar 28, 9:56 am, "Alf P. Steinbach" <al...@start.no> wrote:
> * Steve Howell:
>> (talking about summing a list of lists)

>
>  From a more practical point of view, the sum efficiency could be improved by
> doing the first addition using '+' and the rest using '+=', without changing the
> behavior.

I like that approach, because you don't have to assume that the start
object is clonable, although perhaps you need to make that assumption
anyway for the case where the list is empty.

Here is code that might shed light on Alf's suggested approach:

import timeit
M = 10
N = 8000

def in_place(
    start = [],
    sublists = ([[None] * M]) * N
    ):
    # only macro-optimized
    i = 0
    for sublist in sublists:
        if i == 0:
            accum = start + sublist
            i += 1
        else:
            accum.extend(sublist)
    if i == 0:
        return 'whatever' # semantics here?
    return accum

def with_intermediates(
    start = [],
    sublists = ([[None] * M]) * N
    ):
    accum = start
    for sublist in sublists:
        accum = accum + sublist
    return accum

t1 = timeit.timeit('in_place()', 'from __main__ import in_place',
                   number=10)
t2 = timeit.timeit('with_intermediates()',
                   'from __main__ import with_intermediates', number=10)
print M, N, t1, t2, int(t2 / t1)

For N=1000, you get a significant speedup (x84):

10 1000 0.00102496147156 0.0867249965668 84

More data:

10 1000 0.00102496147156 0.0867249965668 84
10 2000 0.00211381912231 0.172616004944 81
10 4000 0.00491786003113 0.561800956726 114
10 8000 0.0706541538239 19.9019489288 281
10 16000 0.0161759853363 9.64171791077 596
10 32000 0.0351569652557 43.98346591 1251

Patrick Maupin
Mar 28, 2010, 1:27:38 PM
On Mar 28, 11:56 am, "Alf P. Steinbach" <al...@start.no> wrote:
>  From a more practical point of view, the sum efficiency could be improved by
> doing the first addition using '+' and the rest using '+=', without changing the
> behavior.

Mod parent up!!!! :-)

Steve Howell

Mar 28, 2010, 1:34:13 PM
On Mar 28, 10:21 am, Steve Howell <showel...@yahoo.com> wrote:
>     import timeit
>     M = 10
>     N = 8000
>
>     def in_place(
>         start = [],
>         sublists = ([[None] * M]) * N
>         ):
>         # only macro-optimized
>         i = 0
>         for sublist in sublists:
>             if i == 0:
>                accum = start + sublist
>                i += 1
>             else:
>                 accum.extend(sublist)

FYI I later obtained similar results with the more general:
accum += sublist

Patrick Maupin

Mar 28, 2010, 2:16:25 PM
On Mar 28, 12:34 pm, Steve Howell <showel...@yahoo.com> wrote:
> FYI I later obtained similar results with the more general:
>                   accum += sublist

Yeah, but you still have to create an object of the correct type for
accum. And for summing small lists, that will actually increase the
runtime. (Worst case, a list of a single item: you have to create the
accum object based on the type of the start value, then do two += into
it, for the start value and the first item in the list, vs just doing
a single + which automatically creates an object.)

Personally, I think the most general approach, if sum were open to
enhancements, would be to automatically apply Alf's suggestion of "+"
on summing the first item to the start value, and "+=" on subsequent
items. Also, you *could* add a parameter to sum(), for example,
default "partialsums=False" that would allow the use of partial sums
in those cases where that might give better behavior (such as floats
and tuples).

Steve Howell

Mar 28, 2010, 2:50:09 PM
On Mar 28, 11:16 am, Patrick Maupin <pmau...@gmail.com> wrote:
> On Mar 28, 12:34 pm, Steve Howell <showel...@yahoo.com> wrote:
>
> > FYI I later obtained similar results with the more general:
> >                   accum += sublist
>
> Yeah, but you still have to create an object of the correct type for
> accum.  

I think you overlooked the surrounding code.

Here is the code again:

def in_place(
    start = [],
    sublists = ([[None] * M]) * N
    ):
    # only macro-optimized
    i = 0
    for sublist in sublists:
        if i == 0:
            accum = start + sublist
            i += 1
        else:
            accum += sublist
    if i == 0:
        return 'whatever' # semantics here?
    return accum

No need to infer the type.

> And for summing small lists, that will actually increase the
> runtime.  (Worst case, a list of a single item: you have to create the
> accum object based on the type of the start value, then do two += into
> it, for the start value and the first item in the list, vs just doing
> a single + which automatically creates an object.)
>

For small lists, the performance of any sane implementation would
probably be so fast as to be negligible, unless you are in a really
tight loop. If you are in a tight loop, then your use case probably
eventually sums large lists. Even if it doesn't, the slowdown is just
a constant.

For M=5, I get these results on my machine:

M N t1 t2 (t2/t1)
5 1 3.50475311279e-05 3.2901763916e-05 0.938775510204
5 2 2.00271606445e-05 1.59740447998e-05 0.797619047619
5 4 6.79492950439e-05 6.31809234619e-05 0.929824561404
5 8 0.000124931335449 0.000126123428345 1.00954198473
5 64 0.000530958175659 0.00226187705994 4.25999101931
5 1024 0.00262904167175 0.27246594429 103.636981953

t1 = time using '+' only for the first element ('+=' for the rest)
t2 = time using '+' for all elements

Very small penalty for small lists, very large gain for large lists.

> Personally, I think the most general approach, if sum were open to
> enhancements, would be to automatically apply Alf's suggestion of "+"
> on summing the first item to the start value, and "+=" on subsequent
> items.  

See above. That's the approach I would use as well.

Paul Rubin

Mar 28, 2010, 4:49:32 PM
Steve Howell <show...@yahoo.com> writes:
> The documentation is pretty clear on the intention that sum() is
> intended for numbers: ...

I've never been big on the idea of duck-typing addition. Who would have
thought that (1,2,3)+(4,5,6) was something other than the vector sum?

I think itertools.chain more directly expresses what the OP was trying
to do, and is preferable for readability, independently of these
performance questions.

Steven D'Aprano

Mar 28, 2010, 9:34:54 PM
On Sun, 28 Mar 2010 13:49:32 -0700, Paul Rubin wrote:

> Steve Howell <show...@yahoo.com> writes:
>> The documentation is pretty clear on the intention that sum() is
>> intended for numbers: ...
>
> I've never been big on the idea of duck-typing addition. Who would have
> thought that (1,2,3)+(4,5,6) was something other than the vector sum?

But your arguments are tuples, not vectors.

There are languages which do treat arithmetic operations on vectors as
vector operations, as do (e.g.) H-P scientific calculators. That's a fine
design choice, and it works for languages where the emphasis is on
scientific calculations. But Python is a more generalist language, so in
my mind it is more appropriate to treat lists as generic lists, and not
numeric vectors:

[1, 2, 3] + [4, "a"] => [1, 2, 3, 4, "a"]

just as

"123" + "4a" => "1234a"

--
Steven

Steven D'Aprano

Mar 28, 2010, 10:45:32 PM
On Sun, 28 Mar 2010 18:56:26 +0200, Alf P. Steinbach wrote:

> From a more practical point of view, the sum efficiency could be
> improved by doing the first addition using '+' and the rest using '+=',
> without changing the behavior.

But that would change the behaviour. The __iadd__ method is not the same
as the __add__ method, and you can't guarantee that they will behave the
same -- or even similarly.
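[Editor's note] The difference is easy to see with plain lists (a minimal illustration):

```python
a = [1, 2]
alias = a
a += [3]              # list.__iadd__ mutates in place...
assert alias == [1, 2, 3]   # ...so the alias sees the change

b = [1, 2]
other = b
b = b + [3]           # list.__add__ builds a new list and rebinds b...
assert other == [1, 2]      # ...so the old object is untouched

# '+=' is also more permissive than '+': it accepts any iterable.
c = [1]
c += (2, 3)           # works, like c.extend((2, 3))
try:
    c + (4, 5)        # TypeError: can only concatenate list to list
except TypeError:
    pass
```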

And what about tuples? And subclasses of list/tuples? How many different
types need to be optimized?

In practical terms, does anyone actually ever use sum on more than a
handful of lists? I don't believe this is more than a hypothetical
problem.

The primary use case for sum is adding numbers when floating point
accuracy is not critical. If you need float accuracy, use math.fsum. If
you need to add more than a handful of small lists, don't use sum: just
calling extend in a loop is probably fast enough, or use itertools.chain.
Trying to turn sum into an all-singing all-dancing function optimised for
an ever-increasing number of objects is a fool's errand. The
implementation of sum in C is already complex enough: 95 lines, excluding
comments, blanks and braces, for something which is a lightweight
function with very simple semantics.

But if anyone wants to submit a patch to the bug tracker, go right ahead.
Without a patch though, I'd say that Python-Dev will consider this a non-issue.

--
Steven

Alf P. Steinbach

Mar 29, 2010, 3:34:07 AM
* Steven D'Aprano:

> On Sun, 28 Mar 2010 18:56:26 +0200, Alf P. Steinbach wrote:
>
>> From a more practical point of view, the sum efficiency could be
>> improved by doing the first addition using '+' and the rest using '+=',
>> without changing the behavior.
>
> But that would change the behaviour. The __iadd__ method is not the same
> as the __add__ method, and you can't guarantee that they will behave the
> same -- or even similarly.

Hm, I don't think it's documented (except if the reference implementation serves
as documentation) which one is currently used.


> And what about tuples? And subclasses of list/tuples? How many different
> types need to be optimized?

Point. One would need to check for availability of '+='.


> In practical terms, does anyone actually ever use sum on more than a
> handful of lists? I don't believe this is more than a hypothetical
> problem.

Agreed.


Cheers,

- Alf

Patrick Maupin

Mar 29, 2010, 10:40:54 AM
On Mar 28, 9:45 pm, Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> wrote:
> And what about tuples? And subclasses of list/tuples? How many different
> types need to be optimized?

One of the beautiful things about Python is that, for most things,
there are few surprises for even new users. "There should be one
obvious way to do it" for the user means that, sometimes, under the
hood, there are a lot of special cases for the implementers.

> In practical terms, does anyone actually ever use sum on more than a
> handful of lists? I don't believe this is more than a hypothetical
> problem.

Right now, it's probably not, because when somebody sums a large list
and gets thwacked on the head by the lack of efficiency, they then
come here and get thwacked because "everybody knows" they should use
itertools or something else; not sum().


>
> The primary use case for sum is adding numbers when floating point
> accuracy is not critical. If you need float accuracy, use math.fsum.

See, I think the very existence of math.fsum() already violates "there
should be one obvious way to do it."

> But if anyone wants to submit a patch to the bug tracker, go right ahead.
> Without a patch though, I'd say that Python-Dev will consider this a non-
> issue.

Agreed. Wish I had the time to do this sort of cleanup.

Regards,
Pat

Steve Howell

Mar 29, 2010, 11:12:00 AM
On Mar 29, 7:40 am, Patrick Maupin <pmau...@gmail.com> wrote:
> On Mar 28, 9:45 pm, Steven D'Aprano
>
> <ste...@REMOVE.THIS.cybersource.com.au> wrote:
> > And what about tuples? And subclasses of list/tuples? How many different
> > types need to be optimized?
>
> One of the beautiful things about Python is that, for most things,
> there are few surprises for even new users.  "There should be one
> obvious way to do it" for the user means that, sometimes, under the
> hood, there are a lot of special cases for the implementers.
>

If nothing else, I think it's reasonable for users to expect symmetry.

If you can use "+" to concatenate lists, then it seems reasonable
that something spelled "sum" would concatenate lists as well, and in
reasonable time.

> > In practical terms, does anyone actually ever use sum on more than a
> > handful of lists? I don't believe this is more than a hypothetical
> > problem.
>
> Right now, it's probably not, because when somebody sums a large list
> and gets thwacked on the head by the lack of efficiency, they then
> come here and get thwacked because "everybody knows" they should use
> itertools or something else; not sum().
>

Indeed. It would be nice if the docs for sum() at least pointed to
list(itertools.chain(*list_of_lists)), or whatever the most kosher
alternative is supposed to be.

It only takes a handful of sublists, about ten on my box, to expose
the limitation of the Shlemiel-the-Painter O(M*N*N) algorithm that's
under the hood. It only takes 200 sublists to start getting a 10x
degradation in performance.
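[Editor's note] The linear-time alternative being alluded to can be spelled like this (a sketch; `flatten` is a hypothetical helper, not a stdlib function):

```python
import itertools

def flatten(list_of_lists):
    # One pass over all elements, no intermediate lists: O(total size)
    return list(itertools.chain.from_iterable(list_of_lists))

sublists = [[i, i + 1] for i in range(0, 400, 2)]
# Same result as the quadratic sum(sublists, []), without the blowup:
assert flatten(sublists) == sum(sublists, [])
```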

> > The primary use case for sum is adding numbers when floating point
> > accuracy is not critical. If you need float accuracy, use math.fsum.
>
> See, I think the very existence of math.fsum() already violates "there
> should be one obvious way to do it."
>

The nice thing about math.fsum() is that it is at least documented
from sum(), although I suspect some users try sum() without even
consulting the docs.

You could appease all users with an API where the most obvious choice,
sum(), never behaves badly, and where users can still call more
specialized versions (math.fsum() and friends) directly if they know
what they are doing. This goes back to the statement that Patrick
makes--under the hood, this means more special cases for implementers,
but fewer pitfalls for users.

Steven D'Aprano

Mar 29, 2010, 7:19:28 PM
On Mon, 29 Mar 2010 07:40:54 -0700, Patrick Maupin wrote:

> On Mar 28, 9:45 pm, Steven D'Aprano
> <ste...@REMOVE.THIS.cybersource.com.au> wrote:
>> And what about tuples? And subclasses of list/tuples? How many
>> different types need to be optimized?
>
> One of the beautiful things about Python is that, for most things, there
> are few surprises for even new users. "There should be one obvious way
> to do it" for the user means that, sometimes, under the hood, there are
> a lot of special cases for the implementers.

It never ceases to amaze me how often people simply don't understand this.

"There should be one obvious way to do it" is the opposite of "NO obvious
way", not of "many ways which may or may not be obvious". The complete
quote from the Zen makes that clear:

There should be one-- and preferably ONLY one --obvious way to do it.
[Emphasis added]

And don't forget the next line:

Although that way may not be obvious at first unless you're Dutch.

Python is under no compulsion to make "the obvious way" obvious to anyone
except Guido. It's a bonus if it happens to be obvious to newbies, not a
requirement.

And besides, what is "it" you're talking about?

* Adding integers, decimals or fractions, or floats with a low
requirement for precision and accuracy? Use sum.

* Adding floats with a high requirement for precision and accuracy?
Use math.fsum.

* Concatenating strings? Use ''.join.

* Joining lists? Use [].extend.

* Iterating over an arbitrary sequence of arbitrary sequences?
Use itertools.chain.

That's five different "its", and five obvious ways to do them.
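[Editor's note] The five idioms side by side, as a quick runnable sketch:

```python
import math
import itertools

assert sum([1, 2, 3]) == 6                      # numbers
assert math.fsum([0.1] * 10) == 1.0             # floats, high accuracy
assert ''.join(['ab', 'cd']) == 'abcd'          # strings

merged = []
for sub in [[1, 2], [3, 4]]:                    # joining lists
    merged.extend(sub)
assert merged == [1, 2, 3, 4]

# iterating over a sequence of sequences, lazily
assert list(itertools.chain([1, 2], (3, 4))) == [1, 2, 3, 4]
```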


>> In practical terms, does anyone actually ever use sum on more than a
>> handful of lists? I don't believe this is more than a hypothetical
>> problem.
>
> Right now, it's probably not, because when somebody sums a large list
> and gets thwacked on the head by the lack of efficiency, they then come
> here and get thwacked because "everybody knows" they should use
> itertools or something else; not sum().

Exactly. If you're adding a large number of large lists, you're already
doing it wrong. Making sum more efficient is just a band aid.


>> The primary use case for sum is adding numbers when floating point
>> accuracy is not critical. If you need float accuracy, use math.fsum.
>
> See, I think the very existence of math.fsum() already violates "there
> should be one obvious way to do it."

How does the existence of math.fsum contradict the existence of sum?


--
Steven

Steven D'Aprano

Mar 29, 2010, 7:38:53 PM
On Mon, 29 Mar 2010 08:12:00 -0700, Steve Howell wrote:

> On Mar 29, 7:40 am, Patrick Maupin <pmau...@gmail.com> wrote:
>> On Mar 28, 9:45 pm, Steven D'Aprano
>>
>> <ste...@REMOVE.THIS.cybersource.com.au> wrote:
>> > And what about tuples? And subclasses of list/tuples? How many
>> > different types need to be optimized?
>>
>> One of the beautiful things about Python is that, for most things,
>> there are few surprises for even new users.  "There should be one
>> obvious way to do it" for the user means that, sometimes, under the
>> hood, there are a lot of special cases for the implementers.
>>
>>
> If nothing else, I think it's reasonable for users to expect symmetry.

Why? What is symmetry in programming?

Since the + operator takes both numbers and lists, and the - operator
doesn't, does "symmetry" require that we make up some list operation so
that integers and lists are symmetrical?


> > If you can use "+" to concatenate lists, then it seems reasonable that
> something spelled "sum" would concatenate lists as well, and in
> reasonable time.

Where do you get the "reasonable time" from? A *single* + on lists can be
slooooow. Try this:

L = [None]*10**9
result = L+L

(assuming you've even got enough memory to do so)

With a very few exceptions (e.g. dict lookup being "usually" O(1), list
append being amortized O(1)), Python makes no promises about performance.
It's not part of the language. If you, the programmer, are making any
assumptions about performance that aren't clearly and explicitly
documented in the official docs, then YOU are at fault, not Python.


>> > In practical terms, does anyone actually ever use sum on more than a
>> > handful of lists? I don't believe this is more than a hypothetical
>> > problem.
>>
>> Right now, it's probably not, because when somebody sums a large list
>> and gets thwacked on the head by the lack of efficiency, they then come
>> here and get thwacked because "everybody knows" they should use
>> itertools or something else; not sum().
>>
>>
> Indeed. It would be nice if the docs for sum() at least pointed to
> list(itertools.chain(list_of_lists)), or whatever the most kosher
> alternative is supposed to be.
>
> It only takes a handful of sublists, about ten on my box, to expose the
> limitation of the Shlemiel-the-Painter O(M*N*N) algorithm that's under
> the hood. It only takes 200 sublists to start getting a 10x degradation
> in performance.

And how many times have you needed to add 200 sublists?

How many times have you needed to add 10 sublists?

Nobody has demonstrated that this is anything more than a hypothetical
problem.


>> > The primary use case for sum is adding numbers when floating point
>> > accuracy is not critical. If you need float accuracy, use math.fsum.
>>
>> See, I think the very existence of math.fsum() already violates "there
>> should be one obvious way to do it."
>>
>>
> The nice thing about math.fsum() is that it is at least documented from
> sum(), although I suspect some users try sum() without even consulting
> the docs.
>
> You could appease all users with an API where the most obvious choice,
> sum(), never behaves badly, and where users can still call more
> specialized versions (math.fsum() and friends) directly if they know
> what they are doing. This goes back to the statement that Patrick
> makes--under the hood, this means more special cases for implementers,
> but fewer pitfalls for users.

More special cases for implementers means more bugs in the language,
which means I end up writing my own code and ignoring the built-in
version anyway.

More special cases means I have to pay the cost of high accuracy float
summation even when I don't need, or want, it.

More special cases means I'm fooled into paying the cost of summing lists
when I don't need to, because it's easier than importing itertools:

for item in sum(lots_of_lists):
    pass

needlessly creates a large list out of the smaller ones. Even if I don't
fall for the temptation, and write bad code, I still pay the cost in the
libraries and applications written by others.


More special cases isn't free. It's MORE costly than teaching users to
use list.extend or itertools.chain.


--
Steven

Paul Rubin

Mar 29, 2010, 7:36:17 PM
Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> writes:
> * Iterating over an arbitrary sequence of arbitrary sequences?
> Use itertools.chain.

chain is only for a finite number of sequences. For an arbitrary
(possibly unbounded) iterable of sequences, use chain.from_iterable.
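[Editor's note] A sketch of the distinction: `chain(*seqs)` must unpack every argument before the call, while `chain.from_iterable` pulls inner sequences on demand, so it works even when the outer iterable is unbounded:

```python
import itertools

def endless_rows():
    # an endless stream of two-element lists: [0, 1], [2, 3], ...
    n = 0
    while True:
        yield [n, n + 1]
        n += 2

# from_iterable never tries to exhaust the outer generator:
first_six = list(itertools.islice(
    itertools.chain.from_iterable(endless_rows()), 6))
assert first_six == [0, 1, 2, 3, 4, 5]

# itertools.chain(*endless_rows()) would never return: the '*'
# unpacking has to run the generator to completion first.
```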

Steve Howell

Mar 29, 2010, 10:05:30 PM
On Mar 29, 4:38 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
> On Mon, 29 Mar 2010 08:12:00 -0700, Steve Howell wrote:
> > On Mar 29, 7:40 am, Patrick Maupin <pmau...@gmail.com> wrote:
> >> On Mar 28, 9:45 pm, Steven D'Aprano
>
> >> <ste...@REMOVE.THIS.cybersource.com.au> wrote:
> >> > And what about tuples? And subclasses of list/tuples? How many
> >> > different types need to be optimized?
>
> >> One of the beautiful things about Python is that, for most things,
> >> there are few surprises for even new users.  "There should be one
> >> obvious way to do it" for the user means that, sometimes, under the
> >> hood, there are a lot of special cases for the implementers.
>
> > If nothing else, I think it's reasonable for users to expect symmetry.
>
> Why? What is symmetry in programming?

"Symmetry" is best shown by example.

>>> 3 - 2
1
>>> set([1,2,3]) - set([2,3])
set([1])

>>> 5 * 3
15
>>> [1, 2, 3, 4, 5] * 3
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

>>> 1 + 2 + 3 + 4
10
>>> sum([1,2,3,4])
10

>>> [1,2] + [3,4]
[1, 2, 3, 4]
>>> sum([[1,2], [3,4]], [])
[1, 2, 3, 4]

> Since the + operator takes both numbers and lists, and the - operator
> doesn't, does "symmetry" require that we make up some list operation so
> that integers and lists are symmetrical?
>

No. Nobody is advocating for list subtraction.

> > If you can use "+" to concatentate lists, then it seems reasonable that
> > something spelled "sum" would concatenate lists as well, and in
> > reasonable time.
>
> Where do you get the "reasonable time" from?
>

From common sense. Concatenation of lists should be in O(M*N) time,
not O(M*N*N), because there is no need to build intermediate lists.

>
> With a very few exceptions (e.g. dict lookup being "usually" O(1), list
> append being amortized O(1)), Python makes no promises about performance.
> It's not part of the language. If you, the programmer, are making any
> assumptions about performance that aren't clearly and explicitly
> documented in the official docs, then YOU are at fault, not Python.

I'm not making any assumptions here. I know that sum() is needlessly
slow for lists, even though it is not explicitly documented.

>
> > It only takes a handful of sublists, about ten on my box, to expose the
> > limitation of the Shlemiel-the-Painter O(M*N*N) algorithm that's under
> > the hood.  It only takes 200 sublists to start getting a 10x degradation
> > in performance.
>
> And how many times have you needed to add 200 sublists?
>
> How many times have you needed to add 10 sublists?
>

Aggregating lists is a very common task in programming. An example of
a list of lists would be a list of departments, where each department
is a list of employees. If you want to sort the employees, you will
want to aggregate to a list.

> More special cases for implementers means more bugs in the language,
> which means I end up writing my own code and ignoring the built-in
> version anyway.
>

Special cases do not have to introduce bugs in the language.

> More special cases means I have to pay the cost of high accuracy float
> summation even when I don't need, or want, it.
>

Nothing about making sum() work for the general cases precludes more
specific, optimized functions.

> More special cases means I'm fooled into paying the cost of summing lists
> when I don't need to, because it's easier than importing itertools:

You don't have to be fooled.

> for item in sum(lots_of_lists):
>     pass
>
> needlessly creates a large list out of the smaller ones.

Although this is a mostly harmless example, developers with common
sense would not sum lots of lists unless they expected to keep the
resulting list around for multiple operations, or for one operation,
like sort(), where you need to create a list for the subsequent
operation.

> Even if I don't
> fall for the temptation, and write bad code, I still pay the cost in the
> libraries and applications written by others.

You already pay the small price for polymorphism when you use Python.

> More special cases isn't free.

Nobody said they were.

> It's MORE costly than teaching users to
> use list.extend or itertools.chain.
>

Which the docs for sum() don't do.

Patrick Maupin

Mar 29, 2010, 10:24:42 PM
On Mar 29, 6:19 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
> How does the existence of math.fsum contradict the existence of sum?

You're exceptionally good at (probably deliberately) mis-interpreting
what people write.

Regards,
Pat

Patrick Maupin

Mar 29, 2010, 10:31:44 PM
On Mar 29, 6:38 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
> With a very few exceptions (e.g. dict lookup being "usually" O(1), list
> append being amortized O(1)), Python makes no promises about performance.
> It's not part of the language. If you, the programmer, are making any
> assumptions about performance that aren't clearly and explicitly
> documented in the official docs, then YOU are at fault, not Python.

It's not about promises, guarantees, quid-pro-quo, etc.

It's about a lack of surprises. Which, 99% of the time, Python excels
at. This is why many of us program in Python. This is why some of us
who would never use sum() on lists, EVEN IF IT WERE FIXED TO NOT BE SO
OBNOXIOUSLY SLOW, advocate that it, in fact, be fixed to not be so
obnoxiously slow.

BTW, it's also not about "fault". There is no shame in writing a
Python program, seeing that it doesn't go fast enough, and then
hammering on it until it does. There is also no shame in not reading
the docs before you write the program, although arguably (and you
obviously work very hard to help see to this) a great deal of shame
attaches to posting to the newsgroup before reading the docs.

Regards,
Pat

Steve Howell

Mar 29, 2010, 10:53:04 PM
On Mar 29, 4:19 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:

Let's go through them...

> * Adding integers, decimals or fractions, or floats with a low
> requirement for precision and accuracy? Use sum.
>

Pretty obvious.

>>> sum([1, 2, 3])
6

> * Adding floats with a high requirement for precision and accuracy?
> Use math.fsum.
>

Mostly obvious.

>>> fsum([1.234534665989, 2.987, 3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'fsum' is not defined
>>> import math
>>> math.fsum([1.234534665989, 2.987, 3])
7.2215346659890001


> * Concatenating strings? Use ''.join.


Common pitfall:

>>> ['abc', 'def', 'ghi'].join()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'join'
>>> ''.join(['abc', 'def', 'ghi'])
'abcdefghi'


> * Joining lists? Use [].extend.

Obvious, yes. Convenient? Not really.

>>> start = []
>>> for list in [[1, 2], [3, 4]]:
...     start.extend(list)
...
>>> start
[1, 2, 3, 4]

> * Iterating over an arbitrary sequence of arbitrary sequences?
> Use itertools.chain.

>>> group1 = ['al', 'bob']
>>> group2 = ['cal']
>>> groups = [group1, group2]

Obvious if you are Dutch...

>>> itertools.chain(groups)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'itertools' is not defined
>>> import itertools
>>> itertools.chain(groups)
<itertools.chain object at 0x9a9658c>
>>> len(itertools.chain(groups))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'itertools.chain' has no len()
>>> len(list(itertools.chain(groups)))
2
>>> len(list(itertools.chain(*groups)))
3

So you have sum, fsum, join, extend, and chain.

Sum is builtin, but you have to import fsum from math and chain from
itertools.

Join is actually a method on strings, not sequences.

Chain can take multiple lists, or you can use the splat operator on a
list of lists.

Extend is actually a method on lists, and it only takes one list, not
multiple ones.

>>> [].extend(*groups)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: extend() takes exactly one argument (2 given)

Just commit all that to memory, and enjoy the productivity of using a
high level language! ;)

Steven D'Aprano

Mar 29, 2010, 11:01:03 PM
On Mon, 29 Mar 2010 19:05:30 -0700, Steve Howell wrote:

>> > If nothing else, I think it's reasonable for users to expect
>> > symmetry.
>>
>> Why? What is symmetry in programming?
>
> "Symmetry" is best shown by example.
>
> >>> 3 - 2
> 1
> >>> set([1,2,3]) - set([2,3])
> set([1])


That's a useless example.


>>> 42 - 10
32
>>> set([42]) - set([10])
set([42])


You don't define symmetry. You don't even give a sensible example of
symmetry. Consequently I reject your argument that because sum is the
obvious way to sum a lot of integers, "symmetry" implies that it should
be the obvious way to concatenate a lot of lists.


>>> 1+2
3
>>> [1] + [2]
[1, 2]
>>> set([1]) + set([2])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'set' and 'set'


>> > If you can use "+" to concatenate lists, then it seems reasonable
>> > that something spelled "sum" would concatenate lists as well, and in
>> > reasonable time.
>>
>> Where do you get the "reasonable time" from?
>>
>>
> From common sense. Concatenation of lists should be in O(M*N) time, not
> O(M*N*N), because there is no need to build intermediate lists.

You are correct that building intermediate lists isn't *compulsory*,
there are alternatives, but the alternatives themselves have costs.
Complexity itself is a cost. sum currently has nice simple semantics,
which means you can reason about it: sum(sequence, start) is the same as

total = start
for item in sequence:
    total = total + item
return total
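[Editor's note] Made runnable (hypothetical name `sum_reference`), that reference loop agrees with the built-in:

```python
def sum_reference(sequence, start=0):
    # the simple semantics described above, spelled out
    total = start
    for item in sequence:
        total = total + item
    return total

assert sum_reference([1, 2, 3], 10) == sum([1, 2, 3], 10) == 16
assert sum_reference([[1], [2]], []) == [1, 2]
```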

You don't have to care what the items in sequence are, you don't have to
make assumptions about what methods sequence and start have (beyond
supporting iteration and addition). Adding special cases to sum means it
becomes more complex and harder to reason about. If you pass some other
sequence type in the middle of a bunch of lists, what will happen? Will
sum suddenly break, or perhaps continue to work but inefficiently?

You still need to ask these questions with existing sum, but it is
comparatively easy to answer them: you only need to consider how the
alternative behaves when added to a list. You don't have to think about
the technicalities of the sum algorithm itself -- sometimes it calls +,
sometimes extend, sometimes +=, sometimes something else... which of the
various different optimized branches will I fall into this time? Who
knows? sum already has two branches. In my opinion, three branches is one
too many.


[...]


>> > It only takes a handful of sublists, about ten on my box, to expose
>> > the limitation of the Shlemeil-the-Painter O(M*N*N) algorithm that's
>> > under the hood.  It only takes 200 sublists to start getting a 10x
>> > degradation in performance.
>>
>> And how many times have you needed to add 200 sublists?
>>
>> How many times have you needed to add 10 sublists?
>>
>>
> Aggregating lists is a very common task in programming.

"Aggregating" lists? Not summing them? I think you've just undercut your
argument that sum is the "obvious" way of concatenating lists.

In natural language, we don't talk about "summing" lists, we talk about
joining, concatenating or aggregating them. You have just done it
yourself, and made my point for me. And this very thread started because
somebody wanted to know what the equivalent to sum for sequences.

If sum was the obvious way to concatenate sequences, this thread wouldn't
even exist.


> An example of a
> list of lists would be a list of departments, where each department is a
> list of employees. If you want to sort the employees, you will want to
> aggregate to a list.

I grant you that 10 departments is likely to be a borderline case, but if
you have 200 departments, then you will most likely be querying a
database and getting back a single list of employees. And if you're not,
you probably should be.


>> More special cases for implementers means more bugs in the language,
>> which means I end up writing my own code and ignoring the built-in
>> version anyway.
>>
>>
> Special cases do not have to introduce bugs in the language.

They don't *have* to, but since even Donald Knuth has written code with
bugs in it, it is a safe bet that the number of bugs in code written by
mere mortals will be roughly proportional to the complexity: more complex
means more bugs. You can go to the bank with that.


>> More special cases means I have to pay the cost of high accuracy float
>> summation even when I don't need, or want, it.
>>
>>
> Nothing about making sum() work for the general cases precludes more
> specific, optimized functions.

You have missed my point. If I call your version of sum which all the
special cases, I pay the overhead of the complexity regardless of whether
I need it or not. The existence of other functions which duplicate the
special cases in sum is irrelevant.

[...]


>> It's MORE costly than teaching users to use list.extend or
>> itertools.chain.
>>
>>
> Which the docs for sum() don't do.

Patches are always welcome.


--
Steven

Steven D'Aprano

Mar 29, 2010, 11:02:25 PM

Correction noted. Obviously I'm not Dutch.


--
Steven

Steven D'Aprano

Mar 29, 2010, 11:29:11 PM

I cannot read your mind, I can only interpret the words you choose to
write. You said

[quote]


See, I think the very existence of math.fsum() already violates "there
should be one obvious way to do it."

[end quote]


If sum satisfies the existence of one obvious way, how does math.fsum
violate it? sum exists, and is obvious, regardless of whatever other
solutions exist as well.

--
Steven

Patrick Maupin

Mar 29, 2010, 11:33:14 PM
On Mar 29, 10:29 pm, Steven D'Aprano

Because sum() is the obvious way to sum floats; now the existence of
math.fsum() means there are TWO obvious ways to sum floats. Is that
really that hard to understand? How can you misconstrue this so badly
that you write something that can be (easily) interpreted to mean that
you think that I think that once math.fsum() exists, sum() doesn't
even exist any more????

Steve Howell

Mar 29, 2010, 11:34:17 PM
On Mar 29, 8:01 pm, Steven D'Aprano

<ste...@REMOVE.THIS.cybersource.com.au> wrote:
> You don't define symmetry. You don't even give a sensible example of
> symmetry. Consequently I reject your argument that because sum is the
> obvious way to sum a lot of integers, "symmetry" implies that it should
> be the obvious way to concatenate a lot of lists.
>

You are not rejecting my argument; you are rejecting an improper
paraphrase of my argument.

My argument was that repeated use of "+" is spelled "sum" for
integers, so it's natural to expect the same name for repeated use of
"+" on lists. Python already allows for this symmetry, just SLOWLY.

>
> You are correct that building intermediate lists isn't *compulsory*,
> there are alternatives, but the alternatives themselves have costs.
> Complexity itself is a cost. sum currently has nice simple semantics,
> which means you can reason about it: sum(sequence, start) is the same as
>
> total = start
> for item in sequence:
>     total = total + item
> return total
>

I could just as reasonably expect these semantics:

total = start
for item in sequence:
    total += item
return total

Python does not contradict my expectations here:

>>> start = []
>>> x = sum([], start)
>>> x.append(1)
>>> start
[1]

> You don't have to care what the items in sequence are, you don't have to
> make assumptions about what methods sequence and start have (beyond
> supporting iteration and addition).

The only additional assumption I'm making is that Python can take
advantage of in-place addition, which is easy to introspect.

> Adding special cases to sum means it
> becomes more complex and harder to reason about. If you pass some other
> sequence type in the middle of a bunch of lists, what will happen? Will
> sum suddenly break, or perhaps continue to work but inefficiently?

This is mostly a red herring, as I would tend to use sum() on
sequences of homogeneous types.

Python already gives me the power to shoot myself in the foot for
strings.

>>> list = [1, 2]
>>> list += "foo"
>>> list
[1, 2, 'f', 'o', 'o']

>>> lst = [1,2]
>>> lst.extend('foo')
>>> lst
[1, 2, 'f', 'o', 'o']

I'd prefer to get an exception for cases where += would do the same.

>>> start = []
>>> bogus_example = [[1, 2], None, [3]]
>>> for item in bogus_example: start += item
...


Traceback (most recent call last):
File "<stdin>", line 1, in <module>

TypeError: 'NoneType' object is not iterable

> You still need to ask these questions with existing sum, but it is
> comparatively easy to answer them: you only need to consider how the
> alternative behaves when added to a list. You don't have to think about
> the technicalities of the sum algorithm itself -- sometimes it calls +,
> sometimes extend, sometimes +=, sometimes something else

I would expect sum() to support the same contract as +=, which already
works for numerics (so no backward incompatibility), and which already
works for lists. For custom-designed classes, I would rely on the
promise that augmented assignment falls back to normal methods.
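A minimal sketch of the += semantics being proposed here (sum_inplace is a hypothetical name for illustration, not an existing built-in):

```python
def sum_inplace(sequence, start):
    # total += item uses __iadd__ where the type defines it (lists
    # extend in place, giving linear behaviour overall) and falls
    # back to __add__ for immutable types such as int and tuple.
    total = start
    for item in sequence:
        total += item
    return total

print(sum_inplace([[1, 2], [3]], []))  # [1, 2, 3]
print(sum_inplace([1, 2, 3], 0))       # 6
```

Note that, unlike the built-in, this mutates a mutable start argument, which is exactly the behavioural difference under discussion.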

> ... which of the
> various different optimized branches will I fall into this time? Who
> knows? sum already has two branches. In my opinion, three branches is one
> too many.

As long as it falls into the branch that works, I'm happy. :)

>
> "Aggregating" lists? Not summing them? I think you've just undercut your
> argument that sum is the "obvious" way of concatenating lists.
>
> In natural language, we don't talk about "summing" lists, we talk about
> joining, concatenating or aggregating them. You have just done it
> yourself, and made my point for me.

Nor do you use "chain" or "extend."

> And this very thread started because
> somebody wanted to know what the equivalent of sum is for sequences.
>
> If sum was the obvious way to concatenate sequences, this thread wouldn't
> even exist.

This thread is entitled "sum for sequences." I think you just made my
point.

Steven D'Aprano

Mar 29, 2010, 11:49:46 PM
On Mon, 29 Mar 2010 19:31:44 -0700, Patrick Maupin wrote:

> It's about a lack of surprises. Which, 99% of the time, Python excels
> at. This is why many of us program in Python. This is why some of us
> who would never use sum() on lists, EVEN IF IT WERE FIXED TO NOT BE SO
> OBNOXIOUSLY SLOW, advocate that it, in fact, be fixed to not be so
> obnoxiously slow.

As I said, patches are welcome. Personally, I expect that it would be
rejected, but that's not my decision to make, and who knows, perhaps I'm
wrong and you'll have some of the Python-Dev people support your idea.

sum is not designed to work with lists. It happens to work because lists
happen to use + for concatenation, and because it is too much trouble for
too little benefit to explicitly exclude lists in the same way sum
explicitly excludes strings. In the Python philosophy, simplicity of
implementation is a virtue: the code that is not there contributes
exactly no bugs and has precisely no overhead.
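The special-casing described above is visible at the prompt: sum refuses strings by name but happily adds lists.

```python
# Lists are accepted (with an explicit list start value)...
print(sum([[1, 2], [5, 6]], []))  # [1, 2, 5, 6]

# ...while strings are explicitly refused, with a pointer to ''.join.
try:
    sum(['ab', 'cd'], '')
except TypeError as err:
    print(err)
```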

sum has existed as a Python built-in for many years -- by memory, since
Python 2.2, which was nearly nine years ago. Unlike the serious gotcha of
repeated string concatenation:


# DO NOT DO THIS
result = ""
for s in items:
    result += s


which *does* cause real problems in real code, I don't believe that there
have been any significant problems caused by summing lists of lists. As
problems go, it is such a minor one that it isn't worth this discussion,
let alone fixing it. But if anyone disagrees, this is open source, go
ahead and fix it. You don't need my permission.
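For reference, the linear-time alternative to the quoted anti-pattern is str.join:

```python
items = ['red', 'green', 'blue']

# join builds the result once, in linear time
print(''.join(items))    # redgreenblue

# a separator string works the same way
print(', '.join(items))  # red, green, blue
```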

--
Steven

Steven D'Aprano

Mar 30, 2010, 2:41:51 AM
On Mon, 29 Mar 2010 19:53:04 -0700, Steve Howell wrote:

> On Mar 29, 4:19 pm, Steven D'Aprano <st...@REMOVE-THIS-
> cybersource.com.au> wrote:

[...]


>> Python is under no compulsion to make "the obvious way" obvious to
>> anyone except Guido. It's a bonus if it happens to be obvious to
>> newbies, not a requirement.
>>
>> And besides, what is "it" you're talking about?
>>
>> * Adding integers, decimals or fractions, or floats with a low
>>   requirement for precision and accuracy? Use sum.
>>
>> * Adding floats with a high requirement for precision and accuracy?
>>   Use math.fsum.
>>
>> * Concatenating strings? Use ''.join.
>>
>> * Joining lists? Use [].extend.
>>
>> * Iterating over an arbitrary sequence of arbitrary sequences?
>>   Use itertools.chain.
>>
>> That's five different "its", and five obvious ways to do them.
>>
>>
> Let's go through them...

"Obvious" doesn't mean you don't have to learn the tools you use. It
doesn't mean that there's no need to think about the requirements of your
problem. It doesn't even mean that the way to do it has to be a built-in
or pre-built solution in the standard library, or that somebody with no
Python experience could intuit the correct function to use based on
nothing more than a good grasp of English.

It certainly doesn't mean that users shouldn't be expected to know how to
import a module:


> >>> fsum([1.234534665989, 2.987, 3])
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> NameError: name 'fsum' is not defined

I called it math.fsum every time I referred to it. Did I need to specify
that you have to import the math module first?


>> * Concatenating strings? Use ''.join.
>
>
> Common pitfall:
>
> >>> ['abc', 'def', 'ghi'].join()
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> AttributeError: 'list' object has no attribute 'join'

Is it really common?

I've been hanging around this newsgroup for many years now, and I don't
believe I've ever seen anyone confused by this. I've seen plenty of
newbies use repeated string concatenation, but never anyone trying to do
a string join and getting it wrong. If you have any links to threads
showing such confusion, I'd be grateful to see them.


>> * Joining lists? Use [].extend.
>
> Obvious, yes. Convenient? Not really.
>
> >>> start = []
> >>> for list in [[1, 2], [3, 4]]:
> ... start.extend(list)
> ...
> >>> start
> [1, 2, 3, 4]


Why isn't that convenient? It is an obvious algorithm written in three
short lines. If you need a one-liner, write a function and call it:

concatenate_lists(sequence_of_lists)
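One possible three-line body for that function (concatenate_lists is the ad-hoc name used above, not a standard library function):

```python
def concatenate_lists(sequence_of_lists):
    result = []
    for lst in sequence_of_lists:
        result.extend(lst)  # in-place extend, no intermediate lists
    return result

print(concatenate_lists([[1, 2], [3, 4]]))  # [1, 2, 3, 4]
```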



>> * Iterating over an arbitrary sequence of arbitrary sequences?
>> Use itertools.chain.
>
> >>> group1 = ['al', 'bob']
> >>> group2 = ['cal']
> >>> groups = [group1, group2]
>
> Obvious if you are Dutch...

Or are familiar with the itertools module and the Pythonic practice of
iterating over lazy sequences. Iterators and itertools are fundamental to
the Pythonic way of doing things.

> >>> itertools.chain(groups)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> NameError: name 'itertools' is not defined

That's the second time you've either mistakenly neglected to import a
module, or deliberately not imported it to make the rhetorical point that
you have to import a module before using it. Yes, you *do* have to import
modules before using them. What's your point? Not everything has to be a
built-in.
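With the import in place, both spellings of chain work; note that chain(groups) without unpacking, the mistake quoted above aside, would yield the sublists themselves rather than their elements.

```python
import itertools

group1 = ['al', 'bob']
group2 = ['cal']
groups = [group1, group2]

# chain(*groups) unpacks the sublists into separate arguments;
# chain.from_iterable(groups) does the same lazily, without unpacking.
print(list(itertools.chain(*groups)))               # ['al', 'bob', 'cal']
print(list(itertools.chain.from_iterable(groups)))  # ['al', 'bob', 'cal']
```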

[...]


> Sum is builtin, but you have to import fsum from math and chain from
> itertools.
>
> Join is actually a method on strings, not sequences.

Is that supposed to be an argument against them?

[...]


> Just commit all that to memory, and enjoy the productivity of using a
> high level language! ;)

If you don't know your tools, you will spend your life hammering screws
in with the butt of your saw. It will work, for some definition of work.
Giving saws heavier, stronger handles to make it faster to hammer screws
is not what I consider good design.

--
Steven

Paul Rubin

Mar 30, 2010, 2:55:55 AM
Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> writes:
>>>> ...
>>> ...
>> ...
> "Obvious" doesn't mean you don't have to learn the tools you use....

Geez you guys, get a room ;-). You're all good programmers with too
much experience for this arguing over stuff this silly.

Steven D'Aprano

Mar 30, 2010, 6:10:55 AM

Yes Mum ;)


You're right of course. I spend too much time being this guy:

http://xkcd.com/386/

and not enough this one:

http://xkcd.com/167/


--
Steven

Steve Holden

Mar 30, 2010, 7:33:37 AM3/30/10
to Steven D'Aprano, pytho...@python.org
Yeah, it can happen to all of us. I've certainly been "that guy" too.
But Steven, surely now it's time to *show us the squirrels*

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
See PyCon Talks from Atlanta 2010 http://pycon.blip.tv/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS: http://holdenweb.eventbrite.com/

Mel

Mar 30, 2010, 9:53:52 AM
Patrick Maupin wrote:

> Because sum() is the obvious way to sum floats; now the existence of
> math.fsum() means there are TWO obvious ways to sum floats. Is that
> really that hard to understand? How can you misconstrue this so badly
> that you write something that can be (easily) interpreted to mean that
> you think that I think that once math.fsum() exists, sum() doesn't
> even exist any more????

floats are nasty -- as evidence the recent thread on comparing floats for
equality. People use floats when they have to. fsum exists because of
this:

mwilson@tecumseth:~$ python
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from math import fsum
>>> a=(1.0e200, 156.0, -1.0e200)
>>> sum(a)
0.0
>>> fsum(a)
156.0


You could generalize sum, but after that, there's a case that even fsum
can't handle:


>>> ni=1.0e200+1.0j
>>> nj=1.0+1.0e200j
>>> ai=(ni, nj, 156.0+651.0j, -ni, -nj)
>>> sum(ai)
(-1+0j)
>>> fsum(ai)


Traceback (most recent call last):
File "<stdin>", line 1, in <module>

TypeError: can't convert complex to float; use abs(z)
>>>
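One workaround for the complex case, a sketch only (fsum_complex is not a standard function), is to apply math.fsum to the real and imaginary components separately:

```python
import math

def fsum_complex(values):
    # math.fsum tracks partial sums exactly, so summing each
    # component independently recovers the accurate answer even
    # when huge terms cancel.
    values = list(values)
    return complex(math.fsum(v.real for v in values),
                   math.fsum(v.imag for v in values))

ni = 1.0e200 + 1.0j
nj = 1.0 + 1.0e200j
print(fsum_complex([ni, nj, 156.0 + 651.0j, -ni, -nj]))  # (156+651j)
```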


Mel.

Patrick Maupin

Mar 30, 2010, 11:31:25 AM
On Mar 30, 8:53 am, Mel <mwil...@the-wire.com> wrote:
> floats are nasty -- as evidence the recent thread on comparing floats for
> equality.  People use floats when they have to.  fsum exists because of
> this:

...

I understand there are technical reasons for why math.fsum() exists.
I still think that whatever math.fsum() does should probably be a part
of sum().

Regards,
Pat

Albert van der Horst

Apr 6, 2010, 9:39:38 AM
In article <559a2ee3-fb2c-477f...@r1g2000yqj.googlegroups.com>,
Patrick Maupin <pma...@gmail.com> wrote:
>On Mar 29, 10:29 pm, Steven D'Aprano

><ste...@REMOVE.THIS.cybersource.com.au> wrote:
>> On Mon, 29 Mar 2010 19:24:42 -0700, Patrick Maupin wrote:
>> > On Mar 29, 6:19 pm, Steven D'Aprano <st...@REMOVE-THIS-

>> > cybersource.com.au> wrote:
>> >> How does the existence of math.fsum contradict the existence of sum?
>>
>> > You're exceptionally good at (probably deliberately) mis-interpreting
>> > what people write.
>>
>> I cannot read your mind, I can only interpret the words you choose to
>> write. You said
>>
>> [quote]
>> See, I think the very existence of math.fsum() already violates "there
>> should be one obvious way to do it."
>> [end quote]
>>
>> If sum satisfies the existence of one obvious way, how does math.fsum
>> violate it? sum exists, and is obvious, regardless of whatever other
>> solutions exist as well.
>
>Because sum() is the obvious way to sum floats; now the existence of
>math.fsum() means there are TWO obvious ways to sum floats. Is that
>really that hard to understand? How can you misconstrue this so badly
>that you write something that can be (easily) interpreted to mean that
>you think that I think that once math.fsum() exists, sum() doesn't
>even exist any more????

To a mathematician sum(set) suggests that the order of summation
doesn't matter. (So I wouldn't use sum for concatenating lists.)
Strictly, sum() should be used only when the + operator is both
associative and commutative.

Now for floating point numbers the order of summation is crucial;
addition is not associative: (a+b)+c <> a+(b+c).
So the obvious thing for someone versed in numerical computing to
do is to look at whether sum() gives any guarantees about order,
and whether there may be a special sum() for floating point.
(This is not very realistic, because such a person would have
skimmed the math library a long time ago, but anyway.)
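The non-associativity is easy to demonstrate: the same three terms give different rounded results depending on grouping.

```python
# Float addition is commutative but not associative.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a)       # 0.6000000000000001
print(b)       # 0.6
print(a == b)  # False
```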

Met vriendelijke groeten,
Albert van der Horst

--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- like all pyramid schemes -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst



Patrick Maupin

Apr 6, 2010, 10:31:38 AM
On Apr 6, 8:39 am, Albert van der Horst <alb...@spenarnc.xs4all.nl>
wrote:

> To a mathematician sum(set) suggest that the order of summation
> doesn't matter. (So I wouldn't use sum for concatenating lists.)
> Harshly, sum() should be used only for operator + both associative and
> commutative.

That's all well and good, but not every Python user is a
mathematician. As long as Python doesn't surprise mathematicians in a
way that is too negative (I can see the complaint now: "Hey! sum()
kept my lists ordered! I was expecting some randomization!") what is
wrong with it also not surprising the average user in a way that is
too negative?

Regards,
Pat

Neil Cerutti

Apr 6, 2010, 10:51:27 AM
On 2010-04-06, Albert van der Horst <alb...@spenarnc.xs4all.nl> wrote:
> To a mathematician sum(set) suggest that the order of summation
> doesn't matter. (So I wouldn't use sum for concatenating
> lists.) Harshly, sum() should be used only for operator + both
> associative and commutative.
>
> Now for floating point numbers the order of summation is
> crucial, not commutative (a+b)+c <> a+(b+c). So the obvious
> thing for someone versed in numerical computing do is looking
> whether sum() gives any guarantees for order and whether there
> may be a special sum() for floating point. (This is not very
> realistic, because such a person would have skimmed the math
> library a long time ago, but anyway.)

I'm convinced by this argument. I just have to be a mathematician
and a computer scientist skilled in numerical computing. No
problem! Just a *few more years* of education and I'll be ready
for summing things in Python. ;)

--
Neil Cerutti
