Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

sum and strings

10 views
Skip to first unread message

Paddy

unread,
Aug 18, 2006, 2:47:55 AM8/18/06
to
I was browsing the Voidspace blog item on "Flattening Lists", and
followed up on the use of sum to do the flattening.
A solution was:

>>> nestedList = [[1, 2], [3, 4], [5, 6]]
>>> sum(nestedList,[])
[1, 2, 3, 4, 5, 6]

I would not have thought of using sum in this way. When I did help(sum)
the docstring was very number-centric: It went further, and precluded
its use on strings:

>>> help(sum)
Help on built-in function sum in module __builtin__:

sum(...)
sum(sequence, start=0) -> value

Returns the sum of a sequence of numbers (NOT strings) plus the
value
of parameter 'start'. When the sequence is empty, returns start.

The string preclusion would not help with duck-typing (in general), so
I decided to consult the ref doc on sum:

sum( sequence[, start])

Sums start and the items of a sequence, from left to right, and returns
the total. start defaults to 0. The sequence's items are normally
numbers, and are not allowed to be strings. The fast, correct way to
concatenate sequence of strings is by calling ''.join(sequence). Note
that sum(range(n), m) is equivalent to reduce(operator.add, range(n),
m) New in version 2.3.


The above was a lot better description of sum for me, and with an
inquisitive mind, I like to think that I might have come up with using
sum to flatten nestedList :-)
But there was still that warning about using strings.

I therefore tried sum versus their reduce "equivalent" for strings:

>>> import operator
>>> reduce(operator.add, "ABCD".split(), '')
'ABCD'
>>> sum("ABCD".split(), '')
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
TypeError: sum() can't sum strings [use ''.join(seq) instead]
>>>

Well, after all the above, there is a question:

Why not make sum work for strings too?

It would remove what seems like an arbitrary restriction and aid
duck-typing. If the answer is that the sum optimisations don't work for
the string datatype, then wouldn't it be better to put a trap in the
sum code diverting strings to the reduce equivalent?

Just a thought,

- Paddy.

Message has been deleted

Fredrik Lundh

unread,
Aug 18, 2006, 7:04:55 AM8/18/06
to pytho...@python.org
Sybren Stuvel wrote:

>> Why not make sum work for strings too?
>

> Because of "there should only be one way to do it, and that way should
> be obvious".

I would have thought that "performance" and "proper use of English" was
more relevant, though.

</F>

Paul Rubin

unread,
Aug 18, 2006, 7:09:05 AM8/18/06
to
Sybren Stuvel <sybr...@YOURthirdtower.com.imagination> writes:
> Because of "there should only be one way to do it, and that way should
> be obvious". There are already the str.join and unicode.join methods,

Those are obvious???

Georg Brandl

unread,
Aug 18, 2006, 8:16:06 AM8/18/06
to

Why would you try to sum up strings? Besides, the ''.join idiom is quite
common in Python.

In this special case, ''.join is much faster than sum() which is why
sum() denies to concat strings.

Georg

Message has been deleted

bearoph...@lycos.com

unread,
Aug 18, 2006, 12:02:58 PM8/18/06
to
Paul Rubin:
> Sybren Stuvel:

> > Because of "there should only be one way to do it, and that way should
> > be obvious". There are already the str.join and unicode.join methods,
>
> Those are obvious???

They aren't fully obvious (because they are methods of the separator
string), but after reading some documentation about string methods, and
after some tests done on the Python shell, you too can probably use
then without much problems.

Bye,
bearophile

Steve Holden

unread,
Aug 18, 2006, 12:26:52 PM8/18/06
to pytho...@python.org
Using a bound method can make it a little more obvious.

>>> cat = "".join
>>> cat(['one', 'two', 'three'])
'onetwothree'
>>> cat([u'one', u'two', u'three'])
u'onetwothree'
>>>

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Paddy

unread,
Aug 18, 2006, 1:13:51 PM8/18/06
to
Sybren Stuvel wrote:
> Paddy enlightened us with:

> > Well, after all the above, there is a question:
> >
> > Why not make sum work for strings too?
>
> Because of "there should only be one way to do it, and that way should
> be obvious". There are already the str.join and unicode.join methods,
> which are more powerful than sum.
>
> Sybren
I get where you are coming from, but in this case we have a function,
sum, that is not as geeral as it could be. sum is already here, and
works for some types but not for strings which seems an arbitrary
limitation that impede duck typing.

- Pad.

P.S. I can see why, and am used to the ''.join method. A newbie
introduced to sum for integers might naturally try and concatenate
strings using sum too.

Paddy

unread,
Aug 18, 2006, 1:30:08 PM8/18/06
to

Sybren Stuvel wrote:
> Paddy enlightened us with:
> > Well, after all the above, there is a question:
> >
> > Why not make sum work for strings too?
>
> Because of "there should only be one way to do it, and that way should
> be obvious". There are already the str.join and unicode.join methods,
> which are more powerful than sum.
>
> Sybren
> --
> The problem with the world is stupidity. Not saying there should be a
> capital punishment for stupidity, but why don't we just take the
> safety labels off of everything and let the problem solve itself?
> Frank Zappa

Here is where I see the break in the 'flow':

>>> 1+2+3
6
>>> sum([1,2,3], 0)
6
>>> [1] + [2] +[3]
[1, 2, 3]
>>> sum([[1],[2],[3]], [])
[1, 2, 3]
>>> '1' + '2' + '3'
'123'
>>> sum(['1','2','3'], '')


Traceback (most recent call last):
File "<interactive input>", line 1, in ?
TypeError: sum() can't sum strings [use ''.join(seq) instead]
>>>

- Pad.

Georg Brandl

unread,
Aug 18, 2006, 3:06:51 PM8/18/06
to
Paddy wrote:
> Sybren Stuvel wrote:
>> Paddy enlightened us with:
>> > Well, after all the above, there is a question:
>> >
>> > Why not make sum work for strings too?
>>
>> Because of "there should only be one way to do it, and that way should
>> be obvious". There are already the str.join and unicode.join methods,
>> which are more powerful than sum.
>>
>> Sybren
> I get where you are coming from, but in this case we have a function,
> sum, that is not as geeral as it could be. sum is already here, and
> works for some types but not for strings which seems an arbitrary
> limitation that impede duck typing.

Only that it isn't arbitrary.

> - Pad.
>
> P.S. I can see why, and am used to the ''.join method. A newbie
> introduced to sum for integers might naturally try and concatenate
> strings using sum too.

Yes, and he's immediately told what to do instead.

Georg

Fredrik Lundh

unread,
Aug 18, 2006, 3:46:50 PM8/18/06
to pytho...@python.org
Paddy wrote:

> Here is where I see the break in the 'flow':
>
>>>> 1+2+3
> 6
>>>> sum([1,2,3], 0)
> 6
>>>> [1] + [2] +[3]
> [1, 2, 3]
>>>> sum([[1],[2],[3]], [])
> [1, 2, 3]
>>>> '1' + '2' + '3'
> '123'
>>>> sum(['1','2','3'], '')
> Traceback (most recent call last):
> File "<interactive input>", line 1, in ?
> TypeError: sum() can't sum strings [use ''.join(seq) instead]

do you often write programs that "sums" various kinds of data types, and
for which performance issues are irrelevant?

or are you just stuck in a "I have this idea" loop?

</F>

Paddy

unread,
Aug 18, 2006, 9:51:45 PM8/18/06
to

Fredrik Lundh wrote:
> Paddy wrote:
>
> > Here is where I see the break in the 'flow':
> >
> >>>> 1+2+3
> > 6
> >>>> sum([1,2,3], 0)
> > 6
> >>>> [1] + [2] +[3]
> > [1, 2, 3]
> >>>> sum([[1],[2],[3]], [])
> > [1, 2, 3]
> >>>> '1' + '2' + '3'
> > '123'
> >>>> sum(['1','2','3'], '')
> > Traceback (most recent call last):
> > File "<interactive input>", line 1, in ?
> > TypeError: sum() can't sum strings [use ''.join(seq) instead]
>

Hi Frederik,


> do you often write programs that "sums" various kinds of data types,
> and for which performance issues are irrelevant?

I was asking if sum was indeed an optimization that did not work for
strings.
I was pointing out its affects on duck-typing.
I was hoping that someone might know the history and implementation of
sum and could point out the design decisions taken at the time and/or
discuss the merits of making sum accept strings. or indeed any type
that works with operator.add and that has an additive identity value.

Pythons designers seem to know and apply the advantages of having fewer
'themes' that can be applied with less constraints I am curious about
such a constraint on sum.

- Paddy.

Paul Rubin

unread,
Aug 18, 2006, 9:58:00 PM8/18/06
to
Georg Brandl <g.brand...@gmx.net> writes:
> Why would you try to sum up strings? Besides, the ''.join idiom is quite
> common in Python.

Just because it's common doesn't mean it's obvious. In my opinion
it's as ugly as sin, and the fact that it's an idiom shows a
shortcoming in Python. The obvious reason for summing strings is that
it's a long-established feature of Python that a+b concatenates two
strings, so summing a,b,c,d,e should result in a+b+c+d+e.

Paddy

unread,
Aug 18, 2006, 9:59:05 PM8/18/06
to

Georg Brandl wrote:
> Paddy wrote:
<<SNIP>>

> > I get where you are coming from, but in this case we have a function,
> > sum, that is not as geeral as it could be. sum is already here, and
> > works for some types but not for strings which seems an arbitrary
> > limitation that impede duck typing.
>
> Only that it isn't arbitrary.
Hi Georg,
I said it *seemed* arbitrary. I doubt that it is arbitrary, and thought
someone would say why the restriction is necessary.

>
> > - Pad.
> >
> > P.S. I can see why, and am used to the ''.join method. A newbie
> > introduced to sum for integers might naturally try and concatenate
> > strings using sum too.
>
> Yes, and he's immediately told what to do instead.

Yep, thats the what. Now as to the why?

- paddy.

Paul Rubin

unread,
Aug 18, 2006, 10:00:00 PM8/18/06
to
Sybren Stuvel <sybr...@YOURthirdtower.com.imagination> writes:
> > Those are obvious???
> Yup. Just read the string documentation and you're off.

Huh? Just because some obscure weirdness like this is in the manual,
doesn't make it obvious or natural.

Paul Rubin

unread,
Aug 18, 2006, 10:08:37 PM8/18/06
to
"Paddy" <padd...@netscape.net> writes:
> Pythons designers seem to know and apply the advantages of having fewer
> 'themes' that can be applied with less constraints I am curious about
> such a constraint on sum.

The opposing argument is that sum is sort of like reduce, i.e.
sum((a,b,c,d)) could conceivably implemented as

temp = a
temp += b
temp += c
temp += d
return temp

If the args are strings, the above is a quadratic time algorithm and
it's better to throw an error than create such a trap for an unwary user.
The obvious fix is for the manual to specify that the sum function should
do the right thing and use a sensible algorithm.

Semi-relevant quotation:

"Let's take an example. Consider an abstract data type stack. It's not
enough to have Push and Pop connected with the axiom wherein you push
something onto the stack and after you pop the stack you get the same
thing back. It is of paramount importance that pushing the stack is a
constant time operation regardless of the size of the stack. If I
implement the stack so that every time I push it becomes slower and
slower, no one will want to use this stack.

We need to separate the implementation from the interface but not at
the cost of totally ignoring complexity. Complexity has to be and is a
part of the unwritten contract between the module and its user. The
reason for introducing the notion of abstract data types was to allow
interchangeable software modules. You cannot have interchangeable
modules unless these modules share similar complexity behavior. If I
replace one module with another module with the same functional
behavior but with different complexity tradeoffs, the user of this
code will be unpleasantly surprised. I could tell him anything I like
about data abstraction, and he still would not want to use the
code. Complexity assertions have to be part of the interface."

--Alex Stepanov (designer of C++ standard template library)
http://www.sgi.com/tech/stl/drdobbs-interview.html

Paddy

unread,
Aug 18, 2006, 10:37:52 PM8/18/06
to


Thanks Paul.
I also found this from Guido:
http://mail.python.org/pipermail/python-dev/2003-April/034853.html
And this, in the same thread:
http://mail.python.org/pipermail/python-dev/2003-April/034854.html

So,
The upshot is that complexity matters. The algorithm used in sum is
'very wrong' for use with strings, and their is no clean way to switch
to the preferred method for strings.( ''.join() ).


Thanks group,
- Paddy.

Paul Rubin

unread,
Aug 18, 2006, 10:47:07 PM8/18/06
to
"Paddy" <padd...@netscape.net> writes:
> I also found this from Guido:
> http://mail.python.org/pipermail/python-dev/2003-April/034853.html
> And this, in the same thread:
> http://mail.python.org/pipermail/python-dev/2003-April/034854.html
>
> So,
> The upshot is that complexity matters. The algorithm used in sum is
> 'very wrong' for use with strings, and their is no clean way to switch
> to the preferred method for strings.( ''.join() ).

The clean fix is to use cStringIO to build the result string.

Message has been deleted

Georg Brandl

unread,
Aug 19, 2006, 1:39:57 AM8/19/06
to

Which is exactly how I would concatenate five strings.

For concatenating longer sequences of strings, however, if it needs to be
done fast, performance-wise, this approach is not sensible.

Georg

Bill Pursell

unread,
Aug 19, 2006, 3:11:25 AM8/19/06
to

Georg Brandl wrote:
> Paul Rubin wrote:
> > Sybren Stuvel <sybr...@YOURthirdtower.com.imagination> writes:
> >> Because of "there should only be one way to do it, and that way should
> >> be obvious". There are already the str.join and unicode.join methods,
> >
> > Those are obvious???
>
> Why would you try to sum up strings? Besides, the ''.join idiom is quite
> common in Python.

One could extend this argument to dissallow the following:
>>> "foo" + "bar"

Message has been deleted

Georg Brandl

unread,
Aug 19, 2006, 6:27:44 PM8/19/06
to

Fortunately, hypergeneralization is not one of Python's weaknesses.

Georg

Rhamphoryncus

unread,
Aug 19, 2006, 9:42:14 PM8/19/06
to

It's worthwhile to note that the use of + as the concatenation operator
is arbitrary. It could just have well been | or &, and has no
relationship with mathematically addition. Were history different
perhaps Guido would have gone with | or & instead, and we wouldn't be
having this conversation.

It's also worth stressing (not in response to your post, but others)
that sum([[1],[2],[3]], []) is just as bad as attempting to sum
strings, both conceptually (it's not mathematical addition), and
performance-wise. Don't do it. :)

I believe the prefered method to flatten a list of lists is this:

shallow = []
for i in deep:
shallow.extend(i)

Yes, it's three lines. It's also very easy to read. reduce() and
sum() are not.

Carl Banks

unread,
Aug 19, 2006, 10:57:54 PM8/19/06
to
Rhamphoryncus wrote:
> It's also worth stressing (not in response to your post, but others)
> that sum([[1],[2],[3]], []) is just as bad as attempting to sum
> strings, both conceptually (it's not mathematical addition), and
> performance-wise. Don't do it. :)

Interesting. I would have guessed that, internally, sum was
implemented with in-place operations (except for the first operation,
which would have to be non-in-place so that it wouldn't modify a
mutable initial value). But looking at the source, I see that I would
have guessed wrongly: it does not use in-place operations. So,
although lists are optimized for growing, sum would end up creating a
new list each iteration anyways. Thus it does have the same penalty
that strings would have. Good to know.

Obviously, sum can't check for every possible type that would be
inefficient, but at least it could check for common builtins (str,
list, tuple). That it only checks for strings suggests to me that this
really is just a case of protecting the unwary. Concatenating strings
is common enough, and the drawbacks of using sum bad enough, that a
special case was considered justified. Lists and tuples, though
theoretically they have the same issues as strings, probably don't
justify a special case because they're not as common.


> I believe the prefered method to flatten a list of lists is this:
>
> shallow = []
> for i in deep:
> shallow.extend(i)
>
> Yes, it's three lines. It's also very easy to read. reduce() and
> sum() are not.

sum() is really for numerical uses; people ought to just stick to using
it that way.


Carl Banks

Paddy

unread,
Aug 20, 2006, 5:44:03 AM8/20/06
to

Rhamphoryncus wrote:

>
> It's worthwhile to note that the use of + as the concatenation operator
> is arbitrary. It could just have well been | or &, and has no
> relationship with mathematically addition.

The effect of the string concatenation operator is only secondary.
Secondary to the use of the word sum; and what could be 'reasonably'
concieved as sum's effect on non-numeric types.


>
> It's also worth stressing (not in response to your post, but others)
> that sum([[1],[2],[3]], []) is just as bad as attempting to sum
> strings, both conceptually (it's not mathematical addition)

Unfortunately, the words sum and summation are linked to other words
that are commonly used to describe what is happening to the numders,
the lists, and the strings.
Words such as accumulate, concatenate, aggregate, collect, assemble, as
well as add.

> and performance-wise. Don't do it. :)

Amen to that.

> I believe the prefered method to flatten a list of lists is this:
>
> shallow = []
> for i in deep:
> shallow.extend(i)
>
> Yes, it's three lines. It's also very easy to read. reduce() and
> sum() are not.

I'd like to squeeze in the listcomp version, not because it is one line
shorter, but because I, and maybe others prefer short listcomps such as
the folowing:

shallow = []
[shallow.extend(i) for i in deep]

-Pad.

Rhamphoryncus

unread,
Aug 20, 2006, 6:18:44 AM8/20/06
to
Paddy wrote:
> Rhamphoryncus wrote:
>
> >
> > It's worthwhile to note that the use of + as the concatenation operator
> > is arbitrary. It could just have well been | or &, and has no
> > relationship with mathematically addition.
>
> The effect of the string concatenation operator is only secondary.
> Secondary to the use of the word sum; and what could be 'reasonably'
> concieved as sum's effect on non-numeric types.
> >
> > It's also worth stressing (not in response to your post, but others)
> > that sum([[1],[2],[3]], []) is just as bad as attempting to sum
> > strings, both conceptually (it's not mathematical addition)
>
> Unfortunately, the words sum and summation are linked to other words
> that are commonly used to describe what is happening to the numders,
> the lists, and the strings.
> Words such as accumulate, concatenate, aggregate, collect, assemble, as
> well as add.

String concatenation and numeric addition only group together under the
most vague of english terms, "putting things together". String
concatenation violates many mathematical properties of
addition/summation, and simply isn't related.

It is unfortunate that many languages (including python) promote the
confusion by using + for a "put things together" operator, ie both
mathematical addition and string concatenation.


> > I believe the prefered method to flatten a list of lists is this:
> >
> > shallow = []
> > for i in deep:
> > shallow.extend(i)
> >
> > Yes, it's three lines. It's also very easy to read. reduce() and
> > sum() are not.
>
> I'd like to squeeze in the listcomp version, not because it is one line
> shorter, but because I, and maybe others prefer short listcomps such as
> the folowing:
>
> shallow = []
> [shallow.extend(i) for i in deep]

I'm sure this has been mentioned before, but listcomps are for when you
want to store the list and use it for further things, not for when you
want a side effect. TOOWTDI.

And of course, if saving a line was the reason:

Gerhard Fiedler

unread,
Aug 20, 2006, 8:49:03 AM8/20/06
to pytho...@python.org
On 2006-08-20 07:18:44, Rhamphoryncus wrote:

>> shallow = []
>> [shallow.extend(i) for i in deep]
>
> I'm sure this has been mentioned before, but listcomps are for when you
> want to store the list and use it for further things, not for when you
> want a side effect. TOOWTDI.

Can you please explain what you mean with this, and maybe why?

Thanks,
Gerhard

Marc 'BlackJack' Rintsch

unread,
Aug 20, 2006, 9:05:12 AM8/20/06
to
In <mailman.9571.1156078...@python.org>, Gerhard Fiedler
wrote:

You should not abuse list comps just to have a one liner. Only use them
if you really want to build a list and not just for side effects. The
above one-liner builds a list of `None`\s of length ``len(deep)`` for no
reason just to throw them away.

Ciao,
Marc 'BlackJack' Rintsch

Alex Martelli

unread,
Aug 20, 2006, 2:57:24 PM8/20/06
to
Rhamphoryncus <rha...@gmail.com> wrote:
...

> > > I believe the prefered method to flatten a list of lists is this:
> > >
> > > shallow = []
> > > for i in deep:
> > > shallow.extend(i)
> > >
> > > Yes, it's three lines. It's also very easy to read. reduce() and
> > > sum() are not.
> >
> > I'd like to squeeze in the listcomp version, not because it is one line
> > shorter, but because I, and maybe others prefer short listcomps such as
> > the folowing:
> >
> > shallow = []
> > [shallow.extend(i) for i in deep]
>
> I'm sure this has been mentioned before, but listcomps are for when you
> want to store the list and use it for further things, not for when you
> want a side effect. TOOWTDI.
>
> And of course, if saving a line was the reason:
>
> shallow = []
> for i in deep: shallow.extend(i)

Another alternative equivalent to

shallow = sum(deep, [])

but based on the "proper" usage of list comprehensions is

shallow = [ item for sublist in deep for item in sublist ]


In terms of performance, however, the simple loop that you (rhamph)
posted is generally best -- e.g., with Python 2.5c1 on a MacbookPro:

brain:~/downloads alex$ python -mtimeit -s'deep=[range(9)]*9'
's=sum(deep,[])'
100000 loops, best of 3: 11.2 usec per loop

brain:~/downloads alex$ python -mtimeit -s'deep=[range(9)]*9' 's=[]
> for sublist in deep: s.extend(sublist)'
100000 loops, best of 3: 6.92 usec per loop

brain:~/downloads alex$ python -mtimeit -s'deep=[range(9)]*9' 's=[]
[s.extend(sublist) for sublist in deep]'
100000 loops, best of 3: 8.48 usec per loop

brain:~/downloads alex$ python -mtimeit -s'deep=[range(9)]*9' 's=[item
for sublist in deep for item in sublist]'
100000 loops, best of 3: 17.1 usec per loop


Alex

Steven D'Aprano

unread,
Aug 21, 2006, 5:36:21 AM8/21/06
to
On Fri, 18 Aug 2006 19:08:37 -0700, Paul Rubin wrote:

> If the args are strings, the above is a quadratic time algorithm and
> it's better to throw an error than create such a trap for an unwary user.

That's a nonsense argument. There is no shortage of slow algorithms, and
some of them are built into Python. When did O(n**2) become an error
condition? And overhead matters: if I'm only doing a few concatenations,
it is significantly faster to add the strings using + than to use join()
(at least in Python 2.3 -- YMMV):

>>> s = timeit.Timer("''.join(L)", "L=['a'*50, 'b'*50, 'c'*50]")
>>> s.timeit()
1.3470098972320557
>>> t = timeit.Timer("a+b+c", "a,b,c = 'a'*50, 'b'*50, 'c'*50")
>>> t.timeit()
1.0698421001434326

There's a word for optimizations that actually result in code running 25%
slower.

I applaud that Python's language developers care about efficiency. I
applaud that the documentation warns people about traps and potential
areas of inefficiency. But I think it is a shame that sum() does a
special type-check to avoid something which is only sometimes slow. It
doesn't protect against O(n**2) performance; it merely protects against
just one of an infinite number of possible "traps for the unwary".

I would have thought it would be better for sum() to raise a warning, not
an exception. If we take seriously the argument that sum implies
addition, and that string concatenation isn't really addition, then sum()
should also refuse to operate on lists and tuples and any other
non-numeric class. Either would be better than sum() "protecting" the user
from himself, except when it doesn't.

--
Steven D'Aprano

Georg Brandl

unread,
Aug 21, 2006, 5:57:19 AM8/21/06
to

Well, present that on python-dev.

Georg

Fredrik Lundh

unread,
Aug 21, 2006, 6:41:14 AM8/21/06
to pytho...@python.org
Alex Martelli wrote:

> In terms of performance, however, the simple loop that you (rhamph)
> posted is generally best -- e.g., with Python 2.5c1 on a MacbookPro:
>
> brain:~/downloads alex$ python -mtimeit -s'deep=[range(9)]*9'
> 's=sum(deep,[])'
> 100000 loops, best of 3: 11.2 usec per loop
>
> brain:~/downloads alex$ python -mtimeit -s'deep=[range(9)]*9' 's=[]
>> for sublist in deep: s.extend(sublist)'
> 100000 loops, best of 3: 6.92 usec per loop

at least on this machine, map(s.extend) is slightly faster than the loop:

timeit -s"deep=[range(9)]*9" "s=[]" "for sublist in deep:
s.extend(sublist)"

100000 loops, best of 3: 5.59 usec per loop

timeit -s"deep=[range(9)]*9" "s=[]" "map(s.extend, deep)"
100000 loops, best of 3: 5.26 usec per loop

</F>

Fredrik Lundh

unread,
Aug 21, 2006, 6:55:14 AM8/21/06
to pytho...@python.org
Steven D'Aprano wrote:

> That's a nonsense argument. There is no shortage of slow algorithms, and
> some of them are built into Python. When did O(n**2) become an error
> condition? And overhead matters: if I'm only doing a few concatenations,
> it is significantly faster to add the strings using + than to use join()
> (at least in Python 2.3 -- YMMV):
>
>>>> s = timeit.Timer("''.join(L)", "L=['a'*50, 'b'*50, 'c'*50]")
>>>> s.timeit()
> 1.3470098972320557
>>>> t = timeit.Timer("a+b+c", "a,b,c = 'a'*50, 'b'*50, 'c'*50")
>>>> t.timeit()
> 1.0698421001434326

and what exactly does the fact that Python can do operator-based
dispatch much faster than it can do method-based dispatch have to
do with sum's inability to add strings ?

did you have some kind of "zero overhead for some function calls"
optimization in mind ?

</F>

Carl Banks

unread,
Aug 21, 2006, 12:54:41 PM8/21/06
to
Steven D'Aprano wrote:
>But I think it is a shame that sum() does a
> special type-check to avoid something which is only sometimes slow. It
> doesn't protect against O(n**2) performance; it merely protects against
> just one of an infinite number of possible "traps for the unwary".

I'm not so sure special casing was a good idea myself, but...

If you allow for a small number of special cases in the language to
protect against the most harmful and prone traps, this case would be
one of the top ones, IMHO.


Carl Banks


and, you know, if you really want to concatenate strings with sum, you
can

class noopadd(object):
def __add__(self,other):
return other

sum(["abc","def","ghi"],noopadd())

Paddy

unread,
Aug 21, 2006, 2:29:13 PM8/21/06
to

Carl Banks wrote:
>
> and, you know, if you really want to concatenate strings with sum, you
> can
>
> class noopadd(object):
> def __add__(self,other):
> return other
>
> sum(["abc","def","ghi"],noopadd())

;-)

- Pad.

Steven D'Aprano

unread,
Aug 21, 2006, 10:13:13 PM8/21/06
to
On Mon, 21 Aug 2006 12:55:14 +0200, Fredrik Lundh wrote:

> Steven D'Aprano wrote:
>
>> That's a nonsense argument. There is no shortage of slow algorithms, and
>> some of them are built into Python. When did O(n**2) become an error
>> condition? And overhead matters: if I'm only doing a few concatenations,
>> it is significantly faster to add the strings using + than to use join()
>> (at least in Python 2.3 -- YMMV):
>>
>>>>> s = timeit.Timer("''.join(L)", "L=['a'*50, 'b'*50, 'c'*50]")
>>>>> s.timeit()
>> 1.3470098972320557
>>>>> t = timeit.Timer("a+b+c", "a,b,c = 'a'*50, 'b'*50, 'c'*50")
>>>>> t.timeit()
>> 1.0698421001434326
>
> and what exactly does the fact that Python can do operator-based
> dispatch much faster than it can do method-based dispatch have to
> do with sum's inability to add strings ?

Sorry, I don't get you here. Are you saying that string addition doesn't
call the operands' __add__ methods?

Obviously I would have preferred to have done a direct test of sum()
versus join(), but I can't since sum() goes out of its way to make that
difficult.

Or maybe I can...

>>> setup0 = """class Magic:
... def __add__(self, other): return other
...
... L = ['a', 'b']
... M = Magic()
... """
>>>
>>> t5 = timeit.Timer("sum(L, M)", setup)
>>> t5.timeit()
12.319401025772095

That's bizarrely slow for concatenating two characters. It's an order of
magnitude slower than doing a direct addition of two characters, and about
one third the speed of a pure Python implementation:

>>> setup1 = """def mysum(strlist):
... t = ""
... for s in strlist: t = t+s
... return t
...
... a, b, c, d, = 'a', 'b', 'c', 'd'
... """
>>>
>>> t6 = timeit.Timer("mysum([a,b,c,d])", setup1)
>>> t6.timeit()
4.3943448066711426

What's going on here? What have I missed? I know adding strings is
O(n**2), but these results don't make any sense to me.

--
Steven D'Aprano

Fredrik Lundh

unread,
Aug 22, 2006, 2:05:13 AM8/22/06
to pytho...@python.org
Steven D'Aprano wrote:

>> and what exactly does the fact that Python can do operator-based
>> dispatch much faster than it can do method-based dispatch have to
>> do with sum's inability to add strings ?
>
> Sorry, I don't get you here. Are you saying that string addition doesn't
> call the operands' __add__ methods?

the "+" instruction (BINARY_ADD) special-cases integers and 8-bit
strings, and uses a single C call via a type slot for other types.

in contrast, "join" and "sum" needs to do a full name lookup and a
full Python-level function call. compared to the effort needed to
copy 100 bytes, that's a rather expensive set of operations.

</F>

neophyte

unread,
Aug 23, 2006, 4:34:31 PM8/23/06
to
Fredrik Lundh wrote:
> do you often write programs that "sums" various kinds of data types, and
> for which performance issues are irrelevant?
>
> or are you just stuck in a "I have this idea" loop?

Maybe he's just insisting on the principle of least surprise?
Personally, I'd rather expect sum() to work for strings and Python to
issue a warning instead of raising a type error. That warning might
also be appropriate when using sum() on other builtin sequence types.

Tim Chase

unread,
Aug 23, 2006, 5:24:25 PM8/23/06
to neophyte, pytho...@python.org
> Maybe he's just insisting on the principle of least surprise?
> Personally, I'd rather expect sum() to work for strings and Python to
> issue a warning instead of raising a type error. That warning might
> also be appropriate when using sum() on other builtin sequence types.

In its own way, it makes good sense to not be surprising:

>>> # using regular number
>>> n1 = 1
>>> n2 = 2
>>> sum([n1,n2])
3

>>> # using complex numbers
>>> c1 = 1+2j
>>> c2 = 5+11j
>>> sum([c1,c2])
(6+13j)

>>> # using quaternion-ish objects
>>> class Q(object):
... def __init__(self, n, i,j,k):
... self.n = n
... self.i = i
... self.j = j
... self.k = k
... def __add__(self, other):
... return Q(self.n+other.n,
... self.i+other.i,
... self.j+other.j,
... self.k+other.k)
...
>>> q1 = Q(1,2,3,5)
>>> q2 = Q(7,11,13,17)
>>> q3 = q1 + q2
>>> q3.n, q3.i, q3.j, q3.k
(8, 13, 16, 22)
>>> sum([q1,q2])
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'q'


Just because something is slow or sub-optimal doesn't mean it
should be an error. Otherwise calls to time.sleep() should throw
an error...

>>> time.sleep(1000)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
InefficentError: call takes more than 1 second to complete


[scratches head]

+1 regarding principle of least surprise on sum()

-tkc

Fredrik Lundh

unread,
Aug 24, 2006, 2:17:45 AM8/24/06
to pytho...@python.org
Tim Chase wrote:

> >>> q3 = q1 + q2
> >>> q3.n, q3.i, q3.j, q3.k
> (8, 13, 16, 22)
> >>> sum([q1,q2])
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> TypeError: unsupported operand type(s) for +: 'int' and 'q'
>
> Just because something is slow or sub-optimal doesn't mean it
> should be an error.

that's not an error because it would be "slow or sub-optimal" to add
custom objects, that's an error because you don't understand how "sum"
works.

(hint: sum != reduce)

</F>

Tim Chase

unread,
Aug 24, 2006, 12:12:14 PM8/24/06
to Fredrik Lundh, pytho...@python.org
>> Just because something is slow or sub-optimal doesn't mean it
>> should be an error.
>
> that's not an error because it would be "slow or sub-optimal" to add
> custom objects, that's an error because you don't understand how "sum"
> works.
>
> (hint: sum != reduce)

No, clearly sum!=reduce...no dispute there...

so we go ahead and get the sum([q1,q2]) working by specifying a
starting value sum([q1,q2], Q()):

>>> class Q(object):
... def __init__(self, n=0, i=0,j=0,k=0):


... self.n = n
... self.i = i
... self.j = j
... self.k = k
... def __add__(self, other):
... return Q(self.n+other.n,
... self.i+other.i,
... self.j+other.j,
... self.k+other.k)

... def __repr__(self):
... return "<Q(%i,%i,%i,%i)>" % (
... self.n,
... self.i,
... self.j,
... self.k)


...
>>> q1 = Q(1,2,3,5)
>>> q2 = Q(7,11,13,17)

>>> q1+q2
<Q(8,13,16,22)>


>>> sum([q1,q2])
Traceback (most recent call last):
File "<stdin>", line 1, in ?

TypeError: unsupported operand type(s) for +: 'int' and 'Q'
>>> sum([q1,q2], Q())
<Q(8,13,16,22)>


Thus, sum seems to work just fine for objects containing an
__add__ method. However, strings contain an __add__ method.

>>> hasattr("", "__add__")
True

yet, using the same pattern...

>>> sum(["hello", "world"], "")


Traceback (most recent call last):
File "<stdin>", line 1, in ?

TypeError: sum() can't sum strings [use ''.join(seq) instead]


Which seems like an arbitrary prejudice against strings...flying
in the face of python's duck-typing. If it has an __add__
method, duck-typing says you should be able to provide a starting
place and a sequence of things to add to it, and get the sum.

However, a new sum2() function can be created...

>>> def sum2(seq, start=0):
... for item in seq:
... start += item
... return start
...

which does what one would expect the definition of sum() should
be doing behind the scenes.

>>> # generate the expected error, same as above
>>> sum2([q1,q2])


Traceback (most recent call last):
File "<stdin>", line 1, in ?

File "<stdin>", line 3, in sum2
TypeError: unsupported operand type(s) for +=: 'int' and 'Q'
>>> # employ the same solution of a proper starting point
>>> sum2([q1,q2], Q())
<Q(8,13,16,22)>
>>> # do the same thing for strings
>>> sum2(["hello", "world"], "")
'helloworld'

and sum2() works just like sum(), only it happily takes strings
without prejudice.

From help(sum):
"Returns the sum of a sequence of numbers (NOT strings) plus the
value of parameter 'start'. When the sequence is empty, returns
start."

It would be as strange as if enumerate() didn't take strings, and
instead forced you to use some other method for enumerating strings:

>>> for i,c in enumerate("hello"): print i,c


Traceback (most recent call last):
File "<stdin>", line 1, in ?

TypeError: enumerate() can't enumerate strings [use
"hello".enumerator() instead]

Why the arbitrary breaking of duck-typing for strings in sum()?
Why make them second-class citizens?

The interpreter is clearly smart enough to recognize when the
condition occurs such that it can throw the error...thus, why not
add a few more smarts and have it simply translate it into
"start+''.join(sequence)" to maintain predictable behavior
according to duck-typing?

-tkc


Fredrik Lundh

unread,
Aug 24, 2006, 12:24:19 PM8/24/06
to pytho...@python.org
Tim Chase wrote:

> The interpreter is clearly smart enough to recognize when the
> condition occurs such that it can throw the error...thus, why not
> add a few more smarts and have it simply translate it into
> "start+''.join(sequence)" to maintain predictable behavior
> according to duck-typing?

"join" doesn't use __add__ at all, so I'm not sure in what sense that
would be more predictable. I'm probably missing something, but I cannot
think of any core method that uses a radically different algorithm based
on the *type* of one argument.

besides, in all dictionaries I've consulted, the word "sum" means
"adding numbers". are you sure it wouldn't be more predictable if "sum"
converted strings to numbers ?

(after all, questions about type errors like "cannot concatenate 'str'
and 'int' objects" and "unsupported operand type(s) for +: 'int' and
'str'" are a *lot* more common than questions about sum() on string lists.)

</F>

Gabriel Genellina

unread,
Aug 24, 2006, 12:40:57 PM8/24/06
to Tim Chase, pytho...@python.org
At Thursday 24/8/2006 13:12, Tim Chase wrote:

>The interpreter is clearly smart enough to recognize when the
>condition occurs such that it can throw the error...thus, why not
>add a few more smarts and have it simply translate it into
>"start+''.join(sequence)" to maintain predictable behavior
>according to duck-typing?

sequences don't have to be homogeneous, and iterators cant go back.
But let GvR say that in his own words:
http://mail.python.org/pipermail/python-dev/2003-April/034854.html

Gabriel Genellina
Softlab SRL





__________________________________________________
Preguntá. Respondé. Descubrí.
Todo lo que querías saber, y lo que ni imaginabas,
está en Yahoo! Respuestas (Beta).
¡Probalo ya!
http://www.yahoo.com.ar/respuestas

Fredrik Lundh

unread,
Aug 24, 2006, 1:12:17 PM8/24/06
to pytho...@python.org
Gabriel Genellina wrote:

> sequences don't have to be homogeneous, and iterators cant go back.
> But let GvR say that in his own words:
> http://mail.python.org/pipermail/python-dev/2003-April/034854.html

you could of course dispatch on the type of the second argument (the
start value), but that'd be at least as silly. here's the relevant
pronouncement:

http://mail.python.org/pipermail/python-dev/2003-April/034853.html

and here's an elaboration by the martellibot:

http://mail.python.org/pipermail/python-dev/2003-April/034855.html

(and note that the python-dev consensus is that sum is for numbers and
join is for strings. anyone who cares about writing readable code knows
that names matter; different things should have different names.)

(I still think a "join" built-in would be nice, though. but anyone who
argues that "join" should support numbers too will be whacked with a
great big halibut.)

</F>

Paul Rubin

unread,
Aug 24, 2006, 6:44:58 PM8/24/06
to
Fredrik Lundh <fre...@pythonware.com> writes:
> "join" doesn't use __add__ at all, so I'm not sure in what sense that
> would be more predictable. I'm probably missing something, but I
> cannot think of any core method that uses a radically different
> algorithm based on the *type* of one argument.

3 ** 2, vs. 3 ** 2.5

Actually it's worse than depending on just the type:

(-3.0) ** 2.0 works, while (-3.0) ** 2.5 throws an error.

But you can do

(-3.0) ** (2.5 + 0.0j)

for yet another radically different algorithm.

I'm not sure whether the Python spec requires (-3.0)**2.0 to work, or
if it's just an implementation-specific hack. In most languages, it
wouldn't work.

Steven D'Aprano

unread,
Aug 25, 2006, 12:19:26 AM8/25/06
to
On Thu, 24 Aug 2006 18:24:19 +0200, Fredrik Lundh wrote:

> besides, in all dictionaries I've consulted, the word "sum" means
> "adding numbers". are you sure it wouldn't be more predictable if "sum"
> converted strings to numbers ?

And then on Thu, 24 Aug 2006 19:12:17 +0200 Fredrik also wrote:

> (and note that the python-dev consensus is that sum is for numbers and
> join is for strings. anyone who cares about writing readable code knows
> that names matter; different things should have different names.)

And yet sum() sums lists, tuples, custom classes, and many other things
which are not numbers. Did python-dev not notice this? I doubt it.

I've read all the arguments in favour of type-checking the arguments to
sum() (actually only the second argument). I'm not convinced that this
case is special enough to break the rules by raising an exception for
strings. (Insert usual arguments about consistency, duck typing, least
surprise, etc.)

There is an alternative: raise a warning instead of an exception. That
permits the admirable strategy of educating users that join() is
generally a better way to concatenate a large number of strings, while
still allowing programmers who want to shoot themselves in the foot to do
so. Python generally allows the programmer to shoot himself in the foot,
if the programmer insists, e.g. private attributes are by convention, not
enforced by the language. That's one of the nice things about the
language: it *suggests* what to do, but doesn't *insist* it knows what you
want better than you do.

--
Steven D'Aprano

Paul Rubin

unread,
Aug 25, 2006, 12:40:16 AM8/25/06
to
Steven D'Aprano <st...@REMOVEME.cybersource.com.au> writes:
> There is an alternative: raise a warning instead of an exception. That
> permits the admirable strategy of educating users that join() is
> generally a better way to concatenate a large number of strings, while
> still allowing programmers who want to shoot themselves in the foot to do so.

This is reasonable, but why not just do the right thing and use the
right algorithm? There is, after all, no guarantee in the docs that
''.join() uses the right algorithm either. I've come to the view that
the docs should explicitly specify this type of thing, per the
Stepanov interview mentioned earlier. You may be right that even when
it's not clear what the best algorithm is, using a correct but slow
algorithm is preferable to throwing an error.

But I think for strings, we should rethink how this kind of operation
is done, and build up the sum of strings in terms of some kind of mutable
string object resembling Java's StringBuf or Python's cStringIO objects.

Hendrik van Rooyen

unread,
Aug 25, 2006, 3:15:10 AM8/25/06
to pytho...@python.org
"Fredrik Lundh" <fre...@pythonware.com> Wrote:

8<----------------------------------------

| (I still think a "join" built-in would be nice, though. but anyone who
| argues that "join" should support numbers too will be whacked with a
| great big halibut.)
|
| </F>

Strange this - you don't *LOOK* like a Gaulish Blacksmith...

Bryan Olson

unread,
Aug 25, 2006, 5:18:18 AM8/25/06
to
Fredrik Lundh wrote:
> [...] besides, in all dictionaries I've consulted, the word "sum"
> means "adding numbers".

That's a result of not looking deeply enough.

Fredrik Lundh is partially right, in that "Sum" usually refers
to addition of numbers. Nevertheless, the idea that "sum" must
refer to numbers is demonstrably wrong: Googling "define: sum"
shows several counter-examples, the first of which is:

The final aggregate; "the sum of all our troubles did not
equal the misery they suffered."

Numbers? What numbers? Lundh may be telling the truth about
all the dictionaries he consulted, but anyone with Internet
access could have looked -- and found -- further.

Dictionary definition is not really the issue. Who could miss
the correspondence between the addition operation and the sum
function? Why would we choose a language or type system that
cannot even make "+" and "sum" behave consistently?


> are you sure it wouldn't be more predictable if "sum"
> converted strings to numbers ?

Definitely reject that idea. People so misguided as to want
silent string-to-int conversion can use Perl.

The problem here is unifying the "+" operator and the "sum"
function, The former is one of Python's several 'special'
methods; the latter is one of Python's pre-defined
functions. See:

http://docs.python.org/ref/numeric-types.html
http://docs.python.org/lib/built-in-funcs.html

And herein is the problem: A class may implement "__add__" any
way the programmer chooses. Python should require, or at least
document requirements, on further properties of addition. Python
should insist that addition be symmetric an transitive, and
classes implementing addition provide an additive identity.


> (after all, questions about type errors like "cannot concatenate 'str'
> and 'int' objects" and "unsupported operand type(s) for +: 'int' and
> 'str'" are a *lot* more common than questions about sum() on string lists.)

Right. Duck-typing can work, while sloppy typing is doomed.


--
--Bryan

Paul Rubin

unread,
Aug 25, 2006, 10:47:10 AM8/25/06
to
Bryan Olson <fakea...@nowhere.org> writes:
> And herein is the problem: A class may implement "__add__" any
> way the programmer chooses. Python should require, or at least
> document requirements, on further properties of addition. Python
> should insist that addition be symmetric an transitive, and
> classes implementing addition provide an additive identity.

Are you saying "abc"+"def" should not be concatenation? I guess
that's reasonable. As long as + is string concatenation though, the
principle of least astonishment suggests that "sum" should
conconcatenate several strings.

I'm not sure what you mean by addition being symmetric or transitive;
it is not an equivalence relation. Do you mean commutative and
associative? I'm not sure if floating-point arithmetic has those
properties, strictly speaking.

Neil Cerutti

unread,
Aug 25, 2006, 11:08:31 AM8/25/06
to

The interesting part of the discussion, to me, was that Alex
Martelli's initial implementation included the string
optimization so that a list with a string as its first element
called ''.join(lst) instead. But that optimization turns out to
be valid only if every element of the list is a string.

So there isn't, it seems, a practical way of implementing the
sum(list of strings) -> ''.join(list of strings optimization.

--
Neil Cerutti
8 new choir robes are currently needed, due to the addition of
several new members and to the deterioration of some of the
older ones. --Church Bulletin Blooper

Paul Rubin

unread,
Aug 25, 2006, 11:18:25 AM8/25/06
to
Neil Cerutti <hor...@yahoo.com> writes:
> So there isn't, it seems, a practical way of implementing the
> sum(list of strings) -> ''.join(list of strings optimization.

''.join may not be the right way to do it, but obviously there are
other ways. This isn't rocket science.

Bryan Olson

unread,
Aug 25, 2006, 11:29:40 AM8/25/06
to
Paul Rubin wrote:
> Are you saying "abc"+"def" should not be concatenation? I guess
> that's reasonable.

No, I'm definitely not saying that, or at least I didn't mean
that.

> As long as + is string concatenation though, the
> principle of least astonishment suggests that "sum" should
> conconcatenate several strings.

Absolutely.

> I'm not sure what you mean by addition being symmetric or transitive;
> it is not an equivalence relation. Do you mean commutative and
> associative?

Oops, yes, of course. Posting too late at night.


--
--Bryan

Paddy

unread,
Aug 25, 2006, 12:17:23 PM8/25/06
to

Paddy wrote:
<<SNIP>>
> Why not make sum work for strings too?
>
> It would remove what seems like an arbitrary restriction and aid
> duck-typing. If the answer is that the sum optimisations don't work for
> the string datatype, then wouldn't it be better to put a trap in the
> sum code diverting strings to the reduce equivalent?
>
> Just a thought,
>
> - Paddy.

from __no_future__ import sum

assert "Not likely" == sum(["Not ", "likely"], "", least_surprise=True)

:-)

Fredrik Lundh

unread,
Aug 25, 2006, 6:08:28 PM8/25/06
to pytho...@python.org
Hendrik van Rooyen wrote:

> | (I still think a "join" built-in would be nice, though. but anyone who
> | argues that "join" should support numbers too will be whacked with a
> | great big halibut.)
>

> Strange this - you don't *LOOK* like a Gaulish Blacksmith...

no, but I have a nice safari outfit.

(hint: this is comp.lang.python, not comp.lang.asterix)

</F>

0 new messages