[Python-ideas] Why does `sum` use a default for the `start` parameter?

Ram Rachum

Dec 5, 2009, 6:55:37 AM
to python...@python.org
I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
adding any start value if none is specified?

This current behavior is preventing me from using `sum` to add up a bunch of non-
number objects.

Ram.

_______________________________________________
Python-ideas mailing list
Python...@python.org
http://mail.python.org/mailman/listinfo/python-ideas

MRAB

Dec 5, 2009, 11:43:15 AM
to python...@python.org
Ram Rachum wrote:
> I noticed that `sum` tries to add zero to your iterable. Why? Why not
> just skip adding any start value if none is specified?
>
> This current behavior is preventing me from using `sum` to add up a
> bunch of non-number objects.
>
Sometimes you might find that the list you're summing is empty. Because
'sum' is most often used with numbers, the default sum of a list is 0.
If you want to sum a list of non-numbers, provide a suitable start
value. For example, to sum a list of lists a suitable start value is []:

>>> sum([[0, 1], [2, 3]], [])
[0, 1, 2, 3]

I agree that it would be nice if the start value could just be omitted,
but then what should 'sum' return if the list is empty?

If sum([1, 2]) returned 3, then I'd want sum([]) to return 0.

If sum([[1], [2]]) returned [1, 2], then I'd want sum([]) to return [].

Unfortunately, I can't have it both ways.
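The trade-off MRAB describes can be checked on a current Python 3 interpreter. The builtin's default start of 0 gives the "numeric" answer for an empty input. Lists only get the "list" answer when the caller passes an explicit start of [].

```python
# Numbers: the default start value of 0 gives the expected totals.
numbers_total = sum([1, 2])      # 0 + 1 + 2
numbers_empty = sum([])          # nothing to add, so the start value 0 is returned

# Lists: an explicit start of [] is what makes the empty case come out as [].
lists_total = sum([[1], [2]], [])
lists_empty = sum([], [])
```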

Andre Engels

Dec 5, 2009, 11:45:33 AM
to Ram Rachum, python...@python.org
On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum <coo...@cool-rr.com> wrote:
> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
> adding any start value if none is specified?
>
> This current behavior is preventing me from using `sum` to add up a bunch of non-
> number objects.

In your proposed implementation, sum([]) would be undefined.


--
André Engels, andre...@gmail.com

Ram Rachum

Dec 5, 2009, 11:56:09 AM
to python...@python.org
> Sometimes you might find that the list you're summing is empty. Because
> 'sum' is most often used with numbers, the default sum of a list is 0.
> If you want to sum a list of non-numbers, provide a suitable start
> value. For example, to sum a list of lists a suitable start value is []:
>
> >>> sum([[0, 1], [2, 3]], [])
> [0, 1, 2, 3]
>
> I agree that it would be nice if the start value could just be omitted,
> but then what should 'sum' return if the list is empty?


I see the problem. I think a good solution would be to tell the user, "If you
want `sum` to be able to handle a non-empty list, you must supply `start`."
Users that want to add up a (possibly empty) sequence of numbers will have to
specify `start`.

If start is supplied, it will work like it does now. If start isn't supplied, it
will add up all the elements without adding any `start` to them.

What do you think?
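A minimal sketch of the proposal, written as a separate hypothetical function rather than a change to the builtin. `proposed_sum` and the `_NO_START` sentinel are illustrative names, not a real API. A private sentinel is used because any real value, including None, could be a legitimate start.

```python
import operator
from functools import reduce

_NO_START = object()  # hypothetical sentinel: distinguishes "start omitted" from any real value


def proposed_sum(iterable, start=_NO_START):
    """Hypothetical sum() following the proposal above (not a real API)."""
    if start is not _NO_START:
        # start supplied: behave exactly like today's builtin.
        return sum(iterable, start)
    # start omitted: add the items to each other only. reduce() raises
    # TypeError on an empty iterable, which is the cost of this design.
    return reduce(operator.add, iterable)
```

With this sketch, `proposed_sum([[1], [2]])` gives `[1, 2]` and `proposed_sum(["a", "b"])` gives `"ab"`, but `proposed_sum([])` raises. Andre Engels raises exactly this empty-input objection in the next message.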

George Sakkis

Dec 5, 2009, 12:01:01 PM
to python-ideas
On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andre...@gmail.com> wrote:

> On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum <coo...@cool-rr.com> wrote:
>> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
>> adding any start value if none is specified?
>>
>> This current behavior is preventing me from using `sum` to add up a bunch of non-
>> number objects.
>
> In your proposed implementation, sum([]) would be undefined.

Which would make it consistent with min/max.
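For comparison, here is how min() and max() treat an empty input on Python 3. The `default` keyword shown at the end was added to min()/max() in Python 3.4, after this thread.

```python
def raises_value_error(func, *args):
    """Return True if func(*args) raises ValueError."""
    try:
        func(*args)
    except ValueError:
        return True
    return False


# min() and max() have no fallback value, so an empty input is an error.
max_empty_fails = raises_value_error(max, [])
min_empty_fails = raises_value_error(min, [])

# sum() instead returns its start value for an empty input.
sum_empty = sum([])

# Python 3.4+ lets the caller opt in to a value for the empty case.
max_empty_default = max([], default=None)
```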

George

Vitor Bosshard

Dec 5, 2009, 12:23:19 PM
to George Sakkis, python...@python.org
2009/12/5 George Sakkis <george...@gmail.com>:

> On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andre...@gmail.com> wrote:
>
>> On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum <coo...@cool-rr.com> wrote:
>>> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
>>> adding any start value if none is specified?
>>>
>>> This current behavior is preventing me from using `sum` to add up a bunch of non-
>>> number objects.
>>
>> In your proposed implementation, sum([]) would be undefined.
>
> Which would make it consistent with min/max.


And in that case the special string handling could also be dropped?

>>> sum(["a","b"], "start")
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    sum(["a","b"], "start")
TypeError: sum() can't sum strings [use ''.join(seq) instead]


This behaviour is quite bothersome. Sum can handle arbitrary objects
in theory (as long as they define the correct special methods, etc.),
but it gratuitously raises an exception on strings. This behaviour is
also inconsistent with the following:

>>> sum(["a","b"])
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    sum(["a","b"])
TypeError: unsupported operand type(s) for +: 'int' and 'str'


Where sum actually tries to add "a" to the default value of 0.
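The two failure modes Vitor contrasts can be captured side by side on Python 3. The helper `type_error_message` is illustrative. It only collects the message text so the two errors can be compared.

```python
def type_error_message(func, *args):
    """Return the TypeError message raised by func(*args), or None if none is raised."""
    try:
        func(*args)
    except TypeError as exc:
        return str(exc)
    return None


# Explicit string start: rejected by a deliberate special case inside sum().
msg_explicit_start = type_error_message(sum, ["a", "b"], "start")

# Default start of 0: fails on the ordinary int + str addition instead.
msg_default_start = type_error_message(sum, ["a", "b"])
```

The first message points the user at `''.join(seq)`. The second is the generic operator error, which says nothing about strings being special.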

Georg Brandl

Dec 5, 2009, 12:33:13 PM
to python...@python.org
Ram Rachum schrieb:

>> Sometimes you might find that the list you're summing is empty. Because
>> 'sum' is most often used with numbers, the default sum of a list is 0.
>> If you want to sum a list of non-numbers, provide a suitable start
>> value. For example, to sum a list of lists a suitable start value is []:
>>
>> >>> sum([[0, 1], [2, 3]], [])
>> [0, 1, 2, 3]
>>
>> I agree that it would be nice if the start value could just be omitted,
>> but then what should 'sum' return if the list is empty?
>
>
> I see the problem. I think a good solution would be to tell the user, "If you
> want `sum` to be able to handle a non-empty list, you must supply `start`."
> Users that want to add up a (possibly empty) sequence of numbers will have to
> specify `start`.
>
> If start is supplied, it will work like it does now. If start isn't supplied, it
> will add up all the elements without adding any `start` to them.
>
> What do you think?

There is a choice between these two variants:

a) require start for non-numerical sequences
b) require start for possibly empty sequences

I don't have a preference for either, so for compatibility's sake I would
vote to keep the current one, which is a). It also stands to reason that case b)

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

Georg Brandl

Dec 5, 2009, 12:35:07 PM
to python...@python.org
Ram Rachum schrieb:

>> Sometimes you might find that the list you're summing is empty. Because
>> 'sum' is most often used with numbers, the default sum of a list is 0.
>> If you want to sum a list of non-numbers, provide a suitable start
>> value. For example, to sum a list of lists a suitable start value is []:
>>
>> >>> sum([[0, 1], [2, 3]], [])
>> [0, 1, 2, 3]
>>
>> I agree that it would be nice if the start value could just be omitted,
>> but then what should 'sum' return if the list is empty?
>
>
> I see the problem. I think a good solution would be to tell the user, "If you
> want `sum` to be able to handle a non-empty list, you must supply `start`."
> Users that want to add up a (possibly empty) sequence of numbers will have to
> specify `start`.
>
> If start is supplied, it will work like it does now. If start isn't supplied, it
> will add up all the elements without adding any `start` to them.
>
> What do you think?

(sorry, pressed wrong key)

There is a choice between these two variants:

a) require start for non-numerical sequences
b) require start for possibly empty sequences

I don't have a preference for either, so for compatibility's sake I would
vote to keep the current one, which is a). It also stands to reason that

buggy usage in case b) is harder to detect, since the common case will
not uncover the bug (the sequence being nonempty), while for case a) it does.

Georg

Georg Brandl

Dec 5, 2009, 12:36:32 PM
to python...@python.org
Vitor Bosshard schrieb:

> 2009/12/5 George Sakkis <george...@gmail.com>:
>> On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andre...@gmail.com> wrote:
>>
>>> On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum <coo...@cool-rr.com> wrote:
>>>> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
>>>> adding any start value if none is specified?
>>>>
>>>> This current behavior is preventing me from using `sum` to add up a bunch of non-
>>>> number objects.
>>>
>>> In your proposed implementation, sum([]) would be undefined.
>>
>> Which would make it consistent with min/max.
>
>
> And in that case the special string handling could also be dropped?
>
>>>> sum(["a","b"], "start")
> Traceback (most recent call last):
> File "<pyshell#0>", line 1, in <module>
> sum(["a","b"], "start")
> TypeError: sum() can't sum strings [use ''.join(seq) instead]
>
>
> This behaviour is quite bothersome. Sum can handle arbitrary objects
> in theory (as long as they define the correct special methods, etc.),
> but it gratuitously raises an exception on strings.

This seems to be an instance where the "practicality" Zen rule beats the
"special cases" rule :)

Georg

Vitor Bosshard

Dec 5, 2009, 1:04:42 PM
to Georg Brandl, python...@python.org
2009/12/5 Georg Brandl <g.br...@gmx.net>:

> Vitor Bosshard schrieb:
>> 2009/12/5 George Sakkis <george...@gmail.com>:
>>> On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andre...@gmail.com> wrote:
>>>
>>>> On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum <coo...@cool-rr.com> wrote:
>>>>> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
>>>>> adding any start value if none is specified?
>>>>>
>>>>> This current behavior is preventing me from using `sum` to add up a bunch of non-
>>>>> number objects.
>>>>
>>>> In your proposed implementation, sum([]) would be undefined.
>>>
>>> Which would make it consistent with min/max.
>>
>>
>> And in that case the special string handling could also be dropped?
>>
>>>>> sum(["a","b"], "start")
>> Traceback (most recent call last):
>>   File "<pyshell#0>", line 1, in <module>
>>     sum(["a","b"], "start")
>> TypeError: sum() can't sum strings [use ''.join(seq) instead]
>>
>>
>> This behaviour is quite bothersome. Sum can handle arbitrary objects
>> in theory (as long as they define the correct special methods, etc.),
>> but it gratuitously raises an exception on strings.
>
> This seems to be an instance where the "practicality" Zen rule beats the
> "special cases" rule :)
>


It might be more accurate to say "hand-holding" instead of
practicality (and it doesn't even catch all errors it's meant to). I'm
not so sure that's special enough ;-)


Vitor

Stephen J. Turnbull

Dec 5, 2009, 1:10:51 PM
to George Sakkis, python-ideas
George Sakkis writes:
> On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andre...@gmail.com> wrote:

> > In your proposed implementation, sum([]) would be undefined.
>
> Which would make it consistent with min/max.

There's no justification for trying to make 'min' and 'sum'
consistent. The sum of an empty list of numbers is a well-defined
*number*, namely 0, but the max of an empty list of numbers is a
well-defined *non-number*, namely "minus infinity".
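Stephen's point can be read as one about identity elements. A fold over an empty list returns its initial value, so a sensible initial value is one that leaves every real item unchanged. The sketch below uses float("-inf") as the nearest Python stand-in for the "minus infinity" he mentions. This is an illustrative choice; max() itself has no such default.

```python
import operator
from functools import reduce

# Folding an empty list just returns the initial value.
empty_sum = reduce(operator.add, [], 0)       # 0 is neutral for +
empty_max = reduce(max, [], float("-inf"))    # -inf is neutral for max

# The same initial values are harmless when there are real items.
sum_of_items = reduce(operator.add, [3, 4], 0)
max_of_items = reduce(max, [3, 4], float("-inf"))
```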

The real question is "what harm is done by preferring the
(well-defined) sum of an empty list of numbers over the (well-defined)
empty sums of lists and/or strings?" Then, if there is any harm, "can
the situation be improved by having no useful default for empty lists
of any type?" Finally, "is it worth breaking existing code to ensure
equal treatment of different types?"

My guess is that the answers are "very little", "hardly at all", and
"emphatically no."<wink>

Ram Rachum

Dec 5, 2009, 1:05:59 PM
to python...@python.org
> There is a choice between these two variants:
>
> a) require start for non-numerical sequences
> b) require start for possibly empty sequences
>
> I don't have a preference for either, so for compatibility's sake I would
> vote to keep the current one, which is a). It also stands to reason that
> buggy usage in case b) is harder to detect, since the common case will
> not uncover the bug (the sequence being nonempty), while for case a) it does.


I prefer (b). The problem with requiring `start` for sequences of non-numerical
objects is that you now have to go out and create a "zero object" of the same
type as your other objects. The object class might not even have a concept of a
"zero object".

Ram.

MRAB

Dec 5, 2009, 1:12:31 PM
to python...@python.org
True, providing start will ensure that the result is of the correct
class, instead of it sometimes being an int, causing a TypeError later
on.

MRAB

Dec 5, 2009, 1:18:08 PM
to python...@python.org
Ram Rachum wrote:
>> There is a choice between these two variants:
>>
>> a) require start for non-numerical sequences
>> b) require start for possibly empty sequences
>>
>> I don't have a preference for either, so for compatibility's sake I would
>> vote to keep the current one, which is a). It also stands to reason that
>> buggy usage in case b) is harder to detect, since the common case will
>> not uncover the bug (the sequence being nonempty), while for case a) it does.
>
>
> I prefer (b). The problem with requiring `start` for sequences of non-numerical
> objects is that you now have to go out and create a "zero object" of the same
> type as your other objects. The object class might not even have a concept of a
> "zero object".
>
If the objects can be summed, shouldn't there also be a zero object?
Does anyone have an example when that's not possible?

George Sakkis

Dec 5, 2009, 1:23:35 PM
to Stephen J. Turnbull, python-ideas
On Sat, Dec 5, 2009 at 8:10 PM, Stephen J. Turnbull <ste...@xemacs.org> wrote:

> George Sakkis writes:
>  > On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andre...@gmail.com> wrote:
>
>  > > In your proposed implementation, sum([]) would be undefined.
>  >
>  > Which would make it consistent with min/max.
>
> There's no justification for trying to make 'min' and 'sum'
> consistent.  The sum of an empty list of numbers is a well-defined
> *number*, namely 0, but the max of an empty list of numbers is a
> well-defined *non-number*, namely "minus infinity".
>
> The real question is "what harm is done by preferring the
> (well-defined) sum of an empty list of numbers over the (well-defined)
> empty sums of lists and/or strings?"  Then, if there is any harm, "can
> the situation be improved by having no useful default for empty lists
> of any type?"  Finally, "is it worth breaking existing code to ensure
> equal treatment of different types?"
>
> My guess is that the answers are "very little", "hardly at all", and
> "emphatically no."<wink>

Agreed that there is little harm in preferring numbers over other
types when it comes to empty sequences, but the more important
question is "should the start argument be used even if the sequence is
*not* empty?". The OP doesn't think so and I agree.

George

Vitor Bosshard

Dec 5, 2009, 1:39:39 PM
to George Sakkis, python-ideas
2009/12/5 George Sakkis <george...@gmail.com>:

>
> Agreed that there is little harm in preferring numbers over other
> types when it comes to empty sequences, but the more important
> question is "should the start argument be used even if the sequence is
> *not* empty?". The OP doesn't think so and I agree.
>

In that case, "default" would be a more appropriate name than "start".
That change of concept is a potential break in compatibility. How
often is the start argument given as a non-zero value? Not all that
often I suppose, but it's still a valid use-case. Ergo, the start
argument should never be omitted if it was explicitly set.
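The difference between "start" and "default" semantics can be made concrete. The builtin treats start as part of the total. A hypothetical `sum_with_default` would add the items to each other and return `default` only for an empty input. The sketch handles sequences only, for brevity.

```python
def sum_with_default(seq, default):
    """Hypothetical 'default' semantics, sketched for sequences only.

    The items are added to each other; 'default' is returned only when
    seq is empty and never takes part in the addition.
    """
    if not seq:
        return default
    # Seed with the first item. Note: builtin sum() still rejects a str seed.
    return sum(seq[1:], seq[0])


with_start = sum([1, 2], 10)                  # start semantics: 10 + 1 + 2
with_default = sum_with_default([1, 2], 10)   # default ignored: 1 + 2
empty_with_default = sum_with_default([], 10)
```

As Vitor notes, existing calls such as `sum(xs, 10)` depend on the start semantics, so renaming the parameter would not be a drop-in change.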

Bill Janssen

Dec 5, 2009, 1:40:21 PM
to George Sakkis, python-ideas
George Sakkis <george...@gmail.com> wrote:

> On Sat, Dec 5, 2009 at 8:10 PM, Stephen J. Turnbull <ste...@xemacs.org> wrote:
>
> > George Sakkis writes:
> >  > On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andre...@gmail.com> wrote:
> >
> >  > > In your proposed implementation, sum([]) would be undefined.
> >  >
> >  > Which would make it consistent with min/max.
> >
> > There's no justification for trying to make 'min' and 'sum'
> > consistent.  The sum of an empty list of numbers is a well-defined
> > *number*, namely 0, but the max of an empty list of numbers is a
> > well-defined *non-number*, namely "minus infinity".
> >
> > The real question is "what harm is done by preferring the
> > (well-defined) sum of an empty list of numbers over the (well-defined)
> > empty sums of lists and/or strings?"  Then, if there is any harm, "can
> > the situation be improved by having no useful default for empty lists
> > of any type?"  Finally, "is it worth breaking existing code to ensure
> > equal treatment of different types?"
> >
> > My guess is that the answers are "very little", "hardly at all", and
> > "emphatically no."<wink>
>
> Agreed that there is little harm in preferring numbers over other
> types when it comes to empty sequences, but the more important
> question is "should the start argument be used even if the sequence is
> *not* empty?". The OP doesn't think so and I agree.

Or perhaps, the *default* start value should not be used if it doesn't
match in type the first element of a non-empty sequence. An explicitly
specified start value should still be used even if the sequence is *not*
empty.
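One possible reading of Bill's suggestion is sketched below. If no start is given, the implicit 0 is used only when the first item is an int (or the input is empty). Otherwise the items are added to each other. `sum_bill` and `_NO_START` are hypothetical names, and the int check is an assumption about what "match in type" means.

```python
_NO_START = object()  # hypothetical sentinel for "start not supplied"


def sum_bill(iterable, start=_NO_START):
    """Hypothetical sketch of the suggestion above (not a real API)."""
    items = list(iterable)  # materialised so the first item can be inspected
    if start is not _NO_START:
        return sum(items, start)   # explicit start: always used
    if not items:
        return 0                   # empty input: keep today's default
    if isinstance(items[0], int):
        return sum(items)          # first item matches the default's type
    # First item does not match the int default: skip it and add the items only.
    total = items[0]
    for item in items[1:]:
        total = total + item
    return total
```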

Bill

Ram Rachum

Dec 5, 2009, 1:42:31 PM
to python...@python.org
MRAB <python@...> writes:

> > I prefer (b). The problem with requiring `start` for sequences of non-numerical
> > objects is that you now have to go out and create a "zero object" of the same
> > type as your other objects. The object class might not even have a concept of a
> > "zero object".
> >
> If the objects can be summed, shouldn't there also be a zero object?
> Does anyone have an example when that's not possible?

You're right MRAB, probably almost every object type that has a concept of
"addition" will have a concept of a zero element.

BUT, that zero object has to be created by the user of `sum`, and that has two
problems:

1. The user might not know from beforehand which type of object he's adding.
Even within the same type there might be problems. What happens when the user is
using `sum` to add a bunch of vectors, and he doesn't know from beforehand what
the dimensions of the vectors are? How will he know if his zero element should
be Vector([0, 0]) or Vector([0, 0, 0])

2. A smaller problem: The user has to actually create that zero object now, and
for some objects the definition might be lengthy, adding needless complexity to
the code.

Also, using the `start` has some overhead, for creating the zero object and
calling __add__.
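Ram's vector case can be sketched with a hypothetical minimal `Vector` class (not from any library). The default start of 0 cannot be added to a Vector. The correct zero can only be built once the dimension is known, which here means inspecting the data.

```python
class Vector:
    """Hypothetical minimal vector type, for illustration only."""

    def __init__(self, components):
        self.components = list(components)

    def __add__(self, other):
        if len(self.components) != len(other.components):
            raise ValueError("dimension mismatch")
        return Vector(a + b for a, b in zip(self.components, other.components))

    def __eq__(self, other):
        return isinstance(other, Vector) and self.components == other.components

    def __repr__(self):
        return "Vector(%r)" % (self.components,)


def default_start_fails(items):
    """Return True if sum() with its default int start raises TypeError."""
    try:
        sum(items)  # 0 + Vector(...) has no meaning
    except TypeError:
        return True
    return False


vectors = [Vector([1, 2, 3]), Vector([4, 5, 6])]

# The "zero" depends on the dimension, which must be read from the data.
zero = Vector([0] * len(vectors[0].components))
total = sum(vectors, zero)
```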

Ram.

Adam Olsen

Dec 5, 2009, 1:48:52 PM
to Vitor Bosshard, python...@python.org
On Sat, Dec 5, 2009 at 10:23, Vitor Bosshard <algo...@gmail.com> wrote:
> And in that case the special string handling could also be dropped?
>
>>>> sum(["a","b"], "start")
> Traceback (most recent call last):
>  File "<pyshell#0>", line 1, in <module>
>    sum(["a","b"], "start")
> TypeError: sum() can't sum strings [use ''.join(seq) instead]
>
>
> This behaviour is quite bothersome. Sum can handle arbitrary objects
> in theory (as long as they define the correct special methods, etc.),
> but it gratuitously raises an exception on strings. This behaviour is
> also inconsistent with the following:
>
>>>> sum(["a","b"])
> Traceback (most recent call last):
>  File "<pyshell#1>", line 1, in <module>
>    sum(["a","b"])
> TypeError: unsupported operand type(s) for +: 'int' and 'str'
>
>
> Where sum actually tries to add "a" to the default value of 0.

sum is defined by repeatedly adding each number in a sequence. As
each number is usually constant, and the size of total grows
logarithmically, this is O(n log n) (but due to implementation
coarseness it usually isn't distinguished from O(n)).

Concatenation however grows the total's size very quickly. You
instead get a performance of O(n**2). Same result, wrong algorithm.

It would be possible to special case strings, but why? The programmer
should know what algorithm they're using and what complexity class it
has, so they can pick the right one (''.join(seq) in this case). IOW,
handling arbitrary objects is an illusion.
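A simplified cost model makes the quadratic behaviour concrete. It assumes each `total = total + s` builds a brand-new string and copies the running total plus the new piece. It ignores implementation shortcuts, so it is a model, not a benchmark. ''.join() can size the result once and copy each piece a single time.

```python
def concat_copy_cost(lengths):
    """Characters copied when strings are combined by repeated total = total + s."""
    copied = 0
    total_len = 0
    for n in lengths:
        total_len += n
        copied += total_len  # the whole new result is written out each time
    return copied


def join_copy_cost(lengths):
    """Characters copied by a join that writes each piece exactly once."""
    return sum(lengths)


pieces = [1] * 1000                      # 1000 one-character strings
naive_cost = concat_copy_cost(pieces)    # 1 + 2 + ... + 1000
join_cost = join_copy_cost(pieces)
```

For n one-character pieces the naive cost is n(n+1)/2, which grows as O(n**2), while the join cost is n.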

For an another example on why the programmer needs to understand the
algorithmic complexity of the operations they're using, and that the
language should value performance consistency and not just correct
output, see ABC's usage of rational numbers:
http://python-history.blogspot.com/2009/03/problem-with-integer-division.html


--
Adam Olsen, aka Rhamphoryncus

Vitor Bosshard

Dec 5, 2009, 1:55:53 PM
to Ram Rachum, python...@python.org
2009/12/5 Ram Rachum <coo...@cool-rr.com>:

> MRAB <python@...> writes:
>
>> > I prefer (b). The problem with requiring `start` for sequences of non-numerical
>> > objects is that you now have to go out and create a "zero object" of the same
>> > type as your other objects. The object class might not even have a concept of a
>> > "zero object".
>> >
>> If the objects can be summed, shouldn't there also be a zero object?
>> Does anyone have an example when that's not possible?
>
> You're right MRAB, probably almost every object type that has a concept of
> "addition" will have a concept of a zero element.
>
> BUT, that zero object has to be created by the user of `sum`, and that has two
> problems:
>
> 1. The user might not know from beforehand which type of object he's adding.
> Even within the same type there might be problems. What happens when the user is
> using `sum` to add a bunch of vectors, and he doesn't know from beforehand what
> the dimensions of the vectors are? How will he know if his zero element should
> be Vector([0, 0]) or Vector([0, 0, 0])

Ugly, but works:

itr = iter(sequence)
sum(itr, itr.next())

This is actually a good example in favor of not requiring a start value.
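`itr.next()` is the Python 2 spelling. On Python 3 the iterator method is `__next__`, and the portable call is the builtin `next()` (available since Python 2.6). A small wrapper, with the hypothetical name `sum_without_start`, shows the trick.

```python
def sum_without_start(iterable):
    """Sum items by seeding sum() with the first item (no 'zero' object needed)."""
    itr = iter(iterable)
    # next(itr) consumes the first item before sum() iterates the rest;
    # it raises StopIteration if the iterable is empty.
    return sum(itr, next(itr))
```

This works for lists of lists or numbers. It still fails for strings, because sum() rejects a str start value.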

Adam Olsen

Dec 5, 2009, 2:03:02 PM
to George Sakkis, python-ideas
On Sat, Dec 5, 2009 at 11:23, George Sakkis <george...@gmail.com> wrote:
> On Sat, Dec 5, 2009 at 8:10 PM, Stephen J. Turnbull <ste...@xemacs.org> wrote:
>
>> George Sakkis writes:
>>  > On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andre...@gmail.com> wrote:
>>
>>  > > In your proposed implementation, sum([]) would be undefined.
>>  >
>>  > Which would make it consistent with min/max.
>>
>> There's no justification for trying to make 'min' and 'sum'
>> consistent.  The sum of an empty list of numbers is a well-defined
>> *number*, namely 0, but the max of an empty list of numbers is a
>> well-defined *non-number*, namely "minus infinity".
>>
>> The real question is "what harm is done by preferring the
>> (well-defined) sum of an empty list of numbers over the (well-defined)
>> empty sums of lists and/or strings?"  Then, if there is any harm, "can
>> the situation be improved by having no useful default for empty lists
>> of any type?"  Finally, "is it worth breaking existing code to ensure
>> equal treatment of different types?"
>>
>> My guess is that the answers are "very little", "hardly at all", and
>> "emphatically no."<wink>
>
> Agreed that there is little harm in preferring numbers over other
> types when it comes to empty sequences, but the more important
> question is "should the start argument be used even if the sequence is
> *not* empty?". The OP doesn't think so and I agree.

Only sometimes adding the start value makes it more fragile. If you
have Foo() objects that aren't compatible with int and you do
sum([Foo(), Foo()]) you get a Foo() back. If your sequence then
happens to be empty you do sum([]) and get an int back. The result is
likely to be used in a context that's not compatible with int either.
Better always fail and require an explicit start if you need it.
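Adam's fragility argument can be reproduced with a hypothetical `Foo` type that supports `Foo() + Foo()` but not addition with int. A hypothetical "use the start only when empty" function returns a Foo for non-empty input and an int for empty input. An explicit start keeps the result type tied to the start value.

```python
import operator
from functools import reduce


class Foo:
    """Hypothetical user type: can be added to another Foo, but not to int."""

    def __add__(self, other):
        if not isinstance(other, Foo):
            return NotImplemented
        return Foo()


def sum_items_or_zero(items):
    # Hypothetical behaviour: add the items only, falling back to 0 when empty.
    return reduce(operator.add, items) if items else 0


nonempty_result = sum_items_or_zero([Foo(), Foo()])
empty_result = sum_items_or_zero([])      # same call, different result type

explicit_start_empty = sum([], Foo())     # explicit start keeps the type stable
explicit_start_full = sum([Foo(), Foo()], Foo())
```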


--
Adam Olsen, aka Rhamphoryncus

George Sakkis

Dec 5, 2009, 2:07:22 PM
to Vitor Bosshard, python-ideas
On Sat, Dec 5, 2009 at 8:39 PM, Vitor Bosshard <algo...@gmail.com> wrote:
> 2009/12/5 George Sakkis <george...@gmail.com>:
>>
>> Agreed that there is little harm in preferring numbers over other
>> types when it comes to empty sequences, but the more important
>> question is "should the start argument be used even if the sequence is
>> *not* empty?". The OP doesn't think so and I agree.
>>
>
> In that case, "default" would be a more appropriate name than "start".
> That change of concept is a potential break in compatibility. How
> often is the start argument given as a non-zero value? Not all that
> often I suppose, but it's still a valid use-case. Ergo, the start
> argument should never be omitted if it was explicitly set.

Ok I see the different semantics between 'start' and 'default' and the
use cases for each but at the end of the day there should be a way
(preferably the default) that given a sequence [x1, ..., xN] one can
compute "x1+...+xN" instead of "start+x1+...+xN".

George

Vitor Bosshard

Dec 5, 2009, 2:19:06 PM
to Adam Olsen, python...@python.org
2009/12/5 Adam Olsen <rha...@gmail.com>:


I think you misunderstood my point. Sorry if I wasn't clear enough in
my original message. I understand the performance characteristics of
repeated concatenation vs str.join. I just wonder why the language
goes out of its way to catch this particular occurrence of bad code,
given there are plenty of ways to misuse sum or any other builtin for
that matter. A newbie is more likely to get n**2 performance by using
a for loop than sum:

final = ""
for s in strings:
    final += s

Should python refuse to compile the above snippet? The answer is an
emphatic "no".

Raymond Hettinger

Dec 5, 2009, 2:31:14 PM
to python...@python.org, Ram Rachum

[Ram Rachum]

> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
> adding any start value if none is specified?

Once the API has been released, it is difficult to change without breaking code.


> This current behavior is preventing me from using `sum` to add up a bunch of non-
> number objects.

You have plenty of options:
* use sum() as designed and supply your own Zero object as a start (see below)
* use reduce(operator.add, s)
* write a simple for-loop to do summing

It's not like summing is a hard task. There's nothing in your situation that
would warrant changing the behavior of a published API where sum(s)
is defined even when s is of length zero or one.
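The three options listed above, applied to a list of lists on Python 3. reduce() was a builtin in Python 2 and lives in functools in Python 3 (functools.reduce exists since 2.6).

```python
import operator
from functools import reduce

items = [[1, 2], [3], [4, 5]]

# Option 1: sum() as designed, with an explicit start value suited to lists.
total_sum = sum(items, [])

# Option 2: reduce() with no initial value adds the items to each other only
# (it raises TypeError if items is empty).
total_reduce = reduce(operator.add, items)

# Option 3: a plain for-loop.
total_loop = items[0]            # assumes items is non-empty
for item in items[1:]:
    total_loop = total_loop + item
```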


Raymond

------------------------------------

>>> class Zero:
...     'universal zero for addition'
...     def __add__(self, other):
...         return other
...     def __radd__(self, other):
...         return other
...
>>> Zero() + 'xyz'
'xyz'
>>> sum(['xyz', 'pdq'], Zero())
'xyzpdq'

MRAB

Dec 5, 2009, 2:34:44 PM
to python-ideas
Currently if start is None then the result is None if the sequence is
empty, but raises a TypeError otherwise.

Would it break any existing code if it were like this instead:

sum(sequence, start=0)

If start is None then it's omitted from the summation, unless the
sequence is empty, in which case the result is None.
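A minimal sketch of this start=None proposal, as a hypothetical function `sum_none_start` rather than a change to the builtin. With any other start it defers to today's sum(). With None it adds the items only and returns None for an empty input.

```python
def sum_none_start(iterable, start=0):
    """Hypothetical sketch of the start=None proposal (not a real API)."""
    if start is not None:
        return sum(iterable, start)   # unchanged behaviour
    items = iter(iterable)
    try:
        total = next(items)
    except StopIteration:
        return None                   # empty sequence: the result is None
    for item in items:
        total = total + item
    return total
```

Today's builtin already returns None for `sum([], None)`, as MRAB says, so only the non-empty case would change behaviour.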

Georg Brandl

Dec 5, 2009, 3:59:36 PM
to python...@python.org
Vitor Bosshard schrieb:

> 2009/12/5 Ram Rachum <coo...@cool-rr.com>:
>> MRAB <python@...> writes:
>>
>>> > I prefer (b). The problem with requiring `start` for sequences of non-numerical
>>> > objects is that you now have to go out and create a "zero object" of the same
>>> > type as your other objects. The object class might not even have a concept of a
>>> > "zero object".
>>> >
>>> If the objects can be summed, shouldn't there also be a zero object?
>>> Does anyone have an example when that's not possible?
>>
>> You're right MRAB, probably almost every object type that has a concept of
>> "addition" will have a concept of a zero element.
>>
>> BUT, that zero object has to be created by the user of `sum`, and that has two
>> problems:
>>
>> 1. The user might not know from beforehand which type of object he's adding.
>> Even within the same type there might be problems. What happens when the user is
>> using `sum` to add a bunch of vectors, and he doesn't know from beforehand what
>> the dimensions of the vectors are? How will he know if his zero element should
>> be Vector([0, 0]) or Vector([0, 0, 0])
>
> Ugly, but works:
>
> itr = iter(sequence)
> sum(itr, itr.next())

Or, for sequences:

from itertools import islice
sum(islice(seq, 1, None), seq[0])

which clearly communicates the need for a non-empty sequence.
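The stop/start arguments of itertools.islice are easy to mix up here. With one numeric argument, `islice(seq, 1)` means "stop after one item". To skip the first item, the start must be given explicitly with a stop of None. The check below shows both.

```python
from itertools import islice

seq = [[1], [2], [3]]

# islice(seq, 1): stop after one item, so only seq[0] is produced.
first_only = list(islice(seq, 1))

# islice(seq, 1, None): start at index 1 and run to the end.
all_but_first = list(islice(seq, 1, None))

# Seeding sum() with seq[0] and adding the rest.
total = sum(islice(seq, 1, None), seq[0])
```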

Georg

Raymond Hettinger

Dec 5, 2009, 4:13:45 PM
to Georg Brandl, python...@python.org

>>>> > I prefer (b). The problem with requiring `start` for sequences of non-numerical
>>>> > objects is that you now have to go out and create a "zero object" of the same
>>>> > type as your other objects. The object class might not even have a concept of a
>>>> > "zero object".
>>>> >
>>>> If the objects can be summed, shouldn't there also be a zero object?


Use a single universal zero object that works for everything.
Here's an example from my earlier post:

>>> class Zero:
...     'universal zero for addition'
...     def __add__(self, other):
...         return other
...     def __radd__(self, other):
...         return other
...
>>> Zero() + 'xyz'
'xyz'

>>> sum(['xyz', 'pdq'], Zero())
'xyzpdq'


Raymond

Nick Coghlan

Dec 5, 2009, 6:49:53 PM
to Ram Rachum, python...@python.org
Ram Rachum wrote:
> I prefer (b). The problem with requiring `start` for sequences of non-numerical
> objects is that you now have to go out and create a "zero object" of the same
> type as your other objects. The object class might not even have a concept of a
> "zero object".

class _AdditiveIdentity(object):
    def __add__(self, other):
        return other
    __radd__ = __add__

AdditiveIdentity = _AdditiveIdentity()


total = sum(itr, AdditiveIdentity)
if total is AdditiveIdentity:
    pass  # Iterable was empty
else:
    pass  # we got a real result


(Raymond already posted along these lines, but I wanted to point out
that by making the identity object a singleton you can save the cost of
repeated instantiation and simplify the after-the-fact check for an
empty iterable)

The other philosophical point here is one Guido has expressed several
times in the past: "In general, the type of a return value should not
depend on the *value* of an argument" (although the different numeric
types tend to blur together a bit in this specific context)

With only a default value, sum() could return entirely different types
based on whether or not the sequence was empty.

With a start value, on the other hand, the type returned must at least
be one that is compatible under addition with the start value. You can
subvert that a bit through the use of a universal additive identity, but
it holds short of that.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
---------------------------------------------------------------

Terry Reedy

Dec 5, 2009, 7:57:38 PM
to python...@python.org

I would not have expected this to work, as it does not match "The
iterable's items are normally numbers, and are not allowed to be strings."

It appears that it is the start value that may not be a string.
I suggest a doc fix in
http://bugs.python.org/issue7447

FWIW, sum was designed for summing numbers at C speed. I think it
probably is as good a compromise as we can get. It is easy to program
any other exact behavior one wants, and summing user objects is going to
go at Python speed anyway. Certainly, none of the suggested alterations
strike me as worth breaking code.
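Terry's reading, that the string check applies to the start value and not to the items, can be confirmed on Python 3. The `Zero` class below repeats Raymond's universal zero from earlier in the thread so that the block is self-contained.

```python
class Zero:
    """Universal additive identity, as in Raymond Hettinger's example."""

    def __add__(self, other):
        return other

    __radd__ = __add__


# String items are accepted when the start value is not a str...
joined = sum(["xyz", "pdq"], Zero())

# ...but a str start value is rejected by sum()'s special case.
try:
    sum(["xyz", "pdq"], "")
except TypeError:
    str_start_rejected = True
else:
    str_start_rejected = False
```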

Terry Jan Reedy

Raymond Hettinger

Dec 5, 2009, 8:18:38 PM
to python...@python.org, Terry Reedy

["Terry Reedy"]

> FWIW, sum was designed for summing numbers at C speed. I think it
> probably is as good a compromise as we can get. It is easy to program
> any other exact behavior one wants, and summing user objects is going to
> go at Python speed anyway. Certainly, none of the suggested alterations
> strike me as worth breaking code.

Wisely spoken.


Raymond

Adam Olsen

Dec 6, 2009, 1:29:05 AM
to Vitor Bosshard, python...@python.org
On Sat, Dec 5, 2009 at 12:19, Vitor Bosshard <algo...@gmail.com> wrote:
> I think you misunderstood my point. Sorry if I wasn't clear enough in
> my original message. I understand the performance characteristics of
> repeated concatenation vs str.join. I just wonder why the language
> goes out of its way to catch this particular occurrence of bad code,
> given there are plenty of ways to misuse sum or any other builtin for
> that matter. A newbie is more likely to get n**2 performance by using
> a for loop than sum:
>
> final = ""
> for s in strings:
>    final += s
>
> Should python refuse to compile the above snippet? The answer is an
> emphatic "no".

All the individual operations there are fine. It's the composition
that's wrong. Adding a sanity check would require recognizing that
pattern, and changing the semantics of an individual operation based
on what surrounds it. Not a nice thing to do.

sum() is already a single operation (regardless of how it's
implemented), so it doesn't have that problem.


--
Adam Olsen, aka Rhamphoryncus
