This current behavior is preventing me from using `sum` to add up a bunch of
non-number objects.
Ram.
_______________________________________________
Python-ideas mailing list
Python...@python.org
http://mail.python.org/mailman/listinfo/python-ideas
>>> sum([[0, 1], [2, 3]], [])
[0, 1, 2, 3]
I agree that it would be nice if the start value could just be omitted,
but then what should 'sum' return if the list is empty?
If sum([1, 2]) returned 3, then I'd want sum([]) to return 0.
If sum([[1], [2]]) returned [1, 2], then I'd want sum([]) to return [].
Unfortunately, I can't have it both ways.
In your proposed implementation, sum([]) would be undefined.
--
André Engels, andre...@gmail.com
I see the problem. I think a good solution would be to tell the user, "If you
want `sum` to be able to handle a possibly empty list, you must supply `start`."
Users that want to add up a (possibly empty) sequence of numbers will have to
specify `start`.
If start is supplied, it will work like it does now. If start isn't supplied, it
will add up all the elements without adding any `start` to them.
What do you think?
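To make the proposal concrete, here is a minimal sketch of a sum that skips the
start value when none is given. The name `sum_no_start` and the `_MISSING`
sentinel are hypothetical, not part of the built-in, and raising ValueError on
an empty iterable is an assumption borrowed from how min() and max() behave.

```python
_MISSING = object()  # hypothetical sentinel meaning "no start supplied"


def sum_no_start(iterable, start=_MISSING):
    """Add up the items; only include `start` if the caller supplied one."""
    it = iter(iterable)
    if start is _MISSING:
        try:
            total = next(it)  # seed the total with the first element
        except StopIteration:
            # Assumption: fail the way min()/max() do on an empty iterable.
            raise ValueError("sum_no_start() arg is an empty iterable") from None
    else:
        total = start
    for item in it:
        total = total + item
    return total


print(sum_no_start([[0, 1], [2, 3]]))  # lists, no start needed
print(sum_no_start([1, 2]))            # numbers still work
print(sum_no_start([], 0))             # empty input needs an explicit start
```

With this variant the empty case has no answer unless the caller provides one,
which is exactly the trade-off raised earlier in the thread.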
> On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum <coo...@cool-rr.com> wrote:
>> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
>> adding any start value if none is specified?
>>
>> This current behavior is preventing me from using `sum` to add up a bunch of
>> non-number objects.
>
> In your proposed implementation, sum([]) would be undefined.
Which would make it consistent with min/max.
George
And in that case the special string handling could also be dropped?
>>> sum(["a","b"], "start")
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    sum(["a","b"], "start")
TypeError: sum() can't sum strings [use ''.join(seq) instead]
This behaviour is quite bothersome. Sum can handle arbitrary objects
in theory (as long as they define the correct special methods, etc.),
but it gratuitously raises an exception on strings. This behaviour is
also inconsistent with the following:
>>> sum(["a","b"])
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    sum(["a","b"])
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Where sum actually tries to add "a" to the default value of 0.
There is a choice between these two variants:
a) require start for non-numerical sequences
b) require start for possibly empty sequences
I don't have a preference for either, so for compatibility's sake I would
vote to keep the current one, which is a). It also stands to reason that
buggy usage in case b) is harder to detect, since the common case will
not uncover the bug (the sequence being nonempty), while for case a) it does.
Georg
--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.
This seems to be an instance where the "practicality" Zen rule beats the
"special cases" rule :)
Georg
It might be more accurate to say "hand-holding" instead of
practicality (and it doesn't even catch all errors it's meant to). I'm
not so sure that's special enough ;-)
Vitor
> > In your proposed implementation, sum([]) would be undefined.
>
> Which would make it consistent with min/max.
There's no justification for trying to make 'min' and 'sum'
consistent. The sum of an empty list of numbers is a well-defined
*number*, namely 0, but the max of an empty list of numbers is a
well-defined *non-number*, namely "minus infinity".
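The "well-defined" empty results here are identity elements: combining them
with any other value leaves that value unchanged. A small sketch, assuming
Python 3.4 or later for the `default` argument of max(); the sample lists are
made up for illustration.

```python
a = [3, 1, 4]
b = []  # the empty case under discussion

# 0 is the identity for +, so splitting a list in two never changes the sum.
assert sum(a) + sum(b) == sum(a + b)

# Minus infinity plays the same role for max(), but it has to be requested
# explicitly, because a bare max([]) raises ValueError.
neg_inf = float("-inf")
assert max(max(a, default=neg_inf), max(b, default=neg_inf)) == max(a + b)

print(sum(b))                   # 0
print(max(b, default=neg_inf))  # -inf
```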
The real question is "what harm is done by preferring the
(well-defined) sum of an empty list of numbers over the (well-defined)
empty sums of lists and/or strings?" Then, if there is any harm, "can
the situation be improved by having no useful default for empty lists
of any type?" Finally, "is it worth breaking existing code to ensure
equal treatment of different types?"
My guess is that the answers are "very little", "hardly at all", and
"emphatically no."<wink>
I prefer (b). The problem with requiring `start` for sequences of non-numerical
objects is that you now have to go out and create a "zero object" of the same
type as your other objects. The object class might not even have a concept of a
"zero object".
Ram.
> George Sakkis writes:
> > On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andre...@gmail.com> wrote:
>
> > > In your proposed implementation, sum([]) would be undefined.
> >
> > Which would make it consistent with min/max.
>
> There's no justification for trying to make 'min' and 'sum'
> consistent. The sum of an empty list of numbers is a well-defined
> *number*, namely 0, but the max of an empty list of numbers is a
> well-defined *non-number*, namely "minus infinity".
>
> The real question is "what harm is done by preferring the
> (well-defined) sum of an empty list of numbers over the (well-defined)
> empty sums of lists and/or strings?" Then, if there is any harm, "can
> the situation be improved by having no useful default for empty lists
> of any type?" Finally, "is it worth breaking existing code to ensure
> equal treatment of different types?"
>
> My guess is that the answers are "very little", "hardly at all", and
> "emphatically no."<wink>
Agreed that there is little harm in preferring numbers over other
types when it comes to empty sequences, but the more important
question is "should the start argument be used even if the sequence is
*not* empty?". The OP doesn't think so and I agree.
George
In that case, "default" would be a more appropriate name than "start".
That change of concept is a potential break in compatibility. How
often is the start argument given as a non-zero value? Not all that
often I suppose, but it's still a valid use-case. Ergo, the start
argument should never be omitted if it was explicitly set.
> On Sat, Dec 5, 2009 at 8:10 PM, Stephen J. Turnbull <ste...@xemacs.org> wrote:
>
> > George Sakkis writes:
> > > On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andre...@gmail.com> wrote:
> >
> > > > In your proposed implementation, sum([]) would be undefined.
> > >
> > > Which would make it consistent with min/max.
> >
> > There's no justification for trying to make 'min' and 'sum'
> > consistent. The sum of an empty list of numbers is a well-defined
> > *number*, namely 0, but the max of an empty list of numbers is a
> > well-defined *non-number*, namely "minus infinity".
> >
> > The real question is "what harm is done by preferring the
> > (well-defined) sum of an empty list of numbers over the (well-defined)
> > empty sums of lists and/or strings?" Then, if there is any harm, "can
> > the situation be improved by having no useful default for empty lists
> > of any type?" Finally, "is it worth breaking existing code to ensure
> > equal treatment of different types?"
> >
> > My guess is that the answers are "very little", "hardly at all", and
> > "emphatically no."<wink>
>
> Agreed that there is little harm in preferring numbers over other
> types when it comes to empty sequences, but the more important
> question is "should the start argument be used even if the sequence is
> *not* empty?". The OP doesn't think so and I agree.
Or perhaps, the *default* start value should not be used if it doesn't
match in type the first element of a non-empty sequence. An explicitly
specified start value should still be used even if the sequence is *not*
empty.
Bill
> > I prefer (b). The problem with requiring `start` for sequences of
> > non-numerical objects is that you now have to go out and create a "zero
> > object" of the same type as your other objects. The object class might not
> > even have a concept of a "zero object".
> >
> If the objects can be summed, shouldn't there also be a zero object?
> Does anyone have an example when that's not possible?
You're right MRAB, probably almost every object type that has a concept of
"addition" will have a concept of a zero element.
BUT, that zero object has to be created by the user of `sum`, and that has two
problems:
1. The user might not know beforehand which type of object he's adding.
Even within the same type there might be problems. What happens when the user is
using `sum` to add a bunch of vectors, and he doesn't know beforehand what
the dimensions of the vectors are? How will he know if his zero element should
be Vector([0, 0]) or Vector([0, 0, 0])?
2. A smaller problem: The user has to actually create that zero object now, and
for some objects the definition might be lengthy, adding needless complexity to
the code.
Also, using the `start` has some overhead, for creating the zero object and
calling __add__.
Ram.
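The vector case can be sketched as follows. The `Vector` class is hypothetical
and exists only to illustrate the point; the assumption is that adding vectors
of different dimensions raises ValueError.

```python
class Vector:
    """Hypothetical fixed-dimension vector used only for illustration."""

    def __init__(self, coords):
        self.coords = list(coords)

    def __add__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        if len(self.coords) != len(other.coords):
            raise ValueError("dimension mismatch")
        return Vector(x + y for x, y in zip(self.coords, other.coords))

    def __eq__(self, other):
        return isinstance(other, Vector) and self.coords == other.coords

    def __repr__(self):
        return "Vector(%r)" % (self.coords,)


vectors = [Vector([1, 2, 3]), Vector([4, 5, 6])]

# Guessing the wrong zero fails as soon as the first addition happens.
try:
    sum(vectors, Vector([0, 0]))
except ValueError as exc:
    print("wrong zero:", exc)

# For a non-empty list, the first element can serve as the start value.
print(sum(vectors[1:], vectors[0]))
```

The second call works without knowing the dimension in advance, but only
because the list is known to be non-empty.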
sum is defined by repeatedly adding each number in a sequence. As
each number is usually constant in size, and the size of the total grows
logarithmically, this is O(n log n) (but due to implementation
coarseness it usually isn't distinguished from O(n)).
Concatenation however grows the total's size very quickly. You
instead get a performance of O(n**2). Same result, wrong algorithm.
It would be possible to special case strings, but why? The programmer
should know what algorithm they're using and what complexity class it
has, so they can pick the right one (''.join(seq) in this case). IOW,
handling arbitrary objects is an illusion.
For another example on why the programmer needs to understand the
algorithmic complexity of the operations they're using, and that the
language should value performance consistency and not just correct
output, see ABC's usage of rational numbers:
http://python-history.blogspot.com/2009/03/problem-with-integer-division.html
--
Adam Olsen, aka Rhamphoryncus
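The quadratic growth can be illustrated with a deliberately simplified cost
model. It assumes every + builds a brand-new string and copies the whole
running total; real interpreters can sometimes avoid part of that work, so
treat the numbers as an upper-bound sketch rather than a measurement. Both
function names are made up for this example.

```python
def chars_copied_by_repeated_add(pieces):
    """Count characters written if every + builds a brand-new string."""
    copied = 0
    running_length = 0
    for piece in pieces:
        running_length += len(piece)
        copied += running_length  # the whole running total is copied again
    return copied


def chars_copied_by_join(pieces):
    """str.join can size the result once and copy each piece a single time."""
    return sum(len(piece) for piece in pieces)


pieces = ["x"] * 1000
print(chars_copied_by_repeated_add(pieces))  # 500500
print(chars_copied_by_join(pieces))          # 1000
```

For n one-character pieces the first count is n(n+1)/2, the second is n.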
Ugly, but works:
    itr = iter(sequence)
    sum(itr, next(itr))
This is actually a good example in favor of not requiring a start value.
Only sometimes adding the start value makes it more fragile. If you
have Foo() objects that aren't compatible with int and you do
sum([Foo(), Foo()]) you get a Foo() back. If your sequence then
happens to be empty you do sum([]) and get an int back. The result is
likely to be used in a context that's not compatible with int either.
Better always fail and require an explicit start if you need it.
--
Adam Olsen, aka Rhamphoryncus
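A sketch of the fragility described above. Both `Foo` and `sum_with_default`
are hypothetical: the function models a sum whose extra argument is used only
when the input is empty, which is not how the built-in behaves.

```python
class Foo:
    """Hypothetical type that supports + only with other Foo objects."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if not isinstance(other, Foo):
            return NotImplemented
        return Foo(self.value + other.value)


def sum_with_default(iterable, default=0):
    """Hypothetical variant: `default` is returned only for empty input."""
    it = iter(iterable)
    try:
        total = next(it)
    except StopIteration:
        return default
    for item in it:
        total = total + item
    return total


print(type(sum_with_default([Foo(1), Foo(2)])).__name__)  # Foo
print(type(sum_with_default([])).__name__)                # int
```

Code that expects a Foo back only breaks on the day the input happens to be
empty, which is the hard-to-detect failure mode being warned about.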
Ok I see the different semantics between 'start' and 'default' and the
use cases for each but at the end of the day there should be a way
(preferably the default) that given a sequence [x1, ..., xN] one can
compute "x1+...+xN" instead of "start+x1+...+xN".
George
I think you misunderstood my point. Sorry if I wasn't clear enough in
my original message. I understand the performance characteristics of
repeated concatenation vs str.join. I just wonder why the language
goes out of its way to catch this particular occurrence of bad code,
given there are plenty of ways to misuse sum or any other builtin for
that matter. A newbie is more likely to get n**2 performance by using
a for loop than sum:
    final = ""
    for s in strings:
        final += s
Should python refuse to compile the above snippet? The answer is an
emphatic "no".
Once the API has been released, it is difficult to change without breaking code.
> This current behavior is preventing me from using `sum` to add up a bunch of
> non-number objects.
You have plenty of options:
* use sum() as designed and supply your own Zero object as a start (see below)
* use reduce(operator.add, s)
* write a simple for-loop to do summing
It's not like summing is a hard task. There's nothing in your situation that
would warrant changing the behavior of a published API where sum(s)
is defined even when s is of length zero or one.
Raymond
------------------------------------
>>> class Zero:
...     'universal zero for addition'
...     def __add__(self, other):
...         return other
...     def __radd__(self, other):
...         return other
...
>>> Zero() + 'xyz'
'xyz'
>>> sum(['xyz pdq'], Zero())
'xyz pdq'
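The reduce and for-loop options from the list above can be sketched like this.
The sample data is made up; `functools.reduce` is used so the snippet runs on
Python 3, where reduce is no longer a built-in.

```python
from functools import reduce
import operator

s = [[0, 1], [2, 3], [4]]

# reduce() adds the items pairwise and never introduces a zero.
print(reduce(operator.add, s))

# The same thing as an explicit loop, seeded with the first item.
it = iter(s)
total = next(it)
for item in it:
    total = total + item
print(total)

# Neither form defines a result for empty input unless an initial value is given.
try:
    reduce(operator.add, [])
except TypeError as exc:
    print("empty input:", exc)
```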
Would it break any existing code if it were this instead:
sum(sequence, start=0)
If start is None then it's omitted from the summation, unless the
sequence is empty, in which case the result is None.
Or, for sequences:
    sum(islice(seq, 1, None), seq[0])
which clearly communicates the need for a non-empty sequence.
Georg
Use a single universal zero object that works for everything.
Here's an example from my earlier post:
>>> class Zero:
...     'universal zero for addition'
...     def __add__(self, other):
...         return other
...     def __radd__(self, other):
...         return other
...
>>> Zero() + 'xyz'
'xyz'
>>> sum(['xyz', 'pdq'], Zero())
'xyzpdq'
Raymond
class _AdditiveIdentity(object):
    def __add__(self, other):
        return other
    __radd__ = __add__

AdditiveIdentity = _AdditiveIdentity()

# start is passed positionally; older Pythons reject it as a keyword
total = sum(itr, AdditiveIdentity)
if total is AdditiveIdentity:
    pass  # Iterable was empty
else:
    pass  # we got a real result
(Raymond already posted along these lines, but I wanted to point out
that by making the identity object a singleton you can save the cost of
repeated instantiation and simplify the after-the-fact check for an
empty iterable)
The other philosophical point here is one Guido has expressed several
times in the past: "In general, the type of a return value should not
depend on the *value* of an argument" (although the different numeric
types tend to blur together a bit in this specific context)
With only a default value, sum() could return entirely different types
based on whether or not the sequence was empty.
With a start value, on the other hand, the type returned must at least
be one that is compatible under addition with the start value. You can
subvert that a bit through the use of a universal additive identity, but
it holds short of that.
Cheers,
Nick.
--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
---------------------------------------------------------------
I would not have expected this to work, as it does not match "The
iterable's items are normally numbers, and are not allowed to be strings."
It appears that it is the start value that may not be a string.
I suggest a doc fix in
http://bugs.python.org/issue7447
FWIW, sum was designed for summing numbers at C speed. I think it
probably is as good a compromise as we can get. It is easy to program
any other exact behavior one wants, and summing user objects is going to
go at Python speed anyway. Certainly, none of the suggested alterations
strike me as worth breaking code.
Terry Jan Reedy
Wisely spoken.
Raymond
All the individual operations there are fine. It's the composition
that's wrong. Adding a sanity check would require recognizing that
pattern, and changing the semantics of an individual operation based
on what surrounds it. Not a nice thing to do.
sum() is already a single operation (regardless of how it's
implemented), so it doesn't have that problem.
--
Adam Olsen, aka Rhamphoryncus