>>> sum([])
0
Because that [] may be an empty sequence of someobject:
>>> sum(s for s in ["a", "b"] if len(s) > 2)
0
In a statically typed language you could, in that situation, return the
initializer value of the list's item type, as I do in the
sum() in D.
This sounds like a more correct/clean thing to do:
>>> max([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: max() arg is an empty sequence
So it may be better to make sum([]) raise a ValueError too, in
Python 3/3.1 (if this isn't already true). On the other hand, often
enough I have code like this:
>>> max(fun(x) for x in iterable if predicate(x))
This may raise the ValueError either if iterable is empty or if the
predicate on its items is always false, so instead of catching
exceptions, which I try to avoid, I usually end up with a normal loop
that's readable and fast:
max_value = smallvalue
for x in iterable:
    if predicate(x):
        max_value = max(max_value, fun(x))
Where running speed matters, I may even replace that max(max_value,
fun(x)) with a more normal if/else.
A possible alternative is to add a default to max(), like the next()
built-in of Python 2.6:
>>> max((fun(x) for x in iterable if predicate(x)), default=smallvalue)
This returns smallvalue if there are no items to compute the max of.
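For reference, later Python versions (3.4 onward) accept a default keyword in max() and min(). The following is only a sketch of the idea above: fun, predicate, iterable and smallvalue are placeholder definitions invented for illustration.

```python
# Placeholder definitions (assumptions for illustration only).
def fun(x):
    return x * x

def predicate(x):
    return x > 10

iterable = [1, 2, 3]
smallvalue = -1

# No item passes the predicate, so max() falls back to the default
# instead of raising ValueError (requires Python 3.4 or later).
result = max((fun(x) for x in iterable if predicate(x)), default=smallvalue)
print(result)
```

The default is used only when the iterable yields nothing, so it does not need to be smaller than every real value.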
Bye,
bearophile
>>> help(sum)
sum(...)
sum(sequence, start=0) -> value
>>> sum(range(x) for x in range(5))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'list'
>>> sum((range(x) for x in range(5)), [])
[0, 0, 1, 0, 1, 2, 0, 1, 2, 3]
... so the list might not know what type it contains, but sum
does. And if you don't tell it, it makes a sensible guess. And
it *is* a case where refusing the temptation to guess is the
wrong thing: how many times would you use sum to do anything
other than sum numeric values? And how tedious would it be to
have to write sum(..., 0) for every other case? Particularly
bearing in mind:
>>> sum(["a", "b"], "")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]
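The start-value behaviour can also be checked as a script. This is a sketch assuming Python 3, where range() no longer returns a list, so each range is converted explicitly before summing:

```python
# Without a start value, sum() begins at the int 0, so adding a list fails.
try:
    sum(list(range(x)) for x in range(5))
except TypeError as exc:
    print("TypeError:", exc)

# With start=[], the lists are concatenated one after another.
flat = sum((list(range(x)) for x in range(5)), [])
print(flat)

# An empty input simply returns the start value unchanged.
empty = sum([], [])
print(empty)
```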
--
\S -- si...@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
"Frankly I have no feelings towards penguins one way or the other"
-- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump
You are right in that sum could be used to sum arbitrary objects.
However, in 99.99% of the cases, you will be summing numerical values.
When adding real numbers, the neutral element is zero. ( X + 0 = X) It
is very logical to return zero for empty sequences.
Same way, if we would have a prod() function, it should return one for
empty sequences because X*1 = X. The neutral element for this operation
is one.
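Python later gained such a function: math.prod(), available from Python 3.8, returns 1 for an empty iterable. A small check of both neutral elements, assuming Python 3.8 or later:

```python
import math

empty_sum = sum([])         # additive identity
empty_prod = math.prod([])  # multiplicative identity
print(empty_sum, empty_prod)

# Combining with the empty result leaves a value unchanged.
x = 7
assert x + empty_sum == x
assert x * empty_prod == x
```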
Of course this is not good for summing other types of objects. But how
clumsy would it be to use
sum( L +[0] )
or
if L:
    value = sum(L)
else:
    value = 0
instead of sum(L).
Once again, this is what sum() is used for in most cases, so this
behavior is the "expected" one.
Another argument to convince you: the sum() function in SQL for empty
row sets returns zero in most relational databases.
But of course it could have been implemented in a different way... I
believe that there have been excessive discussions about this decision,
and the current implementation is very good, if not the best.
Best,
Laszlo
sum([1, 2, 3]) => 6
sum(["a", "b", "c"]) => "abc"
For backward compatibility, if the sequence is empty and the start
value is None then return 0.
I see. But note that my post is mostly about the max()/min()
functions :-)
Bye,
bearophile
No it isn't. Nothing is not 0, check with MS-Access, for instance:
Null + 1 returns Null. Any arithmetic expression involving a
Null evaluates to Null. Adding something to an unknown returns
an unknown, as it should.
It is a logical fallacy to equate unknown with 0.
For example, the water table elevation in ft above Mean Sea Level
is WTE = TopOfCasing - DepthToWater.
TopOfCasing is usually known and constant (until resurveyed).
But DepthToWater may or may not exist for a given event (well
may be covered with fire ants, for example).
Now, if you equate Null with 0, then the WTE calculation says
the water table elevation is flush with the top of the well,
falsely implying that the site is underwater.
And, since this particular site is on the Mississippi River,
it sometimes IS underwater, but this is NEVER determined by
water table elevations, which, due to the CORRECT treatment
of Nulls by Access, never returns FALSE calculations.
>>> sum([])
0
is a bug, just as it's a bug in Excel to evaluate blank cells
as 0. It should return None or throw an exception like sum([None,1])
does.
Two thoughts:
1/ 'Reduce' has a 'default' argument-- they call it 'initial'.
>>> reduce( max, [ 0, 1, 2, 3 ] )
3
>>> reduce( max, [ 0, 1, 2, 'a' ] )
'a'
>>> reduce( max, [ 0, 1, 2, 'a', 'b' ] )
'b'
2/ Introduce a 'max' class object that takes a default type or default
argument. Query the default for an 'additive' identity, or query for
a 'comparative' identity, comparisons to which always return true; or
call the constructor with no arguments to construct one.
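A sketch of the first thought, using the 'initial' argument of reduce(). It assumes Python 3, where reduce lives in functools and where the mixed int/str comparisons shown above raise TypeError instead of returning a value:

```python
from functools import reduce

# With an initial value, reduce() returns it for an empty sequence.
empty_result = reduce(max, [], -1)
full_result = reduce(max, [0, 1, 2, 3], -1)
print(empty_result, full_result)

# Without an initial value, an empty sequence raises TypeError.
try:
    reduce(max, [])
except TypeError as exc:
    print("TypeError:", exc)
```

Note that here the initial value takes part in every comparison, so it acts as a floor rather than a pure fallback.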
>>>> sum([])
> 0
>
> is a bug, just as it's a bug in Excel to evaluate blank cells as 0. It
> should return None or throw an exception like sum([None,1]) does.
You're wrong, because 99.9% of the time when users leave a blank cell in
Excel, they want it to be treated as zero. Spreadsheet sum() is not the
same as mathematician's sum, which doesn't have a concept of "blank
cells". (But if it did, it would treat them as zero, since that's the
only useful thing and mathematicians are just as much pragmatists as
spreadsheet users.) The Excel code does the right thing, and your "pure"
solution would do the unwanted and unexpected thing and is therefore
buggy.
Bugs are defined by "does the code do what the user wants it to do?", not
"is it mathematically pure?". The current behaviour of sum([]) does the
right thing for the 99% of the time when users expect an integer. And the
rest of the time, they have to specify a starting value for the sum
anyway, and so sum([], initial_value) does the right thing *always*.
The only time it does the wrong thing[1] is when you forget to pass an
initial value but expect a non-numeric result. And that's the
programmer's error, not a function bug.
[1] I believe it also does the wrong thing by refusing to sum strings,
but that's another story.
--
Steven
> bearoph...@lycos.com wrote:
> > Empty Python lists [] don't know the type of the items it will
> > contain, so this sounds strange:
> >
> >
> > >>> sum([])
> > 0
> >
> > Because that [] may be an empty sequence of someobject:
> >
>
> You are right in that sum could be used to sum arbitrary objects.
> However, in 99.99% of the cases, you will be summing numerical values.
> When adding real numbers, the neutral element is zero. ( X + 0 = X) It
> is very logical to return zero for empty sequences.
Even better:
help(sum) shows
===
sum(...)
sum(sequence, start=0) -> value
Returns the sum of a sequence of numbers (NOT strings) plus the value
of parameter 'start'. When the sequence is empty, returns start.
===
so the fact that sum([]) returns zero is just because the start value is zero...
sum([],object()) would return an object().
BTW, the original code:
>>> sum(s for s in ["a", "b"] if len(s) > 2)
wouldn't work anyway... it seems that sum doesn't like to sum strings:
>>> sum(['a','b'],'')
<type 'exceptions.TypeError'>: sum() can't sum strings [use ''.join(seq) instead]
Cheers,
--
Luis Zarrabeitia
Facultad de Matemática y Computación, UH
http://profesores.matcom.uh.cu/~kyrie
Then 99.9% of users want the wrong thing. Microsoft knows that
this is a bug but refuses to fix it to prevent breaking legacy
documents (probably dating back to VisiCalc). When graphing data,
a missing value should be interpreted as a hole in the graph
+------+             +--+------+------+-----+
and not evaluated as 0
+------+             +--+------+------+-----+
        \           /
         \         /
          \       /
           \     /
            \   /
             \+/
(depending on the context of the graph, of course).
And Microsoft provides a workaround for graphs to make 0's
appear as holes. Of course, this will cause legitimate 0
values to disappear, so the workaround is inconsistent.
> Spreadsheet sum() is not the
> same as mathematician's sum, which doesn't have a concept of "blank
> cells". (But if it did, it would treat them as zero, since that's the
> only useful thing and mathematicians are just as much pragmatists as
> spreadsheet users.) The Excel code does the right thing, and your "pure"
> solution would do the unwanted and unexpected thing and is therefore
> buggy.
Apparently, you don't use databases or make surface contours.
Contour programs REQUIRE that blanks are null, not 0, so that
the Kriging algorithm interpolates around the holes rather than
return false calculations. Excel's treatment of blank cells is
inconsistent with Access' treatment of Nulls and therefore wrong,
anyway you slice it. Math isn't a democracy, what most people want
is irrelevant.
I don't pull these things out of my ass, it's real world stuff
I observe when I help CAD operators and such debug problems.
Maybe you want to say a bug is when it doesn't do what the
author intended, but I say if what the intention was is wrong,
then a perfect implementation is still a bug because it doesn't
do what it's supposed to do.
>
> Bugs are defined by "does the code do what the user wants it to do?", not
> "is it mathematically pure?".
Really? So you think math IS a democracy? There is no reason to
violate mathematical purity. If I don't get EXACTLY the same answer
from Excel, Access, Mathematica and Python, then SOMEBODY is wrong.
It would be a shame if that somebody was Python.
> The current behaviour of sum([]) does the
> right thing for the 99% of the time when users expect an integer.
Why shouldn't the users expect an exception? Isn't that why we have
try:except? Maybe 99% of users expect sum([])==0, but _I_ expect to
be able to distinguish an empty list from [4,-4].
> And the
> rest of the time, they have to specify a starting value for the sum
> anyway, and so sum([], initial_value) does the right thing *always*.
So if you really want [] to be 0, why not say sum([],0)?
Why shouldn't nothing added to nothing return nothing?
Having it evaluate to 0 is wrong 99.9% of the time.
I just checked and I mis-remembered how this works.
The option is for blanks to plot as holes or 0 or
be interpolated. 0 always plots as 0. The inconsistency
is that blanks are still evaluated as 0 in formulae
and macros.
> No it isn't. Nothing is not 0, check with MS-Access, for instance:
>
> Null + 1 returns Null. Any arithmetic expression involving a
> Null evaluates to Null. Adding something to an unknown returns
> an unknown, as it should.
>
> It is a logical fallacy to equate unknown with 0.
http://en.wikipedia.org/wiki/Empty_sum
"In mathematics, the empty sum, or nullary sum, is the result of adding
no numbers, in summation for example. Its numerical value is zero."
</F>
> On Sep 3, 8:30 pm, Steven D'Aprano <st...@REMOVE-THIS-
> cybersource.com.au> wrote:
>> On Wed, 03 Sep 2008 16:20:39 -0700, Mensanator wrote:
>> >>>> sum([])
>> > 0
>>
>> > is a bug, just as it's a bug in Excel to evaluate blank cells as 0.
>> > It should return None or throw an exception like sum([None,1]) does.
>>
>> You're wrong, because 99.9% of the time when users leave a blank cell
>> in Excel, they want it to be treated as zero.
>
> Then 99.9% of users want the wrong thing.
It is to laugh.
> Microsoft knows that this is a bug
Says you.
> but refuses to fix it to prevent breaking legacy documents (probably
> dating back to VisiCalc). When graphing data, a missing value should be
> interpreted as a hole in the graph
"Graphing data" is not sum(). I don't expect graphing data to result in
the same result as sum(), why would I expect them to interpret input the
same way?
> +------+ +--+------+------+-----+
Why should the graphing application ignore blanks ("missing data"), but
sum() treat missing data as an error? That makes no sense at all.
> and not evaluated as 0
>
> And Microsoft provides a workaround for graphs to make 0's appear as
> holes. Of course, this will cause legitimate 0 values to disappear, so
> the workaround is inconsistent.
I'm not aware of any spreadsheet that treats empty cells as zero for the
purpose of graphing, and I find your claim that Excel can't draw graphs
with zero in them implausible, but I don't have a copy of Excel to test
it.
>> Spreadsheet sum() is not the
>> same as mathematician's sum, which doesn't have a concept of "blank
>> cells". (But if it did, it would treat them as zero, since that's the
>> only useful thing and mathematicians are just as much pragmatists as
>> spreadsheet users.) The Excel code does the right thing, and your
>> "pure" solution would do the unwanted and unexpected thing and is
>> therefore buggy.
>
> Apparently, you don't use databases or make surface contours.
Neither databases nor surface contours are sum(). What possible relevance
are they to the question of what sum() should do?
Do you perhaps imagine that there is only "ONE POSSIBLE CORRECT WAY" to
deal with missing data, and every function and program must deal with it
the same way?
> Contour programs REQUIRE that blanks are null, not 0
Lucky for them that null is not 0 then.
> so that the Kriging
> algorithm interpolates around the holes rather than return false
> calculations. Excel's treatment of blank cells is inconsistent with
> Access' treatment of Nulls and therefore wrong, anyway you slice it.
No no no, you messed that sentence up. What you *really* meant was:
"Access' treatment of Nulls is inconsistent with Excel's treatment of
blank cells and therefore wrong, anyway you slice it."
No of course not. That would be stupid, just as stupid as your sentence.
Excel is not Access. They do different things. Why should they
necessarily interpret data the same way?
> Maybe you want to say a bug is when it doesn't do what the author
> intended, but I say if what the intention was is wrong, then a perfect
> implementation is still a bug because it doesn't do what it's supposed to
> do.
Who decides what it is supposed to do if not the author? You, in your
ivory tower who doesn't care a fig for what people want the software to
do?
Bug report: "Software does what users want it to do."
Fix: "Make the software do something that users don't want."
Great.
>> Bugs are defined by "does the code do what the user wants it to do?",
>> not "is it mathematically pure?".
>
> Really? So you think math IS a democracy? There is no reason to violate
> mathematical purity.
You've given a good example yourself: the Kriging algorithm needs a Null
value which is not zero. There is no mathematical "null" which is
distinct from zero, so there's an excellent violation of mathematical
purity right there.
If I am given the job of adding up the number of widgets inside a box,
and the box is empty, I answer that there are 0 widgets inside it. If I
were to follow your advice and declare that "An error occurred, can't
determine the number of widgets inside an empty box!" people would treat
me as an idiot, and rightly so.
> If I don't get EXACTLY the same answer from Excel,
> Access, Mathematica and Python, then SOMEBODY is wrong. It would be a
> shame if that somebody was Python.
Well Excel, Python agree that the sum of an empty list is 0. What do
Access and Mathematica do?
>> The current behaviour of sum([]) does the right thing for the 99% of
>> the time when users expect an integer.
>
> Why shouldn't the users expect an exception? Isn't that why we have
> try:except? Maybe 99% of users expect sum([])==0, but _I_ expect to be
> able to distinguish an empty list from [4,-4].
The way to distinguish lists is NOT to add them up and compare the sums:
>>> sum([4, -4]) == sum([0]) == sum([1, 2, 3, -6]) == sum([-1, 2, -1])
True
The correct way is by comparing the lists themselves:
>>> [] == [4, -4]
False
>> And the
>> rest of the time, they have to specify a starting value for the sum
>> anyway, and so sum([], initial_value) does the right thing *always*.
>
> So if you really want [] to be 0, why not say sum([],0)?
I don't want [] == 0. That's foolish. I want the sum of an empty list to
be 0, which is a very different thing.
And I don't need to say sum([],0) because the default value for the
second argument is 0.
> Why shouldn't nothing added to nothing return nothing? Having it
> evaluate to 0 is wrong 99.9% of the time.
It is to laugh.
What's the difference between having 0 widgets in a box and having an
empty box with, er, no widgets in it?
--
Steven
Maybe it's important to know data is missing. You can see
the holes in a graph. You can't see the holes in a sum.
>
> > and not evaluated as 0
>
> > And Microsoft provides a workaround for graphs to make 0's appear as
> > holes. Of course, this will cause legitimate 0 values to disappear, so
> > the workaround is inconsistent.
>
> I'm not aware of any spreadsheet that treats empty cells as zero for the
> purpose of graphing, and I find your claim that Excel can't draw graphs
> with zero in them implausible, but I don't have a copy of Excel to test
> it.
That was a mistake. I made a followup correction, but
you probably didn't see it.
>
> >> Spreadsheet sum() is not the
> >> same as mathematician's sum, which doesn't have a concept of "blank
> >> cells". (But if it did, it would treat them as zero, since that's the
> >> only useful thing and mathematicians are just as much pragmatists as
> >> spreadsheet users.) The Excel code does the right thing, and your
> >> "pure" solution would do the unwanted and unexpected thing and is
> >> therefore buggy.
>
> > Apparently, you don't use databases or make surface contours.
>
> Neither databases nor surface contours are sum(). What possible relevance
> are they to the question of what sum() should do?
Because a sum that includes Nulls isn't valid. If you treated
Nulls as 0, then not only would your sum be wrong, but so
would your count and the average based on those. Now you
can EXPLICITLY tell the database to only consider non-Null
values, which doesn't change the total, but DOES change
the count.
>
> Do you perhaps imagine that there is only "ONE POSSIBLE CORRECT WAY" to
> deal with missing data, and every function and program must deal with it
> the same way?
But that's what sum() is doing now, treating sum([]) the same
as sum([],0). Why isn't sum() defined such that "...if list
is empty, return start, IF SPECIFIED, otherwise raise exception."
Then, instead of "ONE POSSIBLE CORRECT WAY", the user could
specify whether he wants Excel compatible behaviour or
Access compatible behaviour.
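The definition proposed here can be sketched in a few lines. strict_sum and _MISSING are hypothetical names invented for this illustration and are not part of Python; the sketch assumes Python 3:

```python
_MISSING = object()  # sentinel meaning "no start value was given"

def strict_sum(iterable, start=_MISSING):
    """Hypothetical variant of sum(): an empty input is an error
    unless the caller supplies a start value explicitly."""
    items = iter(iterable)
    if start is _MISSING:
        try:
            total = next(items)  # seed the total with the first item
        except StopIteration:
            raise ValueError("strict_sum() arg is an empty sequence") from None
    else:
        total = start
    for item in items:
        total = total + item
    return total

print(strict_sum([1, 2, 3]))  # ordinary case
print(strict_sum([], 0))      # empty, but start given explicitly
```

With this design the caller who wants 0 for an empty input writes strict_sum(L, 0), and everyone else gets a ValueError, much like max([]).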
>
> > Contour programs REQUIRE that blanks are null, not 0
>
> Lucky for them that null is not 0 then.
No, but blank cells are 0 as far as Excel is concerned.
That behaviour causes nothing but trouble and I am
saddened to see Python emulate such nonsense.
>
> > so that the Kriging
> > algorithm interpolates around the holes rather than return false
> > calculations. Excel's treatment of blank cells is inconsistent with
> > Access' treatment of Nulls and therefore wrong, anyway you slice it.
>
> No no no, you messed that sentence up. What you *really* meant was:
>
> "Access' treatment of Nulls is inconsistent with Excel's treatment of
> blank cells and therefore wrong, anyway you slice it."
>
> No of course not. That would be stupid, just as stupid as your sentence.
> Excel is not Access. They do different things. Why should they
> necessarily interpret data the same way?
Because you want consistent results?
>
> > Maybe you want to say a bug is when it doesn't do what the author
> > intended, but I say if what the intention was is wrong, then a perfect
> > implementation is still a bug because it doesn't do what it's supposed to
> > do.
>
> Who decides what it is supposed to do if not the author?
The author can't change math on a whim.
> You, in your ivory tower who doesn't care a fig for
> what people want the software to do?
True, I could care less what people want to do...
...as long as they do it consistently.
>
> Bug report: "Software does what users want it to do."
> Fix: "Make the software do something that users don't want."
What the users want doesn't carry any weight with respect
to what the database wants. The user must conform to the
needs of the database because the other way ain't ever gonna
happen.
>
> Great.
If only. But then, I probably wouldn't have a job.
>
> >> Bugs are defined by "does the code do what the user wants it to do?",
> >> not "is it mathematically pure?".
>
> > Really? So you think math IS a democracy? There is no reason to violate
> > mathematical purity.
>
> You've given a good example yourself: the Kriging algorithm needs a Null
> value which is not zero. There is no mathematical "null" which is
> distinct from zero, so there's an excellent violation of mathematical
> purity right there.
Hey, I was talking databases, you brought up mathematical purity.
>
> If I am given the job of adding up the number of widgets inside a box,
> and the box is empty, I answer that there are 0 widgets inside it.
Right. It has a known quantity and that quantity is 0.
Just because the box is empty doesn't mean the quantity
is Null.
> If I
> were to follow your advice and declare that "An error occurred, can't
> determine the number of widgets inside an empty box!" people would treat
> me as an idiot, and rightly so.
Right. But a better analogy is when a new shipment is due
but hasn't arrived yet so the quantity is unknown. Now the
boss comes up and says he needs to ship 5 widgets tomorrow
and asks how many you have. You say 0. Now the boss runs
out to Joe's Widget Emporium and pays retail only to discover
when he gets back that the shipment has arrived containing
12 widgets. Because you didn't say "I don't know, today's
shipment isn't here yet", the boss not only thinks you're
an idiot, but he fires you as well.
>
> > If I don't get EXACTLY the same answer from Excel,
> > Access, Mathematica and Python, then SOMEBODY is wrong. It would be a
> > shame if that somebody was Python.
>
> Well Excel, Python agree that the sum of an empty list is 0. What do
> Access and Mathematica do?
I don't know about Mathematica, but if you EXPLICITLY
tell Access to sum only the non-Null values, you'll get the
same answer Excel does. Otherwise, any expression that
includes a Null evaluates to Null, which certainly isn't
the same answer Excel gives.
>
> >> The current behaviour of sum([]) does the right thing for the 99% of
> >> the time when users expect an integer.
>
> > Why shouldn't the users expect an exception? Isn't that why we have
> > try:except? Maybe 99% of users expect sum([])==0, but _I_ expect to be
> > able to distinguish an empty list from [4,-4].
>
> The way to distinguish lists is NOT to add them up and compare the sums:
>
> >>> sum([4, -4]) == sum([0]) == sum([1, 2, 3, -6]) == sum([-1, 2, -1])
>
> True
>
> The correct way is by comparing the lists themselves:
>
> >>> [] == [4, -4]
>
> False
>
> >> And the
> >> rest of the time, they have to specify a starting value for the sum
> >> anyway, and so sum([], initial_value) does the right thing *always*.
>
> > So if you really want [] to be 0, why not say sum([],0)?
>
> I don't want [] == 0. That's foolish. I want the sum of an empty list to
> be 0, which is a very different thing.
In certain circumstances. In others, an empty list summing
to 0 is just as foolish. That's why sum([]) should be an
error, so you can have it either way.
Isn't one of Python's slogans "Explicit is better than implicit"?
>
> And I don't need to say sum([],0) because the default value for the
> second argument is 0.
That's the problem. There is no justification for assuming
that unknown quantities are 0.
>
> > Why shouldn't nothing added to nothing return nothing? Having it
> > evaluate to 0 is wrong 99.9% of the time.
>
> It is to laugh.
>
> What's the difference between having 0 widgets in a box and having an
> empty box with, er, no widgets in it?
There are no "empty" boxes. There are only boxes with
known quantities and those with unknown quantities.
I hope that's not too ivory tower.
>
> --
> Steven
> No, but blank cells are 0 as far as Excel is concerned.
> That behaviour causes nothing but trouble and I am
> saddened to see Python emulate such nonsense.
Then you should feel glad that the Python sum() function *does*
signal an error for the closest equivalent of "blank cells" in
a list:
>>> sum([1, 2, 3, None, 5, 6])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
Summing the elements of an empty list is *not* the same thing as
summing elements of a list where one element is None.
> There are no "empty" boxes. There are only boxes with
> known quantities and those with unknown quantities.
> I hope that's not too ivory tower.
The sum() function in Python requires exactly one box. That box
can be empty, can contain "known quantities" (numbers, presumably),
or "unknown quantities" (non-numbers, e.g., None). But you can't
give it zero boxes, or three boxes.
I don't have a strong view of whether sum([]) should return 0 or
raise an error, but please do not mix that question up with what
a sum over empty cells or over NULL values should yield. They
are very different questions.
As it happens, the SQL sum() function (at least in MySQL; I don't
have any other database easily available, nor any SQL standard to
read) does return NULL for a sum over the empty sequence, so you
could argue that that would be the correct behaviour for the
Python sum() function as well, but you can't argue that because a
sum *involving* a NULL value returns NULL.
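The three cases can be told apart with a quick experiment. This sketch uses SQLite through Python's sqlite3 module, which is an assumption made for convenience (the post refers to MySQL, and other databases may differ); the table t and column x are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")

# SUM over zero rows yields NULL (None in Python), not 0.
empty_sum = conn.execute("SELECT SUM(x) FROM t").fetchone()[0]

conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])
# The SUM aggregate skips NULL values rather than being poisoned by them.
mixed_sum = conn.execute("SELECT SUM(x) FROM t").fetchone()[0]
# A plain arithmetic expression involving NULL evaluates to NULL.
null_plus_one = conn.execute("SELECT NULL + 1").fetchone()[0]

print(empty_sum, mixed_sum, null_plus_one)
conn.close()
```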
--
Thomas Bellman, Lysator Computer Club, Linköping University, Sweden
"This isn't right. This isn't even wrong." ! bellman @ lysator.liu.se
-- Wolfgang Pauli ! Make Love -- Nicht Wahr!
> On Sep 3, 2:18 pm, Laszlo Nagy <gand...@shopzeus.com> wrote:
> > bearophileH...@lycos.com wrote:
> > > Empty Python lists [] don't know the type of the items it will
> > > contain, so this sounds strange:
> >
> > >>>> sum([])
> >
> > > 0
> >
> > > Because that [] may be an empty sequence of someobject:
> >
> > You are right in that sum could be used to sum arbitrary objects.
> > However, in 99.99% of the cases, you will be summing numerical values.
> > When adding real numbers, the neutral element is zero. ( X + 0 = X) It
> > is very logical to return zero for empty sequences.
>
> No it isn't. Nothing is not 0, check with MS-Access, for instance:
>
> Null + 1 returns Null. Any arithmetic expression involving a
> Null evaluates to Null. Adding something to an unknown returns
> an unknown, as it should.
>
> It is a logical fallacy to equate unknown with 0.
Which has nothing to do with the "right" value for an
empty sum. If they hear about what you said here in
sci.math they're gonna kick you out - what do you
imagine the universally accepted value of \sum_{j=1}^0
is?
--
David C. Ullrich
> Empty Python lists [] don't know the type of the items it will
> contain, so this sounds strange:
>
> >>> sum([])
> 0
>
> Because that [] may be an empty sequence of someobject:
>
> >>> sum(s for s in ["a", "b"] if len(s) > 2)
> 0
>
> In a statically typed language in that situation you may answer the
> initializer value of the type of the items of the list, as I do in the
> sum() in D.
>
> This sounds like a more correct/clean thing to do:
>
> >>> max([])
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> ValueError: max() arg is an empty sequence
>
> So it may be better to make the sum([]) too raise a ValueError,
I don't see why you feel the two should act the same.
At least in mathematics, the sum of the elements of
the empty set _is_ 0, while the maximum element of the
empty set is undefined.
And both for good reason:
(i) If A and B are disjoint sets we certainly want to
have sum(A union B) = sum(A) + sum(B). This requires
sum(empty set) = 0.
(ii) If A is a subset of B then we should have
max(A) <= max(B). This requires that max(empty set)
be something that's smaller than everything else.
So we give up on that.
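Both points can be checked directly in Python. A minimal sketch, with A and B as example sets chosen for illustration:

```python
A = {1, 2, 3}
B = set()  # the empty set is disjoint from every set

# (i) Additivity over disjoint unions forces sum(empty set) == 0.
assert sum(A | B) == sum(A) + sum(B)
empty_total = sum(B)
print(empty_total)

# (ii) No value is smaller than everything else, so max() gives up.
try:
    max(B)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```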
> in
> Python 3/3.1 (if this isn't already true). On the other hand often
> enough I have code like this:
>
> >>> max(fun(x) for x in iterable if predicate(x))
>
> This may raise the ValueError both if iterable is empty or if the
> predicate on its items is always false, so instead of catching
> exceptions, that I try to avoid, I usually end with a normal loop,
> that's readable and fast:
>
> max_value = smallvalue
> for x in iterable:
> if predicate(x):
> max_value = max(max_value, fun(x))
>
> Where running speed matters, I may even replace that max(max_value,
> fun(x)) with a more normal if/else.
>
> A possible alternative is to add a default to max(), like the next()
> built-in of Python 2.6:
>
> >>> max((fun(x) for x in iterable if predicate(x)), default=smallvalue)
>
> This returns smallvalue if there are no items to compute the max of.
>
> Bye,
> bearophile
--
David C. Ullrich
I'm less concerned about the "right" value than a consistent
value. I'm fairly certain you can't get 0 from a query that
returns no records, so I don't like seeing empty being
treated as 0, even if it means that in set theory because
databases aren't sets.
> If they hear about what you said here in
> sci.math they're gonna kick you out
They usually don't kick me out, just kick me.
> - what do you
> imagine the universally accepted value of \sum_{j=1}^0
> is?
I can't follow your banter, so I'm not sure what it should be.
Yes, I am in fact happy to see that behaviour.
>
> Summing the elements of an empty list is *not* the same thing as
> summing elements of a list where one element is None.
So,
>>> sum([1, 2, 3, None, 5, 6])
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    sum([1, 2, 3, None, 5, 6])
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
gives me an error.
As does
>>> sum([None, None, None, None, None, None])
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    sum([None, None, None, None, None, None])
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
Why then, doesn't
>>> sum([A for A in [None, None, None, None, None, None] if A != None])
0
give me an error?
Ok, it's not a bug.
"This behaviour is by design." - Microsoft Knowledge Base
I don't like it, but I guess I'll just have to live with it.
>
> > There are no "empty" boxes. There are only boxes with
> > known quantities and those with unknown quantities.
> > I hope that's not too ivory tower.
>
> The sum() function in Python requires exactly one box. That box
> can be empty, can contain "known quantities" (numbers, presumably),
> or "unknown quantities" (non-numbers, e.g., None). But you can't
> give it zero boxes, or three boxes.
>
> I don't have a strong view of whether sum([]) should return 0 or
> raise an error, but please do not mix that question up with what
> a sum over empty cells or over NULL values should yield. They
> are very different questions.
Ok, but I don't understand why an empty list is a valid sum
whereas a list containing None is not.
>
> As it happens, the SQL sum() function (at least in MySQL; I don't
> have any other database easily available, nor any SQL standard to
> read) does return NULL for a sum over the empty sequence, so you
> could argue that that would be the correct behaviour for the
> Python sum() function as well, but you can't argue that because a
> sum *involving* a NULL value returns NULL.
I'm not following that. Are you saying a query that returns no
records doesn't have a specific field containing a Null so there
are no Nulls to poison the sum? ...tap...tap...tap. Ok, I can see
that, but you don't get 0 either.
> Why then, doesn't
>
>>>> sum([A for A in [None, None, None, None, None, None] if A != None])
> 0
>
> give me an error?
Because "[A for A in [None, None, None, None, None, None] if A != None]"
returns an empty list, and sum([]) doesn't return an error. What did you
expect?
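The two steps can be checked separately. A minimal sketch, where `values`, `filtered` and `total` are names made up for illustration:

```python
values = [None, None, None, None, None, None]

# The filter drops every None, so nothing is left to add up.
filtered = [a for a in values if a is not None]

# sum() begins with its default start value, 0, and adds each element to it.
# With no elements, the start value comes back unchanged.
total = sum(filtered)

print(filtered, total)
```

By the time sum() runs, the None values are already gone, so it never sees anything it cannot add.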
--
Regards,
Wojtek Walczak,
http://tosh.pl/gminick/
What do you think about my idea of adding that 'default' argument to
the max()/min() functions?
Bye,
bearophile
For max and min, why can't you just add your argument to the set
itself?
The reason max([]) is undefined is that max( S ) is in S. The reason
sum([]) is 0 is that sum( [ x ] ) - x = 0.
Sometimes that can be done, but in many other situations it's less
easy, like in the example I have shown in my first post:
max((fun(x) for x in iterable if predicate(x)))
There are some ways to add the default there, for example using
itertools.chain to chain the default value to the end of the iterable,
but most of the time I just write a for loop.
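A rough sketch of the itertools.chain approach, for comparison with the loop. The helper name `max_with_default` is hypothetical, and `fun`, `predicate` and `smallvalue` are stand-ins defined here only so the example runs:

```python
from itertools import chain


def max_with_default(values, default):
    """Return the largest item of values, or default if values is empty."""
    # Chaining the default onto the end guarantees max() sees at least one item.
    return max(chain(values, [default]))


# Stand-in definitions, assumed for this example only.
def fun(x):
    return x * x


def predicate(x):
    return x % 2 == 0


smallvalue = -1

no_match = max_with_default((fun(x) for x in [1, 3, 5] if predicate(x)), smallvalue)
some_match = max_with_default((fun(x) for x in [1, 2, 4] if predicate(x)), smallvalue)
print(no_match, some_match)
```

Note that the default takes part in the comparison, so it has to be smaller than any real result. Python 3.4 and later also accept a default keyword on max() and min(), which avoids that restriction.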
Bye,
bearophile
> Ok, but I don't understand why an empty list is a valid sum
> whereas a list containing None is not.
You can't conclude the behaviour of the one from the behaviour
of the other, because the two situations have nothing at all in
common.
>> As it happens, the SQL sum() function (at least in MySQL; I don't
>> have any other database easily available, nor any SQL standard to
>> read) does return NULL for a sum over the empty sequence, so you
>> could argue that that would be the correct behaviour for the
>> Python sum() function as well, but you can't argue that because a
>> sum *involving* a NULL value returns NULL.
> I'm not following that. Are you saying a query that returns no
> records doesn't have a specific field containing a Null so there
> are no Nulls to poison the sum? ...tap...tap...tap. Ok, I can see
> that,
Exactly.
> but you don't get 0 either.
That's because the SQL sum() has a special case for "no rows
returned". A *different* special case than the one that taints
the sum when encountering a NULL. It does the equivalent of
    if len(rows_returned) == 0:
        # Special case for no rows returned
        return NULL
    total = 0
    for row in rows_returned:
        value = row[column]
        if value is NULL:
            # Special case for encountering a NULL value
            return NULL
        total += value
    return total
Two different special cases for the two different situations. If
you were to remove the special case for no rows returned, you
would get zero when the SELECT statement finds no rows, but the
sum would still be tainted when a NULL value is encountered.
The definition of sum in mathematics *does* do away with that
special case. The sum of zero terms is zero. And the Python
sum() function follows the mathematics definition in this
respect, not the SQL definition.
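The two special cases can be pulled apart in runnable form. This is only a sketch of the behaviour as described in this post, not of any particular database engine; None stands in for NULL, and the function names are made up:

```python
NULL = None  # stand-in for SQL NULL in this sketch


def sql_style_sum(values):
    """Sum with both special cases described above."""
    values = list(values)
    if not values:
        # Special case 1: no rows returned
        return NULL
    total = 0
    for value in values:
        if value is NULL:
            # Special case 2: a NULL value taints the whole sum
            return NULL
        total += value
    return total


def math_style_sum(values):
    """Same as above with special case 1 removed: zero terms sum to zero."""
    total = 0
    for value in values:
        if value is NULL:
            return NULL
        total += value
    return total


print(sql_style_sum([]), math_style_sum([]))
print(sql_style_sum([1, NULL]), math_style_sum([1, NULL]))
```

Removing the first special case changes only the empty-input result; the handling of a NULL element is untouched, which is the sense in which the two questions are independent.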
You can argue that Python sum() should have special cased the
empty sequence. It's not an illogical stance to take. It's just
a totally different issue from encountering a non-numeric element
in the sequence. In some cases it might actually make sense to
treat the empty sequence as an error, but just ignore non-numeric
elements (i.e., treat them as if they were zero). And in some
cases both should be an error, and in some neither should be an
error.
--
Thomas Bellman, Lysator Computer Club, Linköping University, Sweden
"You are in a twisty little passage of ! bellman @ lysator.liu.se
standards, all conflicting." ! Make Love -- Nicht Wahr!
I wouldn't say they have nothing in common. After all, neither []
nor [None,None] contain any integers, yet summing the first gives
us an integer result whereas summing the second does not. Yes, for
different unrelated reasons, but sometimes reasons aren't as important
as results.
Too bad. I brought this up because I use Python a lot with
database work and rarely for proving theorems in ZFC.
Guess I have to work around it. Just one more 'gotcha' to
keep track of.
>
> You can argue that Python sum() should have special cased the
> empty sequence.
I did.
> It's not an illogical stance to take.
I didn't think so.
> It's just
> a totally different issue from encountering a non-numeric element
> in the sequence.
Ok, I was paying more attention to the outcome.
> In some cases it might actually make sense to
> treat the empty sequence as an error, but just ignore non-numeric
> elements (i.e., treat them as if they were zero).
Ouch. Sounds like Excel and we don't want to go there.
It makes sense.
>The reason sum([]) is 0 is that sum( [ x ] ) - x = 0.
It doesn't make sense to me. What do you set x to?
I suppose the following is accepted by statisticians. For
reference, here is what the 'R' statistics package says on
the subject (if you type 'help(sum)'):
<quote>
Sum of Vector Elements
Description
sum returns the sum of all the values present in its arguments.
Usage
sum(..., na.rm = FALSE)
Arguments
... numeric or complex or logical vectors.
na.rm logical. Should missing values be removed?
Details
This is a generic function: methods can be defined for it directly or
via the Summary group generic. For this to work properly, the arguments
... should be unnamed, and dispatch is on the first argument.
If na.rm is FALSE an NA value in any of the arguments will cause a value
of NA to be returned, otherwise NA values are ignored.
Logical true values are regarded as one, false values as zero. For
historical reasons, NULL is accepted and treated as if it were integer(0).
Value
The sum. If all of ... are of type integer or logical, then the sum is
integer, and in that case the result will be NA (with a warning) if
integer overflow occurs. Otherwise it is a length-one numeric or complex
vector.
NB: the sum of an empty set is zero, by definition.
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S
Language. Wadsworth & Brooks/Cole.
>
> I don't see why you feel the two should act the same.
> At least in mathematics, the sum of the elements of
> the empty set _is_ 0, while the maximum element of the
> empty set is undefined.
>
> And both for good reason:
>
> (i) If A and B are disjoint sets we certainly want to
> have sum(A union B) = sum(A) + sum(B). This requires
> sum(empty set) = 0.
>
> (ii) If A is a subset of B then we should have
> max(A) <= max(B). This requires that max(empty set)
> be something that's smaller than everything else.
> So we give up on that.
Do we give up? Really ?
From wikipedia: http://en.wikipedia.org/wiki/Empty_set
(Uses wikipedia's LaTeX notation -- I hope those interested
are OK with that )
<quote>
Mathematics
Extended real numbers
Since the empty set has no members, when it is considered as a subset of
any ordered set, then any member of that set will be an upper bound and
lower bound for the empty set. For example, when considered as a subset
of the real numbers, with its usual ordering, represented by the real
number line, every real number is both an upper and lower bound for the
empty set.[3] When considered as a subset of the extended reals formed
by adding two "numbers" or "points" to the real numbers, namely negative
infinity, denoted -\infty\!\,, which is defined to be less than every
other extended real number, and positive infinity, denoted +\infty\!\,,
which is defined to be greater than every other extended real number, then:
\sup\varnothing=\min(\{-\infty, +\infty \} \cup \mathbb{R})=-\infty,
and
\inf\varnothing=\max(\{-\infty, +\infty \} \cup \mathbb{R})=+\infty.
That is, the least upper bound (sup or supremum) of the empty set is
negative infinity, while the greatest lower bound (inf or infimum) is
positive infinity. By analogy with the above, in the domain of the
extended reals, negative infinity is the identity element for the
maximum and supremum operators, while positive infinity is the identity
element for minimum and infimum.
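In Python terms, the identity-element remark can be sketched as a fold that starts from negative infinity. The name `sup` is chosen here for illustration, and the sketch assumes ordinary real-valued numbers:

```python
from functools import reduce

NEG_INF = float("-inf")


def sup(values):
    """Fold max over values, starting from the identity element -infinity."""
    return reduce(max, values, NEG_INF)


a = []
b = [3, 1, 2]

# With -infinity as the starting value, the subset property survives
# even when one of the parts is empty.
print(sup(a), sup(b), sup(a + b) == max(sup(a), sup(b)))
```

This mirrors how sum() starts from 0, the identity element for addition. The difference is that -infinity is not an element of the input, so the result for an empty input is a supremum and not a maximum in the strict sense.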
> David C. Ullrich wrote:
>
> >
> > I don't see why you feel the two should act the same.
> > At least in mathematics, the sum of the elements of
> > the empty set _is_ 0, while the maximum element of the
> > empty set is undefined.
> >
> > And both for good reason:
> >
> > (i) If A and B are disjoint sets we certainly want to
> > have sum(A union B) = sum(A) + sum(B). This requires
> > sum(empty set) = 0.
> >
> > (ii) If A is a subset of B then we should have
> > max(A) <= max(B). This requires that max(empty set)
> > be something that's smaller than everything else.
> > So we give up on that.
>
> Do we give up? Really ?
Erm, thanks. I was aware of all that below. If we're
being technical what's below is talking about the sup
and inf, which are not the same as max and min. More
relevant to the present context, I didn't mention what's
below because it doesn't seem likely that saying max([])
= -infinity and min([]) = +infinity is going to make the
OP happy...
--
David C. Ullrich
How the Python max and min functions should work has to
do with how people want them to work and how people expect
them to work. I wouldn't know about most people, but I
would have been surprised if min([]) was not an error,
and I would have been disappointed if sum([]) was not 0.
From a mathematical point of view, not that that's directly
relevant, it doesn't make much sense to me to add that default
argument. The max of a set is supposed to be the largest
element of that set. If the set is empty there's no such
thing.
In Python you'd better make sure that S is nonempty before
asking for max(S). That's not just Python - in math you need
to make certain that S is nonempty and also other conditions
before you're allowed to talk about max(S). That's just the
way it is.
Think about all the previously elected female or black
presidents of the US. Which one was the tallest?
Well, it sounds cute having Neginfinite and Infinite as built-in
objects that can be compared to any other type and are less than or
greater than everything else but themselves. Probably they can be useful as
sentinels, but in Python I nearly never use sentinels anymore, and
they can probably give some other problems...
Bye,
bearophile
Sure, and in most cases I use Visual Basic for Applications
when I need functionality I can't get directly from SQL.
But anybody who's used VBA with Access must know what a PITA
it is. And even when you get it working, you sometimes wish you
hadn't. I have a Mann-Kendall trend analysis that must be done
quarterly on over 150 combinations of well:analyte. It takes
over 6 hours to process this (and I don't know how much is due to
VBA, Access, server, network, etc.). It's something I'd love to
try in Python (if I can find the time to translate it).
But I'm wary of things that Python might do (such as return 0
when summing an empty list) that SQL/VBA does not.
> --
> Wulfraed Dennis Lee Bieber KD6MOG
> wlfr...@ix.netcom.com wulfr...@bestiaria.com
> HTTP://wlfraed.home.netcom.com/
> (Bestiaria Support Staff: web-a...@bestiaria.com)
> HTTP://www.bestiaria.com/
Of course you were aware, I have seen enough of your posts
to know that. And I agree that, whatever Wikipedia seems to
imply, max and supremum should be distinguished.
It was your prelude, "At least in mathematics ..." that
made me prick up my ears. So I couldn't resist responding,
without _any_ malice I assure you.
Cheers,
Ken.
> Think about all the previously elected female or black presidents of the
> US. Which one was the tallest?
I know the answer to that one:
All of them!
--
Steven
But then how can you conclude sum([]) = 0 from there? It's way far
from obvious.
> On Fri, Sep 5, 2008 at 1:04 PM, castironpi <casti...@gmail.com> wrote:
...
>>> >The reason sum([]) is 0 is that sum( [ x ] ) - x = 0.
>>>
>>> It doesn't make sense to me. What do you set x to?
>>
>> For all x.
>
> But then how can you conclude sum([]) = 0 from there? It's way far from
> obvious.
I think Castironpi's reasoning is to imagine taking sum([x])-x for *any*
possible x (where subtraction and addition is defined). Naturally you
always get 0.
Now replace x by *nothing at all* and you get:
sum([]) "subtract nothing at all" = 0
I think that this is a reasonable way to *informally* think about the
question, but it's not mathematically sound, because if you replace x
with "nothing at all" you either get:
sum([]) - = 0
which is invalid (only one operand to the subtraction operator), or you
get:
sum([0]) - 0 = 0
which doesn't involve an empty list. What castironpi seems to be doing is
replacing "nothing at all" with, er, nothing at all in one place, and
zero in the other. And that's what makes it unsound and only suitable as
an informal argument.
[The rest of this is (mostly) aimed at Mensanator, so others can stop
reading if they like.]
Fundamentally, the abstract function "sum" and the concrete Python
implementation of sum() are both human constructs. It's not like there is
some pure Platonic[1] "Ideal Sum" floating in space that we can refer to.
Somewhere, sometime, some mathematician had to *define* sum(), and other
mathematicians had to agree to use the same definition.
They could have decided that sum must take at least two arguments,
because addition requires two arguments and it's meaningless to talk
about adding a single number without talking about adding it to something
else. But they didn't. Similarly, they might have decided that sum must
take at least one argument, and therefore prohibit sum([]), but they
didn't: it's more useful for sum of the empty list to give zero than it
is for it to be an error. As I mentioned earlier, mathematicians are
nothing if not pragmatists.
[1] Or was it Aristotle who believed in Ideal Forms? No, I'm sure it was
Plato.
--
Steven
Actually it's even more natural to state sum([x]) = x, and this way
you can never conclude that sum([]) = 0 from there.
You can define sum([a1,a2,...,aN]) recursively as
sum([a1,a2,...a(N-1)])+aN. Call the sum sum([a1,a2,...,aN]) "X", then
subtract aN.
sum([a1,a2,...a(N-1)])+aN=X
sum([a1,a2,...a(N-1)])+aN-aN=X-aN
For N=2, we have:
sum([a1,a2])=X
sum([a1,a2])-a2=X-a2
sum([a1,a2])-a2-a1=X-a2-a1
Since X= a1+ a2, replace X.
sum([a1,a2])-a2-a1=(a1+a2)-a2-a1
Or,
sum([a1,a2])-a2-a1=0
Apply the recursive definition:
sum([a1])+a2-a2-a1=0
And again:
sum([])+a1+a2-a2-a1=0
And we have:
sum([])=0.
> Actually it's even more natural to state sum([x]) = x, and this way you
> can never conclude that sum([]) = 0 from there.
But what you can say is that for any list L, sum(L) = sum(L + [0]).
Therefore sum([]) = sum([] +[0]) = 0
--
Steven
It makes more sense now, I just wanted to point out that only with
sum([x]) = x, you can't get sum([]) = 0.
This is not necessarily so.
The flaw is that you provide a recursive definition with no start value,
which is to say it is not a recursive definition at all.
A recursive definition should be (for lists where elements
can be added, and ignoring pythonic negative indexing):
Define 'sum(L)' by
a. sum(L[0]) = L[0]
b. sum(L[0:i]) = sum(L[0:i-1]) + L[i] ... if i > 0
From this you can prove the reverse recursion
sum(L[0:k]) = sum(L[0:k+1]) - L[k+1]
__only__ if k >= 0
It says nothing about the empty list.
You could add, as part of the definition, that sum([]) = 0, or any other
value.
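To make the "any other value" point concrete, here is a sketch in which the recursion is pushed all the way down to the empty list and the value chosen for it is an explicit parameter. The names `recursive_sum` and `empty_value` are invented for this example:

```python
def recursive_sum(items, empty_value=0):
    """Sum a list recursively, with the empty-list value left as a parameter."""
    if not items:
        # Base case: this value is a matter of definition, not of proof.
        return empty_value
    # Recursive step: sum of all but the last element, plus the last element.
    return recursive_sum(items[:-1], empty_value) + items[-1]


print(recursive_sum([1, 2, 3]))       # base case 0
print(recursive_sum([1, 2, 3], 10))   # base case 10 shifts every result
print(recursive_sum([5]), recursive_sum([5], 10))
```

Any base value gives a consistent definition, but 0 is the only one for which the sum of a one-element list equals that element. If the recursion stops at the one-element list instead, the empty list is simply left undefined.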
A rather different approach, not quite simple recursion, would be to
start with
A. a slicing axiom, something like:
for all non-negative integers, a,b,c with a <=b <= c:
sum(L[a:c]) = sum(L[a:b]) + sum(L[b:c])
B. a singleton axiom:
for all integers a where L[a] exists:
sum(L[a:a+1]) = L[a]
2a. sum{
This is not necessarily so.
The flaw is that you provide a recursive definition with no start value,
which is to say it is not a recursive definition at all.
A recursive definition should be (for lists where elements
can be added, and ignoring pythonic negative indexing):
Define 'sum(L)' by
a. sum(L[0:1]) = L[0]
b. sum(L[0:i]) = sum(L[0:i-1]) + L[i] ... if i > 1
Yep. The way it is preserves the distributive property
sum(a+b) = sum(a) + sum(b)
This would matter in cases like (untested code..)
suvsales = sum(sum(s.price for s in d.sales if s.class_ == 'suv')
               for d in districts)
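A runnable version of that idea, with made-up data. The `Sale` and `District` records and all the figures are assumptions for the example, and the attribute is spelled `class_` because `class` is a reserved word in Python:

```python
from collections import namedtuple

# Hypothetical record types, invented for this example.
Sale = namedtuple("Sale", ["price", "class_"])
District = namedtuple("District", ["name", "sales"])

districts = [
    District("north", [Sale(30000, "suv"), Sale(18000, "sedan")]),
    District("south", [Sale(21000, "sedan")]),  # no SUV sales
    District("east", []),                       # no sales at all
]

# The inner sum is over an empty sequence for "south" and "east".
# Because that yields 0, the outer sum needs no special handling.
suvsales = sum(
    sum(s.price for s in d.sales if s.class_ == "suv")
    for d in districts
)
print(suvsales)
```

If the inner sum raised an exception on an empty sequence, every district without SUV sales would need its own check before the totals could be combined.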
Mel.
<snip>
> [The rest of this is (mostly) aimed at Mensanator,
Ok, I see where you're coming from.
> Fundamentally, the abstract function "sum" and the concrete Python
> implementation of sum() are both human constructs. It's not like there is
> some pure Platonic[1] "Ideal Sum" floating in space that we can refer to.
> Somewhere, sometime, some mathematician had to *define* sum(), and other
> mathematicians had to agree to use the same definition.
>
> They could have decided that sum must take at least two arguments,
> because addition requires two arguments and it's meaningless to talk
> about adding a single number without talking about adding it to something
> else. But they didn't.
Ok. But the problem is they DID in SQL: x + Null = Null.
Earlier, you said that an empty box contains 0 widgets.
Fine, empty means 0. But Null doesn't mean empty. Say
your widget supplier just delivers a box and you haven't
opened it yet. Is the box likely to be empty? Probably
not, or they wouldn't have shipped it. In this case,
Null means "unknown", not 0. The number of widgets you
have on hand is Null (unknown) because inventory + Null = Null.
SQL will correctly tell you that the amount on hand is unknown,
whereas Python will tell you the amount on hand is inventory,
which is incorrect.
> Similarly, they might have decided that sum must
> take at least one argument, and therefore prohibit sum([]), but they
> didn't: it's more useful for sum of the empty list to give zero than it
> is for it to be an error. As I mentioned earlier, mathematicians are
> nothing if not pragmatists.
>
Here's a real world example (no ivory tower stuff):
An oil refinery client has just excavated a big pile of
dirt to lay a new pipeline. Due to the volume of the
pipe, there's dirt left over. Ideally, the client
would like to use that dirt as landfill (free), but it
must be tested for HAPS (by summing the concentrations of
organic constituents) to see whether it is considered
hazardous waste, in which case it must be taken off site
and incinerated (costly).
In MOST cases, a HAPS sum of 0 would be illegal because
0's generally cannot be reported in analytical tests,
you can't report a result less than its legal reporting
limit. If ALL the constituents were undetected, the sum
should be that of the sum of the reporting limits, thus,
it cannot be 0.
Can't I just use a sum of 0 to tell me when data is missing?
No, because in some cases the reporting limit of undetected
compounds is set to 0.
In which case, a 0 HAPS score means we can confidently
recommend that the dirt is clean and can be freely reused.
But if the analysis information is missing (hasn't arrived
yet or still pending validation) we WANT the result to be
UNKNOWN so that we don't recommend to the client that he take
an illegal course of action.
In this case, SQL does the correct thing and Python would
return a false result.
> --
> Steven
>Think about all the previously elected female or black
>presidents of the US. Which one was the tallest?
The current King of France?
- Hendrik
[...]
>> They could have decided that sum must take at least two arguments,
>> because addition requires two arguments and it's meaningless to talk
>> about adding a single number without talking about adding it to
>> something else. But they didn't.
>
> Ok. But the problem is they DID in SQL: x + Null = Null.
Sheesh. That's not a problem, because Python is not trying to be a
dialect of SQL.
If you want a NULL object, then there are recipes on the web that will
give you one. Then all you need to do is call sum(alist or [NULL]) and it
will give you the behaviour you want.
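One possible shape for such a recipe, as a sketch only. `NullType`, `NULL` and `null_aware_sum` are names invented here, not taken from any published recipe:

```python
class NullType(object):
    """An 'unknown' value: adding anything to it gives the unknown value back."""

    def __add__(self, other):
        return self

    # Also handle the case where NULL is the right-hand operand, e.g. 1 + NULL.
    __radd__ = __add__

    def __repr__(self):
        return "NULL"


NULL = NullType()


def null_aware_sum(alist):
    # An empty list is falsy, so it is replaced by [NULL] before summing.
    return sum(alist or [NULL])


print(null_aware_sum([]))
print(null_aware_sum([1, 2, 3]))
print(null_aware_sum([1, NULL, 3]))
```

The `alist or [NULL]` trick relies on the argument being a real list (or another container that is falsy when empty); a generator is always truthy, so it would need to be turned into a list first.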
[...]
> Here's a real world example (no ivory tower stuff):
>
> An oil refinery client has just excavated a big pile of dirt to lay a
> new pipeline.
[snip details]
> Can't I just use a sum of 0 to tell me when data is missing? No, because
> in some cases the reporting limit of undetected compounds is set to 0.
You can't use a sum of 0 to indicate when data is missing, full stop. The
data may require 15 tests when only 3 have actually been done:
sum([1.2e-7, 9.34e-6, 2.06e-8])
Missing data and a non-zero sum. How should sum() deal with that?
The answer is that sum() can't deal with that. You can't expect sum() to
read your mind, know that there should be 15 items instead of 3, and
raise an error. So why do you expect sum() to read your mind and
magically know that zero items is an error, especially when for many
applications it is NOT an error?
The behaviour you want for this specific application is unwanted,
unnecessary and even undesirable for many other applications. The
solution is for *you* to write application-specific code to do what your
application needs, instead of relying on a general purpose function
magically knowing what you want.
--
Steven
And yet, they added a Sqlite3 module.
>
> If you want a NULL object, then there are recipes on the web that will
> give you one. Then all you need to do is call sum(alist or [NULL]) and it
> will give you the behaviour you want.
Actually, I already get the behaviour I want. sum([1,None])
throws an exception. I don't see why sum([]) doesn't throw
an exception also (I understand that behaviour is by design,
I'm merely pointing out that the design doesn't cover every
situation).
>
> [...]
>
> > Here's a real world example (no ivory tower stuff):
>
> > An oil refinery client has just excavated a big pile of dirt to lay a
> > new pipeline.
> [snip details]
> > Can't I just use a sum of 0 to tell me when data is missing? No, because
> > in some cases the reporting limit of undetected compounds is set to 0.
>
> You can't use a sum of 0 to indicate when data is missing, full stop.
Exactly. That's why I would prefer sum([]) to raise an
exception instead of giving a false positive.
> The
> data may require 15 tests when only 3 have actually been done:
>
> sum([1.2e-7, 9.34e-6, 2.06e-8])
Biggest problem here is that it is often unknown just
how many records you're supposed to get from the query,
so we can't tell that a count of 3 is supposed to be 15.
>
> Missing data and a non-zero sum. How should sum() deal with that?
That's a separate issue and I'm not saying it should as
long as the list contains actual numbers to sum.
sum([1.2e-7, 9.34e-6, 2.06e-8, None]) will raise an
exception, as it should. But what types are contained
in []?
>
> The answer is that sum() can't deal with that. You can't expect sum() to
> read your mind, know that there should be 15 items instead of 3, and
> raise an error. So why do you expect sum() to read your mind and
> magically know that zero items is an error, especially when for many
> applications it is NOT an error?
For the simple reason it doesn't have to read your mind,
a mechanism has already been built into the function: start
value. For those situations where an empty list is desired
to sum to 0, you could use sum(alist,0) and use sum(alist) for
those cases where summing an empty list is meaningless.
Shouldn't you have to explicitly tell sum() how to deal with
situations like empty lists rather than have it implicitly
assume a starting value of 0 when you didn't ask for it?
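The behaviour proposed here can be approximated in user code without touching the built-in. A sketch, in which `explicit_sum` and the `_MISSING` sentinel are hypothetical names:

```python
_MISSING = object()  # sentinel meaning "no start value was given"


def explicit_sum(items, start=_MISSING):
    """Like sum(), except an empty input with no start value is an error."""
    if start is not _MISSING:
        # The caller asked for a start value, so an empty input is fine.
        return sum(items, start)
    iterator = iter(items)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("explicit_sum() of empty sequence with no start value")
    # Use the first element as the start value for the rest.
    return sum(iterator, first)


print(explicit_sum([], 0))
print(explicit_sum([1.2e-7, 9.34e-6, 2.06e-8]))
```

With this wrapper, explicit_sum(alist, 0) and explicit_sum(alist) differ only when alist is empty, which is the distinction being asked for.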
>
> The behaviour you want for this specific application is unwanted,
> unnecessary and even undesirable for many other applications. The
> solution is for *you* to write application-specific code to do what your
> application needs, instead of relying on a general purpose function
> magically knowing what you want.
Does division magically know what you want? No, it raises an
exception when you do something like divide by 0. Isn't it
Pythonic to not write a litany of tests to cover every
possible case, but instead use try:except?
But try:except only works if the errors are recognized.
And sum() says that summing an empty list is NEVER an error
under ANY circumstance. That may be true in MOST cases, but
it certainly isn't true in ALL cases.
>
> --
> Steven
Does that mean that, because there is an 'os' module, Python is trying
to compete with Linux and Windows?
This is starting to feel like a troll, but JUST IN CASE you are really
serious about wanting to get work done with Python, rather than
complaining about how it is not perfect, I offer the following snippet
which will show you how you can test the results of a sum() to see if
there were any items in the list:
>>> class MyZero(int):
...     pass
...
>>> zero = MyZero()
>>> x=sum([], zero)
>>> isinstance(x,MyZero)
True
>>> x = sum([1,2,3], zero)
>>> isinstance(x,MyZero)
False
>>>
> Actually, I already get the behaviour I want. sum([1,None])
> throws an exception. I don't see why sum([]) doesn't throw
> an exception also (I understand that behaviour is by design,
> I'm merely pointing out that the design doesn't cover every
> situation).
[...]
> Exactly. That's why I would prefer sum([]) to raise an
> exception instead of giving a false positive.
The built-in behavior can't be good for every usage. Nobody prevents you from defining your own function tailored to your own specs, like this:
def strict_sum(items):
    items = iter(items)
    try:
        # next() is the built-in added in Python 2.6
        first = next(items)
    except StopIteration:
        raise ValueError("strict_sum with empty argument")
    return sum(items, first)
Tweak as needed. Based on other posts I believe your Python skills are enough to write it on your own, so I don't see why you're complaining so hard about the current behavior.
--
Gabriel Genellina
I'm not complaining about the behaviour anymore, I just don't like
being told I'm wrong when I'm not.
But I think I've made my point, so there's no point in harping on
this anymore.
>
> --
> Gabriel Genellina
I wasn't thinking "compete", rather "complement". Python obviously
wants to be a player in the SQL market, so you would think it
would be in Python's interest to know how SQL behaves, just as it's in
Python's interest for the os module to know how BOTH Linux and
Windows work.
>
> This is starting to feel like a troll,
It wasn't intended to be.
> but JUST IN CASE you are really
> serious about wanting to get work done with Python, rather than
> complaining about how it is not perfect,
Things never change if no one ever speaks up.
> I offer the following snippet
> which will show you how you can test the results of a sum() to see if
> there were any items in the list:
Thanks. I'll drop this from this point on.
>
> >>> class MyZero(int):
>
> ...     pass
> ...
>
>
>
> >>> zero = MyZero()
> >>> x=sum([], zero)
> >>> isinstance(x,MyZero)
> True
> >>> x = sum([1,2,3], zero)
> >>> isinstance(x,MyZero)
> False
But that's only half the story. The other half is data returned
as a result of SQL queries. And that's something Python DOES process.
And sometimes that processed data has to be inserted back into the
database. We certainly don't want Python to process the data in a way
that the database doesn't expect.
When I see a potential flaw (such as summing an empty list to 0),
should I just keep quiet about it, or let everyone know?
Well, now they know, so I'll shut up about this from now on, ok?
Er, what about instances of variations/elaborations on
class Smaller(object) : __cmp__ = lambda *_ : -1
?
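Note that `__cmp__` is only honoured by Python 2. A sketch of the same idea with rich comparison methods, which also works on Python 3; `Smallest` and `SMALLEST` are names invented here, and unlike the one-liner above this version treats the sentinel as equal to itself:

```python
class Smallest(object):
    """Sentinel that compares below every other object."""

    def __lt__(self, other):
        return not isinstance(other, Smallest)

    def __le__(self, other):
        return True

    def __gt__(self, other):
        return False

    def __ge__(self, other):
        return isinstance(other, Smallest)

    def __eq__(self, other):
        return isinstance(other, Smallest)

    def __hash__(self):
        return hash(Smallest)


SMALLEST = Smallest()

# Prepending the sentinel means max() never sees an empty sequence.
print(max([SMALLEST] + [3, 1, 2]))
print(max([SMALLEST] + []) is SMALLEST)
```

Comparisons such as 3 > SMALLEST work because int does not know the type and Python falls back to the reflected method on the sentinel.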
Cheers, BB
You still don't have the property max(X) is in X.
And it's the equivalent of a special builtin constant for max on the
empty set.
Frankly, I would favor order-independence over that property.
compare max(X) for
1) X = [set([1]),set([2])]
and
2) X = [set([2]),set([1])]
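The order dependence is easy to reproduce. For sets, `>` means "is a proper superset of", so neither {1} nor {2} is greater than the other, and max() keeps whichever item it saw first. The variable names below are just for the example:

```python
# Neither set is a proper superset of the other, so no comparison succeeds.
first_order = max([set([1]), set([2])])
second_order = max([set([2]), set([1])])

print(first_order, second_order)
```

The two calls return different results for what is the same collection of values, which is the lack of order-independence being pointed out.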
Shouldn't then max and min in fact return lub and glb, despite their names ? In
the case X is a non-empty finite set/list of totally ordered values,
max(X)==lub(X) and min(X)==glb(X) in any case.
>
> And it's the equivalent of a special builtin constant for max on the
> empty set.
Of course (except the object might have other uses, who knows). So what ?
Cheers, BB
> David C. Ullrich:
> > I didn't mention what's below because it doesn't seem
> > likely that saying max([]) = -infinity and
> > min([]) = +infinity is going to make the OP happy...
>
> Well, it sounds cute having Neginfinite and Infinite as built-in
> objects that can be compared to any other type and are < of or > of
> everything else but themselves.
Like I said, I'm not going to say anything about how Python
should be. If I were going to comment on that I'd say it would
be cute but possibly silly to actually add to the core.
But in the math library I made some time ago there was an
AbsoluteZero with the property that when you added it to
x you got x for any x whatever (got used as the default
additive identity for classes that didn't have an
add_id defined...)
> Probably they can be useful as
> sentinels, but in Python I nearly never use sentinels anymore, and
> they can probably give some other problems...
>
> Bye,
> bearophile
--
David C. Ullrich
Heh. Mysteries of the empty set.
--
David C. Ullrich
> Actually, I already get the behaviour I want. sum([1,None])
> throws an exception. I don't see why sum([]) doesn't throw
> an exception also
If you take a "start value" and add to it every element of a list, should the
process fail if the list is empty? If you don't add anything to the start value,
you should get back the start value.
Python's sum is defined as sum(sequence, start=0). If sum were to throw an
exception with sum([]), it should also throw it with sum([], start=0), which
makes no sense.
--
Luis Zarrabeitia
Facultad de Matemática y Computación, UH
http://profesores.matcom.uh.cu/~kyrie
No.
> If you don't add anything to the start value,
> you should get back the start value.
Agree.
>
> Python's sum is defined as sum(sequence, start=0).
That's the issue.
> If sum were to throw an
> exception with sum([]), it should also throw it with sum([], start=0), which
> makes no sense.
Given that definition, yes. But is the definition correct
in ALL cases? Are there situations where the sum of an empty
list should NOT be 0? Of course there are.
Can sum() handle those cases? No, it can't, I have to write
my own definition if I want that behaviour. There's no reason
why sum([]) and sum([],0) have to mean the same thing at the
exclusion of a perfectly valid alternative definition.
But that's the way it is, so I have to live with it.
But that's not conceding that I'm wrong.
Mensanator wrote:
> Are there situations where the sum of an empty
> list should NOT be 0? Of course there are.
Python Philosophy (my version, for this discussion):
Make normal things easy; make unusual or difficult things possible.
Application:
Sum([]) == 0 is normal (90+% of cases). Make that easy (as it is).
For anything else:
if seq: s = sum(seq, base)
else: <whatever, including like raise your desired exception>
which is certainly pretty easy.
> Can sum() handle those cases?
The developers choose what they thought would be most useful across the
spectrum of programmers and programs after some non-zero amount of
debate and discussion.
> No, it can't, I have to write
> my own definition if I want that behaviour.
Or wrap your calls. In any case, before sum was added as a convenience
for summing numbers, *everyone* had to write their own or use reduce.
Sum(s) replaces reduce(lambda x,y: x+y, s, 0), which was thought to be
the most common use of reduce. Sum(s,start) replaces the much less
common reduce(lambda x,y: x+y, s, start).
Reduce(S, s), where S = sum function, raises an exception on empty s.
So use that and you are no worse off than before.
However, a problem with reduce(S,s) is that it is *almost* the same as
reduce(S,s,0). So people are sometimes tempted to omit 0, especially if
they are not sure if the call might be reduce(S,0,s) (as one argument
says it should be -- but that is another post). But if they do, the
program fails, even if it should not, if and when s happens to be empty.
> There's no reason
> why sum([]) and sum([],0) have to mean the same thing at the
> exclusion of a perfectly valid alternative definition.
'Have to', no reason. 'Should', yes there are at least three reasons.
1. Python functions generally return an answer rather than raise an
exception where there is a perfectly valid answer to return.
2. As a general principle, something that is almost always true should
not need to be stated over and over again. This is why, for instance,
we have default args.
3. As I remember, part of the reason for adding sum was to eliminate the
need (with reduce) to explicitly say 'start my sum at 0' in order to
avoid buggy code. In other words, I believe part of the reason for
sum's existence is to avoid the very bug-inviting behavior you want.
Terry Jan Reedy
What am I doing wrong?
>>> S = sum
>>> S
<built-in function sum>
>>> s = [1,2,3]
>>> type(s)
<type 'list'>
>>> reduce(S,s)
Traceback (most recent call last):
File "<pyshell#13>", line 1, in <module>
reduce(S,s)
TypeError: 'int' object is not iterable
>>> reduce(S,s,0)
Traceback (most recent call last):
File "<pyshell#14>", line 1, in <module>
reduce(S,s,0)
TypeError: 'int' object is not iterable
>>> reduce(lambda x,y:x+y,s)
6
>>> s=[]
>>> reduce(lambda x,y:x+y,s)
Traceback (most recent call last):
File "<pyshell#17>", line 1, in <module>
reduce(lambda x,y:x+y,s)
TypeError: reduce() of empty sequence with no initial value
This is supposed to happen. But doesn't reduce(S,s) work
when s isn't empty?
Mensanator wrote:
> On Sep 10, 5:36 pm, Terry Reedy <tjre...@udel.edu> wrote:
>> Sum(s) replaces reduce(lambda x,y: x+y, s, 0), which was thought to be
>> the most common use of reduce. Sum(s,start) replaces the much less
>> common reduce(lambda x,y: x+y, s, start).
>>
>> Reduce(S, s), where S = sum function, raises an exception on empty s.
>> So use that and you are no worse off than before.
> What am I doing wrong?
>>>> S = sum
[snip]
Taking me too literally out of context. I meant the sum_of_2 function
already given in the example above, as you eventually tried.
def S(x,y): return x+y
Sorry for the confusion.
...
>>>> reduce(lambda x,y:x+y,s)
> 6
>
>>>> s=[]
>>>> reduce(lambda x,y:x+y,s)
> Traceback (most recent call last):
> File "<pyshell#17>", line 1, in <module>
> reduce(lambda x,y:x+y,s)
> TypeError: reduce() of empty sequence with no initial value
These two are exactly what I meant.
> This is supposed to happen. But doesn't reduce(S,s) work
> when s isn't empty?
It did. You got 6 above. The built-in 'sum' takes an iterable, not a
pair of numbers.
tjr
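The distinction drawn in the reply above, between a two-argument adding function and the built-in sum, can be sketched like this. The name add2 is an illustrative stand-in for the S(x,y) defined in that post, and reduce is imported from functools so the snippet also runs on Python 3:

```python
from functools import reduce


def add2(x, y):
    # A function of a *pair* of numbers, which is what reduce expects.
    return x + y


s = [1, 2, 3]
total = reduce(add2, s)  # evaluated as add2(add2(1, 2), 3)

try:
    # reduce calls sum(1, 2); sum tries to iterate over 1 and fails.
    reduce(sum, s)
    error = None
except TypeError as exc:
    error = str(exc)
```

So reduce(S, s) works for a non-empty s only when S takes two items; passing the built-in sum fails on the very first step, empty or not.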
+1
''.join is horrible. And it adds insult to injury that S.join(S.split(T)) != T
as a rule. The interpreter has no business to patronize us into this shamefully
contorted neighborhood while it understands what we want.
Cheers, BB
What makes ''.join particularly horrible is that we find ourselves forced to use
it not only for concatenating arbitrary-length strings in a list, but also to
convert to a str what's already a sequence of single characters. IOW string
types fail to satisfy a natural expectation for any S of sequence type:
S == type(S)(item for item in S) == type(S)(list(S))
And this, even though strings are sequence types deep-down-ly enough that they
manage to act as such in far-fetched corner cases like
(lambda *x : x)(*'abc')==('a','b','c')
...and even though strings offer not one but two distinct constructors that play
nicely in back-and-forth conversions with types to which they are much less
closely related, ie.
'1j' == repr(complex('1j')) == str(complex('1j'))
1j == complex(repr(1j)) == complex(str(1j))
Not-so-cheerfully-yours, BB
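The round-trip complaint in the post above can be checked with a short sketch. The variable names are illustrative only; the point is that rebuilding through the type's constructor works for tuples and lists, while for str it yields the repr of the list and ''.join is needed instead:

```python
# Tuples and lists rebuild themselves from their own items.
t = (1, 2, 3)
rebuilt_tuple = type(t)(item for item in t)

l = [1, 2, 3]
rebuilt_list = type(l)(list(l))

# A str does not: the constructor gives the list's printed form.
text = "abc"
naive = type(text)(list(text))   # text of the list, not "abc"
joined = "".join(list(text))     # the idiom that does round-trip

tuple_ok = rebuilt_tuple == t
list_ok = rebuilt_list == l
str_ok = naive == text
join_ok = joined == text
```

Here tuple_ok, list_ok and join_ok come out true and str_ok false, which is the asymmetry being complained about.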
_and_, as it turns out, sets of cardinality 1.
--Scott David Daniels (pleased about the change in cardinality)
Scott....@Acm.Org