On Wed, Sep 19, 2012 at 10:41 AM, Franck Ditter <fra...@ditter.org> wrote:
> Hello,
> I wonder why sum does not work on the string sequence in Python 3 :
> >>> sum((8,5,9,3))
> 25
> >>> sum([5,8,3,9,2])
> 27
> >>> sum('rtarze')
> TypeError: unsupported operand type(s) for +: 'int' and 'str'
> I naively thought that sum('abc') would expand to 'a'+'b'+'c'
> And the error message is somewhat cryptic...
Help on built-in function sum in module __builtin__:
sum(...)
sum(sequence[, start]) -> value
Returns the sum of a sequence of numbers (NOT strings) plus the value
of parameter 'start' (which defaults to 0). When the sequence is
empty, returns start.
On Wed, Sep 19, 2012 at 8:41 AM, Franck Ditter <fra...@ditter.org> wrote:
> Hello,
> I wonder why sum does not work on the string sequence in Python 3 :
> >>> sum((8,5,9,3))
> 25
> >>> sum([5,8,3,9,2])
> 27
> >>> sum('rtarze')
> TypeError: unsupported operand type(s) for +: 'int' and 'str'
> I naively thought that sum('abc') would expand to 'a'+'b'+'c'
> And the error message is somewhat cryptic...
It notes in the doc string that it does not work on strings:
sum(...)
sum(sequence[, start]) -> value
Returns the sum of a sequence of numbers (NOT strings) plus the value
of parameter 'start' (which defaults to 0). When the sequence is
empty, returns start.
I think this restriction is mainly for efficiency. sum(['a', 'b',
'c', 'd', 'e']) would be the equivalent of 'a' + 'b' + 'c' + 'd' +
'e', which is an inefficient way to add together strings. You should
use ''.join instead:
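A minimal sketch of the join idiom referred to above. The list name `letters` is only illustrative and is not from the original post.

```python
# Illustrative input; any iterable of strings works.
letters = ['a', 'b', 'c', 'd', 'e']

# str.join builds the result in one pass instead of one '+' at a time.
joined = ''.join(letters)
print(joined)  # abcde

# A string is itself an iterable of one-character strings.
rejoined = ''.join('rtarze')
print(rejoined)  # rtarze
```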
On 2012-09-19, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> It notes in the doc string that it does not work on strings:
> sum(...)
> sum(sequence[, start]) -> value
> Returns the sum of a sequence of numbers (NOT strings) plus
> the value of parameter 'start' (which defaults to 0). When
> the sequence is empty, returns start.
> I think this restriction is mainly for efficiency. sum(['a',
> 'b', 'c', 'd', 'e']) would be the equivalent of 'a' + 'b' + 'c'
> + 'd' + 'e', which is an inefficient way to add together
> strings. You should use ''.join instead:
While the docstring is still useful, it has diverged from the
documentation a little bit.
sum(iterable[, start])
Sums start and the items of an iterable from left to right and
returns the total. start defaults to 0. The iterable's items
are normally numbers, and the start value is not allowed to be
a string.
For some use cases, there are good alternatives to sum(). The
preferred, fast way to concatenate a sequence of strings is by
calling ''.join(sequence). To add floating point values with
extended precision, see math.fsum(). To concatenate a series of
iterables, consider using itertools.chain().
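The alternatives named in that excerpt can be sketched roughly as follows. The variable names are made up for illustration, and the inputs are arbitrary.

```python
import itertools
import math

# Concatenating strings: join instead of sum().
words = ['ab', 'cd', 'ef']
text = ''.join(words)
print(text)  # abcdef

# Adding floats with extended precision: math.fsum tracks the
# rounding error that plain repeated addition can accumulate.
values = [0.1] * 10
precise_total = math.fsum(values)
print(precise_total)  # 1.0

# Concatenating a series of iterables: chain them lazily.
nested = [[1, 2], [3], [4, 5]]
flat = list(itertools.chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5]
```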
Are iterables and sequences different enough to warrant posting a
bug report?
On Wed, 19 Sep 2012 16:41:20 +0200, Franck Ditter wrote:
> Hello,
> I wonder why sum does not work on the string sequence in Python 3 :
> >>> sum((8,5,9,3))
> 25
> >>> sum([5,8,3,9,2])
> 27
> >>> sum('rtarze')
> TypeError: unsupported operand type(s) for +: 'int' and 'str'
> I naively thought that sum('abc') would expand to 'a'+'b'+'c'
> And the error message is somewhat cryptic...
> franck
Summation is a mathematical function that works on numbers.
Concatenation is the process of appending one string to another.
Although they are not related to each other, they do share the same operator (+), which is the cause of the confusion.
Attempting to duck type this function would cause ambiguity. For example, what would you expect from
sum ('a','b',3,4)
'ab34' or 'ab7' ?
Even 'A' + 7 raises a TypeError for the same reason.
On Wed, Sep 19, 2012 at 9:06 AM, Neil Cerutti <ne...@norwich.edu> wrote:
> Are iterables and sequences different enough to warrant posting a
> bug report?
The glossary is specific about the definitions of both, so I would say yes.
> On 2012-09-19, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> > It notes in the doc string that it does not work on strings:
> > sum(...)
> > sum(sequence[, start]) -> value
> > Returns the sum of a sequence of numbers (NOT strings) plus
> > the value of parameter 'start' (which defaults to 0). When
> > the sequence is empty, returns start.
> > I think this restriction is mainly for efficiency. sum(['a',
> > 'b', 'c', 'd', 'e']) would be the equivalent of 'a' + 'b' + 'c'
> > + 'd' + 'e', which is an inefficient way to add together
> > strings. You should use ''.join instead:
> While the docstring is still useful, it has diverged from the
> documentation a little bit.
> sum(iterable[, start])
> Sums start and the items of an iterable from left to right and
> returns the total. start defaults to 0. The iterable's items
> are normally numbers, and the start value is not allowed to be
> a string.
> For some use cases, there are good alternatives to sum(). The
> preferred, fast way to concatenate a sequence of strings is by
> calling ''.join(sequence). To add floating point values with
> extended precision, see math.fsum(). To concatenate a series of
> iterables, consider using itertools.chain().
> Are iterables and sequences different enough to warrant posting a
> bug report?
Sequences are iterables, so I'd say the docs are technically correct,
but maybe I'm misunderstanding what you would be trying to clarify.
On Wed, 19 Sep 2012 09:03:03 -0600, Ian Kelly wrote:
> I think this restriction is mainly for efficiency. sum(['a', 'b', 'c',
> 'd', 'e']) would be the equivalent of 'a' + 'b' + 'c' + 'd' + 'e', which
> is an inefficient way to add together strings.
It might not be obvious to some people why repeated addition is so inefficient, and in fact if people try it with modern Python (version 2.3 or better), they may not notice any inefficiency.
But the example given, 'a' + 'b' + 'c' + 'd' + 'e', potentially ends up creating four strings, only to immediately throw away three of them:
* first it concats 'a' to 'b', giving the new string 'ab'
* then 'ab' + 'c', creating a new string 'abc'
* then 'abc' + 'd', creating a new string 'abcd'
* then 'abcd' + 'e', creating a new string 'abcde'
Each new string requires a block of memory to be allocated, potentially requiring other blocks of memory to be moved out of the way (at least for large blocks).
With only five characters in total, you won't really notice any slowdown, but with large enough numbers of strings, Python could potentially spend a lot of time building, and throwing away, intermediate strings. Pure wasted effort.
I say "could" because starting in about Python 2.3, there is a nifty optimization in Python (CPython only, not Jython or IronPython) that can *sometimes* recognise repeated string concatenation and make it less inefficient. It depends on the details of the specific strings used, and the operating system's memory management. When it works, it can make string concatenation almost as efficient as ''.join(). When it doesn't work, repeated concatenation is PAINFULLY slow, hundreds or thousands of times slower than join.
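To put rough numbers on the intermediate strings described above, here is a simple cost model. It is only a sketch: it assumes every + builds a brand-new string (that is, the CPython optimization does not kick in), it counts characters written rather than measuring time, and both function names are hypothetical.

```python
def chars_copied_by_repeated_concat(pieces):
    """Characters written if each '+' creates a new intermediate string."""
    pieces = list(pieces)
    if not pieces:
        return 0
    copied = 0
    length = len(pieces[0])  # the first piece already exists
    for piece in pieces[1:]:
        length += len(piece)
        copied += length     # a new string of this length is built
    return copied


def chars_copied_by_join(pieces):
    """Characters written if the result is built once, as join can do."""
    return sum(len(piece) for piece in pieces)


five = ['a', 'b', 'c', 'd', 'e']
print(chars_copied_by_repeated_concat(five))  # 14  ('ab', 'abc', 'abcd', 'abcde')
print(chars_copied_by_join(five))             # 5

thousand = ['x'] * 1000
print(chars_copied_by_repeated_concat(thousand))  # 500499
print(chars_copied_by_join(thousand))             # 1000
```

Under this model the work for repeated concatenation grows with the square of the number of pieces, while join grows linearly.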
On Wed, 19 Sep 2012 15:07:04 +0000, Alister wrote:
> Summation is a mathematical function that works on numbers.
> Concatenation is the process of appending one string to another.
> Although they are not related to each other, they do share the same
> operator (+), which is the cause of the confusion. Attempting to duck
> type this function would cause ambiguity. For example, what would you
> expect from
> sum ('a','b',3,4)
> 'ab34' or 'ab7' ?
Neither. I would expect sum to do exactly what the + operator does if given two incompatible arguments: raise an exception.
And in fact, that's exactly what it does.
py> sum ([1, 2, 'a'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
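A small sketch that collects a few of these cases in one place. The helper `try_sum` is hypothetical and exists only to show the exception text instead of a traceback. The exact wording of the messages may differ between Python versions.

```python
def try_sum(items, *start):
    """Return sum(items, *start), or the TypeError text if it fails."""
    try:
        return sum(items, *start)
    except TypeError as exc:
        return 'TypeError: %s' % exc


# Mixed numbers and strings: the underlying + fails.
mixed = try_sum([1, 2, 'a'])
print(mixed)

# A plain string: the default start of 0 cannot be added to 'a'.
plain = try_sum('abc')
print(plain)

# An explicit string start value is rejected by sum() itself.
with_start = try_sum(['a', 'b'], '')
print(with_start)

# Other types that support + are accepted, for example lists.
lists = try_sum([[1], [2, 3]], [])
print(lists)  # [1, 2, 3]
```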
On Wed, Sep 19, 2012 at 9:37 AM, Steve Howell <showel...@yahoo.com> wrote:
> Sequences are iterables, so I'd say the docs are technically correct,
> but maybe I'm misunderstanding what you would be trying to clarify.
The doc string suggests that the argument to sum() must be a sequence,
when in fact any iterable will do. The restriction in the docs should
be relaxed to match the reality.
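For instance, a quick sketch with iterables that are not sequences (they support neither indexing nor, for the generator and the iterator, len()). The names are illustrative only.

```python
# A generator expression is an iterable but not a sequence.
squares = (n * n for n in range(5))
total_squares = sum(squares)
print(total_squares)  # 30

# A dict view is iterable but cannot be indexed.
lengths = {'a': 1, 'bb': 2, 'ccc': 3}
total_lengths = sum(lengths.values())
print(total_lengths)  # 6

# A bare iterator works too.
total_iter = sum(iter([8, 5, 9, 3]))
print(total_iter)  # 25
```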
On Sep 19, 11:34 am, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> On Wed, Sep 19, 2012 at 9:37 AM, Steve Howell <showel...@yahoo.com> wrote:
> > Sequences are iterables, so I'd say the docs are technically correct,
> > but maybe I'm misunderstanding what you would be trying to clarify.
> The doc string suggests that the argument to sum() must be a sequence,
> when in fact any iterable will do. The restriction in the docs should
> be relaxed to match the reality.
Ah. The docstring looks to be fixed in 3.1.3, but not in Python 2.
Python 3.1.3 (r313:86834, Mar 13 2011, 00:40:38)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> sum.__doc__
"sum(iterable[, start]) -> value\n\nReturns the sum of an iterable of
numbers (NOT strings) plus the value\nof parameter 'start' (which
defaults to 0). When the iterable is\nempty, returns start."
Python 2.6.6 (r266:84292, Mar 13 2011, 00:35:19)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> sum.__doc__
"sum(sequence[, start]) -> value\n\nReturns the sum of a sequence of
numbers (NOT strings) plus the value\nof parameter 'start' (which
defaults to 0). When the sequence is\nempty, returns start."
> Summation is a mathematical function that works on numbers
> Concatenation is the process of appending 1 string to another
> although they are not related to each other they do share the same
> operator(+) which is the cause of confusion.
If one represents counts in unary, as a sequence or tally of 1s (or other markers indicating 'successor' or 'increment'), then count addition is sequence concatenation. I think Guido got it right.
It happens that when the members of all sequences are identical, there is a much more compact exponential place value notation that enables more efficient addition and other operations. When not, other tricks are needed to avoid so much copying that an inherently O(N) operation balloons into an O(N*N) operation.
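The unary view can be sketched in a few lines. The helpers `to_tally` and `from_tally` are hypothetical names chosen for illustration: a count is written as a run of '1' marks, and concatenating the tallies gives the tally of the sum.

```python
def to_tally(n):
    """Write a non-negative count in unary, one mark per unit."""
    return '1' * n


def from_tally(tally):
    """Read a unary tally back as an ordinary integer."""
    return len(tally)


counts = [8, 5, 9, 3]
tallies = [to_tally(n) for n in counts]

# Concatenating the tallies is the same as adding the counts.
combined = ''.join(tallies)
print(from_tally(combined))                 # 25
print(from_tally(combined) == sum(counts))  # True
```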