
Why is it so dramatic?


Bo M. Maryniuck

Sep 2, 2002, 5:07:25 AM
The task:
Appending text to a text.

The problem:
It is tragically slow to append a string to a huge string.

The source:
-----------------8<---------------------------
import time

_body = ''
_data = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'

START_TIME = time.time()
for a in xrange(0, 0xfff):
    _body += _data
print "'+=' operator:", time.time() - START_TIME, "sek."

_body = []
START_TIME = time.time()
for a in xrange(0, 0xfff):
    _body.append(_data)
print "append() method:", time.time() - START_TIME, "sek."
-----------------8<---------------------------

The usage result:
-----------------8<---------------------------
*bo@oberon:(~/work/spike/benchmarks) python str_vs_list.py
'+=' operator: 6.96335601807 sek.
append() method: 0.0111969709396 sek.
-----------------8<---------------------------

The question: Why?

The other question:
Is it true that for the rest of my Python life I should
use lists and then join them into a single text instead of using the += operator?

--
Regards, Bogdan

It is a mess, pure and simple. Sucks to be away from Unix, huh?
-- man perlfaq3


Duncan Booth

Sep 2, 2002, 5:57:55 AM
"Bo M. Maryniuck" <b.mar...@forbis.lt> wrote in
news:mailman.103095774...@python.org:

> The task:
> Adding a text to a text.
>
> The problem:
> It is tragically slow to add string to huge string.
>

> The question: Why?

Because the text in a string is stored in a single block of memory,
extending a string requires making a copy of all the data. Also, strings are
immutable, so you cannot extend them 'in place'. Like all things it is a
tradeoff: if you make strings mutable then extending them might be a bit
faster, but many other operations would then require copying strings in
case they were modified, so in general you lose more than you gain.

None of this is specific to Python; the same issues arise in almost any
programming language (although not always with the same solutions).

>
> The other question:
> Is it true, that I should through all the Python life
> use lists and then join it into single text instead to use +=
> operator?
>

Using lists and joining them should come to hand as the first solution in
most cases where you are building up a string.

Using cStringIO, array or mmap should also be in your toolbox for those
cases where they are the most appropriate.

--
Duncan Booth dun...@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?

holger krekel

Sep 2, 2002, 6:03:44 AM
Bo M. Maryniuck wrote:
> The task:
> Adding a text to a text.
>
> The problem:
> It is tragically slow to add string to huge string.

That's because strings are immutable objects. When you issue

somestring += otherstring

a new string object is allocated where the contents of
somestring and otherstring are copied in. Then the
name 'somestring' is bound to this new string.

If you have something like this in a loop it's often
*much* better to work with a list and later do a

"".join(stringlist)

As you observed yourself this is around 100-1000 times faster.

And consider: if strings were not immutable but modified in-place
(like lists) you couldn't use them as dictionary keys.
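Holger's dictionary-key point can be seen directly: a minimal sketch (written so it runs on modern Python, with hypothetical names) showing that hashable, immutable strings work as keys while mutable lists are rejected.

```python
# Strings are immutable and hashable, so they make valid dict keys.
# Lists are mutable, so Python refuses them with a TypeError.
counts = {}
counts["spam"] = 1          # fine: str is hashable

try:
    counts[["spam"]] = 1    # a mutable list cannot be a dict key
    key_accepted = True
except TypeError:
    key_accepted = False

print(key_accepted)          # prints: False
```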

if-it-hurts-don't-do-it-ly, yours Holger

holger krekel

Sep 2, 2002, 6:45:05 AM
Bo M. Maryniuck wrote:

> On Monday 02 September 2002 11:57, Duncan Booth wrote:
> > None of this is specific to Python, the same issues arise in almost any
> > programming language (although not always the same solutions).
> Almost... Keep in mind that here the range is 0-0xffffff, which is quite a bit
> bigger than 0-0xfff:
> ----------------------8<---------------------------
> #!/usr/bin/perl
>
> $b = '';
> $data = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA';
> for $a (0..0xffffff) {
>     $b += $data;
> }

That should result in a 1.3 GIGABYTE string.
Watching the perl-process i see that perl doesn't use more than 1MB.
So probably perl does some optimization for this in-fact meaningless code.

It's much more interesting to look at real performance limitations
instead of some theoretical pushing-the-limits loops.

holger

holger krekel

Sep 2, 2002, 7:23:48 AM
Bo M. Maryniuck wrote:

> On Monday 02 September 2002 12:45, holger krekel wrote:
> > It's much more interesting to look at real performance limitations
> > instead of some theoretical pushing-the-limits loops.
> Hmmm... I've raised this 'coz I have a problem with performance in a Python
> app which generates XML for XSLT. I found that
>
> string1 += string2
>
> ...sucks, and just as an experiment I rewrote the XML-generating piece of code in
> Perl, and found that Perl does the same *much* faster than Python. :(

Did you try using cStringIO? With your first example

[hpk@cobra /tmp]$ python slow.py
<function func1 at 0x80fda84> took 4.68154907227 seconds
<function func2 at 0x810be84> took 0.00871503353119 seconds

where func2 is:

def func2():
    _body = cStringIO.StringIO()
    _data = 'A'*81

    for a in xrange(0, 0xfff):
        _body.write(_data)
    return _body.getvalue()

(you need to do 'import cStringIO' at the top of the file.)
and func1 is your code.

regards,

holger


P.S: the full script

import cStringIO

_data = 'A'*81

def func1():
    _body = ''
    _data = 'A'*81

    for a in xrange(0, 0xfff):
        _body += _data

    return _body

def func2():
    _body = cStringIO.StringIO()
    _data = 'A'*81

    for a in xrange(0, 0xfff):
        _body.write(_data)
    return _body.getvalue()

def drive(func):
    import time
    start = time.time()
    res = func()
    end = time.time()
    print func, "took", end-start, "seconds"
    return res


print len(drive(func1))
print len(drive(func2))


Bo M. Maryniuck

Sep 2, 2002, 6:28:58 AM
On Monday 02 September 2002 11:57, Duncan Booth wrote:
> None of this is specific to Python, the same issues arise in almost any
> programming language (although not always the same solutions).
Almost... Keep in mind that here the range is 0-0xffffff, which is quite a bit
bigger than 0-0xfff:
----------------------8<---------------------------
#!/usr/bin/perl

$b = '';
$data = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA';
for $a (0..0xffffff) {
    $b += $data;
}

----------------------8<---------------------------

Since time() always shows me zero, I've used the UNIX time command:
----------------------8<---------------------------
*bo@oberon:(~/work/spike/benchmarks) time perl str_vs_list.pl

real 0m8.734s
user 0m8.710s
sys 0m0.010s
----------------------8<---------------------------

As you see, appending a string to a string isn't as huge a problem for Perl as it is for Python.
What's "wrong" with Perl?

--
Regards, Bogdan

Un*x admins know what they are doing by definition.
-- Bernd Petrovitsch


Bo M. Maryniuck

Sep 2, 2002, 6:31:58 AM
On Monday 02 September 2002 12:28, Bo M. Maryniuck wrote:
> Since time() always shows me zero, I've used the UNIX time command:
Well, not exactly... ;-)

--
Regards, Bogdan

Microsoft is not the answer.
Microsoft is the question.
"No" is the answer.


Jay O'Connor

Sep 1, 2002, 11:47:08 PM
In article <mailman.103096266...@python.org>, "Bo M.
Maryniuck" <b.mar...@forbis.lt> wrote:

>
> As you see, adding string to the string isn't a huge problem for Perl as
> for Python. What's "wrong" with Perl?

Because in Perl a string is mutable, while in Python a string is immutable.
This causes a performance tradeoff for some operations. Mutable strings
allow you to extend a string easily, so Perl is very fast at that kind of
operation; immutable strings allow you to 'intern' the string for faster
dictionary lookup when using strings as keys in a dictionary, for
example.

Neither is better; there are just tradeoffs between mutable and immutable
strings, which the designers of both Perl and Python took into account
when they decided which to use, based on what each language is intended to
do. As a result Perl is much faster at some kinds of operations than
Python, and vice versa.
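The interning Jay mentions can be sketched in a few lines. This is a hedged illustration using modern Python's sys.intern() (in Python 2 the same thing was the builtin intern()): two equal strings built at runtime are usually distinct objects, but interning maps them to one shared object, so dictionary key comparison can start with a cheap identity check.

```python
import sys

# A string assembled at runtime is normally a fresh object, distinct
# from an equal literal. Interning both yields one shared object.
s1 = "".join(["he", "llo"])   # built at runtime
s2 = "hello"

a = sys.intern(s1)
b = sys.intern(s2)

print(a is b)                 # prints: True
```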


--
Jay O'Connor
joco...@cybermesa.com
http://www.r4h.org/r4hsoftware

Bernard Delmée

Sep 2, 2002, 3:59:57 PM
> > $b = '';
> > $data = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA';
> > for $a (0..0xffffff) {
> >     $b += $data;
> > }

> That should result in a 1.3 GIGABYTE string.
> Watching the perl-process i see that perl doesn't use more than 1MB.
> So probably perl does some optimization for this in-fact meaningless code.

Nothing that magical: '+' does not concatenate strings in Perl. You need to use '.'
or '.=' for that. Otherwise $b keeps the numerical value 0, because
Perl coerces it to a number for the arithmetic '+' addition.


djw

Sep 2, 2002, 6:04:13 PM

Stupid question:

If "".join() is 100-1000x faster, why doesn't the Python interpreter
translate s = s1 + s2 + ... into s = "".join([s1, s2, ...])
automatically?

Just curious...

D

Peter Hansen

Sep 2, 2002, 6:52:01 PM
djw wrote:

> holger krekel wrote:
>
>> If you have something like this in a loop it's often
>> *much* better to work with a list and later do a
>> "".join(stringlist)
>>
>> As you observed yourself this is around 100-1000 times faster.
>> And consider: if strings were not immutable but modified in-place
>> (like lists) you couldn't use them as dictionary keys.
>> if-it-hurts-don't-do-it-ly, yours Holger
>>
>
> Stupid question:

It's only a stupid question if you didn't expect this relatively common
answer to such questions: "feel free to contribute a patch"... :-)

> If "".join() is 100-1000x faster, why doesn't the Python interpreter
> translate s = s1 + s2 + ... into s = "".join([s1, s2, ...])
> automatically?

In cases like this, you need to know that Python is so dynamic that
the compiler cannot know that the items being added are strings until
runtime. The "".join() convention does not work well if the items in
the sequence are not already strings.
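Peter's point about dynamism can be made concrete. A sketch (Path is a hypothetical class invented for illustration): '+' dispatches at runtime through __add__, so the operands need not be strings at all, and the compiler cannot safely rewrite a chain of '+' into a join().

```python
# '+' is overloadable, so "s1 + s2" might mean anything at runtime.
class Path:
    def __init__(self, p):
        self.p = p
    def __add__(self, other):
        # a non-string meaning of '+': path joining
        return Path(self.p + "/" + other)

combined = (Path("usr") + "local").p   # '+' here is not concatenation

try:
    "".join([Path("usr"), "local"])    # join() insists on real strings
    join_accepted = True
except TypeError:
    join_accepted = False
```

So the two spellings are only interchangeable when every operand is already a string, which the compiler cannot know in advance.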

If you feel compelled to suggest that the compiler could at least do
that for constant strings, see my first point above. :-)

-Peter

Bruce Ramsay

Sep 2, 2002, 8:35:19 PM
"Bo M. Maryniuck" <b.mar...@forbis.lt> wrote in message news:<mailman.103096266...@python.org>...

> On Monday 02 September 2002 11:57, Duncan Booth wrote:
> > None of this is specific to Python, the same issues arise in almost any
> > programming language (although not always the same solutions).
> Almost... Keep in mind, that here is the range 0-0xffffff, what is quite
> bigger that 0-0xfff:
> ----------------------8<---------------------------
> #!/usr/bin/perl
>
> $b = '';
> $data = 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
> AAAAAAAAAAAAAAAAAAA';
> for $a (0..0xffffff) {
> $b += $data;
> }
> ----------------------8<---------------------------
>
> Since time() always show me zero, I've used UNIX time command:
> ----------------------8<---------------------------
> *bo@oberon:(~/work/spike/benchmarks) time perl str_vs_list.pl
>
> real 0m8.734s
> user 0m8.710s
> sys 0m0.010s
> ----------------------8<---------------------------
>
> As you see, adding string to the string isn't a huge problem for Perl as
> for Python.
> What's "wrong" with Perl?

The Perl += operator does not concatenate strings.

Bugs aside, I suspect that Perl has an optimisation for this case, or
a special string data structure. Perl run times seem to grow linearly
for loop counts from 1e5 to 1e7. Python run times go from 0.053s to
18.2s when changing the loop count from 1e3 to 1e4. This is on Python
2.1 on an Athlon 1600.

Christopher A. Craig

Sep 3, 2002, 1:04:45 AM
djw <dwel...@nospam.attbi.com> writes:

> Stupid question:
>
> If "".join() is 100-1000x faster, why doesn't the Python interpreter
> translate s = s1 + s2 + ... into s = "".join([s1, s2, ...])
> automatically?

You won't see that kind of dramatic result if you do that. In fact,
my testing shows it would be slower.

s = "".join([s1, s2]) is slower than s = s1+s2. The reason you see that
kind of speedup is that list.append(s1) is a ton faster than s += s1,
and "".join() is moved outside the loop. Compare the following
(presume 'a' contains some string):

method a)
s = ""
for x in xrange(10000):
    s += a

method b)
l = []
for x in xrange(10000):
    l.append(a)
s = "".join(l)

method c)
s = ""
for x in xrange(10000):
    s = "".join([s, a])


method a creates 10,000 individual string objects
method b creates 1 list object (with a bunch of appends) and 1 string
object
method c creates 10,000 list objects (each of two elements s and a) and
10,000 string objects.

On my system, as I would expect, method a takes about .40 seconds,
method b takes .048, and method c takes .51.
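The three methods can be benchmarked directly. A sketch using timeit (loop count shrunk so it finishes quickly); absolute numbers vary by machine, and note that later CPython releases added an optimization for the += pattern, so the gap may be smaller today than it was in 2002.

```python
import timeit

a = "A" * 10
N = 1000

def method_a():          # repeated '+=' on a string
    s = ""
    for x in range(N):
        s += a
    return s

def method_b():          # append to a list, join once at the end
    parts = []
    for x in range(N):
        parts.append(a)
    return "".join(parts)

def method_c():          # join inside the loop (worst of both worlds)
    s = ""
    for x in range(N):
        s = "".join([s, a])
    return s

# All three build the identical string; only the cost differs.
assert method_a() == method_b() == method_c()

for f in (method_a, method_b, method_c):
    print(f.__name__, timeit.timeit(f, number=10))
```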

--
Christopher A. Craig <list-...@ccraig.org>
"Imagination is more important than knowledge" Albert Einstein

Bo M. Maryniuck

Sep 3, 2002, 3:52:31 AM
On Monday 02 September 2002 05:47, Jay O'Connor wrote:
> As a result Perl is much faster at some kinds of operations than
> Python, and vice-versa.

Yeah, actually, but I just would like to find some optimal Python solution :)

--
Regards, Bogdan

We did it for smallpox, we'll also win over on ISO 8859-1 ... ;-)
-- Markus Kuhn after eradicating one more ISO 8859-1 file from his disk


Max M

Sep 3, 2002, 5:05:23 AM
Bo M. Maryniuck wrote:
> On Monday 02 September 2002 05:47, Jay O'Connor wrote:
>
>>As a result Perl is much faster at some kinds of operations than
>>Python, and vice-versa.
>
>
> Yeah, actually, but I just would like to find some optimal Python solution :)


Actually it is possible to make it as readable as concatenation with:

result = []
a = result.append

a('some string')
a('another string')
a('third string')
for letter in 'abcdefghijklmnop':
    a(letter)

result = ''.join(result)


rather than:

result = ''
result += 'some string'
result += 'another string'
result += 'third string'
for letter in 'abcdefghijklmnop':
    result += letter


regards Max M

holger krekel

Sep 3, 2002, 6:03:19 AM
Christopher A. Craig wrote:

> djw <dwel...@nospam.attbi.com> writes:
>
> > Stupid question:
> >
> > If ""join() is 100-1000 X faster, why doesn't the Python interperter
> > translate s = s1 + s2 + ... into s = "".join( [ s1, s2, ... ] )
> > automatically?
>
> You won't see that kind of dramatic results if you did that. In fact,
> my testing shows it would be slower.
>
> s = "".join([s1, s2]) is slower than s = s1+s2.

That's not really the question. The OP wanted to build up a huge string
(for XML/XSLT processing) in single steps. Using

something += otherstring

in *loops* is usually slow. Doing

stringlist.append(otherstring)

and at the end

"".join(stringlist)

is orders of magnitude faster (depending on the size of the strings).

Another speedy option is to use cStringIO. See my example in another
part of this discussion.
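For reference, cStringIO existed only in Python 2; in modern Python the equivalent is io.StringIO. A minimal sketch of the same buffered-write pattern holger describes:

```python
import io

# Build up a string with many small writes, then read it out once.
buf = io.StringIO()
for chunk in ["<item>", "data", "</item>"]:
    buf.write(chunk)
xml = buf.getvalue()

print(xml)   # prints: <item>data</item>
```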

holger

Fearless Freep

Sep 3, 2002, 11:53:42 AM
"Bo M. Maryniuck" <b.mar...@forbis.lt> wrote in message news:<mailman.103103970...@python.org>...

> On Monday 02 September 2002 05:47, Jay O'Connor wrote:
> > As a result Perl is much faster at some kinds of operations than
> > Python, and vice-versa.
>
> Yeah, actually, but I just would like to find some optimal Python solution :)

Well... is it more efficient to treat it as a list of characters?

>>> x = " " * 10
>>> x = list(x)
>>> y = 'test str'
>>> x[0:len(y)] = list(y)
>>> z = 'another test string'
>>> x.extend(list(z))
>>> s = ''.join(x)
>>> print s

test str  another test string

Treating a string as a list of character elements would allow you to
append to it and change the internals. May be an improvement in
performance.

This assumes you are having a noticeable performance problem with
concatenating strings. Beware of premature optimization. Just
because a fairly simple operation in one language is slower than the
equivalent in another language doesn't mean that the program as a
whole will be slower by enough to matter.

Take care,
Jay
