Google Grupper understøtter ikke længere nye Usenet-opslag eller -abonnementer. Tidligere indhold er fortsat synligt.

String += time trials. (was: Re: Speeding up: s += "string")

0 visninger
Gå til det første ulæste opslag

Francis Avila

ulæst,
9. maj 2003, 16.30.1809.05.2003
til
"Erik Max Francis" <m...@alcyone.com> wrote in message
news:3EBAD05C...@alcyone.com...
> As Lulu pointed out, the better solution is to build a list -- because
> appending to a list is amortized O(1) -- and then joining the string all
> in one go.

Is it the best solution?
(all tests done on a Pentium 133 (i586), 32mb, Debian 3.0r1, Linux 2.4.18)
(Quick observation: list operations seem to have gotten slower between 2.1
and 2.2, especially extend.)

Unless I'm missing something, it seems that by the time things get slow
enough to use list joining rather than string catenation, you're better off
using cStringIO anyway.

Feature requests, based on this:
*StringIO could be a drop-in replacement for strings, if += were overloaded
to mean append-to-buffer and str(<*StringIO object>) returned <*StringIO
object>.getvalue() instead of repr(). Thus one can start out using strings,
and once the number of catenations is found to be a bottleneck, just change
the first s = '' to s = cStringIO.StringIO(). No other changes would be
necessary.


[ Oh, and for the below, 's/Loop/List/'. I don't know what I was thinking. ]

$ python2.1 stringcattrials.py 1 10 100 1000 10000 100000
----------------------------------------
1 catenations per trial:
String catenation took 0.000119
Loop extension took 0.000170
Loop appending took 0.000144
cStringIO writing took 0.000141
StringIO writing took 0.000390
----------------------------------------
10 catenations per trial:
String catenation took 0.000237
Loop extension took 0.000578
Loop appending took 0.000461
cStringIO writing took 0.000363
StringIO writing took 0.001774
----------------------------------------
100 catenations per trial:
String catenation took 0.001938
Loop extension took 0.004385
Loop appending took 0.003403
cStringIO writing took 0.002447
StringIO writing took 0.015654
----------------------------------------
1000 catenations per trial:
String catenation took 0.021864
Loop extension took 0.043683
Loop appending took 0.032654
cStringIO writing took 0.045036
StringIO writing took 0.156615
----------------------------------------
10000 catenations per trial:
String catenation took 1.469739
Loop extension took 0.455751
Loop appending took 0.332568
cStringIO writing took 0.238696
StringIO writing took 1.577647
----------------------------------------
100000 catenations per trial:
String catenation took 166.015264
Loop extension took 4.530110
Loop appending took 3.125364
cStringIO writing took 2.413575
StringIO writing took 15.582441


$ python2.2 stringcattrials.py 1 10 100 1000 10000 100000
----------------------------------------
1 catenations per trial:
String catenation took 0.000153
Loop extension took 0.000243
Loop appending took 0.000169
cStringIO writing took 0.000190
StringIO writing took 0.000491
----------------------------------------
10 catenations per trial:
String catenation took 0.000275
Loop extension took 0.000909
Loop appending took 0.000459
cStringIO writing took 0.000422
StringIO writing took 0.002277
----------------------------------------
100 catenations per trial:
String catenation took 0.001982
Loop extension took 0.007511
Loop appending took 0.003010
cStringIO writing took 0.002826
StringIO writing took 0.018513
----------------------------------------
1000 catenations per trial:
String catenation took 0.022112
Loop extension took 0.096759
Loop appending took 0.032322
cStringIO writing took 0.025085
StringIO writing took 0.205307
----------------------------------------
10000 catenations per trial:
String catenation took 1.494656
Loop extension took 0.789164
Loop appending took 0.318447
cStringIO writing took 0.259390
StringIO writing took 2.023437
----------------------------------------
100000 catenations per trial:
String catenation took 170.029309
Loop extension took 8.083905
Loop appending took 3.116776
cStringIO writing took 2.746015
StringIO writing took 19.161821


$ cat stringcattrials.py
#! /usr/bin/env python

import os, sys
from time import time
import StringIO, cStringIO
import sys

def trials(numcats=1000):
"""Do the trials, performing numcats catenations per trial"""
loop = tuple(range(numcats))

s = ''
start = time()
for i in loop:
s += '0'
stop = time()

print "String catenation took \t %3.6f" % (stop-start)


s = []
start = time()
for i in loop:
s += '0'
''.join(s)
stop = time()

print "Loop extension took \t %3.6f" % (stop-start)


s = []
start = time()
for i in loop:
s.append('0')
''.join(s)
stop = time()

print "Loop appending took \t %3.6f" % (stop-start)


s = cStringIO.StringIO()
start = time()
for i in loop:
s.write('0')
s.getvalue()
stop = time()

print "cStringIO writing took \t %3.6f" % (stop-start)


s = []
start = time()
for i in loop:
s += '0'
''.join(s)
stop = time()

print "Loop extension took \t %3.6f" % (stop-start)


s = []
start = time()
for i in loop:
s.append('0')
''.join(s)
stop = time()

print "Loop appending took \t %3.6f" % (stop-start)


s = cStringIO.StringIO()
start = time()
for i in loop:
s.write('0')
s.getvalue()
stop = time()

print "cStringIO writing took \t %3.6f" % (stop-start)


s = StringIO.StringIO()
start = time()
for i in loop:
s.write('0')
s.getvalue()
stop = time()

print "StringIO writing took \t %3.6f" % (stop-start)

if __name__ == '__main__':
nlist = sys.argv[1:] # list of loop repeat counts
if nlist:
for i in nlist:
print '-'*40
print i, "catenations per trial:"
trials(int(i))
else:
trials()

Erik Max Francis

ulæst,
9. maj 2003, 17.15.3109.05.2003
til
Francis Avila wrote:

> Is it the best solution?

My point was that naive string concatenation inside a loop brings out
the worst behavior (O(n^2)), and was suggesting a solution that has
better scaling (amortized O(n)). StringIO/cStringIO would also be an
example of a way to get this superior behavior.

As for the specific merits of building a list and then using S.join vs.
StringIO, that really depends on the specifics of the situation. I
don't know how helpful your comparison here is, since you're dealing
with such tiny strings.

My point would simply be to pick the behavior that has the right big-O
complexity. In a language like Python, you're not looking to optimize
every last cycle out of your program -- otherwise you wouldn't be using
Python -- but you _do_ want to make sure that your program uses the
proper algorithms to prevent complexity explosions. Repeated string
concatenation is bad. The other solutions are good. The relative
goodness of the superior solutions is something I wouldn't worry about
too much.

--
Erik Max Francis / m...@alcyone.com / http://www.alcyone.com/max/
__ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE
/ \ The critical period in matrimony is breakfast-time.
\__/ A.P. Herbert
Bosskey.net: Aliens vs. Predator 2 / http://www.bosskey.net/avp2/
A personal guide to Aliens vs. Predator 2.

0 nye opslag