Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Re: Fastest technique for string concatenation

8 views
Skip to first unread message

Emile van Sebille

unread,
Oct 2, 2010, 3:48:01 PM10/2/10
to pytho...@python.org
On 10/2/2010 12:09 PM pyt...@bdurham.com said...

Your times will improve when not burdened by the repeated method lookups
and element-wise list creation:

try with eg,

def testListAppend2():
output = list()
append = output.append
for char in source:
append( char )
output = ''.join( output )

and even further with

def testListAppend3():
output = list(source)
output = ''.join( output )

Same with:

def testStringIO2():
output = cStringIO.StringIO()
write = output.write
for char in source:
write( char )
output = output.getvalue()

Emile


Carey Tilden

unread,
Oct 2, 2010, 4:17:02 PM10/2/10
to pytho...@python.org
On Sat, Oct 2, 2010 at 12:09 PM, <pyt...@bdurham.com> wrote:

> My understanding is that appending to a list and then joining this list when done is the fastest technique for string concatenation. Is this true?

Have you profiled an application and found string concatenation to be
a performance bottleneck?  I would be surprised, but it's always
possible.  If not, I'd suggest you choose the technique that is most
clear and concise, and worry about performance only if it becomes a
real issue.

Carey

pyt...@bdurham.com

unread,
Oct 2, 2010, 4:20:22 PM10/2/10
to Emile van Sebille, pytho...@python.org
Emile,

> Your times will improve when not burdened by the repeated method lookups and element-wise list creation.

Excellent point!!

Here's updated timings for each technique followed by copy and paste
source code for anyone that wants to fiddle with these tests. I've
placed your name above your script suggestions.

My results (Python 2.7/32-bit on Window 7/64-bit)

testListAppend = 10.95 sec
testStringConcat = 49.49 sec
testStringIO = 14.68 sec
testListAppend2 = 7.42 sec <-------- fastest !!!
testListAppend3 = 8.22 sec
testStringIO2 = 10.91 sec

Thanks for the feedback - really appreciated.

Malcolm


# test performance of various string concatenation techniques

import cStringIO
import timeit

source = 'x' * 5000000

def testListAppend():
output = list()
for char in source:
output.append( char )


output = ''.join( output )

def testStringConcat():
output = ''
for char in source:
output += char

def testStringIO():
output = cStringIO.StringIO()
for char in source:
output.write( char )
output = output.getvalue()


# "Emile van Sebille" <em...@fenx.com>:
# Your times will improve when not burdened by the repeated method


lookups and element-wise list creation:

def testListAppend2():


output = list()
append = output.append
for char in source:
append( char )
output = ''.join( output )

def testListAppend3():
input = list( source )


output = list()
append = output.append

for char in input:


append( char )
output = ''.join( output )

def testStringIO2():


output = cStringIO.StringIO()
write = output.write
for char in source:
write( char )
output = output.getvalue()

def time( func ):
timingObject = timeit.Timer( func )
runtime = timingObject.timeit( 10 )
print '%s = %.2f sec' % ( func.__name__, runtime )

time( testListAppend )
time( testStringConcat )
time( testStringIO )
time( testListAppend2 )
time( testListAppend3 )
time( testStringIO2 )

pyt...@bdurham.com

unread,
Oct 2, 2010, 5:33:54 PM10/2/10
to Carey Tilden, pytho...@python.org
Carey,

> Have you profiled an application and found string concatenation to be a performance bottleneck? I would be surprised, but it's always possible. 

The "application" is very simple - its essentially a finite state
machine that parses complex RTF files. We read char by char and do lots
of string concatenation as we move through our various states. Using
lists for concatenation and Emile's technique of eliminating repeated
attribute lookups, we have significantly improved the performance of our
utilities.

Malcolm

Steven D'Aprano

unread,
Oct 2, 2010, 10:50:43 PM10/2/10
to
On Sat, 02 Oct 2010 13:17:02 -0700, Carey Tilden wrote:

> Have you profiled an application and found string concatenation to be
> a performance bottleneck?  I would be surprised, but it's always
> possible.  If not, I'd suggest you choose the technique that is most
> clear and concise, and worry about performance only if it becomes a
> real issue.

While I agree with your general point about premature optimization, I
would not be the tiniest bit surprised to discover that string
concatenation was a performance bottleneck -- and that it only shows up
under some platforms and not others!

Repeated string concatenation risks being an O(n**2) algorithm, which
performs terribly. Last year there was a bug report of awful performance
in the httplib module that only effected Windows users:

http://www.mail-archive.com/python-dev%40python.org/msg40692.html

httplib was FOUR HUNDRED times slower than IE at copying a file over a
local network. Eventually the fault was tracked down to repeated string
concatenation in the module:

http://www.mail-archive.com/python-dev%40python.org/msg41150.html

It costs nothing to avoid repeated string concatenation even prior to
demonstrating a problem.

--
Steven

Peter Otten

unread,
Oct 3, 2010, 5:00:26 AM10/3/10
to pytho...@python.org
pyt...@bdurham.com wrote:

> My understanding is that appending to a list and then joining
> this list when done is the fastest technique for string
> concatenation. Is this true?
>

> The 3 string concatenation techniques I can think of are:
>
> - append to list, join
> - string 'addition' (s = s + char)
> - cStringIO
>
> The code that follows my signature confirms that the list
> append/join technique is indeed the fastest of these 3
> approaches, but perhaps there are other techniques I should
> consider?
>
> Malcolm
>
> # test various techniques for string concatenation


> import cStringIO
> import timeit
> source = 'x' * 5000000
> def testListAppend():
> output = list()
> for char in source:
> output.append( char )
> output = ''.join( output )

You are measuring list creation here, not string concatenation:

$ python -m timeit -s's = "x"*10**5' 'd = []' 'for c in s: d.append(c)'
10 loops, best of 3: 20.9 msec per loop
$ python -m timeit -s's = ["x"]*10**5' '"".join(s)'
100 loops, best of 3: 2.47 msec per loop

To fix that I'd start with a list, not a string:

source = ["x"] * 5000000

> def testStringIO():


> output = cStringIO.StringIO()
> for char in source:
> output.write( char )
> output = output.getvalue()

You can simplify that to

def test_stringio():
output = cStringIO.StringIO()
output.writelines(source) # doesn't add newlines
output = output.getvalue()

which is almost as fast as ''.join:

$ python -m timeit -s's = ["x"]*10**5' '"".join(s)'
100 loops, best of 3: 2.47 msec per loop

$ python -m timeit -s's = ["x"]*10**5' -s 'from cStringIO import StringIO'
'o = StringIO(); o.writelines(s); o.getvalue()'
100 loops, best of 3: 2.88 msec per loop

Peter

Roy Smith

unread,
Oct 3, 2010, 9:19:14 AM10/3/10
to
My local news feed seems to have lost the early part of this thread, so
I'm afraid I don't know who I'm quoting here:

> My understanding is that appending to a list and then joining
> this list when done is the fastest technique for string
> concatenation. Is this true?
>
> The 3 string concatenation techniques I can think of are:
>
> - append to list, join
> - string 'addition' (s = s + char)
> - cStringIO

There is a fourth technique, and that is to avoid concatenation in the
first place. One possibility is to use the common append/join pattern
mentioned above:

vector = []
while (stuff happens):
vector.append(whatever)
my_string = ''.join(vector)

But, it sometimes (often?) turns out that you don't really need
my_string. It may just be a convenient way to pass the data on to the
next processing step. If you can arrange your code so the next step can
take the vector directly, you can avoid creating my_string at all.

For example, if all you're going to do is write the string out to a file
or network socket, you could user vectored i/o, with something like
python-writev (http://pypi.python.org/pypi/python-writev/1.1). If
you're going to iterate over the string character by character, you
could write an iterator which does that without the intermediate copy.
Something along the lines of:

def each(self):
for s in self.vector:
for c in s:
yield c

Depending on the amount of data you're dealing with, this could be a
significant improvement over doing the join().

Will Hall

unread,
Oct 5, 2010, 3:39:10 PM10/5/10
to

Okay. I've never responded to one of these before, so please correct
me if I'm making any large blunders. I'd just recently read Guido's
Python Patterns -- An Optimization Anecdote, and I was wondering why a
similar method to the one he suggests wouldn't work here?

My suggestion:
def arrayConcat():
output = array.array('c', source).tostring()

Am I missing something, or will this work?

Thanks,
Will

Will Hall

unread,
Oct 5, 2010, 3:48:47 PM10/5/10
to

My bad, I forgot the 'import array' statement at the top.

0 new messages