To my knowledge, there are basically three ways to do it.
1: s = part1 + part2 + part3 + ... partn
2: s = string.join([part1, part2, partn], "")
3: s = "%s%s%s%s" % (part1, part2, part3, part4)
Am I missing anything?
My guess was that method #2 is fastest, followed by 3 and then #1 would
be (by far) the slowest.
I did a quick test (using the unix shell 'time' command) which more or
less confirmed my hypothesis.
Just now, however, I decided to do a more indepth test with the Python
profiler. The results were surprising: method #1 above was fastest,
followed closely by #3, and #2 lagged way back. So, according to this
test, s = part1 + part2 + partn seems to be the fastest way to build a
string out of parts.
Can someone look at my profiling script and see if I made any obvious
errors? Is behavior of strings already well-known?
By the way, the new_profile module below is just the same as the
regular Python profile with a new profiling constant calibrated for my
machine.
thanks,
---Preston
print "This program demonstrates the relative efficiency of different
types of string construction methods in Python."
print "Method 1: s = a + b"
print "Method 2: s = string.join([a, b], '')"
print "Method 3: s = '%s%s' % (a, b)"
import string, time
def method_1(a, b, c, d, e):
return a + b + c + d + e
def method_2(a, b, c, d, e):
return string.join([a, b, c, d, e], "")
def method_3(a, b, c, d, e):
return "%s%s%s%s%s" % (a, b, c, d, e)
def wrapper(method):
## a = str(time.time())
## b = "constant"
c_range = 50
d_range = 50
e_range = 50
for c in range(c_range):
for d in range(d_range):
for e in range(e_range):
# args = (a, b, str(c), str(d), str(e))
args = ("a", "b", "c", "d", "e")
apply(method, args)
import new_profile
import pstats
def __main__():
print "Profiling method_1:"
new_profile.run("wrapper(method_1)", "/tmp/m1.prof")
p = pstats.Stats("/tmp/m1.prof")
p.strip_dirs().sort_stats('time','cum', 'nfl').print_stats
().print_callees()
print "Profiling method_2:"
new_profile.run("wrapper(method_2)", "/tmp/m2.prof")
p = pstats.Stats("/tmp/m2.prof")
p.strip_dirs().sort_stats('time','cum', 'nfl').print_stats
().print_callees()
print "Profiling method_3:"
new_profile.run("wrapper(method_3)", "/tmp/m3.prof")
p = pstats.Stats("/tmp/m3.prof")
p.strip_dirs().sort_stats('time','cum', 'nfl').print_stats
().print_callees()
print "Finished!"
__main__()
--
|| Preston Landers <preston...@my-deja.com> ||
Sent via Deja.com http://www.deja.com/
Before you buy.
cStringIO
>Just now, however, I decided to do a more indepth test with the Python
>profiler. The results were surprising: method #1 above was fastest,
>followed closely by #3, and #2 lagged way back. So, according to this
>test, s = part1 + part2 + partn seems to be the fastest way to build a
>string out of parts.
That's probably true. The tricky part is that "%" gives you a lot of
flexibility and simplicity for building strings, so it's almost always
better if you have a moderately complex string to build.
Problem is this: most of the time when speed becomes an issue, you're
building strings in loops. Once a loop is involved, the copying
overhead of "s=s+part" becomes non-trivial. At that point, either
string.join() or cStringIO become the best solution.
--
--- Aahz (@netcom.com)
Androgynous poly kinky vanilla queer het <*> http://www.rahul.net/aahz/
Hugs and backrubs -- I break Rule 6 (if you want to know, do some research)
I think you under-estimated the difficulty of benchmarking. :)
The speed of each method depends on the situation. Method 1 is
really bad if you continuously add characters on the end of a
long string:
spam = ''
for i in range(10000):
spam = spam + 'a'
This creates and destroys a string object each time around the
loop. In this case, method 2 or [c]StringIO is much better:
spam = []
for i in range(10000):
spam.append('a')
spam = string.join(spam, '')
I believe this is the primary use of the string.join idiom. I
would never write:
spam = string.join([a, b, c])
but rather:
spam = '%s%s%s%s' % (a, b, c)
or even:
spam = a + b + c
if I am lazy and the strings are not expected to be huge.
Neil
>Just now, however, I decided to do a more indepth test with the Python
>profiler. The results were surprising: method #1 above was fastest,
>followed closely by #3, and #2 lagged way back. So, according to this
>test, s = part1 + part2 + partn seems to be the fastest way to build a
>string out of parts.
>
>Can someone look at my profiling script and see if I made any obvious
>errors? Is behavior of strings already well-known?
>
>
Your strings are very short, so the actual copying takes an insignificant
proportion of the time. Your method 2 needs to do a lot of extra work,
looking up a global variable, calling a function etc.
Change the assignment to use longer strings in args:
args = ("a"*1000, "b"*1000, "c"*1000, "d"*1000, "e"*1000)
and now on my machine method_3 beats method_1, with method_2 still last.
Now change the multiplier to 10000 (and reduce the number of times round
the loop by a factor of 10) and method_1 comes last, method_3 still wins.
Finally change the definition of method_2 slightly to remove the global
lookup:
def method_2(a, b, c, d, e, join=string.join):
return join([a, b, c, d, e], "")
At last, with 10000 character strings the modified method_2 is the fastest.