    result = "<data>"
    for line in temp.splitlines():
        if len(line) < 60:
            result += "<title>" + line + "</title>\n"
        else:
            result += "<para>" + line + "</para>\n"

It might speed things up a little (or maybe not) to build your result
in a cStringIO.StringIO object, rather than repeatedly creating and
discarding string objects. 40 seconds seems like a crazy long time to
process 5000 lines, though. I think something else is weird. Is this
Python code that you're somehow running in a browser? Or are some of
the lines huge? I mean big enough to cause memory pressure and
swapping.
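
For reference, the StringIO approach might look roughly like the sketch below. The function name build_with_stringio and the sample value of temp are made up for illustration. The import fallback is only there so the same code also runs on Python 3, where cStringIO no longer exists and io.StringIO takes its place.

```python
try:
    from cStringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3

def build_with_stringio(temp):
    """Wrap each line of temp in <title> or <para> tags using a string buffer."""
    buf = StringIO()
    buf.write("<data>")
    for line in temp.splitlines():
        if len(line) < 60:
            buf.write("<title>" + line + "</title>\n")
        else:
            buf.write("<para>" + line + "</para>\n")
    return buf.getvalue()

# Invented sample input: one short line and one 80-character line.
temp = "A short heading\n" + "x" * 80
result = build_with_stringio(temp)
```

The buffer is written to in place, so no intermediate result strings are created inside the loop; only the final getvalue() call builds the full string.
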
Another option to get rid of the repeated creation and garbage
collection of intermediate strings is to build your result up in a
list, like:

    lst = ["<data>"]
    for line in temp.splitlines():
        if len(line) < 60:
            lst.append("<title>")
            lst.append(line)
            lst.append("</title>\n")
        else:
            lst.append("<para>")
            lst.append(line)
            lst.append("</para>\n")
    result = "".join(lst)

Since lists are mutable, the list doesn't have to be created and
discarded during every iteration. Another way that uses a generator
to avoid building the temporary list is:

    def wrapline(line):
        if len(line) < 60:
            return "<title>" + line + "</title>"
        else:
            return "<para>" + line + "</para>"

    result = "".join(wrapline(elem) for elem in temp.splitlines())

One more comment: using line length to distinguish between titles and
paragraphs is likely to be VERY brittle. Paragraphs shorter than 60
characters do exist, as do titles longer than 60 characters.
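
To make that concrete, here is a small sketch of how the 60-character rule misfires. The classify helper is hypothetical and only mirrors the length test used in the code above; both sample strings are invented.

```python
def classify(line, limit=60):
    """Return 'title' or 'para' using the same length test as the loop above."""
    return "title" if len(line) < limit else "para"

# A one-word paragraph and a title that runs past 60 characters.
short_paragraph = "Thanks."
long_title = "A Very Long Chapter Title That Happens to Run Well Past the Sixty Character Limit"

print(classify(short_paragraph))  # title (wrong: it is a paragraph)
print(classify(long_title))       # para (wrong: it is a title)
```
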
--
Shawn
This should be faster because it doesn't have to rebuild the entire
string on each iteration:

    result = ['<data>']
    for paragraph in temp.split('\n'):
        if len(paragraph) < 60:
            pattern = '<title>%s</title>\n'
        else:
            pattern = '<para>%s</para>\n'
        result.append(pattern % (paragraph,))
    result.append('</data>')
    result = ''.join(result)

--
Byron Clark
I just did some tests (on 2.5.2 and 2.6.4) and surprisingly enough,
the fastest way to build up a string is the naive way, with repeated
concatenation. Building a list and joining it was about 40% slower.
Using cStringIO was about 50% slower. Building a list in a list
comprehension and joining it is almost as fast as repeated
concatenation.
My guess is that the interpreter notices the repeated concatenation
idiom and extends the result string in place, rather than creating a
new immutable string on every pass through the loop.
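
A rough way to reproduce this kind of comparison is with timeit. This is a minimal sketch, not necessarily the test described above: the function names, the wrap helper, and the sample input are all assumptions, and the numbers will vary with interpreter version and machine.

```python
import timeit

# Made-up sample input: 5000 lines, alternating short and long.
temp = "\n".join(["short line", "x" * 80] * 2500)

def wrap(line):
    """Tag a line as title or para using the 60-character test."""
    if len(line) < 60:
        return "<title>" + line + "</title>\n"
    return "<para>" + line + "</para>\n"

def concat():
    # Naive repeated concatenation.
    result = "<data>"
    for line in temp.splitlines():
        result += wrap(line)
    return result

def list_join():
    # Append to a list, then join once at the end.
    lst = ["<data>"]
    for line in temp.splitlines():
        lst.append(wrap(line))
    return "".join(lst)

def comprehension_join():
    # Build the list in a comprehension, then join.
    return "<data>" + "".join([wrap(line) for line in temp.splitlines()])

# All three must produce the same string before the timings mean anything.
assert concat() == list_join() == comprehension_join()

for func in (concat, list_join, comprehension_join):
    seconds = timeit.timeit(func, number=20)
    print("%-20s %.3f s" % (func.__name__, seconds))
```

Passing a callable to timeit.timeit needs Python 2.6 or later. The in-place extension is an implementation detail of CPython, so other interpreters may rank these differently.
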
--
Shawn
It looks like this optimization first appeared in Python 2.4:
http://python.org/doc/2.4/whatsnew/node12.html (search for "String
concatenations").
--
Byron Clark