loop problem

Hatem El-Zanaty

Feb 4, 2010, 5:10:11 AM
to utahp...@googlegroups.com
Hi all,
In Python I'm running a loop over nearly 5000 elements and it takes nearly 40 seconds. Is there a way to make it faster? I tried both a while loop and a for loop, and neither helped.
Can anybody help with that? Here is the code:

       arr= temp.split("\n")
       paragraphs= len(arr)
       result="<data>"
       i=0
       print  "loop start"
       while i < paragraphs:
            temp2= arr[i]
            if 0 < len(temp2) <  60:
              result=result +"<title>"+temp2+"</title>\n"
            else:             
              result=result +"<para>"+temp2+"</para>\n"
            i=i+1
please help as soon as possible
Best Regards
hatem gamal

Shawn Willden

Feb 4, 2010, 8:21:27 AM
to utahp...@googlegroups.com
Dunno about faster, but you could definitely make it more concise and Pythonic:

result = "<data>"
for line in temp.splitlines():
    if len(line) < 60:
        result += "<title>" + line + "</title>\n"
    else:
        result += "<para>" + line + "</para>\n"

It might speed things up a little (or maybe not) to build your result
in a cStringIO.StringIO object, rather than repeatedly creating and
discarding string objects. 40 seconds seems like a crazy long time to
process 5000 lines, though. I think something else is weird. Is this
Python code that you're somehow running in a browser? Or are some of
the lines huge? I mean big enough to cause memory pressure and
swapping.
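
A minimal sketch of the buffer idea, written for Python 3, where `io.StringIO` takes the place of `cStringIO.StringIO`. The function name `wrap_lines` and the sample text are hypothetical, and the 60-character test is simply carried over from the loop above:

```python
import io


def wrap_lines(text):
    """Wrap lines shorter than 60 characters in <title>, the rest in <para>."""
    buf = io.StringIO()
    buf.write("<data>")
    for line in text.splitlines():
        tag = "title" if len(line) < 60 else "para"
        # Each write appends to the buffer instead of rebuilding one big string.
        buf.write("<%s>%s</%s>\n" % (tag, line, tag))
    return buf.getvalue()


sample = "Short heading\n" + "x" * 80  # made-up input
print(wrap_lines(sample))
```

Whether this is actually faster than plain concatenation depends on the interpreter and version, so it is worth timing before relying on it.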

Another option to get rid of the repeated creation and garbage
collection of intermediate strings is to build your result up in a
list, like:

lst = ["<data>"]
for line in temp.splitlines():
    if len(line) < 60:
        lst.append("<title>")
        lst.append(line)
        lst.append("</title>\n")
    else:
        lst.append("<para>")
        lst.append(line)
        lst.append("</para>\n")
result = "".join(lst)

Since lists are mutable, the list doesn't have to be created and
discarded during every iteration. Another way that uses a generator
to avoid building the temporary list is:

def wrapline(line):
    if len(line) < 60:
        return "<title>" + line + "</title>"
    else:
        return "<para>" + line + "</para>"

result = "".join(wrapline(elem) for elem in temp.splitlines())

One more comment: using line length to distinguish between titles and
paragraphs is likely to be VERY brittle. Paragraphs shorter than 60
characters do exist, as do titles longer than 60 characters.
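
To make the brittleness concrete, here is a small sketch of the length-only rule applied to two made-up lines. The helper `classify` and both sample strings are hypothetical, not taken from the original data:

```python
def classify(line, limit=60):
    """Length-only heuristic from the thread: short lines count as titles."""
    return "title" if len(line) < limit else "para"


short_paragraph = "It rained."
long_title = "A Very Long Chapter Title That Keeps Going Well Past Sixty Characters"

# The short paragraph is tagged as a title, the long title as a paragraph.
print(classify(short_paragraph))
print(classify(long_title))
```

Any extra signal available in the real input (blank lines around headings, capitalization, trailing punctuation) would need to be checked against that input before replacing the length test.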

--
Shawn

Byron Clark

Feb 4, 2010, 8:26:39 AM
to utahp...@googlegroups.com

This should be faster because it doesn't have to rebuild the entire
string on each iteration:

result = ['<data>']
for paragraph in temp.split('\n'):
    if len(paragraph) < 60:
        pattern = '<title>%s</title>\n'
    else:
        pattern = '<para>%s</para>\n'
    result.append(pattern % (paragraph,))
result.append('</data>')
result = ''.join(result)

--
Byron Clark

Shawn Willden

Feb 4, 2010, 8:36:09 AM
to utahp...@googlegroups.com
On Thu, Feb 4, 2010 at 6:26 AM, Byron Clark <byron...@gmail.com> wrote:
> This should be faster because it doesn't have to rebuild the entire
> string on each iteration:

I just did some tests (on 2.5.2 and 2.6.4) and surprisingly enough,
the fastest way to build up a string is the naive way, with repeated
concatenation. Building a list and joining it was about 40% slower.
Using cStringIO was about 50% slower. Building a list in a list
comprehension and joining it is almost as fast as repeated
concatenation.

My guess is that the interpreter must notice the repeated
concatenation idiom and actually extend the result string in place
rather than actually creating a new immutable string during every
loop.
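
A comparison along these lines can be reproduced with `timeit`. The sketch below is written for Python 3 (so `io.StringIO` stands in for `cStringIO`) and uses 5000 synthetic lines; the function names and the data are assumptions, and the timings will differ from the 2.5.2 and 2.6.4 figures quoted above:

```python
import io
import timeit

# Synthetic stand-in for the real input: 5000 short lines.
LINES = ["line %d" % i for i in range(5000)]


def concat():
    """Naive repeated concatenation."""
    result = "<data>"
    for line in LINES:
        result += "<para>" + line + "</para>\n"
    return result


def list_join():
    """Collect the pieces in a list and join once at the end."""
    parts = ["<data>"]
    for line in LINES:
        parts.append("<para>" + line + "</para>\n")
    return "".join(parts)


def string_io():
    """Write the pieces into an in-memory text buffer."""
    buf = io.StringIO()
    buf.write("<data>")
    for line in LINES:
        buf.write("<para>" + line + "</para>\n")
    return buf.getvalue()


if __name__ == "__main__":
    for func in (concat, list_join, string_io):
        seconds = timeit.timeit(func, number=100)
        print("%-10s %.3f s for 100 runs" % (func.__name__, seconds))
```

All three functions produce the same string, so the only thing being compared is how the result gets built.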

--
Shawn

Mike Moore

Feb 4, 2010, 11:05:23 AM
to utahp...@googlegroups.com
I'm no Python expert, but I was able to run your code in ~0.5 seconds on a first-gen MacBook Pro running Snow Leopard. I got that down to ~0.1 seconds by limiting the string concatenation.

# Start with result as a list
result= ["<data>\n"]
print  "loop start"
for line in open('loop.txt', 'r').read().splitlines():
   if len(line) < 60:
       result.append("<title>")
       result.append(line)
       result.append("</title>\n")
   else:
       result.append("<para>")
       result.append(line)
       result.append("</para>\n")
result.append("</data>\n")
print  "loop end"
# Make result a string
result= "".join(result)
#print result

I hope my Ruby bias isn't showing. :)


--
You received this message because you are subscribed to the Google Groups "Utah Python User Group" group.
To post to this group, send email to utahp...@googlegroups.com.
To unsubscribe from this group, send email to utahpython+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/utahpython?hl=en.


Byron Clark

Feb 4, 2010, 11:28:35 AM
to utahp...@googlegroups.com
On Thu, Feb 04, 2010 at 06:36:09AM -0700, Shawn Willden wrote:
> My guess is that the interpreter must notice the repeated
> concatenation idiom and actually extend the result string in place
> rather than actually creating a new immutable string during every
> loop.

It looks like this optimization first appeared in Python 2.4:
http://python.org/doc/2.4/whatsnew/node12.html (search for "String
concatenations").

--
Byron Clark

Hatem El-Zanaty

Feb 6, 2010, 1:52:04 PM
to utahp...@googlegroups.com
Hi all,
Thanks a lot for your help.
It works just fine now, in 0.5 seconds.
Best Regards
hatem gamal
