pypy is fast

David Joerg
Jan 24, 2012, 12:30:27 PM
to sc2reader
Hey nerds, I'm starting to handle lots and lots of replay files, and so parsing speed matters.

Turns out that pypy is about 2.5x faster than standard CPython, at least for sc2reader parsing:
For parsing 50 random replays, standard CPython took 10.537s and pypy took 4.201s.
Nice speedup just for installing an alternate Python!
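
A minimal sketch of how a comparison like this could be timed, for anyone who wants to repeat it under both interpreters. `time_parse` and `speedup` are hypothetical helper names, and `parse` stands in for whatever function loads a single replay; none of this is part of sc2reader.

```python
from timeit import default_timer


def time_parse(paths, parse):
    """Return the wall-clock seconds spent calling parse(path) on every path."""
    start = default_timer()
    for path in paths:
        parse(path)
    return default_timer() - start


def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate run was than the baseline."""
    return baseline_seconds / candidate_seconds
```

Run the same script once per interpreter over the same list of replays, then compare the two totals; `speedup(10.537, 4.201)` gives the ratio for the numbers above.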

I was hoping to find some things in the code I could optimize, but so far the hot spots in the code look pretty solid already.

For what it's worth, here are the cProfile results showing the functions that take the most time, running in pypy:


         75011558 function calls (74856283 primitive calls) in 29.720 seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1450587    3.431    0.000    5.098    0.000 utils.py:347(read)
  6206333    2.428    0.000    5.477    0.000 utils.py:145(read_byte)
 10700203    2.246    0.000    2.246    0.000 {method 'read' of 'cStringIO.StringI' objects}
     1300    2.223    0.002    2.223    0.002 {method 'read' of 'file' objects}
  5860557    1.424    0.000    1.424    0.000 {method 'append' of 'list' objects}


So about 11.5% of the running time (3.431s of 29.720s) is spent in the read() function in utils.py itself, or about 17% (5.098s) if you count the calls it makes.
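
For reference, a table like the one above can be produced with the standard library's cProfile and pstats modules. This is only a sketch: `profile_top` is a hypothetical helper, and `func` is assumed to be a zero-argument callable that does the replay parsing.

```python
import cProfile
import io
import pstats


def profile_top(func, limit=5):
    """Run func() under cProfile and return the top rows sorted by tottime."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # "tottime" is time spent in the function itself, excluding its callees.
    stats.sort_stats("tottime").print_stats(limit)
    return out.getvalue()
```

Sorting by "cumtime" instead shows which functions are expensive once the calls they make are included.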

Enjoy,

--David J

Xavi Ramirez
Jan 24, 2012, 1:11:38 PM
to sc2r...@googlegroups.com
Out of curiosity, what are you working on that requires so much replay processing?

David Joerg
Jan 24, 2012, 1:22:04 PM
to sc2r...@googlegroups.com
Trying to gather stats in the spirit of Do You Macro Like a Pro, for example.

Graylin Kim
Jan 24, 2012, 2:31:08 PM
to sc2r...@googlegroups.com
> Turns out that pypy is about 2.5x faster than standard CPython, at least for sc2reader parsing:
> For parsing 50 random replays, standard CPython took 10.537s and pypy took 4.201s.
> Nice speedup just for installing an alternate Python!

Glad you finally tried the PyPy stuff like I suggested, David. It really is amazing how much faster things get! I probably should have put this tip on the wiki or sent it to the mailing list. I'll do that now...

The only side effect is that you need to avoid most third-party Python C extensions (numpy, for example), since they haven't been ported yet.
 
> I was hoping to find some things in the code I could optimize, but so far the hot spots in the code look pretty solid already.
>
> For what it's worth, here are the cProfile results showing the functions that take the most time, running in pypy:
>
>          75011558 function calls (74856283 primitive calls) in 29.720 seconds
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>   1450587    3.431    0.000    5.098    0.000 utils.py:347(read)
>   6206333    2.428    0.000    5.477    0.000 utils.py:145(read_byte)
>  10700203    2.246    0.000    2.246    0.000 {method 'read' of 'cStringIO.StringI' objects}
>      1300    2.223    0.002    2.223    0.002 {method 'read' of 'file' objects}
>   5860557    1.424    0.000    1.424    0.000 {method 'append' of 'list' objects}
>
> So about 11.5% of the running time (3.431s of 29.720s) is spent in the read() function in utils.py itself, or about 17% (5.098s) if you count the calls it makes.

I previously spent several days using cProfile and KCachegrind to profile sc2reader, so I'm glad it looks pretty solid to you. The root of the problem is the byte wrapping technique that Blizzard appears to be using. It seems like the only way to make the file reading faster (it's like 20% of load time) is to bake it into C or come up with a more innovative approach to reading the bytes. The current, more straightforward approach is pretty optimised.
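
One direction a different byte-reading approach might take, purely as a sketch: index into an in-memory buffer instead of calling read() on a file-like object for every byte. `BufferReader` and its methods are hypothetical and not sc2reader's actual reader; the sketch ignores the byte wrapping entirely, and whether it is any faster would have to be measured.

```python
import struct


class BufferReader:
    """Hypothetical reader that indexes an in-memory buffer directly."""

    def __init__(self, data):
        self.data = bytearray(data)  # indexing a bytearray yields ints
        self.pos = 0

    def read_byte(self):
        """Return the next byte as an int and advance the cursor."""
        value = self.data[self.pos]
        self.pos += 1
        return value

    def read_bytes(self, count):
        """Return the next `count` bytes as a bytes object."""
        end = self.pos + count
        chunk = bytes(self.data[self.pos:end])
        self.pos = end
        return chunk

    def read_uint32_le(self):
        """Decode a little-endian unsigned 32-bit value without slicing."""
        (value,) = struct.unpack_from("<I", self.data, self.pos)
        self.pos += 4
        return value
```

The idea is simply to replace millions of small method calls on a stream object with plain indexing and a cursor; the little-endian integer helper is an assumption for illustration, not a statement about the replay format.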

~Graylin

David Joerg
Jan 24, 2012, 2:33:02 PM
to sc2r...@googlegroups.com
In particular, it seems like there aren't any nicely packaged database interfaces that work out of the box with pypy except for the sqlite interface, which isn't gonna cut it for real live internet stuff.  So I'm back to CPython for now.  TT

Graylin Kim
Jan 24, 2012, 2:37:06 PM
to sc2r...@googlegroups.com
If I may make a recommendation:
  1. Use PyPy to process the file, use cPickle to store it somewhere.
  2. Have a worker process poll the storage location, un-pickle the file and stuff it into a database.
  3. ???
  4. Profit
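
Steps 1 and 2 could be sketched roughly like this. All names here (`store_parsed`, `drain_spool`, the `replays` table and its columns) are hypothetical, the parsed result is assumed to be a plain dict, and sqlite3 appears only because it ships with Python; a real worker running under CPython would use whichever database driver it needs.

```python
import os
import pickle
import sqlite3


def store_parsed(summary, spool_dir, name):
    """Step 1 (under PyPy): pickle one parsed summary into the spool directory."""
    tmp_path = os.path.join(spool_dir, name + ".tmp")
    final_path = os.path.join(spool_dir, name + ".pickle")
    with open(tmp_path, "wb") as handle:
        pickle.dump(summary, handle, protocol=2)
    # Rename last so the worker never picks up a half-written file.
    os.replace(tmp_path, final_path)
    return final_path


def drain_spool(spool_dir, conn):
    """Step 2 (under CPython): load each pickle, insert it, then delete the file."""
    loaded = 0
    for entry in sorted(os.listdir(spool_dir)):
        if not entry.endswith(".pickle"):
            continue
        path = os.path.join(spool_dir, entry)
        with open(path, "rb") as handle:
            summary = pickle.load(handle)
        conn.execute(
            "INSERT INTO replays (name, length) VALUES (?, ?)",
            (summary["name"], summary["length"]),
        )
        conn.commit()
        os.remove(path)
        loaded += 1
    return loaded
```

The worker would call `drain_spool` in a loop with a short sleep between passes. Pickling plain dicts rather than full replay objects means the worker does not need the parser's classes to un-pickle them.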

~Graylin

Zsol
Jan 24, 2012, 2:49:50 PM
to sc2r...@googlegroups.com
Out of curiosity, what would be an acceptable level of performance for you?

Zsol

David Joerg
Jan 24, 2012, 2:54:42 PM
to sc2r...@googlegroups.com
There is no such thing as acceptable performance!   There is only faster!



OK more seriously... I have 30,000 replay files here.  If the entire set can be processed in under a second, then no further speed improvements are necessary.


OK OK even more seriously -- at some point it becomes easier to simply parallelize the process and put the whole thing up on some kind of map-reduce framework, rather than investing in speed improvements.  So unless there's a quickie speedup trick like switching to pypy or reading the whole file into memory first, I wouldn't worry about it.
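
As a rough illustration of the parallelize-then-merge idea on a single machine, here is a sketch using the standard multiprocessing module rather than a real map-reduce framework. `map_replay`, `reduce_stats` and `process_all` are hypothetical names, and `parse` is assumed to be a function that takes a replay path and returns a dict of counts.

```python
from collections import Counter
from functools import partial, reduce
from multiprocessing import Pool


def map_replay(path, parse):
    """Map step: parse one replay and return its partial statistics."""
    return Counter(parse(path))


def reduce_stats(total, partial_stats):
    """Reduce step: add one replay's counts into the running total."""
    total.update(partial_stats)
    return total


def process_all(paths, parse, workers=None):
    """Parse every replay, in worker processes if `workers` is set, and merge."""
    step = partial(map_replay, parse=parse)
    if workers is None:
        # Serial fallback, handy for debugging.
        return reduce(reduce_stats, map(step, paths), Counter())
    with Pool(workers) as pool:
        # `parse` must be a picklable module-level function to cross processes.
        partials = pool.imap_unordered(step, paths, chunksize=64)
        return reduce(reduce_stats, partials, Counter())
```

Since each replay is parsed independently, the work should split cleanly across processes; only the small per-replay counts travel back to the parent.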

For individual replay parsing latency, I think anything under a second is probably fine for, say, human uploaders.   --dj