High Performance Python v0.1 report

Ian Ozsvald

3 Jul 2011, 6:48:50 AM
to shedskin...@googlegroups.com
Hi all. I ran a 4-hour tutorial on High Performance Python at
EuroPython two weeks back. I've written up a report (49-page PDF, CC
licensed):
http://ianozsvald.com/2011/06/29/high-performance-python-tutorial-v0-1-from-my-4-hour-tutorial-at-europython-2011/
and I plan to publish an updated v0.2 report in a couple of weeks. It
was a bit of a push writing 49 pages whilst at the conference; I need a bit
of a break to catch up on other stuff now :-)

ShedSkin is covered, though I haven't done the profiling yet (I see
Mark has just published some notes on the subject, which is great). If an
even more optimal version of the ShedSkin code can be built, I'm all
ears. Currently ShedSkin beats Cython (but possibly just because gcc
4.2 is being used rather than gcc 4.0, a quirk of my system perhaps).

Armin has tried the pure-Python 'better math' solution (the
fastest source for ShedSkin and Cython), and with the trunk version of
PyPy it runs only 4x slower than ShedSkin/Cython (i.e. with no extra
work at all it gets within the same order of magnitude as the
C-compiled versions!). This is pretty impressive.
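
As a rough illustration of what 'better math' means here, the sketch below assumes the benchmark is the tutorial's Mandelbrot escape-time loop (the 'maxiter' parameter mentioned later in the thread points that way). The complex-number version is rewritten with plain floats: z*z + c is expanded into real and imaginary parts, and the abs(z) > 2 test becomes a comparison of the squared magnitude with 4, which avoids a square root. The function names are hypothetical; this is not the code from the report.

```python
def calculate_z_complex(q, maxiter):
    """Escape-time count for each complex point in q (complex arithmetic)."""
    output = [0] * len(q)
    for i, c in enumerate(q):
        z = 0j
        for iteration in range(maxiter):
            z = z * z + c
            if abs(z) > 2.0:
                output[i] = iteration
                break
    return output


def calculate_z_expanded(q, maxiter):
    """Same calculation with the complex math expanded into float operations."""
    output = [0] * len(q)
    for i, c in enumerate(q):
        zx, zy = 0.0, 0.0
        cx, cy = c.real, c.imag
        for iteration in range(maxiter):
            # z = z*z + c, split into real and imaginary parts
            zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
            # |z| > 2 is tested as |z|**2 > 4, avoiding a square root
            if zx * zx + zy * zy > 4.0:
                output[i] = iteration
                break
    return output


points = [0j, 2 + 0j, 3 + 0j, 1 + 0j]
print(calculate_z_expanded(points, 20))  # [0, 1, 0, 2]
```

Points that never escape within maxiter keep a count of 0. The two versions compute the same z values, but the escape test may round differently in rare borderline cases.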

Someone is also working on a pyOpenCL version to accompany the pyCUDA
examples; that will go into the v0.2 report. Shout if you have any
improvements: I'd like to make this report a nice tutorial for all
Pythonistas.

Cheers,
Ian.

--
Ian Ozsvald (A.I. researcher, screencaster)
i...@IanOzsvald.com

http://IanOzsvald.com
http://SocialTiesApp.com/
http://MorConsulting.com/
http://blog.AICookbook.com/
http://TheScreencastingHandbook.com
http://FivePoundApp.com/
http://twitter.com/IanOzsvald

Mark Dufour

3 Jul 2011, 8:24:32 AM
to shedskin...@googlegroups.com
hi ian,

thanks a lot for the nice tutorial, and for getting back. :-)


> ShedSkin is covered though I haven't done the profiling yet (I see
> Mark's just published some notes on the subject, that's great). If an
> even-more-optimal version of the ShedSkin code can be built, I'm all
> ears. Currently ShedSkin beats Cython (but possibly just because gcc
> 4.2 is being used rather than gcc 4.0 - a quirk of my system perhaps).

I played a bit with this, and the following improves performance on my system:

- adding -ffast-math to FLAGS seems to reduce run time by about 10%
- compiling first with -fprofile-generate, then with -fprofile-use, saves about another 7%
- using libgc 7.2alpha6 instead of the common libgc 6.8 helps by about 3% (you may already use this one)
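
A simplified sketch of the two-pass profile-guided build with -ffast-math, written in Python to keep to one language. It assumes a single hypothetical source file mandel.cpp and output binary mandel; a real ShedSkin build goes through the Makefile it generates (where the FLAGS change above belongs) and also links libgc, so these are not the exact commands ShedSkin runs. -ffast-math, -fprofile-generate and -fprofile-use are standard GCC flags; -ffast-math relaxes IEEE floating-point rules, so it is worth checking that results still match.

```python
import shlex


def pgo_build_commands(source, output, base_flags=("-O2",)):
    """Return the two g++ invocations for a profile-guided build.

    source and output are hypothetical file names; a real ShedSkin
    project would also need its runtime sources and -lgc when linking.
    """
    flags = [*base_flags, "-ffast-math"]
    # pass 1: instrumented build; running it writes .gcda profile files
    generate = ["g++", *flags, "-fprofile-generate", source, "-o", output]
    # pass 2: rebuild, letting the recorded profile guide optimization
    use = ["g++", *flags, "-fprofile-use", source, "-o", output]
    return generate, use


generate, use = pgo_build_commands("mandel.cpp", "mandel")
print(shlex.join(generate))  # g++ -O2 -ffast-math -fprofile-generate mandel.cpp -o mandel
# ... run ./mandel on a representative workload between the two builds ...
print(shlex.join(use))       # g++ -O2 -ffast-math -fprofile-use mandel.cpp -o mandel
```

The instrumented binary has to be run on typical input before the second build, otherwise there is no profile for -fprofile-use to read.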

in all, this improves things by about 20%. I expected some gain from 'shedskin -b' (which disables bounds checking) as well, but apparently indexing performance is not that crucial in the inner loop. it may help for low 'maxiter' values, though. where indexing is crucial, this can easily double program performance (see the update on my blog for an example of this).
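
A quick check of how the three savings combine into "about 20%", assuming each tweak removes its share of whatever run time is left (so the remaining fractions multiply rather than the percentages adding):

```python
from math import prod

# Run-time reductions reported above for each tweak, as fractions.
reductions = {
    "-ffast-math": 0.10,
    "-fprofile-generate / -fprofile-use": 0.07,
    "libgc 7.2alpha6": 0.03,
}

# Assumption: each saving applies to the time remaining after the others,
# so the fractions of run time that remain multiply together.
remaining = prod(1.0 - r for r in reductions.values())
total_reduction = 1.0 - remaining
print(f"combined reduction: {total_reduction:.1%}")  # combined reduction: 18.8%
```

Simply adding the percentages gives 20%; compounding gives about 18.8%, and both agree with the rough figure quoted.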

I guess just using gcc 4.5 could also help. the speedup I see here with gcc 4.5.2 is from 49 seconds to 0.34 seconds. after the tweaks above, this becomes about 0.295 seconds.
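
Expressed as speedup factors, assuming the 49 seconds is the baseline run before compiling with ShedSkin on the same machine, the timings above work out roughly as follows:

```python
baseline_s = 49.0  # assumed uncompiled run time, from the figures above
timings_s = {"gcc 4.5.2": 0.34, "gcc 4.5.2 + tweaks": 0.295}

for label, seconds in timings_s.items():
    speedup = baseline_s / seconds
    print(f"{label}: {speedup:.0f}x faster")
# gcc 4.5.2: 144x faster
# gcc 4.5.2 + tweaks: 166x faster
```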

I seem to remember your system doesn't support -march=native. that means you can perhaps get even better results with some further tweaking of the compiler flags. what is the final command now given to g++?

thanks again,
mark.
--
http://www.youtube.com/watch?v=E6LsfnBmdnk

Mark Dufour

3 Jul 2011, 8:32:17 AM
to shedskin...@googlegroups.com
I guess you may want to point people to the 'performance tips' section of the shedskin documentation. the things I just tried are all described there, and more.

Ian Ozsvald

3 Jul 2011, 8:49:40 AM
to shedskin...@googlegroups.com
Cool, I'll add some notes to that effect for the next draft. I'm
guessing that, for this problem (with the expanded math), Cython and
ShedSkin should perform at roughly the same speed if the compiler is
set up the same way. The ShedSkin example is nice because the user
doesn't really need to do much work to get a great improvement :-)

Cheers,
Ian.
