The nbody code is based on the language shootout here:
http://shootout.alioth.debian.org/u32/performance.php?test=nbody
I'm not worried about whether the test is really 'apples vs apples', I
just care about the rough orders of magnitude. E.g. CPython on their
Intel box takes 20 mins, JavaScript V8 takes 71 seconds, Fortran, C
and Java take about 20 seconds each. Their site seems to be undergoing an
upgrade right now - the graphs don't work and you can't seem to get to
the source code (but I have a copy of the nbody code).
I had to make a minor change to the code so that it would compile in
ShedSkin (the main dictionary had lists and floats, now it just has
lists *of* floats and I dereference mass where required). Here's the
code:
http://dl.dropbox.com/u/1314015/nbody_shedskin.py
- it is just a couple of functions, 'advance' is the monster that eats
all the time.
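
For anyone without the file to hand, the change described above might look roughly like this. It is only a sketch: the names `mixed`, `uniform` and `mass_of`, and the idea of wrapping the mass in a one-element list, are assumptions for illustration and are not copied from nbody_shedskin.py.

```python
import math

SOLAR_MASS = 4 * math.pi ** 2

# Mixed layout: two lists of floats plus a bare float in one tuple.
mixed = {
    "sun": ([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], SOLAR_MASS),
}

# Uniform layout (assumed): every element is a list of floats,
# so the mass becomes a one-element list.
uniform = {
    "sun": ([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [SOLAR_MASS]),
}


def mass_of(body):
    """Dereference the mass from the uniform layout."""
    position, velocity, mass = body
    return mass[0]


print(mass_of(uniform["sun"]))
```

The cost of the uniform layout is the extra `[0]` wherever the mass is read.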
Roughly speaking, for 50,000,000 iterations (e.g. 'python2.7
nbody_shedskin.py 50000000') on my 2GHz MacBook:
CPython                35 mins
PyPy 1.5 JIT           5min11sec
ShedSkin 0.8 -l        2min28sec
ShedSkin 0.8 -l -b -w  1min56sec
# last line built with e.g. shedskin -l -b -w nbody_shedskin.py; make; ./nbody_shedskin 50000000
Ultimately this means that the compiled and 'best' version of ShedSkin
that I can make (and I'm hoping you can spot any flaws I've made...)
is still beaten by JavaScript V8! I'd love to be able to announce
better figures during my tutorial at EuroPython. However - ShedSkin
does beat PyPy (and PyPy nicely beats CPython). These are great
results, I'd just like to know if I've missed anything obvious in the
benchmark.
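
To put the timings above on a common scale, here is a small sketch that converts them to seconds and expresses each as a speedup over CPython. The numbers are the ones quoted above; the helper `to_seconds` and the dictionary layout are just illustrative.

```python
def to_seconds(minutes, seconds=0):
    """Convert a minutes/seconds timing to plain seconds."""
    return minutes * 60 + seconds


# Wall-clock timings quoted above for 50,000,000 iterations.
timings = {
    "CPython": to_seconds(35),
    "PyPy 1.5 JIT": to_seconds(5, 11),
    "ShedSkin 0.8 -l": to_seconds(2, 28),
    "ShedSkin 0.8 -l -b -w": to_seconds(1, 56),
}

baseline = float(timings["CPython"])
speedups = dict((name, baseline / secs) for name, secs in timings.items())

for name in sorted(timings, key=timings.get, reverse=True):
    print("%-22s %5d s  %5.1fx" % (name, timings[name], speedups[name]))
```

On these figures PyPy comes out at roughly 6.8x CPython, and the two ShedSkin builds at roughly 14x and 18x.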
In each of the above 4 test cases I confirm that only 1 CPU (50% of my
dual-core MacBook) is used. I'm using CPython 2.7 32bit.
Any feedback gratefully received,
Ian.
--
Ian Ozsvald (A.I. researcher, screencaster)
i...@IanOzsvald.com
http://IanOzsvald.com
http://SocialTiesApp.com/
http://MorConsulting.com/
http://blog.AICookbook.com/
http://TheScreencastingHandbook.com
http://FivePoundApp.com/
http://twitter.com/IanOzsvald
--
You received this message because you are subscribed to the Google Groups "shedskin-discuss" group.
To post to this group, send email to shedskin...@googlegroups.com.
To unsubscribe from this group, send email to shedskin-discu...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/shedskin-discuss?hl=en.
Heh, I was just writing to explain that I did almost identical here:
http://paste.pocoo.org/show/399045/
cool! I didn't know you could use an empty class like that and just
assign the attributes after.
That--and yours-- is also faster since it's not indexing into the
global MASS thing--which seems
weird even for a normal python implementation.
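
For reference, the empty-class pattern mentioned above looks something like the sketch below. The class name `Body` and the attribute names are assumptions for illustration; they are not taken from the pasted code.

```python
class Body:
    """Empty class used as a simple record; attributes are assigned later."""
    pass


sun = Body()
sun.x, sun.y, sun.z = 0.0, 0.0, 0.0
sun.vx, sun.vy, sun.vz = 0.0, 0.0, 0.0
sun.mass = 39.47841760435743  # roughly 4 * pi**2

# The mass is read directly from the object, with no global lookup table.
print(sun.mass)
```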
Also, Ian, you can make the code shorter using itertools.combinations:
PAIRS = list(combinations(SYSTEM, 2))
not sure of the effect on speed.
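
A minimal sketch of what that one-liner produces, using a hypothetical three-body `SYSTEM` of plain strings as a stand-in for the real body data:

```python
from itertools import combinations

# Hypothetical stand-in for the real list of bodies.
SYSTEM = ["sun", "jupiter", "saturn"]

# Every unordered pair of distinct bodies, each pair exactly once.
PAIRS = list(combinations(SYSTEM, 2))

# The equivalent hand-written nested loop, for comparison.
manual = []
for i in range(len(SYSTEM)):
    for j in range(i + 1, len(SYSTEM)):
        manual.append((SYSTEM[i], SYSTEM[j]))

print(PAIRS)
```

Both approaches build the same list in the same order, so the choice is mainly about readability; this sketch says nothing about relative speed.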
> note that here the algorithm has also been improved (by douglas mcneill
> iirc).
>
>
> thanks!!
> mark.
> --
> http://www.youtube.com/watch?v=E6LsfnBmdnk
>
Mark - I'm using your version. Brent, cheers for your version too.
Re. the version in gitorious, it evolves differently so I'll ignore it
(trying not to investigate too many things at once!), I hadn't
realised that someone had tried this already :-) If I'd looked in
the examples/ directory I'd have known!
My goal is to try to keep the programs mostly the same (I only changed
the shedskin version to make it compile) and to try various tools to
make the code faster. I'm being pragmatic and trying to teach how I
make code faster+maintainable for clients (and often - clients who
don't want to learn new things or change the way they support
things!).
More tomorrow I guess,
i.
--
Brent - I note your point about the odd indexing approach in the
original code. I agree, I found it quite tricky to read. However, that
often occurs in client HPC projects and if I go changing their code
too much, they reject the alterations in favour of what they know.
That'll be part of the story I tell.
i.
> > Heh, I was just writing to explain that I did almost identical here:
> > http://paste.pocoo.org/show/399045/
> >
> >
> ah, I was probably faster because I practiced this.. :S
>
> http://gitorious.org/shedskin/mainline/blobs/master/examples/nbody.py
>
> note that here the algorithm has also been improved (by douglas
> mcneill iirc).
>
>
> thanks!!
> mark.
When changing the division by distance**3 to /
(distance*distance*distance) the time drops to 55-60%.
Maybe the power function could still use some optimization love...
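
The rewrite described above can be sketched as follows. The function names and the dx, dy, dz arguments are hypothetical; only the change from distance**3 to repeated multiplication comes from this message. The sketch shows that the two forms agree to within floating-point rounding; it does not measure speed.

```python
import math


def inv_cube_pow(dx, dy, dz):
    """1 / distance**3 using the power operator."""
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    return 1.0 / distance ** 3


def inv_cube_mul(dx, dy, dz):
    """1 / distance**3 using explicit multiplication instead of pow."""
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    return 1.0 / (distance * distance * distance)


a = inv_cube_pow(1.0, 2.0, 2.0)
b = inv_cube_mul(1.0, 2.0, 2.0)
print(math.isclose(a, b))
```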
Thomas
Interestingly - I think I can claim that the Mark/Brent version would
beat the V8 Javascript benchmark. In terms of squeezing a lot of
performance out of a piece of code with little work, it is quite
impressive.
Re. the pypy version - maybe I'm staring at it too much but it looks
very much like the cpython version. What's different?
i.
--
E.g.
http://gcc.gnu.org/onlinedocs/gcc-4.3.0/gcc/Optimize-Options.html
"This option is not turned on by any -O option since it can result in
incorrect output for programs which depend on an exact implementation
of IEEE or ISO rules/specifications for math functions. It may,
however, yield faster code for programs that do not require the
guarantees of these specifications. "
I'm checking it here:
shedskin0.8 on MacOS X (no native flag anyhow), -O2 takes 1min16 using
your class-based code from yesterday.
Adding -ffast-math - no change in speed (result exactly the same)
-O3 and -ffast-math - no change in speed (result exactly the same)
So my GCC on a Core2 Duo Macbook doesn't show any improvement (boo),
but that's probably because GCC is already using my hardware fairly
efficiently (yay). Anyhow, I've got to move on to the next task! Maybe
for you the difference was more to do with the native flag (you didn't
say if you'd tried that independently?)?
i.
Interesting...what's your CPU? My Core2 Duo is old, it might be that
mine isn't that clever and yours is smarter?
Certainly my Snow Leopard's g++ is older than most (4.2.1 - a common
complaint from Mac users). Since -ffast-math is potentially unsafe I'm
not so worried but it is nice to know that the option can do something
useful (it was the only real 'trick' I had back as a Senior Programmer
if IEEE precision wasn't required!).
Sadly that's a right pain on MacOS and/or might get in the way of
system libs. I know people do upgrade GCC but I'd frankly be a bit
scared! I've nuked this machine once, I'm not losing a day again like
that :-) I hope to give the timings another go on my bigger
physics-office machine in a few weeks (but that's Windows - does
ShedSkin work with MSVC?).
Annoyingly this switch doesn't work in my g++ (4.2.1), the suggestion
online is to use:
-m64 -mtune=core2
in its place. This doesn't make it run any faster. I also added
-ffast-math but the speed didn't change.
Can someone else confirm Mark's -ffast-math switch improves
performance without changing the numerical output?
Ian.
Cheers,
Ian.
I'm definitely missing something at this end it seems :-) I'll try
changing **2 -> sqrt, right now I'm arguing with a set of mandelbrot
solvers (I'll probably submit the shedskin version here shortly for
more suggestions!).
I might just use your timing results in my talk!
i.