Here's the nbody code I'm using for the EuroPython tutorial. I'm comparing CPython, PyPy, Cython, ShedSkin and a few other tools (and looking at profiling first); the goal is to teach people about their options for making CPU-bound code faster.
I'm not worried about whether the test is really 'apples vs apples', I just care about the rough orders of magnitude. E.g. on the benchmarks game site, CPython on their Intel box takes 20 mins, JavaScript V8 takes 71 seconds, Fortran, C and Java take about 20 seconds each. Their site seems to be undergoing an upgrade right now - the graphs don't work and you can't seem to get to the source code (but I have a copy of the nbody code).
I had to make a minor change to the code so that it would compile in ShedSkin (the main dictionary had lists and floats, now it just has lists *of* floats and I dereference mass where required). Here's the code: http://dl.dropbox.com/u/1314015/nbody_shedskin.py - it is just a couple of functions, 'advance' is the monster that eats all the time.
Roughly speaking for 50,000,000 iterations (e.g. 'python2.7 nbody_shedskin.py 50000000') on my MacBook 2GHz:
CPython 35 mins
PyPy 1.5 JIT 5mins11sec
ShedSkin0.8 -l 2min28sec
ShedSkin0.8 -l -b -w 1min56sec # e.g. shedskin -l -b -w nbody_shedskin.py; make; ./nbody_shedskin 50000000
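For a rough sense of scale, the timings above can be turned into speedups relative to CPython. This is only a small sketch (written for Python 3; the numbers are the ones quoted above, converted to seconds, and the dictionary keys are just labels):

```python
# Timings quoted above for 50,000,000 iterations, converted to seconds.
timings = {
    "CPython 2.7": 35 * 60,
    "PyPy 1.5 JIT": 5 * 60 + 11,
    "ShedSkin 0.8 -l": 2 * 60 + 28,
    "ShedSkin 0.8 -l -b -w": 1 * 60 + 56,
}

baseline = timings["CPython 2.7"]

# Speedup factor of each tool relative to the CPython run.
speedups = {name: baseline / secs for name, secs in timings.items()}

for name, factor in speedups.items():
    print("%-22s %5d s  %5.1fx" % (name, timings[name], factor))
```

That works out to roughly 7x for PyPy and 14x to 18x for the two ShedSkin builds, which is the "rough orders of magnitude" picture the tutorial is after.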
Ultimately this means that the compiled and 'best' version of ShedSkin that I can make (and I'm hoping you can spot any flaws I've made...) is still beaten by JavaScript V8! I'd love to be able to announce better figures during my tutorial at EuroPython. However - ShedSkin does beat PyPy (and PyPy nicely beats CPython). These are great results, I'd just like to know if I've missed anything obvious in the benchmark.
In each of the above 4 test cases I confirm that only 1 CPU (50% of my dual-core MacBook) is used. I'm using CPython 2.7 32bit.
Any feedback gratefully received, Ian.
-- Ian Ozsvald (A.I. researcher, screencaster) i...@IanOzsvald.com
thanks for all your mails :) I will try to reply to all of them soon. for now, please see attachment for a version of your code that is almost twice as fast after compilation.
the main difference is that I added a 'body' class, and use attributes instead of dicts and tuples to hold the physical constants. dict and tuple indexing is very slow compared to simple attribute accesses in C/C++. attributes are slower in cpython I guess, and that's probably the reason this shootout implementation was coded up as it was, but there you have it. imnsho, this version is also more readable.. ;)
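A minimal sketch of the two layouts being contrasted here. This is not the attached file (which isn't reproduced in this thread); the helper names `make_body`, `kinetic_dict` and `kinetic_attr` and the numeric values are made up for illustration, and it is written for Python 3:

```python
# Dict-of-lists layout, roughly in the style described earlier in the thread:
# each entry is [position, velocity, [mass]] (values are placeholders).
bodies_dict = {
    "sun": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [39.47]],
    "jupiter": [[4.84, -1.16, -0.10], [0.606, 2.81, -0.02], [0.037]],
}


class body(object):
    """Empty class; attributes are assigned after construction."""
    pass


def make_body(pos, vel, mass):
    b = body()
    b.x, b.y, b.z = pos
    b.vx, b.vy, b.vz = vel
    b.mass = mass[0]
    return b


bodies_attr = [make_body(*vals) for vals in bodies_dict.values()]


def kinetic_dict(bodies):
    # Every access goes through dict lookup plus list indexing.
    e = 0.0
    for name in bodies:
        vx, vy, vz = bodies[name][1]
        m = bodies[name][2][0]
        e += 0.5 * m * (vx * vx + vy * vy + vz * vz)
    return e


def kinetic_attr(bodies):
    # Same arithmetic, but using plain attribute access.
    e = 0.0
    for b in bodies:
        e += 0.5 * b.mass * (b.vx * b.vx + b.vy * b.vy + b.vz * b.vz)
    return e


print(kinetic_dict(bodies_dict), kinetic_attr(bodies_attr))
```

Both functions compute the same quantity; the point is only that the attribute version maps onto simple struct-like member access once compiled.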
On Wed, Jun 1, 2011 at 7:08 PM, Ian Ozsvald <i...@ianozsvald.com> wrote:
> -snip-
> --
> You received this message because you are subscribed to the Google Groups "shedskin-discuss" group.
> To post to this group, send email to shedskin-discuss@googlegroups.com.
> To unsubscribe from this group, send email to shedskin-discuss+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/shedskin-discuss?hl=en.
On Wed, Jun 1, 2011 at 12:21 PM, Mark Dufour <mark.duf...@gmail.com> wrote:
> hi ian,
> -snip-
cool! I didn't know you could use an empty class like that and just assign the attributes after. That--and yours-- is also faster since it's not indexing into the global MASS thing--which seems weird even for a normal python implementation.
Also, Ian, you can make the code shorter using itertools.combinations:

    PAIRS = list(combinations(SYSTEM, 2))

not sure of the effect on speed.
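For reference, a small self-contained sketch of what the itertools.combinations approach looks like next to a hand-written nested loop. `SYSTEM` here is a stand-in list of body names rather than the real data from the benchmark, and `pairs_by_hand` is a made-up name:

```python
from itertools import combinations

# Placeholder for the list of bodies used by the benchmark.
SYSTEM = ["sun", "jupiter", "saturn", "uranus", "neptune"]

# Every unordered pair of distinct bodies, built once up front.
PAIRS = list(combinations(SYSTEM, 2))

# The equivalent hand-written nested loop over index pairs (i < j).
pairs_by_hand = []
for i in range(len(SYSTEM)):
    for j in range(i + 1, len(SYSTEM)):
        pairs_by_hand.append((SYSTEM[i], SYSTEM[j]))

print(len(PAIRS), PAIRS == pairs_by_hand)
```

Because the pair list is built once outside the main loop, any speed effect would come from the loop body iterating over a precomputed list rather than from combinations itself.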
Mark - I'm using your version. Brent, cheers for your version too.
Re. the version in gitorious, it evolves differently so I'll ignore it (trying not to investigate too many things at once!). I hadn't realised that someone had tried this already :-) If I'd looked in the examples/ directory I'd have known!
My goal is to try to keep the programs mostly the same (I only changed the shedskin version to make it compile) and to try various tools to make the code faster. I'm being pragmatic and trying to teach how I make code faster+maintainable for clients (and often - clients who don't want to learn new things or change the way they support things!).
More tomorrow I guess, i.
On 1 June 2011 19:39, Brent Pedersen <bpede...@gmail.com> wrote:
> cool! I didn't know you could use an empty class like that and just assign the attributes after.
> That--and yours-- is also faster since it's not indexing into the global MASS thing--which seems weird even for a normal python implementation.
>
> Also, Ian, you can make the code shorter using itertools.combinations:
>
>     PAIRS = list(combinations(SYSTEM, 2))
>
> not sure of the effect on speed.
>
>> note that here the algorithm has also been improved (by douglas mcneill iirc).
Interestingly - I think I can claim that the Mark/Brent version would beat the V8 Javascript benchmark. In terms of squeezing a lot of performance out of a piece of code with little work, it is quite impressive.
Brent - I note your point about the odd indexing approach in the original code. I agree, I found it quite tricky to read. However, that often occurs in client HPC projects and if I go changing their code too much, they reject the alterations in favour of what they know. That'll be part of the story I tell.
i.
On 1 June 2011 22:09, Ian Ozsvald <i...@ianozsvald.com> wrote:
> -snip-
> On 1 June 2011 19:39, Brent Pedersen <bpede...@gmail.com> wrote:
>> On Wed, Jun 1, 2011 at 12:32 PM, Mark Dufour <mark.duf...@gmail.com> wrote:
>> -snip-
On Jun 1, 10:08 am, Ian Ozsvald <i...@ianozsvald.com> wrote:
-snip-
> Their site seems be undergoing an upgrade right now - the graphs don't work and you
> can't seem to get to the source code (but I have a copy of the nbody code).
1) The benchmarks game website is OK now - please try refreshing your
browser cache.
2) You can web browse CVS to see Python 2 and Python 3 implementations
of n-body
>> note that here the algorithm has also been improved (by douglas mcneill iirc).
>>
>> thanks!!
>> mark.
>
> When changing the division by distance**3 to a division by (distance*distance*distance), the time drops to 55-60%.
>
> Maybe the power function could still need some love to be optimized...
>
> Thomas
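A quick way to try the same rewrite in plain Python. The function names are made up for this sketch (Python 3), and the timings it prints under CPython will not match what ShedSkin/GCC produce, so no numbers are claimed here:

```python
import math
import timeit


def inv_cube_pow(distance):
    # Original form: divide by distance raised to the third power.
    return 1.0 / distance ** 3


def inv_cube_mul(distance):
    # Rewritten form: divide by an explicit product.
    return 1.0 / (distance * distance * distance)


d = 5.2

# The two forms may differ in the last bit, so compare with a tolerance.
print(math.isclose(inv_cube_pow(d), inv_cube_mul(d), rel_tol=1e-12))

# Rough timing of each variant; results depend on the interpreter and machine.
for fn in (inv_cube_pow, inv_cube_mul):
    secs = timeit.timeit(lambda: fn(d), number=200000)
    print(fn.__name__, "%.3f s" % secs)
```

The tolerance check matters if the goal is "same numerical output": the explicit product rounds twice, so bit-for-bit equality with the power form is not guaranteed.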
On Wed, Jun 1, 2011 at 11:11 PM, Ian Ozsvald <i...@ianozsvald.com> wrote:
> Interestingly - I think I can claim that the Mark/Brent version would beat the V8 Javascript benchmark. In terms of squeezing a lot of performance out of a piece of code with little work, it is quite impressive.

I played a bit with GCC flags, and tried something suggested to me a while ago.. -ffast-math. IIUC, we basically tell the compiler here that it shouldn't care about the precise IEEE specification in weird corner cases, such as dividing infinity by -1, things you may never want to occur in your code anyway.
The website is better - last night I saw the graphs again but the source code links still failed, this morning I see that the src code pages work again too. Cool.
Re. the pypy version - maybe I'm staring at it too much but it looks very much like the cpython version. What's different?
i.
On 1 June 2011 23:15, Isaac Gouy <igo...@yahoo.com> wrote:
> -snip-
IIRC ffast-math will use register-based short-cut arithmetic which can have shorter precision than IEEE based arithmetic. From memory it'll give you more rounding errors but should run faster. Normally my goal is to preserve absolute precision (e.g. for physics - double precision is already imprecise enough!). Definitely one for the tips section though, many apps don't need all that precision.
E.g. http://gcc.gnu.org/onlinedocs/gcc-4.3.0/gcc/Optimize-Options.html "This option is not turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications. "
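One concrete reason such a flag can change results: floating-point addition is not associative, so a compiler that is allowed to regroup operations (which -ffast-math permits) may round differently. The effect is visible even in plain Python; this is only an illustration of the arithmetic, not of what GCC does to the nbody code:

```python
# Same three numbers, two groupings.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(repr(left), repr(right), left == right)

# A more drastic case: the small term is lost when added to the big one first.
big, small = 1e16, 1.0
lost = (big + small) - big
kept = small + (big - big)
print(lost, kept)
```

For a physics code that needs reproducible double-precision results this is the kind of difference to watch for; for many other apps it is harmless.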
I'm checking it here: shedskin0.8 on MacOS X (no native flag anyhow):
-O2 takes 1min16 using your class-based code from yesterday.
Adding -ffast-math - no change in speed (result exactly the same)
-O3 and --fast-math - no change in speed (result exactly the same)
So my GCC on a Core2 Duo Macbook doesn't show any improvement (boo), but that's probably because GCC is already using my hardware fairly efficiently (yay). Anyhow, I've got to move on to the next task! Maybe for you the difference was more to do with the native flag (you didn't say if you'd tried that independently?)?
i.
On 2 June 2011 09:20, Mark Dufour <mark.duf...@gmail.com> wrote:
> -snip-
> So my GCC on a Core2 Duo Macbook doesn't show any improvement (boo), but that's probably because GCC is already using my hardware fairly efficiently (yay). Anyhow, I've got to move on to the next task! Maybe for you the difference was more to do with the native flag (you didn't say if you'd tried that independently?)?
I tried it independently just now, and it's definitely -ffast-math.
On Thu, Jun 2, 2011 at 11:14 AM, Ian Ozsvald <i...@ianozsvald.com> wrote:
> Interesting...what's your CPU? My Core2 Duo is old, it might be that mine isn't that clever and yours is smarter?
mine's a bit newer I guess, but not that much:
model name : Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dts tpr_shadow vnmi flexpriority
> On 2 June 2011 10:10, Mark Dufour <mark.duf...@gmail.com> wrote:
> > -snip-
Certainly my Snow Leopard's g++ is older than most (4.2.1 - a common complaint from Mac users). Since -ffast-math is potentially unsafe I'm not so worried but it is nice to know that the option can do something useful (it was the only real 'trick' I had back as a Senior Programmer if IEEE precision wasn't required!).
On 2 June 2011 10:23, Mark Dufour <mark.duf...@gmail.com> wrote:
> -snip-
On Thu, Jun 2, 2011 at 11:41 AM, Ian Ozsvald <i...@ianozsvald.com> wrote:
> Certainly my Snow Leopard's g++ is older than most (4.2.1 - a common complaint from Mac users). Since -ffast-math is potentially unsafe I'm not so worried but it is nice to know that the option can do something useful (it was the only real 'trick' I had back as a Senior Programmer if IEEE precision wasn't required!).
it would be interesting to see what happens when you install a recent linux distro on there, with GCC 4.5 or 4.6.. ;-)
Sadly that's a right pain on MacOS and/or might get in the way of system libs. I know people do upgrade GCC but I'd frankly be a bit scared! I've nuked this machine once, I'm not losing a day again like that :-) I hope to give the timings another go on my bigger physics-office machine in a few weeks (but that's Windows - does ShedSkin work with MSVC?). i.
On 2 June 2011 10:44, Mark Dufour <mark.duf...@gmail.com> wrote:
> -snip-
On Thu, Jun 2, 2011 at 11:46 AM, Ian Ozsvald <i...@ianozsvald.com> wrote:
> Sadly that's a right pain on MacOS and/or might get in the way of system libs. I know people do upgrade GCC but I'd frankly be a bit
I didn't mean to upgrade GCC, but to install a nice linux distro on a separate partition.. :-) this distribution will probably upgrade your GCC every 6 months or so for you.
> scared! I've nuked this machine once, I'm not losing a day again like that :-) I hope to give the timings another go on my bigger physics-office machine in a few weeks (but that's Windows - does ShedSkin work with MSVC?).

well, it doesn't really have to, because the windows version comes with GCC (4.5.. :-)).. but there is a hidden (-v) flag, to generate more or less MSVC compatible output (including makefile). I haven't heard of anyone trying this recently, so I actually just made the option hidden for 0.8.. there's also a hidden 'pypy' compatibility (iirc, -p) mode that someone sent a patch for at one point.
Mark suggested I try -march=native as it'll enable SSE2 (I confess I'd forgotten that - the native architecture switches did give me small benefits in the past on e.g. Pentium, AMD64 specific platforms).
Annoyingly this switch doesn't work in my g++ (4.2.1), the suggestion online is to use: -m64 -mtune=core2 in its place. This doesn't make it run any faster. I also added --fast-math but the speed didn't change.
Can someone else confirm Mark's -ffast-math switch improves performance without changing the numerical output?
Ian.
On 2 June 2011 10:52, Mark Dufour <mark.duf...@gmail.com> wrote:
> -snip-
I tried it using gcc 4.5.0 (on OpenSuse 11.3) and the version of
nbody.py in the shedskin examples (0.7.1.3).
Running with the default parameters on a Core 2 Duo, I get
4.0 seconds for -march=native
0.4 seconds for -ffast-math
One potential issue is the code raises values to the power of 0.5
rather than calling sqrt. When I change the power to sqrt (either in
the python file or in the cpp file), the -march=native time drops to
2.8s. (The -ffast-math time is unaffected).
Mark
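The change described above, sketched in plain Python. The function names `dist_pow` and `dist_sqrt` are made up for illustration (Python 3), and the timings printed by CPython say nothing about the compiled ShedSkin numbers quoted above:

```python
import math
import timeit


def dist_pow(dx, dy, dz):
    # Distance via the power operator, as in the code being discussed.
    return (dx * dx + dy * dy + dz * dz) ** 0.5


def dist_sqrt(dx, dy, dz):
    # Same distance via an explicit square-root call.
    return math.sqrt(dx * dx + dy * dy + dz * dz)


args = (3.0, 4.0, 12.0)

# Compare with a tolerance rather than assuming bit-identical results.
print(math.isclose(dist_pow(*args), dist_sqrt(*args), rel_tol=1e-12))

# Rough timing of each variant; results depend on the interpreter and machine.
for fn in (dist_pow, dist_sqrt):
    secs = timeit.timeit(lambda: fn(*args), number=200000)
    print(fn.__name__, "%.3f s" % secs)
```

In compiled code the sqrt call can typically map to a dedicated instruction, whereas a general power call cannot unless the compiler is allowed to simplify it, which would be consistent with the -ffast-math time being unaffected.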
On Jun 2, 6:02 am, Ian Ozsvald <i...@ianozsvald.com> wrote:
> -snip-
It seems I'm definitely missing something at this end :-) I'll try changing **0.5 -> sqrt. Right now I'm arguing with a set of mandelbrot solvers (I'll probably submit the ShedSkin version here shortly for more suggestions!).
Cheers, Ian.
-- Ian Ozsvald (A.I. researcher, screencaster) i...@IanOzsvald.com
On Fri, Jun 3, 2011 at 10:11 AM, Ian Ozsvald <i...@ianozsvald.com> wrote:
> I'm definitely missing something at this end it seems :-) I'll try
> changing **0.5 -> sqrt, right now I'm arguing with a set of mandelbrot
> solvers (I'll probably submit the shedskin version here shortly for
> more suggestions!).
I also see the speedup with just -O2 -ffast-math (without -march=native), so it's starting to look like GCC 4.2 is missing something.. :-)
a speedup of 10 times seems absurd though! (thanks mark, btw!)
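On the earlier question of whether -ffast-math can change the numerical output: one thing the flag allows is reordering floating-point arithmetic, and floating-point addition is not associative. The sketch below shows that effect in plain Python. It does not reproduce what GCC actually does to the nbody code, and the `outputs_agree` helper and its tolerance are assumptions chosen only for illustration.

```python
import math


def grouped_left(a, b, c):
    # Evaluate (a + b) + c.
    return (a + b) + c


def grouped_right(a, b, c):
    # Evaluate a + (b + c): same maths, different rounding order.
    return a + (b + c)


def outputs_agree(x, y, rel_tol=1e-9):
    # Hypothetical check for comparing two benchmark results;
    # the tolerance is an arbitrary choice, not taken from the benchmark.
    return math.isclose(x, y, rel_tol=rel_tol)


if __name__ == "__main__":
    left = grouped_left(0.1, 0.2, 0.3)
    right = grouped_right(0.1, 0.2, 0.3)
    print(repr(left), repr(right))
    print("bitwise equal:", left == right)
    print("agree within tolerance:", outputs_agree(left, right))
```

So a sensible test is probably to compare the final energy printed by the -ffast-math build against the plain -O2 build with a tolerance, rather than expecting identical digits.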
Yeah, I've just tried a bunch of optimisation flags on my mandelbrot problem and they barely change the speed at all. I think my gcc might be out of date. I might have found a drag/drop newer version of gcc for Snow Leopard (but I'm wary about conflicts).
I might just use your timing results in my talk!
i.
-- Ian Ozsvald (A.I. researcher, screencaster) i...@IanOzsvald.com