My understanding is that any double that gets stored in a vector or
map is boxed, and therefore, the vast majority of your double
conversions aren't really doing anything, because when you pull them
out of the vector or map, they'll just be Double objects again.
I believe that the biggest reason for the performance difference
between Clojure and Java on this benchmark is that if you use a
Clojure data structure (e.g., a map) to represent each "body" object,
you get a lot of boxing and unboxing of the doubles.
So I hypothesize that the biggest bang for the buck is to find a way
in Clojure to create body objects with primitive double fields. Can
the new new do that? If not, your best bet may be to represent a body
object as a Java array of several doubles, rather than using a Clojure
map. The first double could represent the mass, the second could
represent vx, the third vy, and so on. Make macros for the accessors
that extract these values using aget.
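A sketch of what that might look like (the slot layout and names here are my own, not from any posted code):

```clojure
;; A "body" as a flat Java double[]; hypothetical slot layout:
;; 0 = mass, 1 = vx, 2 = vy, 3 = vz.
(defn make-body [m vx vy vz]
  (double-array [m vx vy vz]))

;; Macro accessors expand straight to an unboxed aget on the cast array:
(defmacro mass [b] `(aget (doubles ~b) 0))
(defmacro velx [b] `(aget (doubles ~b) 1))
(defmacro vely [b] `(aget (doubles ~b) 2))
(defmacro velz [b] `(aget (doubles ~b) 3))
```

Because these are macros rather than functions, the aget is inlined at each call site and no boxed Double ever appears.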
It's speculation on my part, but I think this will get you much closer
to Java speed than your current approach.
There still seems to be a lot of boxing and unboxing going on. For example, in:
(let [[momx momy momz] (offset-momentum bodies)
the call to offset-momentum returns a vector of the three doubles, so
they would be boxed, so momx momy momz are not primitives.
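One way to avoid that particular boxing (a sketch on my part — the real offset-momentum isn't shown here, so the placeholder body below is invented) is to return a primitive double array and read it with aget:

```clojure
;; Sketch: return the momentum components in a double[] rather than a
;; vector. The body here is a placeholder; the real computation would
;; sum over the bodies.
(defn offset-momentum #^doubles [bodies]
  (double-array [0.1 0.2 0.3]))

;; The caller then reads primitive doubles directly, with no boxed
;; intermediates:
(let [#^doubles mom (offset-momentum nil)
      momx (aget mom 0)
      momy (aget mom 1)
      momz (aget mom 2)]
  [momx momy momz])
```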
I've tried an approach like you suggest, using mutable Java arrays of doubles, macros using aget / aset-double for reading and writing these
arrays, and loop/recur everywhere iteration is needed in the inner
loop. It is here:
I remember reading on someone's blog that even though it isn't
announced by the reflection warning, you can get a big performance
boost with aset and aget by explicitly casting the index to an int.
So that's another idea you might want to try.
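For example (my own sketch, not code from this thread), the explicit (int ...) cast on the index is the coercion being described:

```clojure
;; Summing a double[]; note the explicit (int i) cast at the aget call,
;; which avoids passing the index as a boxed or wider number.
(defn sum-array [#^doubles a]
  (let [n (alength a)]
    (loop [i 0
           acc 0.0]
      (if (< i n)
        (recur (unchecked-inc i) (+ acc (aget a (int i))))
        acc))))
```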
At that point is it possible you're just paying the price of
PersistentVector for the "bodies" vector? Does it improve much if you
change bodies to an array?
I'm actually glad that the difference is that small there.
My runtime with n = 5,000,000 goes from ~7.5 seconds to ~4.5 seconds.
I can't currently check whether the Java code gets the same
performance boost, but it's possible and even likely that the Clojure
version would see a better improvement from those parameters than the
Java one.
I actually just confirmed that, on my computer at least, those flags
have basically no effect on the Java version and a big difference on
the Clojure one.
On 16.08.2009 at 12:50, Nicolas Oury wrote:
> The bad news: I am cheating a little bit...
Why is this cheating? People wrote programs in
C and dropped down to Assembly if necessary.
People write programs in Python and drop down
to C if necessary.
Why can't we write programs in Clojure and
drop down to Java if necessary?
Sincerely
Meikel
From the various things I've read about Clojure's performance, I've
always had this sense that:
a) if you have a performance problem, there's probably some inner loop
that needs to be optimized, and so
b) you can use Clojure's type-hinting or primitive-casting within that
inner loop to get Java-like performance.
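A generic sketch of what (b) usually looks like in practice (my own example, not taken from the benchmark code):

```clojure
;; The usual inner-loop recipe: hint the arrays, keep the index and
;; accumulator primitive via loop/recur, and avoid boxed intermediates.
(defn dot [#^doubles xs #^doubles ys]
  (let [n (alength xs)]
    (loop [i 0
           acc 0.0]
      (if (< i n)
        (recur (unchecked-inc i)
               (+ acc (* (aget xs i) (aget ys i))))
        acc))))
```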
But these benchmarks are showing that when it comes to performance
degradation caused by boxing/unboxing primitives, it doesn't always
occur in a single loop that is easily optimized. The most obvious
problem showing up in these benchmark examples is that these
primitives often tend to be combined and manipulated as some kind of
struct (e.g., an x,y,z vector). Or sometimes, the logic manipulating
these primitives can't easily be described in a single loop, but is
more easily expressed when spread out over several functions. But in
Clojure, just about everything you do (other than let/loop/recur)
boxes a primitive right back up again, making it very, very difficult
to write performant code. Very often, primitives need to be combined
as part of structured data, passed to a function, etc.
So I agree with those who are pointing to these examples and saying,
"Hey, what can we do to make it easier to optimize our Clojure
programs, while maintaining the basic structure and feel of the code?"
I think Nicolas Oury is right on when he suggests that adding a
Clojure way to build structures with primitive fields would be a
valuable addition from a performance-within-Clojure standpoint,
allowing for more localized rewrites to boost performance.
I was referring to the rules of the benchmark game. When you benchmark
a language, using another language is not fair.
If you were to do your own program, of course you could use Java.
However, in the particular circumstance, it is a bit annoying to use
Java just to create a data structure type.
Dear all,
The good news: I have a version of the N-body benchmark that goes "as
fast as java".
The bad news: I am cheating a little bit...
So to sum up how to have micro-benchmarks as fast as Java:
- manual inlining. There seems to be a cost to calling a function.
Maybe the new code generator will improve that.
Which makes me wonder why aget doesn't automatically coerce an index
to an int. Would an input that can't be coerced to an int even really
make sense here?
Seems like literals could be auto-coerced to whatever the type needs to be.
On 2009-08-17, at 8:58 PM, FFT <fft...@gmail.com> wrote:
> On Mon, Aug 17, 2009 at 9:25 AM, Bradbev<brad.be...@gmail.com>
> wrote:
>>
>> On Aug 17, 1:32 am, Nicolas Oury <nicolas.o...@gmail.com> wrote:
>>> I was referring to the rules of the benchmark game. When you
>>> benchmark
>>> a language, using another language is not fair.
>>>
>>> If you were to do your own program, of course you could use Java.
>>> However, in the particular circumstance, it is a bit annoying to use
>>> Java just to create a data structure type.
>>>
>> Ah, that makes more sense re the "cheating" then. Your insight for
>> array range check elimination got me thinking - why can't the
>> accessor
>> macros (posx, etc) that use aset/aget have their ranges eliminated by
>> the JVM? After all, it should be a simple constant fold. I found
>> another 2-3x speed up by coercing the indexes with (int x), ie
>> (defmacro mass [p] `(double (aget ~p (int 0))))
>
> I'm not seeing this. Maybe you are running this on "-client"?
I'm running Java 1.5 32bit on OS X 10.5 with -server.
>
>> I don't have the Java version running on my machine, but I saw
>> runtimes go from 833ms to 295ms for 100000 iterations, a 2.8x speed
>> up, which should put the "no cheating" version on the same standing
>> as
>> the Java implementation.
>
> You can't get a consistent timing for anything less than 1-10M
> iterations here.
Why do you think that? Everything I've read says that HotSpot kicks
in at 10,000, and I always do a warmup run.
I see consistent enough timings, within 50ms each run. When coercing
to ints gives an immediate 3x speedup, something is happening. What JVM
are you running & what settings? I'll compile the java version soon
so I can do a direct compare on a single machine. I take it that your
setup is showing clojure 3x slower than the java version?
Brad
I'm not seeing this. Maybe you are running this on "-client"?
> I don't have the Java version running on my machine, but I saw
> runtimes go from 833ms to 295ms for 100000 iterations, a 2.8x speed
> up, which should put the "no cheating" version on the same standing as
> the Java implementation.
You can't get a consistent timing for anything less than 1-10M iterations here.
I don't see much of any difference here from those coercions either.
What clojure version are you using?
-- Aaron
I reworked advance! a little bit and while it didn't have much
performance impact, I find this version a little clearer:
(defmacro doarray
  "Executes an expression for each element of array a (presumably for
  its side-effects), using an index named idx, beginning with element
  'start'."
  [a idx start expr]
  `(let [a# ~a
         end# (int (alength a#))]
     (loop [~idx (int ~start)]
       (if (< ~idx end#)
         (do
           ~expr
           (recur (unchecked-inc ~idx)))))))
(defn advance! [#^"[Ljava.lang.Object;" bodies delta-t]
  (let [delta-t (double delta-t)]
    (doarray bodies i1 0
      (doarray bodies i2 (unchecked-inc i1)
        (let [#^doubles b1 (aget bodies i1)
              #^doubles b2 (aget bodies i2)
              delta-posx (- (posx b1) (posx b2))
              delta-posy (- (posy b1) (posy b2))
              delta-posz (- (posz b1) (posz b2))
              dist-squared (+ (+ (* delta-posx delta-posx)
                                 (* delta-posy delta-posy))
                              (* delta-posz delta-posz))
              dist (Math/sqrt dist-squared)
              mag (/ delta-t (* dist-squared dist))
              b1-scale (* (- mag) (mass b2))
              dv1x (* delta-posx b1-scale)
              dv1y (* delta-posy b1-scale)
              dv1z (* delta-posz b1-scale)
              b2-scale (* mag (mass b1))
              dv2x (* delta-posx b2-scale)
              dv2y (* delta-posy b2-scale)
              dv2z (* delta-posz b2-scale)]
          (add-to-vel! b1 dv1x dv1y dv1z)
          (add-to-vel! b2 dv2x dv2y dv2z))))
    (doarray bodies i 0
      (let [#^doubles b (aget bodies i)]
        (set-posx! b (+ (posx b) (* (velx b) delta-t)))
        (set-posy! b (+ (posy b) (* (vely b) delta-t)))
        (set-posz! b (+ (posz b) (* (velz b) delta-t)))))))
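The setter macros that advance! relies on aren't shown in this message; here is a sketch of what they presumably look like (the slot indices — 1-3 for position, 4-6 for velocity — are my assumption):

```clojure
;; Assumed slot layout: 1-3 = position, 4-6 = velocity.
(defmacro set-posx! [b v] `(aset (doubles ~b) 1 (double ~v)))
(defmacro set-posy! [b v] `(aset (doubles ~b) 2 (double ~v)))
(defmacro set-posz! [b v] `(aset (doubles ~b) 3 (double ~v)))

;; Read-modify-write of all three velocity slots; the gensym b# makes
;; sure the body expression is evaluated only once.
(defmacro add-to-vel! [b dvx dvy dvz]
  `(let [b# (doubles ~b)]
     (aset b# 4 (+ (aget b# 4) (double ~dvx)))
     (aset b# 5 (+ (aget b# 5) (double ~dvy)))
     (aset b# 6 (+ (aget b# 6) (double ~dvz)))))
```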