You are correct. I've updated that file:
http://github.com/jafingerhut/clojure-benchmarks/blob/bb9755bdeeccae84a9b09fbf34e45f6d45d4b627/RESULTS
I believe that (zero? i) is faster than (= 0 i).
> What I suggest is
>
> (loop [zr (double 0.0)
>        zi (double 0.0)
>        i (int (inc iterations-remaining))]
>   (let [zr2 (* zr zr)
>         zi2 (* zi zi)]
>     (if (and (not (= 0 i)) (< (+ zr2 zi2 limit-square)))
>       (recur (+ (- zr2 zi2) pr) (+ (* (* f2 zr) zi) pi) (unchecked-inc i))
>       (whatever...))))
>
> * Same calculations
> * Less items in recur rebind
> * Counter is also primitive
>
> (untested, may need slight tweakage)

Needed unchecked-dec in place of unchecked-inc, and I used (zero? i)
instead of (= 0 i) (not sure if that makes much difference), and the
results are much better -- the time went down to a little less than
half of what it was before.
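
For reference, here is a rough sketch of that loop with the two changes
mentioned above (unchecked-dec, zero?). It also writes the escape test as
(< (+ zr2 zi2) limit-square), which I assume is what the quoted
(< (+ zr2 zi2 limit-square)) was meant to be. The function name, the 4.0
limit, counting down from max-iterations, and the return value are all
placeholders of mine, not what is in the benchmark file, and I am assuming
a reasonably current Clojure release.

```clojure
;; Sketch only: escape-iterations, the 4.0 limit and the return value are
;; assumptions for illustration, not code from the benchmark repository.
(defn escape-iterations
  "Iterates z <- z^2 + c from z = 0, with c = pr + pi*i. Returns how many
  iterations ran before |z|^2 reached the limit, or max-iterations if the
  counter ran out first."
  [pr pi max-iterations]
  (let [pr (double pr)
        pi (double pi)
        f2 (double 2.0)
        limit-square (double 4.0)]
    (loop [zr (double 0.0)
           zi (double 0.0)
           i (int max-iterations)]          ; primitive counter, counts down
      (let [zr2 (* zr zr)
            zi2 (* zi zi)]
        (if (and (not (zero? i)) (< (+ zr2 zi2) limit-square))
          (recur (+ (- zr2 zi2) pr)         ; new real part
                 (+ (* (* f2 zr) zi) pi)    ; new imaginary part
                 (unchecked-dec i))         ; dec, so the counter reaches zero
          (- max-iterations i))))))
```

With this version, a point that never escapes, such as
(escape-iterations 0.0 0.0 50), uses up the whole counter, while a point
far outside the set returns after an iteration or two.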
http://github.com/jafingerhut/clojure-benchmarks/blob/fe10ef25ec17b77dd03f6d1ccff4f35273764f3b/RESULTS
Thanks!
Andy
I did two runs for each version, with the only difference between them
being replacing the (zero? i) expression in function 'dot' with a
different expression, as indicated below. (zero? i) is a clear winner
in reducing run time.
> A very straightforward version, and 875.36796ms/100000000 = 8.7536796ns.
> This is on a 2.5GHz machine, so that's only about 22 native instructions per
> iteration. The theoretical minimum is over 1/2 of that:
> Four fmuls
> Three fadds and an fsub
> A floating point comparison
> An integer add and an integer comparison
> A branch back to the start of the loop
> So we're within a factor of 2 of *native* code speeds (nevermind Java) here.
It's not this straightforward. Superscalar CPUs handle multiple
instructions concurrently.
I can't give you concrete numbers on the parallelism of current processors,
but had a lot of fun trying to saturate both pipelines on my first Pentium,
writing i386 assembly ages ago.
Isak
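
To make the arithmetic in the quoted estimate easy to check, here is a
small sketch that redoes it. The input numbers are the ones from the quoted
message; the function names are made up here. Note that time multiplied by
clock rate gives clock cycles per iteration (about 21.9), and reading that
as an instruction count assumes one instruction per cycle, which is exactly
the assumption questioned above.

```clojure
;; Numbers come from the quoted message; the names are hypothetical.
(defn ns-per-iteration [total-ms iterations]
  (/ (* total-ms 1e6) iterations))                       ; 1 ms = 1e6 ns

(defn cycles-per-iteration [total-ms iterations clock-ghz]
  (* (ns-per-iteration total-ms iterations) clock-ghz))  ; 1 GHz = 1 cycle per ns

;; 4 fmul + (3 fadd + 1 fsub) + 1 fp compare + (int add + int compare) + 1 branch
(def listed-instructions (+ 4 4 1 2 1))

(println "ns per iteration:    " (ns-per-iteration 875.36796 100000000))
(println "cycles per iteration:" (cycles-per-iteration 875.36796 100000000 2.5))
(println "listed instructions: " listed-instructions)
```

The listed operations add up to 12, which is a bit more than half of the
roughly 22 cycles, matching the "over 1/2" in the quote.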
The JIT usually needs some time to kick in (especially under -server).
Check if your JVM supports the following flag:
-XX:+PrintCompilation, which should print JIT compilation details.
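
One simple way to see the warm-up effect without any extra flags is to time
the same call several times in one JVM and compare the early runs with the
later ones. The sketch below is only an illustration: busy-loop and time-ms
are made-up helpers, not part of the benchmark, and I am assuming a
reasonably current Clojure release. How much the timings change, if at all,
depends on the JVM and its settings.

```clojure
;; Hypothetical helpers for illustration; not from the benchmark code.
(defn busy-loop
  "Sums 0.5 * i for i from n down to 1, using primitive loop locals."
  [n]
  (loop [i (long n)
         acc (double 0.0)]
    (if (zero? i)
      acc
      (recur (unchecked-dec i) (+ acc (* 0.5 (double i)))))))

(defn time-ms
  "Calls f once and returns the elapsed wall-clock time in milliseconds."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e6)))

;; Repeat the same measurement in one JVM; later runs may be faster once
;; the JIT has compiled the loop.
(dotimes [run 5]
  (println "run" run ":" (time-ms #(busy-loop 10000000)) "ms"))
```

Running this with -XX:+PrintCompilation on the java command line, if your
JVM accepts it, should show compilation messages interleaved with the
timings.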
> Note 2: someone with access to disassembly/memory debugger/perf tools might
> be able to learn more by running these loops, Andy's Clojure, and Andy's C
> on their system. Disassemblies of the JIT-generated code for the Clojure
> code and of the compiled C code would be interesting to compare, in
> particular.
There was a good thread on this list some weeks ago which mentioned
another JVM flag:
-XX:+PrintOptoAssembly
The original thread:
http://groups.google.com/group/clojure/browse_thread/thread/314952431ec064b7?fwc=1
Hope this helps.
Cheers,
Daniel
Bugger, and I just read the instructions for building the thing on
Windows [2]. Looks to be doable (I'd try cross-compiling on Linux with
VirtualBox [3]), but it's still non-trivial, and it's not clear whether
it'll work on a stock Sun JVM or if you need an OpenJDK build
(backported JDK 6 or work in progress JDK 7). :(
[2] http://hg.openjdk.java.net/jdk7/hotspot/hotspot/file/tip/src/share/tools/hsdis/README
[3] http://www.virtualbox.org/