Clojure performance tests and Clojure a little slower than Java


BerlinBrown

Jul 27, 2009, 1:06:53 PM
to Clojure
I was coming up with some performance tests for Clojure, going back
and forth between different JVM languages (JRuby, Scala) and mainly
looking at pure Java code. So far I have found that Clojure is about
5-10 times as slow as comparable code in Java. Of course this is
before any optimizations.

Here is my question: do you have any code, in both Clojure and Java,
where there is a one-to-one match between implementations and where
Clojure runs as fast as Java? It could be anything.

Here is one example:
This is code from clojure/contrib to convert a list of data into a
frequency count map. Pretty standard stuff.

(defn frequencies
  "Returns a map from distinct items in coll to the number of times
  they appear."
  [coll]
  (reduce (fn [counts x]
            (assoc counts x (inc (get counts x 0))))
          {} coll))

(dotimes [x 4]
  (println "i: " (int (Math/pow 10.0 x)))
  (time
   (dotimes [_ (int (Math/pow 10.0 x))]
     (let [a (for [_ (range 1000)] (random-string 3))]
       (frequencies a)))))
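
(Note: random-string isn't defined in the post; a minimal stand-in for
it, my assumption of roughly what it did, could be:)

(defn random-string
  ;; Assumed helper, not from the original post: returns a string of n
  ;; random lowercase letters.
  [n]
  (apply str (take n (repeatedly #(char (+ (int \a) (rand-int 26)))))))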

----------------------

public static Map frequencies(final List list) {
    // ONE is a preallocated Integer constant (e.g. Integer.valueOf(1))
    // defined elsewhere in the class.
    final Map map = new HashMap();
    // Simple: box and then unbox the count as the value for this map.
    for (Iterator it = list.iterator(); it.hasNext(); ) {
        final Object o = it.next();
        final String s = o.toString();
        Integer prev = (Integer) map.get(s);
        if (prev == null) {
            // Put a new value on the map.
            final Integer count = ONE;
            map.put(s, count);
        } else {
            final Integer inc = new Integer(prev.intValue() + 1);
            map.put(s, inc);
        } // End of the if-else.
    }
    return map;
}
----------------------

Clojure Results (same machine 1.5 JDK)

i: 1
"Elapsed time: 51.657859 msecs"
i: 10
"Elapsed time: 212.568221 msecs"
i: 100
"Elapsed time: 1623.107702 msecs"
i: 1000
"Elapsed time: 16356.185166 msecs"
(used:2M/0M [2M,63M ])
Done
Performing simple garbage collection cooldown
(used:2M/0M [2M,63M ])
(used:2M/0M [2M,63M ])

-------------

Here are the Java results.

14.631606999999999 ms
4.828342999999999 ms
i: 10
Elapsed time: 9.803628 msecs
i: 100
Elapsed time: 97.562451 msecs
i: 1000
Elapsed time: 972.775771 msecs
(used:1M/0M [1M,63M ])
(used:1M/0M [1M,63M ])
(used:1M/0M [1M,63M ])

Note: this is just one example. All my other examples ended up
the same way. I hope you don't nitpick this one.

I know we should take performance tests with a grain of salt. But at
the same time, I feel we should at least try to measure the speed of
some of our code. I haven't found many speed tests out there on
Clojure.

Aaron Cohen

Jul 27, 2009, 1:34:58 PM
to clo...@googlegroups.com
One thing you need to do is define what you mean exactly when you say "Java vs Clojure."

In your example you are comparing clojure code vs java code but you are also comparing clojure data structures (PersistentMap) with traditional Java data structures (HashMap).  I'm not sure you meant to conflate the two.

Also, there have been plenty of threads that start off comparing java and clojure performance and eventually end up at near-equivalent performance after using type hints/coercion/etc.  It has become something of a FAQ.  I'd start with: http://groups.google.com/groups/search?q=group:clojure+performance&qt_s=Search+Groups
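
To illustrate the kind of thing those threads do (a toy sketch, not your
frequencies benchmark): an unhinted interop call falls back to reflection,
a #^String hint compiles to a direct method call, and (int ...) coercion
keeps loop locals primitive.

(set! *warn-on-reflection* true)

(defn len-reflective [s]
  (.length s))            ; resolved by reflection at runtime -- slow

(defn len-hinted [#^String s]
  (.length s))            ; direct call to String.length()

(defn count-up [n]
  (let [n (int n)]        ; coerce once so the loop uses primitive ints
    (loop [i (int 0)]
      (if (< i n) (recur (unchecked-inc i)) i))))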

-- Aaron

Berlin Brown

Jul 27, 2009, 1:57:37 PM
to Clojure
"One thing you need to do is define what you mean exactly when you say
"Java
vs Clojure." "

Like I said, I didn't want to get too focused on this particular
example. Is there code I could run in Clojure and in Java that would
end up with the same result? Then I could see the differences in
performance. Also, is there an example where Clojure's performance is
comparable?

For example, let's say Euler problem number 1 wants you to return a
result of 123.

The end result is 123; if I write the code in Java and then in Clojure,
I want to see the runtimes of the different versions.

On Jul 27, 1:34 pm, Aaron Cohen <remled...@gmail.com> wrote:
> One thing you need to do is define what you mean exactly when you say "Java
> vs Clojure."
> In your example you are comparing clojure code vs java code but you are also
> comparing clojure data structures (PersistentMap) with traditional Java data
> structures (HashMap). I'm not sure you meant to conflate the two.
>
> Also, there have been plenty of threads that start off comparing java and
> clojure performance and eventually end up at near-equivalent performance
> after using type hints/coercion/etc. It has become something of a FAQ. I'd
> start with:http://groups.google.com/groups/search?q=group:clojure+performance&qt...
> <http://groups.google.com/groups/search?q=group:clojure+performance&qt...>
> -- Aaron

Berlin Brown

Jul 27, 2009, 2:00:04 PM
to Clojure
Really? I haven't seen one snippet of Java code and Clojure code that
does a similar thing where the runtime is equivalent.

Aaron Cohen

Jul 27, 2009, 2:15:34 PM
to clo...@googlegroups.com

Berlin Brown

Jul 27, 2009, 2:19:25 PM
to Clojure


On Jul 27, 2:15 pm, Aaron Cohen <remled...@gmail.com> wrote:
> http://groups.google.com/group/clojure/browse_thread/thread/7ab7c1e62...
> <http://groups.google.com/group/clojure/browse_thread/thread/7ab7c1e62...>http://groups.google.com/group/clojure/browse_thread/thread/6cd78dea8...
>
> <http://groups.google.com/group/clojure/browse_thread/thread/6cd78dea8...>For
> more examples, please just browse the first google link I gave you.
>
> -- Aaron
>
"Elapsed time: 60.273874 msecs"
"Elapsed time: 18.692337 msecs"
"Elapsed time: 12.910268 msecs"
"Elapsed time: 12.157825 msecs"
"Elapsed time: 11.790928 msecs"
nil

In janino java-compiled code:

"Elapsed time: 5.542431 msecs"
"Elapsed time: 8.884038 msecs"
"Elapsed time: 1.356255 msecs"
"Elapsed time: 1.276561 msecs"
"Elapsed time: 1.247018 msecs"
nil

In Clojure with primitives:

"Elapsed time: 21.794794 msecs"
"Elapsed time: 4.948466 msecs"
"Elapsed time: 1.01541 msecs"
"Elapsed time: 0.820752 msecs"
"Elapsed time: 0.792814 msecs"
nil

I think this was the only thread where I found a time comparison
between Java and Clojure; still no Java code paired one-to-one with
Clojure code, though.

Thanks for trying though.

AndyF

Jul 27, 2009, 5:26:54 PM
to Clojure
Berlin:

I've got a start at programs for solving two of the problems solved in
many other languages on "The Computer Language Benchmarks Game" web
site:

http://shootout.alioth.debian.org

In particular, the k-nucleotide problem has as a significant part of
its computation time a similar task to what you have done -- tallying
the number of times each unique item appears in a collection, using a
map (which most programs on the web site implement with a mutable hash
table, not a persistent map like in Clojure).

I also have adapted some solutions for calculating complex numbers in
the Mandelbrot set from a discussion in this group from several months
back, and added a bit of code so the input/output match what is
expected for the Mandelbrot benchmark. I don't have Clojure
implementations for the other problems on the web site, yet (I'd be
happy to add them to my collection if you'd like to send them to me).
Here are some comparisons of run times on my iMac with 2.2 GHz Intel
Core 2 Duo, 2 GB RAM. I make no claims that these are the best
Clojure implementations that could be made, but I have done several
versions that were slower and/or didn't finish due to running out of
memory, before getting to the Clojure versions I have now. This is
intended to be viewed in a fixed width font. You can also look at the
RESULTS file in the zip/tar.gz files linked below, where you can get
the full programs and test inputs I used.

Times are real / user / sys on my iMac. real means elapsed time from
start to finish, which can be less than (user+sys) in the cases where
the program explicitly uses both processor cores.

        |  sbcl  |  perl  |   ghc  |  java  |   clj
-----------------------------------------------------
mand-   | wrong  | out of |  32.7  |  28.6  |  340.4
elbrot  | output | mem    |  59.3  |  54.4  |  350.5
        |        | (?)    |   0.8  |   0.4  |    4.7

k-nuc-  | 190.9  | 306.0  |  90.5  |  52.4  | 1677.6 (27m 57.6s)
leotide | 187.9  | 302.7  | 130.8  |  89.6  | 2245.1 (37m 25.1s)
        |   2.4  |   1.9  |   4.6  |   1.8  |   24.2 (    24.2s)

For these two programs at least, the Clojure implementations have a
total CPU time (the sum of the 2nd and 3rd numbers reported above) of
6.5 times more for Clojure vs. Java on the mandelbrot program, and 25
times more for Clojure vs. Java on the k-nucleotide program.

Here are links to zip and .tar.gz files containing the programs and
input files used to produce these results. They've been tested in Mac
OS X, but I suspect they ought to work on Linux with little or no
modification. Windows users would need to do a little more to run the
programs, but it shouldn't be difficult. Sorry, they are each about
75 Mbytes, primarily because a few of the sample input and expected
output files are quite large.

http://homepage.mac.com/jafingerhut/files/language-shootout.tar.gz
http://homepage.mac.com/jafingerhut/files/language-shootout.zip

For the k-nucleotide program, I think this may be a fundamental issue
with persistent implementations of maps / hash tables, but I'd be
happy to find a better implementation of them, if that would help
similar Clojure programs to speed up.  I haven't read Chris Okasaki's
book on functional data structures yet, but I suspect a good
understanding of its contents might lead to some improvements.  I've
read that Okasaki's book doesn't spell out an implementation for hash
tables, but leaves it as an exercise for the reader, so don't rush off
and buy the book expecting to have a quick job of translating some
pseudo-ML into Java.

I thought it was interesting that even the Haskell entry to the k-
nucleotide benchmark uses a *mutable* hash table (at least, I think
they are from the discussion on the Wiki page linked below -- my
Haskell knowledge isn't extensive enough to understand all of their
code).  I don't think that is idiomatic Haskell, but the people
writing the Haskell entry are apparently willing to forego pure
functional programming when they can get significantly better
performance from a mutable data structure.

http://www.haskell.org/haskellwiki/Shootout/Knucleotide

In terms of what instructions the processor is running every time a
mutable hash table is updated, it is roughly "calculate the hash
function, look up the corresponding entry in the hash table, compare
the keys for exactness and perhaps traverse more entries looking for
the exact match, then add 1 to the count when you find the matching
entry".

Clojure's persistent hash map is calculating the hash function, then I
believe the basic idea is looking that hash value up in a tree of
nodes, each of which is a Java array of 32 elements.  When it finds
the matching entry in that tree, doing the "assoc" to "modify" the
count of an entry involves copying that array of 32 elements, with one
element in the copy different than the original, and then going back
up the tree to the root, creating new copies of arrays of 32 elements,
each with one element different than the original, pointing at the new
copied child.  I'm sure there are special cases for small maps, but I
think that is the behavior for sufficiently large hash maps.  If so,
then it is pretty easy to see that the processor is doing
significantly more work for each map update, i.e. assoc call, than any
implementation that uses mutable hash tables.
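
A tiny REPL illustration of that property (not from the benchmarks):
assoc never modifies the original map, so every "update" has to build
new nodes along the path while the old map stays usable.

user=> (def m1 {:a 1 :b 2})
#'user/m1
user=> (def m2 (assoc m1 :a 42))
#'user/m2
user=> m1
{:a 1, :b 2}
user=> m2
{:a 42, :b 2}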

Of course, Clojure makes it easy to call Java code implementing
mutable hash tables, if performance is all-important for such a task,
and you realize the issues with having a mutable data structure vs.
Clojure's persistent data structures.
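
For instance, a sketch of the same frequency tally written against a
mutable java.util.HashMap via interop (single-threaded use assumed;
this is not one of the benchmark programs):

(defn frequencies-mutable
  ;; Sketch only: tally the items of coll into a mutable HashMap.
  [coll]
  (let [m (java.util.HashMap.)]
    (doseq [x coll]
      (.put m x (inc (or (.get m x) 0))))
    m))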

Andy

Isaac Gouy

Jul 27, 2009, 9:22:56 PM
to Clojure


On Jul 27, 2:26 pm, AndyF <andy_finger...@alum.wustl.edu> wrote:
-snip-
> I thought it was interesting that even the Haskell entry to the k-
> nucleotide benchmark uses a *mutable* hash table (at least, I think
> they are from the discussion on the Wiki page linked below -- my
> Haskell knowledge isn't extensive enough to understand all of their
> code).  I don't think that is idiomatic Haskell, but the people
> writing the Haskell entry are apparently willing to forego pure
> functional programming when they can get significantly better
> performance from a mutable data structure.


Perhaps they understood that the way to have a program accepted was to
have it do as asked - "update a hashtable of k-nucleotide keys"

http://shootout.alioth.debian.org/u32q/benchmark.php?test=knucleotide&lang=all&box=1#about

ataggart

Jul 28, 2009, 12:03:44 AM
to Clojure


On Jul 27, 10:06 am, BerlinBrown <berlin.br...@gmail.com> wrote:
> So far I have found that clojure is about
> 5-10 times as slow as comparable code in Clojure.

I infer you mean clojure execution times are about 5-10 times longer
than in java. It depends on what you're doing, but that sounds
plausible.

Here are some slides comparing a number of JVM
languages:

http://www.slideshare.net/michael.galpin/performance-comparisons-of-dynamic-languages-on-the-java-virtual-machine?type=powerpoint

Note that in the "reversible" section, he has a loop in the clojure
code with unnecessary Integer object creation which ends up doubling
the execution time.
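
As a toy illustration of that kind of boxing (not the code from the
slides): the first loop below goes through boxed Integer arithmetic on
every iteration, while the second keeps its locals primitive via
(int ...) coercion and the unchecked ops.

(defn sum-boxed [n]
  (loop [i 0 acc 0]                ; in the Clojure of this era, these are boxed
    (if (< i n)
      (recur (inc i) (+ acc i))    ; boxed arithmetic allocates on each step
      acc)))

(defn sum-primitive [n]
  (let [n (int n)]
    (loop [i (int 0) acc (int 0)]  ; (int ...) keeps the loop locals primitive
      (if (< i n)
        (recur (unchecked-inc i) (unchecked-add acc i))
        acc))))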

Berlin Brown

Jul 28, 2009, 8:09:22 AM
to Clojure
Thanks AndyF, do you mind if I use some of your examples and put my
own spin on them? I was curious and want to run my own tests to see
if I get similar output.


On Jul 28, 12:03 am, ataggart <alex.tagg...@gmail.com> wrote:
> On Jul 27, 10:06 am, BerlinBrown <berlin.br...@gmail.com> wrote:
>
> > So far I have found that clojure is about
> > 5-10 times as slow as comparable code in Clojure.
>
> I infer you mean clojure execution times are about 5-10 times longer
> than in java.  It depends on what you're doing, but that sounds
> plausible.
>
> Here's are some slides of a comparison between a number of JVM
> languages:
>
> http://www.slideshare.net/michael.galpin/performance-comparisons-of-d...

AndyF

Jul 28, 2009, 11:52:40 AM
to Clojure
Anybody is welcome to take the Clojure sources and do what they will
with them -- publish the original or modified code under their own
license, sell it, etc.

For the Perl, Common Lisp, Haskell, etc. sources, those are published
under this license. It isn't my code.

http://shootout.alioth.debian.org/u32q/miscfile.php?file=license&title=revised%20BSD%20license

Andy

fft1976

Jul 28, 2009, 2:37:32 PM
to Clojure
Thanks, AndyF for writing the code! I'm glad someone's done a
comparison of idiomatic code instead of making unsubstantiated claims.

Berlin Brown

Jul 28, 2009, 11:26:48 PM
to Clojure


On Jul 28, 2:37 pm, fft1976 <fft1...@gmail.com> wrote:
> Thanks, AndyF for writing the code! I'm glad someone's done a
> comparison of idiomatic code instead of making unsubstantiated claims.

Are you implying that my claims are unsubstantiated or non-idiomatic?
I took the frequency code from clojure/contrib and it only takes up 3
lines.

fft1976

Jul 28, 2009, 11:55:57 PM
to Clojure
On Jul 28, 8:26 pm, Berlin Brown <berlin.br...@gmail.com> wrote:
> On Jul 28, 2:37 pm, fft1976 <fft1...@gmail.com> wrote:
>
> > Thanks, AndyF for writing the code! I'm glad someone's done a
> > comparison of idiomatic code instead of making unsubstantiated claims.
>
> Are you implying that my claims are unsubstantiated or non idiomatic.

No

Andy Fingerhut

Jul 28, 2009, 10:47:02 PM
to Clojure
I have added a script that uses the Java version of the benchmark
programs to generate the large files that were in the distribution
file I gave a link to earlier, so it is much smaller. I've also
published it on github and added a COPYING file that makes the
licenses more explicit (revised BSD for everything, some copyright by
me, the rest by the owner of the benchmark web site). You can get it
here:

git://github.com/jafingerhut/clojure-benchmarks.git

Again, submissions for new benchmark programs, or improvements to
existing ones, are welcome.

Thanks,
Andy


On Jul 28, 8:52 am, AndyF <andy_finger...@alum.wustl.edu> wrote:
> Anybody is welcome to take the Clojure sources and do what they will
> with them -- publish the original or modified code under their own
> license, sell it, etc.
>
> For the Perl, Common Lisp, Haskell, etc. sources, those are published
> under this license.  It isn't my code.
>
> http://shootout.alioth.debian.org/u32q/miscfile.php?file=license&titl...

André Thieme

Aug 6, 2009, 6:28:38 PM
to Clojure
On 27 Jul., 23:26, AndyF <andy_finger...@alum.wustl.edu> wrote:

Hello Andy, could you please update the following table?

>
>         |  sbcl  |  perl  |   ghc  |  java |   clj
> -----------------------------------------------------
> mand-   | wrong  | out of |  32.7  |  28.6  | 340.4
> elbrot  | output | mem    |  59.3  |  54.4  | 350.5
>         |        | (?)    |   0.8  |   0.4  |   4.7
>
> k-nuc-  | 190.9  | 306.0  |  90.5  |  52.4  | 1677.6 (27m 57.6s)
> leotide | 187.9  | 302.7  | 130.8  |  89.6  | 2245.1 (37m 25.1s)
>         |   2.4  |   1.9  |   4.6  |   1.8  |   24.2 (    24.2s)


If I understand you correctly, you now use transients
in those benchmarks, which improved their runtime behaviour.
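
For readers following along, this is roughly what a transient-based
tally looks like (a sketch of the technique, not necessarily the exact
code used in the benchmarks):

(defn frequencies-transient
  ;; Mutation via assoc! stays local to the reduce; persistent! returns
  ;; an ordinary immutable map at the end.
  [coll]
  (persistent!
   (reduce (fn [counts x]
             (assoc! counts x (inc (get counts x 0))))
           (transient {})
           coll)))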

Andy Fingerhut

Aug 6, 2009, 6:57:55 PM
to Clojure
You are correct. I've updated that file:

http://github.com/jafingerhut/clojure-benchmarks/blob/bb9755bdeeccae84a9b09fbf34e45f6d45d4b627/RESULTS

I've attempted to always put the best Clojure program results I've
found so far on the same line with the run times for the programs in
other languages, and then sometimes I will also include the second
best Clojure results on a separate group of lines after that.

Thanks,
Andy

John Harrop

Aug 6, 2009, 9:49:38 PM
to clo...@googlegroups.com
On Thu, Aug 6, 2009 at 6:57 PM, Andy Fingerhut <andy_fi...@alum.wustl.edu> wrote:
You are correct.  I've updated that file:

http://github.com/jafingerhut/clojure-benchmarks/blob/bb9755bdeeccae84a9b09fbf34e45f6d45d4b627/RESULTS

Could you post the Mandelbrot code you use? Because I know for a fact that Clojure can do FP calcs as fast as native C code, given the -server vm and a loop/recur with unboxed doubles, and my understanding of Mandelbrot is that it's just FP calcs.

Andy Fingerhut

Aug 7, 2009, 12:57:25 PM
to Clojure

On Aug 6, 6:49 pm, John Harrop <jharrop...@gmail.com> wrote:
> On Thu, Aug 6, 2009 at 6:57 PM, Andy Fingerhut <
>
> andy_finger...@alum.wustl.edu> wrote:
> > You are correct.  I've updated that file:
>
> >http://github.com/jafingerhut/clojure-benchmarks/blob/bb9755bdeeccae8...
>
> Could you post the Mandelbrot code you use? Because I know for a fact that
> Clojure can do FP calcs as fast as native C code, given the -server vm and a
> loop/recur with unboxed doubles, and my understanding of Mandelbrot is that
> it's just FP calcs.

Here is the root of the clojure-benchmarks github directory tree.
They are all in there:

http://github.com/jafingerhut/clojure-benchmarks/tree/master

You can get specific files by browsing the directory tree on the web
site, or if you use git, you can git clone the whole thing and have a
local copy on your computer:

git clone git://github.com/jafingerhut/clojure-benchmarks.git

It is mandelbrot/mandelbrot.clj-1.clj. There is a
mandelbrot.clj-2.clj that is the same, except it uses my modified-pmap
to compute separate rows in parallel. I'm still investigating reasons
why parallelism isn't getting me much bang for the buck in the
"questions about pmap" thread before worrying performance turning more
complex programs using floats/doubles in parallel.

I did go back to the archives for a discussion in early April, 2009
when others on this list were discussing how to tweak the core of the
mandelbrot code (the function "dot" in my version), and I used the
fastest of those versions I could find, but I may have missed some
opportunities for speeding it up, and I definitely welcome any and all
improved versions. I'd recommend comparing the speed on your machine
vs. the Java version that is in the same directory as the Clojure
source.

Andy

John Harrop

Aug 7, 2009, 8:14:45 PM
to clo...@googlegroups.com
Your core loop seems to be:

(loop [zr (double 0.0)
         zi (double 0.0)
         zr2 (double 0.0)
         zi2 (double 0.0)
         iterations-remaining iterations-remaining]
    (if (and (not (neg? iterations-remaining))
             (< (+ zr2 zi2) limit-square))
      (let [new-zi (double (+ (* (* f2 zr) zi) pi))
            new-zr (double (+ (- zr2 zi2) pr))
            new-zr2 (double (* new-zr new-zr))
            new-zi2 (double (* new-zi new-zi))]
        (recur new-zr new-zi new-zr2 new-zi2 (dec iterations-remaining)))

What I suggest is

(loop [zr (double 0.0)
       zi (double 0.0)
       i (int (inc iterations-remaining))]
  (let [zr2 (* zr zr)
        zi2 (* zi zi)]
    (if (and (not (= 0 i)) (< (+ zr2 zi2 limit-square)))
       (recur (+ (- zr2 zi2) pr) (+ (* (* f2 zr) zi) pi) (unchecked-inc i))
       (whatever...))))

* Same calculations
* Fewer items in the recur rebind
* Counter is also primitive

(untested, may need slight tweakage)

Andy Fingerhut

Aug 7, 2009, 9:19:24 PM
to Clojure
Needed unchecked-dec in place of unchecked-inc, and I used (zero? i)
instead of (= 0 i) (not sure if that makes much difference), and the
results are much better -- the time went down to a little less than
half of what it was before.

http://github.com/jafingerhut/clojure-benchmarks/blob/fe10ef25ec17b77dd03f6d1ccff4f35273764f3b/RESULTS

I also tried it with 4 parallel threads on my 2-core machine, and this
problem parallelizes significantly better than the simpler problem in
the "questions about pmap" discussion thread -- nearly half the
elapsed time, and only 6% more total CPU time for the 4-thread version
vs. the sequential version.

Thanks!
Andy

Mark Engelberg

Aug 8, 2009, 5:23:45 AM
to clo...@googlegroups.com
On Fri, Aug 7, 2009 at 5:14 PM, John Harrop<jharr...@gmail.com> wrote:
>     (if (and (not (= 0 i)) (< (+ zr2 zi2 limit-square)))

I believe that (zero? i) is faster than (= 0 i).

John Harrop

Aug 8, 2009, 2:50:03 AM
to clo...@googlegroups.com
On Fri, Aug 7, 2009 at 9:19 PM, Andy Fingerhut <andy_fi...@alum.wustl.edu> wrote:
> What I suggest is
>
> (loop [zr (double 0.0)
>        zi (double 0.0)
>        i (int (inc iterations-remaining))]
>   (let [zr2 (* zr zr)
>         zi2 (* zi zi)]
>     (if (and (not (= 0 i)) (< (+ zr2 zi2 limit-square)))
>        (recur (+ (- zr2 zi2) pr) (+ (* (* f2 zr) zi) pi) (unchecked-inc i))
>        (whatever...))))
>
> * Same calculations
> * Less items in recur rebind
> * Counter is also primitive
>
> (untested, may need slight tweakage)

Needed unchecked-dec in place of unchecked-inc, and I used (zero? i)
instead of (= 0 i) (not sure if that makes much difference), and the
results are much better -- the time went down to a little less than
half of what it was before.

Try (= 0 i) or whatever. Since zero? is a function it's probably boxing the integer, unless it's a definline rather than a defn.

http://github.com/jafingerhut/clojure-benchmarks/blob/fe10ef25ec17b77dd03f6d1ccff4f35273764f3b/RESULTS

Why post a link to a blank web page?
 
Thanks!
Andy

You're welcome. 

John Harrop

Aug 8, 2009, 5:16:48 PM
to clo...@googlegroups.com
On primitive ints? Have you tested it? 

Andy Fingerhut

Aug 9, 2009, 3:06:24 AM
to Clojure
On Aug 8, 2:16 pm, John Harrop <jharrop...@gmail.com> wrote:
> On Sat, Aug 8, 2009 at 5:23 AM, Mark Engelberg <mark.engelb...@gmail.com>wrote:
>
>
>
> > On Fri, Aug 7, 2009 at 5:14 PM, John Harrop<jharrop...@gmail.com> wrote:
> > >     (if (and (not (= 0 i)) (< (+ zr2 zi2 limit-square)))
>
> > I believe that (zero? i) is faster than (= 0 i).
>
> On primitive ints? Have you tested it?

I have, now, at your suggestion. Here are the results on my iMac,
2.16 GHz Intel Core 2 Duo, Mac OS X 10.5.8, Java version:

% java -version
java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03-211)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02-83, mixed mode)

All are for the mandelbrot.clj-3.clj program here:

http://github.com/jafingerhut/clojure-benchmarks/blob/3e45bd8f6c3eba47f982a0f6083493a9f076d0e9/mandelbrot/mandelbrot.clj-3.clj

(By the way, these links should not be to blank pages, as you
experienced earlier. I can follow them when reading in the Clojure
group web page, or in my email client. Perhaps the software you are
using to read these messages is truncating the URLs, and thus they are
not going to the intended place? Sorry they are so long, which tends
to cause these kinds of issues.)

I did two runs for each version, with the only difference between them
being replacing the (zero? i) expression in function 'dot' with a
different expression, as indicated below. (zero? i) is a clear winner
in reducing run time.

If it makes a difference, I'm also using latest github clojure and
clojure-contrib as of about August 6, 2009. Here is what the REPL
says about the definition of zero?

% clj
Clojure 1.1.0-alpha-SNAPSHOT
user=> (use 'clojure.contrib.repl-utils)
nil
user=> (source zero?)
(defn zero?
  "Returns true if num is zero, else false"
  {:tag Boolean
   :inline (fn [x] `(. clojure.lang.Numbers (isZero ~x)))}
  [x] (. clojure.lang.Numbers (isZero x)))



using (zero? i) in dot:

real 2m38.378s
user 2m40.601s
sys 0m1.875s

real 2m30.614s
user 2m33.081s
sys 0m1.853s

using (= 0 i) in dot:

real 3m18.556s
user 3m20.334s
sys 0m2.060s

real 3m20.425s
user 3m23.392s
sys 0m1.734s

using (== 0 i) in dot:

real 3m21.701s
user 3m24.043s
sys 0m1.982s

real 3m22.441s
user 3m24.420s
sys 0m2.013s

John Harrop

Aug 9, 2009, 3:37:18 AM
to clo...@googlegroups.com
On Sun, Aug 9, 2009 at 3:06 AM, Andy Fingerhut <andy_fi...@alum.wustl.edu> wrote:
I did two runs for each version, with the only difference between them
being replacing the (zero? i) expression in function 'dot' with a
different expression, as indicated below.  (zero? i) is a clear winner
in reducing run time.

That's very surprising. Were you using -server in the vm options? It seems to me that (= 0 i) for a primitive int i ought to be very fast, compared to a Java method call that presumably takes a Number (and so would box the int).

Rich Hickey

Aug 9, 2009, 9:41:33 AM
to Clojure


On Aug 9, 3:37 am, John Harrop <jharrop...@gmail.com> wrote:
> On Sun, Aug 9, 2009 at 3:06 AM, Andy Fingerhut <
>
That's why speculating is a waste of time.

zero? is preferred.

Rich

John Harrop

Aug 12, 2009, 1:00:12 AM
to clo...@googlegroups.com
I've got some interesting new benchmark results here.

user=> (time
         (let [m (int 100000000) b (double 4.0) x0 (double -0.2) y0 (double 0.1) t (double 2.0)]
           (loop [x (double 0.0) y (double 0.0) i (int 0)]
             (if (< i m)
               (let [x2 (* x x) y2 (* y y)]
                 (if (< (+ x2 y2) b)
                   (recur (+ (- x2 y2) x0) (+ (* t (* x y)) y0) (unchecked-inc i))
                 i))
               i))))
"Elapsed time: 875.36796 msecs"
100000000

A very straightforward version, and 875.36796ms/100000000 = 8.7536796ns. This is on a 2.5GHz machine, so that's only about 22 native instructions per iteration. The theoretical minimum is over 1/2 of that:

Four fmuls
Three fadds and an fsub
A floating point comparison
An integer add and an integer comparison
A branch back to the start of the loop

So we're within a factor of 2 of *native* code speeds (nevermind Java) here.

user=> (time
         (let [m (int 100000000) b (double 4.0) x0 (double -0.2) y0 (double 0.1) t (double 2.0)]
           (loop [x (double 0.0) y (double 0.0) i m]
             (if (> i 0)
               (let [x2 (* x x) y2 (* y y)]
                 (if (< (+ x2 y2) b)
                   (recur (+ (- x2 y2) x0) (+ (* t (* x y)) y0) (unchecked-dec i))
                 i))
               i))))
"Elapsed time: 3418.8542 msecs"
0

Shockingly, reversing the count-up to a count-down and not changing anything else at all makes things much, MUCH worse, about four times slower.

user=> (time
         (let [m (int 100000000) b (double 4.0) x0 (double -0.2) y0 (double 0.1) t (double 2.0)]
           (loop [x (double 0.0) y (double 0.0) i m]
             (if (zero? i)
               i
               (let [x2 (* x x) y2 (* y y)]
                 (if (< (+ x2 y2) b)
                   (recur (+ (- x2 y2) x0) (+ (* t (* x y)) y0) (unchecked-dec i))
                 i))))))
"Elapsed time: 874.9564 msecs"
0

The original, at-least-half-of-C speed again.

It seems like < and zero? are as fast as each other and > much slower on primitive ints.

user=> ^#'<
{:ns #<Namespace clojure.core>, :name <, :file "clojure/core.clj", :line 589, :arglists ([x] [x y] [x y & more]), :inline-arities #{2}, :inline #<core$fn__3363 clojure.core$fn__3363@b655a>, :doc "Returns non-nil if nums are in monotonically increasing order,\n  otherwise false."}
user=> ^#'zero?
{:ns #<Namespace clojure.core>, :name zero?, :file "clojure/core.clj", :line 742, :arglists ([x]), :inline #<core$fn__3485 clojure.core$fn__3485@7787a5>, :tag java.lang.Boolean, :doc "Returns true if num is zero, else false"}
user=> ^#'>
{:ns #<Namespace clojure.core>, :name >, :file "clojure/core.clj", :line 617, :arglists ([x] [x y] [x y & more]), :inline-arities #{2}, :inline #<core$fn__3379 clojure.core$fn__3379@51e67c>, :doc "Returns non-nil if nums are in monotonically decreasing order,\n  otherwise false."}

It isn't that > isn't inlined, either.

I think the inline implementation #<core$fn__3379 clojure.core$fn__3379@51e67c> of > needs to be investigated for why it is apparently vastly slower than that of <.

In the meantime, I'm not sure why Andy's code is still 3x slower than Java, while mine runs at more than half the speed of native C. Perhaps it's a difference in our JVMs or hardware. I'm using 1.6.0u13 -server on a 2.5GHz Intel CPU. Andy might be using a different JVM, not using -server, or using a CPU with fewer registers. To run as fast as it does, my loop must be running entirely in CPU registers -- there's not enough "slack time" between the 5ns minimum to execute the calculations at 2.5GHz and the 9ns actual time to make a round trip to my 7ns RAM -- but on another CPU architecture with fewer registers the loop might not fit in the registers. The bigger difference between C and Clojure on Andy's system could then result if the JIT is making the loop less register-efficient (using, say, one or two more registers) than hand-tuned assembly or a good C optimizer, and this is making the difference on hardware without as many registers as my Intel chip. Even a single 7ns memory fetch delay, if not pipelined with computations not dependent on it, would significantly degrade the speed of this tight loop on fast hardware.

Another possibility is pipelining differences. On my CPU it might be memory-fetching something every loop but running some of the calculations during this fetch, so the fetch seems to only take 4ns instead of 7. Maybe Andy's CPU doesn't do as good a job. Yet another is that the fetch is from the L1 cache and takes only 4ns on mine, and longer on Andy's (L1 cache speed difference). (Again, Andy's C vs. Java speed difference would need explaining as the C being more efficient in some way, perhaps in registers used.) Yet another thought that occurs to me is a difference in Clojure versions. I'm using 1.0; if he's using bleeding-edge and there was a performance regression in one of the operations used in the loop (or in loop/recur itself), that would do it. Yet another is that Andy's C code is actually faster than the "theoretical maximum" by using SIMD instructions that the JIT fails to use. (The (* x x) and (* y y) can probably be parallelized with 3DNOW/MMX, and the (+ (...) x0) and (+ (...) y0) likewise.)

So, differences in hardware (registers, cache speed, pipeline stalls, or more than one of those), Clojure version, or JVM version could account for the difference, or the C code using SIMD and the JIT-generated code failing to do so.

Another way SIMD or other such instructions could matter is if my Java version's JIT exploits them and Andy's does not (this would be another JVM version associated case). Or, Sun's JIT exploits only one of 3DNOW and MMX, the C code uses the other, and Andy's CPU only supports the other; since my CPU supports both, that could explain the differences.

It would help if Andy posted the results of evaluating the above three expressions at his REPL, and also posted his CPU speed in GHz, or just the results of using those to calculate the number of instruction cycles needed for each of the three expressions.

If the estimated number of instruction cycles is close to 22 for the < and zero? loops, then it's the C code that's astonishingly fast, probably due to SIMD, or else Andy's version of the loop differs from mine in some subtle but performance-impacting way. To distinguish the latter two cases, Andy's loop can be timed by wrapping in "time" and its number of cycles per iteration estimated; if that estimate is also close to 22, the C blazes, else Andy's loop is somehow slower than mine.

If the estimated number of instruction cycles is much more than 22 (say, more than 25) then Andy's system (CPU, JVM, Clojure version) is the source of the performance difference.

Note: the expressions should be run three or four times. The first two or three timings will be longer than the later ones. (JIT?) Run until the times are consistent and at least three repetitions have been run in rapid succession.

Note 2: someone with access to disassembly/memory debugger/perf tools might be able to learn more by running these loops, Andy's Clojure, and Andy's C on their system. Disassemblies of the JIT-generated code for the Clojure code and of the compiled C code would be interesting to compare, in particular.

Bradbev

Aug 12, 2009, 2:06:16 AM
to Clojure
On Jul 28, 7:47 pm, Andy Fingerhut <andy_finger...@alum.wustl.edu>
wrote:
> I have added a script that uses the Java version of the benchmark
> programs to generate the large files that were in the distribution
> file I gave a link to earlier, so it is much smaller.  I've also
> published it on github and added a COPYING file that makes the
> licenses more explicit (revised BSD for everything, some copyright by
> me, the rest by the owner of the benchmark web site).  You can get it
> here:
>
> git://github.com/jafingerhut/clojure-benchmarks.git
>
> Again, submissions for new benchmark programs, or improvements to
> existing ones, are welcome.
>
> Thanks,
> Andy
Hi Andy,
I think I may have found a bug in your code. The calculation for
magnitude is

    mag (/ (/ delta-t dist-squared) dist)

In the Java code for n-body it is

    double mag = dt / (dSquared * distance);

which has the nice effect of removing an expensive divide from the
inner loop. Doing that and manually inlining the functions in advance
gives me about ~5% faster numbers.
Code is at http://clojure.pastebin.com/m46d57338
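
As a self-contained sketch of that change (argument names assumed, not
taken from the benchmark source):

(defn magnitude
  ;; One-divide form: (/ delta-t (* dist-squared dist)) instead of
  ;; (/ (/ delta-t dist-squared) dist).
  [delta-t dist-squared dist]
  (/ delta-t (* dist-squared dist)))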

Cheers,
Brad

Mike Hinchey

Aug 12, 2009, 3:07:40 AM
to clo...@googlegroups.com
John, this will speed up to the same as the others if you bind the 0 in a let as a primitive int. IIRC from looking at the bytecode before, the literal 0 is an Integer, not an int.

user> (time
        (let [m (int 100000000) b (double 4.0) x0 (double -0.2) y0 (double 0.1) t (double 2.0) i0 (int 0)]
          (loop [x (double 0.0) y (double 0.0) i m]
            (if (> i i0)
              (let [x2 (* x x) y2 (* y y)]
                (if (< (+ x2 y2) b)
                  (recur (+ (- x2 y2) x0) (+ (* t (* x y)) y0) (unchecked-dec i))
                  i))
              i))))
"Elapsed time: 881.387889 msecs"
0

-Mike

Isak Hansen

Aug 12, 2009, 4:58:59 AM
to clo...@googlegroups.com
On Wed, Aug 12, 2009 at 7:00 AM, John Harrop<jharr...@gmail.com> wrote:

> A very straightforward version, and 875.36796ms/100000000 = 8.7536796ns.
> This is on a 2.5GHz machine, so that's only about 22 native instructions per
> iteration. The theoretical minimum is over 1/2 of that:
> Four fmuls
> Three fadds and an fsub
> A floating point comparison
> An integer add and an integer comparison
> A branch back to the start of the loop
> So we're within a factor of 2 of *native* code speeds (nevermind Java) here.

It's not this straightforward. Superscalar CPUs handle multiple
instructions concurrently.

I can't give you concrete numbers on the parallelism of current processors,
but I had a lot of fun trying to saturate both pipelines on my first Pentium,
writing i386 assembly ages ago.


Isak

Daniel

Aug 12, 2009, 4:54:23 AM
to clo...@googlegroups.com
On Wed, Aug 12, 2009 at 12:00 PM, John Harrop<jharr...@gmail.com> wrote:
> Note: the expressions should be run three or four times. The first two or
> three timings will be longer than the later ones. (JIT?) Run until the times
> are consistent and at least three repetitions have been run in rapid
> succession.

JIT compilation usually needs some time to kick in (especially under -server).
Check if your JVM supports the flag -XX:+PrintCompilation, which should
print JIT compilation details.
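
For example (a hypothetical invocation; adjust the classpath and script
name for your setup):

java -server -XX:+PrintCompilation -cp clojure.jar clojure.main mandelbrot.clj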

> Note 2: someone with access to disassembly/memory debugger/perf tools might
> be able to learn more by running these loops, Andy's Clojure, and Andy's C
> on their system. Disassemblies of the JIT-generated code for the Clojure
> code and of the compiled C code would be interesting to compare, in
> particular.

There was a good thread on this list some weeks ago which mentioned
another JVM flag:
-XX:+PrintOptoAssembly

The original thread:
http://groups.google.com/group/clojure/browse_thread/thread/314952431ec064b7?fwc=1

Hope this helps.

Cheers,
Daniel

Glen Stampoultzis

Aug 12, 2009, 11:40:05 PM
to clo...@googlegroups.com
There was a good thread on this list some weeks ago which mentioned
another JVM flag:
-XX:+PrintOptoAssembly

The original thread:
http://groups.google.com/group/clojure/browse_thread/thread/314952431ec064b7?fwc=1


There's some more information about it at [1].  It looks like you need a plugin to get it to work.  I'm on Windows, and unfortunately there doesn't seem to be a prebuilt binary for this platform.

[1] http://wikis.sun.com/display/HotSpotInternals/PrintAssembly


Daniel

Aug 13, 2009, 4:48:56 AM
to clo...@googlegroups.com

Bugger, and I just read the instructions for building the thing on
Windows [2]. It looks doable (I'd try cross-compiling on Linux with
VirtualBox [3]), but it's still non-trivial, and it's not clear whether
it'll work on a stock Sun JVM or if you need an OpenJDK build
(backported JDK 6 or work-in-progress JDK 7). :(

[2] http://hg.openjdk.java.net/jdk7/hotspot/hotspot/file/tip/src/share/tools/hsdis/README
[3] http://www.virtualbox.org/
