Data Access Layer design


michaelr524

unread,
Jun 24, 2010, 3:38:47 AM6/24/10
to Clojure
Hi,

I've started developing a web application using Compojure. I use
CouchDB for storing data.
Currently I've just created a dal.clj file to contain all the CRUD
functions.
I might later decide to switch to another database, so I want to lay
the groundwork for that.

So my question is, how would you design a DAL in Clojure?
Can protocols be exploited here?
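
For concreteness, here is a minimal, hypothetical sketch of the kind of
thing I mean: a protocol covering the CRUD operations, with one record
per backing store (the CouchStore name and the placeholder bodies are
made up, not real client code):

(defprotocol DocumentStore
  (create-doc! [store doc])
  (fetch-doc   [store id])
  (update-doc! [store id doc])
  (delete-doc! [store id]))

(defrecord CouchStore [db-url]
  DocumentStore
  (create-doc! [_ doc]    (comment "POST doc to db-url"))
  (fetch-doc   [_ id]     (comment "GET db-url/id"))
  (update-doc! [_ id doc] (comment "PUT db-url/id"))
  (delete-doc! [_ id]     (comment "DELETE db-url/id")))

The rest of the application would only see DocumentStore, so swapping
CouchDB for another store would mean writing one new record.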

Thanks
Michael

Heinz N. Gies

unread,
Jun 24, 2010, 5:08:55 PM6/24/10
to clo...@googlegroups.com
So out of curiosity I did some benchmarking of the new equal branch and wanted to see how much I can get out of Clojure if I push the limits. That said, I know equal isn't done yet, but I figured it wouldn't hurt. Bad news first: despite all my attempts, Clojure was still slower than optimized Scala code (of course this is a purely hypothetical benchmark, and the fact that Scala is way closer to Java gives it a huge advantage when it comes to that).

The benchmark I did was the fannkuchen-redux benchmark; the code I came up with is the following:

http://github.com/Licenser/clj-shootout/blob/master/fannkuchen/src/profile/fannkuchen.clj

the scala version I used is from the alioth shootout:

http://github.com/Licenser/clj-shootout/blob/master/fannkuchen/scala/fannkuchen.scala

For the Clojure version, I tried to get rid of anything that isn't very close to Java. All data is stored in mutable arrays (all in all I use three), and no extra data is allocated or tossed away. I tried to turn everything I could into a macro (removing the overhead of function calls), and I also tried to use unchecked and primitive math wherever possible. The code is not pretty and certainly not idiomatic, but it is somewhat fast: I managed to get to a point where my Clojure version took only 3.5 times longer than the Scala version (I say "only" since the most idiomatic version was beyond comparison speed-wise, but it was just beautiful!).

After this I did some profiling - eevar will love me for that - and I noticed one thing that stood out: intCast is called a gazillion times (in a run of about 4 seconds it was called ca. 150 million times), which IS a lot, and it took some time (more than 10% of the overall execution time). That struck me as odd, so I did some digging and found an interesting point:

Java arrays use ints as indexes - and since long is the default primitive, every single array access gets typecast via the intCast(long) call, which is expensive since it does range checking. Now since Clojure is open source - yay - I changed the aget Java code to take a long and do an unchecked typecast, as in '(int)i', when accessing the array, and made my own aget and aset methods that did not typecheck (since I knew what they were called with). This bought about 10% of execution time.

So what I'd suggest is to add some unchecked-aget methods that take a long as index and don't typecast; for array-intensive code this is a 10% speedup, which isn't bad. Other than that I am still looking into why Clojure is slower by such a margin.
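
To make the idea concrete, here is a rough sketch (not the code from my repo) of the kind of hand-rolled accessors I mean: macros that carry the array type hint and coerce the index once, so the compiler emits a direct array access instead of a reflective/boxed call. The names daget/daset! are made up.

;; assumes arr is a local symbol holding a double[]
(defmacro daget [arr i]
  `(aget ~(with-meta arr {:tag 'doubles}) (int ~i)))

(defmacro daset! [arr i v]
  `(aset ~(with-meta arr {:tag 'doubles}) (int ~i) (double ~v)))

;; usage:
(let [xs (double-array 10)]
  (daset! xs 3 1.5)
  (daget xs 3))   ; => 1.5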

Also it would be great if there were a way to give functions signatures if desired, as in having (aget Object) that runs (int ...) on Object and (aget int/long) that does just pass the value w/o casting.

Also the profiles show that despite my best efforts there is still a large number of calls made for Number casts, I'm not sure where they are from, likely from clojures core interiors.


Here are some of the results I got: http://grab.by/57x0

and some profiling results: http://grab.by/57vO (look at the time taken for intCast)

Regards,
Heinz

David Nolen

unread,
Jun 24, 2010, 5:17:53 PM6/24/10
to clo...@googlegroups.com

Don't have time to dive into this right now, but all I can say is that now you are starting to get an idea of what the "other side" looks like. The kind of hoop jumping you have to do to get within 3x of Scala is absurd.

This style of coding is largely unnecessary with the latest equals branch. However, in order to be competitive you're going to have to be a bit more clever with your combination of deftype/record, protocols, and statics:

* Data locality: solved by fields with deftype/record (this also allows you to define a bunch of methods/fns that have access to the field type and avoid casts)
* Primitive returns: solved by static
* Macro inlining: unnecessary with static

David Nolen

unread,
Jun 24, 2010, 5:28:30 PM6/24/10
to clo...@googlegroups.com
Couple other notes:

* You can use definterface to define type-hinted methods for deftype/record; you can't do that with protocols (see the sketch after this list).
* A good example of how fast protocols can be is http://github.com/ztellman/cantor.
* Protocols and static cannot interact at the moment
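
A minimal sketch (not from this thread, names made up) of the definterface + deftype combination: the interface carries the primitive type hints, and the deftype implements it with primitive fields, so the arithmetic stays unboxed.

(definterface IPoint
  (^double getX [])
  (^double dist [^double ox ^double oy]))

(deftype Point [^double x ^double y]
  IPoint
  (getX [_] x)
  (dist [_ ox oy]
    (let [dx (- x ox) dy (- y oy)]
      (Math/sqrt (+ (* dx dx) (* dy dy))))))

;; usage:
(.dist (Point. 0.0 0.0) 3.0 4.0)   ; => 5.0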

 

Michał Marczyk

unread,
Jun 24, 2010, 5:35:49 PM6/24/10
to clo...@googlegroups.com
On 24 June 2010 23:17, David Nolen <dnolen...@gmail.com> wrote:
> Don't have time to dive into this right now, but all I can say is now you
> are starting to have an idea what the "other side" looks like. The kind of
> hoop jumping you have to do get within 3x of Scala is absurd.

I have to admit this is true...

(With the above realisation, I feel like I'm biting off a chunk of a
cake which I really wanted to keep just a short while ago.)

Sincerely,
Michał

David Nolen

unread,
Jun 24, 2010, 5:46:02 PM6/24/10
to clo...@googlegroups.com
If you want some ideas: http://gist.github.com/452032

This was posted earlier in the ML in an un-optimized Clojure form as taking 98secs while the Java version took about 16ms on the OP's machine. With Rich's latest changes this optimized Clojure code runs in about 20-25ms on my i7 laptop.

David

Mark Engelberg

unread,
Jun 24, 2010, 10:50:25 PM6/24/10
to clo...@googlegroups.com
When exactly did people start expecting Clojure to be as fast as Java
and/or Scala?

I seem to recall in one of the original Clojure videos, Rich talked
about the relationship between Clojure and Java. There's a long
history of C programmers dropping down to assembly, Python programmers
dropping down to C, and so on. He explained that Java was Clojure's
"assembly". You use Clojure to write the logic that is too complex to
code compactly in Java. You write Java to code the low-level bits you
can't do in Clojure.

I must admit I was surprised by Rich's recent statement that Clojure
is not useful to him if he can't write high performance code in it,
because he "only writes the hard bits". To me this seemed like a
change of perspective from the days when it was accepted that the most
performance-critical parts would likely be implemented in Java;
perhaps the Clojure-in-Clojure project has gotten him more focused on
replacing Java entirely with Clojure. He says we wouldn't like a
Clojure that was written in Clojure as it currently stands, because it
would be too slow. That's probably true, but I never expected it to
be possible. Up until this recent discussion, I believed that the
main focus for Clojure-in-clojure was to find ways to move more parts
into Clojure without affecting performance, in order to achieve better
portability to other hosts. I didn't think it would ever be possible
to move all of it over.

I agree that it has been extraordinarily difficult to write
high-performance numerical code in Clojure. I agree that everyone
benefits if this gets easier. I want faster code as much as anyone.
But I'm worried that people's goals for static-typed-performance might
be unrealistic. I'm delighted that Clojure already performs so well
for a dynamically typed language, thanks to Rich's hard work
optimizing the Java code that underlies Clojure so everything can be
as fast as possible, but it seems that a lot of people here will be
disappointed if Clojure can't compete with Java/Scala on the
benchmarks.

I'm glad someone is starting to tackle these benchmarks. I think it's
especially interesting to see how fast the code can be when written
idiomatically, so I hope we'll see more of these results as well.
Once you start using mutable arrays and type hinting every single
variable, why aren't you just using Java?

With respect to this particular benchmark, I don't think it will be
possible to get idiomatic code in the same ballpark as Java, because
if you use Clojure's vectors to store the permuted values, they will
be boxed. Unless Clojure starts having primitive vectors, I don't see
how it's possible to get around this. But I'd love to be proven
wrong.

Wilson MacGyver

unread,
Jun 25, 2010, 12:33:13 AM6/25/10
to clo...@googlegroups.com
On Jun 24, 2010, at 10:50 PM, Mark Engelberg <mark.en...@gmail.com> wrote:

> When exactly did people start expecting Clojure to be as fast as Java
> and/or Scala?
>

In one of the earlier talks/videos, the claim was that Clojure is within 1x to 3x of Java performance.

Fast math performance was also touched on here:
http://www.infoq.com/interviews/hickey-clojure

And that's pre 1.0

Richard Newman

unread,
Jun 25, 2010, 1:16:54 AM6/25/10
to clo...@googlegroups.com
Mark,

I don't disagree with your message as a whole, but:

> Once you start using mutable arrays and type hinting every single
> variable, why aren't you just using Java?

The argument I've heard is this: there are two ways to get a fast
program in a slow-by-default language.

The first is Python's way: write the whole thing in Python (high-
level), find out the parts that are slow, then rewrite those in a
different language (C). Now you need a C compiler, you need to build
separately on each platform, and so on.

The second is Common Lisp's way: write the whole thing in un-annotated
CL (high-level), find out the parts that are slow, then add
annotations or make representational changes (lists => vectors, alists
=> hash-maps) to allow the compiler to make them fast.

The argument is that being able to use *the same language* to write
anywhere on the spectrum from slow/succinct to fast/verbose is useful:
no complex combined build system, no bridging between environments,
iterative development, easier profiling, your code always works, etc.

If I can take one little part of my Clojure program and make it ugly
like Java to get it as fast as Java, it's better than having to
actually _use_ Java in a heterogeneous project.

-R

David Nolen

unread,
Jun 25, 2010, 1:31:14 AM6/25/10
to clo...@googlegroups.com
On Thu, Jun 24, 2010 at 10:50 PM, Mark Engelberg <mark.en...@gmail.com> wrote:
With respect to this particular benchmark, I don't think it will be
possible to get idiomatic code in the same ballpark as Java, because
if you use Clojure's vectors to store the permuted values, they will
be boxed.  Unless Clojure starts having primitive vectors, I don't see
how it's possible to get around this.  But I'd love to be proven
wrong.

gvec already holds primitives. What we're missing are fns that can take advantage of that.

David 

Heinz N. Gies

unread,
Jun 25, 2010, 5:10:51 AM6/25/10
to clo...@googlegroups.com

On Jun 24, 2010, at 23:17 , David Nolen wrote:

> Don't have time to dive into this right now, but all I can say is now you are starting to have an idea what the "other side" looks like. The kind of hoop jumping you have to do get within 3x of Scala is absurd.

Yes, seeing what the "other side" looks like was one of the goals, and it is absolutely insane, I agree. So, to be mean: adding a few (long) or +' in there won't make it much worse :P but that isn't the point; I wanted to see how far one can go.

> This style of coding is largely unnecessary with the latest equals branch. However in order to be competitive you're going to have to be a bit more clever with your combination of deftype/record, protocols, statics:
>
> * Data locality, solved by fields with deftype/record (also allows to define a bunch of method/fns that have access to the field type to avoid cast)
> * primitive return solved by static
> * macro inlining is unnecessary with static

I took the approach that was most likely to be fast, and I figured the lower you go the faster you get. I am not entirely sure, but I think that using records and types instead of arrays would be slower than this implementation - though I'll gladly be proven wrong :P.


On Jun 25, 2010, at 4:50 , Mark Engelberg wrote:

> When exactly did people start expecting Clojure to be as fast as Java
> and/or Scala?

Don't get me wrong, I neither expect nor demand Clojure to be as fast as Java/Scala; this was, or is, a scientific experiment to see how fast it can be. Scala is just a nice measuring pole, especially since Rich mentioned that the changes in the equal branch bring us a bit closer to the point where people don't claim Clojure to be that much slower.

> I'm glad someone is starting to tackle these benchmarks. I think it's
> especially interesting to see how fast the code can be when written
> idiomatically, so I hope we'll see more of these results as well.

> Once you start using mutable arrays and type hinting every single
> variable, why aren't you just using Java?

Because Java is ugly as hell; it's worse than the most horrible Clojure code you could possibly write :P (my 2 cents). Also, it would defeat the idea behind this test. The idiomatic (at least in my eyes) Clojure code is also included in the tests, but it is slower by such a margin that it is un-benchmarkable for longer runs. On the other hand, the idiomatic code nearly reads like the problem description, which is wonderful, and I think in most cases I'd say 'screw speed and make it nice' ... actually I said that before :P ... but you get the point, I think.

Regards,
heinz

Nicolas Oury

unread,
Jun 25, 2010, 6:19:53 AM6/25/10
to clo...@googlegroups.com
Have you tried manual copying of the perm array (as in the scala version)?
It is probably not much better or worse, but it would be nice to have the same algorithm as Scala, for comparison purposes.





Martin DeMello

unread,
Jun 25, 2010, 6:55:31 AM6/25/10
to clo...@googlegroups.com
On Fri, Jun 25, 2010 at 8:20 AM, Mark Engelberg
<mark.en...@gmail.com> wrote:
> When exactly did people start expecting Clojure to be as fast as Java
> and/or Scala?
>
> I seem to recall in one of the original Clojure videos, Rich talked
> about the relationship between Clojure and Java.  There's a long
> history of C programmers dropping down to assembly, Python programmers
> dropping down to C, and so on.  He explained that Java was Clojure's
> "assembly".  You use Clojure to write the logic that is too complex to
> code compactly in Java.  You write Java to code the low-level bits you
> can't do in Clojure.

I've actually been thinking of learning scala to have something to
"drop down to" when clojure isn't fast enough. Anyone have experience
with this?

martin

Heinz N. Gies

unread,
Jun 25, 2010, 7:18:29 AM6/25/10
to clo...@googlegroups.com

On Jun 25, 2010, at 12:19 , Nicolas Oury wrote:

> Have you tried manual copying of the perm array (as in the scala version)?
> It is probably not much better or worse, but it would be nice to have the same algorithm than scala, for comparaison purpose.

Yes, the original version did that; the impact of the change was minimal, but in the end I found it cleaner to use the low-level system function.

Regards,
Heinz

Nicolas Oury

unread,
Jun 25, 2010, 8:45:25 AM6/25/10
to clo...@googlegroups.com
Could you also give more info on the JVM/flags you use?
In particular, I see you are on a Mac, where last time I checked the default was the client HotSpot.




j-g-faustus

unread,
Jun 26, 2010, 1:56:32 PM6/26/10
to Clojure
Possibly of interest here, although I've only tested it using Clojure
1.1:

I just did an implementation of the n-body problem from
http://shootout.alioth.debian.org/u32q/benchmark.php?test=nbody&lang=all

The fastest code I've managed so far is here:
http://github.com/j-g-faustus/Clojure-test-code/blob/master/shootout/nbody.clj

On my machine this is about 4x slower than the shootout Java
implementation. Using Java as the baseline and comparing my local
results to the shootout timings, it puts Clojure 1.1 on par with
Erlang, Go and OCaml.

On profiling I have a bunch of intCast(Object) and doubleCast(double)
totaling ~9% CPU time (no screenshot yet, sorry), I'll see if I can
eliminate them for a few percent improvement.

A bit more here:
http://stackoverflow.com/questions/3124344/clojure-number-crunching-performance
where I am also asking
* are there any obvious improvements to my code?
* do you consider 4x slower than Java and 10x quicker than Python/Ruby
representative for Clojure performance?

For the record, I think an order of magnitude quicker than Perl/Python/
Ruby is a quite decent result.
~4x slower than C, Java and Scala is a performance hit I can live with
in return for a more comfortable language.

If 1.2 gives even better performance, or similar performance with less
effort, so much the better.

Regards
jf

PS: If there are no obvious improvements, I'd like to submit this n-
body implementation to the shootout. Any objections? (Like "wait for
1.2"?)


On Jun 24, 11:08 pm, "Heinz N. Gies" <he...@licenser.net> wrote:
> The benchmark I did was the fannkuchen-redux benchmark the code I came up with was the following:
>
> http://github.com/Licenser/clj-shootout/blob/master/fannkuchen/src/pr...
>
> the scala version I used is from the alioth shootout:
>
> http://github.com/Licenser/clj-shootout/blob/master/fannkuchen/scala/...

Nicolas Oury

unread,
Jun 27, 2010, 12:03:37 PM6/27/10
to clo...@googlegroups.com
On Sat, Jun 26, 2010 at 6:56 PM, j-g-faustus <johannes...@gmail.com> wrote:

On my machine this is about 4x slower than the shootout Java
implementation. Using Java as the baseline and comparing my local
results to the shootout timings, it puts Clojure 1.1 on par with
Erlang, Go and OCaml.

On profiling I have a bunch of intCast(Object) and doubleCast(double)
totaling ~9% CPU time (no screenshot yet, sorry), I'll see if I can
eliminate them for a few percent improvement.

Which Hotspot and flags do you use?
You could be more idiomatic and probably faster with 1.2's (definterface Body-ish (x [] ...) ...) and the like with type annotations (or better with protocols, but they have no annotations yet, I think), plus (deftype Body [x y ...] Body-ish ...).
Object field access is a bit faster than array access on the JVM. (As a first try, you could measure the difference by using the Body class from Java and the main loop in Clojure, to check where the difference comes from.)

Best regards,

Nicolas.

j-g-faustus

unread,
Jun 27, 2010, 5:59:51 PM6/27/10
to Clojure
On Jun 27, 6:03 pm, Nicolas Oury <nicolas.o...@gmail.com> wrote:
> Which Hotspot and flags do you use?

Sun JVM 1.6.0 64 bit, the one that comes bundled with OS X 10.6. No
flags.
Specifying something like "-server -Xms256M" shaves off a couple of
seconds (3-4%), but I haven't looked very closely at the options.
Whatever JVM flag tweaks can be done for Clojure can be done for Java
too, and I'm primarily interested in relative performance, which I
suppose is roughly the same as long as I use the same flags for both.

Thanks for the tip on field access, I'll try that for comparison.

Definterface and deftype look promising, I'll try them out when I have
finished the 1.1 version.


Thanks for feedback,

jf

Aaron Cohen

unread,
Jun 28, 2010, 2:02:42 PM6/28/10
to clo...@googlegroups.com
Doing these tests on clojure 1.1, while self-enlightening, is kind of missing the point.

The current primitive work on master for 1.2 is trying to make optimizing more practical, more often possible, and less ugly. It's well known that 1.1 optimization is ugly at best and often not completely successful.

Nicolas Oury

unread,
Jun 28, 2010, 2:12:37 PM6/28/10
to clo...@googlegroups.com
On Sun, Jun 27, 2010 at 10:59 PM, j-g-faustus <johannes...@gmail.com> wrote:
Whatever JVM flag tweaks can be done for Clojure can be done for Java
too, and I'm primarily interested in relative performance, which I
suppose is roughly the same as long as I use the same flags for both.

That's not exactly true. Clojure emits code that is made to be as fast as possible once hot, given the right JVM.
(At least it was like that last time I checked.)
Maybe someone who knows the inside of the compiler better could give a set of relevant options that help
Clojure performance.
One of the best examples of that I know of is aget in Clojure. Last time I checked, it did not get compiled to the bytecode that accesses
an array, but to a static method in the runtime that accesses the array.
A "good" JVM inlines small static methods, so the access becomes as fast as Java. But if the JVM is bad, it will affect Clojure far more than
Java, because Clojure relies more on the JVM (and rightly so, JVMs are very good).

If what I say is still true (and I don't know the internals of Clojure well enough to be sure), it is counterproductive to optimize on a "bad" JVM:
 you could introduce some costs trying to remove some "fake" costs.

("good"/"bad" is not the right terminology, it is a shorthand for "a jvm for which Clojure is/is not optimised". Apologies if I have hurt a jvm's feelings.).
Nicolas.



j-g-faustus

unread,
Jun 28, 2010, 3:47:50 PM6/28/10
to Clojure
Well, it gives a baseline to compare 1.2 improvements against, in
terms of speed, convenience or readability.

j-g-faustus

unread,
Jun 29, 2010, 2:05:08 PM6/29/10
to Clojure
OK, I tried this. Object field access instead of arrays made a few
percent difference, but not enough to be significant.

Definterface and defprotocol, on the other hand, not only gave cleaner
code but was more than twice as fast. A huge win if you ask me :)

So summarizing this particular benchmark:
* 1.1 style optimization using primitive Java arrays peaks at ~4x
slower than Java.
* 1.2 style optimization using mutable primitive fields in a deftype
is only ~1.7x (70%) slower than Java.

Links:
* more detail including profiling snapshots, JVM version etc.
http://wiki.github.com/j-g-faustus/Clojure-test-code/
* 1.2 implementation:
http://github.com/j-g-faustus/Clojure-test-code/blob/master/shootout/nbody_type.clj


I haven't tried the new numeric branches, there seems to be a
sufficient number of people with opinions on those already :)

But I can add the observation that it is possible to write very fast
numeric code in the 1.2 master branch as it stands today. Possibly non-
idiomatically by using mutable fields, but still fast, still Clojure
and far cleaner than the 1.1 optimizations.
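
A minimal, made-up sketch of what I mean by "mutable primitive fields in a deftype" (the actual nbody code is at the link above; Body/advance here are just illustrative names):

(definterface IBody
  (^double getX [])
  (advance [^double dt]))

(deftype Body [^:unsynchronized-mutable ^double x
               ^:unsynchronized-mutable ^double vx]
  IBody
  (getX [_] x)
  (advance [_ dt]
    (set! x (+ x (* dt vx)))))

;; usage:
(let [b (Body. 0.0 1.0)]
  (.advance b 0.5)
  (.getX b))   ; => 0.5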

Thanks for the deftype tip.

Regards
jf

On Jun 27, 6:03 pm, Nicolas Oury <nicolas.o...@gmail.com> wrote:

David Nolen

unread,
Jun 29, 2010, 3:31:51 PM6/29/10
to clo...@googlegroups.com
On Tue, Jun 29, 2010 at 2:05 PM, j-g-faustus <johannes...@gmail.com> wrote:
OK, I tried this. Object field access instead of arrays made a few
percent difference, but not enough to be significant.

Definterface and defprotocol, on the other hand, not only gave cleaner
code but was more than twice as fast. A huge win if you ask me :)

So summarizing this particular benchmark:
* 1.1 style optimization using primitive Java arrays peaks at ~4x
slower than Java.
* 1.2 style optimization using mutable primitive fields in a deftype
is only ~1.7x (70%) slower than Java.

Links:
* more detail including profiling snapshots, JVM version etc.
 http://wiki.github.com/j-g-faustus/Clojure-test-code/
* 1.2 implementation:
 http://github.com/j-g-faustus/Clojure-test-code/blob/master/shootout/nbody_type.clj

I haven't tried the new numeric branches, there seems to be a
sufficient number of people with opinions on those already :)

You should give the latest equiv branch a shot and let us know. The gap should be a bit smaller since arithmetic operations won't box their results.

David

j-g-faustus

unread,
Jun 29, 2010, 2:50:49 PM6/29/10
to Clojure
On Jun 29, 8:05 pm, j-g-faustus <johannes.fries...@gmail.com> wrote:
> Definterface and defprotocol, on the other hand

Correction: definterface and deftype.

j-g-faustus

unread,
Jun 30, 2010, 10:19:50 AM6/30/10
to Clojure
Tried the equiv branch briefly: The "1.1 style" version is ~4%
quicker, but still ~4x slower than Java and ~2x slower than mutable
deftype.

But I found another issue: Array access apparently converts primitives
to boxed values at every access. This is perhaps because aget/aset is
a function and primitives cannot cross function boundaries?
That would explain the relative slowness of arrays.

Here is a test case http://gist.github.com/458669
And a profiler screenshot http://i589.photobucket.com/albums/ss334/j-g-faustus/profiling/array-test-50k.png

15% CPU time goes to Double.valueOf(double) in all-primitive array
access and another ~4% to intCast(int).

The number of calls to Double.valueOf(double) seems to suggest that it
is called only on aset, not on aget, though I can't think of any
reason why that would be.

Does anyone know more about this?


Regards
jf

Nicolas Oury

unread,
Jun 30, 2010, 11:20:19 AM6/30/10
to clo...@googlegroups.com
On Wed, Jun 30, 2010 at 3:19 PM, j-g-faustus <johannes...@gmail.com> wrote:

The number of calls to Double.valueOf(double) seems to suggest that it
is called only on aset, not on aget, though I can't think of any
reason how that could be.

Does anyone know more about this?

Do you use the client or server Hotspot?

David Nolen

unread,
Jun 30, 2010, 12:14:36 PM6/30/10
to clo...@googlegroups.com
On Wed, Jun 30, 2010 at 10:19 AM, j-g-faustus <johannes...@gmail.com> wrote:
Tried the equiv branch briefly: The "1.1 style" version is ~4%
quicker, but still ~4x slower than Java and ~2x slower than mutable
deftype.

But I found another issue: Array access apparently converts primitives
to boxed values at every access. This is perhaps because aget/aset is
a function and primitives cannot cross function boundaries?
That would explain the relative slowness of arrays.

Here is a test case http://gist.github.com/458669
And a profiler screenshot http://i589.photobucket.com/albums/ss334/j-g-faustus/profiling/array-test-50k.png

15% CPU time goes to Double.valueOf(double) in all-primitive array
access and another ~4% to intCast(int).

The number of calls to Double.valueOf(double) seems to suggest that it
is called only on aset, not on aget, though I can't think of any
reason how that could be.

Does anyone know more about this?


Regards
jf

On the equiv branch I don't see this at all. Also on the equiv branch most of your type hints inside the fn are unnecessary.

(defn ^:static test-double ^doubles [^long n]
  (let [len  5
        arr (double-array 5)
        n    n]
    (dotimes [_ n]
      (dotimes [i len]
        (dotimes [j (inc i)]
          (aset arr j 
                (+ 1.0 (- (aget arr i) (aget arr j)))))))
    arr))

David 
 

j-g-faustus

unread,
Jun 30, 2010, 12:14:03 PM6/30/10
to Clojure
$ java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02-279-10M3065)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01-279, mixed mode)

Running the benchmark with the -client or -server flag makes just a couple
of percent difference in timing.

On Jun 30, 5:20 pm, Nicolas Oury <nicolas.o...@gmail.com> wrote:

j-g-faustus

unread,
Jun 30, 2010, 12:20:51 PM6/30/10
to Clojure
OK, I'll try again. Thanks.

jf

On Jun 30, 6:14 pm, David Nolen <dnolen.li...@gmail.com> wrote:
> On Wed, Jun 30, 2010 at 10:19 AM, j-g-faustus
> <johannes.fries...@gmail.com>wrote:
>
>
>
>
>
> > Tried the equiv branch briefly: The "1.1 style" version is ~4%
> > quicker, but still ~4x slower than Java and ~2x slower than mutable
> > deftype.
>
> > But I found another issue: Array access apparently converts primitives
> > to boxed values at every access. This is perhaps because aget/aset is
> > a function and primitives cannot cross function boundaries?
> > That would explain the relative slowness of arrays.
>
> > Here is a test casehttp://gist.github.com/458669
> > And a profiler screenshot
> >http://i589.photobucket.com/albums/ss334/j-g-faustus/profiling/array-...

j-g-faustus

unread,
Jun 30, 2010, 9:27:54 PM6/30/10
to Clojure
I'm getting some form of boxing during array access, even on the equiv
branch.

Speaking of which: (type (+ 1N 1)) => clojure.lang.BigInt means that
I'm using the correct branch, right?

Here is a simple speed test comparing 3-element vectors using Java
arrays, immutable deftype and mutable deftype:
http://gist.github.com/459383

Using the equiv branch, I get
* Java arrays: 18s
* Immutable deftype: 10s
* Mutable deftype: 2s
* Plain Java: 2s

Here is a profiler snapshot for the array case, where
java.lang.Number.init(), java.lang.Double.init() and
java.lang.Double.valueOf() take a significant chunk of the CPU time:
http://i589.photobucket.com/albums/ss334/j-g-faustus/profiling/eqv-arrays.png

From a practical standpoint I guess it means that deftype is the way
to go if you want fast numbers today.
In this case even the immutable deftype was faster than mutable Java
arrays, so you can be fast, idiomatic and thread-safe simultaneously.

For the future, it would be nice if Java arrays were equally fast.
There are unchecked-inc-int and -long; would there be room for an
unchecked-aget/aset-int/long/double?


Regards
jf


On Jun 30, 6:14 pm, David Nolen <dnolen.li...@gmail.com> wrote:
> On Wed, Jun 30, 2010 at 10:19 AM, j-g-faustus
> <johannes.fries...@gmail.com>wrote:

Nicolas Oury

unread,
Jul 1, 2010, 10:29:28 AM7/1/10
to clo...@googlegroups.com
On Thu, Jul 1, 2010 at 2:27 AM, j-g-faustus <johannes...@gmail.com> wrote:
Using the equiv branch, I get
* Java arrays: 18s
* Immutable deftype: 10s
* Mutable deftype: 2s
* Plain Java: 2s

That's a very encouraging result! That proves that the equiv branch, and deftypes, can be as fast as Java.

Could you test an array in plain Java, to compare what comes from the arrays themselves and what comes from array access in Clojure?

j-g-faustus

unread,
Jul 1, 2010, 11:17:09 AM7/1/10
to Clojure
I did:
"Java arrays 18s" is Java arrays in Clojure.
"Plain Java 2s" is Java arrays in Java.

Here's the Java code: http://gist.github.com/460063

That's what you meant, right?

I agree that it looks very good :)

"Mutable deftype" is a special case in that apart from the dotimes
loop counter all numeric operations are done internally in a single
mutable deftype instance, but it is still very good - in this scenario
there was close to zero overhead from using Clojure.



On Jul 1, 4:29 pm, Nicolas Oury <nicolas.o...@gmail.com> wrote:

Heinz N. Gies

unread,
Jul 1, 2010, 12:24:27 PM7/1/10
to clo...@googlegroups.com

I did:
"Java arrays 18s" is Java arrays in Clojure.
"Plain Java 2s" is Java arrays in Java.
One reason here is that the Clojure literals 1, 2 and 3 you use for array indexes are longs, while the aget methods want ints, so you get funny casting there, which is slow. For comparison, I've done it on a private test thingy where I use longs as array indexes:

user=> (time-mtype)    
(5000000.00 10000000.00 15000000.01)
Bye from Clojure
"Elapsed time: 245.152 msecs"
nil
user=> (time-type)
(5000000.00 10000000.00 15000000.01)
Bye from Clojure
"Elapsed time: 1360.897 msecs"
nil
user=> (time-arr)
(5000000.00 10000000.00 15000000.01)
Bye from Clojure
"Elapsed time: 2029.323 msecs"
nil

Regards,
Heinz

Greg

unread,
Jul 1, 2010, 11:42:53 AM7/1/10
to clo...@googlegroups.com
Ooo... sorry for the side-question, but I noticed that your code doesn't seem to use coercions for primitives and uses type-hints instead.

I was just asking the other day on #clojure why Clojure had coercion functions at all and why type hints didn't work for primitives. Does this mean 1.2 will get rid of coercions (or make them no longer necessary) and allow type hints for primitives?

If so, that's really awesome. :-D

j-g-faustus

unread,
Jul 1, 2010, 1:01:39 PM7/1/10
to Clojure
On Jul 1, 6:24 pm, "Heinz N. Gies" <he...@licenser.net> wrote:
> One reason here is that clojures literals as 1 2 and 3 you use for array indexes are longs, the aget methods want int's

Agreed. If we can take the profiling snapshot I linked to at face
value, the boxing and casting adds up to ~40% of the CPU time.
Given that the speed difference between Java arrays in Java and in
Clojure is ~9x (900%), there are probably other factors here as well.

Just note that the REPL timings you are listing are for N=50 million.
My "18s vs 2s" timings are from the command line with N=500 million
and include JVM startup time. (So that I could compare it to plain
Java.)

But your private fix apparently got the relative speed difference down
to ~8x, which is an improvement.


Regards
jf

j-g-faustus

unread,
Jul 1, 2010, 1:23:08 PM7/1/10
to Clojure
On Jul 1, 5:42 pm, Greg <g...@kinostudios.com> wrote:
> Ooo... sorry for the side-question, but I noticed that your code doesn't seem to use coercions for primitives and uses type-hints instead.
>
> I was just asking the other day on #clojure why Clojure had coercion functions at all and why type hints didn't work for primitives, does this mean 1.2 will get rid of coercions (or make them no longer necessary) and allow type-hints for primitives?

Beats me, I'm just following the example from David Nolen earlier in
this thread :)

Seriously, I'm just starting out with Clojure (a few weeks) and this
is my way to get to know the language, so I'm not the right person to
ask. I may have made mistakes in my code, and will be happy to be
corrected.

Still, it looks like 1.2 may get type hints:

> ^long and ^double hints allowed on args
> other reference type hints allowed as well
> unlike for non-statics, these will be enforced.

From http://www.assembla.com/wiki/show/clojure/Enhanced_Primitive_Support

But note also the start of the page:
> This work is not promised to become part of Clojure, as is or otherwise, now or in the future.

Mike Meyer

unread,
Jul 1, 2010, 1:44:25 PM7/1/10
to clo...@googlegroups.com
On [many an occasion] "Many people" wrote:

> user=> (time foo)
...
> "Elapsed time: 245.152 msecs"

I hate to be a wet blanket, but how accurate is this? The doc doesn't
even say whether it measures wall clock time or cpu time. Even if you
knew that, it's at best a foundation to build a real benchmarking
system on.

By "real benchmarking" I mean one that does things like run the
benchmark a few times to fill caches and let the JIT do it's thing,
etc., makes sure it runs the target code enough times to get a chunk
of time large enough to means something, and takes into account the
overhead of making multiple runs. Now do that a dozen or so times so
you can take some stats, so when you compare two timings you actually
have enough information to decide if one is really faster than the
other, or the difference is just fuzz in the measurement system.

Ok, I started building something like that, and immediately started
getting ludicrous results. Googling for Java benchmarks turned up
exactly *one* such tool (which presumably works on any runnable or
callable, but fails on Clojure functions) amongst the sets of
benchmark code (most of which didn't talk about how they measured the
time).

First thing: a timer that returns the time so I can work with it:

(defmacro nano-timer
  "Time body, returning nanoseconds and result"
  [body]
  `(do
     (System/runFinalization)
     (System/gc)
     (let [start# (System/nanoTime)
           res# ~body]
       [(- (System/nanoTime) start#) res#])))

Second thing: run the benchmark many times, subtracting out the loop
overhead:

(defmacro loop-timer
  "Time code in a loop, returning nanoseconds taken minus loop overhead"
  [code count]
  `(let [overhead# (first (nano-timer (dotimes [_# ~count])))
         mytime#   (first (nano-timer (dotimes [_# ~count] ~code)))]
     (- mytime# overhead#)))


Which has the annoying habit of returning negative numbers :-(. I'd
really like to know how the loop that calculates overhead takes more
time than the loop that actually runs the code, no matter how trivial
the code is!

Is anyone using anything more sophisticated than clojure.core/time for
benchmarking clojure code?

<mike
--
Mike Meyer <m...@mired.org> http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

Peter Schuller

unread,
Jul 1, 2010, 1:51:06 PM7/1/10
to clo...@googlegroups.com
> Is anyone using anything more sophisticated than clojure.core/time for
> benchmarking clojure code?

No, but last time I thought about this I figured a very simple
(benchmark ...) would simply:

* Iterate with exponentially higher repeat counts until total runtime
reaches >= 1 second (say).
* Then, iterate with the same count until the timing over some period,
say 10 seconds, has a sufficiently low standard deviation.
* Then, take that repeat count and re-run enough to reach, say, 10
seconds, of time.

Obviously the 1, 10 and std dev thesholds would be selectable, with
some default. So that in the normal case for micro-benchmarking, you'd
just do:

(benchmark (run-my-stuff))

And it would usually "just work" even if run-my-stuff is a trivial
function that completes really really quickly (due to the exponential
growth).
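
To make that concrete, here is a rough, untested sketch of just the first step (the exponential calibration); the names and the default threshold are made up:

(defn- time-batch
  "Nanoseconds taken to call thunk f n times."
  [f n]
  (let [start (System/nanoTime)]
    (dotimes [_ n] (f))
    (- (System/nanoTime) start)))

(defn calibrate-and-time
  "Grow the repeat count exponentially until one batch takes at least
  min-time-ns, then return [repeat-count total-ns ns-per-call]."
  ([f] (calibrate-and-time f 1000000000))   ; default: 1 second
  ([f min-time-ns]
   (loop [n 1]
     (let [t (time-batch f n)]
       (if (>= t min-time-ns)
         [n t (double (/ t n))]
         (recur (* 2 n)))))))

;; usage: (calibrate-and-time #(reduce + (range 1000)))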

Thoughts? It's obviously not meant for really serious benchmarking,
but I think it would be a good thing to have in cases where one might
now otherwise use (time ..).

--
/ Peter Schuller

Mike Meyer

unread,
Jul 1, 2010, 1:59:19 PM7/1/10
to clo...@googlegroups.com, peter.s...@infidyne.com

This is pretty much where I was heading when I started writing those
two macros. Wanting to be able to time very short runtimes without
timing the loop, I wrote loop-timer to deal with loop overhead. Only
to have it return negative values in the process of trying to get a
reasonable total run time. This makes that process, um, interesting.

Hugo Duncan

unread,
Jul 1, 2010, 3:21:03 PM7/1/10
to clo...@googlegroups.com
On Thu, 01 Jul 2010 13:44:25 -0400, Mike Meyer
<mwm-keyword-goo...@mired.org> wrote:

> Is anyone using anything more sophisticated than clojure.core/time for
> benchmarking clojure code?

I wrote a benchmarking lib at http://github.com/hugoduncan/criterium

The references in the README are worth checking.

You might also find Lau's blog post of interest:
http://www.bestinclass.dk/index.clj/2010/03/functional-fluid-dynamics-in-clojure.html

--
Hugo Duncan

Heinz N. Gies

unread,
Jul 1, 2010, 3:27:16 PM7/1/10
to clo...@googlegroups.com

On Jul 1, 2010, at 21:21 , Hugo Duncan wrote:

> On Thu, 01 Jul 2010 13:44:25 -0400, Mike Meyer <mwm-keyword-goo...@mired.org> wrote:
>
>> Is anyone using anything more sophisticated than clojure.core/time for
>> benchmarking clojure code?
>
> I wrote a benchmarking lib at http://github.com/hugoduncan/criterium

Actually, the initial thing I started did not even use time; I used a very flawed method based on 'time java -cp ...'. But I did this on purpose, since my goal was to benchmark Clojure in comparison to Scala (or other languages), where neither (time) nor Criterium is usable, since they are language specific.

Regards,
Heinz

Mike Meyer

unread,
Jul 1, 2010, 3:51:28 PM7/1/10
to clo...@googlegroups.com, hugod...@users.sourceforge.net
On Thu, 01 Jul 2010 15:21:03 -0400
"Hugo Duncan" <hugod...@users.sourceforge.net> wrote:

> On Thu, 01 Jul 2010 13:44:25 -0400, Mike Meyer
> <mwm-keyword-goo...@mired.org> wrote:
>
> > Is anyone using anything more sophisticated than clojure.core/time for
> > benchmarking clojure code?
>
> I wrote a benchmarking lib at http://github.com/hugoduncan/criterium
>
> The references in the README are worth checking.

I'd seen the Elliptic group posting.

This looks nice, but doesn't work with 1.1 :-(. Do you know the last
commit that did?

Better yet, can I talk you into posting a 1.1 jar file to the
downloads area, maybe along with a 1.2 RC, so users who aren't
comfortable with the Java infrastructure
(http://www.mired.org/home/mwm/papers/simple-clojure.html) can play
along?

Thanks,

j-g-faustus

unread,
Jul 1, 2010, 2:27:09 PM7/1/10
to Clojure
On Jul 1, 7:51 pm, Peter Schuller <peter.schul...@infidyne.com> wrote:
> > Is anyone using anything more sophisticated than clojure.core/time for
> > benchmarking clojure code?

Criterium, a benchmarking library for Clojure, seems pretty good:
http://github.com/hugoduncan/criterium

Based on ideas in this article:
http://www.ibm.com/developerworks/java/library/j-benchmark1.html

The stackoverflow question where I found it, thanks to Michal Marczyk:
http://stackoverflow.com/questions/3041299/how-to-benchmark-functions-in-clojure


Getting *accurate* results can be hard, even with a benchmarking
library. Criterium runs the code 60 times and does statistical
analysis on the results, but I can still get variations above +/-10%
from run to run in the REPL.

I think benchmarking works best when
* starting a new run each time - i.e. from the command line, a "clean
slate" JVM
* having something that runs long enough to stabilize the JVM -
Criterium wants 1 min or more total runtime.
* running it more than once and checking that results are tolerably
consistent
* looking for differences in orders of magnitude rather than a couple
of percent more or less.

I suppose the good news is that differences in orders of magnitude (1x
vs 10x vs 100x) tend to stand out and do not need anything
particularly scientific apart from running it long enough a few times.

Mike Meyer

unread,
Jul 1, 2010, 9:44:54 PM7/1/10
to clo...@googlegroups.com, johannes...@gmail.com
On Thu, 1 Jul 2010 11:27:09 -0700 (PDT)
j-g-faustus <johannes...@gmail.com> wrote:
> On Jul 1, 7:51 pm, Peter Schuller <peter.schul...@infidyne.com> wrote:
> > > Is anyone using anything more sophisticated than clojure.core/time for
> > > benchmarking clojure code?
> Criterium, a benchmarking library for Clojure, seems pretty good:
> http://github.com/hugoduncan/criterium

The author responded here.

> Based on ideas in this article:
> http://www.ibm.com/developerworks/java/library/j-benchmark1.html
>
> The stackoverflow question where I found it, thanks to Michal Marczyk:
> http://stackoverflow.com/questions/3041299/how-to-benchmark-functions-in-clojure
>
>
> Getting *accurate* results can be hard, even with a benchmarking
> library. Criterium runs the code 60 times and does statistical
> analysis on the results, but I can still get variations above +/-10%
> from run to run in the REPL.
>
> I think benchmarking works best when
> * starting a new run each time - i.e. from the command line, a "clean
> slate" JVM
> * having something that runs long enough to stabilize the JVM -
> Criterium wants 1 min or more total runtime.
> * running it more than once and checking that results are tolerably
> consistent
> * looking for differences in orders of magnitude rather than a couple
> of percent more or less.

Criterium (and the Java code based on the same ideas) has what looks
like a serious problem for microbenchmarking:

After figuring out how many times to execute the body to make it last
a minute, it times the execution of the loop that does this. Which
means the reported time includes loop overhead.

When I tried the obvious solution (time an empty loop and subtract
that value from the original) on top of time, it was sometimes
generating negative values, so the empty loop is taking more time than
the loop being timed. That's a pretty solid indication that timing the
target code was lost in the noise of timing the loop overhead.

Possibly applying this fix alongside all of Criterium's code to stabilize the
JVM would help with the problem. I'd fix it (pretty easy) and try it
myself, but there's no 1.1 jar and the current sources require 1.2.

If nothing else adding code to measure the empty loop and punting if
the difference between that and the code loop is statistically
insignificant would seem like a good idea.

Hugo Duncan

unread,
Jul 1, 2010, 10:42:05 PM7/1/10
to clo...@googlegroups.com
On Thu, 01 Jul 2010 15:51:28 -0400, Mike Meyer
<mwm-keyword-goo...@mired.org> wrote:

> This looks nice, but doesn't work with 1.1 :-(. Do you know the last
> commit that did?

I'm not sure that I would be too confident on the correctness of any
version that ran on 1.1.

> Better yet, can I talk you into posting a 1.1 jar file to the
> downloads area, maybe along with a 1.2 RC, so users who aren't
> comfortable with the Java infrastructure
> (http://www.mired.org/home/mwm/papers/simple-clojure.html) can play
> along?

I would need to take the time to backport the lib from its existing
state. As far as I remember there is only very light defrecord usage to
contend with. I'm afraid it is not high priority for me at the moment.

I would also wonder how significant the loop overhead is, especially for
comparing implementations of the same function, which would then have
similar loop overhead. It is interesting to time Thread/sleep calls with
different sleep periods (not that sleep is going to be accurate itself,
but a regression of loop count versus Criterium's measured time and the
sleep time might be interesting).

Another issue with timing fast, simple functions is that the JIT can optimise
a calculation to a constant if it is simple enough.

My major concern with Criterium would rather be on the percentage garbage
collection time reported. There still needs to be work done on removing
the allocation of result structures from the body of the timed results. I
would expect that to have a bigger impact than the loop overhead.

Hugo


--
Hugo Duncan

Aaron Cohen

unread,
Jul 1, 2010, 11:39:16 PM7/1/10
to clo...@googlegroups.com

If nothing else adding code to measure the empty loop and punting if
the difference between that and the code loop is statistically
insignificant would seem like a good idea.


It's actually notoriously hard to time the "empty loop" on the JVM. Once you've iterated a few thousand times, the JIT will kick in and recognize the loop does no work (or possibly even find a closed form solution for the loop if there are simple math ops within it) and remove it entirely.

Cliff Click devotes entire talks to benchmarking in Java, how hard it is to do correctly, and the common pitfalls: http://www.azulsystems.com/events/javaone_2009/session/2009_J1_Benchmark.pdf

Mike Meyer

unread,
Jul 2, 2010, 12:56:53 AM7/2/10
to clo...@googlegroups.com, aa...@assonance.org
On Thu, 1 Jul 2010 23:39:16 -0400
Aaron Cohen <aa...@assonance.org> wrote:

> > If nothing else adding code to measure the empty loop and punting if
> > the difference between that and the code loop is statistically
> > insignificant would seem like a good idea.
> It's actually notoriously hard to time the "empty loop" on the JVM. Once
> you've iterated a few thousand times, the JIT will kick in and recognize the
> loop does no work (or possibly even find a closed form solution for the loop
> if there are simple math ops within it) and remove it entirely.

If it tosses out your empty loop, then you're back where you started -
timing the loop as well as the operation.

If it finds the closed form solution for the real loop - that would
explain the results I'm seeing.

> Cliff Click devotes entire talks to benchmarking in Java, how hard it is to
> do correctly, and the common pitfalls:
> http://www.azulsystems.com/events/javaone_2009/session/2009_J1_Benchmark.pdf

()#*$Q#% PPT. But I think some of the ideas can be incorporated....

Thanks,

Peter Schuller

unread,
Jul 2, 2010, 2:48:03 AM7/2/10
to clo...@googlegroups.com, aa...@assonance.org
With regards to benchmarking that accurately discounts loop overhead:
it seems to me that even if you apply elaborate logic, if the thing
you're benchmarking is so small that looping overhead becomes
significant, you risk making the benchmark subject to subtle
variations anyway.
--
/ Peter Schuller

Greg

unread,
Jul 1, 2010, 10:19:56 PM7/1/10
to clo...@googlegroups.com
I don't see how the loop is relevant here; at least if the same benchmarking function is used for all the benchmarks you're doing, it shouldn't make a difference, since the overhead is the same.

j-g-faustus

unread,
Jul 2, 2010, 8:44:50 AM7/2/10
to Clojure
On Jul 2, 3:44 am, Mike Meyer <mwm-keyword-googlegroups.
620...@mired.org> wrote:
> On Thu, 1 Jul 2010 11:27:09 -0700 (PDT)
> j-g-faustus <johannes.fries...@gmail.com> wrote:
> > Criterium, a benchmarking library for Clojure, seems pretty good:
>
> The author responded here.
>

I noticed, my reply was sent an hour earlier. I'm still on moderation,
so my mails are 1-12 hours delayed.
Sorry.

Mike Meyer

unread,
Jul 2, 2010, 1:41:17 PM7/2/10
to clo...@googlegroups.com, gr...@kinostudios.com
On Thu, 1 Jul 2010 22:19:56 -0400
Greg <gr...@kinostudios.com> wrote:

> I don't see how the loop is relevant here, at least if the same benchmarking function is used for all the benchmarks you're doing, it should make a difference then since the overhead is the same.

It depends on what you're benchmarking. If the loop time is much
smaller than either the actual code time or the standard deviation in
the measurements, then it's just noise, and you can ignore it. If it's
on the order of the same size as the standard deviation, then it can
fool you into falsely concluding that there's no statistically
significant difference between two algorithms. Once it gets to be
around half the total benchmark time, your results are pretty much
worthless.

j-g-faustus

unread,
Jul 2, 2010, 3:27:43 PM7/2/10
to Clojure
On Jul 2, 7:41 pm, Mike Meyer <mwm-keyword-googlegroups.
620...@mired.org> wrote:
> On Thu, 1 Jul 2010 22:19:56 -0400
> It depends on what you're benchmarking. If the loop time ... is
> on the order of the same size as the standard deviation, then it can
> fool you into falsely concluding that there's no statistically
> significant difference between two algorithms. Once it gets to be
> around half the total benchmark time, your results are pretty much
> worthless.

I'm not a JVM benchmarking expert by any means, but I have been
working with Java for close to a decade,
and to the best of my knowledge that level of micro-benchmarking
precision is not possible, for all the reasons listed and linked to -
dynamic HotSpot optimizations, unpredictable garbage collection,
limited timing resolution etc.

For loops that are part of the program, like iterating over all the
pixels in an image to compute a transform, it's a non-issue. Those
loops are part of the algorithm, and if transform A takes the same
time as transform B because both of them are dominated by the time it
takes to loop through every pixel, you can correctly conclude that the
difference between A and B does not matter.

But the benchmarking loop time and other administrative overhead
(including the extra overhead on garbage collection) needs to be an
insignificant fraction of the total, yes.
If the benchmarking framework does more work than the code being
measured, you will be benchmarking the framework rather than the code
you want to measure.

I normally deal with it by putting larger chunks of code inside the
benchmarking loop, aiming for at least a second of runtime per
invocation.

Even then, the PDF from Azul systems linked to earlier says that
"performance varies from run to run - stable numbers within a single
JVM invocation, but could vary > 20% with new JVM invocation", which
matches well with my experience.

In practice, I have given up on timing very small blocks of code, and
I have given up on trying for precision better than +/-10%.

Your mileage may vary.


jf




