Clojure generates unnecessary and slow type-checks


Leon Barrett

Jun 13, 2013, 3:02:56 PM
to clo...@googlegroups.com
Hi. I've been working with people at Prismatic to optimize some simple math code in Clojure. However, it seems that Clojure generates an unnecessary type check that slows our (otherwise-optimized) code by 50%. Is there a good way to avoid this, is it a bug in Clojure 1.5.1, or something else? What should I do to work around this?

Here's my example. The aget seems to generate an unnecessary checkcast bytecode. I used Jasper and Jasmin to decompile Bar.class and reassemble it as Bar_EDITED.class without that instruction. The edited version takes about 2/3 the time.

(ns demo
  (:import demo.Bar_EDITED))

(definterface Foo
  (arraysum ^double [^doubles a ^int i ^int asize ^double sum]))

(deftype Bar []
  Foo
  (arraysum ^double [this ^doubles a ^int i ^int asize ^double sum]
    (if (< i asize)
      (recur a (unchecked-inc-int i) asize (+ sum (aget a i)))
      sum)))

(defn -main [& args]
  (let [bar (Bar.)
        bar-edited (Bar_EDITED.)
        asize 10000
        a (double-array asize)
        i 0
        ntimes 10000]
    (time
      (dotimes [iter ntimes]
        (.arraysum bar a i asize 0)))
    (time
      (dotimes [iter ntimes]
        (.arraysum bar-edited a i asize 0)))))

;; $ lein2 run -m demo
;; Compiling demo
;; "Elapsed time: 191.015885 msecs"
;; "Elapsed time: 129.332 msecs"


Here's the bytecode for Bar.arraysum:

  public java.lang.Object arraysum(double[], int, int, double);
    Code:
       0: iload_2       
       1: i2l           
       2: iload_3       
       3: i2l           
       4: lcmp          
       5: ifge          39
       8: aload_1       
       9: iload_2       
      10: iconst_1      
      11: iadd          
      12: iload_3       
      13: dload         4
      15: aload_1       
      16: aconst_null   
      17: astore_1      
      18: checkcast     #60                 // class "[D"
      21: iload_2       
      22: invokestatic  #64                 // Method clojure/lang/RT.intCast:(I)I
      25: daload        
      26: dadd          
      27: dstore        4
      29: istore_3      
      30: istore_2      
      31: astore_1      
      32: goto          0
      35: goto          44
      38: pop           
      39: dload         4
      41: invokestatic  #70                 // Method java/lang/Double.valueOf:(D)Ljava/lang/Double;
      44: areturn       

As far as I can tell, Clojure generated a checkcast opcode that runs on every loop iteration to make sure the double array is really a double array. When I remove that checkcast, the loop runs in about 2/3 the time (meaning the cast is a 50% overhead).

Can someone help me figure out how to avoid this overhead?

Thanks.

- Leon Barrett

Jason Wolfe

Jun 13, 2013, 4:50:48 PM
to clo...@googlegroups.com
Taking a step back, the core problem we're trying to solve is just to sum an array's values as quickly as in Java. (We really want a fancier macro that allows arbitrary per-element computations beyond summing -- computations that can't be achieved by just calling into Java -- but this simpler task gets at the crux of our performance issues.)

This Java code:

  public static double asum_noop_indexed(double[] arr) {
    double s = 0;
    for (int i = 0; i < arr.length; i++) {
      s += arr[i];
    }
    return s;
  }

can run on an array with 10k elements in about 8 microseconds. In contrast, this Clojure code (which I believe used to be as fast as the Java in a previous Clojure version):

(defn asum-identity [^doubles a]
  (let [len (long (alength a))]
    (loop [sum 0.0
           idx 0]
      (if (< idx len)
        (let [ai (aget a idx)]
          (recur (+ sum ai) (unchecked-inc idx)))
        sum))))

executes on the same array in about 40 microseconds normally, or 14 microseconds with *unchecked-math* set to true.  (We weren't using unchecked-math properly until today, which is why we were doing the hacky interface stuff above -- please disregard that. But I think the core point about the extra cast is still correct.)
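
For concreteness, the unchecked variant looks like this (a sketch; note that *unchecked-math* is consulted when the defn is compiled, not when it runs):

;; sketch: set *unchecked-math* before the defn is compiled
(set! *unchecked-math* true)

(defn asum-unchecked [^doubles a]
  (let [len (alength a)]
    (loop [sum 0.0
           idx 0]
      (if (< idx len)
        (recur (+ sum (aget a idx)) (inc idx)) ; inc compiles to an unchecked add here
        sum))))

(set! *unchecked-math* false)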

For reference, (areduce a1 i r 0.0 (+ (aget a1 i) r)) takes about 23 ms to do the same computation (with unchecked-math true).
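
(That areduce call wrapped as a function for clarity -- the ^doubles hint is assumed so aget compiles to a primitive array load:)

(defn asum-areduce [^doubles a1]
  ;; same computation as the inline areduce above
  (areduce a1 i r 0.0 (+ (aget a1 i) r)))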

Does anyone have ideas for how to achieve parity with Java on this task?  They'd be much appreciated! 

Thanks, Jason

Glen Mailer

Jun 14, 2013, 3:04:12 AM
to clo...@googlegroups.com
This doesn't really answer your question directly, but is there a reason you need to keep this in Clojure, or are you just aiming to establish why this is happening?

My understanding was that for performance-critical code the general advice is to drop down to raw Java?

Glen

Jason Wolfe

Jun 14, 2013, 3:11:52 AM
to clo...@googlegroups.com
Thanks for your response.  I attempted to answer this in my clarification, but our goal is to attack this 'general advice' and make it possible to get the same speed for array handling in natural-seeming Clojure without writing Java.  In particular, we want to create macros that make it easy to achieve maximum performance by putting *your code* for manipulating array elements in the middle of an optimized loop, and this can't be done easily at the library level (as far as I can see) by dropping to Java, since in Java your code would always have to be wrapped in a method invocation with corresponding performance implications.  
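
For example, the kind of macro we have in mind looks roughly like this (a hypothetical sketch, not flop's actual API):

(defmacro asum-expr
  "Hypothetical sketch: binds x to each element of double array arr
   and sums expr over it, with the user's code inlined in the loop."
  [[x arr] expr]
  `(let [a# (doubles ~arr)
         len# (alength a#)]
     (loop [i# 0, sum# 0.0]
       (if (< i# len#)
         (let [~x (aget a# i#)]
           (recur (unchecked-inc i#) (+ sum# ~expr)))
         sum#))))

;; (asum-expr [x a] x)       ; plain sum
;; (asum-expr [x a] (* x x)) ; sum of squares, no per-element call overhead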

Our previous version of this library (developed for Clojure 1.2, IIRC) was able to get within 0-30% or so of raw Java speed while providing a clean Clojure interface, and we're trying to get back to this point with Clojure 1.5 so we can release it as open-source for everyone to use.

-Jason

David Nolen

Jun 14, 2013, 3:41:07 AM
to clojure
Maybe this is an unintended side effect of the changes around unboxed support in loop/recur?



Mikera

Jun 14, 2013, 5:58:18 AM
to clo...@googlegroups.com
Hi Jason,

Have you guys taken a look at core.matrix for any of this stuff? We're also shooting for near-Java-parity for all of the core operations on large double arrays. 

(use 'clojure.core.matrix)
(require '[criterium.core :as c])

(let [a (double-array (range 10000))]
  (c/quick-bench (esum a)))

WARNING: Final GC required 69.30384798936066 % of runtime
Evaluation count : 45924 in 6 samples of 7654 calls.
             Execution time mean : 12.967112 µs
    Execution time std-deviation : 326.480900 ns
   Execution time lower quantile : 12.629252 µs ( 2.5%)
   Execution time upper quantile : 13.348527 µs (97.5%)
                   Overhead used : 3.622005 ns

All the core.matrix functions get dispatched via protocols, so they work on any kind of multi-dimensional matrix (not just Java arrays). This adds a tiny amount of overhead (about 10-15ns), but it is negligible when dealing with medium-to-large vectors/matrices/arrays.
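
(The dispatch pattern itself is plain protocol extension onto the array class -- a simplified sketch, not core.matrix's actual protocol names:)

;; simplified sketch, not core.matrix's real protocols
(defprotocol PSummable
  (esum* [m]))

(extend (Class/forName "[D") ; the class of double[]
  PSummable
  {:esum* (fn [m]
            (let [a (doubles m)
                  n (alength a)]
              (loop [i 0, s 0.0]
                (if (< i n)
                  (recur (unchecked-inc i) (+ s (aget a i)))
                  s))))})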

I'm interested in feedback and hopefully we can collaborate: I'm keen to get the best optimised numerical functions we can in Clojure. Also, I think you may find the core.matrix facilities very helpful when moving to higher-level abstractions (e.g. 2D matrices and higher-order multi-dimensional arrays).

Jason Wolfe

Jun 14, 2013, 1:15:34 PM
to clo...@googlegroups.com
Hey Mikera,

I did look at core.matrix a while ago, but I'll take another look.

Right now, flop is just trying to make it easy to write *arbitrary*
array operations compactly, while minimizing the chance of getting
worse-than-Java performance. This used to be very tricky to get right
when flop was developed (against Clojure 1.2); the situation has
clearly improved since then, but there still seem to be some
subtleties in going fast with arrays in 1.5.1 that we are trying to
understand and then automate.

As I understand it, core.matrix has a much more ambitious goal of
abstracting over all matrix types. This is a great goal, but I'm not
sure if the protocol-based implementation can give users any help
writing new core operations efficiently (say, making a new array with
c[i] = a[i] + b[i]^2 / 2) -- unless there's some clever way of
combining protocols with macros (hmmm).

I just benchmarked core.matrix/esum, and on my machine in Clojure
1.5.1 it's 2.69x slower than the Java version above, and 1.62x slower
than our current best Clojure version.

A collaboration sounds interesting -- although right now it seems like
our goals are somewhat orthogonal. What do you think?

-Jason

Mikhail Kryshen

Jun 14, 2013, 8:46:23 PM
to clo...@googlegroups.com
The JIT will probably remove unnecessary checkcast instructions. What looks
suspicious to me is that i and asize are converted to longs (notice the i2l
opcodes). I noticed earlier that loops with long counters are measurably
slower for the same number of iterations (probably HotSpot does not
apply some optimizations in this case).


--
Mikhail

Mikera

Jun 15, 2013, 7:37:06 AM
to clo...@googlegroups.com
On Friday, 14 June 2013 18:15:34 UTC+1, Jason Wolfe wrote:
> As I understand it, core.matrix has a much more ambitious goal of
> abstracting over all matrix types. This is a great goal, but I'm not
> sure if the protocol-based implementation can give users any help
> writing new core operations efficiently (say, making a new array with
> c[i] = a[i] + b[i]^2 / 2) -- unless there's some clever way of
> combining protocols with macros (hmmm).

A longer-term objective for core.matrix could be to allow compiling such expressions. Our GSoC student Maik Schünemann is exploring how to represent and optimise mathematical expressions in Clojure, and in theory these could be used to compile down to efficient low-level operations. The API could look something like this:

;; define an expression
(def my-expression (expression [a b] (+ a (/ (* b b) 2))))

;; compile the expression for the specified matrix implementation A
(def func (compile-expression A my-expression))

;; now computation can be run using the pre-compiled, optimised function
(func A B)

In the case that A is a Java double array, perhaps the flop macros could be the engine behind generating the compiled function?
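
If it did, the compiled function for double arrays might bottom out in a loop like this (purely illustrative; computes c[i] = a[i] + b[i]^2 / 2 into a fresh array):

;; purely illustrative sketch of what compiled output could look like
(defn compiled-expr ^doubles [^doubles a ^doubles b]
  (let [n (alength a)
        c (double-array n)]
    (loop [i 0]
      (when (< i n)
        (aset c i (+ (aget a i) (/ (* (aget b i) (aget b i)) 2.0)))
        (recur (unchecked-inc i))))
    c))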

 
> I just benchmarked core.matrix/esum, and on my machine in Clojure
> 1.5.1 it's 2.69x slower than the Java version above, and 1.62x slower
> than our current best Clojure version.

Great - happy to steal your implementation :-) 

Other core.matrix implementations are probably faster BTW: vectorz-clj is pure Java and has esum for the general-purpose Vector type implemented in exactly the same way as your fast Java example. Clatrix executes a lot of operations via native code using BLAS.

 
> A collaboration sounds interesting -- although right now it seems like
> our goals are somewhat orthogonal. What do you think?

Seems like the goals are relatively complementary. I think the areas of potential collaboration are of three kinds:
1) Leveraging flop to get the fastest possible implementations for core.matrix
2) Using core.matrix to provide higher-level abstractions / operations on top of basic arrays
3) Sharing tools / standards / approaches where it makes sense

Clearly a significant area of overlap is that we both want the fastest possible operations on double arrays. I'd be delighted if we could leverage flop to get the best possible implementations for core.matrix. 

Another potential area of collaboration is the planned NDArray implementation. This will be a pure Clojure, NumPy-style, efficient N-dimensional array implementation. Under the hood, this will need to run on Java arrays. Dmitry Groshev is our GSoC student who will be working on this.


Jason Wolfe

Jun 16, 2013, 4:46:09 PM
to clo...@googlegroups.com
On Fri, Jun 14, 2013 at 5:46 PM, Mikhail Kryshen <mik...@kryshen.net> wrote:
> JIT will probably remove unnecessary checkcast instructions. What looks
> suspicious to me is that i and asize are converted to longs (notice i2l
> opcodes). I noticed earlier that loops with long counters are measurably
> slower for the same number of iterations (probably, HotSpot does not
> apply some optimizations in this case).

That was my impression as well, but in this case it doesn't seem to
happen, perhaps because the double array is reloaded every time within
the loop so the JIT is not able to determine that the cast is
redundant. Leon's experiments above seem to confirm this, because
just removing the cast instruction is sufficient to achieve parity
with Java performance.

Jason Wolfe

Jun 16, 2013, 4:53:59 PM
to clo...@googlegroups.com
Awesome, that sounds like a very interesting and promising approach.

>>
>> I just benchmarked core.matrix/esum, and on my machine in Clojure
>> 1.5.1 it's 2.69x slower than the Java version above, and 1.62x slower
>> than our current best Clojure version.
>
>
> Great - happy to steal your implementation :-)
>
> Other core.matrix implementations are probably faster BTW: vectorz-clj is
> pure Java and has esum for the general-purpose Vector type implemented in
> exactly the same way as your fast Java example. Clatrix executes a lot of
> operations via native code using BLAS.

Yes, that part of core.matrix is great, and we'll definitely look into
using it.

>>
>> A collaboration sounds interesting -- although right now it seems like
>> our goals are somewhat orthogonal. What do you think?
>
>
> Seems like the goals are relatively complementary. I think the areas of
> potential collaboration are of three kinds:
> 1) Leveraging flop to get the fastest possible implementations for
> core.matrix
> 2) Using core.matrix to provide higher-level abstractions / operations on
> top of basic arrays
> 3) Sharing tools / standards / approaches where it makes sense
>
> Clearly a significant area of overlap is that we both want the fastest
> possible operations on double arrays. I'd be delighted if we could leverage
> flop to get the best possible implementations for core.matrix.
>
> Another potential area of collaboration is the planned NDArray
> implementation. This will be a pure Clojure, NumPy-style, efficient
> N-dimensional array implementation. Under the hood, this will need to run on
> Java arrays. Dmitry Groshev is our GSoC student who will be working on this.

These all sound like great directions to pursue. We'll be fleshing
out our API for double arrays in the next week or so, and I'll invite
you to check out what we've got once things are a bit more concrete --
we'd love comments at that point, and we can figure out next steps for
collaboration then as well.

Cheers, Jason

Rich Hickey

Jun 18, 2013, 9:36:33 AM
to clo...@googlegroups.com

Could you please try your tests against master?

Here, I went from 145ms to 85ms going from 1.5.1 to 1.6.0 master.

If it's the same for you, someone can git bisect and figure out what's up.

Thanks,

Rich

Stuart Sierra

Jun 18, 2013, 2:12:28 PM
to clo...@googlegroups.com
One has to be very careful with this kind of
micro-benchmarking on the JVM. Dead-code elimination can
easily make something seem fast simply because it's not
doing anything. For example, in Java:
https://gist.github.com/stuartsierra/5807356

Being careful not to have dead code, I get about the same
results: 9 microseconds to sum an array of 10000 doubles in
Java.

Getting back to Clojure, I notice a large difference between
Clojure 1.2.1 and 1.5.1 to sum an array. But looking more
closely, that difference is entirely due to dead-code
elimination:
https://gist.github.com/stuartsierra/5807676

With a loop doing real work, my tests show Clojure's
performance on this task is virtually indistinguishable from
Java. The performance hasn't changed since 1.2, although the
syntax has. (I get the same result with `areduce`.)
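
(The usual trick is to make the benchmark consume and return every result, along these lines -- a sketch, reusing asum-identity from earlier in the thread:)

;; sketch: accumulate every result so the JIT cannot prove the work dead
(defn bench-asum [^doubles a ^long reps]
  (loop [r 0, acc 0.0]
    (if (< r reps)
      (recur (inc r) (+ acc (asum-identity a))) ; consume each result
      acc))) ; returning acc keeps the summing loop live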

Also beware of this "helpful" Leiningen 2.1+ feature: it
disables some JVM optimizations to get faster start-up time.
https://github.com/technomancy/leiningen/wiki/Faster#tiered-compilation

To avoid this, you need to add the following to your
`project.clj` file:

    :jvm-opts ^:replace []

The `^:replace` is critical.
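
In context, a minimal project.clj looks like this (the project name and version are placeholders; Leiningen's default here includes -XX:TieredStopAtLevel=1, which caps the JIT at the fast-but-simple C1 compiler):

;; project name and version are placeholders
(defproject bench-demo "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.5.1"]]
  :jvm-opts ^:replace []) ; ^:replace discards Leiningen's defaults instead of merging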

Whenever I'm uncertain about a benchmark, I run it without
Leiningen to make sure I know what's going on.

-S

Jason Wolfe

Jun 18, 2013, 2:29:00 PM
to clo...@googlegroups.com
Hi Rich, 

I don't see any difference between 1.5.1 and 1.6.0 master.  The good news is that Stuart Sierra nailed the problem above -- I had no idea that Leiningen messed with jvm-opts in the host process, and that seems to have been solely responsible for the performance issues (so we're at Java parity in both 1.5.1 and 1.6.0).  Sorry for the misdirected attribution, and thanks for looking into this -- we really appreciate everyone's help!

-Jason

Leon Barrett

Jun 18, 2013, 2:28:35 PM
to clo...@googlegroups.com
Holy crap, the ^:replace seems to improve my example to the performance of Java! I'm looking further, but that may be the key.



Stuart Sierra

Jun 18, 2013, 2:30:54 PM
to clo...@googlegroups.com
For the record, it was Rich who discovered, and told me, that Leiningen was adding extra JVM args. :)
-S

Jason Wolfe

Jun 18, 2013, 2:33:12 PM
to clo...@googlegroups.com
Hi Stuart,



On Tuesday, June 18, 2013 11:12:28 AM UTC-7, Stuart Sierra wrote:
> One has to be very careful with this kind of
> micro-benchmarking on the JVM. Dead-code elimination can
> easily make something seem fast simply because it's not
> doing anything. [...]
> With a loop doing real work, my tests show Clojure's
> performance on this task is virtually indistinguishable from
> Java. The performance hasn't changed since 1.2, although the
> syntax has. (I get the same result with `areduce`.)

We thought we were being very careful -- taking into account warmup, dead code elimination, and so on (our actual benchmark suite runs the code many times and reports the sum of the results).

 
> Also beware of this "helpful" Leiningen 2.1+ feature: it
> disables some JVM optimizations to get faster start-up time.
> https://github.com/technomancy/leiningen/wiki/Faster#tiered-compilation
> [...]
> The `^:replace` is critical.

This was our issue.  We had no idea that Leiningen added extra stuff to the host jvm-opts -- thanks so much for your help with this!

-Jason

Stuart Sierra

Jun 18, 2013, 3:03:01 PM
to clo...@googlegroups.com
Great. Glad we found it!

Since this was so confusing, I've opened a pull request to make :jvm-opts in a project.clj override any Leiningen defaults:
https://github.com/technomancy/leiningen/pull/1230

-S

Jason Wolfe

Jun 18, 2013, 3:19:00 PM
to clo...@googlegroups.com
Yes, I agree that this is very confusing behavior -- I just chimed in
on the pull request. Thanks again for your help and this follow-up.

Stuart Sierra

Jun 19, 2013, 9:24:13 AM
to clo...@googlegroups.com
Jason Wolfe wrote:
> We thought we were being very careful

Sorry, didn't mean to imply that you weren't. ;)

It was me who wasn't careful: when I started investigating
this, I used a dead-code loop similar to the Gist I posted,
which made it look like Clojure 1.2 was much faster than
1.5. I guessed (incorrectly) that you had observed the same
effect for the same reasons.

-S


Jason Wolfe

Jun 20, 2013, 3:45:47 AM
to clo...@googlegroups.com
I should follow up on this and clarify that core.matrix's esum is in fact as fast as Java -- I apologize for the false statement (I was unaware that new versions of Leiningen disable advanced JIT optimizations by default, which led to the numbers I reported).

Nevertheless, I hope there may be room for interesting collaboration on more complex operations, or code gen as you mentioned.  I'll follow up later when we're a bit further along.

Mikera

Jun 28, 2013, 10:17:09 AM
to clo...@googlegroups.com
Great, thanks for confirming -- I was getting worried :-)

On the topic of code gen, we've been thinking a bit about how to represent expressions in the expresso project, and are developing a few potential use case API examples.


If anyone has any additional use cases to think about, then please throw them in!
 

Jason Wolfe

Jun 29, 2013, 5:01:32 PM
to clo...@googlegroups.com
This looks very interesting, thanks for sharing!  I'll think about it a bit while working on our codebase and see if I can contribute any good examples.


 


Dmitry Groshev

Jul 2, 2013, 11:02:53 AM
to clo...@googlegroups.com
Jason, can you please help me with this:

> I'm not sure if the protocol-based implementation can give users any help
> writing new core operations efficiently (say, making a new array with
> c[i] = a[i] + b[i]^2 / 2) -- unless there's some clever way of
> combining protocols with macros (hmmm).

Did you mean optimization of the whole expression (like in the expresso project), or optimization of the way the operation is executed on the actual matrix representation? The difference is subtle but important: the former can be done with some "abstract" matrix implementation, while the latter is a matter of a particular implementation. If I understand correctly, the expresso project is strictly about the former. I've elaborated on this in the numerical-clojure group: https://groups.google.com/forum/#!topic/numerical-clojure/eioNIa8oc7E (point 4).

Jim - FooBar();

Jul 2, 2013, 2:05:48 PM
to clo...@googlegroups.com
I'm really sorry for coming back to this, but even after everything we
learned I'm still not able to get performance equal to Java in a simple
factorial benchmark. I'd like to think that I'm doing all the correct
things to keep the comparison fair... observe this:

benchmarks.core=> (crit/bench (jf! 50))
WARNING: Final GC required 2.3432463351555 % of runtime
Evaluation count : 311580 in 60 samples of 5193 calls.
             Execution time mean : 196.444969 µs
    Execution time std-deviation : 10.637274 µs
   Execution time lower quantile : 194.356268 µs ( 2.5%)
   Execution time upper quantile : 197.042127 µs (97.5%)
                   Overhead used : 258.723396 ns

Found 9 outliers in 60 samples (15.0000 %)
  low-severe  2 (3.3333 %)
  low-mild    7 (11.6667 %)
 Variance from outliers : 40.1247 % Variance is moderately inflated by outliers
nil

now Java:

benchmarks.core=> (crit/bench (.factorial ^Benchmarks (Benchmarks.) 50))
WARNING: Final GC required 2.656271755497413 % of runtime
Evaluation count : 562260 in 60 samples of 9371 calls.
             Execution time mean : 107.148989 µs
    Execution time std-deviation : 1.650542 µs
   Execution time lower quantile : 106.504235 µs ( 2.5%)
   Execution time upper quantile : 108.934066 µs (97.5%)
                   Overhead used : 258.723396 ns

Found 5 outliers in 60 samples (8.3333 %)
  low-severe  1 (1.6667 %)
  low-mild    4 (6.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

Can you spot any differences in this code that would justify needing almost twice as much time?

(defn jf!
  "Calculate factorial of n as fast as Java without overflowing."
  [n]
  (loop [i (int n)
         ret 1N]
    (if (== 1 i)
      ret
      (recur (dec i) (* ret i)))))

now Java:

public BigInteger factorial(final int n) {
    BigInteger res = BigInteger.valueOf(1L); // build up result
    for (int i = n; i > 1; i--)
        res = res.multiply(BigInteger.valueOf(i));
    return res;
}

I know this is getting ridiculous, but I'm preparing a presentation and
I was sort of counting on this example... Of course, it goes without
saying that I'm using unchecked-math and :jvm-opts ^:replace [].


Am I doing something wrong?

thanks for your time

Jim

Leon Barrett

Jul 2, 2013, 2:11:19 PM
to clo...@googlegroups.com
Try longs instead of ints? Clojure doesn't support local ints, so you may be casting longs to ints a lot.



Jim - FooBar();

Jul 2, 2013, 2:21:49 PM
to clo...@googlegroups.com
With a long counter it takes slightly longer (216 microseconds on average instead of 196)!
Any other ideas?


Leon Barrett

Jul 2, 2013, 2:48:27 PM
to clo...@googlegroups.com
I got nothin'. Stare at the bytecode?

Jason Wolfe

Jul 2, 2013, 3:50:46 PM
to clo...@googlegroups.com
I guess this is not connected to our issue.  

You're using Clojure's BigInt, which is probably a bit slower than java.lang.BigInteger when you know the result will be big:

user> (class 1N)
clojure.lang.BigInt

Have you tried translating your Java code directly to see if that helps?
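
Something like this, perhaps (an untested sketch; java.math.BigInteger is imported by Clojure by default):

;; untested sketch: direct translation of the Java loop
(defn jf-biginteger [^long n]
  (loop [i n
         ret BigInteger/ONE]
    (if (<= i 1)
      ret
      (recur (dec i) (.multiply ret (BigInteger/valueOf i))))))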



Stuart Sierra

Jul 2, 2013, 8:33:33 PM
to clo...@googlegroups.com
Hi Jim,

I cannot reproduce your results. I see Clojure and Java with similar performance when they are both using java.lang.BigInteger. Clojure's arbitrary-precision integer defaults to clojure.lang.BigInt, which in my test is about 12% slower than java.lang.BigInteger.

See https://gist.github.com/stuartsierra/5914513 for my code and results. I did not use Criterium or Leiningen to run the benchmark.

-S

Jim

Jul 3, 2013, 5:32:45 AM
to clo...@googlegroups.com
After following Jason's suggestion to use BigInteger and .multiply instead of BigInt and *, I too am getting speed on par with Java (108-109 microseconds on my RPi). Therefore, I consider this issue closed... :)

@Stuart

thanks for running the benchmark yourself... maybe I could have saved you some time if I had posted my new results late last night... I was about to conclude that 'nothing beats a for-loop', but it seems that loop/recur is equally fast.

Jim