Achieving Java-level array performance in Clojure

263 views
Skip to first unread message

Jason Wolfe

unread,
Jun 16, 2013, 9:27:43 PM6/16/13
to cloju...@googlegroups.com, Leon Barrett
Hi all,

I wanted to call your attention to a thread over on the Clojure mailing list [1], and see if anyone here could help us out.

My post [2] summarizes the core issue: we can't figure out how to write a Clojure function that sums the values in a double array in less than 1.7x the time of the equivalent Java code.  We think we've tracked this down to a double-array typecast in the inner loop which is not optimized out by the JIT [3], and have not yet been able to find a workaround.  

In particular, we're wondering if anyone here would help provide us direction on:
1. how to achieve Java-level array manipulation performance with Clojure 1.5.1 as it stands, or 
2. if this is not possible, what changes to Clojure would be needed to make it possible.

Thanks for your help!

-Jason

Alan Dipert

unread,
Jun 17, 2013, 8:11:20 AM6/17/13
to cloju...@googlegroups.com, Leon Barrett
Hi,
It's not quite the answer you seek, but your message caused me to write https://github.com/tailrecursion/javastar and I thought it might interest you. 

"Java-level array manipulation performance with Clojure", kinda ;-)

Alan


--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure-dev...@googlegroups.com.
To post to this group, send email to cloju...@googlegroups.com.
Visit this group at http://groups.google.com/group/clojure-dev.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Jason Wolfe

unread,
Jun 17, 2013, 2:28:45 PM6/17/13
to cloju...@googlegroups.com, Leon Barrett
Thanks Alan! I'm not sure if this will help with this particular
quest, but looks like a really useful tool and I will definitely check
out out.

-Jason
> You received this message because you are subscribed to a topic in the
> Google Groups "Clojure Dev" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/clojure-dev/G9sdbSVC-_k/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to

David Nolen

unread,
Jun 17, 2013, 2:45:09 PM6/17/13
to cloju...@googlegroups.com
As I somewhat suggested on the main ML, look at the loop/recur emission code, you'll see that it emits casts - this seems OK for primitive numbers but probably not what you want for arrays.

Zach Tellman

unread,
Jun 17, 2013, 6:13:15 PM6/17/13
to cloju...@googlegroups.com, Leon Barrett
I was looking into this (and playing around with the 'no-disassemble' library, which I just discovered), and it looks like this checkcast behavior might be normal for anything which isn't a primitive type [1].  In the linked example, it looks like even if I hint a member variable with the type, it's still emitted as a java.lang.Object, and the type is rechecked on each invocation.

I suppose this makes sense in most contexts, since we rarely have guarantees that a non-primitive type is anything other than an Object.  However, when implementing an interface we do have this guarantee, and in the case of the deftype we could enforce this invariant in the constructor, rather than everywhere else.  I'm not sure how significant a change this would be, though.  Can anyone with more familiarity with the compiler weigh in?

Zach

Jason Wolfe

unread,
Jun 18, 2013, 1:33:02 AM6/18/13
to cloju...@googlegroups.com, Leon Barrett
Thanks Zach.

I've spent some time in the past looking at disassembler output, and unless anything has changed I think Clojure always takes this lazy approach to type casting for non-primitives and relies on the JIT to optimize out the unnecessary casts.  I haven't looked in the compiler, but I suspect this simplifies things by not having to worry about the type except when emitting method calls.

Typed sub-Object fields in types (and records) would allow us to work around the issue with the deftype solution, which is a still a bit awkward since we need a type for every use case (less awkward if it also worked with reify, but that seems like more of a stretch).  But I've also wished for typed fields outside of this context, so that we could do things like:

(defrecord Foo [^Map m])

(.put (.m (Foo. (HashMap.))) 1 2) ;; currently reflects, would not if m was Map

This is also true of Clojure functions.  Several members of our team have been confused by the fact that 

(defn foo [^Map m] (seq m)

emits no typecast, so (foo "test") returns '(\t \e \s \t) rather than throwing, and the flipside of this is that 

(defn foo [^Map m]
  (.put m 1 2)
  (.put m 3 4))

emits two typecasts.  Usually the JIT takes care of this I think, but for whatever reason this doesn't seem to be working in our examples, i.e.,:

(defn asum-identity [^doubles a]
  (let [len (long (alength a))]
    (loop [sum 0.0
           idx 0]
      (if (< idx len)
        (let [ai (aget a idx)]
          (recur (+ sum ai) (unchecked-inc idx)))
        sum))))

compiles to bytecode:

public java.lang.Object invoke(java.lang.Object);
  Code:
   0: aload_1
   1: checkcast #78; //class "[D"
   4: arraylength
   5: i2l
   6: lstore_2
   7: dconst_0
   8: dstore 4
   10: lconst_0
   11: lstore 6
   13: lload 6
   15: lload_2
   16: lcmp
   17: ifge 50
   20: aload_1
   21: checkcast #78; //class "[D"
   24: lload 6
   26: l2i
   27: daload
   28: dstore 8
   30: dload 4
   32: dload 8
   34: dadd
   35: lload 6
   37: lconst_1
   38: ladd
   39: lstore 6
   41: dstore 4
   43: goto 13
   46: goto 55
   49: pop
   50: dload 4
   52: invokestatic #42; //Method java/lang/Double.valueOf:(D)Ljava/lang/Double;
   55: areturn

I don't know enough about bytecode or the JIT to understand why the second cast isn't elided in this case.

In general I think we'd prefer the initial cast upon binding and no further casts for safety, clarity, and predictable performance, but I don't expect this is on the table.

-Jason 

Jason Wolfe

unread,
Jun 18, 2013, 2:47:26 AM6/18/13
to cloju...@googlegroups.com

On Monday, June 17, 2013 11:45:09 AM UTC-7, David Nolen wrote:
As I somewhat suggested on the main ML, look at the loop/recur emission code, you'll see that it emits casts - this seems OK for primitive numbers but probably not what you want for arrays.

Thanks, David.  By "not what you want for arrays" do you mean there's a way to work around this, or are you just summarizing the state of affairs?

For what it's worth the performance is the same whether you let the double array outside the loop or put it in the loop bindings and recur with it.

David Nolen

unread,
Jun 18, 2013, 8:06:10 AM6/18/13
to cloju...@googlegroups.com
Nevermind me, based on the earlier examples I thought perhaps the loop itself emits the cast, in the example you've shown here the array is not part of the loop but you are still getting a cast. I guess the thing I was curious about was figuring out what changed so that the additional cast appeared?

David

Rich Hickey

unread,
Jun 18, 2013, 9:40:00 AM6/18/13
to cloju...@googlegroups.com, Leon Barrett

I replied over there, so let's continue it there now please.

Thanks,

Rich

Rich Hickey

unread,
Jun 18, 2013, 9:41:09 AM6/18/13
to cloju...@googlegroups.com
Let's please not conflate these two issues. In the original report, all of the types involved were arguments, not members.

I know you have an expectation from type hints on deftype fields that is currently not met - the hints are used for calls but not for the storage/member type, which is Object for all reference types.

There's already a comment in Compiler.java:

//todo - when closed-overs are fields, use more specific types here and in ctor and emitLocal?

and a ticket:


It's probably mostly tedium to implement and make sure all cases are covered, while not breaking anything about closures, which share this code.

Rich

Jason Wolfe

unread,
Jun 18, 2013, 1:57:21 PM6/18/13
to cloju...@googlegroups.com
Gotcha, thanks Rich!  If I get some extra bandwidth, I'll use this as an excuse to learn about how Compiler.java works.

-Jason
Reply all
Reply to author
Forward
0 new messages