public static void ConvertToAWT(byte[] cpuArray){
    // Given an array of bytes representing a C-style BGRA image,
    // converts it in place to a Java-style ABGR image
    // (the byte order used by BufferedImage.TYPE_4BYTE_ABGR).
    int len = cpuArray.length;
    // i + 3 < len guards against a trailing partial pixel.
    for (int i = 0; i + 3 < len; i += 4){
        byte b = cpuArray[i+0];
        byte g = cpuArray[i+1];
        byte r = cpuArray[i+2];
        byte a = cpuArray[i+3];
        cpuArray[i+0] = a;
        cpuArray[i+1] = b;
        cpuArray[i+2] = g;
        cpuArray[i+3] = r;
    }
}
(defn java-like []
  (loop [i (int 0)]
    (if (< i buffer-size)
      (let [+ clojure.core/unchecked-add
            b (aget cpuArray i)
            g (aget cpuArray (+ 1 i))
            r (aget cpuArray (+ 2 i))
            a (aget cpuArray (+ 3 i))]
        (aset-byte cpuArray i a)
        (aset-byte cpuArray (+ 1 i) b)
        (aset-byte cpuArray (+ 2 i) g)
        (aset-byte cpuArray (+ 3 i) r)
        (recur (int (+ i 4)))))))
(defn clojure-like []
  (doall (flatten (map (fn [[b g r a]] [a b g r])
                       (partition 4 4 cpuArray)))))
For a byte array of size 1920000, the java-like Clojure function, a
line-for-line translation, takes several minutes, while the Java
method takes around 3 milliseconds.
The clojure-like one takes 6 seconds.
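To reproduce the Java side of that comparison, a minimal timing sketch
might look like the one below. The class name BgraTiming, the warm-up
count, and the use of System.nanoTime are my own assumptions, not part of
the original post, and a single wall-clock pass like this is only a rough
indication, since JIT compilation and GC can easily dominate one run.

```java
// Hypothetical harness (not from the thread): times the in-place
// BGRA -> ABGR swap on a buffer such as the 1,920,000-byte one above.
final class BgraTiming {
    // Rotate each 4-byte pixel from B,G,R,A order to A,B,G,R order in place.
    static void convertToAWT(byte[] cpuArray) {
        for (int i = 0; i + 3 < cpuArray.length; i += 4) {
            byte b = cpuArray[i];
            byte g = cpuArray[i + 1];
            byte r = cpuArray[i + 2];
            byte a = cpuArray[i + 3];
            cpuArray[i] = a;
            cpuArray[i + 1] = b;
            cpuArray[i + 2] = g;
            cpuArray[i + 3] = r;
        }
    }

    // Runs a few untimed warm-up passes so the JIT can compile the loop,
    // then returns the elapsed time of one timed pass in milliseconds.
    static double timeOnePassMillis(byte[] buffer, int warmups) {
        for (int k = 0; k < warmups; k++) {
            convertToAWT(buffer);
        }
        long start = System.nanoTime();
        convertToAWT(buffer);
        return (System.nanoTime() - start) / 1_000_000.0;
    }
}
```

Usage: BgraTiming.timeOnePassMillis(new byte[1920000], 20) returns the
milliseconds for one pass after 20 warm-up passes.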
Why is the Clojure function so much more obnoxiously slow than its
Java counterpart?
Can anyone shed some light on what I'm doing wrong?
sincerely,
--Robert McIntyre
Boxing, most likely, and/or reflection. Your cpuArray needs to be
produced with (byte-array ...) and hinted with ^bytes to get top
performance.
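"Reflection" here means element access that goes through
java.lang.reflect.Array instead of a typed array instruction, which is
roughly what happens when the compiler cannot tell that cpuArray is a
byte[]. The Java sketch below is an analogy, not the bytecode Clojure
actually emits; the class and method names are hypothetical. As far as I
know, clojure.core/aset-byte is itself defined in terms of
java.lang.reflect.Array/setByte, so it stays on the reflective path even
when the array is hinted.

```java
import java.lang.reflect.Array;

// Analogy only: contrasts reflective element access (java.lang.reflect.Array)
// with direct typed indexing for the same BGRA -> ABGR swap.
final class ReflectiveVsDirect {
    // Accepts any Object; every read and write is a reflective call that
    // checks the array's runtime type.
    static void swapReflective(Object arr) {
        int len = Array.getLength(arr);
        for (int i = 0; i + 3 < len; i += 4) {
            byte b = Array.getByte(arr, i);
            byte g = Array.getByte(arr, i + 1);
            byte r = Array.getByte(arr, i + 2);
            byte a = Array.getByte(arr, i + 3);
            Array.setByte(arr, i, a);
            Array.setByte(arr, i + 1, b);
            Array.setByte(arr, i + 2, g);
            Array.setByte(arr, i + 3, r);
        }
    }

    // The static type byte[] lets javac emit plain array loads and stores.
    static void swapDirect(byte[] arr) {
        for (int i = 0; i + 3 < arr.length; i += 4) {
            byte b = arr[i];
            byte g = arr[i + 1];
            byte r = arr[i + 2];
            byte a = arr[i + 3];
            arr[i] = a;
            arr[i + 1] = b;
            arr[i + 2] = g;
            arr[i + 3] = r;
        }
    }
}
```

Both methods produce the same bytes; timing each on the same buffer shows
the cost of the reflective path on your own JVM.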
It's correct. Clojure's compiler inlines certain functions, including
clojure.core/unchecked-add and the other unchecked-* functions and, when
the arity is 2, clojure.core/+. It does not inline a bare, unqualified +
that has been rebound locally, which might (or might not) at runtime
refer to clojure.core/+ or clojure.core/unchecked-add or whatever.
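As a rough Java analogy: an inlined clojure.core/+ on two ints becomes a
primitive add, while calling a function value held in a local goes through
a generic invoke on boxed objects, somewhat like calling a
BinaryOperator&lt;Integer&gt;. The class below is hypothetical, and HotSpot's
escape analysis may remove some of the boxing, so any timing difference
you see is indicative only.

```java
import java.util.function.BinaryOperator;

// Analogy only: the same index arithmetic done with a primitive +
// versus routed through a generic, boxed function object.
final class InlineVsIndirect {
    // Primitive int addition: no allocation and no call.
    static long sumIndicesPrimitive(int n) {
        long total = 0;
        for (int i = 0; i < n; i += 4) {
            total += (i + 1) + (i + 2) + (i + 3);
        }
        return total;
    }

    // Every addition is an interface call on Integer objects, loosely
    // mirroring a call through a locally rebound Clojure function.
    static long sumIndicesBoxed(int n, BinaryOperator<Integer> add) {
        long total = 0;
        for (Integer i = 0; i < n; i = add.apply(i, 4)) {
            total += add.apply(i, 1) + add.apply(i, 2) + add.apply(i, 3);
        }
        return total;
    }
}
```

Usage: InlineVsIndirect.sumIndicesBoxed(1920000, Integer::sum) computes
the same value as sumIndicesPrimitive(1920000), only through boxed calls.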
This:
(defn convert-image [#^bytes cpuArray]
  (let [unchecked-add clojure.core/unchecked-add
        len (int (count cpuArray))]
    (loop [i (int 0)]
      (if (< i len)
        (let [b (byte (aget cpuArray i))
              g (byte (aget cpuArray (unchecked-add 1 i)))
              r (byte (aget cpuArray (unchecked-add 2 i)))
              a (byte (aget cpuArray (unchecked-add 3 i)))]
          (aset-byte cpuArray i a)
          (aset-byte cpuArray (unchecked-add 1 i) b)
          (aset-byte cpuArray (unchecked-add 2 i) g)
          (aset-byte cpuArray (unchecked-add 3 i) r)
          (recur (int (unchecked-add i 4))))))))
vs this.
(defn convert-image [#^bytes cpuArray]
  (let [len (java.lang.reflect.Array/getLength cpuArray)]
    (loop [i (int 0)]
      (if (< i len)
        (let [i2 (unchecked-add 1 i)
              i3 (unchecked-add 2 i)
              i4 (unchecked-add 3 i)
              b (byte (aget cpuArray i))
              g (byte (aget cpuArray i2))
              r (byte (aget cpuArray i3))
              a (byte (aget cpuArray i4))]
          (aset-byte cpuArray i a)
          (aset-byte cpuArray i2 b)
          (aset-byte cpuArray i3 g)
          (aset-byte cpuArray i4 r)
          (recur (unchecked-add i 4)))))))
The first function takes forever; the second is MUCH faster.
Judging from the disassembled bytecode of the two compiled functions,
the rebound + was indeed not being inlined.
Since rebinding a function to a local doesn't carry over the var's
:inline metadata, this makes sense.
However, my new, modified function is still around 20 times slower
than the Java version :(
I still don't understand what's slowing me down, but I'm much happier
that I can get within 20x of Java instead of 20000x.
Thanks, everyone.
sincerely,
--Robert McIntyre
Is this hint still necessary on 1.3.0?
David, thanks for your suggestions.
I copied your code and tested it out, but on my machine it takes 230
milliseconds while the Java version takes about 3.
If it's not too much trouble, how long does the Java implementation
take on your machine?
sincerely,
--Robert McIntyre
-verbose:gc -Xmn500M -Xms2000M -Xmx2000M -server
sincerely,
--Robert McIntyre
Part of the difference (under 1.2) is due to the (substantial)
overhead of accessing the buffer-size var on every iteration.
I ran a quick check: David's version of the code averaged 17.2 ms.
Just changing buffer-size to a local, using (let [buffer-size (int
1920000)] ...), dropped the time to an average of 3.4 ms.
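In Java terms, dereferencing a var on every pass is roughly like reading a
volatile static field in the loop condition (as far as I know, Clojure's
Var keeps its root value in a volatile field), while the let version reads
it once into a local. The sketch below illustrates only that analogy; the
class and field names are hypothetical, and the loop body is a placeholder
pixel count rather than the real byte swap.

```java
// Analogy only: re-reading a shared, mutable field on every iteration
// versus copying it into a local once before the loop.
final class HoistBound {
    // Stand-in for a global var; volatile forces a fresh read each time.
    static volatile int bufferSize = 1_920_000;

    // Loop bound re-read from the volatile field on every iteration.
    static int countPixelsRereading() {
        int pixels = 0;
        for (int i = 0; i < bufferSize; i += 4) {
            pixels++;
        }
        return pixels;
    }

    // Loop bound read once into a local, like (let [buffer-size ...] ...).
    static int countPixelsLocal() {
        final int len = bufferSize;
        int pixels = 0;
        for (int i = 0; i < len; i += 4) {
            pixels++;
        }
        return pixels;
    }
}
```

Both methods return the number of 4-byte pixels (480000 for the default
size); only the way the bound is read differs.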
Maybe someone could compile all these "let's speed up this clojure
code" threads and put it somewhere as a valuable resource we could
point to when things like this come up again?
sincerely,
--Robert McIntyre
What is this "java server" of which you speak? In the JVMs I'm
familiar with (Sun/Oracle, OpenJDK), it's just a matter of passing the
-server option to java when starting it.
// Ben