I need help tracking down a performance problem.

51 views
Skip to first unread message

Vincent Foley

unread,
Mar 19, 2009, 8:12:11 PM3/19/09
to Clojure
Hello,

For the past few days, I've been trying, unsuccessfully, to make an
application I wrote faster. A Java program that performs, more or
less, the same task takes 12 seconds (on my machine) to parse 1000
files; my Clojure program takes nearly 3 minutes. This more than an
order of magnitude slower! Using the profiling tools available with
the JVM, I quickly determined which function was the costliest. I
copied it into a simple script file to profile it in isolation. I
have made the script and the profile results (long!) available at this
URL: http://gist.github.com/82136

I'm finding the results puzzling: is dereferencing a var *that*
expensive? Can anyone tell me if they see something fundamentally
wrong with my approach that would explain this abysmal performance?

Thank you,

Vincent.

P.S.: I am using Sun's JVM 1.6.0_10 as shipped in Ubuntu Ibex. My
machine is an Athlon 64 X2 4200+ with 3 GB of RAM.

stephaner

unread,
Mar 19, 2009, 10:17:07 PM3/19/09
to Clojure
Hi Mr Foley,

Running your code on my machine give the following result:
(load-file "slow.clj")
"Elapsed time: 1736.034068 msecs"
nil
user> (load-file "slow.clj")
"Elapsed time: 678.850779 msecs"
nil

I'm on Ubuntu 8.10 on a Hp dv6700 with 2gig RAM.

Stephane

Christophe Grand

unread,
Mar 20, 2009, 6:56:11 AM3/20/09
to clo...@googlegroups.com
Hello Vincent,

Vincent Foley a écrit :


> Hello,
>
> For the past few days, I've been trying, unsuccessfully, to make an
> application I wrote faster. A Java program that performs, more or
> less, the same task takes 12 seconds (on my machine) to parse 1000
> files; my Clojure program takes nearly 3 minutes. This more than an
> order of magnitude slower! Using the profiling tools available with
> the JVM, I quickly determined which function was the costliest. I
> copied it into a simple script file to profile it in isolation. I
> have made the script and the profile results (long!) available at this
> URL: http://gist.github.com/82136
>

On my box nearly one half of the total time is sepent in the building of
arr (and by looking at your profiling data a good chunk of these traces
are related to this map call).
Better to get it out of the loop.

(let [arr (into-array Byte/TYPE (map byte [7 ; 1 Byte
3 0 97 0 98 0 99 ; 3 Shorts
0 0 100 100 ; 1 Integer
65 66 67 68 69 70 71 72 73 74
; 10 String
0 0 0 0 0 0 0 0 0 0 ; 10 ignored
]))
buf (ByteBuffer/wrap arr)]
(time
(dotimes [_ 10000]
(.position buf 0)
(run buf))))

Can you give the profile results for this code?

Christophe

--
Professional: http://cgrand.net/ (fr)
On Clojure: http://clj-me.blogspot.com/ (en)


Vincent Foley

unread,
Mar 20, 2009, 9:04:34 AM3/20/09
to Clojure
Here: http://gist.github.com/82352

I have posted memory and cpu profiling figures.

Christophe Grand

unread,
Mar 21, 2009, 9:40:26 AM3/21/09
to clo...@googlegroups.com
Hello,

Is the java code online somewhere to compare? (Since your code is an
interpreter for byte-fields specs there's a lot of stuff going to read a
single byte, I wonder if the java code you're benchmarking against is
that dynamic)

Right now, null-string and read-field-aux are the most time consuming
functions.

Creating a cons for each byte read is too much overhead, I rewrote
null-string like this (yes it's lower level):
(defn null-string
"Read a nul-terminated string. Stop at \\0 or at length n
whichever comes first."
[#^ByteBuffer buf n]
(let [arr #^"[B" (make-array Byte/TYPE n)
sb (StringBuilder.)]
(.get buf arr)
(areduce arr i append true
(when append
(let [b (aget arr i)]
(when-not (zero? (int b))
(.append sb (char b))))))))

Concerning read-field-aux I rewrote it like that:
(def read-field-aux
(let [getters {Byte get-byte
Short get-short
Integer get-integer}]
(fn [#^ByteBuffer buf n type]
(let [f (getters type)]
(if (== (int n) 1)
(f buf)
(let [arr (make-array Object n)]
(dotimes [i n]
(aset arr i (f buf)))
(vec arr)))))))

closing over getters (rather than rebuilding it inside the closure)
yielded nearly as much as going "lower level".

With these two changes, it's the dispatch fn that now dominates.


Vincent Foley a écrit :

Christophe Grand

unread,
Mar 21, 2009, 9:51:11 AM3/21/09
to clo...@googlegroups.com
Christophe Grand a écrit :

> With these two changes, it's the dispatch fn that now dominates.
>

Correction: it's the dispatching, not the computation of the dispatch
value that dominates.

Vincent Foley

unread,
Mar 21, 2009, 4:56:50 PM3/21/09
to Clojure
The code is available at this URL: http://code.google.com/p/bwhf/
(look at the hu.becliza.andras.bwhf.control.[BinRepParser,
BinReplayUnpacker] files)

It is definitely not as dynamic, which helps quite a lot, but I wanted
to have something high level and declarative.

Thank you for your help, I'll look into the improvements you
suggested.

Vincent.

Christophe Grand

unread,
Mar 22, 2009, 4:20:40 PM3/22/09
to clo...@googlegroups.com
Vincent Foley a écrit :

> The code is available at this URL: http://code.google.com/p/bwhf/
> (look at the hu.becliza.andras.bwhf.control.[BinRepParser,
> BinReplayUnpacker] files)
>
> It is definitely not as dynamic, which helps quite a lot, but I wanted
> to have something high level and declarative.
>
If you want to go high-level and declarative you can certainly implement
this as a macro which will compile your declaration into fast static code.

Vincent Foley

unread,
Mar 22, 2009, 4:33:09 PM3/22/09
to Clojure
How would I do that? Make a macro that expands into a map literal
with the appropriate calls to .get, .getShort and .getInt?

On Mar 22, 4:20 pm, Christophe Grand <christo...@cgrand.net> wrote:
> Vincent Foley a écrit :> The code is available at this URL:http://code.google.com/p/bwhf/
> > (look at the hu.becliza.andras.bwhf.control.[BinRepParser,
> > BinReplayUnpacker] files)
>
> > It is definitely not as dynamic, which helps quite a lot, but I wanted
> > to have something high level and declarative.
>
> If you want to go high-level and declarative you can certainly implement
> this as a macro which will compile your declaration into fast static code.
>
> --

Vincent Foley

unread,
Mar 23, 2009, 11:57:25 AM3/23/09
to Clojure
Using the new versions of null-string and read-field-aux that you gave
me, in my real application, the execution time went from 160 seconds
to 150 seconds. As for using macros, I wrote one for the example
program, but I realized that it wouldn't work in my application,
because I sometimes use (apply parse-buffer buf field-vector).

Thanks for the assistance,

Vincent.

Christophe Grand

unread,
Mar 23, 2009, 3:35:00 PM3/23/09
to clo...@googlegroups.com
Hi Vincent!

Vincent Foley a écrit :


> Using the new versions of null-string and read-field-aux that you gave
> me, in my real application, the execution time went from 160 seconds
> to 150 seconds. As for using macros, I wrote one for the example
> program, but I realized that it wouldn't work in my application,
> because I sometimes use (apply parse-buffer buf field-vector).
>

I looked at your program, you can use macros there too.

Given compile-fields a function that outputs the code of a closure
taking a ByteBuffer as only argument then you can rewrite action as a
macro like this:

(defmacro action
[name & v-forms]
{:name name
:fields (compile-fields v-forms)}) and replace (apply parse-buffer
buf fields) with (fields buf).


Do you think it's feasible?

Vincent Foley

unread,
Mar 23, 2009, 10:31:01 PM3/23/09
to Clojure
I can try that.

More generally, is it possible that I'm just doing this whole thing
wrong? That using vectors to represent binary fields and records in a
declarative is just a bad idea and that I should try and explore lower-
level alternatives?

Vincent

Mark Engelberg

unread,
Mar 23, 2009, 10:41:07 PM3/23/09
to clo...@googlegroups.com
On Mon, Mar 23, 2009 at 7:31 PM, Vincent Foley <vfo...@gmail.com> wrote:
> More generally, is it possible that I'm just doing this whole thing
> wrong?  That using vectors to represent binary fields and records in a
> declarative is just a bad idea and that I should try and explore lower-
> level alternatives?
>


This reminds me... I would LOVE to see in Clojure something like
Erlang's bit-packing/destructuring syntax for working with bit
manipulation in a high-level way.

http://www.erlang.se/euc/00/bit_syntax.html

Kyle Schaffrick

unread,
Mar 24, 2009, 5:43:08 PM3/24/09
to clo...@googlegroups.com

I'm actually casually hacking on something like this right now, as I
want to attempt to port a radar processing code I wrote into Clojure.
(It's currently implemented in modern C++ with a functional-ish style)

The WSR-88D file format is *extremely* bizarre and convoluted and
requires a lot of bit-fiddling. Half or more of the current code is
dedicated to converting this format into something sane. I actually hate
that part of my C++ implementation the most because it's all
hand-written unpacking and endianness-fixing of byte arrays which seems
impossible to clean up any further.

-So- .. What I actually have so far is a macro that lets you basically
write Erlang bit-syntax in S-exps and it will expand into a vector that
you can use as the binding list of a let form. The IP header unpacking
example in Erlang's documentation looks something like this:

(let (bits dgram
[ip-ver 4] [hdr-len 4] svc-type [tot-len 16]
[id 16] [flags 3] [frag-off 13]
ttl proto [hdr-cksum 16]
[src-ip 32] [dst-ip 32] [rest-dgram :binary])
'stuff)

Not shown: floats, signedness, and endianness (defaults are all the same
as Erlang).

It's "okay", but it looks unnatural and I don't really like doing it
that way very much. The source and destination names are backwards
(dgram is being destructured into ip-ver, hdr-len and so on), and the
syntax hijacks the whole let form, because I don't think you can write a
macro that automatically splices into its parent form.

So I had been thinking of composing an email about what the likelihood
of seeing extensibility API for destructuring would be. The idea being
that the UI for the bit-syntax library would allow for this custom
destructuring to be pervasive, look a little more conventional, and let
you mix it with regular binding forms (instead of having to use an
explicit let that is completely hijacked by the bit-syntax).

You could of course write something that would unpack bits into a
vector, and then destructure the vector on the left-hand side, but that
separates the names which will be bound (on the left) from their field
specifiers (passed to the macro on the right). I found this difficult to
read when the specifier list gets large, so I implemented it this way so
the bound names are next to their specifications (just like Erlang's).

Details: the macro expands into a bunch of sequential let bindings that
progressively tease apart the blob using some helper functions. I
haven't actually implemented the functions that do this teasing yet
because that requires me to go digging in Java API docs to see what they
provide for dealing with bit-bashing, and I haven't had the inclination
to do that just yet :)

So, is there any chance of extensible destructuring? What would such an
API look like? I have thought about it a lot, but the minutiae of doing
this generally enough to be useful, but simple enough to be
implementable are probably beyond my grasp.

-Kyle

Vincent Foley

unread,
Mar 31, 2009, 8:25:56 AM3/31/09
to Clojure
For those interested, I managed to improve the performance of my
original program from 2 minutes 40 seconds to decode 1000+ files down
to 2 minutes. I'm still far from my goal, but it's an improvement,
especially since the code is shorter and (IMO) cleaner. You can see
it here: http://bitbucket.org/gnuvince/clj-starcraft/src/tip/src/starcraft/replay/parse.clj

And here's another question, running the program with -Xprof shows
that nearly 20% of my execution time is spent calling
java.lang.reflect.Array.set. Is there something wrong with the way I
type hint my array in make-reader?

Thanks,

Vincent.

David Nolen

unread,
Mar 31, 2009, 9:57:25 AM3/31/09
to clo...@googlegroups.com
Did you try using aset-int instead of aset?

Vincent Foley

unread,
Mar 31, 2009, 10:17:26 AM3/31/09
to Clojure
No, but in my defense I did not know such a function existed :) I'll
give it a whirl and report back!

On Mar 31, 9:57 am, David Nolen <dnolen.li...@gmail.com> wrote:
> Did you try using aset-int instead of aset?
>
> On Tue, Mar 31, 2009 at 8:25 AM, Vincent Foley <vfo...@gmail.com> wrote:
>
> > For those interested, I managed to improve the performance of my
> > original program from 2 minutes 40 seconds to decode 1000+ files down
> > to 2 minutes.  I'm still far from my goal, but it's an improvement,
> > especially since the code is shorter and (IMO) cleaner.  You can see
> > it here:
> >http://bitbucket.org/gnuvince/clj-starcraft/src/tip/src/starcraft/rep...

Christophe Grand

unread,
Mar 31, 2009, 10:46:10 AM3/31/09
to clo...@googlegroups.com
Did you try to coerce the result of (~mask-fn ...) with int?
(or use aset-int as suggested by David)

Vincent Foley

unread,
Mar 31, 2009, 11:38:13 AM3/31/09
to Clojure
I tried using aset-int and I tried using int to coerce the result of
mask-fn, the input argument to mask-fn and few other things, but none
of that seems to make a difference so far. Mind you, this is an
aspect of Clojure that I find a little confusing, so I'm just putting
int calls here and there and looking at what happens.

Aaron Cohen

unread,
Mar 31, 2009, 1:24:57 PM3/31/09
to clo...@googlegroups.com
I'm sorry if I missed you mentioning it, but have you tried running
your code with (set! *warn-on-reflection* true) in effect?

-- Aaron

Aaron Cohen

unread,
Mar 31, 2009, 1:26:03 PM3/31/09
to clo...@googlegroups.com
On Tue, Mar 31, 2009 at 1:24 PM, Aaron Cohen <reml...@gmail.com> wrote:
> I'm sorry if I missed you mentioning it, but have you tried running
> your code with (set! *warn-on-reflection* true) in effect?
>

Ugh, I should have looked at your code before I sent that. There it
is on line 1. ;)

-- Aaron

David Nolen

unread,
Mar 31, 2009, 8:57:05 PM3/31/09
to clo...@googlegroups.com
Thanks to cl-format:

(fn
 [buf__2572__auto__ len__2573__auto__]
 (if (= len__2573__auto__ 1)
     (mask8 (. buf__2572__auto__ (get)))
     (let [arr__2574__auto__ (int-array len__2573__auto__)]
       (dotimes
  [i__2575__auto__ len__2573__auto__]
(aset-int
          arr__2574__auto__
          i__2575__auto__
          (mask8 (. buf__2572__auto__ (get)))))
       arr__2574__auto__)))

This is the expansion for (make-reader get mask8), where were you attempting putting the int coercion to to the mask-fn?

David

Vincent Foley

unread,
Mar 31, 2009, 9:09:08 PM3/31/09
to Clojure
I tried surrounding the call to the (. buf# (get)) method and putting
the coercion directly inside the mask8 and mask16 functions. Neither
worked. I want to mention at this point that I have *warn-on-
reflection* set to true for the little script that uses the library
and it doesn't report any call to methods that it can't resolve.

Here's the complete -Xprof output, if it helps.


Flat profile of 176.10 secs (11351 total ticks): main

Interpreted + native Method
4.5% 511 + 0 java.lang.Integer.hashCode
1.4% 160 + 0 java.lang.Integer.intValue
0.8% 91 + 0 starcraft.replay.unpack
$decode_command_block__94.invoke
0.7% 80 + 0 clojure.lang.Numbers.int_array
0.2% 25 + 0 clojure.lang.PersistentVector.pushTail
0.1% 15 + 2 java.lang.ClassLoader.defineClass1
0.1% 16 + 0
hu.belicza.andras.bwhf.control.BinReplayUnpacker.esi28
0.1% 4 + 11 clojure.core__init.load
0.1% 10 + 0 clojure.lang.PersistentVector.cons
0.1% 8 + 0 starcraft.replay.actions$fn__71.invoke
0.1% 8 + 0
hu.belicza.andras.bwhf.control.BinReplayUnpacker.unpackSection
0.1% 0 + 7 java.lang.reflect.Array.setInt
0.1% 7 + 0 clojure.lang.PersistentHashMap
$BitmapIndexedNode.create
0.1% 7 + 0 clojure.lang.RestFn.invoke
0.1% 7 + 0 clojure.lang.RestFn.invoke
0.1% 7 + 0 starcraft.replay.unpack
$decode_commands__99.invoke
0.1% 7 + 0 starcraft.replay.parse
$parse_buffer__53$fn__56.invoke
0.1% 6 + 0 clojure.lang.AFn.applyToHelper
0.1% 6 + 0 clojure.lang.PersistentArrayMap.assoc
0.1% 6 + 0 clojure.lang.PersistentHashMap
$BitmapIndexedNode.assoc
0.0% 0 + 5 java.lang.reflect.Array.newArray
0.0% 0 + 5 java.lang.Class.forName0
0.0% 0 + 5 java.util.zip.Inflater.inflateBytes
0.0% 5 + 0 java.lang.AbstractStringBuilder.<init>
0.0% 5 + 0 java.util.Arrays.copyOfRange
10.9% 1157 + 76 Total interpreted (including elided)

Compiled + native Method
10.4% 1183 + 1 starcraft.replay.parse$fn__23$fn__49.invoke
10.0% 1123 + 17 starcraft.replay.unpack
$decode_command_block__94.invoke
9.2% 1043 + 0 clojure.core$next__3096.invoke
8.9% 1014 + 0 starcraft.replay.parse
$parse_buffer__53$fn__56.invoke
5.5% 626 + 0 clojure.lang.PersistentArrayMap.assoc
4.3% 474 + 17 clojure.lang.PersistentArrayMap.assoc
4.1% 464 + 7 clojure.lang.RestFn.invoke
2.9% 333 + 0 clojure.lang.Cons.next
2.5% 288 + 0 clojure.lang.RT.seq
2.4% 269 + 0 clojure.lang.AFn.applyToHelper
2.2% 249 + 0
hu.belicza.andras.bwhf.control.BinReplayUnpacker.unpackRepChunk
1.8% 202 + 0 clojure.core$seq__3112.invoke
1.6% 174 + 3 clojure.lang.RestFn.applyTo
1.3% 140 + 2 clojure.lang.APersistentMap.cons
1.2% 130 + 1 clojure.core$spread__3225.invoke
1.1% 127 + 0 clojure.lang.PersistentStructMap.valAt
0.8% 93 + 0 clojure.core$reduce__3304.invoke
0.6% 66 + 2 starcraft.replay.unpack
$decode_commands__99.invoke
0.6% 63 + 0 clojure.lang.PersistentArrayMap.valAt
0.1% 13 + 1 clojure.core$conj__3100.invoke
0.1% 9 + 0 clojure.lang.APersistentMap.invoke
0.1% 3 + 6 starcraft.replay.parse
$fn__23$read_shorts__37.invoke
0.1% 8 + 0 clojure.core$nthnext__4405.invoke
0.1% 0 + 7 clojure.lang.ArraySeq.next
0.0% 0 + 5 clojure.lang.APersistentVector.assoc
72.3% 8126 + 76 Total compiled (including elided)

Stub + native Method
15.1% 0 + 1711 java.lang.reflect.Array.setInt
1.2% 0 + 135 java.lang.System.arraycopy
0.3% 0 + 31 java.lang.reflect.Array.set
0.1% 0 + 15 java.io.FileInputStream.readBytes
0.1% 0 + 13 java.lang.reflect.Array.get
0.1% 0 + 7 java.lang.Object.getClass
0.0% 0 + 1 java.lang.Thread.currentThread
16.9% 0 + 1913 Total stub

Thread-local ticks:
0.0% 1 Class loader
0.0% 2 Unknown: no last frame


Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM

Thread-local ticks:
100.0% 1 Blocked (of total)


Global summary of 176.12 seconds:
100.0% 11603 Received ticks
2.1% 246 Received GC ticks
4.3% 495 Compilation
0.0% 2 Other VM operations
0.0% 1 Class loader
0.0% 2 Unknown code
176.257 secs

David Nolen

unread,
Mar 31, 2009, 9:38:05 PM3/31/09
to clo...@googlegroups.com
Did you try

(int (mask8 (. buf__2572__auto__ (get))))

?

Your macro should like this:

(defmacro make-reader
  [get-fn mask-fn]
  `(fn [#^ByteBuffer buf# len#]
     (if (= len# 1)
       (~mask-fn (. buf# (~get-fn)))
       (let [#^"[I" arr# (int-array len#)]
         (dotimes [i# len#]
           (aset-int arr# i# (int (~mask-fn (. buf# (~get-fn))))))
         arr#))))

Vincent Foley

unread,
Mar 31, 2009, 10:00:50 PM3/31/09
to Clojure
I tried it just now; it made no difference. Nevertheless, thank you
for you help and time!

David Nolen

unread,
Mar 31, 2009, 11:04:47 PM3/31/09
to clo...@googlegroups.com
 15.1%     0  +  1711    java.lang.reflect.Array.setInt

Is definitely pointing at the aset-int as being the time gobbler, I think the expression in the macro should be this

(aset-int (ints arr#) i# (int (~mask-fn (. buf# (~get-fn)))))

to be extra safe.

David Nolen

unread,
Mar 31, 2009, 11:17:12 PM3/31/09
to clo...@googlegroups.com
As as a side note, putting type hints into the body of the macro expansion doesn't do anything. These will be discarded. You will need to coerce types via other methods, investigating how ints is implemented in core.clj is a good place to start.

Vincent Foley

unread,
Apr 1, 2009, 11:14:47 AM4/1/09
to Clojure
No change. I think made a small script that spends a large amount of
time in java.lang.reflect.Array.setInt too:

(set! *warn-on-reflection* true)
(time
(dotimes [_ 10000]
(let [#^ints arr (int-array 200)]
(dotimes [i 200]
(aset-int arr i i)))))

So maybe I should try to see what I can do about other functions.
> ...
>
> read more »
Reply all
Reply to author
Forward
0 new messages