Incanter performance issues


signalseeker

Sep 16, 2009, 11:31:07 AM
to Incanter
Hi,

I am very excited to have found Incanter, and I think this project
fills a huge void in statistical/data-analysis capabilities on the
JVM. I have played with it a little, and everything seems to work
very well, except that I run into performance issues with moderately
large data sets.

(I am new to both Clojure and Incanter, so it is more than likely that
I am doing something stupid here.)

For example, here is a simple function that simulates a random walk.
It is quite fast for a small number of realizations, but performance
degrades much faster than linearly once you go beyond 1000
realizations (going from 1000 to 10000 points makes each call roughly
85 times slower).

(use '(incanter core stats charts))
(defn random-walk-plot [size]
  (xy-plot (range size)
           (cumulative-sum (sample-normal size))))

user=> (dotimes [_ 5] (time (random-walk-plot 100)))
"Elapsed time: 7.809 msecs"
"Elapsed time: 7.839 msecs"
"Elapsed time: 6.87 msecs"
"Elapsed time: 6.775 msecs"
"Elapsed time: 7.591 msecs"
nil
user=> (dotimes [_ 5] (time (random-walk-plot 1000)))
"Elapsed time: 145.313 msecs"
"Elapsed time: 171.574 msecs"
"Elapsed time: 138.853 msecs"
"Elapsed time: 195.246 msecs"
"Elapsed time: 194.948 msecs"
nil
user=> (dotimes [_ 5] (time (random-walk-plot 10000)))
"Elapsed time: 13682.717 msecs"
"Elapsed time: 15114.414 msecs"
"Elapsed time: 13696.328 msecs"
"Elapsed time: 14654.608 msecs"
"Elapsed time: 14857.395 msecs"
nil
user=>

Am I doing something very wrong?

Cheers,

--
/s

liebke

Sep 16, 2009, 12:59:36 PM
to Incanter

> Am I doing something very wrong?
>

No, you're not doing anything wrong; the problem appears to be in the
implementation of cumulative-sum. My implementation is a terrible
hack. I've improved it a bit (and just pushed the change to GitHub),
which speeds things up, on the 10,000-sample case in particular.

My timings:

user=> (dotimes [_ 5] (time (random-walk-plot 10000)))
"Elapsed time: 2812.732 msecs"
"Elapsed time: 2204.218 msecs"
"Elapsed time: 2349.617 msecs"
"Elapsed time: 2077.014 msecs"
"Elapsed time: 2191.742 msecs"
nil

user=> (dotimes [_ 5] (time (random-walk-plot 1000)))
"Elapsed time: 42.859 msecs"
"Elapsed time: 38.165 msecs"
"Elapsed time: 39.342 msecs"
"Elapsed time: 37.942 msecs"
"Elapsed time: 40.004 msecs"
nil

user=> (dotimes [_ 5] (time (random-walk-plot 100)))
"Elapsed time: 3.689 msecs"
"Elapsed time: 3.622 msecs"
"Elapsed time: 3.636 msecs"
"Elapsed time: 3.661 msecs"
"Elapsed time: 3.631 msecs"

David

signalseeker

Sep 16, 2009, 1:14:33 PM
to Incanter
Thanks, it is definitely better now. Could you recommend a profiler that
works with Clojure?

The data sets I work with are quite huge (stock market tick data), so I
am very much interested in better performance.

I would love for this to replace the need for R/Python.

Cheers,

--
/s

David Edgar Liebke

Sep 16, 2009, 2:44:21 PM
to inca...@googlegroups.com
> Thanks, it is definitely better now. Could you recommend a profiler to
> work with clojure?
>

I haven't used it myself, but VisualVM may work for you,
https://visualvm.dev.java.net/
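Alongside a profiler, a quick way to see which stage dominates is to time each stage on its own. The sketch below is a stand-in under stated assumptions, not Incanter code: `java.util.Random`'s `nextGaussian` replaces `sample-normal`, a `reductions`-based running sum replaces `cumulative-sum`, and the helper names `timed` and `profile-random-walk` are hypothetical. The plotting step (`xy-plot`) could be wrapped the same way but is left out so the example runs without Incanter.

```clojure
;; Hypothetical helpers: a poor-man's profiler for the random-walk stages.

(defn timed
  "Calls the no-argument function f; returns [result elapsed-milliseconds]."
  [f]
  (let [start  (System/nanoTime)
        result (f)]
    [result (/ (- (System/nanoTime) start) 1e6)]))

(defn profile-random-walk
  "Times sampling and summing separately for a walk of n steps.
  java.util.Random stands in for Incanter's sample-normal (assumption)."
  [n]
  (let [rng (java.util.Random. 42)
        ;; vec forces the lazy sequence so the work happens inside the timer
        [draws t-sample] (timed #(vec (repeatedly n (fn [] (.nextGaussian rng)))))
        [walk t-sum]     (timed #(vec (rest (reductions + 0.0 draws))))]
    {:steps (count walk) :sample-ms t-sample :cumsum-ms t-sum}))

;; Usage: compare how each stage grows with n.
(profile-random-walk 10000)
```

Forcing each lazy result with `vec` matters here: without it, the timer would measure only the creation of an unrealized sequence, and the real cost would show up later in whatever stage consumes it.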

David
