poor dataset performance for 100k items


Joe B

Jun 4, 2020, 3:38:28 PM
to Incanter


I've started trying Incanter, but I've run into serious performance problems even with relatively small datasets.

Maybe Incanter/Clojure just isn't meant to be used this way, but I figured I would ask first. This is the most basic form of the problem:

(def a (incanter.core/to-dataset {:a (range 10000) :b (range 5 10005)}))
(time (count (incanter.core/$ :a a)))
Elapsed time: 2188.47 msecs

That isn't awful, but it's not great.

But if I increase the dataset to 100k rows instead of 10k, the same operation takes far longer:

(def a (incanter.core/to-dataset {:a (range 100000) :b (range 5 100005)}))
(time (count (incanter.core/$ :a a)))
Elapsed time: 238673.793 msecs 

The increase in time is far from linear: ten times as many rows took roughly 109 times as long. Hopefully this is something obvious that I don't understand and that is easy to improve.
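One way to put a number on "not linear": if run time grows as t ~ n^k, two measurements give k = log(t2/t1) / log(n2/n1). With the timings above, k = log(238673.793 / 2188.47) / log(10), which is about 2.04, so the column access behaves roughly quadratically in the number of rows. Below is a minimal sketch for repeating this check at other sizes. The helpers `elapsed-ms` and `scaling-exponent` are hypothetical, written only for illustration (they are not part of Incanter), and the commented-out Incanter call uses only `to-dataset` and `$` exactly as in the post. Single runs on the JVM include JIT warm-up, so treat the numbers as rough.

```clojure
;; Hypothetical helpers for checking how an operation scales with input size.
;; Neither function is part of Incanter; they use only clojure.core and java.lang.

(defn elapsed-ms
  "Calls the zero-argument function f once and returns the wall-clock
  time it took, in milliseconds (measured with System/nanoTime)."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e6)))

(defn scaling-exponent
  "Given two measurements (n1, t1) and (n2, t2), returns the exponent k
  for which t ~ n^k fits both points: k = log(t2/t1) / log(n2/n1).
  k near 1 suggests linear scaling, k near 2 quadratic."
  [n1 t1 n2 t2]
  (/ (Math/log (double (/ t2 t1)))
     (Math/log (double (/ n2 n1)))))

;; Using the timings reported in this thread:
(scaling-exponent 10000 2188.47 100000 238673.793)
;; => approximately 2.04

;; Example against Incanter (assumes Incanter is on the classpath). As in the
;; original post, only the column access is timed, not dataset construction:
;; (let [a (incanter.core/to-dataset {:a (range 10000) :b (range 5 10005)})]
;;   (elapsed-ms #(count (incanter.core/$ :a a))))
```

Fitting an exponent from two sizes is crude, but it is enough to tell linear (k near 1) from quadratic (k near 2) behaviour; timing a few more sizes gives a more reliable picture.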

Thanks in advance for any advice.


Daniel Slutsky

Jun 5, 2020, 1:10:39 AM
to inca...@googlegroups.com
Hi Joseph!

I don't know much about the current state of Incanter, but there is a new library, partially inspired by it, called tech.ml.dataset.
In terms of performance, I think things are better there.

One related project is a nice wrapper around it, inspired by R's dplyr library:
It is under development, but things are looking very good already.

One good place to discuss such questions is the #data-science stream at the Clojurians Zulip Chat.
The authors of these libraries are there and are very helpful.

To view this discussion on the web visit https://groups.google.com/d/msgid/incanter/dbae6054-1cc1-4793-bc41-de214a381642o%40googlegroups.com.