Hi everyone,
I'm not sure if this is the right place for feedback (it's my first ;-)). I've rediscovered Incanter as a tool for querying smaller datasets, and it's really useful. I ran into some performance issues, so I tried rewriting some of the Incanter operations to use transducers. That was not too hard, and it makes a huge difference for my use case. However, I noticed that Incanter still seems to target Clojure 1.6 and is not fully compatible with Clojure 1.7.
This is what happens with Incanter 1.5.6 and 1.9.0:
user=> (require '[incanter.core])
WARNING: update already refers to: #'clojure.core/update in namespace: incanter.core, being replaced by: #'incanter.core/update
nil
With Incanter 1.9.0 I also had to add [org.jblas/jblas "1.2.3"] to project.clj; otherwise I got:
user=> (require '[incanter.core])
ClassNotFoundException org.jblas.DoubleMatrix java.net.URLClassLoader$1.run (URLClassLoader.java:366)
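For anyone hitting the same error, the workaround would look roughly like this in project.clj. This is only a sketch: the project name and the exact Clojure version are placeholders I made up, and only the incanter and jblas coordinates come from what I described above.

```clojure
;; project.clj (sketch; "my-analysis" and the Clojure version are assumptions)
(defproject my-analysis "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.7.0"]
                 [incanter "1.9.0"]
                 ;; added explicitly to avoid the ClassNotFoundException
                 ;; for org.jblas.DoubleMatrix
                 [org.jblas/jblas "1.2.3"]])
```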
I'm sure this is nothing once you know about it and are used to these kinds of issues (I am, kind of), but to an outsider it makes the Incanter project feel a bit abandoned, especially since the README says nothing about 1.9.0 and the Wiki page with plans for 2.0 is also a bit outdated.
So I guess all I want to say is, thank you for all the hard work, but don't forget the marketing :-)
Cheers,
Jeroen
Btw, here is my first version of something with transducers, in case someone is interested:
https://gist.github.com/jeroenvandijk/2c6521b37411bf2a2737
It allows reading bigger datasets (ones that do not fit in memory) where the goal is to reduce the dataset to something that does fit in memory. By using transducers you can keep (almost) the same DSL, but with much better performance. Also, in my approach $rollup discards data as early as possible instead of collecting everything before reducing. In concrete numbers: I was trying to read a 180MB Avro file with 1.5 million rows and 20 columns on a MacBook Pro, and it would never finish before running out of heap space (even with 2GB of heap). With the transducer approach it finished in around 15 seconds with almost no growth in memory usage. Nothing scientific, but I think it is pretty easy to see why this makes sense.
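To illustrate the general idea (this is not the code from the gist, just a minimal sketch in plain Clojure 1.7+), a rollup can be written as a reducing function fed by a transducer, so each row is filtered, trimmed, and folded into the running totals before the next one is read. The name `rollup-sum`, the sample rows, and the keys are all hypothetical and not part of Incanter.

```clojure
;; Hypothetical sketch; none of these names come from Incanter or the gist.
(defn rollup-sum
  "Sums `value-key` per distinct `group-key`, consuming `rows` one at a time.
  `xform` is a transducer applied to every row before it is aggregated."
  [xform group-key value-key rows]
  (transduce xform
             (fn
               ([] {})        ; initial accumulator: empty map of totals
               ([acc] acc)    ; completion step: return totals unchanged
               ([acc row]     ; fold one row into the running totals
                (update acc (get row group-key) (fnil + 0) (get row value-key))))
             rows))

;; Tiny in-memory stand-in for a large dataset.
(def rows
  [{:country "NL" :clicks 3 :valid true}
   {:country "BE" :clicks 5 :valid true}
   {:country "NL" :clicks 4 :valid false}
   {:country "NL" :clicks 2 :valid true}])

(def result
  (rollup-sum (comp (filter :valid)                            ; drop rows early
                    (map #(select-keys % [:country :clicks]))) ; keep needed columns
              :country :clicks rows))
;; result => {"NL" 5, "BE" 5}
```

Because only the map of totals is retained, memory use depends on the number of groups and not on the number of rows, provided `rows` is itself read lazily or is a reducible source.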
On Thursday, 12 February 2015 at 07:59:28 UTC+1, Mike Anderson wrote: