First preview of Incanter 2.0 (aka Incanter 1.9.0) has been released


Alex Ott

Dec 28, 2014, 2:08:39 PM
to inca...@googlegroups.com
Hello all

I just pushed the changes from the develop branch to Clojars as Incanter 1.9.0.  This release contains the following major changes (the full changelog is available at https://github.com/incanter/incanter/blob/develop/Changes.md):
 - Full integration with core.matrix, with vectorz as the default implementation.  You can now use the implementation that suits you best: if you need performance you can use Clatrix, and if you don't want native dependencies you can use vectorz, ndarray, or another implementation
 - Datasets are now based on core.matrix's Dataset, so you get all the functionality available there (including the labeled rows available in the latest release of core.matrix)
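For example, selecting a backend is a one-line call. A sketch, assuming the chosen implementation's artifact (e.g. net.mikera/vectorz-clj for :vectorz, or clatrix for :clatrix) is already on your classpath:

```clojure
;; Sketch: selecting a core.matrix implementation at runtime.
;; Assumes the corresponding artifact is on the classpath.
(require '[clojure.core.matrix :as m])

(m/set-current-implementation :vectorz)   ; pure-JVM, no native dependencies
(def a (m/matrix [[1 2] [3 4]]))
(m/mmul a a)                              ; matrix multiply via vectorz

;; (m/set-current-implementation :clatrix) ; native BLAS-backed alternative
```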

These major changes break the existing API, so please consult the changelog for more details.

Please report any bugs that you find in this release as GitHub issues with the label 2.0

Other functionality (like the reworked read-dataset) will arrive in upcoming releases...

If you have ideas about what should be included in Incanter 2.0, please discuss them here or create an issue on GitHub.

Thank you to everyone who participated in preparing this release!

I want to wish all of you a Happy New Year & happy hacking! :-)

--
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
Skype: alex.ott

Nils Grunwald

Dec 29, 2014, 4:41:11 AM
to inca...@googlegroups.com
Thanks a lot, this is great!

Nils

A

Jan 28, 2015, 3:08:18 PM
to inca...@googlegroups.com
This is great!  I see the "develop" branch, but what must I do to try it out? (Is there a script somewhere that will build all the uberjars? I guess I could then include the uberjars in a local repo...?)

Many thanks,
Avram

Aleksandr Sorokoumov

Jan 28, 2015, 3:14:27 PM
to inca...@googlegroups.com
Hello Avram,

All you need to do is:
1. clone the repository: git clone https://github.com/incanter/incanter.git
2. open the directory: cd incanter
3. check out the "develop" branch: git checkout develop
4. install the modules: lein modules install
5. open a REPL: lein repl

Best regards,
Aleksandr

A

Jan 29, 2015, 3:22:43 PM
to inca...@googlegroups.com
I was missing "lein modules install"
Thank you,
-Avram

Mike Anderson

Feb 12, 2015, 1:59:28 AM
to inca...@googlegroups.com
This is awesome - congratulations on the big milestone 

I look forward to putting it through its paces over the coming days. 

As always, if anyone is able to dedicate some time to testing vectorz-clj and core.matrix, any feedback would be very helpful - I'm keen to ensure we have a rock-solid foundation for Incanter 2.0.

Jeroen van Dijk

Oct 1, 2015, 8:01:35 AM
to Incanter
Hi everyone, 

I'm not sure if this is the right place to put feedback (my first feedback ;-)). I've rediscovered Incanter as a tool to query smaller datasets. Really useful. Now I'm running into some performance issues, and I thought: what if I rewrite some of the Incanter operations to use transducers? That was actually not too hard, and it makes a huge difference for my use case. But I did notice that Incanter seems to be stuck on Clojure 1.6 and is not fully compatible with 1.7.

This is what happens with 1.5.6 and 1.9.0:

user=> (require '[incanter.core])
WARNING: update already refers to: #'clojure.core/update in namespace: incanter.core, being replaced by: #'incanter.core/update
nil
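(For what it's worth, the warning can be avoided by excluding clojure.core/update in the requiring namespace - a sketch, where my.analysis is a hypothetical namespace of your own:)

```clojure
;; Sketch: silencing the shadowing warning by excluding clojure.core/update
;; before requiring incanter.core. my.analysis is a hypothetical name.
(ns my.analysis
  (:refer-clojure :exclude [update])
  (:require [incanter.core :refer :all]))
```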

With 1.9.0 I also had to add [org.jblas/jblas "1.2.3"] to project.clj, otherwise I got:

user=> (require '[incanter.core])

ClassNotFoundException org.jblas.DoubleMatrix  java.net.URLClassLoader$1.run (URLClassLoader.java:366)

I'm sure this is nothing once you know it and are used to these kinds of issues (I kind of am), but as an outsider to Incanter it feels like the project is a bit abandoned, especially since the README says nothing about 1.9.0 and the wiki with plans for 2.0 is also a bit outdated.

So I guess all I want to say is, thank you for all the hard work, but don't forget the marketing :-)

Cheers,
Jeroen

Btw, here is my first version of something with transducers, in case someone is interested: https://gist.github.com/jeroenvandijk/2c6521b37411bf2a2737. It makes it possible to read bigger datasets (ones that do not fit in memory) where the goal is to reduce the dataset to something that does fit in memory. By using transducers you can have (almost) the same DSL, but with much better performance. Also, in my approach $rollup discards data as quickly as possible instead of collecting everything before reducing. In concrete numbers: I was trying to read a 180MB Avro file with 1.5 million rows and 20 columns on a MacBook Pro, and it would never finish before blowing the heap (even with 2GB of heap space). With the transducer approach it finished in around 15 seconds with almost no growth in memory usage. Nothing scientific, but I think it is pretty easy to see why this makes sense.
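The core idea can be sketched in plain Clojure (a toy rollup for illustration, not the gist's actual code): fold each row into a small accumulator as it streams by, so the full sequence is never retained:

```clojure
;; Toy sketch of a streaming rollup (not the gist's actual code):
;; sum the :sales column per :region, folding each row into a small
;; map as it arrives instead of realizing the whole dataset first.
(defn rollup-sum [group-key value-key rows]
  (reduce (fn [acc row]
            (update acc (group-key row) (fnil + 0) (value-key row)))
          {}
          rows))

(rollup-sum :region :sales
            [{:region :eu :sales 10}
             {:region :us :sales 5}
             {:region :eu :sales 3}])
;; => {:eu 13, :us 5}
```

Because the accumulator only ever holds one entry per group, memory stays proportional to the number of groups, not the number of rows.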


Mike Anderson

Oct 1, 2015, 9:38:22 PM
to Incanter
Hi Jeroen,

Great that you are getting value out of Incanter!

Some thoughts on performance:

1. You may find that vectorz-clj is faster than Clatrix/JBlas for many operations. Currently I think it is faster (often significantly) for everything except large matrix operations

2. More recent versions of core.matrix include a lot of performance improvements. We should probably update Incanter to use these ASAP

3. Transducers probably aren't the optimal way to improve performance in many cases. While they are better than the usual Clojure lazy sequence operations, the boxing / function invocation overhead is still significant on a per-element level. You really want to be doing primitive operations on arrays, which is what implementations like vectorz-clj are actually doing under the hood. Perhaps you can post some of the operations where you are seeing poor performance, and I'd be happy to look and see if there is a better solution?

4. If you find a specific issue that has poor performance due to array / matrix operations in core.matrix, please file it as an issue and I'll look into it: https://github.com/mikera/core.matrix/issues

5. I'm very responsive with PRs / issues if you want to get anything upstream into core.matrix or vectorz-clj, e.g. stuff around loading large datasets
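To illustrate point 3, the per-element boxing overhead is exactly what primitive array operations avoid - a minimal sketch in plain Clojure:

```clojure
;; Sketch: an unboxed sum over a primitive double array via areduce,
;; next to the boxed, seq-based equivalent. The areduce loop works on
;; primitive doubles with no per-element boxing or function objects.
(let [xs (double-array (range 1000))]
  (areduce xs i acc 0.0 (+ acc (aget xs i))))   ; unboxed loop

(reduce + (range 1000))                          ; boxed, seq-based
;; both => 499500 (the first as a double)
```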

And I agree we need to do more marketing. Incanter is a great tool... I'll try to do some more blogging and doc updates on my part.

  Mike.

Jeroen van Dijk

Oct 2, 2015, 11:42:57 AM
to inca...@googlegroups.com
Hi Mike,

Thanks for your reply.


> 1. You may find that vectorz-clj is faster than Clatrix/JBlas for many operations. Currently I think it is faster (often significantly) for everything except large matrix operations

I'm currently only doing some simple groupings of data. I didn't experiment with different backends, so I'm not sure whether that could have solved my problem. I was having memory problems because my code was holding onto the head of a really large sequence; the sequence could therefore not be garbage collected, which caused heap-space problems. The default $rollup function in Incanter also doesn't aggressively reduce the data, which could have caused or magnified this issue.


 
> 2. More recent versions of core.matrix include a lot of performance improvements. We should probably update Incanter to use these ASAP

Yeah, not sure if this is related, but I guess it is always good.
 
> 3. Transducers probably aren't the optimal way to improve performance in many cases. While they are better than the usual Clojure lazy sequence operations, the boxing / function invocation overhead is still significant on a per-element level. You really want to be doing primitive operations on arrays, which is what implementations like vectorz-clj are actually doing under the hood. Perhaps you can post some of the operations where you are seeing poor performance and I'm happy to look and see if there is a better solution?

I'll try to create a proper example case. So far I've only had problems when reading a large file without tweaking any Incanter code. I've updated my transducer gist for Incanter and it works really well so far. You can use it together with the old DSL without noticing a difference in usage; I really like it. It probably needs real benchmarks to say how much of a difference it actually makes.



> 4. If you find a specific issue that has poor performance due to array / matrix operations in core.matrix, please file it as an issue and I'll look into it: https://github.com/mikera/core.matrix/issues

 Thanks


> 5. I'm very responsive with PRs / issues if you want to get anything upstream into core.matrix or vectorz-clj, e.g. stuff around loading large datasets

> And I agree we need to do more marketing. Incanter is a great tool.... I'll try and do some more blogging and doc updates on my part.


Great to hear! I think it is indeed a really useful tool! I'm sure there is a lot more for me personally to discover.

Jeroen
 
