Large Datasets?

techrod

Nov 3, 2009, 3:52:21 AM

to Incanter

One of the biggest problems with R is that it wasn't built from the
ground up to handle large datasets in its statistical algorithms
(e.g. LM, GLM, GAM, etc.).

To be specific, it wasn't designed to work with datasets too large to
fit into memory, or with algorithms that generate intermediate data
(e.g. a Hessian matrix) too big for RAM.
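
To make the memory point concrete, here is a minimal sketch of fitting a
one-predictor linear model in a single pass, keeping only five running
totals instead of the whole dataset. It is plain Python, not R or
Incanter code; the names fit_line_streaming and synthetic_rows are
hypothetical, and the generator merely stands in for reading rows from a
file too large to load. The plain running-sums form can lose precision on
badly scaled data, so treat it as an illustration rather than a
production method.

```python
from typing import Iterable, Iterator, Tuple


def fit_line_streaming(rows: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x in one pass over (x, y) pairs.

    Only five running totals are kept, so memory use does not grow
    with the number of rows.
    """
    n = 0
    sum_x = sum_y = sum_xx = sum_xy = 0.0
    for x, y in rows:
        n += 1
        sum_x += x
        sum_y += y
        sum_xx += x * x
        sum_xy += x * y

    # Closed-form least-squares solution from the accumulated totals.
    denom = n * sum_xx - sum_x * sum_x
    if n < 2 or denom == 0.0:
        raise ValueError("need at least two rows with distinct x values")
    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


def synthetic_rows(count: int) -> Iterator[Tuple[float, float]]:
    """Hypothetical stand-in for streaming rows from disk: y = 2x + 1."""
    for i in range(count):
        x = float(i)
        yield x, 2.0 * x + 1.0


if __name__ == "__main__":
    intercept, slope = fit_line_streaming(synthetic_rows(1000))
    print(intercept, slope)
```

Because the input is consumed lazily, the same function works whether the
rows come from a list, a file reader, or a network stream.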

Has any thought been given to making Incanter more scalable with large
datasets than R or is that not a consideration of the project?

If Incanter takes off like R did, then it would be a shame not to make
it more scalable for large datasets, especially as it inherits
Clojure's ability to scale across processor cores (concurrency).

liebke

Nov 3, 2009, 8:53:36 AM

to Incanter

I agree, large data support is important, and I would like to improve
Incanter in this area.

I've recently been exploring one approach using Hadoop and Cascading
with the cascading-clojure library by Bradford Cross. Another approach
is to integrate something like the MOA library (Massive Online
Analysis: http://www.cs.waikato.ac.nz/~abifet/MOA/), which is related
to the Weka machine learning library but focused on data stream mining
algorithms that scale well on large data.
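
As a rough illustration of what a data stream algorithm looks like, here
is a minimal sketch of Welford's online method for the running mean and
sample variance. It sees each value once and keeps three numbers of
state. This is plain Python and not MOA's or Weka's API; the class name
RunningStats is hypothetical.

```python
class RunningStats:
    """Running mean and sample variance via Welford's online update."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        """Fold one new observation into the running statistics."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        # Uses the deviation from the old mean times the deviation
        # from the updated mean, which keeps the update stable.
        self._m2 += delta * (value - self.mean)

    def variance(self) -> float:
        """Sample variance (divides by count - 1)."""
        if self.count < 2:
            raise ValueError("variance needs at least two observations")
        return self._m2 / (self.count - 1)


if __name__ == "__main__":
    stats = RunningStats()
    for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        stats.update(value)
    print(stats.mean, stats.variance())
```

The state is constant in size no matter how many values arrive, which is
the property that lets this style of algorithm scale to large data.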

I am interested in other suggestions and approaches.

David

cburroughs

Nov 20, 2009, 10:32:01 AM

to Incanter

Has anyone used Incanter with vanilla Hadoop (either with
clojure-hadoop or straight Java interop), or only on top of Cascading?

bradford cross

Nov 20, 2009, 12:12:57 PM

to inca...@googlegroups.com

On Fri, Nov 20, 2009 at 7:32 AM, cburroughs <chris.b...@gmail.com> wrote:

Has anyone used Incanter with vanilla Hadoop (either with
clojure-hadoop or straight Java interop), or only on top of Cascading?

Hi Chris, we are not using Incanter in our Hadoop jobs yet, but we will be very soon. There will be a lot more on this coming.

Dimitry Gashinsky

Nov 20, 2009, 12:27:09 PM

to inca...@googlegroups.com

Hi,

I am sorry, this is not exactly an Incanter question. I've seen a
reference in this thread to the "cascading-clojure library by Bradford
Cross" but I could never find it. Is it open source? I am trying to
use Cascading from Clojure and it would be really cool to have a
nice wrapper for it.

Regards,
DiG

bradford cross

Nov 20, 2009, 12:42:04 PM

to inca...@googlegroups.com

On Fri, Nov 20, 2009 at 9:27 AM, Dimitry Gashinsky <dim...@gashinsky.com> wrote:

Hi,

I am sorry, this is not exactly an Incanter question. I've seen a
reference in this thread to the "cascading-clojure library by Bradford
Cross" but I could never find it. Is it open source? I am trying to
use Cascading from Clojure and it would be really cool to have a
nice wrapper for it.

We're working on getting it out right now; there are just a couple more things we want to do before open sourcing it. We're incubating it in a private repo. I can add you if you like, but be warned that it is about to undergo radical API changes over the next couple of weeks. :-)
