I agree that large data support is important, and I would like to
improve Incanter in this area.
I've recently been exploring one approach using Hadoop and Cascading
with the cascading-clojure library by Bradford Cross. Another approach
is to integrate something like the MOA library (Massive Online
Analysis: http://www.cs.waikato.ac.nz/~abifet/MOA/), which is related
to the Weka machine learning library but focused on data stream mining
algorithms that scale well to large data.
I am interested in other suggestions and approaches.
David