Incanter Roadmap

liebke

Sep 3, 2009, 4:30:21 PM
to Incanter
I want to discuss plans for future Incanter development, and I’m
looking for volunteers interested in contributing to any of the
following projects, as well as suggestions for other improvements.

My list of priorities, in no particular order:

1. Expose additional features of the underlying Java libraries. For
instance, I would like to expose more of the chart customizability of
JFreeChart in the incanter.chart library, e.g. enabling annotations of
categorical charts, allowing users to set the scale on axes,
customizing colors, etc.

2. Create new functions based on the already included Java libraries.
For instance, I would like to improve incanter.optimize by a)
including the nonlinear optimization routines available in Parallel
Colt (see: http://sites.google.com/site/piotrwendykier/software/parallelcolt
and http://www1.fpl.fs.fed.us/optimization.html), b) writing new
routines in Clojure, and c) improving the existing routines.

3. Create an incanter.viz library, consisting of Processing-based data
visualizations.

4. Integrate the Weka machine learning library: http://www.cs.waikato.ac.nz/~ml/index.html

5. Provide additional statistical methods.

6. Optimize existing functions; in general, I have favored ease-of-use
and expressibility over performance, but there is A LOT of room to
optimize without compromising usability.

Any other suggestions or feedback are welcome.

David

liebke

Sep 4, 2009, 10:36:32 PM
to Incanter
I just wanted to post some comments Mark Reid left on the blog
regarding the roadmap:

> Integrating the Weka ML library is an interesting idea but the
> algorithms in Weka are a little dated. I suspect that you will
> have to do a lot of converting back and forth between Weka’s
> data formats and Incanter’s.

> Some other Java ML libraries you may want to consider are
> Rapid Miner (formerly YALE) and Ling Pipe (though this is
> more for text processing).

Thanks for the suggestions on Java ML libraries; your opinion on this
carries a lot of weight.


> One other suggestion I have is you may want to make it
> easier to use sparse matrix features of Parallel Colt from
> Incanter. I started out wanting to implement a simple SGD
> algorithm on top of Incanter but found it was easier just to
> use Parallel Colt’s sparse matrix library directly.

I’ll add that to the TODO list.


> If you’re amenable, I’d be interested in adding some online
> and boosting algorithms to Incanter.

I am very amenable, and look forward to any and all contributions from
you!

-David

bradford

Sep 18, 2009, 1:58:36 AM
to Incanter


On Sep 4, 7:36 pm, liebke <lie...@gmail.com> wrote:
> I just wanted to post some comments Mark Reid left on the blog
> regarding the roadmap:
>
> > Integrating the Weka ML library is an interesting idea but the
> > algorithms in Weka are a little dated. I suspect that you will
> > have to do a lot of converting back and forth between Weka’s
> > data formats and Incanter’s.
> > Some other Java ML libraries you may want to consider are
> > Rapid Miner (formerly YALE) and Ling Pipe (though this is
> > more for text processing).
>
> Thanks for the suggestions on Java ML libraries, your opinion on this
> carries a lot of weight.

Something else to consider is: http://lucene.apache.org/mahout/

However, all of this would be better off modularized. More on that in
the next email.

bradford

Sep 18, 2009, 2:21:50 AM
to Incanter
I dug fairly deeply into Incanter in the past 2 days.

I'd really like to use it in my work. I like a lot of what is going
on with it. There are a few issues I have, but they are addressable by
taking the actions below.

1) The biggest issue is that I need it to be more modularized. Think
of it like R and its packages. We can use Maven or Ivy for dependency
management. I am happy to help with this if the work is going to make
it back into master. If we don't do this, it is impractical to
depend on Incanter for a real project (because it forces you to depend
on all of Incanter's deps, which is a very large number of deps, most of
which may not be needed if you just want to use charting or
visualization, for example).

2) There is some non-idiomatic Clojure stuff going on.
-One thing is that there is Java code side by side with Clojure
code. The culprit is Matrix.java. We can probably just port
Matrix.java to Clojure. :-)
-It seems like there may be some macros used where we could get away
with functions, such as in charts.clj. I may be missing something,
but it seems like these could be fns?

3) It would be cool to provide more options for transforming
different data structures into charts. The nice thing about Clojure is
that data structures give you lots of options with respect to how you
want to represent your data. I could envision an intermediate set of
functions that transform clojure data structures into
XYSeriesCollections and such. For example, I just hacked this thing
up - warning, may not actually work, although it seems to. :-) We may
be able to come up with some tricks that allow you to chart timeseries
whether they are represented in maps, vectors, maps of vectors, etc.
Alternatively, we may be able to find what is the best idiomatic
clojure representation for different kinds of data.

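(import '(org.jfree.data.xy XYSeries XYSeriesCollection))

;; map-of-series maps each x value to a map of {series-name y}.
;; The name argument is kept from the original signature but is unused.
(defn to-series-collection [name map-of-series]
  (let [collection (XYSeriesCollection.)
        ;; Clojure maps are immutable, so thread the name->XYSeries map
        ;; through reduce instead of calling assoc inside a for.
        xyseries (reduce
                   (fn [acc [x ys]]
                     (reduce
                       (fn [acc [series-name y]]
                         (if-let [series (acc series-name)]
                           (do (.add series x y) acc)
                           (assoc acc series-name
                                  (doto (XYSeries. series-name) (.add x y)))))
                       acc ys))
                   {} map-of-series)]
    ;; addSeries needs the series passed in; doseq is for side effects.
    (doseq [series (vals xyseries)]
      (.addSeries collection series))
    collection))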
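
As a rough sketch of the "intermediate set of functions" idea in point 3: one option is a small normalizer that turns several common Clojure shapes into a single seq of [x y] pairs, which a chart-building function could then consume. The name to-points and the three supported shapes are assumptions made for illustration; none of this is part of Incanter.

```clojure
;; Hypothetical helper, not part of Incanter: normalizes a few common
;; representations of a single series into a seq of [x y] pairs.
(defn to-points
  [data]
  (cond
    ;; A map of x -> y: use the entries, ordered by x.
    (map? data)
    (sort-by first (seq data))

    ;; A flat sequence of y values: use the index as x.
    (and (sequential? data) (every? number? data))
    (map-indexed vector data)

    ;; A sequence of [x y] pairs: pass through as vectors.
    (and (sequential? data) (every? sequential? data))
    (map vec data)

    :else
    (throw (IllegalArgumentException. "Unsupported series shape"))))

;; All three calls describe a series in a different shape.
(to-points {2 5.0, 1 3.0})
(to-points [10 20 30])
(to-points [[1 3.0] [2 5.0]])
```

Dispatching on shape in one place would keep the chart functions themselves simple; whether a single canonical representation is preferable to supporting many is the open question raised in point 3.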

4) I'd like to start on some wrappers for stats in Apache
commons-math. It may be that some parts of commons-math are better
than Colt; that is what I have found in the past. Again, IMO, the way
to go is to have separate modules in Incanter so people only depend on
what they need and want to use, be it stats, charts, visualization,
matrices, etc. Having core depend on Colt, Swing, Processing, etc.
makes it a pretty heavy proposition to use Incanter from real
projects. It forces you down the path of using Incanter as an
application, which seems unfortunate because it could be useful in a
variety of contexts.

Best!

Brad

David Edgar Liebke

Sep 18, 2009, 9:03:34 AM
to inca...@googlegroups.com
Brad,

This is excellent feedback!

1) I agree Incanter would benefit from being more modular, and I am
interested in any further ideas you have on making it happen.

I think you correctly identified the issue: is Incanter an application
or a set of libraries? Up to now, I have developed it as an
application because that is what R is, and that is how I use it. But
an obvious advantage of using Clojure instead of R is that it is
a better language for general application development, and this
requires Incanter to be a set of libraries and not an application. I
obviously want it to be both :)


2) The chart functions are macros, so that they can use the
expressions passed to them (without evaluation) as the default chart
labels; this is based on the behavior from R.
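
A minimal sketch of why a macro is needed for this, assuming a hypothetical make-chart function that stands in for a real chart constructor (neither make-chart nor labeled-chart is Incanter's API): the macro sees the argument as unevaluated code, so it can turn that code into a label string before passing the evaluated data along.

```clojure
;; Hypothetical stand-in for a real chart constructor; it just records
;; the data and the label so the effect is easy to inspect.
(defn make-chart [xs x-label]
  {:x xs :x-label x-label})

;; The macro receives x-expr as an unevaluated form, so (str x-expr)
;; yields the source text of the expression. A plain function would
;; only ever see the evaluated data, not the expression that made it.
(defmacro labeled-chart [x-expr]
  `(make-chart ~x-expr ~(str x-expr)))

;; The default label is the expression itself, as in R.
(labeled-chart (range 3))
```

The trade-off is the one raised in Brad's point 2: macros cannot be passed around as values or used with apply, so a common compromise is a function that does the work plus a thin macro that only supplies the default label.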


3) I'm all for additional flexibility in the data structures that can
be passed to the chart functions.


4) Hmm, integrating Apache Commons Math is a bit tricky. It was
the underlying numeric library in an earlier incarnation of Incanter.
It does have some nice functionality but it was slow compared to Colt
and was missing some features I needed. Seamlessly supporting both
libraries might work, except in the case of matrices, which are Colt
matrix objects. Conversion would be required for any Apache Commons
function with a matrix argument.

And then you have a dependency on two numeric libraries, unless you
modularize it so that you have two slightly incompatible stats
libraries each depending on either Apache Commons or Parallel Colt. I
think a strong argument would have to be made for the benefits of
Apache Commons-math before I thought that was worthwhile.


Overall, I think most of the changes you're recommending are not only
good, but necessary, and I look forward to working with you to make
them happen.

David

P.S. Congratulations on Flightcaster, it is awesome!

bradford cross

Sep 18, 2009, 11:14:51 AM
to inca...@googlegroups.com
On Fri, Sep 18, 2009 at 6:03 AM, David Edgar Liebke <lie...@gmail.com> wrote:

> Brad,
>
> This is excellent feedback!
>
> 1) I agree Incanter would benefit from being more modular, and I am
> interested in any further ideas you have on making it happen.
>
> I think you correctly identified the issue; is Incanter an application
> or a set of libraries? Up to now, I have developed it as an
> application because that is what R is, and that is how I use it. But
> an obvious advantage of using Clojure, instead of R, is because it is
> a better language for general application development, and this
> requires Incanter to be a set of libraries and not an application. I
> obviously want it to be both :)

Agree. The easiest way to do this is to use something like clojure-pom and modularize the project so we can build (and depend on) individual targets or build and package "all" (application style).

I can work on this - I already created a branch on GitHub for it. What is your typical workflow for git? How should I submit changes for review?

> 2) The chart functions are macros, so that they can use the
> expressions passed to them (without evaluation) as the default chart
> labels; this is based on the behavior from R.

Ah...

> 3) I'm all for additional flexibility in the data structures that can
> be passed to the chart functions.

Cool, I'll start by adding things on demand and submitting them for review when they become fully (or at least partially) baked.

> 4) Hmm, Integrating the Apache Commons-math is a bit tricky. It was
> the underlying numeric library in an earlier incarnation of Incanter.
> It does have some nice functionality but it was slow compared to Colt
> and was missing some features I needed. Seamlessly supporting both
> libraries might work, except in the case of matrices, which are Colt
> matrix objects. Conversion would be required for any Apache Commons
> function with a matrix argument.
>
> And then you have a dependency on two numeric libraries, unless you
> modularize it so that you have two slightly incompatible stats
> libraries each depending on either Apache Commons or Parallel Colt. I
> think a strong argument would have to be made for the benefits of
> Apache Commons-math before I thought that was worthwhile.

Interesting that Colt is faster. Let's not add anything speculatively. If we find stuff we want that Colt doesn't have, we can look around for it at that time.

> Overall, I think most of the changes you're recommending are not only
> good, but necessary, and I look forward to working with you to make
> them happen.

Let the hacking commence!

> David
>
> P.S. Congratulations on Flightcaster, it is awesome!

Thanks!

David Edgar Liebke

Sep 18, 2009, 11:46:06 AM
to inca...@googlegroups.com
> Agree.  The easiest way to do this is use something like clojure-pom and
> modularize the project so we can build (and depend on) individual targets or
> build and package "all" (application style.)


Okay then, so you're going to drag me into the world of Maven?! I'll
try not to kick and scream too much :) -- I suppose it is time I
learn it.


>
> I can work on this - I created a branch on github to work on it already.
> What is your typical workflow for git?  How should I submit changes for
> review?

Just submit a pull request when you're ready.


> Let the hacking commence!
>

Sounds great!

David

bradford cross

Sep 18, 2009, 11:52:47 AM
to inca...@googlegroups.com
On Fri, Sep 18, 2009 at 8:46 AM, David Edgar Liebke <lie...@gmail.com> wrote:

> > Agree.  The easiest way to do this is use something like clojure-pom and
> > modularize the project so we can build (and depend on) individual targets or
> > build and package "all" (application style.)
>
> Okay then, so you're going to drag me into the world of Maven?! I'll
> try not to kick and scream too much :)  -- I suppose it is time I
> learn it.


That's how I felt initially as well, but it's not that evil if it is used for simple things in the right way.  All we want is simple build and dependency management, so it is pretty straightforward, especially when using clojure-pom.

egl

Sep 19, 2009, 11:34:54 PM
to Incanter
You should think carefully about governance: an understandable
governance structure is one of the things that encourages users and
volunteers to commit themselves. Open source projects can work with a
variety of structures, e.g. Benevolent Dictator For Life (Python) and
committee (R?), but they need some agreed-on mechanisms for making
design decisions and controlling quality.

If you don't want to use the Benevolent Dictator model, the R
structure might be worth studying: enhancement proposals, bug
tracking, testing, committers, etc.

David Edgar Liebke

Sep 19, 2009, 11:47:48 PM
to inca...@googlegroups.com
Good points, and certainly something to keep in mind as the number of
contributors grows.

egl

Sep 28, 2009, 3:58:41 PM
to Incanter
Agree with #1: I used Incanter to debug an optimization algorithm
this weekend. Fortunately the algorithm and its drivers were isolated
enough that I could do this under incanter-as-application, but in a
more complex environment this could easily get unworkable.