Q: Improving core.matrix implementations


Mike Anderson

Jun 30, 2015, 4:43:44 PM
to numerica...@googlegroups.com
Hi all,

I'm interested in views on what the focus should be for improving core.matrix implementations. I'm going to spend some time on Clojure numerical coding over the next few months, so I'd like to know what people would find most useful and improve things where I can.

Just as a reminder: we currently have three reasonably complete core.matrix implementations (a quick sketch of switching between them follows the list):
- The default core.matrix implementations which work with Java arrays, nested Clojure vectors and custom NDArrays (arbitrary element types) - https://github.com/mikera/core.matrix
- Clatrix - a native implementation that uses BLAS via JBlas for accelerated operations (double values only) - https://github.com/tel/clatrix
- vectorz-clj - a pure-JVM N-dimensional array implementation (double values only) - https://github.com/mikera/vectorz-clj
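
For anyone new to the thread: the point of the common API is that the same code runs against any of these. A minimal sketch of switching implementations (this assumes the vectorz-clj and Clatrix jars are on the classpath):

(require '[clojure.core.matrix :as m])

;; Construct against a specific implementation by keyword:
(m/mmul (m/matrix :vectorz [[1 2] [3 4]])
        (m/matrix :vectorz [[5 6] [7 8]]))
;; => [[19.0 22.0] [43.0 50.0]], backed by Vectorz

;; Or set a session-wide default:
(m/set-current-implementation :clatrix)
(m/matrix [[1 2] [3 4]])   ; now backed by Clatrix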

Some additional context:
- Of the implementations above, vectorz-clj is usually the fastest, although Clatrix wins for a few operations on large matrices thanks to its BLAS optimisations (matrix multiply etc.)
- The work that Dragan Djuric has done on Neanderthal has shown that it's possible to get even better performance than Clatrix with smarter JNI interfaces. This doesn't yet have core.matrix support, but it could easily be added; just some protocol implementations are needed (see the sketch after this list).
- I've experimented a bit and think that it would be relatively easy to add the same native optimisations as Neanderthal to vectorz-clj (see e.g.: https://github.com/mikera/vectorz-native which uses the neanderthal-atlas JNI bindings to add native support to Vectorz)
- There are a bunch of experimental core.matrix implementations out there (e.g. https://github.com/mikera/core.matrix.complex for complex numbers)
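
To make the "just some protocol implementations" point concrete, here's a rough sketch of the shape of that work, using a hypothetical wrapper type (a real Neanderthal bridge would extend Neanderthal's actual matrix classes, and would also need PImplementation plus the construction and setter protocols):

(require '[clojure.core.matrix.protocols :as mp])

;; Hypothetical row-major 2D matrix type, for illustration only
(deftype MyMatrix [^doubles data ^long rows ^long cols])

(extend-type MyMatrix
  mp/PDimensionInfo
  (dimensionality [m] 2)
  (get-shape [m] [(.rows m) (.cols m)])
  (is-scalar? [m] false)
  (is-vector? [m] false)
  (dimension-count [m dim] (if (== dim 0) (.rows m) (.cols m)))
  mp/PIndexedAccess
  (get-1d [m i] (throw (UnsupportedOperationException. "2D only")))
  (get-2d [m i j] (aget ^doubles (.data m) (+ (* (long i) (.cols m)) (long j))))
  (get-nd [m indexes] (mp/get-2d m (first indexes) (second indexes))))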

I can see a bunch of opportunities to improve core.matrix implementations, e.g.:
a) Focus on pure-JVM implementation (improve vectorz-clj)
b) Focus on native implementation (either improve Clatrix, make Neanderthal work as a core.matrix implementation, or create a vectorz-clj native implementation)
c) Focus on alternative implementations (complex numbers, Spark integration, GPU implementations etc.)
d) Extend core.matrix itself (both in terms of API and default implementations for Clojure data structures)
e) Forget implementations and work on documentation, examples etc.

I'd love to hear views on what people think is most valuable right now. 

As always, actual improvements happen based on what people are interested and motivated to contribute, but understanding what is in most demand is a good start! 

  Mike.

Shriphani Palakodety

Jun 30, 2015, 5:00:18 PM
to numerica...@googlegroups.com
Mike,

How about we try to achieve parity with numpy and help with getting Incanter onto core.matrix?

Shriphani


Mike Anderson

Jun 30, 2015, 5:55:11 PM
to numerica...@googlegroups.com, shrip...@gmail.com
Thanks for the input, Shriphani; that certainly makes sense.

Any particular areas where you feel we are lacking vs. numpy? (I know there are still a few gaps, just curious what is most valuable for you?)

The Incanter 1.9 development branch is already running on core.matrix; it just needs a bit of a push (especially on testing and documentation) to make a nice 2.0 release.

kovas boguta

Jun 30, 2015, 7:50:27 PM
to numerica...@googlegroups.com
Speaking of GPU implementations:

Might be worth looking into wrapping the BID Data project's libraries, which have posted some pretty impressive benchmarks in ML applications. See the homepage: http://bid2.berkeley.edu/bid-data-project/

Shriphani Palakodety

Jun 30, 2015, 8:09:49 PM
to numerica...@googlegroups.com
Kovas,

A GPU-backed core.matrix is a brilliant idea, and definitely a boost for the clj+ml ecosystem.

Mike,
Off the top of my head: random sampling (e.g. sampling from an n-dimensional Gaussian): http://docs.scipy.org/doc/numpy/reference/routines.random.html

Solving systems of equations?
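
For concreteness, the kind of API I have in mind, next to what I understand to be core.matrix's closest existing pieces (a sketch; I'm assuming the clojure.core.matrix.random and clojure.core.matrix.linear namespaces, and an implementation like vectorz that supports solve):

(require '[clojure.core.matrix :as m]
         '[clojure.core.matrix.random :as rnd]
         '[clojure.core.matrix.linear :as li])

;; numpy.random.randn analogue: a 2x3 array of N(0,1) samples
(rnd/sample-normal [2 3])

;; numpy.linalg.solve analogue: solve Ax = b
(let [a (m/matrix :vectorz [[3 1] [1 2]])
      b (m/matrix :vectorz [9 8])]
  (li/solve a b))
;; => approximately [2.0 3.0]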

Best,
S.

Mars0i

Jun 30, 2015, 8:33:23 PM
to numerica...@googlegroups.com
I don't have strong feelings about this, and I have been using matrices less lately than in the past, so don't take my vote to count for very much.  I was struck by Dragan Djuric's claims about Neanderthal's speed, even though benchmarks are only ... well, benchmarks.  I'd be in favor of either getting Neanderthal to work with core.matrix, or adding similar optimizations to clatrix or vectorz-clj--whatever's easier and/or seems like it will run fastest on non-huge matrices.  I have no idea what's involved in any of these tasks, though.

Thanks Mike!

Mars0i

Jun 30, 2015, 8:36:06 PM
to numerica...@googlegroups.com
On Tuesday, June 30, 2015 at 7:09:49 PM UTC-5, Shriphani Palakodety wrote:
Off the top of my head, random sampling (sampling from nd gaussian etc): http://docs.scipy.org/doc/numpy/reference/routines.random.html

Incanter does some of this.  (Or does numpy integrate sampling from distributions with matrices in some way?  I don't know numpy.)

Alexey Cherkaev

Jul 1, 2015, 4:34:15 AM
to numerica...@googlegroups.com
Hi Mike,

I am interested in higher-level code being added to `core.matrix`; most of it, unfortunately, is not trivial:
  • LU, QR, SVD decompositions (I think some of it is already there?)
  • Eigenvectors and eigenvalues of the matrix
  • Matrix condition number
Regards,
Alexey

Mike Anderson

Jul 1, 2015, 5:02:35 AM
to numerica...@googlegroups.com, alexey....@gmail.com
Hi Alexey,

Thanks for the comments!

We should have LU, QR, SVD and Eigen-decompositions all working right now (at least for Clatrix and vectorz-clj). Can you take a look and let me know if there are any issues / things you think are wrong or missing?


We don't have an implementation for condition number yet... but I think it can be derived from the SVD, at least according to http://mathworld.wolfram.com/ConditionNumber.html. I created a new issue here: https://github.com/mikera/core.matrix/issues/242
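
For the record, that derivation is short: the 2-norm condition number is just the ratio of the largest to the smallest singular value. A sketch against the current API (assuming vectorz, and that svd returns the singular values under :S):

(require '[clojure.core.matrix :as m]
         '[clojure.core.matrix.linear :as li])

(defn condition-number
  "Ratio of largest to smallest singular value (2-norm condition number)."
  [a]
  (let [sv (m/eseq (:S (li/svd a)))]
    (/ (apply max sv) (apply min sv))))

(condition-number (m/matrix :vectorz [[2 0] [0 0.5]]))
;; => 4.0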

  Mike.

Mike Anderson

Jul 1, 2015, 5:11:29 AM
to numerica...@googlegroups.com, mars...@logical.net
On Wednesday, 1 July 2015 01:33:23 UTC+1, Mars0i wrote:
I don't have strong feelings about this, and I have been using matrices less lately than in the past, so don't take my vote to count for very much.  I was struck by Dragan Djuric's claims about Neanderthal's speed, even though benchmarks are only ... well, benchmarks.  I'd be in favor of either getting Neanderthal to work with core.matrix, or adding similar optimizations to clatrix or vectorz-clj--whatever's easier and/or seems like it will run fastest on non-huge matrices.  I have no idea what's involved in any of these tasks, though.

Thanks Mike!

I already created an example repo that shows how to integrate Neanderthal's ATLAS bindings into vectorz-clj, and it looks pretty easy: https://github.com/mikera/vectorz-native

The main issue I see right now, to be honest, is getting a decent working cross-platform build; Dragan does some clever stuff with JNI that I'm not sure is easy to port.

netlib-java might also be a good option: https://github.com/fommil/netlib-java
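
For anyone curious, calling BLAS through netlib-java from Clojure is fairly direct. A rough sketch of a 2x2 dgemm (assumes the netlib-java core jar is on the classpath; it falls back to its pure-Java F2J backend when no native BLAS is installed):

(import 'com.github.fommil.netlib.BLAS)

(let [blas (BLAS/getInstance)      ; native BLAS if available, else F2J
      a (double-array [1 2 3 4])   ; 2x2 matrices, column-major order
      b (double-array [5 6 7 8])
      c (double-array 4)]
  ;; C := 1.0 * A * B + 0.0 * C
  (.dgemm blas "N" "N" 2 2 2 1.0 a 2 b 2 0.0 c 2)
  (vec c))
;; => [23.0 34.0 31.0 46.0]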
