ANN: Sparse matrix support for Clojure with vectorz-clj 0.28.0

Mike Anderson

unread,

Dec 27, 2014, 4:56:55 AM12/27/14

to numerica...@googlegroups.com, inca...@googlegroups.com, clo...@googlegroups.com

Here is a little belated Christmas present for Clojure data aficionados:

;; setup

(use 'clojure.core.matrix)

(set-current-implementation :vectorz)

;; create a big sparse matrix with a trillion elements (initially zero)

(def A (new-sparse-array [1000000 1000000]))

;; we are hopefully smart enough to avoid printing the whole array!

A

=> #<SparseRowMatrix Large matrix with shape: [1000000,1000000]>

;; mutable setter operations supported so that you can set individual sparse elements

(dotimes [i 1000]

(mset! A (rand-int 1000000) (rand-int 1000000) (rand-int 100)))

;; all standard core.matrix operations supported

(esum A)

=> 50479.0

;; efficient addition

(time (add A A))

=> "Elapsed time: 12.849859 msecs"

;; matrix multiplication / inner products actually complete in sensible time

;; (i.e. much faster than than the usual O(n^3) which might take a few thousand years)

(time (mmul A (transpose A)))

=> "Elapsed time: 2673.085171 msecs"

Some nice things to note about the implementation:

- Everything goes through the core.matrix API, so your code won't have to change to use sparse matrices :-)

- Sparse matrices are 100% interoperable with non-sparse (dense) matrices

- Sparse arrays are fully mutable. Management of storage / indexing happens automatically

- It isn't just matrices - you can have sparse vectors, N-dimensional arrays etc.

- Code is pure JVM - no native dependencies to worry about

This is all still very much alpha - so any comments / patches / more rigorous testing much appreciated!

Mars0i

unread,

Dec 28, 2014, 8:15:32 AM12/28/14

to numerica...@googlegroups.com, inca...@googlegroups.com, clo...@googlegroups.com

This is cool. Kind of makes me wish my sparse matrices were larger than 300x300. I'll experiment with it anyway.

Matt Revelle

unread,

Dec 28, 2014, 1:12:05 PM12/28/14

to numerica...@googlegroups.com, inca...@googlegroups.com, clo...@googlegroups.com

Glad to see the addition of new-sparse-array to core.matrix. It looks like it defaults to SparseRowMatrix for the Vectorz implementation? Should the API provide a way to specify which sparse matrix representation (e.g., row- vs column-based, indexed vs hashed) should be used? I'd suggest a 3-arity new-sparse-array which takes a keyword indicating the representation to use as well as a new function which returns a list of available representations for a specific implementation.

I think at this point you incorporated (looks like we have some duplication too, doh) all the changes I had made for sparse matrix support in Vectorz, but will verify.

On Saturday, December 27, 2014 4:56:55 AM UTC-5, Mike Anderson wrote:

Mike Anderson

unread,

Dec 28, 2014, 7:28:03 PM12/28/14

to numerica...@googlegroups.com, inca...@googlegroups.com, clo...@googlegroups.com

Interesting idea. The challenge is that I'm not sure how to add representation specification in an implementation independent way. It's a quirk of vectorz that it has both indexed and hashed storage, I probably wouldn't expect any other implementations to have that. Likewise row and column oriented storage are fairly obvious choices but I still wouldn't expect every implementation to support both.

Any idea how you would specify the API?

I guess we could simply pass an optional map argument of options, but behaviour would be completely implementation specific.

On Monday, 29 December 2014 02:12:05 UTC+8, Matt Revelle wrote:

Glad to see the addition of new-sparse-array to core.matrix. It looks like it defaults to SparseRowMatrix for the Vectorz implementation? Should the API provide a way to specify which sparse matrix representation (e.g., row- vs column-based, indexed vs hashed) should be used? I'd suggest a 3-arity new-sparse-array which takes a keyword indicating the representation to use as well as a new function which returns a list of available representations for a specific implementation.

I think at this point you incorporated (looks like we have some duplication too, doh) all the changes I had made for sparse matrix support in Vectorz, but will verify.

I definitely haven't covered all the potential code paths - in particular a lot of things aren't yet optimised for sparse operations. So any review / patches would be appreciated!

Matt Revelle

unread,

Dec 28, 2014, 8:43:54 PM12/28/14

to numerica...@googlegroups.com, inca...@googlegroups.com, clo...@googlegroups.com

On Dec 28, 2014, at 7:28 PM, Mike Anderson <mike.r.an...@gmail.com> wrote:

Interesting idea. The challenge is that I'm not sure how to add representation specification in an implementation independent way. It's a quirk of vectorz that it has both indexed and hashed storage, I probably wouldn't expect any other implementations to have that. Likewise row and column oriented storage are fairly obvious choices but I still wouldn't expect every implementation to support both.

Any idea how you would specify the API?

I guess we could simply pass an optional map argument of options, but behaviour would be completely implementation specific.

I think the map is the way to go. You’re probably correct about few other implementations having as many options, but adding a map of “preferences” seems like a good option. Creating a sparse matrix might then look like:

;; preferences as a separate arg

(new-sparse-array [100000 100000] :vectorz {:order :row :indexed true})

;; an alternative, preferences combined with implementation selection

(new-sparse-array [100000 100000] {:impl :vectorz :order :row :indexed true})

Implementations should throw an exception if they don’t support (or understand) the preferences.

On Monday, 29 December 2014 02:12:05 UTC+8, Matt Revelle wrote:
Glad to see the addition of new-sparse-array to core.matrix. It looks like it defaults to SparseRowMatrix for the Vectorz implementation? Should the API provide a way to specify which sparse matrix representation (e.g., row- vs column-based, indexed vs hashed) should be used? I'd suggest a 3-arity new-sparse-array which takes a keyword indicating the representation to use as well as a new function which returns a list of available representations for a specific implementation.

I think at this point you incorporated (looks like we have some duplication too, doh) all the changes I had made for sparse matrix support in Vectorz, but will verify.

I definitely haven't covered all the potential code paths - in particular a lot of things aren't yet optimised for sparse operations. So any review / patches would be appreciated!

I did some optimization of sparse ops but the code probably needs to be cleaned up before submitting (e.g., generalized and/or moved to the correct level in class hierarchy). Those changes were made hastily when I needed to quickly get a program running fast.

A branch containing all performance changes based on an older revision of the develop branch is available here:

https://github.com/mattrepl/vectorz/tree/sparse-speed

There is a related sparse-speed branch in my forks of vectorz-clj and core.matrix.

We should also look into other sparse array representations for Vectorz from: Matlab, MTJ (https://github.com/fommil/matrix-toolkits-java, specifically the LinkedSparseMatrix for row and column ops), etc.

-Matt

--
You received this message because you are subscribed to a topic in the Google Groups "Numerical Clojure" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/numerical-clojure/LLpq4WHx-k8/unsubscribe.
To unsubscribe from this group and all its topics, send an email to numerical-cloj...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mike Anderson

unread,

Dec 28, 2014, 9:58:14 PM12/28/14

to numerica...@googlegroups.com, inca...@googlegroups.com, clo...@googlegroups.com

Looks like you have some good changes in your Vectorz branch - any chance you could tidy up and make a PR?

I like the idea of specialised getSlices and getColumns in particular - these should be much faster than getting the slices one-by-one if the data is very sparse.

To unsubscribe from this group and all its topics, send an email to numerical-clojure+unsubscribe@googlegroups.com.

Matt Revelle

unread,

Dec 29, 2014, 8:56:30 AM12/29/14

to numerica...@googlegroups.com, inca...@googlegroups.com, clo...@googlegroups.com

Yes, will do.

To unsubscribe from this group and all its topics, send an email to numerical-cloj...@googlegroups.com.

Mike Anderson

unread,

Jan 10, 2015, 11:40:18 PM1/10/15

to numerica...@googlegroups.com, inca...@googlegroups.com, clo...@googlegroups.com