ANN: Sparse matrix support for Clojure with vectorz-clj 0.28.0

211 views
Skip to first unread message

Mike Anderson

unread,
Dec 27, 2014, 4:56:55 AM12/27/14
to numerica...@googlegroups.com, inca...@googlegroups.com, clo...@googlegroups.com
Here is a little belated Christmas present for Clojure data aficionados:

;; setup
(use 'clojure.core.matrix)
(set-current-implementation :vectorz)

;; create a big sparse matrix with a trillion elements (initially zero)
(def A (new-sparse-array [1000000 1000000]))

;; we are hopefully smart enough to avoid printing the whole array!
A
=> #<SparseRowMatrix Large matrix with shape: [1000000,1000000]>

;; mutable setter operations supported so that you can set individual sparse elements
(dotimes [i 1000]
     (mset! A (rand-int 1000000) (rand-int 1000000) (rand-int 100)))

;; all standard core.matrix operations supported
(esum A)
=> 50479.0

;; efficient addition
(time (add A A))
=> "Elapsed time: 12.849859 msecs"

;; matrix multiplication / inner products actually complete in sensible time
;; (i.e. much faster than than the usual O(n^3) which might take a few thousand years)
(time (mmul A (transpose A)))
=> "Elapsed time: 2673.085171 msecs"


Some nice things to note about the implementation:
- Everything goes through the core.matrix API, so your code won't have to change to use sparse matrices :-)
- Sparse matrices are 100% interoperable with non-sparse (dense) matrices
- Sparse arrays are fully mutable. Management of storage / indexing happens automatically
- It isn't just matrices - you can have sparse vectors, N-dimensional arrays etc.
- Code is pure JVM - no native dependencies to worry about

This is all still very much alpha - so any comments / patches / more rigorous testing much appreciated!



Mars0i

unread,
Dec 28, 2014, 8:15:32 AM12/28/14
to numerica...@googlegroups.com, inca...@googlegroups.com, clo...@googlegroups.com
This is cool.  Kind of makes me wish my sparse matrices were larger than 300x300.  I'll experiment with it anyway.

Matt Revelle

unread,
Dec 28, 2014, 1:12:05 PM12/28/14
to numerica...@googlegroups.com, inca...@googlegroups.com, clo...@googlegroups.com
Glad to see the addition of new-sparse-array to core.matrix. It looks like it defaults to SparseRowMatrix for the Vectorz implementation? Should the API provide a way to specify which sparse matrix representation (e.g., row- vs column-based, indexed vs hashed) should be used? I'd suggest a 3-arity new-sparse-array which takes a keyword indicating the representation to use as well as a new function which returns a list of available representations for a specific implementation.

I think at this point you incorporated (looks like we have some duplication too, doh) all the changes I had made for sparse matrix support in Vectorz, but will verify.


On Saturday, December 27, 2014 4:56:55 AM UTC-5, Mike Anderson wrote:

Mike Anderson

unread,
Dec 28, 2014, 7:28:03 PM12/28/14
to numerica...@googlegroups.com, inca...@googlegroups.com, clo...@googlegroups.com
Interesting idea. The challenge is that I'm not sure how to add representation specification in an implementation independent way. It's a quirk of vectorz that it has both indexed and hashed storage, I probably wouldn't expect any other implementations to have that. Likewise row and column oriented storage are fairly obvious choices but I still wouldn't expect every implementation to support both.

Any idea how you would specify the API?

I guess we could simply pass an optional map argument of options, but behaviour would be completely implementation specific. 


On Monday, 29 December 2014 02:12:05 UTC+8, Matt Revelle wrote:
Glad to see the addition of new-sparse-array to core.matrix. It looks like it defaults to SparseRowMatrix for the Vectorz implementation? Should the API provide a way to specify which sparse matrix representation (e.g., row- vs column-based, indexed vs hashed) should be used? I'd suggest a 3-arity new-sparse-array which takes a keyword indicating the representation to use as well as a new function which returns a list of available representations for a specific implementation.

I think at this point you incorporated (looks like we have some duplication too, doh) all the changes I had made for sparse matrix support in Vectorz, but will verify.

I definitely haven't covered all the potential code paths - in particular a lot of things aren't yet optimised for sparse operations. So any review / patches would be appreciated!

Matt Revelle

unread,
Dec 28, 2014, 8:43:54 PM12/28/14
to numerica...@googlegroups.com, inca...@googlegroups.com, clo...@googlegroups.com
On Dec 28, 2014, at 7:28 PM, Mike Anderson <mike.r.an...@gmail.com> wrote:

Interesting idea. The challenge is that I'm not sure how to add representation specification in an implementation independent way. It's a quirk of vectorz that it has both indexed and hashed storage, I probably wouldn't expect any other implementations to have that. Likewise row and column oriented storage are fairly obvious choices but I still wouldn't expect every implementation to support both.

Any idea how you would specify the API?

I guess we could simply pass an optional map argument of options, but behaviour would be completely implementation specific. 

I think the map is the way to go. You’re probably correct about few other implementations having as many options, but adding a map of “preferences” seems like a good option. Creating a sparse matrix might then look like:

;; preferences as a separate arg
(new-sparse-array [100000 100000] :vectorz {:order :row :indexed true})

;; an alternative, preferences combined with implementation selection
(new-sparse-array [100000 100000] {:impl :vectorz :order :row :indexed true})

Implementations should throw an exception if they don’t support (or understand) the preferences.

On Monday, 29 December 2014 02:12:05 UTC+8, Matt Revelle wrote:
Glad to see the addition of new-sparse-array to core.matrix. It looks like it defaults to SparseRowMatrix for the Vectorz implementation? Should the API provide a way to specify which sparse matrix representation (e.g., row- vs column-based, indexed vs hashed) should be used? I'd suggest a 3-arity new-sparse-array which takes a keyword indicating the representation to use as well as a new function which returns a list of available representations for a specific implementation.

I think at this point you incorporated (looks like we have some duplication too, doh) all the changes I had made for sparse matrix support in Vectorz, but will verify.

I definitely haven't covered all the potential code paths - in particular a lot of things aren't yet optimised for sparse operations. So any review / patches would be appreciated!

I did some optimization of sparse ops but the code probably needs to be cleaned up before submitting (e.g., generalized and/or moved to the correct level in class hierarchy). Those changes were made hastily when I needed to quickly get a program running fast.

A  branch containing all performance changes based on an older revision of the develop branch is available here:

There is a related sparse-speed branch in my forks of vectorz-clj and core.matrix.

We should also look into other sparse array representations for Vectorz from: Matlab, MTJ (https://github.com/fommil/matrix-toolkits-java, specifically the LinkedSparseMatrix for row and column ops), etc.

-Matt

--
You received this message because you are subscribed to a topic in the Google Groups "Numerical Clojure" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/numerical-clojure/LLpq4WHx-k8/unsubscribe.
To unsubscribe from this group and all its topics, send an email to numerical-cloj...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mike Anderson

unread,
Dec 28, 2014, 9:58:14 PM12/28/14
to numerica...@googlegroups.com, inca...@googlegroups.com, clo...@googlegroups.com
Looks like you have some good changes in your Vectorz branch - any chance you could tidy up and make a PR?

I like the idea of specialised getSlices and getColumns in particular - these should be much faster than getting the slices one-by-one if the data is very sparse.
To unsubscribe from this group and all its topics, send an email to numerical-clojure+unsubscribe@googlegroups.com.

Matt Revelle

unread,
Dec 29, 2014, 8:56:30 AM12/29/14
to numerica...@googlegroups.com, inca...@googlegroups.com, clo...@googlegroups.com
Yes, will do.

To unsubscribe from this group and all its topics, send an email to numerical-cloj...@googlegroups.com.

Mike Anderson

unread,
Jan 10, 2015, 11:40:18 PM1/10/15
to numerica...@googlegroups.com, inca...@googlegroups.com, clo...@googlegroups.com
Thanks Matt! I've just release Vectorz 0.45.0 including your changes.

A lot of sparse operations are much faster now!
Reply all
Reply to author
Forward
0 new messages