sparse matrices in Clojure - data scientist new to Clojure


maria

Dec 16, 2016, 9:59:28 AM
to Numerical Clojure
Hi,

I am a data scientist and a Clojure/JVM newbie. I have a Python background, where I used the SciPy sparse matrix library a lot, since I mostly work on natural language processing applications where the data is mostly sparse.

I have been trying your core.matrix vectorz sparse array implementation (thank you for your great contribution!), but after running some benchmarks I got some confusing results. In particular, I found that a single mmul operation on two sparse matrices (68 x 72743) and (72743 x 100) was prohibitively slow, whereas doing multiple mmul operations, each multiplying a sparse (1 x 72743) array by a full matrix (72743 x 100), was much, much faster.

Do you have any insight into why that would be the case?

Also, do you have any sources or documentation I could read? I am new to Clojure, so I am still finding it a bit difficult to go directly to the source code.

In any case, thanks so much for your contribution again.


Mike Anderson

Dec 20, 2016, 4:46:01 AM
to Numerical Clojure
Hi Maria,

Glad you are finding the vectorz implementation useful!

You're probably running into cases that aren't yet optimised. Performance of sparse matrix operations is currently very dependent on the layout of the data, and in particular on the types of the first and second arguments. This isn't yet very well documented, as I'm still experimenting with sparse implementations and only optimising specific cases as and when I run into performance issues.

Basically, what is happening is that the core.matrix operations do some quick checks and then delegate to the relevant Java method, in this case MatrixClassName.innerProduct(...). You may get some insights from reading the Java source code for Vectorz, e.g. https://github.com/mikera/vectorz/blob/develop/src/main/java/mikera/matrixx/impl/SparseRowMatrix.java

If you file an issue (on https://github.com/mikera/vectorz/issues ) and let me know the *exact* types and sizes of the two matrices being multiplied, then I can take a look and maybe optimise those cases.

You can get the types of the matrix arguments by calling the built-in `type` function, e.g.

(type (sparse (new-array :vectorz [100 100])))
=> mikera.matrixx.impl.SparseRowMatrix
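
For an issue report, the type and shape of both arguments can be collected in one go. The following is only a minimal sketch, not part of core.matrix: `describe` is a hypothetical helper name, the shapes are small placeholders rather than the sizes from the benchmark, and it assumes vectorz-clj is on the classpath so that the `:vectorz` implementation can be loaded.

```clojure
(require '[clojure.core.matrix :as m])

(defn describe
  "Hypothetical helper: returns the concrete class and the shape of a matrix.
   `shape` may return a Java array, so it is wrapped in `vec` for readable printing."
  [mat]
  {:type  (type mat)
   :shape (vec (m/shape mat))})

;; Placeholder matrices built the same way as in the example above.
;; Replace these with the actual matrices passed to mmul.
(def a (m/sparse (m/new-array :vectorz [68 500])))
(def b (m/sparse (m/new-array :vectorz [500 100])))

(println "first argument: " (describe a))
(println "second argument:" (describe b))
```

Pasting the two printed maps into the issue gives the exact classes and sizes involved in the slow multiplication.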

Mars0i

Jan 3, 2017, 12:27:29 PM
to Numerical Clojure
Maria, if you want general Clojure information, there are a number of useful links here: http://clojure.org/community/resources. It's difficult to suggest learning materials for someone without knowing them personally. I have the impression that Clojure for the Brave and True is a good source for beginners, but that it nevertheless goes into deeper topics. It's available online and as a printed book. Clojure Programming is my favorite Clojure book. It's clearly written, and covers topics I've seen covered nowhere else, even in books that are considered advanced. I suspect that the introductory material in this book goes a little bit more quickly than Brave and True, but you might prefer that. There are many other good sources, and no doubt others have different preferences.