Help optimizing sparse matrix code

Joshua Tokle

unread,

Nov 10, 2014, 4:03:34 PM11/10/14

to julia...@googlegroups.com

Hello! I'm trying to replace an existing matlab code with julia and I'm having trouble matching the performance of the original code. The matlab code is here:
    https://github.com/jotok/InventorDisambiguator/blob/julia/Disambig.m

The program clusters inventors from a database of patent applications. The input data is a sparse boolean matrix (named XX in the script), where each row defines an inventor and each column defines a feature. For example, the jth column might correspond to a feature "first name is John". If there is a 1 in the XX[i, j], this means that inventor i's first name is John. Given an inventor i, we find similar inventors by identifying rows in the matrix that agree with XX[i, :] on a given column and then applying element-wise boolean operations to the rows. In the code, for a given value of `index`, C_lastname holds the unique column in XX corresponding to a "last name" feature such that XX[index, :] equals 1. C_firstname holds the unique column in XX corresponding to a "first name" feature such that XX[index, :] equals 1. And so on. The following code snippet finds all rows in the matrix that agree with XX[index, :] on full name and one of patent assignee name, inventory city, or patent class:

    lump_index_2 = step & ((C_assignee | C_city | C_class))

The `step` variable is an indicator that's used to prevent the same inventors from being considered multiple times. My attempt at a literal translation of this code to julia is here:
    https://github.com/jotok/InventorDisambiguator/blob/julia/disambig.jl

The matrix X is of type SparseMatrixCSC{Int64, Int64}. Boolean operations aren't supported for sparse matrices in julia, so I fake it with integer arithmetic. The line that corresponds to the matlab code above is

    lump_index_2 = find(step .* (C_name .* (C_assignee + C_city + C_class)))

The reason I grouped it this way is that initially `step` will be a "sparse" vector of all 1's, and I thought it might help to do the truly sparse arithmetic first.

I've been testing this code on a Windows 2008 Server. The test data contains 45,763 inventors and 274,578 possible features (in other words, XX is an 45,763 x 274,58 sparse matrix). The matlab program consistently takes about 70 seconds to run on this data. The julia version shows a lot of variation: it's taken as little as 60 seconds and as much as 10 minutes. However, most runs take around 3.5 to 4 minutes. I pasted one output from the sampling profiler here [1]. If I'm reading this correctly, it looks like the program is spending most of its time performing element-wise multiplication of the indicator vectors I described above.

I would be grateful for any suggestions that would bring the performance of the julia program in line with the matlab version. I've heard that the last time the matlab code was run on the full data set it took a couple days, so a slow-down of 3-4x is a signficant burden. I did attempt to write a more idiomatic julia version using Dicts and Sets, but it's slower than the version that uses sparse matrix operations:
    https://github.com/jotok/InventorDisambiguator/blob/julia/disambig2.jl

Thank you!
Josh

[1] https://gist.github.com/jotok/6b469a1dc0ff9529caf5

Milan Bouchet-Valat

unread,

Nov 10, 2014, 5:05:29 PM11/10/14

to julia...@googlegroups.com

You should be able to get a speedup by replacing this line with an
explicit `for` loop. First, you'll avoid memory allocation (one for each
+ or .* operation). Second, you'll be able to return as soon as the
index is found, instead of computing the value for all elements (IIUC
you're only looking for one index, right?).

My two cents

Mauro

unread,

Nov 11, 2014, 3:49:00 AM11/11/14

to julia...@googlegroups.com

I didn't look at your code but it sounds like you are doing row wise
operations. However, the sparse matrices in Julia (and in Matlab too, I
think) are much faster at column-wise access:

XX[:,i] is fast
XX[i,:] is slow

If you have to do both, then you can consider doing column-wise first
then transpose and do columns again.

Joshua Tokle

unread,

Nov 11, 2014, 7:52:09 AM11/11/14

to julia...@googlegroups.com

Thanks for your response. Because the operations are on sparse matrices I'm pretty sure the arithmetic is already more optimized than something I would write:
https://github.com/JuliaLang/julia/blob/master/base/sparse/sparsematrix.jl#L530
I did actually spend some time searching for sparse matrix arithmetic algorithms, but I didn't come up with anything that seemed like it would be an improvement.

It seems that the issue may be with garbage collection. I'm going to post a top-level reply with more on this.

Joshua Tokle

unread,

Nov 11, 2014, 7:55:12 AM11/11/14

to julia...@googlegroups.com

Thanks. The instances of XX[i, :] that appear in my post here are just pseudo-code. In the actual implementation only column-wise slices are used.

Joshua Tokle

unread,

Nov 11, 2014, 8:02:06 AM11/11/14

to julia...@googlegroups.com

I should have posted the output from @time in the initial post. The version of the code using sparse matrices is reporting 87% gc time. I thought I could fix the problem in the second version of the code that uses Sets instead of sparse matrices. Indeed, v2 only reports about 20% gc time, but the overall run time is still worse (about 1.6x the sparse matrix version).

In the sparse matrix version, several sparse vectors are being created in each loop iteration -- I suspect this is driving the garbage collection. If I was using dense arrays I would try preallocating buffers, but I don't know how to do this in a sparse setting. Are there any tricks for optimizing the create/destruction of ephemeral structs?

On Monday, November 10, 2014 5:05:29 PM UTC-5, Milan Bouchet-Valat wrote:

Milan Bouchet-Valat

unread,

Nov 11, 2014, 9:43:50 AM11/11/14

to julia...@googlegroups.com

Le mardi 11 novembre 2014 à 04:52 -0800, Joshua Tokle a écrit :
> Thanks for your response. Because the operations are on sparse
> matrices I'm pretty sure the arithmetic is already more optimized than
> something I would write:
>
> https://github.com/JuliaLang/julia/blob/master/base/sparse/sparsematrix.jl#L530
> I did actually spend some time searching for sparse matrix arithmetic
> algorithms, but I didn't come up with anything that seemed like it
> would be an improvement.

Of course the algorithm is optimized for sparse matrices, but every call
to + or .* creates a copy of the matrix, which is not efficient at all.

> It seems that the issue may be with garbage collection. I'm going to
> post a top-level reply with more on this.

I'm not an expert of sparse matrices, but one efficient way of avoid
temporary copies and garbage collection is to manually unroll your loop,
as I advised in my first message. Now I must admit it's more complex
than it seems if the structure of nonzeros is not always the same in the
various matrices you want to combine. Is that the case?

Regards

Regards

Joshua Tokle

unread,

Nov 11, 2014, 9:54:55 AM11/11/14

to julia...@googlegroups.com

I think I see: you're suggesting a single loop that performs the a*b*(c+d+e) operation element-wise, and this will prevent the allocation of intermediate results. Is that right?

Yes, the sparsity pattern will differ between the vectors I'm operating on, but maybe I can adapt the code from sparsematrix.jl to my special case.

Milan Bouchet-Valat

unread,

Nov 11, 2014, 10:11:22 AM11/11/14

to julia...@googlegroups.com

Le mardi 11 novembre 2014 à 09:54 -0500, Joshua Tokle a écrit :
> I think I see: you're suggesting a single loop that performs the
> a*b*(c+d+e) operation element-wise, and this will prevent the
> allocation of intermediate results. Is that right?

Yes.

> Yes, the sparsity pattern will differ between the vectors I'm
> operating on, but maybe I can adapt the code from sparsematrix.jl to
> my special case.

Indeed, this makes my proposition more challenging. A general algorithm
to combine several sparse matrices would be very useful.

Regards

Tony Kelman

unread,

Nov 11, 2014, 11:25:10 PM11/11/14

to julia...@googlegroups.com

At this point the most effective way to write decently-performing sparse matrix code in Julia is to familiarize yourself with the compressed sparse column format (http://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_.28CSC_or_CCS.29) and work directly on the colptr, rowval, and nzval vectors. This isn't much better than writing sparse matrix code in C (just keep in mind that the arrays are 1-based in Julia, whereas if you were writing a mex file or standalone C code they would be 0-based), but it's not too terrible once you've done it a few times.

Any time you are trying to do a standard elementwise, binary, or reduction operation across dimensions - min, max, sum, sumabs, sumabs2, etc - on sparse matrices and you see the current Julia version is performing poorly because the fallback version is dense, you can implement it manually without too much trouble by operating directly on the colptr, rowval, and nzval vectors. Pull requests to add additional sparse-specialized operations to sparsematrix.jl are welcome. I'd personally like to see a more general design of the array system long-term that would reduce the number of specialized methods that need to be written for each sparse matrix format, but that's a ways off.

Joshua Tokle

unread,

Nov 12, 2014, 2:39:15 PM11/12/14

to julia...@googlegroups.com

I unrolled the loop as suggested and it brought the runtime down to 85-100 seconds from around 220 seconds. Thanks! We're now approaching the speed of the original matlab code, which runs in about 70 seconds. I'll try to think of other was to bypass the unnecessary creation of temporary objects.

Out of curiosity, why do you suppose the matlab code performs so well on the original implementation compared to julia? Better sparse matrix operations? Is it better optimized for temporary objects? Something else?

Tony Kelman

unread,

Nov 12, 2014, 3:37:28 PM11/12/14

to julia...@googlegroups.com

Virtually all operations on sparse matrices in Matlab are dispatching out to specialized C code, largely (but not entirely) from Tim Davis' SuiteSparse library. There are Julia bindings to some pieces of that same library - CHOLMOD for sparse Cholesky factorization and UMFPACK for sparse LU factorization, but not too much else. We don't use SuiteSparse in Julia for things like indexing, slicing, reductions, sparse matrix-vector multiplication, sparse matrix-matrix multiplication, etc. Those operations are implemented in pure Julia, and as you've seen it's not very well optimized right now. And there are many operations for which no one has written a specialized method for SparseMatrixCSC yet, so it dispatches to the generic AbstractArray version which plainly does not work correctly for large sparse matrices. The short term solution is "write more of those methods," and every little bit helps.

Douglas Bates

unread,

Nov 12, 2014, 4:06:06 PM11/12/14

to julia...@googlegroups.com

Also, there is a disadvantage in leaning too hard on CHOLMOD or UMFPACK or CSparse code in that the licenses are GPL or LGPL. The goal is eventually to eliminate those dependencies.

Viral Shah

unread,

Nov 13, 2014, 10:36:23 PM11/13/14

to julia...@googlegroups.com

There is no reason we should be slower than Matlab's C implementation by more than a factor of 2, ideally 1.5. Not considering GC time, which is being addressed separately anyways, if you conclusively find certain sparse matrix operations slower than Matlab, please file an issue. There are always some code paths and certain operations that may not have been fully optimized just yet.

-viral

Viral Shah

unread,

Nov 13, 2014, 10:38:16 PM11/13/14

to julia...@googlegroups.com

Oops - correction. The goal is to be as fast or faster than Matlab/Octave/Scilab, but in certain cases, we may accept being within 1.5. So, report an issue if a particular sparse matrix operation is slower. Even if we can't address some such issues right away, it is good to have them in the issues to address later.

-viral

Reply all

Reply to author

Forward