slicing and sum over sparse matrices


RJ Honicky

May 19, 2009, 3:45:54 PM5/19/09
to cvx...@googlegroups.com
Hi all,

First of all, thanks to all who contribute to this package! It's
generally fantastic, and very flexible for sparse matrix
manipulation. I'm doing some Gaussian process regression with it.

I'm trying to get the diagonal of the product of two very large,
sparse matrices.

Since there is no function to sum by axes, I'm first using cvxopt.mul
and then I'm slicing by the unique indices of the columns of the
result and summing up the results.

This works fine for small matrices, but on large matrices I'm getting
a memory error on the slice. It just says "MemoryError" with no
further message (on Ubuntu 9.0.x).

Also, the sum function doesn't seem to actually do anything (the docs
say it should sum all the non-zero values in the matrix), so I have to
use __reduce__() to get the data values and sum them.

Here's the code:

def product_diagonal(A, B):

    assert A.size[0] == B.size[1]

    M = cvxopt.mul(A.T, B)
    M_r = M.__reduce__()
    I = unique(M_r[1][2])

    if len(I) == 0:
        return cvxopt.spmatrix([0.], [0], [0], (1, A.size[0]))

    return cvxopt.spmatrix([sum(M[:, i].__reduce__()[1][0]) for i in I],
                           cvxopt.matrix(0, (1, I.shape[0])),
                           I,
                           (1, A.size[0]))

Any ideas?

rj

Joachim Dahl

May 19, 2009, 4:10:49 PM5/19/09
to cvx...@googlegroups.com
Are you sure that the sum of a sparse matrix does nothing? On my machine,
I get this:

>>> print A
[ 2.00e+00 1.23e+00 -1.63e+00 0 0 1.53e+00]
[ 0 0 0 -6.57e-01 1.09e+00 0 ]
[-1.09e+00 2.32e+00 -1.37e+00 -1.30e+00 -7.02e-01 -8.58e-01]
[-5.60e-01 1.11e+00 1.29e+00 1.03e+00 -1.07e+00 -2.05e+00]
[ 7.19e-01 -1.13e+00 5.12e-01 -5.16e-01 1.73e+00 -1.41e+00]
[ 0 1.37e+00 -6.13e-01 0 0 0 ]

>>> sum(A)
0.98032373529189965



By far the fastest way to do this indexing is if you know
the sparsity pattern of C = A*B in advance. Then I
would extract the diagonal as
>>> d = (A*B).V[I]
where I is an index list into (A*B).V. If you frequently
need to carry out this indexing for the same
sparsity pattern, it's probably worthwhile to
find I once and compute it like that.

Alternatively, you can use single-argument indexing,
i.e., index A*B in column-order mode as
>>> d = (A*B)[::n+1]
Since (A*B) is stored in compressed-column storage, this is
not particularly efficient, but we tried to make indexing
and slicing moderately efficient given those constraints.
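To see why the stride n+1 picks out the diagonal, here is a toy sketch with a plain Python list standing in for the column-major storage (not CVXOPT code, just the indexing arithmetic):

```python
# In column-major order, element (i, j) of an n-by-n matrix sits at
# linear index j*n + i, so the diagonal entry (i, i) sits at i*(n+1).
n = 3
# column-major flattening of [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
colmajor = [1, 4, 7, 2, 5, 8, 3, 6, 9]
diag = colmajor[::n + 1]   # picks linear indices 0, 4, 8 -> the diagonal
```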

If you obtain out-of-memory errors, I suspect you are
indexing the sparse matrix using two arguments, e.g.,
>>> ck = (A*B)[I,k]
In this indexing mode, we actually unpack column k
as a dense column, and subsequently perform fast
indexing into that one. For very large matrices,
you will run out of memory, but we decided on this
compromise to have fast two-argument indexing for
moderate-size sparse matrices.

In a future release, we can consider optimizing the
slice expression
>>> A[::n+1]
for a square sparse matrix of dimension n, as this
is probably frequently useful.

Joachim

RJ Honicky

May 20, 2009, 3:17:19 AM5/20/09
to cvx...@googlegroups.com
Hi there Joachim,

Oh, duh on the sum part of the question: I was importing scipy using

from scipy import *

so I was actually using scipy's sum function. It works fine when I
use the builtin.

I'm actually not using the * operator, since that would explicitly
calculate all of the elements of the matrix. Rather, I'm only
calculating the diagonal using cvxopt.mul to do element-wise
multiplication, and then summing by column. I don't know the sparsity
pattern of the matrices. In numpy, with dense matrices, I would use:

d = numpy.sum(A*B, axis=1)

to do the equivalent thing. Interestingly, I can set a breakpoint
right before the final line, and then try to figure out where the
memory error is coming from:

Surprisingly, the following generates a MemoryError:

[ sum(M[:,i]) for i in I[:1]]

but M[:,0]

does not, although I[0] == 0.

This happens even when M is relatively small, say 1000x500000 (with,
say, 1% nonzero).

Even if the slice is converted to a dense matrix, it should be trivial
to allocate a 1000x1 matrix.
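(For concreteness, here is a tiny dense sketch of the quantity I'm after -- diag(A*B) as the column sums of the element-wise product of A.T and B; plain Python, just to pin down the math:)

```python
# diag(A @ B)[i] = sum_k A[i, k] * B[k, i], which is the i-th column
# sum of the element-wise product of A.T and B.
A = [[1.0, 2.0],
     [3.0, 4.0]]
B = [[5.0, 6.0],
     [7.0, 8.0]]
n = len(A)
d = [sum(A[i][k] * B[k][i] for k in range(n)) for i in range(n)]
```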

I actually gave up on this approach and just added a getter to the C
implementation of spmatrix. Here's a patch in case anyone's
interested (I'm sure it's a sloppy one, and it's accidentally a
reverse patch, but FWIW):

2606,2618d2605
< static PyObject *spmatrix_sum_columns(spmatrix *self, void *closure)
< {
<   matrix *A = Matrix_New(SP_NCOLS(self), 1, DOUBLE);
<   if (!A) return PyErr_NoMemory();
<
<   int_t k, j;
<   for (k=0; k<SP_NCOLS(self); k++)
<     for (j=SP_COL(self)[k]; j<SP_COL(self)[k+1]; j++)
<       MAT_BUFD(A)[k] += SP_VALD(self)[j];
<
<   return (PyObject *)A;
< }
<
2669d2655
< {"col_sum", (getter) spmatrix_sum_columns, NULL, "sum of columns"},

Apply it to sparse.c, and you can use A.col_sum to get the sum of the
columns of a sparse matrix. So cvxopt.mul(A,B).col_sum gives you the
diagonal of the product of sparse A and B without unnecessary
calculations.
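(For anyone who doesn't want to rebuild the C extension: the loop in the patch can be mirrored in pure Python over the compressed-column arrays. Here colptr and values play the roles of SP_COL and SP_VALD above:)

```python
def col_sum_csc(colptr, values, ncols):
    # Python mirror of the spmatrix_sum_columns getter in the patch:
    # column k's stored values occupy values[colptr[k]:colptr[k+1]].
    sums = [0.0] * ncols
    for k in range(ncols):
        for j in range(colptr[k], colptr[k + 1]):
            sums[k] += values[j]
    return sums

# 3-column example: column 0 stores {1.0, 2.0}, column 1 stores {5.0},
# column 2 is empty.
sums = col_sum_csc([0, 2, 3, 3], [1.0, 2.0, 5.0], 3)
```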

rj

Joachim Dahl

May 20, 2009, 3:51:51 AM5/20/09
to cvx...@googlegroups.com
what version of CVXOPT are you using, and are you using a 32- or
64-bit version?

Your example is surprising, and seems like a bug. On my machine I don't
seem to have that problem:

>>> A=spmatrix([],[],[],(100000,10000000))
>>> s = [ sum(A[:,k]) for k in range(A.size[1]) ]

doesn't give a memory error.

Regarding the column sum: I think the same can be achieved as

base.gemv(A, matrix(1.0, (A.size[0],1)), y, trans='T')

This function is similar to blas.gemv with default values for
strides etc., but it allows A to be sparse.
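The identity this relies on -- y = A^T x with x the all-ones vector gives the column sums of A -- can be checked with a plain-Python sketch (no CVXOPT needed):

```python
# Check that A^T * ones equals the column sums of A.
A = [[1.0, 2.0, 0.0],
     [0.0, 3.0, 4.0]]
ones = [1.0] * len(A)
col_sums = [sum(A[i][k] * ones[i] for i in range(len(A)))
            for k in range(len(A[0]))]
```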

I am amazed that anyone is actually able to patch the C code!

Joachim Dahl

May 20, 2009, 1:03:03 PM5/20/09
to cvx...@googlegroups.com
Perhaps I have misunderstood the objective, but if it
is to compute diag(A*B), then the best way to do that in CVXOPT
is probably to use base.gemm() with partial=True and C a sparse
diagonal matrix, which updates only the
elements in the sparsity pattern of C. For example:

>>> from cvxopt import *
>>> from cvxopt import base
>>> A = spmatrix([1,2,3,4,5,6,7],[0,3,1,3,2,3,3],[0,0,1,1,2,2,3])
>>> B = A.T; B.V += 1
>>> print A
[ 1.00e+00     0         0         0    ]
[    0      3.00e+00     0         0    ]
[    0         0      5.00e+00     0    ]
[ 2.00e+00  4.00e+00  6.00e+00  7.00e+00]

>>> print B
[ 2.00e+00     0         0      3.00e+00]
[    0      4.00e+00     0      5.00e+00]
[    0         0      6.00e+00  7.00e+00]
[    0         0         0      8.00e+00]

>>> print A*B
[ 2.00e+00     0         0      3.00e+00]
[    0      1.20e+01     0      1.50e+01]
[    0         0      3.00e+01  3.50e+01]
[ 4.00e+00  1.60e+01  3.60e+01  1.24e+02]

>>> C = spmatrix(0.0,range(4),range(4))
>>> base.gemm(A,B,C,partial=True)
>>> print C
[ 2.00e+00     0         0         0    ]
[    0      1.20e+01     0         0    ]
[    0         0      3.00e+01     0    ]
[    0         0         0      1.24e+02]

>>>

Afterwards,  you have the diagonal elements in C.V.

Joachim

RJ Honicky

May 21, 2009, 3:02:29 PM5/21/09
to cvx...@googlegroups.com
Hi there Joachim,

First of all, thanks for the pointers to the blas-like functions. I
didn't see them. That's exactly what I need. I guess I should have
RTFM :)

Here's how you can reproduce the bug in CVXOPT 1.1.1 built from a
tarball (also ubuntu package python-cvxopt):

In [1]: from scipy import *

In [2]: import sprandmtrx

In [3]: A = sprandmtrx.sp_rand(1000,100000, 0.01)

In [4]: I = unique(A.J)

In [5]: I[0]
Out[5]: 0

In [6]: A[:,0]
Out[6]: <1000x1 sparse matrix, tc='d', nnz=11>

In [7]: A[:,I[0]]
Segmentation fault

honicky@grouchy:/proc$ cat version
Linux version 2.6.28-11-server (buildd@crested) (gcc version 4.3.3
(Ubuntu 4.3.3-5ubuntu4) ) #42-Ubuntu SMP Fri Apr 17 02:45:36 UTC 2009

Note that this is a SMP kernel. Provoking it this way produces a
segfault, but I think it's the same bug.

This doesn't occur on macos x with the same version of CVXOPT.

rj

Joachim Dahl

May 21, 2009, 3:47:08 PM5/21/09
to cvx...@googlegroups.com
thanks for the bug report - it's less critical than I first thought;
CVXOPT seems to choke on the numpy integer used for indexing
(it ought to just report an invalid argument in the slice):

>>> type(I[0])
<type 'numpy.int64'>

If you typecast the index argument into an int or a cvxopt.matrix, then the
crash disappears.

We will fix it in the next release.
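On the caller's side the workaround looks like this (a sketch using numpy only, not CVXOPT itself):

```python
import numpy as np

# numpy index routines return numpy scalar integers, not Python ints;
# casting before slicing avoids the crash in this CVXOPT version.
I = np.unique(np.array([3, 0, 3, 7]))
idx = I[0]           # a numpy integer scalar such as numpy.int64
safe_idx = int(idx)  # plain Python int; use e.g. A[:, safe_idx]
```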

RJ Honicky

May 22, 2009, 12:42:59 AM5/22/09
to cvx...@googlegroups.com
Hi Joachim,

I just confirmed that this also fixes the Memory Error I was seeing.
Thanks a lot for all your help!

rj