[SciPy-User] Can I copy a sparse matrix into an existing dense numpy matrix?


Conrad Lee

Feb 5, 2012, 10:05:39 AM
to scipy...@scipy.org

Say I have a huge numpy matrix A taking up tens of gigabytes. It takes a non-negligible amount of time to allocate this memory.

Let's say I also have a collection of scipy sparse matrices with the same dimensions as the numpy matrix. Sometimes I want to convert one of these sparse matrices into a dense matrix to perform some vectorized operations that can't be performed on sparse matrices.

Can I load one of these sparse matrices into A rather than re-allocate space each time I want to convert a sparse matrix into a dense matrix? The .toarray() and .todense() methods which are available on scipy sparse matrices do not seem to take an optional dense array argument, but maybe there is some other way to do this.

(I've also started a stackoverflow version of this question here.)

Thanks,

Conrad Lee
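
Note for readers on later SciPy releases: `toarray()` has since gained an optional `out` argument, which was not available when this question was asked. A minimal sketch, assuming a SciPy version that supports `out` and a preallocated buffer whose shape and dtype match the sparse matrix (the names `m` and `buf` are illustrative only):

```python
import numpy as np
import scipy.sparse as sp

# A small stand-in for one of the sparse matrices in the collection.
m = sp.csr_matrix(np.array([[0.0, 0.0, 1.0],
                            [2.0, 0.0, 3.0]]))

# Stand-in for the large preallocated dense array; it holds stale values.
buf = np.ones(m.shape, dtype=m.dtype)

# Assumption: this SciPy version accepts `out`. The buffer must have the
# same shape and dtype as the sparse matrix and be C- or F-contiguous.
m.toarray(out=buf)

print(buf)
```

If the installed SciPy does not accept `out`, the index-assignment approaches suggested in the replies still apply.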

Warren Weckesser

Feb 5, 2012, 10:21:01 AM
to SciPy Users List

If your sparse matrix is in coo format, you can use fancy indexing to assign the values to the existing array.  For example:

In [29]: import scipy.sparse as sp

In [30]: import numpy as np

In [31]: a = sp.coo_matrix([[0,0,1,0],[0,0,0,0],[2,0,3,0],[0,4,0,0]])

In [32]: d = np.zeros((4,4), dtype=np.int32)

In [33]: a.todense()
Out[33]:
matrix([[0, 0, 1, 0],
        [0, 0, 0, 0],
        [2, 0, 3, 0],
        [0, 4, 0, 0]])

In [34]: d[a.row, a.col] = a.data

In [35]: d
Out[35]:
array([[0, 0, 1, 0],
       [0, 0, 0, 0],
       [2, 0, 3, 0],
       [0, 4, 0, 0]])


Warren

Jonathan Guyer

Feb 6, 2012, 9:12:19 AM
to SciPy Users List

On Feb 5, 2012, at 10:21 AM, Warren Weckesser wrote:

> If your sparse matrix is in coo format, you can use fancy indexing to assign the values to the existing array.

Although, unless your sparsity pattern doesn't change (which it may not), you'll need to zero the entire dense array before reassigning, which will also take "a non-negligible amount of time".


Conrad Lee

Feb 6, 2012, 9:56:22 AM
to SciPy Users List
Warren, thanks for the suggestion with the COO matrix.  In general I'm storing sparse matrices in the CSR format for quick multiplication, so your approach would mean that I have to convert to a COO matrix every time, but that conversion is pretty quick.

> Although, unless your sparsity pattern doesn't change (which it may not), you'll need to zero the entire dense array before reassigning, which will also take "a non-negligible amount of time".

Zeroing out a matrix seems to happen very quickly, probably because it's a vectorized operation taking advantage of the SIMD instructions on modern processors.  As far as I understand it, allocating huge amounts of memory requires slower operations.  I did a quick and dirty benchmark, and zeroing takes a small fraction of the time of allocating.
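
One way to run a comparison of this kind is sketched below. It is not the benchmark mentioned above: `time_call` is a hypothetical helper, the array is far smaller than tens of gigabytes, and the results depend heavily on the operating system, the allocator, and the array size. It times zeroing an existing array against allocating and filling a fresh one.

```python
import time
import numpy as np

def time_call(fn, repeats=3):
    """Return the best wall-clock time of `fn` over a few runs (hypothetical helper)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

shape = (2000, 2000)   # about 32 MB of float64; scale up to taste
a = np.empty(shape)    # stands in for the big preallocated array

t_reuse = time_call(lambda: a.fill(0))             # zero in place
t_fresh = time_call(lambda: np.full(shape, 0.0))   # allocate, then write zeros

print("zero existing array:  %.4f s" % t_reuse)
print("allocate and fill new: %.4f s" % t_fresh)
```

Note that `np.zeros` can look much cheaper than it is on some systems, because the operating system may defer committing zeroed pages until they are first touched; `np.full` is used here so that the fresh array is actually written.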

Jonathan Guyer

Feb 6, 2012, 10:11:33 AM
to SciPy Users List

On Feb 6, 2012, at 9:56 AM, Conrad Lee wrote:

> I did a quick and dirty benchmark, and zeroing takes a small fraction of the time of allocating.

Good to know.

Warren Weckesser

Feb 6, 2012, 11:21:09 AM
to SciPy Users List
On Mon, Feb 6, 2012 at 8:56 AM, Conrad Lee <conr...@gmail.com> wrote:
> Warren, thanks for the suggestion with the COO matrix.  In general I'm storing sparse matrices in the CSR format for quick multiplication, so your approach would mean that I have to convert to a COO matrix every time, but that conversion is pretty quick.


Conrad,

Here's an example of how you could do the assignment directly with a CSR matrix:

import numpy as np
from scipy.sparse import csr_matrix

# 'c' is a sparse matrix in CSR format.
c = csr_matrix([[0,0,1,0,0,0],
                [0,2,0,3,0,0],
                [0,0,0,0,0,0],
                [4,0,0,0,5,0]])

# 'a' is the dense array into which we'll copy the nonzero
# elements of 'c'
a = np.zeros(c.shape, dtype=c.dtype)

# The next line is the key part: it converts c.indptr into
# the row indices in the dense array. (c.indices already has
# the columns.)
rows = sum((m*[k] for k, m in enumerate(np.diff(c.indptr))), [])

a[rows, c.indices] = c.data

# print() calls so the script also runs on Python 3.
print(c.todense())
print(a)
print(np.all(c.todense() == a))


This might be more efficient than converting to COO.
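
For large matrices, building `rows` by summing Python lists can itself become slow, since each `sum` step copies the list built so far. A vectorized alternative, offered as a sketch and not as part of the original example, expands `c.indptr` with `np.repeat`:

```python
import numpy as np
from scipy.sparse import csr_matrix

c = csr_matrix(np.array([[0, 0, 1, 0, 0, 0],
                         [0, 2, 0, 3, 0, 0],
                         [0, 0, 0, 0, 0, 0],
                         [4, 0, 0, 0, 5, 0]]))

a = np.zeros(c.shape, dtype=c.dtype)

# np.diff(c.indptr) is the number of stored entries in each row;
# repeating each row number that many times yields one row index
# per stored entry, in the same order as c.indices and c.data.
rows = np.repeat(np.arange(c.shape[0]), np.diff(c.indptr))

a[rows, c.indices] = c.data

print(rows)
print(a)
```

When reusing `a` for a different matrix, it still needs to be zeroed first (for example with `a.fill(0)`) unless the sparsity pattern is identical.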

Warren

