sparse matrices and cython

689 views
Skip to first unread message

Ian

unread,
Aug 24, 2009, 1:47:46 PM8/24/09
to APAM Python Users
Does anyone have experience populating sparse python matrices using
Cython code? Here's the issue:
Cython loops are optimized when the data structures are C-types.
However, the scipy.sparse matrices are python objects. I could just
construct a sparse C-array, but then I can't take advantage of the
nice python methods (e.g. slicing) for populating matrices. Anyone
know of a solution? Also, dot products with sparse matrices (in
python) are much slower than non-sparse, e.g.
In [1]: import scipy as sp; from scipy import sparse
In [2]: A = sparse.lil_matrix((1000,1000))
In [3]: A.setdiag(sp.rand(1000))
In [4]: A = A.tocsr()
In [5]: x = sp.rand(1000)
In [6]: timeit sp.dot(A,x)
10 loops, best of 3: 110 ms per loop
In [7]: timeit sp.dot(A.todense(),x)
100 loops, best of 3: 3.33 ms per loop

Jake Hofman

unread,
Aug 24, 2009, 1:53:14 PM8/24/09
to apam-pyt...@googlegroups.com
if you're just looking to construct sparse matrices efficiently, maybe
sparse.dok_matrix would be useful?

http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.dok_matrix.html
"Dictionary Of Keys based sparse matrix. This is an efficient
structure for constructing sparse matrices incrementally."

if not, maybe this msg (or previous in the thread) would be helpful?

http://mail.scipy.org/pipermail/scipy-user/2006-August/008998.html

btw, weave is also nice for doing inline C in python code.

http://www.scipy.org/Weave

-j

Ethan Coon

unread,
Aug 24, 2009, 2:40:47 PM8/24/09
to apam-pyt...@googlegroups.com
I've been using PETSc (and petsc4py) for things which I care about speed
of matrix inversions/multiplications:

import numpy, scipy
from petsc4py import PETSc
A = PETSc.Mat().create()
A.setType('aij')
A.setSizes(1000)
u = A.getVecLeft()
u[...] = scipy.rand(1000)
u.assemble()
A.setDiagonal(u)
u.zeroEntries()
u[...] = scipy.rand(1000)
v = u.duplicate()
timeit A.mult(u,v)
100000 loops, best of 3: 14.7 µs per loop

(for comparison, on my machine...)
In [20]: A = sparse.lil_matrix((1000,1000))
In [21]: A.setdiag(sp.rand(1000))
In [22]: A = A.tocsr()
In [23]: x = sp.rand(1000)
In [24]: timeit sp.dot(A,x)
10 loops, best of 3: 195 ms per loop

In [25]: timeit sp.dot(A.todense(),x)
100 loops, best of 3: 4.32 ms per loop

If you want the speed of C, you have to use C-arrays for your matrices
and vectors, which means either giving up the pythonic slicing or giving
up the speed of C.

Ethan

--
-------------------------------------------
Ethan Coon
DOE CSGF - Graduate Student
Dept. Applied Physics & Applied Mathematics
Columbia University
212-854-0415

http://www.ldeo.columbia.edu/~ecoon/
-------------------------------------------

Ethan Coon

unread,
Aug 24, 2009, 3:08:32 PM8/24/09
to apam-pyt...@googlegroups.com
On Mon, 2009-08-24 at 10:47 -0700, Ian wrote:
> Does anyone have experience populating sparse python matrices using
> Cython code? Here's the issue:
> Cython loops are optimized when the data structures are C-types.
> However, the scipy.sparse matrices are python objects. I could just
> construct a sparse C-array, but then I can't take advantage of the
> nice python methods (e.g. slicing) for populating matrices. Anyone
> know of a solution?

On another note, realize that using a pythonic/sliced array in Cython
would likely be slower than just calculating the values in C and then
inserting them via slices in python. The Cython code-generator would
recognize that the sliced arrays were python, then convert the values to
python data types to insert them. You'll want to populate python data
structures with python slicing and python arrays, or populate C data
structures with looped indexing and calculated values, not a mix of the
two.

That said, it's probably fastest to create a set of C data arrays in
Cython without slicing, and then use them as input to csr_matrix() (or
PETSc.Mat.setValues() )

Ethan


> Also, dot products with sparse matrices (in
> python) are much slower than non-sparse, e.g.
> In [1]: import scipy as sp; from scipy import sparse
> In [2]: A = sparse.lil_matrix((1000,1000))
> In [3]: A.setdiag(sp.rand(1000))
> In [4]: A = A.tocsr()
> In [5]: x = sp.rand(1000)
> In [6]: timeit sp.dot(A,x)
> 10 loops, best of 3: 110 ms per loop
> In [7]: timeit sp.dot(A.todense(),x)
> 100 loops, best of 3: 3.33 ms per loop
>
>

Reply all
Reply to author
Forward
0 new messages