Scipy sparse large matrix multiplication and PyTables


Philipp Singer

Aug 4, 2014, 1:58:58 PM
to pytable...@googlegroups.com
Hi!

I am struggling with very large-scale matrix multiplications. I have a scipy.sparse.csr_matrix with shape (350363, 2526183) and have to multiply it with its transpose. The sparse matrix easily fits into memory due to its sparsity. However, the result of X * X.T is very dense and does not fit in memory, so I thought of using PyTables to store it.

For updating the PyTables array I am using a chunked approach discussed here:

However, it is very slow, since slicing sparse matrices is not fast, and my data is immense.

Does anyone know how to best approach this?

Best,
Philipp

Anthony Scopatz

Aug 4, 2014, 4:56:55 PM
to Philipp Singer, pytable...@googlegroups.com
Hi Philipp,

Do you have something that works and is just slow? If so, could you send us a minimal script that we could take a look at? I think this might be slow no matter what, but maybe there is a way to make it less slow.

Be Well
Anthony



Philipp Singer

Aug 5, 2014, 4:50:26 AM
to pytable...@googlegroups.com
Sure, this is my current approach:

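import math
import numpy as np
import tables as tb

a = np.random.rand(300, 200)
b = a.T
l, n = a.shape[0], b.shape[1]  # the output has shape (l, n)

f = tb.open_file('dot.h5', 'w')
filters = tb.Filters(complevel=5, complib='blosc')
out = f.create_carray(f.root, 'out', tb.Atom.from_dtype(a.dtype),
                      shape=(l, n), filters=filters)

_MB = 2**20
OOC_BUFFER_SIZE = 1028*_MB * 2

# Largest power-of-two block width whose square fits in the buffer
buffersize = OOC_BUFFER_SIZE
bl = math.sqrt(buffersize / out.dtype.itemsize)
bl = 2**int(math.log(bl, 2))
# Fill the output one block of columns at a time
for i in range(0, n, bl):
    out[:, i:min(i+bl, n)] = a.dot(b[:, i:min(i+bl, n)])

f.close()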
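
For the sparse case described at the start of the thread, one possible variant is to walk over blocks of rows instead of blocks of columns, since row slices of a CSR matrix are cheap while column slices are not. The sketch below is only an illustration under assumptions: the function name gram_by_row_blocks and the block_rows parameter are hypothetical, and a NumPy array stands in for the output. Any object that accepts NumPy-style slice assignment, such as a PyTables CArray, could be passed as out instead.

```python
import numpy as np
import scipy.sparse as sp


def gram_by_row_blocks(X, out, block_rows):
    """Write X @ X.T into `out`, one block of rows at a time.

    X          : scipy.sparse matrix of shape (m, k)
    out        : (m, m) array-like supporting slice assignment
    block_rows : rows of the result computed per step (hypothetical knob)
    """
    X = sp.csr_matrix(X)
    Xt = X.T.tocsr()  # convert the transpose once, outside the loop
    m = X.shape[0]
    for start in range(0, m, block_rows):
        stop = min(start + block_rows, m)
        # Row-slicing CSR is cheap; only this block is ever densified
        block = X[start:stop] @ Xt
        out[start:stop, :] = block.toarray()
    return out


# Small stand-in data; the real matrix in the thread is far larger
X = sp.random(50, 80, density=0.1, format="csr", random_state=0)
out = np.zeros((50, 50))
gram_by_row_blocks(X, out, block_rows=16)
```

Only one dense block of shape (block_rows, m) is held in memory at a time, so block_rows can be chosen to fit the available RAM. Whether this is actually faster on disk depends on how the output array is chunked.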


I hope I have not forgotten something important while copying.

Best,
Philipp

Elliott Ash

Oct 3, 2016, 9:29:27 PM
to pytables-users
Did you ever figure out the best way to do this? The column slices on PyTables are pretty slow. How can I get the transpose of a CArray object?
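
The thread does not contain an answer to the transpose question. One way it might be approached is to write a transposed copy tile by tile, so that neither the read nor the write has to touch whole columns at once. The sketch below is an assumption-laden illustration, not a PyTables API: transpose_in_tiles and tile are hypothetical names, and NumPy arrays stand in for the source and destination. It relies only on NumPy-style slice reads and writes, which a 2-D PyTables CArray also supports.

```python
import numpy as np


def transpose_in_tiles(src, dst, tile):
    """Copy the transpose of 2-D `src` into `dst`, one square tile at a time.

    src  : (r, c) array-like supporting slice reads
    dst  : (c, r) array-like supporting slice assignment
    tile : edge length of each tile (hypothetical knob)
    """
    rows, cols = src.shape
    for i in range(0, rows, tile):
        i2 = min(i + tile, rows)
        for j in range(0, cols, tile):
            j2 = min(j + tile, cols)
            # Read a small tile, transpose it in memory, write it mirrored
            dst[j:j2, i:i2] = np.asarray(src[i:i2, j:j2]).T
    return dst


# Stand-in arrays; on disk these would be two separate arrays
src = np.arange(35, dtype=float).reshape(5, 7)
dst = np.empty((7, 5))
transpose_in_tiles(src, dst, tile=3)
```

For the product X * X.T discussed above, the result is symmetric, so its transpose equals the matrix itself and reading rows in place of columns would give the same values.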