Save numpy.dot result matrix directly to HDF5 backed table without saving in memory


logstar Z

Sep 5, 2019, 7:49:47 PM
to pytables-users
Hi,

I am trying to compute a dot product whose result has shape (400000, 400000). As it will not fit into memory, I am wondering if I could write the result matrix directly to disk as an HDF5 file.

I tried the following, but it did not work:

```
import tables
import numpy as np

x = np.random.random((400000, 3))

f = tables.open_file('test.hdf', 'w')
atom = tables.Atom.from_dtype(x.dtype)
out_arr = f.create_array(f.root, 'somename1', atom=atom, shape=(400000, 400000))

np.dot(x, x.T, out=out_arr)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-3cd3f13a902d> in <module>()
----> 1 np.dot(x, x.T, out=out_arr)

TypeError: 'out' must be an array
```

I also tried using np.memmap as the output array. It worked, but the computation used only one process, so it may take a very long time.

I would appreciate it if someone could give me some pointers or solutions.

Thank you.

Daπid

Sep 6, 2019, 2:12:00 AM
to logstar Z, pytables-users
That is more than half a terabyte in single precision, and twice that in double precision. Are you sure you need the full matrix? Can you skip it somehow?
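
The size estimate can be checked with a few lines of arithmetic. This is only a sketch: the helper name `result_size_bytes` and the item-size table are illustrative, not part of NumPy or PyTables.

```python
# Bytes per element for the two common floating-point types.
ITEMSIZE = {"float32": 4, "float64": 8}


def result_size_bytes(n, dtype_name):
    """Size in bytes of a dense (n, n) matrix of the given dtype."""
    return n * n * ITEMSIZE[dtype_name]


n = 400_000
for name in ("float32", "float64"):
    size = result_size_bytes(n, name)
    print(f"{name}: {size / 1e9:.0f} GB ({size / 2**40:.2f} TiB)")
# float32: 640 GB (0.58 TiB)
# float64: 1280 GB (1.16 TiB)
```

Note that `np.random.random` returns float64, so the array in the original example falls in the larger case.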

If you want to use memory-mapped arrays in parallel, make sure NumPy is linked against a parallelised BLAS; np.show_config() should help you there. Another option is to roll the multiplication yourself in C, as I don't think BLAS will be of much help here. Pythran would be my accelerator of choice, but there are others.
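
One way to keep memory bounded with a disk-backed output is to compute the product one band of rows at a time and assign each band into the output by slicing. The following is a minimal sketch under stated assumptions, not a tuned implementation: the function name `blockwise_gram`, the block size, and the small demo shape are all illustrative. It uses np.memmap as the output, but any object that accepts 2-D slice assignment should work the same way, which as far as I know includes a PyTables array created with `f.create_carray(...)`.

```python
import os
import tempfile

import numpy as np


def blockwise_gram(x, out, block_rows):
    """Write x @ x.T into `out`, one horizontal band of rows at a time."""
    n = x.shape[0]
    xt = x.T  # transposed view, no copy
    for start in range(0, n, block_rows):
        stop = min(start + block_rows, n)
        # Only this (stop - start, n) band is held in memory at once.
        out[start:stop, :] = x[start:stop] @ xt
    return out


# Small demo sizes so the sketch runs quickly; the thread's case is n = 400000.
rng = np.random.default_rng(0)
x = rng.random((500, 3))

# Disk-backed output array in a temporary directory (left in place afterwards).
path = os.path.join(tempfile.mkdtemp(), "gram.dat")
out = np.memmap(path, dtype=x.dtype, mode="w+", shape=(500, 500))

blockwise_gram(x, out, block_rows=128)
out.flush()

# Compare against the in-memory result, which is feasible at this small size.
print(np.allclose(out, x @ x.T))
```

Each band is still an ordinary matrix product, so it goes through whatever BLAS NumPy is linked against. For n = 400000 in float64, a band of 1000 rows takes about 3.2 GB, so `block_rows` has to be chosen to fit the available memory.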


/David.

--
You received this message because you are subscribed to the Google Groups "pytables-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pytables-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/pytables-users/2fed769c-fe9f-49dd-be84-d4c0eb97629d%40googlegroups.com.

logstar Z

Sep 6, 2019, 4:54:23 PM
to pytables-users
Thank you for the pointers. I need the full dot product matrix, and I cannot skip it in my current method. 

I will look into the BLAS linking and Pythran. Hopefully, I will have some luck with them. I will also try to come up with some alternative methods that avoid the dot product computation.
