How to save a Python dtype('object') in an HDF5 file?


nicot...@googlemail.com

Jul 23, 2013, 10:22:31 AM
to h5...@googlegroups.com

I have a problem storing np.arrays and have not been able to fix it so far. Maybe someone has an idea.

import numpy as np

a = np.arange(-180, 181)
b = np.arange(-82, 83)

grid = [[np.zeros(0, dtype=np.float32) for i in range(len(b))] for j in range(len(a))]
 
So I am creating a nested list of lists, each entry holding an np.array that I fill with several values.
 

So for grid[0][0] I might have np.array((1, 2.3, 2.5), dtype=np.float32), and if I try to export it to an HDF5 file via

import h5py

f = h5py.File('test.hdf5', 'w')
f.create_dataset('grid', data=grid)

I receive:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\...\python-2.7.3.amd64\lib\site-packages\h5py\_hl\group.py", line 71, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "C:\...\python-2.7.3.amd64\lib\site-packages\h5py\_hl\dataset.py", line 89, in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
  File "h5t.pyx", line 1361, in h5py.h5t.py_create (h5py\h5t.c:12530)
  File "h5t.pyx", line 1433, in h5py.h5t.py_create (h5py\h5t.c:12380)
TypeError: Object dtype dtype('object') has no native HDF5 equivalent

I tried changing the np.arrays to lists, so that:

grid = [ [ [] for i in range(len(b)) ] for j in range(len(a)) ]
grid[0][0] = [1,2.3,2.5]
 
and each entry of the grid is a list, but the export to test.hdf5 still fails with the same error.
 
Does anyone have an idea how to store 3-dimensional arrays (or nested lists) with h5py? Unfortunately the documentation does not cover such a case.
Thanks in advance!

Thomas Caswell

Jul 23, 2013, 8:55:15 PM
to h5...@googlegroups.com
I am confused by what you are trying to do.  

HDF5 doesn't know how to deal with Python objects (such as lists). h5py maps HDF5 functionality onto numpy, but the capabilities of HDF5 are a subset of numpy's. You cannot save to disk a numpy array whose dtype is `object`, or anything that cannot be cleanly coerced to a numpy array with a supported dtype.
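To see where the `object` dtype comes from, here is a minimal numpy-only sketch (the helper name `coerced_dtype` is hypothetical). A nested list whose innermost lists all have the same length coerces to a regular float array, while a ragged one can only be represented with `dtype=object`, which has no HDF5 equivalent. Note that numpy 1.24 and later raise a `ValueError` for ragged input unless `dtype=object` is passed explicitly; older versions silently produced an object array.

```python
import numpy as np


def coerced_dtype(nested):
    """Return the dtype numpy picks for a nested list, or None if it refuses.

    Hypothetical helper for illustration only.
    """
    try:
        return np.array(nested).dtype
    except ValueError:
        # numpy >= 1.24 refuses to build ragged arrays implicitly;
        # only an explicit dtype=object can hold them.
        return None


regular = [[[1.0, 2.3, 2.5], [0.0, 0.1, 0.2]]]  # every cell has length 3
ragged = [[[1.0, 2.3, 2.5], []]]                # cell lengths differ (3 vs 0)

print(coerced_dtype(regular))                   # float64: storable in HDF5
print(np.array(ragged, dtype=object).dtype)     # object: no HDF5 equivalent
```

The regular case becomes a (1, 2, 3) float64 array that `create_dataset` accepts; the ragged case can only become a (1, 2) array of Python objects.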

Saving n-dimensional arrays to files is easy, for example

import numpy
import h5py

f = h5py.File('test.hdf5', 'w', driver='core', backing_store=False)  # in memory only; nothing is written to disk
grid = numpy.random.rand(5, 5, 5)
f.create_dataset('grid', data=grid)

--
Thomas Caswell
tcas...@gmail.com

Nico Trebbin

Jul 24, 2013, 5:02:38 AM
to h5...@googlegroups.com
What I am trying to do is create a 2-D "grid" with 361x165 elements, where I store several np.arrays (of different lengths!) as values for the third dimension. In the end I have an array (or in my case a nested list) with shape = (361, 165, var), where var is an np.array with shape (varL,). Because my third dimension is variable and not fixed for each grid point, I can't use an ndarray from the beginning: some grid points might still have an empty np.array after appending several values.
 
So I think the problem is that my third dimension is not fixed in length, and therefore h5py fails, right?
Any idea how to circumvent this problem?
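One common workaround (not something proposed in this thread) is to store the ragged grid as two regular arrays: a flat 1-D array with all values concatenated, and an integer array of shape (361, 165) holding the number of values per grid point. Both have fixed shapes and ordinary dtypes, so h5py can write them as two separate datasets, e.g. with `f.create_dataset('values', data=values)` and `f.create_dataset('counts', data=counts)`. This is a sketch under assumptions: the names `pack_ragged` and `unpack_cell` are hypothetical, and cells are concatenated in row-major order.

```python
import numpy as np


def pack_ragged(grid):
    """Flatten a nested list grid[i][j] (1-D arrays) into (values, counts).

    Hypothetical helper: cells are concatenated in row-major order and
    counts[i, j] holds how many values belong to grid[i][j].
    """
    counts = np.array([[np.size(cell) for cell in row] for row in grid], dtype=np.int64)
    cells = [np.asarray(cell, dtype=np.float32) for row in grid for cell in row]
    values = np.concatenate(cells) if cells else np.zeros(0, dtype=np.float32)
    return values, counts


def unpack_cell(values, counts, i, j):
    """Return the 1-D array stored for grid point (i, j)."""
    # offsets[k] is where the k-th cell (row-major order) starts in `values`;
    # recomputed on every call for simplicity, cache it for repeated lookups.
    offsets = np.concatenate(([0], np.cumsum(counts.ravel())))
    k = np.ravel_multi_index((i, j), counts.shape)
    return values[offsets[k]:offsets[k + 1]]


# Tiny 2 x 2 example grid with one empty cell
grid = [[np.array([1, 2.3, 2.5], dtype=np.float32), np.zeros(0, dtype=np.float32)],
        [np.array([4.0], dtype=np.float32), np.array([5.0, 6.0], dtype=np.float32)]]
values, counts = pack_ragged(grid)
print(counts.tolist())                    # [[3, 0], [1, 2]]
print(unpack_cell(values, counts, 1, 1))  # [5. 6.]
```

Compared with padding, this wastes no space on fill values and does not reserve NaN as a marker, at the cost of an extra lookup step when reading a cell.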



nicot...@googlemail.com

Jul 24, 2013, 7:18:47 AM
to h5...@googlegroups.com
Ok, I somehow solved it.
 
h5py seems to have a problem with my variable third dimension. If I pad every grid point with NaN up to a common length, h5py is able to store the data.
For completeness, here is my solution:
 
1) Find the maximum length of the np.arrays
 
max_len = 0
for d1 in range(len(a)):
    for d2 in range(len(b)):
        max_len = max(max_len, np.size(grid[d1][d2]))
 
2) Append NaNs to fill each array up to max_len
 
for d1 in range(len(a)):
    for d2 in range(len(b)):
        # pad each cell with NaN so that every cell has length max_len
        grid[d1][d2] = np.append(grid[d1][d2], np.full(max_len - np.size(grid[d1][d2]), np.nan, dtype=np.float32))
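The same NaN-padding idea can also be written to build the final (361, 165, max_len) float array in one go, which h5py can then store directly with `f.create_dataset('grid', data=padded)`. A minimal sketch, assuming `grid` is a nested list of 1-D float arrays as described above; `pad_to_array` and `cell_values` are hypothetical names. Because NaN marks the padding, this assumes the real data never contains NaN.

```python
import numpy as np


def pad_to_array(grid, fill=np.nan):
    """Copy a nested list grid[i][j] of 1-D arrays into an (nx, ny, max_len) array.

    Positions beyond each cell's own length are set to `fill` (NaN by default).
    """
    nx, ny = len(grid), len(grid[0])
    max_len = max(np.size(cell) for row in grid for cell in row)
    out = np.full((nx, ny, max_len), fill, dtype=np.float32)
    for i, row in enumerate(grid):
        for j, cell in enumerate(row):
            out[i, j, :np.size(cell)] = cell
    return out


def cell_values(padded, i, j):
    """Recover the original values of grid point (i, j) by dropping the NaN padding."""
    cell = padded[i, j]
    return cell[~np.isnan(cell)]


# Tiny 2 x 2 example grid with one empty cell
grid = [[np.array([1, 2.3, 2.5], dtype=np.float32), np.zeros(0, dtype=np.float32)],
        [np.array([4.0], dtype=np.float32), np.array([5.0, 6.0], dtype=np.float32)]]
padded = pad_to_array(grid)
print(padded.shape)               # (2, 2, 3)
print(cell_values(padded, 1, 1))  # [5. 6.]
```

Allocating the full array once with `np.full` avoids repeated `np.append` calls, each of which copies the whole cell.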
 

Andrew Collette

Jul 24, 2013, 9:22:31 AM
to h5...@googlegroups.com
Hi Nico,

On Wed, Jul 24, 2013 at 5:02 AM, Nico Trebbin
<nicot...@googlemail.com> wrote:
> What I am trying to do, is to create a 2d "grid" with 361x165 elements,
> where I store several (different length!) np.arrays as values for the 3d
> dimension. In the end I have an array (or in my case nested list) with the
> shape = (361,165,var), where var is a np.array with the shape (varL,). Due
> to the fact that my 3rd dimension is variable and not fixed for each grid
> point, I can't use a ndarray right from the beginning, because there might
> be gridpoints that still have an empty np.array after appending several
> values.

You're correct that right now h5py does not support such "ragged"
arrays (which is annoying, because HDF5 does). The native HDF5 way of
representing this is very close to your array-of-arrays solution; it
would be a 361x165 dataset of "variable-length" vectors. Right now
the only variable-length types supported in h5py are strings.
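For later readers: h5py 2.10 and newer (released well after this thread) provide `h5py.vlen_dtype()`, which makes it possible to create exactly this layout, a 2-D dataset whose elements are variable-length float vectors. A minimal sketch, assuming a current h5py; the file name and the 2 x 2 grid size are illustrative, and the core driver with `backing_store=False` keeps everything in memory.

```python
import numpy as np
import h5py

# Variable-length sequences of float32: each dataset element is a 1-D array
vlen_float32 = h5py.vlen_dtype(np.dtype('float32'))

with h5py.File('ragged_grid.h5', 'w', driver='core', backing_store=False) as f:
    # 2 x 2 grid for illustration; the thread's grid would be (361, 165)
    dset = f.create_dataset('grid', shape=(2, 2), dtype=vlen_float32)
    dset[0, 0] = np.array([1, 2.3, 2.5], dtype=np.float32)
    dset[0, 1] = np.zeros(0, dtype=np.float32)  # an empty cell
    dset[1, 0] = np.array([4.0], dtype=np.float32)
    dset[1, 1] = np.array([5.0, 6.0], dtype=np.float32)

    # Reading one element returns a 1-D float32 array of that cell's length
    first = dset[0, 0]
    lengths = [[len(dset[i, j]) for j in range(2)] for i in range(2)]

print(first)
print(lengths)
```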

There's a tracking issue (#48) and a pull request (#291) on GitHub; the pull request adds read support for generic variable-length types, although it requires a large amount of testing. Community efforts for this feature are always welcome!

Andrew