How to read sub dataset, a partial dataset


Solimyr

May 9, 2013, 11:47:49 AM5/9/13
to h5...@googlegroups.com
Dear All,

I'm new to h5py. I want to load an array stored at the path /root/level1. This is my code:
 
import h5py
import numpy
f = h5py.File('C:\\Temp\\CSKS2_SCS_B_S2_21_HH_RD_SF_20100731165458_20100731165505.h5','r')
group = f['root/level1']
print group

The result of the print is:

<HDF5 dataset "level1": shape (10000, 11000, 2), type "<i2">

If I want to load the two 10,000 x 11,000 NumPy arrays, how can I do that? I tried the line below, but I know something is missing — something that explains how to actually read the data.

my_array = f['root/level1'].value

Here is the result:

Traceback (most recent call last):
  File "<pyshell#37>", line 1, in <module>
    my_array = f['root/level1'].value
  File "C:\Python27_32\lib\site-packages\h5py\_hl\dataset.py", line 141, in value
    return self[()]
  File "C:\Python27_32\lib\site-packages\h5py\_hl\dataset.py", line 354, in __getitem__
    arr = numpy.ndarray(mshape, new_dtype, order='C')
MemoryError

I checked the docs without success. Do you know how I could do this?

Thanks All,
Solimyr 

Andrew Collette

May 9, 2013, 1:06:34 PM5/9/13
to h5...@googlegroups.com
Hi,

>> <HDF5 dataset "level1": shape (10000, 11000, 2), type "<i2">
>> my_array = f['root/level1'].value

>> in __getitem__
>> arr = numpy.ndarray(mshape, new_dtype, order='C')
>> MemoryError

This means you don't have enough memory to read in the whole dataset
at once. You have 10,000 x 11,000 x 2 elements x (2 bytes each) =
440,000,000 bytes, roughly 420 MiB, which evidently is too much to
allocate in one contiguous block.

Your best bet is to load only the parts you want, using the standard
NumPy slicing syntax. There's a section in the docs here:

http://www.h5py.org/docs/high/dataset.html#slicing-access
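As a minimal sketch of what that looks like (the file name here is made up, but the dataset path and shape mirror yours): indexing the dataset handle with a slice reads only that region from disk, so you never allocate the full 420 MiB array.

```python
import numpy as np
import h5py

# Build a tiny file with the same layout, just so the example runs
# on its own. In your case the file already exists.
with h5py.File('example.h5', 'w') as f:
    f.create_dataset('root/level1',
                     data=np.arange(24, dtype='<i2').reshape(4, 3, 2))

with h5py.File('example.h5', 'r') as f:
    dset = f['root/level1']    # just a handle; nothing is read yet
    block = dset[0:2, :, 0]    # reads only this slab into memory
    print(block.shape)         # (2, 3)
```

In your case something like `dset[0:1000, 0:1000, 0]` would pull in a 1000 x 1000 tile of the first of the two channels, and you can loop over tiles to process the whole dataset without ever holding it all in RAM.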

HTH,
Andrew

Solimyr

May 9, 2013, 5:40:39 PM5/9/13
to h5...@googlegroups.com
Done! Thank You Very Much!

Solimyr