How to read fields from compound dataset dynamically

Jennifer Olsen

May 23, 2013, 11:07:29 PM
to h5...@googlegroups.com
Hi,

I am trying to read a subset of the columns (fields) in a dataset with a compound type. However, when I pass the field names as a list, I get an error. See below.

import h5py
import numpy

f = h5py.File('myfile.hdf5')
my_dtype = numpy.dtype([('field1', 'i'), ('field2', 'f')])
ds = f.create_dataset('ds', (3,3), dtype=my_dtype)

ds['field1','field2'] works just as expected ....

What I really want is to be able to select using a list of field names.  Any way I can accomplish this?  Looping through the list and requesting a field at a time is undesirable since it will reread the file every time (I assume, since it takes almost the same amount of time to read 1 field as it does 2 fields).

When I pass the list directly, I get an error. I also tried ds[*names], but that doesn't work either. See below:

names = ['field1', 'field2']
ds[names]

----

In [1]: import h5py
In [2]: import numpy
In [3]: f = h5py.File('myfile.hdf5')
In [4]: my_dtype = numpy.dtype([('field1', 'i'), ('field2', 'f')])
In [5]: ds = f.create_dataset('ds', (3,3), dtype=my_dtype)
In [6]: ds['field1','field2']
Out[6]:
array([[(0, 0.0), (0, 0.0), (0, 0.0)],
       [(0, 0.0), (0, 0.0), (0, 0.0)],
       [(0, 0.0), (0, 0.0), (0, 0.0)]],
      dtype=[('field1', '<i4'), ('field2', '<f4')])

In [7]: names = ['field1', 'field2']
In [8]: ds[names]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-e6921c938272> in <module>()
----> 1 ds[names]

/usr/lib/python2.7/dist-packages/h5py/_hl/dataset.pyc in __getitem__(self, args)
    312
    313         # Perform the dataspace selection.
--> 314         selection = sel.select(self.shape, args, dsid=self.id)
    315
    316         if selection.nselect == 0:

/usr/lib/python2.7/dist-packages/h5py/_hl/selections.pyc in select(shape, args, dsid)
     88             except Exception:
     89                 sel = FancySelection(shape)
---> 90                 sel[args]
     91                 return sel
     92

/usr/lib/python2.7/dist-packages/h5py/_hl/selections.pyc in __getitem__(self, args)
    449         self._id.select_none()
    450         for idx, vector in enumerate(argvector):
--> 451             start, count, step, scalar = _handle_simple(self.shape, vector)
    452             self._id.select_hyperslab(start, count, step, op=h5s.SELECT_OR)
    453

/usr/lib/python2.7/dist-packages/h5py/_hl/selections.pyc in _handle_simple(shape, args)
    514         else:
    515             try:
--> 516                 x,y,z = _translate_int(int(arg), length)
    517                 s = True
    518             except TypeError:

ValueError: invalid literal for int() with base 10: 'field1'


In [9]: h5py.version.version
Out[9]: '2.0.1'


Thank you in advance

Andrew Collette

May 23, 2013, 11:35:17 PM
to h5...@googlegroups.com
Hi,

> I am trying to read a subset of the columns in a dataset (compound type).
> However, when I try to do so using a list, I get an error. See below

> What I really want is to be able to select using a list of field names. Any
> way I can accomplish this? Looping through the list and requesting a field
> at a time is undesirable since it will reread the file every time (I assume,
> since it takes almost the same amount of time to read 1 field as it does 2
> fields).
>
> when I do this, I get an error. I also tried ds[*names], but that doesn't
> work either. See below:
>
> names = ['field1', 'field2']
> ds[names]

It seems crazy, but:

ds[tuple(names)]

will work fine.

This is an artifact of how __getitem__ works in Python; rather than
function-style *args, the method is always defined with a single
argument, like so:

def __getitem__(self, args)

When you index with multiple axes, e.g. foo[1,2,3], you're actually
passing a tuple (1,2,3) via this argument. There's actually code in
h5py's __getitem__ which checks the type of the argument; that's why
other sequence items (a list, in this case) don't work.

Andrew
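
To illustrate the point above about __getitem__ always receiving a single argument, here is a minimal sketch in plain Python. The Probe class is hypothetical and not part of h5py; it simply hands back whatever the indexing expression passed in, so the difference between a tuple and a list becomes visible.

```python
class Probe:
    """Hypothetical helper (not part of h5py): returns the raw __getitem__ argument."""

    def __getitem__(self, args):
        return args


p = Probe()
names = ['field1', 'field2']

multi = p['field1', 'field2']   # comma-separated keys arrive packed into one tuple
as_list = p[names]              # a list arrives unchanged, as a single list object
as_tuple = p[tuple(names)]      # identical to the comma-separated form

print(type(multi).__name__, type(as_list).__name__, type(as_tuple).__name__)
```

Because the comma form and tuple(names) look identical to the method, any code that special-cases tuples, as described for h5py above, treats them the same way, while a list falls through to a different branch.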

Jennifer Olsen

May 24, 2013, 1:44:07 AM
to h5...@googlegroups.com


On Thursday, May 23, 2013 10:35:17 PM UTC-5, Andrew Collette wrote:
> It seems crazy, but:
>
> ds[tuple(names)]
>
> will work fine.

Thank you for the solution. I'd like to add this for future googlers:

If you want to add a slice to select a subset of the data as well, try:

t=tuple(names+[slice(0,1)])
ds[t]
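
Putting the two tips from this thread together, the following is a rough, self-contained sketch. It assumes h5py and numpy are installed; the file name 'demo.hdf5', the extra 'field3' column, and the in-memory "core" driver are choices made only for this illustration so that nothing is written to disk.

```python
import h5py
import numpy as np

names = ['field1', 'field2']

# 'field3' is an extra, illustrative column so that the selection is a true subset.
my_dtype = np.dtype([('field1', 'i4'), ('field2', 'f4'), ('field3', 'f8')])

# In-memory file: the core driver with backing_store=False never touches the disk.
with h5py.File('demo.hdf5', 'w', driver='core', backing_store=False) as f:
    ds = f.create_dataset('ds', (3, 3), dtype=my_dtype)

    # Field names passed as a tuple: reads only those fields, for the whole dataset.
    subset = ds[tuple(names)]

    # Field names plus a slice in the same tuple: the slice applies to the first axis.
    first_row = ds[tuple(names + [slice(0, 1)])]

print(subset.dtype.names, subset.shape, first_row.shape)
```

As far as I know, newer h5py releases (3.0 and later) also offer ds.fields(names)[...], which accepts a list of names directly; treat that as something to check against the documentation for your installed version rather than as a given.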