How to change dataset values of existing file

Jim Parker

Dec 6, 2013, 8:13:13 PM
to h5...@googlegroups.com
Hello,
  I have .hdf5 files in which I would like to modify a few values (metadata that was incorrectly recorded).  I can read and parse them fine, but the data appears to be immutable.

I cannot find any documentation on how to change values.  The only reference I found was a single post on this site from 2009 indicating that one has to create a new dataset (.create_dataset):
https://groups.google.com/forum/#!searchin/h5py/change$20value/h5py/aAqmKqtDygY/f5jyNKgInTMJ

Is this still the only way to change a value in an existing dataset?

Cheers,
--Jim

Jim Parker

Dec 6, 2013, 8:50:54 PM
to h5...@googlegroups.com
Ok,
  I probably should have been more specific on the question. 
The dataset in question is a scalar array of a compound type (see below).

I can easily edit data in the "IR Data" dataset,
however, I'm lost as to how to modify the "MetaData" dataset

Cheers,
--Jim

HDF5 "sub17_cam10001.hdf5" {
GROUP "/" {
   DATASET "IR Data" {
      DATATYPE  H5T_STD_U16LE
      DATASPACE  SIMPLE { ( 307, 240, 320 ) / ( 307, 240, 320 ) }
   }
   DATASET "MetaData" {
      DATATYPE  H5T_COMPOUND {
         H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         } "originalFilename";
         H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         } "subject";
         H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         } "timeDay";
         H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         } "pdStDev";
         H5T_IEEE_F64LE "ambTemp(C)";
         H5T_IEEE_F64LE "range(m)";
         H5T_IEEE_F64LE "atmTemp(C)";
         H5T_IEEE_F64LE "relEmissivity";
         H5T_IEEE_F64LE "pd(W/cm2)";
         H5T_IEEE_F64LE "humidity (%)";
         H5T_STD_I16LE "startShotFrame";
         H5T_STD_I16LE "endShotFrame";
         H5T_IEEE_F64LE "startShotTime";
         H5T_IEEE_F64LE "endShotTime";
      }
      DATASPACE  SCALAR
   }
}
}
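
For readers following along without the original file, the compound type in the dump can be approximated on the NumPy side. This is only a sketch: it assumes h5py 2.10 or later (for h5py.string_dtype and h5py.check_string_dtype), and the names vlen_ascii, meta_dtype and vlen_fields are made up for illustration. It rebuilds the type and lists which fields are variable-length strings, the ones that turn out to matter later in this thread.

```python
import numpy as np
import h5py

# Variable-length ASCII string, mirroring H5T_VARIABLE / H5T_CSET_ASCII in the dump.
vlen_ascii = h5py.string_dtype(encoding="ascii")

# Hypothetical NumPy description of the "MetaData" compound type.
meta_dtype = np.dtype([
    ("originalFilename", vlen_ascii),
    ("subject", vlen_ascii),
    ("timeDay", vlen_ascii),
    ("pdStDev", vlen_ascii),
    ("ambTemp(C)", "<f8"),
    ("range(m)", "<f8"),
    ("atmTemp(C)", "<f8"),
    ("relEmissivity", "<f8"),
    ("pd(W/cm2)", "<f8"),
    ("humidity (%)", "<f8"),
    ("startShotFrame", "<i2"),
    ("endShotFrame", "<i2"),
    ("startShotTime", "<f8"),
    ("endShotTime", "<f8"),
])

# check_string_dtype returns None for anything that is not a string type.
vlen_fields = [
    name for name in meta_dtype.names
    if h5py.check_string_dtype(meta_dtype.fields[name][0]) is not None
]
print(vlen_fields)
```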

Andrew Collette

Dec 6, 2013, 11:43:15 PM
to h5...@googlegroups.com
Hi Jim,

> I probably should have been more specific on the question.
> The dataset in question is a scalar array of a compound type (see below).
>
> I can easily edit data in the "IR Data" dataset,
> however, I'm lost as to how to modify the "MetaData" dataset

For updating (or reading) a compound dataset with h5py 2.2, you can
simply use the field name as a slicing argument:

dset = f["MetaData"]
dset["endShotTime"] = 42

One cool thing is that for multidimensional compound datasets, as
opposed to your scalar example, you can also freely mix field names
and regular slicing arguments. For example, if you had a dataset of
shape (100,100) with fields "x" and "y":

another_dset["x", 0, 0:10] = 42

would update a total of 10 elements in field "x", and leave field "y" alone.

Andrew
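
The mixed field-name and slice indexing described above can be tried on a throwaway file. A minimal sketch, assuming a reasonably recent h5py and NumPy; the file name, the dataset name and the all-float field types are illustrative choices, not taken from Jim's data. It deliberately uses only numeric fields.

```python
import os
import tempfile

import numpy as np
import h5py

# Two-field compound type, as in the (100, 100) example with fields "x" and "y".
xy_dtype = np.dtype([("x", "<f8"), ("y", "<f8")])
path = os.path.join(tempfile.mkdtemp(), "fields_demo.hdf5")  # hypothetical file

with h5py.File(path, "w") as f:
    dset = f.create_dataset("another_dset", shape=(100, 100), dtype=xy_dtype)

    # Field name plus ordinary slicing: touches field "x" of 10 elements only.
    dset["x", 0, 0:10] = 42

    x_row = dset["x", 0, 0:10]   # read back the same selection
    x_all = dset["x"]            # whole "x" field
    y_all = dset["y"]            # whole "y" field, should be untouched

print(int((x_all == 42).sum()))  # number of updated "x" entries
print(bool(y_all.any()))         # whether any "y" entry is non-zero
```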

Jim Parker

Dec 7, 2013, 11:02:08 AM
to h5...@googlegroups.com
Andrew,
  Thanks for the quick reply.
 
I should have checked if a newer version was available...
I upgraded from h5py 2.0.1 to 2.2
I'm using hdf5 1.8.4 that comes with Ubuntu 12.04
and python 2.7.3 also native to Ubuntu

Unfortunately, that did not fix my problem, but did give me a new error...
If I try:

import h5py
h5=h5py.File('sub17_cam10001.hdf5')
dset=h5['MetaData']
dset['endShotTime']=4

I get a Segmentation fault.

Under 2.0.1, the error was
*** RuntimeError: unable to create link (Links: Unable to initialize object)

Cheers,
--Jim

Andrew Collette

Dec 9, 2013, 11:49:36 AM
to h5...@googlegroups.com
Hi Jim,

> Unfortunately, that did not fix my problem, but did give me a new error...
> If I try:
>
> import h5py
> h5=h5py.File('sub17_cam10001.hdf5')
> dset=h5['MetaData']
> dset['endShotTime']=4
>
> I get a Segmentation fault.
>
> Under 2.0.1, the error was
> *** RuntimeError: unable to create link (Links: Unable to initialize object)

Very strange... the 2.0.1 error is what you get when e.g. trying to
create a new object in the file when there's another of the same name.
And of course there should never be segfaults...

Is there any way you can send me an example file? If it's too big to
email (andrew dot collette at gmail dot com) let me know off-list and
we can work out some way to upload it.

Andrew
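
The "unable to create link" message fits how assignment on a File or Group behaves in h5py: f["name"] = value asks for a new object called "name", whereas f["name"][...] = value writes into a dataset that already exists. A small sketch with made-up names; the exception class and wording differ between h5py versions, so several classes are caught here rather than one.

```python
import os
import tempfile

import numpy as np
import h5py

path = os.path.join(tempfile.mkdtemp(), "link_demo.hdf5")  # hypothetical file

with h5py.File(path, "w") as f:
    f.create_dataset("alt", data=np.zeros(3))

    # Assigning to the group tries to create a second object named "alt".
    link_error = None
    try:
        f["alt"] = np.ones(3)
    except (RuntimeError, ValueError, OSError) as exc:
        link_error = exc

    # Indexing the dataset itself writes into the existing object.
    f["alt"][...] = np.ones(3)
    result = f["alt"][...]

print(type(link_error).__name__, link_error)
print(result)
```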

Jim Parker

Dec 9, 2013, 2:42:41 PM
to h5...@googlegroups.com
Andrew,
  A sample file was sent to the email you provided.  I stripped the array data as a) it made the file 68 MB, b) that part works as desired.

Cheers,
--Jim

Jim Parker

Dec 9, 2013, 3:07:50 PM
to h5...@googlegroups.com
Andrew,
  Perhaps this helps your troubleshooting:


import h5py
h5=h5py.File('sub17_cam10001.hdf5')
ctype = h5['MetaData'].dtype
stuff = h5['MetaData'][()]
hh=list(stuff)
hh[0]='MakeRandomChange...'
gg=tuple(hh)

dset=h5.create_dataset('alt',(), ctype)
h5['alt']=gg
***fails with

RuntimeError: unable to create link (Links: Unable to initialize object)

however, if I make the dataset non-scalar, i.e. of length 1 (shape (1,)):
dset=h5.create_dataset('alt2',(1,), ctype)
h5['alt2']=gg

h5['alt2'][()] 
--prints out the data in gg

the following will not work even with the length-1 version
h5['alt2']['clt']='clt11'

the value of ['clt'] does not change.

The only way I could edit the data is to generate a completely new compound object and assign to
h5['alt2'], i.e. edit 'hh' and make a new tuple to assign to h5['alt2'] 

Cheers,
--Jim

Andrew Collette

Dec 9, 2013, 4:26:19 PM
to h5...@googlegroups.com
Hi Jim (and others reading),

You've uncovered a very nasty bug in h5py's type conversion code.
I've created a critical issue at GitHub:

https://github.com/h5py/h5py/issues/372

Basically, in order to work around a longstanding type-conversion
issue in HDF5 itself (HDFFV-1063), we have a module in h5py which
manually converts certain kinds of data one element at a time. This
(very complex) module appears to have a bug which is triggered by the
use of field names during write. In your case it results in an
immediate segfault; in others it may lead to damage to other fields in
the type.

I'm putting together a bugfix release right now, which should be
available for download by the end of the day.

Andrew

Jim Parker

Dec 9, 2013, 5:15:33 PM
to h5...@googlegroups.com
Andrew,
  I look forward to seeing the bugfix.

FWIW, I did find a method to edit the scalar array (that didn't seg fault).  Basically, I make a copy of the structured object, modify it, and use the assignment

h5['MetaData'][()]=gg

instead of
h5['MetaData']=gg

But hopefully, your bug fix will make it unnecessary to regenerate the entire object and will allow assignment to each item in the object, i.e.
h5['MetaData']['clt']='clt11'

Cheers,
--Jim

Andrew Collette

Dec 9, 2013, 6:04:12 PM
to h5...@googlegroups.com
Hi Jim,

> FWIW, I did find a method to edit the scalar array (that didn't seg fault).
> Basically, if I make a copy of the structured object and use the assignment
>
> h5['MetaData'][()]=gg

Yes, this is the old (< h5py 2.2) way of updating compound types;
read the whole thing, modify it and write it back out.

> But hopefully, your bug fix will make it unnecessary to generate the entire
> object and allow assignments to each item in the object, ie.
> h5['MetaData']['clt']='clt11'

Yes, that's what this feature (new in 2.2) was for... unfortunately
this bug is subtle in cases which don't use variable-length strings,
so it looks like we missed it.

Andrew
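
The read, modify, write-back pattern discussed above can be sketched as follows. It assumes h5py 2.10 or later (for h5py.string_dtype); the file name and the three-field type are cut-down stand-ins for Jim's "MetaData" dataset, not the real thing. Because the whole record is written at once, it does not rely on field-name writes.

```python
import os
import tempfile

import numpy as np
import h5py

# Cut-down, hypothetical version of the "MetaData" compound type.
vlen_ascii = h5py.string_dtype(encoding="ascii")
meta_dtype = np.dtype([
    ("timeDay", vlen_ascii),
    ("startShotFrame", "<i2"),
    ("endShotTime", "<f8"),
])
path = os.path.join(tempfile.mkdtemp(), "meta_demo.hdf5")  # hypothetical file

# Create a scalar compound dataset to experiment on.
with h5py.File(path, "w") as f:
    initial = np.array(("7/26/2012 8:44 AM", 78, 0.0), dtype=meta_dtype)
    f.create_dataset("MetaData", data=initial)

with h5py.File(path, "r+") as f:
    dset = f["MetaData"]
    record = dset[...]              # read the whole record as a 0-d array
    record["timeDay"] = "hello"     # modify fields in memory
    record["endShotTime"] = 42.0
    dset[()] = record               # write the whole record back in place

with h5py.File(path, "r") as f:
    updated = f["MetaData"][()]

# Depending on the h5py version, string fields may come back as bytes.
time_day = updated["timeDay"]
if isinstance(time_day, bytes):
    time_day = time_day.decode("ascii")
print(time_day, updated["startShotFrame"], updated["endShotTime"])
```

Note that the write goes through dset[()] rather than f["MetaData"] = record; the latter would try to create a new object with the same name.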

Jim Parker

Dec 9, 2013, 11:21:34 PM
to h5...@googlegroups.com
Andrew,
   With the update, I can modify elements of type int or float, but the variable-length strings give a ValueError.

Specifically,

<for integer fields>
In [5]: h5['MetaData']['startShotFrame']
Out[5]: 78

In [6]: h5['MetaData']['startShotFrame']=8

In [7]: h5['MetaData']['startShotFrame']
Out[7]: 8

<for floats>
In [9]: h5['MetaData']['pd(W/cm2)']
Out[9]: 2.3999999999999999

In [10]: h5['MetaData']['pd(W/cm2)']=4

In [11]: h5['MetaData']['pd(W/cm2)']
Out[11]: 4.0

All fine, but for strings
In [12]: type(h5['MetaData']['timeDay'])
Out[12]: str

In [3]: h5['MetaData']['timeDay']
Out[3]: '7/26/2012 8:44 AM'

In [4]: h5['MetaData']['timeDay']='hello'
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/surfer/Desktop/<ipython-input-4-f244e3ca06b7> in <module>()
----> 1 h5['MetaData']['timeDay']='hello'

/usr/local/lib/python2.7/dist-packages/h5py/_hl/dataset.pyc in __setitem__(self, args, val)
    479             val = numpy.asarray(val, dtype=dtype, order='C')
    480             if cast_compound:
--> 481                 val = val.astype(numpy.dtype([(names[0], dtype)]))
    482         else:
    483             val = numpy.asarray(val, order='C')

ValueError: Setting void-array with object members using buffer.

Cheers,
--Jim

Andrew Collette

Dec 10, 2013, 1:53:21 AM
to h5...@googlegroups.com
Hi Jim,

> With the update, I can modify elements that are of type int or float, but
> the variable length strings give a ValueError

Thanks for letting me know... the 2.2.1 update was basically an
emergency fix to avoid crashes/data corruption. I'm not surprised
there are some types for which it falls over. It may be a few days
before I can investigate this personally (flying out to a meeting
tomorrow), but the good news is that it seems to happen on the
pure-Python side, which should be much easier to fix than the
Cython/HDF5 internals.

Again, thanks for bringing this issue to our attention!

Andrew

Jim Parker

Dec 10, 2013, 9:42:45 AM
to h5...@googlegroups.com
Andrew,
  You are welcome, but the thanks should go to you for making h5py available!  With the help you have provided, I can move forward.  I look forward to the full solution.

Cheers,
--Jim