Copying attributes

Konrad Hinsen

Jun 10, 2013, 7:00:04 AM
to h5...@googlegroups.com
Hi everyone,

I wonder what the best way is to copy attributes from one dataset to another.
The problem is that some values obtained by reading an attribute are not
legal for writing. Here's an example:

import h5py
import numpy as np

f = h5py.File('test.h5', 'w')

ds1 = f.create_dataset('foo1', data=42)
ds1.attrs.create('comments',
                 np.array(['nothing', 'really', 'important'], dtype=object),
                 shape=(3,), dtype=h5py.special_dtype(vlen=str))

ds2 = f.create_dataset('foo2', data=2*ds1[...])
for name, value in ds1.attrs.items():
    ds2.attrs[name] = value

f.close()

In my real application ds1 would have been created by a different program,
so when I need to do the copy I have no information about how the attribute
was originally constructed.

Is there a solution that avoids special-case handling for variable-length strings?

Konrad.

Andrew Collette

Jun 10, 2013, 9:50:01 AM
to h5...@googlegroups.com
Hi Konrad,

> I wonder what the best way is to copy attributes from one dataset to another.
> The problem is that some values obtained by reading an attribute are not
> legal for writing. Here's an example:

At the moment you will have to special-case things like
variable-length strings. I think the long-term solution is to improve
the type guessing so it can infer the contents of object arrays.
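
One way to do that special-casing without inspecting each value is to look up the stored type of every attribute through the low-level API and pass it back when writing. The following is a minimal sketch, assuming a reasonably recent h5py; the helper name copy_attrs and the in-memory demo file are illustrative and not part of h5py, and empty attributes are not handled:

```python
import h5py
import numpy as np


def copy_attrs(src, dst):
    """Copy every attribute of src to dst, reusing the stored type.

    Hypothetical helper for illustration; not part of h5py.
    """
    for name, value in src.attrs.items():
        # Open the attribute via the low-level API to read the dtype it
        # was stored with (this includes variable-length string types).
        attr_id = h5py.h5a.open(src.id, name.encode("utf-8"))
        dst.attrs.create(name, value, dtype=attr_id.dtype)


# In-memory file, so the example leaves nothing on disk.
with h5py.File("attrs_demo.h5", "w", driver="core", backing_store=False) as f:
    ds1 = f.create_dataset("foo1", data=42)
    ds1.attrs.create(
        "comments",
        np.array(["nothing", "really", "important"], dtype=object),
        shape=(3,),
        dtype=h5py.special_dtype(vlen=str),
    )
    ds1.attrs["scale"] = 2.5

    ds2 = f.create_dataset("foo2", data=2 * ds1[...])
    copy_attrs(ds1, ds2)

    copied_comments = [str(c) for c in ds2.attrs["comments"]]
    copied_scale = float(ds2.attrs["scale"])
```

Because the dtype comes from the file rather than from the Python value, the loop does not need to know how the attribute was originally constructed.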

By the way, if you're copying a whole dataset over you may wish to use
Group.copy:

http://www.h5py.org/docs/high/group.html#h5py.Group.copy

which should copy over the object and all of its attributes, using the
HDF5 library directly.
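
As a small illustration of that behavior, the sketch below copies a dataset with Group.copy and reads an attribute back from the copy. The file name, dataset names and the "units" attribute are made up for the example:

```python
import h5py

# In-memory file, so the example leaves nothing on disk.
with h5py.File("copy_demo.h5", "w", driver="core", backing_store=False) as f:
    src = f.create_dataset("foo1", data=[1, 2, 3])
    src.attrs["units"] = "metres"

    # Copy the dataset, including its attributes, to a new path.
    f.copy(src, "foo1_copy")

    copied_units = f["foo1_copy"].attrs["units"]
    copied_data = f["foo1_copy"][...].tolist()
```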

Andrew

Konrad Hinsen

Jun 10, 2013, 12:18:39 PM
to h5...@googlegroups.com
Hi Andrew,

> At the moment you will have to special-case things like
> variable-length strings. I think the long-term solution is to improve
> the type guessing so it can infer the contents of object arrays.

OK, at least I won't be bitten by bad conscience for special-casing ;-)

> By the way, if you're copying a whole dataset over you may wish to use
> Group.copy:
>
> http://www.h5py.org/docs/high/group.html#h5py.Group.copy

I am using that quite extensively, but when the goal is to remove the data
and keep only the attributes, it's definitely not a good choice ;-)

> which should copy over the object and all of its attributes, using the
> HDF5 library directly.

It works very well, except for object references. If you copy a group
containing lots of objects, some of which have references to others within
the same group, you'd expect that the references come over correctly, but
they don't. The copied references are always invalid. That turns into a
serious performance problem, since the only way I found to fix this is to
add a recursive tree traversal that looks for references and fixes them.
That takes more time than the copy itself.

But I'll stop ranting, I'd rather have HDF5 with its little quirks than not
have HDF5 at all ;-)

Konrad.




Andrew Collette

Jun 10, 2013, 12:39:17 PM
to h5...@googlegroups.com
Hi Konrad,

> It works very well, except for object references. If you copy a group
> containing lots of objects, some of which have references to others within
> the same group, you'd expect that the references come over correctly, but
> they don't. The copied references are always invalid. That turns into a
> serious performance problem, since the only way I found to fix this is to
> add a recursive tree traversal that looks for references and fixes them.
> That takes more time than the copy itself.

That seems like a serious problem. Have you reported it to the HDF Group?

Andrew

Konrad Hinsen

Jun 13, 2013, 3:11:14 AM
to h5...@googlegroups.com
--On 10 June 2013 10:39:17 -0600 Andrew Collette
<andrew....@gmail.com> wrote:

>> It works very well, except for object references. If you copy a group
...

> That seems like a serious problem. Have you reported it to the HDF Group?

No, I thought this was the expected behavior, though I can't find any
documentation saying so, nor the opposite. I'll ask on the HDF forum.

Konrad.




Konrad Hinsen

Jun 15, 2013, 5:34:06 AM
to h5...@googlegroups.com
--On 13 June 2013 09:11:14 +0200 Konrad Hinsen
<google...@khinsen.fastmail.net> wrote:

>> That seems like a serious problem. Have you reported it to the HDF
>> Group?
>
> No, I thought this was the expected behavior, though I can't find any
> documentation saying so, nor the opposite. I'll ask on the HDF forum.

The reply I got says that the HDF5 documentation is wrong (and will be
fixed) in claiming that the references are updated. References in the copy
are set to zero by default, making them invalid. In order to get valid
references, the copy parameters have to include
H5O_COPY_EXPAND_REFERENCE_FLAG. This also copies the objects, which in my
situation happens anyway.

Now I am back to h5py. After a quick scan of the code, I come to the
conclusion that in order to set H5O_COPY_EXPAND_REFERENCE_FLAG, I need to
do the whole copy operation through the low-level API. Is that right?

Konrad.




Andrew Collette

Jun 15, 2013, 4:51:56 PM
to h5...@googlegroups.com
Hi Konrad,

> Now I am back to h5py. After a quick scan of the code, I come to the
> conclusion that in order to set H5O_COPY_EXPAND_REFERENCE_FLAG, I need to do
> the whole copy operation through the low-level API. Is that right?

Yes, it looks that way, although thankfully both H5Ocopy and the H5P
functions are wrapped by h5py:

http://www.h5py.org/docs/low/h5o.html#h5py.h5o.copy
http://www.h5py.org/docs/low/h5p.html#h5py.h5p.PropCopyID.set_copy_object
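
A rough sketch of how those two wrapped calls fit together is shown below. It copies a group between two in-memory files, since that is a case where the references need fixing; the file, group and dataset names are invented for the example:

```python
import h5py
from h5py import h5o, h5p

ref_dtype = h5py.special_dtype(ref=h5py.Reference)

with h5py.File("ref_src.h5", "w", driver="core", backing_store=False) as src, \
     h5py.File("ref_dst.h5", "w", driver="core", backing_store=False) as dst:
    # A group whose "pointer" dataset holds a reference to "target".
    grp = src.create_group("grp")
    target = grp.create_dataset("target", data=[1, 2, 3])
    pointer = grp.create_dataset("pointer", (1,), dtype=ref_dtype)
    pointer[0] = target.ref

    # Object-copy property list with the expand-references flag set.
    copypl = h5p.create(h5p.OBJECT_COPY)
    copypl.set_copy_object(h5o.COPY_EXPAND_REFERENCE_FLAG)

    # Low-level copy of /grp in src to /grp_copy in dst.
    h5o.copy(src.id, b"grp", dst.id, b"grp_copy", copypl)

    # The reference stored in the copy should now be usable in dst.
    ref = dst["grp_copy/pointer"][0]
    lowlevel_ref_valid = bool(ref)
    lowlevel_values = dst[ref][...].tolist()
```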

I think this is a good case for adding some keyword options to
Group.copy, if anyone's interested in contributing such a feature.
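
Later h5py releases do accept keyword options of this kind on Group.copy, among them expand_refs. The sketch below assumes such a release and uses two in-memory files with invented names; it also makes a default copy for comparison, without asserting what the default reference looks like:

```python
import h5py

ref_dtype = h5py.special_dtype(ref=h5py.Reference)

with h5py.File("kw_src.h5", "w", driver="core", backing_store=False) as src, \
     h5py.File("kw_dst.h5", "w", driver="core", backing_store=False) as dst:
    # A group whose "pointer" dataset holds a reference to "target".
    grp = src.create_group("grp")
    target = grp.create_dataset("target", data=[1, 2, 3])
    pointer = grp.create_dataset("pointer", (1,), dtype=ref_dtype)
    pointer[0] = target.ref

    # Default copy: per the thread, references are not carried over.
    dst.copy(grp, "grp_plain")
    plain_ref_valid = bool(dst["grp_plain/pointer"][0])

    # Copy with reference expansion requested through the keyword.
    dst.copy(grp, "grp_copy", expand_refs=True)
    ref = dst["grp_copy/pointer"][0]
    expanded_ref_valid = bool(ref)
    expanded_values = dst[ref][...].tolist()
```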

Andrew

Konrad Hinsen

Jun 16, 2013, 3:31:23 AM
to h5...@googlegroups.com
--On 15 June 2013 14:51:56 -0600 Andrew Collette
<andrew....@gmail.com> wrote:

> I think this is a good case for adding some keyword options to
> Group.copy, if anyone's interested in contributing such a feature.

I volunteer to do this - it's no more work than doing it just for my own
code.

Konrad.

Vang Le Quy

Jan 20, 2014, 9:34:29 AM
to h5...@googlegroups.com
FYI, I currently do it this way:

    target_file.copy(source_file[group_name], group_name, shallow=True)
    for sub_group in source_file[group_name].keys():
        if sub_group in target_file[group_name]:
            del target_file[group_name][sub_group]

That is, first make a shallow copy of the group, which is quick, and then remove any subgroups or items below it. The copy function also has a "name" parameter, which may let you copy and rename at the same time.
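
A self-contained variant of this approach might look like the sketch below. It assumes the destination is passed as a Group object so that the name keyword selects the name of the copy; the file names, the "run1" group and the "operator" attribute are invented for the example:

```python
import h5py

with h5py.File("shallow_src.h5", "w", driver="core", backing_store=False) as source_file, \
     h5py.File("shallow_dst.h5", "w", driver="core", backing_store=False) as target_file:
    grp = source_file.create_group("run1")
    grp.attrs["operator"] = "kh"
    grp.create_dataset("data", data=[1, 2, 3])
    grp.create_group("sub").create_dataset("deep", data=[4, 5])

    # Shallow copy: only the immediate members come along.
    # Since dest is a Group, `name` sets the name of the copy.
    target_file.copy(grp, target_file, name="run1_attrs_only", shallow=True)

    # Remove the copied members so that only the attributes remain.
    copy = target_file["run1_attrs_only"]
    for member in list(copy.keys()):
        del copy[member]

    remaining_members = list(copy.keys())
    kept_operator = copy.attrs["operator"]
```

This only helps for groups; a dataset has no members to drop, so copying just its attributes still needs an explicit loop.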