Is visititems expected to harvest hard links?


Trygve Aspenes

Apr 15, 2020, 11:07:06 AM
to h5py
I have an HDF5 file that contains a hard link. I use visititems to loop over all groups and datasets and harvest their information.

As a test I create a test hdf5 file like this:

import h5py
import numpy as np
h = h5py.File('test.h5', 'w')

# Create Group
g1 = h.create_group('test_group1')

# Add datasets
ds1_f = g1.create_dataset('ds1_f',
                          shape=(5, 10),
                          dtype=np.float32,
                          data=np.arange(5. * 10).reshape((5, 10)))

# Add a second group with a hard link to the same dataset
g2 = h.create_group('test_group2')
g2['ds2_f'] = ds1_f
h.close()


This results in the following HDF5 file (output from h5dump):

HDF5 "test.h5" {
GROUP "/" {
   GROUP "test_group1" {
      DATASET "ds1_f" {
         DATATYPE  H5T_IEEE_F32LE
         DATASPACE  SIMPLE { ( 5, 10 ) / ( 5, 10 ) }
         DATA {
         (0,0): 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
         (1,0): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
         (2,0): 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
         (3,0): 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
         (4,0): 40, 41, 42, 43, 44, 45, 46, 47, 48, 49
         }
      }
   }
   GROUP "test_group2" {
      DATASET "ds2_f" {
         HARDLINK "/test_group1/ds1_f"
      }
   }
}
}

Then a small script using visititems:
import h5py

def visit_items(name, obj):
    print(name, obj)

with h5py.File('test.h5', 'r') as fid:
    fid.visititems(visit_items)
    print(fid['/test_group1/ds1_f'])
    print(fid['/test_group2/ds2_f'])

With this result:
test_group1 <HDF5 group "/test_group1" (1 members)>
test_group1/ds1_f <HDF5 dataset "ds1_f": shape (5, 10), type "<f4">
test_group2 <HDF5 group "/test_group2" (1 members)>
<HDF5 dataset "ds1_f": shape (5, 10), type "<f4">
<HDF5 dataset "ds2_f": shape (5, 10), type "<f4">

So to me it looks like visititems does not find the hard-linked dataset test_group2/ds2_f. Is this expected behavior?

Thomas Kluyver

Apr 15, 2020, 11:17:04 AM
to h5...@googlegroups.com
I think that's expected behaviour. Visiting is meant to keep track of what it's seen and only visit each object once. It's possible to have cyclic links, so if it didn't do that, it could get stuck in an infinite loop.


Trygve Aspenes

Apr 15, 2020, 11:30:35 AM
to h5py
Thanks

Ah, OK. So there is no way to iterate over an HDF5 file with h5py to get an overview of the available datasets if one or more of the datasets is actually a hard link?


Thomas Kluyver

Apr 15, 2020, 11:57:39 AM
to h5...@googlegroups.com
If I've understood HDF5 correctly, then in your example, it's not one dataset and one hard link, it's two hard links pointing to one dataset. The output from h5dump appears to contradict this, but I'm inclined to think that's misleading unless someone can tell me otherwise.
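
One way to check this reading from Python is to compare the low-level object info behind the two paths. The following is only a sketch: it assumes that h5py.h5o.get_info() exposes fileno, addr and rc (the hard link count) as it does in the h5py versions I am aware of, and same_object and demo are hypothetical helper names. The test file is rebuilt in a temporary directory so the snippet runs on its own.

```python
import os
import tempfile

import h5py
import numpy as np


def same_object(a, b):
    """Return True if two h5py objects are the same object in the same file."""
    info_a = h5py.h5o.get_info(a.id)
    info_b = h5py.h5o.get_info(b.id)
    # Assumption: (fileno, addr) identifies an object uniquely.
    return (info_a.fileno, info_a.addr) == (info_b.fileno, info_b.addr)


def demo():
    """Rebuild the test file and inspect the two dataset paths."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.h5")
        with h5py.File(path, "w") as h:
            g1 = h.create_group("test_group1")
            data = np.arange(50, dtype=np.float32).reshape(5, 10)
            ds1_f = g1.create_dataset("ds1_f", data=data)
            g2 = h.create_group("test_group2")
            g2["ds2_f"] = ds1_f  # second hard link to the same dataset
        with h5py.File(path, "r") as fid:
            a = fid["/test_group1/ds1_f"]
            b = fid["/test_group2/ds2_f"]
            # rc is the number of hard links pointing at the object
            return same_object(a, b), h5py.h5o.get_info(a.id).rc


if __name__ == "__main__":
    same, link_count = demo()
    print("same object:", same, "| hard links to it:", link_count)
```

If both names report the same address and the link count is greater than one, neither path is "the" dataset; they are two equal names for one object.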

If you want to identify "is this object something that occurs elsewhere in the same file?", you'd have to keep track of the object addresses you visit and check each new one against them. h5glance does this: https://github.com/European-XFEL/h5glance/blob/da6de67acf6c0a99f2f41e1f7aec372f5fe360bc/h5glance/terminal.py#L137-L145 (if you follow external links, which point to other files, you need to do more, because two objects may have the same address in different files).
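
The address-tracking idea above can be sketched as a small recursive walk that reports every path, including ones visititems skips. This is not the h5glance code, just a minimal illustration: walk_links and demo are hypothetical names, it again assumes h5py.h5o.get_info(obj.id).addr is available, and it only compares addresses within a single file, so it is not safe as-is when external links are followed.

```python
import os
import tempfile

import h5py
import numpy as np


def walk_links(group, prefix="", seen=None):
    """Yield (path, obj, duplicate_of) for every link below `group`.

    duplicate_of is None the first time an object is met, otherwise the
    path under which the same object was first seen.
    """
    if seen is None:
        # Remember the starting group too, so a link back to it is not followed.
        seen = {h5py.h5o.get_info(group.id).addr: prefix or "/"}
    for name, obj in group.items():
        if obj is None:  # link that cannot be resolved (e.g. dangling soft link)
            continue
        path = f"{prefix}/{name}"
        addr = h5py.h5o.get_info(obj.id).addr
        first_path = seen.setdefault(addr, path)
        duplicate_of = None if first_path == path else first_path
        yield path, obj, duplicate_of
        # Only descend into groups not seen before; this prevents infinite
        # loops when links form a cycle.
        if isinstance(obj, h5py.Group) and duplicate_of is None:
            yield from walk_links(obj, path, seen)


def demo():
    """Rebuild the test file and list every path with its duplicate status."""
    with tempfile.TemporaryDirectory() as tmp:
        fname = os.path.join(tmp, "test.h5")
        with h5py.File(fname, "w") as h:
            g1 = h.create_group("test_group1")
            data = np.arange(50, dtype=np.float32).reshape(5, 10)
            ds1_f = g1.create_dataset("ds1_f", data=data)
            h.create_group("test_group2")["ds2_f"] = ds1_f
        with h5py.File(fname, "r") as fid:
            return [(path, dup) for path, _, dup in walk_links(fid)]


if __name__ == "__main__":
    for path, dup in demo():
        print(path, "(same object as %s)" % dup if dup else "")
```

For the test file in this thread, the walk should list both dataset paths and flag whichever is reached second as the same object as the first.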

Thomas
