odd string attribute behavior


ken.c....@gmail.com

Nov 25, 2020, 5:11:12 PM
to pytables-users
I am working with an HDF5 file created by an upstream application (written in C++). It adds 7 attributes to a Group; 6 are "strings" and 1 is an integer array. I have discovered some odd behavior with the string attributes:
1) When I access them with Groupname._v_attrs[name], one is returned as type <class 'str'> and the other 5 are returned as type <class 'numpy.bytes_'>. I also checked with h5py, where I get type <class 'str'> for all 6 string attributes. I contacted the developer, and all of the strings are created with similar C++ STRUCT definitions (only the length/STRSIZE of each changes).
2) There is a semi-related issue. The 5 numpy.bytes_ attributes are stored in the user attribute set and the <class 'str'> attribute is stored in the system attribute set. Is there any connection to the first issue? Did this get set by the C++ code? If so, what should I have the developer look for?
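
For anyone who wants to reproduce the observation, the check can be sketched roughly as below. This is only a minimal sketch: the file name and group path in the usage note are placeholders, and the helper names `classify_attrs` and `report` are made up for illustration. It relies on the `_v_attrnames` and `_v_attrnamessys` lists that a PyTables attribute set exposes.

```python
def classify_attrs(attr_set):
    """Map each attribute name to (set label, Python type name).

    `attr_set` is expected to behave like a PyTables AttributeSet:
    it exposes `_v_attrnames` and `_v_attrnamessys` and supports
    `attr_set[name]`.
    """
    sys_names = set(attr_set._v_attrnamessys)
    result = {}
    for name in attr_set._v_attrnames:
        value = attr_set[name]
        label = "system" if name in sys_names else "user"
        result[name] = (label, type(value).__name__)
    return result


def report(path, group_path):
    """Print the set and returned type of every attribute on one group."""
    import tables  # imported lazily so classify_attrs works without PyTables

    with tables.open_file(path, mode="r") as h5:
        group = h5.get_node(group_path)
        for name, (label, type_name) in sorted(classify_attrs(group._v_attrs).items()):
            print(f"{name}: {label} attribute, returned as {type_name}")
```

Calling `report("upstream.h5", "/Groupname")` (both arguments are hypothetical) lists one line per attribute, which makes it easy to see which names end up in the system set.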

I'm not sure if this is a PyTables issue or something in the C++ code. (I know very little about C++ so can't provide more details.)

Thanks in advance for any tips.
Regards,
Ken

Antonio Valentino

Nov 26, 2020, 1:55:29 AM
to pytable...@googlegroups.com
Dear Ken,

On 25/11/20 23:11, ken.c....@gmail.com wrote:
> I am working with an HDF5 file created by an upstream application (written
> in C++). It adds 7 attributes to a Group; 6 are "strings" and 1 is an
> integer array. I have discovered some odd behavior with the string
> attributes:
> 1) When I access with *Groupname._v_attrs[name]*, one is returned as
> type *<class 'str'>* and the other 5 are returned as type *<class
> 'numpy.bytes_'>*. I also checked with h5py, I get type *<class 'str'> *for
> all 6 string attributes. I contacted the developer, and all strings are
> created with similar C++ STRUCT definitions (only the length/STRSIZE of
> each changes).
> 2) There is a semi-related issue. The 5 *numpy.bytes_* attributes are
> stored in the *user* set and the *<class 'str'> * attribute is stored in
> the *system* attribute set. Any connection to the first issue? Did this get
> set by the C++ code? If so, what should I have the developer look for?
>
> I'm not sure if this is a PyTables issue or something in the C++ code. (I
> know very little about C++ so can't provide more details.)

This does indeed look like strange behavior.
If I remember correctly, user and system attributes are handled by
different code paths, so it is possible that they are treated differently.
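
Until the cause is understood, calling code can be made independent of which type comes back. The helper below is only a sketch of that workaround: the name `to_text` is made up, and UTF-8 is an assumption about how the upstream application encodes its strings. It works because `numpy.bytes_` is a subclass of Python's `bytes`.

```python
def to_text(value, encoding="utf-8"):
    """Return a str for bytes-like attribute values; pass others through.

    numpy.bytes_ is a subclass of bytes, so the isinstance check covers
    both plain bytes and the values PyTables returns for string attributes.
    The default encoding is an assumption, not something read from the file.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

With this, `to_text(Groupname._v_attrs[name])` gives a `str` for all six string attributes and leaves the integer array untouched.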

Unfortunately, it is very hard to figure out what is happening without
looking at the data.

Would it be possible for you to share the dataset (assuming it is not too big)?
An h5dump of the group attributes would also help.
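
If the command-line tool is not at hand, a rough Python summary of the attributes could be attached to the report instead. This is a sketch only: the function names are invented for illustration, and it assumes h5py is installed and that the group path is known.

```python
def describe_value(value):
    """Summarise one attribute value: Python type, dtype (if any), and repr."""
    dtype = getattr(value, "dtype", None)
    return {
        "type": type(value).__name__,
        "dtype": str(dtype) if dtype is not None else None,
        "repr": repr(value),
    }


def dump_attrs(attrs):
    """Summarise every entry of a mapping-like attribute collection."""
    return {name: describe_value(value) for name, value in attrs.items()}


def dump_group_attrs(path, group_path):
    """Open the file read-only with h5py and summarise one group's attributes."""
    import h5py  # imported lazily so dump_attrs works without h5py

    with h5py.File(path, "r") as f:
        return dump_attrs(f[group_path].attrs)
```

The output is a plain dictionary, so it can be pretty-printed and pasted into an issue alongside the PyTables results.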

I recommend opening an issue at https://github.com/PyTables/PyTables.


Thanks

--
Antonio Valentino