Variable-length strings and endianness


Bertrand B.

Apr 11, 2014, 12:33:43 PM
to h5...@googlegroups.com
Hello everybody,

I have a problem opening an HDF5 file with h5py (h5py 2.2.1) that contains variable-length strings, when the file is created on an x86 cluster (little-endian) and read on a POWER cluster (big-endian).

I do not have any issue with fixed length strings.

I can write and read a file containing variable-length strings with h5py as long as I read it on the same system that created it. I get the following error when I try to read the POWER-generated file on the x86 cluster, or the x86-generated file on the POWER cluster:

Failed to find converter for 8 -> PYTHON:OBJECT

I am able to dump the contents of both files with h5dump without any problem (either cluster can dump both files).

Comparing the files byte by byte (they have exactly the same size), only a few bytes differ between the two. I was wondering whether one of these bytes records the endianness of the file, which could affect how the lengths of the strings are stored.
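A comparison like this can be scripted. The sketch below lists every offset at which two equal-sized files disagree; the function name `differing_offsets` is illustrative. It only reports raw byte differences and says nothing about which HDF5 structures those bytes belong to. Interpreting them would need `h5dump` or the HDF5 file-format specification.

```python
import sys


def differing_offsets(path_a, path_b):
    """Return (offset, byte_a, byte_b) for every position where two files differ.

    Assumes both files have the same size, as the two test files do.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        # bytearray yields ints on both Python 2 and Python 3
        data_a = bytearray(fa.read())
        data_b = bytearray(fb.read())
    if len(data_a) != len(data_b):
        raise ValueError("files differ in size: %d vs %d bytes"
                         % (len(data_a), len(data_b)))
    return [(i, a, b) for i, (a, b) in enumerate(zip(data_a, data_b)) if a != b]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_bytes.py FILE_A FILE_B  (script name is hypothetical)
    for offset, a, b in differing_offsets(sys.argv[1], sys.argv[2]):
        print("0x%08x: %02x != %02x" % (offset, a, b))
```

For example, `python diff_bytes.py vlstring_x86.h5 vlstring_p7.h5` would print one line per differing byte, giving offsets to look up in an `h5dump` listing.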

Here is the code I used to write and then read the file on both systems:

To write:
import h5py
f = h5py.File('vlstring_p7.h5', 'w')
# Variable-length string dtype, as created in h5py 2.2.1
str_type = h5py.new_vlen(str)
dataset = f.create_dataset("DSvariable", (4000,), dtype=str_type)
# The same four fragments repeated 1000 times: 4000 strings in total
data = ("Parting", " is such", " sweet", " sorrow...") * 1000
dataset[...] = data
f.close()

To read:
import h5py
f = h5py.File('vlstring_p7.h5', 'r')
dataset = f['DSvariable']
data_out = dataset[...]  # read the whole dataset into a NumPy object array
# Show the first four strings and their lengths
for i in range(4):
    print("DSvariable[%d] '%s' has length %d" % (i, data_out[i], len(data_out[i])))

print(data_out)
f.close()
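To confirm that the strings survive the round trip intact, and not just that reading succeeds, the values read back can be compared element by element with what the writer stored. This is a minimal sketch, assuming `data_out` is the array returned by `dataset[...]` in the reading script. The names `EXPECTED`, `roundtrip_mismatches` and `_as_text` are illustrative and not part of h5py. Decoding `bytes` covers newer h5py releases (3.x), which return variable-length strings as `bytes`.

```python
# The four fragments the writer repeats 1000 times (4000 elements in total).
FRAGMENTS = ("Parting", " is such", " sweet", " sorrow...")
EXPECTED = FRAGMENTS * 1000


def _as_text(value):
    # Newer h5py releases return variable-length strings as bytes; decode them
    # so they compare equal to the str literals written by the writer script.
    return value.decode("utf-8") if isinstance(value, bytes) else value


def roundtrip_mismatches(data_out, expected=EXPECTED):
    """Return the indices where the values read back differ from what was written."""
    if len(data_out) != len(expected):
        raise ValueError("expected %d elements, got %d"
                         % (len(expected), len(data_out)))
    return [i for i, (got, want) in enumerate(zip(data_out, expected))
            if _as_text(got) != want]
```

Calling `print(roundtrip_mismatches(data_out))` in the reading script, after `data_out` is loaded, should print an empty list when every string was read back unchanged.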

I have attached the two files: one produced on a POWER7 cluster, the other on an x86 cluster.

Thank you for your help,

Cheers,

Bertrand
vlstring_x86.h5
vlstring_p7.h5

Andrew Collette

Apr 16, 2014, 11:28:33 AM
to h5...@googlegroups.com
Hi Bertrand,

> I have a problem to open a hdf5 file with h5py (using h5py-2.2.1) containing
> variable length strings when the file is created on a x86 cluster (little
> endian) and read on a power cluster (big endian).

Thanks for letting us know! I'm surprised this is an issue; both
types involved have no concept of endianness.
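As a generic illustration of that distinction, the sketch below shows that encoded character data has no byte order, while a multi-byte integer (such as a length field) does. Reading such an integer with the wrong byte order gives a very different value. This is not a statement about how HDF5 lays out variable-length strings on disk, and the helper names are illustrative.

```python
import struct


def encode_text(s):
    # Character data encoded as UTF-8 is a plain byte sequence: there is no
    # byte order to choose, so every platform produces the same bytes.
    return s.encode("utf-8")


def encode_length(n, byte_order):
    # A 32-bit unsigned integer must be laid out in a specific byte order:
    # "<" is little-endian (as on x86), ">" is big-endian (as on POWER).
    return struct.pack(byte_order + "I", n)


def misread_length(n):
    # Pack a length as little-endian, then unpack it as big-endian, to show
    # what a reader assuming the wrong byte order would see.
    return struct.unpack(">I", encode_length(n, "<"))[0]


if __name__ == "__main__":
    word = u"Parting"
    print(encode_text(word))                # identical bytes on any machine
    print(encode_length(len(word), "<"))    # little-endian layout of 7
    print(encode_length(len(word), ">"))    # big-endian layout of 7
    print(misread_length(len(word)))        # 117440512, i.e. 0x07000000
```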

I've created a GitHub issue to track this:

https://github.com/h5py/h5py/issues/428

I have to caution you that it may be a while before we're able to fix
this, as I don't have access to a POWER machine.

Andrew