Hello everybody,
I have a problem opening an HDF5 file with h5py (using h5py-2.2.1) that contains variable-length strings when the file is created on an x86 cluster (little endian) and read on a Power cluster (big endian).
I do not have any issue with fixed-length strings.
I can read and write a file containing variable-length strings with h5py as long as I read it on the same system that created it. But when I try to read the Power file on the x86 cluster, or the x86 file on the Power cluster, I get this error:
Failed to find converter for 8 -> PYTHON:OBJECT
I am able to dump the info from the files using h5dump without any problem (either cluster can dump the info from both files without any issue).
Comparing the byte content (the files have exactly the same size), only a few bytes differ between the 2 files. I was wondering if one of these bytes refers to the endianness of the file, which could affect how the lengths of the strings are stored.
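For reference, here is a minimal sketch of how I compare the two files byte by byte (the file paths are placeholders for the two attached files):

```python
def diff_offsets(path_a, path_b):
    # Read both files fully and report the offsets where they differ.
    # Assumes the files have the same size, as is the case here.
    with open(path_a, 'rb') as fa, open(path_b, 'rb') as fb:
        a = fa.read()
        b = fb.read()
    assert len(a) == len(b), "files differ in size"
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
```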
Here is the code I used to write and then read the file on both systems:
To write :
import h5py
file = h5py.File('vlstring_p7.h5','w')
str_type = h5py.new_vlen(str)
dataset = file.create_dataset("DSvariable",(4000,), dtype=str_type)
data=()
for i in range(1000):
    data += ("Parting", " is such", " sweet", " sorrow...")
dataset[...] = data
file.close()
To read:
import h5py
file = h5py.File('vlstring_p7.h5', 'r')
dataset = file['DSvariable']
data_out = dataset[...]
for i in range(4):
    print "DSvariable[", i, "]", "'" + data_out[i] + "'", "has length", len(data_out[i])
print data_out
file.close()

I have attached the 2 files, one produced on a Power7 cluster and the other on an x86 cluster.
Thank you for your help,
Cheers,
Bertrand