h5py 2 unicode string support for data with python 3

Bruno Vieira

Aug 4, 2011, 10:20:52 AM
to h5py
Hi,

I'm trying to save an array of characters to hdf5.
If I try to save the array to an h5py dataset with dtype 'U', I get "No
conversion path for dtype: dtype('<U0')".
With dtype 'S1' it works, but then I get a dataset of byte chars that I
need to np.char.decode() in order to get the unicode chars.
Is there an easier way around this? Does h5py support UTF-8 for data? The
doc page about unicode is empty (http://h5py.alfven.org/docs-2.0/topics/unicode.html).

Thanks!
Bruno
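The encode-before-writing, decode-after-reading workaround from the question can be sketched with NumPy alone. This is a minimal sketch assuming the data is a 1-D NumPy "U1" array; the sample string is made up, and the h5py write step appears only as a comment. One thing worth noting: if any character is outside ASCII, one byte per element is not enough, so letting `np.char.encode` choose the width is safer than forcing 'S1'.

```python
import numpy as np

# Hypothetical sample data: a character array like the one in the question.
chars = np.array(list("héllo"), dtype="U1")

# Encode to fixed-width bytes before writing. NumPy sizes the "S" dtype
# from the longest encoded element, so "é" (2 bytes in UTF-8) gives S2.
encoded = np.char.encode(chars, "utf-8")

# e.g. with h5py: f.create_dataset("chars", data=encoded)

# After reading the bytes back, decode to recover a NumPy "U" array.
decoded = np.char.decode(encoded, "utf-8")

print(encoded.dtype, decoded.dtype)
```

The decode step is still manual; this only makes sure no multi-byte character gets truncated on the way in.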

Andrew Collette

Aug 5, 2011, 10:14:29 AM
to h5...@googlegroups.com
Hi Bruno,

Right now the NumPy Unicode type (dtype "U") isn't supported, as there
isn't a wide-character type in HDF5. I am open to adding support for
this if a reasonable way can be found to represent UTF-32 strings
using the basic HDF5 types. A separate patch was contributed for
variable-length Unicode strings (e.g. the Python 2.X "unicode" type)
but it's not yet applied.
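One possible way to represent UTF-32 strings with basic HDF5 types, offered here only as a hypothetical sketch and not as anything h5py does, is to store each NumPy "U" string as a row of 32-bit code points. NumPy keeps "U" data as 4 bytes per character, so a contiguous array can be viewed as `uint32` and reshaped to one row per string; that integer array could presumably be written as an ordinary dataset. The helper names `unicode_to_codepoints` and `codepoints_to_unicode` are made up for illustration, and the sketch assumes arrays with at least one dimension.

```python
import numpy as np


def unicode_to_codepoints(arr):
    """Hypothetical helper: view a NumPy 'U' array as uint32 code points.

    Returns an array with one extra trailing axis of length equal to the
    string width; shorter strings are padded with 0 code points.
    """
    arr = np.ascontiguousarray(arr)
    width = arr.dtype.itemsize // 4  # NumPy stores 4 bytes per character
    return arr.view(np.uint32).reshape(arr.shape + (width,))


def codepoints_to_unicode(codes):
    """Hypothetical helper: inverse of unicode_to_codepoints."""
    codes = np.ascontiguousarray(codes, dtype=np.uint32)
    width = codes.shape[-1]
    # Viewing the last axis as a single U{width} item collapses it to length 1.
    return codes.view(f"U{width}").reshape(codes.shape[:-1])


words = np.array(["héllo", "ab", "日本"])  # dtype <U5
codes = unicode_to_codepoints(words)      # shape (3, 5), dtype uint32
restored = codepoints_to_unicode(codes)
```

The cost is that every string occupies 4 bytes per character of the widest string, which is the space overhead a fixed-width UTF-32 layout implies.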

At the moment you will have to manually encode your data. Be careful
if you want to use utf-8, as the size of the final binary string you
get will likely be larger (more elements) than the size of your input
Unicode string.

Andrew
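The warning about UTF-8 output being longer than the input can be made concrete when sizing a fixed-width string field: the width has to come from the encoded byte length, not the character count. This is a minimal sketch using only the standard library; the helper names are hypothetical, and padding with NUL bytes is assumed because that matches how NumPy fills fixed-width "S" values.

```python
def utf8_fixed_width(strings):
    """Hypothetical helper: bytes needed to hold every string as UTF-8."""
    return max((len(s.encode("utf-8")) for s in strings), default=0)


def encode_fixed(strings, width):
    """Encode each string to UTF-8 and NUL-pad it to a common byte width."""
    out = []
    for s in strings:
        b = s.encode("utf-8")
        if len(b) > width:
            # Truncating could cut a multi-byte character in half.
            raise ValueError(f"{s!r} needs {len(b)} bytes, width is {width}")
        out.append(b.ljust(width, b"\0"))
    return out


def decode_fixed(blobs):
    """Strip NUL padding and decode back to str."""
    return [b.rstrip(b"\0").decode("utf-8") for b in blobs]


samples = ["abc", "café", "日本"]
char_width = max(len(s) for s in samples)
byte_width = utf8_fixed_width(samples)
blobs = encode_fixed(samples, byte_width)
```

Counting characters suggests a width of 4 here, but "日本" needs 6 bytes in UTF-8, so a field sized by character count would not hold it. The padded list could then be turned into something like `np.array(blobs, dtype=f"S{byte_width}")` before writing.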
