Storing arbitrary binary files with opaque types


cros...@gmail.com

Jan 24, 2013, 10:57:32 AM
to h5...@googlegroups.com
Hi,

I know that it is technically possible to store arbitrary files in HDF5
using opaque types, but I can't seem to implement an approach in h5py
that works as I would have expected. Is there a straightforward
way of doing this with the h5py API? How should accessing the arbitrary data
work from h5py?

Thanks,
Alexander Crosby

RPS ASA
55 Village Square Drive
South Kingstown, RI 02879-8248
USA
Tel:     +1 (401) 789-6224
Fax:    +1 (401) 789-1932

Andrew Collette

Jan 24, 2013, 12:26:53 PM
to h5...@googlegroups.com
Hi,

> I know that it is technically possible to store arbitrary files in hdf5
> using opaque types, but I can't seem to implement a methodology in h5py
> where this works as I would have expected. Is there a straight forward
> way of doing this with the h5py api? How should accessing the arbitrary data
> work from h5py?

Right now the h5py type mapping system can't represent opaque types.
You could create an opaque dataset using the low-level interface, but
there's no way to read and write the data.

There are ongoing discussions about how to support this feature, as
part of the discussion about improved Unicode support in future
versions of h5py. One possibility is to add support for NumPy void
(kind "V") arrays and scalars, and map these to opaque types in the
file. Another is to reclassify NumPy byte strings (kind "S") as
opaque, although this would be a big change. We welcome community
input on this topic.
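
For readers on a more recent h5py: the first option (NumPy void mapped to an HDF5 opaque type) appears to be available in later releases. The following is a minimal sketch under that assumption, not a statement about the h5py version discussed in this thread. The file name "opaque.h5", the dataset name "blob", and the sample bytes are all made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical payload with embedded and trailing NUL bytes.
blob = b"\x89PNG\x00\x01binary\x00payload\x00\x00"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "opaque.h5")

    # Wrapping the bytes in np.void gives a dtype of kind "V",
    # which h5py is assumed here to store as an opaque type.
    with h5py.File(path, "w") as f:
        f.create_dataset("blob", data=np.void(blob))

    # Read the scalar dataset back and convert it to plain bytes.
    with h5py.File(path, "r") as f:
        dset = f["blob"]
        stored_kind = dset.dtype.kind
        opaque_restored = dset[()].tobytes()
```

If the assumption holds, `opaque_restored` should equal `blob` byte for byte, including the NUL bytes.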

If you absolutely need to store binary data right now, you could use
fixed-length NumPy byte strings (kind "S"), or numpy.uint8.
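
As a rough illustration of the numpy.uint8 route, the sketch below stores a file's bytes as a 1-D uint8 dataset and reads them back. The helper names, the file name "blob.h5", and the sample payload are hypothetical. uint8 is used here rather than kind "S" because NumPy drops trailing NUL bytes when it returns fixed-length byte strings, which could alter binary data.

```python
import os
import tempfile

import h5py
import numpy as np


def store_bytes_as_uint8(h5_path, dataset_name, payload):
    """Write raw bytes to a new HDF5 file as a 1-D uint8 dataset."""
    arr = np.frombuffer(payload, dtype=np.uint8)
    with h5py.File(h5_path, "w") as f:
        f.create_dataset(dataset_name, data=arr)


def load_bytes_from_uint8(h5_path, dataset_name):
    """Read a uint8 dataset back into a Python bytes object."""
    with h5py.File(h5_path, "r") as f:
        return f[dataset_name][()].tobytes()


# Hypothetical payload; in practice this would come from open(..., "rb").read().
payload = b"\x00\xff\xd8binary\x00payload\x00"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "blob.h5")
    store_bytes_as_uint8(path, "blob", payload)
    restored = load_bytes_from_uint8(path, "blob")
```

The dataset length is the file size in bytes, so no separate length field is needed to recover the original content.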

Andrew

Jeff Teeters

Jan 24, 2013, 1:08:11 PM
to h5...@googlegroups.com
I've used code like the following to store JPEG images in HDF5 using h5py. They are stored in the original JPEG binary format as a string dataset. It works great, and I think it will work for storing any binary data, so maybe the opaque type is not necessary.

    # Read the JPEG file as raw bytes; "filename" is the name of the JPEG file.
    with open(filename, 'rb') as fin:
        binary_data = fin.read()
    # Create a dataset containing the JPEG binary data.
    # Assumes "group" is an h5py group object.
    ds = group.create_dataset(dataset_name, data=binary_data)

Andrew Collette

Jan 24, 2013, 1:16:30 PM
to h5...@googlegroups.com
Hi,

> I've used code like the following to store jpeg images in HDF5 using h5py.
> They are stored in the original jpeg binary format as a string data set.
> Works great. I think this will work for storing any binary data. So maybe
> the opaque type is not necessary.

Yes, this will work fine. The only oddity is that, as you point out, it
stores the binary data in a string dataset, which in theory is
supposed to hold only encoded text. That's the main motivation for
adding support for opaque types: raw binary data wouldn't be mixed up
with the string types.

Andrew

cros...@gmail.com

Jan 24, 2013, 1:56:44 PM
to h5...@googlegroups.com
Thanks for the responses!