How to store really large python bytes string into dataset


anima...@gmail.com

Jun 19, 2023, 4:06:57 AM
to h5py
Hello,
I have a really large (5 MB) Python bytes object, and others of the same type that are much bigger.

>>> print(type(buffer))
<class 'bytes'>
>>> print(buffer)
b"POINTS\x02\x03\x00\x01\........."

When storing this into an h5 file:
dt = h5py.special_dtype(vlen=str)
with h5py.File("cube.h5", "w") as f:
    dset = f.create_dataset("cube", buffer, dt)
    dset.attrs["1"] = "hello"

I'm getting an error:
ValueError: Dimensionality is too large (dimensionality is too large)

How do I store it? If I'm not wrong, retrieval should then be straightforward, something like:
with h5py.File("cube.h5", "r") as f:
    print(f['cube'])
    print(f['cube'].attrs["1"])

Cheers

Prashant



Thomas Kluyver

Jun 19, 2023, 11:12:02 AM
to h5py
Hi Prashant,

This Google group has been closed in favour of the h5py board on the HDF forum:

I thought I'd stopped everyone posting, but seemingly not. I've changed the settings now, so you probably won't be able to post again. Go to the forum if you want any follow-up.

To briefly respond to your question: it looks like you're trying to store another format inside HDF5, which usually isn't a great idea. It's probably better to either write this 'POINTS' format directly to a file (use something like zip if you want to combine multiple in one file), or design a meaningful way to store the information in HDF5 without this other format.

If you really do need to store some arbitrary bytes in an HDF5 file, you can't use HDF5 string types, because strings can't contain null bytes. You'd need either a 1D array of uint8 datatype, or an opaque datatype.
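
As a minimal sketch of the 1D uint8 approach (not a definitive recipe): the helper names `store_bytes` and `load_bytes` and the stand-in payload below are hypothetical, chosen only for illustration. Note that the call in the question passes `buffer` as the second positional argument of `create_dataset`, which h5py interprets as the dataset shape, and that is the likely source of the "dimensionality is too large" error. Here the bytes are passed through `data=` instead.

```python
import numpy as np
import h5py


def store_bytes(path, name, payload):
    """Write `payload` (a bytes object) as a 1D uint8 dataset called `name`."""
    # View the bytes as an array of unsigned 8-bit integers (no copy is made).
    arr = np.frombuffer(payload, dtype=np.uint8)
    with h5py.File(path, "w") as f:
        # Pass the array via data=; the second positional argument is the shape.
        dset = f.create_dataset(name, data=arr)
        dset.attrs["1"] = "hello"


def load_bytes(path, name):
    """Read the uint8 dataset `name` back and return it as a bytes object."""
    with h5py.File(path, "r") as f:
        return f[name][()].tobytes()


if __name__ == "__main__":
    # Stand-in payload (assumption): a short header followed by zero padding.
    buffer = b"POINTS\x02\x03\x00\x01" + bytes(1024)
    store_bytes("cube.h5", "cube", buffer)
    assert load_bytes("cube.h5", "cube") == buffer
```

Null bytes survive the round trip because the data is stored as integers rather than as an HDF5 string. For the opaque-datatype route, the h5py documentation describes wrapping the bytes in `np.void`; that variant is not shown here.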

Best wishes,
Thomas