Dear Togo,
The problem here is that the `File` constructor of the `h5py` library is
trying to call `seek` on the file handle `f` with a value of `whence`
that is not supported.
Calling `seek` lets `h5py` move the read position to an arbitrary byte in
the stream.
However, in AiiDA, we cannot allow this because the disk object-store
library (which manages the file repository) stores multiple files in one
single big binary file.
If we were to allow reading beyond the boundaries of the file that was
requested, you could start reading the contents of other files.
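
To make the boundary issue concrete, here is a toy illustration. It is not the actual disk object-store code: the `BoundedReader` class and its `offset`/`length` parameters are names I made up for this sketch, and it only assumes that several objects sit back to back in one stream.

```python
import io


class BoundedReader:
    """Toy read-only view on one object stored inside a larger pack file.

    Hypothetical class for illustration only, not the disk object-store API.
    """

    def __init__(self, pack, offset, length):
        self._pack = pack      # the shared "big binary file"
        self._offset = offset  # where this object starts inside the pack
        self._length = length  # how many bytes belong to this object
        self._pos = 0          # read position relative to the object start

    def read(self, size=-1):
        remaining = self._length - self._pos
        if size < 0 or size > remaining:
            size = remaining  # never read past the end of this object
        self._pack.seek(self._offset + self._pos)
        data = self._pack.read(size)
        self._pos += len(data)
        return data

    def seek(self, target, whence=io.SEEK_SET):
        # This toy only accepts absolute positions inside the object.
        if whence != io.SEEK_SET:
            raise NotImplementedError('only whence=0 is supported in this sketch')
        if not 0 <= target <= self._length:
            raise ValueError('position outside the boundaries of the object')
        self._pos = target
        return self._pos


# Two "files" packed back to back in one stream.
pack = io.BytesIO(b'first-file' + b'second-file')
first = BoundedReader(pack, offset=0, length=10)
print(first.read())  # only the bytes of the first object
```

Without the bounds checks in `read` and `seek`, a caller holding `first` could position itself at byte 10 or later and read the second object.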
The only solution currently is the workaround you already proposed:
copying the file to a temporary file on disk.
Since it only needs to be temporary, I would adapt your example
slightly:

    import shutil
    import tempfile

    import h5py

    with node.open(mode='rb') as source:
        # 'w+b' (the default) allows writing the copy and reading it back
        with tempfile.TemporaryFile(mode='w+b') as target:
            # Copy the content of source to target in chunks
            shutil.copyfileobj(source, target)
            # Reset the pointer to the beginning of the stream
            target.seek(0)
            # Use hf inside this block: the temporary file is deleted on exit
            hf = h5py.File(target, 'r')

By using `tempfile` you don't have to worry whether the filename already
exists.
And by using `shutil.copyfileobj` you make sure the file is copied in
chunks and won't be read into memory entirely.
This is important if the file is potentially big and may not fit in
available memory.
It is a bit annoying and inefficient to have to do this, but I don't think
there is another way for now.
It might make sense to create an `Hdf5Data` class that subclasses
`SinglefileData` and does this trick automatically when accessing the
underlying file.
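
As a rough sketch of what such a class could look like, the copy step could live in a small mixin. Nothing below exists in AiiDA: `SeekableCopyMixin`, `open_seekable` and `FakeNode` are hypothetical names, and the only assumption about the host class is that it offers `open(mode='rb')` returning a context-managed binary stream, as in the example above.

```python
import contextlib
import io
import shutil
import tempfile


class SeekableCopyMixin:
    """Hypothetical mixin: expects the host class to provide open(mode='rb')."""

    @contextlib.contextmanager
    def open_seekable(self):
        """Yield a fully seekable temporary copy of the stored file."""
        with self.open(mode='rb') as source:
            with tempfile.TemporaryFile(mode='w+b') as target:
                shutil.copyfileobj(source, target)  # copied in chunks
                target.seek(0)  # rewind so readers start at the beginning
                yield target  # the copy is deleted once the caller is done


class FakeNode(SeekableCopyMixin):
    """Stand-in for a stored-file node, used only to run the sketch."""

    def __init__(self, content):
        self._content = content

    def open(self, mode='rb'):
        return io.BytesIO(self._content)


node = FakeNode(b'payload')
with node.open_seekable() as handle:
    handle.seek(0, io.SEEK_END)  # any whence value works on the temporary copy
    size = handle.tell()
```

A real `Hdf5Data` could then be declared as `class Hdf5Data(SeekableCopyMixin, SinglefileData)` and used with `h5py.File(handle, 'r')` inside the `with node.open_seekable() as handle:` block, but I have not tested that combination.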
Hope that helps,
Sebastiaan