Re: Possible to Create Region References Outside of a File?


Brendan Heberlein

Mar 21, 2018, 4:19:14 PM
to h5...@googlegroups.com
Hello everyone,

Is there a way to create region references to a dataset from outside the file in which the dataset is stored (e.g., from another file or, even better, from a temporary reference object in the Python environment)?

I have looked through the h5py documentation and Stack Overflow but have not found much that helps.

This seems like very useful functionality, somewhat analogous to NumPy's array views. Can anyone tell me whether this is possible in current versions of h5py (and how to do it), or weigh in on whether it would be a feature worth implementing in the future?

Thank you,
Brendan



On Tue, Mar 20, 2018 at 7:25 PM, Brendan Heberlein <bcon...@gmail.com> wrote: (same message as above)

bcon...@gmail.com

Aug 20, 2018, 5:11:48 AM
to h5py
I don't know whether this is the best way to do it, but since nobody has offered any advice, I'll share a method I just discovered that makes this possible, albeit less conveniently than I'd like.

It is possible to create a file that exists only in memory (h5py's core driver with backing_store=False), add to it an external link that points to the dataset in the other file, and then create a region reference, through that link, to a region of the linked dataset.

It looks something like this:

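import h5py

src_file = "source.h5"    # hypothetical: path to the file that holds the dataset
a, b, c, d = 0, 10, 0, 5  # example slice bounds
mem_obj = h5py.File('mem', 'w', driver='core', backing_store=False)  # in-memory file, never written to disk
mem_obj['link'] = h5py.ExternalLink(src_file, "/path/to/dataset")
reg_ref = mem_obj['link'].regionref[a:b, c:d, :]
subset = mem_obj['link'][reg_ref]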
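For anyone who wants to try the trick end to end, here is a self-contained sketch of the same steps. It assumes h5py 3.x (so file modes are passed explicitly) and builds its own small example file in a temporary directory; the helper name read_region_via_link, the file name source.h5, the dataset name data, and the slice bounds are illustrative only. One hedged caveat: with HDF5's classic region-reference API the selection is serialized into the global heap of the file that holds the dataset, so creating the reference may need (and use) write access to the source file even though the link itself lives only in memory.

```python
import os
import tempfile

import h5py
import numpy as np


def read_region_via_link(src_path, dset_path, region):
    """Hypothetical helper: link to an external dataset from an in-memory
    HDF5 file and read `region` of it through a region reference."""
    # driver="core" keeps the file in RAM; backing_store=False means it is
    # discarded on close instead of being written to disk.
    with h5py.File("mem", "w", driver="core", backing_store=False) as mem:
        mem["link"] = h5py.ExternalLink(src_path, dset_path)
        dset = mem["link"]            # the link resolves to the dataset in src_path
        ref = dset.regionref[region]  # an h5py RegionReference object
        return dset[ref]              # read only the referenced selection


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Small stand-in for the real source file (illustrative data only).
        src_path = os.path.join(tmp, "source.h5")
        data = np.arange(4 * 5 * 3).reshape(4, 5, 3)
        with h5py.File(src_path, "w") as f:
            f["data"] = data

        region = np.s_[1:3, 0:2, :]
        subset = read_region_via_link(src_path, "/data", region)
        # Compare against ordinary slicing, flattening both sides.
        print(np.array_equal(np.ravel(subset), data[region].ravel()))
```

Dereferencing a region reference may return the selected values flattened to 1-D rather than in the region's shape, which is why the check compares flattened values.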

I'd still love to hear suggestions if anyone knows a better way to do this.

I also think it may be worthwhile to add functionality that makes this easier. In my work I often want to use a subset of a large dataset for testing or for estimating statistics, and it is nice to be able to keep pointers to those subsets without having to modify the source file to add a region reference.
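If the main goal is just to keep lightweight "pointers" to subsets of a large dataset without touching the source file, one alternative (a different technique from region references, and not an h5py feature) is to keep the file path, dataset path, and a tuple of slices in a small Python object and read the selection on demand. DatasetView below is a hypothetical helper written for illustration; it opens the file read-only each time read() is called.

```python
import os
import tempfile

import h5py
import numpy as np


class DatasetView:
    """Hypothetical helper (not part of h5py) describing a subset of a dataset.

    It stores only the file name, the dataset path and a selection tuple,
    so nothing is created or written in any HDF5 file.
    """

    def __init__(self, filename, dset_path, selection):
        self.filename = filename
        self.dset_path = dset_path
        self.selection = selection  # e.g. np.s_[10:20, :, 0]

    def read(self):
        """Open the file read-only, read just the selection, then close it."""
        with h5py.File(self.filename, "r") as f:
            return f[self.dset_path][self.selection]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical example file standing in for a large source dataset.
        path = os.path.join(tmp, "source.h5")
        with h5py.File(path, "w") as f:
            f["data"] = np.arange(4 * 5 * 3).reshape(4, 5, 3)

        view = DatasetView(path, "data", np.s_[1:3, :, 0])
        print(view.read().shape)  # (2, 5)
```

Because the selection is ordinary Python data, the subset keeps its shape when read and the view can be pickled alongside other experiment settings; the trade-off is that, unlike an HDF5 reference, it means nothing to other HDF5 tools.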