On Tue, Sep 25, 2012 at 12:02 PM, Andrew Collette
<andrew....@gmail.com> wrote:
>> My team and I need to tweak our design to ensure that h5py.File instances
>> are automatically closed when they are no longer needed.
>>
>> Since we probably aren't the first h5py users to face this issue, we figured
>> we should ask if you can recommend any design patterns for handling it. If
>> we share h5py Files and Groups between objects that may be destroyed at
>> different times, is there an easy way to ensure that the last owner of the
>> file closes it?
>
> At the moment this is a really hard problem. If you only want to
> auto-close File objects, then as you say it's a simple addition of
> __del__, or use weakrefs which point to a master File object, which
> you then close with a callback. I think the best way to handle this
> on my side is to add an enhancement ticket to close files when there
> are no remaining identifiers open. I can think of a few ways this
> might be done, but they all require mucking around in the Cython layer
> underneath h5py, and I'm not sure at present how that would interact
> with the identifier system. It's something I would consider for h5py
> 2.2.

When an ObjectID (group, dataset, etc.) goes out of scope and is
deallocated, we remove the low-level hdf5 id from the object registry.
FileIDs, however, set their locked property to True, so h5py never
tells hdf5 to reclaim the id (and therefore the FileID is also never
deleted from the h5py object registry). It might be possible to do
something like:
    f = File(...)
    f.fid.locked = False

to make h5py auto-close the file object when it goes out of scope.
This might require overriding FileID.__dealloc__, along the lines of
    def __dealloc__(self):
        if not self.locked:
            self.close()
        super(FileID, self).__dealloc__()
in order to get the desired behavior. We might even expose an
"auto_close" kwarg in the high-level File constructor. On the other
hand, I'm sure we would get an endless series of questions like:
>>> f = File(..., auto_close=True)
>>> g = f['foo']
>>> g.file
>>> print f
<Closed HDF5 file>
"How come f is closed? It didn't go out of scope!"
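The surprise above comes from the fact that, in such a scheme, each access to g.file could construct a fresh File wrapper around the same low-level id; when that temporary wrapper is deallocated, auto-close would shut the shared file out from under f. A toy model of the pitfall, in plain Python with hypothetical names (RawHandle, AutoFile, Group), no HDF5 involved:

```python
class RawHandle:
    """Hypothetical stand-in for a shared low-level file id."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

class AutoFile:
    """Toy model of a File opened with a hypothetical auto_close=True."""
    def __init__(self, handle):
        self.handle = handle

    def __del__(self):
        # Auto-close the underlying id when this wrapper dies.
        self.handle.close()

class Group:
    def __init__(self, handle):
        self.handle = handle

    @property
    def file(self):
        # Each access builds a *new* wrapper around the shared id.
        return AutoFile(self.handle)

h = RawHandle()
f = AutoFile(h)
g = Group(h)
g.file           # temporary wrapper, collected immediately in CPython...
print(h.closed)  # True: the shared id was closed out from under f
```

This is exactly the kind of action-at-a-distance that would generate support questions, which is why the kwarg is a mixed blessing.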
Here is another half-baked idea. Any time a new ObjectID is created,
it calls h5i.inc_ref(h5i.get_file_id(...)). When any ObjectID goes out
of scope, it calls h5i.dec_ref(h5i.get_file_id(...)). Then we wouldn't
need to "lock" FileIDs, they would only be removed from the hdf5 and
h5py registries when their hdf5 reference count drops to zero. In
order for that to work, every time an ObjectID went out of scope, it
would have to call _objects.registry.cleanup() to find and remove any
potentially invalid FileIDs before the underlying hdf5 id was recycled
by the hdf5 library. I think this approach would probably yield ideal
behavior, but unsatisfactory performance.
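That reference-counting idea can be modeled in a few lines of plain Python. Everything here is a hypothetical sketch (FileRecord, open_file, inc_ref, dec_ref, cleanup stand in for the hdf5/h5py machinery): each ObjectID pins its file while alive, and the file is closed and dropped from the registry only when the count reaches zero.

```python
class FileRecord:
    """Models an hdf5 file id carrying a reference count."""
    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.closed = False

registry = {}  # models the h5py object registry, keyed by file name

def open_file(name):
    return registry.setdefault(name, FileRecord(name))

def inc_ref(rec):
    rec.refcount += 1

def dec_ref(rec):
    rec.refcount -= 1
    cleanup()

def cleanup():
    # Close and drop any file whose count reached zero, mimicking
    # a call to _objects.registry.cleanup().
    for name, rec in list(registry.items()):
        if rec.refcount <= 0:
            rec.closed = True
            del registry[name]

class ObjectID:
    """Any object (group, dataset, ...) pins its file while alive."""
    def __init__(self, rec):
        self.rec = rec
        inc_ref(rec)

    def release(self):  # stands in for going out of scope
        dec_ref(self.rec)

rec = open_file("demo.h5")
a = ObjectID(rec)
b = ObjectID(rec)
a.release()
assert not rec.closed  # b still holds a reference
b.release()
assert rec.closed      # last reference gone -> file closed
```

The performance concern is visible even in the toy: every release walks the registry, which is the cost mentioned above.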
Darren