Are h5py.File objects closed when deleted?


stuarteberg

Sep 21, 2012, 1:39:44 PM
to h5...@googlegroups.com
Greetings,

Is h5py supposed to automatically close File objects when they are deleted?  My colleague and I have found that on Mac, Linux, and Windows, the file is not automatically closed when __del__ is called.  What is the expected behavior?

For example, consider this sample program, which will print "Can't create file."

import h5py
f = h5py.File('test.h5')
#f.close() <-- Don't explicitly close
del f

# Was f closed?
try:
    f = h5py.File('test.h5', 'w') # This will fail if f was not closed.
except IOError:
    print "Can't create file."

I am using today's rev of the h5py hg repo (changeset: 735:8198d1ff5b6d).

Best regards,
Stuart

Andrew Collette

Sep 21, 2012, 1:48:17 PM
to h5...@googlegroups.com
Hi,

> Is h5py supposed to automatically close File objects when they are deleted?
> My colleague and I have found that on Mac, Linux, and Windows, the file is
> not automatically closed when __del__ is called. What is the expected
> behavior?

Yes, File objects will remain open until you explicitly call "close".
Since closing a File object also closes all open objects in the file,
this behavior is intended to prevent the case where accidentally "losing" a
File object invalidates groups and datasets in other parts of the
program. For example, this function would never work:

def get_root_group(filename):
    """ Opens the specified file and returns the root group """
    f = h5py.File(filename)
    return f['/']

Particularly since we include a ".file" attribute on all groups and
datasets, auto-closing files was deemed too surprising. Groups and
datasets themselves, however, are always automatically closed when deleted.

HTH,
Andrew

stuarteberg

Sep 21, 2012, 2:16:15 PM
to h5...@googlegroups.com
Hi Andrew,

Thanks for your reply.  I can't resist asking the obvious follow-up question :-)

Why is the group/dataset ".file" attribute stored via weakref?  If it were a normal ref, its existence would be enough to keep the file from getting deleted.  Once all references to the file and all its datasets/groups are discarded, garbage collection would come along and delete the file.  At that point, there could be no harm in closing the file from within __del__.

Stuart

Andrew Collette

Sep 21, 2012, 4:14:55 PM
to h5...@googlegroups.com
Hi,

> Thanks for your reply. I can't resist asking the obvious follow-up question
> :-)

Not at all; it's nice to have someone interested in the h5py internals. :)

> Why is the group/dataset ".file" attribute stored via weakref? If it were a
> normal ref, its existence would be enough to keep the file from getting
> deleted. Once all references to the file and all its datasets/groups are
> discarded, garbage collection would come along and delete the file. At
> that point, there could be no harm in closing the file from within __del__.

It's actually not a weakref. '.file' is a property which
auto-generates a File object bound to the correct low-level
identifier, using H5Iget_file_id. We actually used to simply assign
the original File instance in the constructor for every group,
dataset, etc., but quickly discovered that there are some objects for
which this doesn't work. For example, in the case of an external
link, group2 = group1['extlink'] should result in group1.file !=
group2.file. Simply assigning group2.file = group1.file in the
constructor would do the wrong thing. Likewise, when dereferencing
object references, it's very awkward to patch up the correct File
reference on the retrieved object(s).

One of the nice design features about h5py version 2 that makes this
work is that all the high-level classes (Group, File, Dataset, etc.)
are stateless on the Python side; they get their state from binding to
a low-level HDF5 identifier. So if you have two File identifiers f1 =
File(lowid) and f2 = File(lowid), f1 and f2 are completely equivalent,
except of course for object identity (id()). The File instance
returned by .file isn't the original instance created by the user, but
it hashes the same, compares the same, etc., because it points to the
same file object inside the HDF5 library.
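A minimal sketch of that stateless design (illustrative class names, not h5py's actual implementation): equality and hashing delegate entirely to the bound low-level identifier, so two wrappers around the same identifier are interchangeable.

```python
class LowID(object):
    """Stand-in for a low-level HDF5 identifier."""
    pass

class Wrapper(object):
    """Stateless high-level object: all state lives in the bound id."""
    def __init__(self, lowid):
        self.id = lowid

    def __eq__(self, other):
        # Two wrappers are "the same object" iff they bind the same id.
        return isinstance(other, Wrapper) and self.id is other.id

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        # Hash on the identifier, so equal wrappers hash the same.
        return hash(id(self.id))

lowid = LowID()
f1, f2 = Wrapper(lowid), Wrapper(lowid)
print(f1 == f2, f1 is f2)  # True False
```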

Andrew

stuarteberg

Sep 21, 2012, 4:24:48 PM
to h5...@googlegroups.com
Hi Andrew,

Thanks for the explanation.  By the way, I'm looking forward to the upcoming h5py 2.1 release...

Cheers,
Stuart

stuarteberg

Sep 24, 2012, 11:56:51 AM
to h5...@googlegroups.com
Andrew,

I have one more follow-up question on this topic.

My team and I need to tweak our design to ensure that h5py.File instances are automatically closed when they are no longer needed.

Since we probably aren't the first h5py users to face this issue, we figured we should ask if you can recommend any design patterns for handling it.  If we share h5py Files and Groups between objects that may be destroyed at different times, is there an easy way to ensure that the last owner of the file closes it?

One option would be for us to subclass h5py.File to call close() from within __del__, but that gets a little ugly because Groups and Datasets would also need to be subclassed to keep a reference to their file handle.  As you mentioned, you abandoned this approach because it fails for some cases, like external links.  We don't use external links, but I suppose that could change in the future.  Is there a better general-purpose solution?

Thanks,
Stuart

Andrew Collette

Sep 25, 2012, 12:02:22 PM
to h5...@googlegroups.com
Hi,

> My team and I need to tweak our design to ensure that h5py.File instances
> are automatically closed when they are no longer needed.
>
> Since we probably aren't the first h5py users to face this issue, we figured
> we should ask if you can recommend any design patterns for handling it. If
> we share h5py Files and Groups between objects that may be destroyed at
> different times, is there an easy way to ensure that the last owner of the
> file closes it?

At the moment this is a really hard problem. If you only want to
auto-close File objects, then as you say it's a simple matter of adding
__del__, or of using weakrefs that point to a master File object, which
you then close with a callback. I think the best way to handle this
on my side is to add an enhancement ticket to close files when there
are no remaining identifiers open. I can think of a few ways this
might be done, but they all require mucking around in the Cython layer
underneath h5py, and I'm not sure at present how that would interact
with the identifier system. It's something I would consider for h5py
2.2.

An alternative would be to restructure your app, but I generally don't
like it when people tell me that. :)

Finally, I would parenthetically comment that doing things in __del__
is generally considered a bit risky; in some circumstances (reference
cycles, for example), __del__ may not even be called when an object
is destroyed. This has bitten me before, so keep it in mind. The
canonical way around this is to use weakrefs and clean up when they're
all lost.
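That weakref pattern can be sketched generically; here a dummy Resource stands in for h5py.File, and the callback flips a flag where real code would close the file's low-level identifier (the File object itself is already gone by the time the callback runs):

```python
import weakref

class Resource(object):
    """Stand-in for h5py.File in this sketch."""
    pass

state = {'closed': False}
res = Resource()  # the single "master" strong reference

# Weakref callbacks receive the dead reference, not the object, so the
# cleanup action must be captured separately (with h5py, the FileID).
def on_collect(ref, state=state):
    state['closed'] = True

owner1 = weakref.ref(res, on_collect)  # other owners hold weakrefs
owner2 = weakref.ref(res, on_collect)

del res  # last strong reference gone; callbacks fire (absent ref cycles)
print(state['closed'])  # True on CPython, where collection is immediate
```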

Andrew

Stuart Berg

Sep 25, 2012, 1:55:28 PM
to h5...@googlegroups.com
Hi Andrew,

Thanks for the advice.  I see you created Issue 246 on this topic.  Thanks!

Best,
Stuart

Kamil Kisiel

Sep 27, 2012, 8:01:16 PM
to h5...@googlegroups.com
For our applications we wrap opening of h5py files with a context manager that closes the file once the context exits.

If you want to do that without writing any code I believe that simply using contextlib.closing() should be enough. 
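A quick sketch of the contextlib.closing() approach (a dummy Handle stands in for h5py.File here; closing() only requires a close() method):

```python
import contextlib

class Handle(object):
    """Stand-in for h5py.File; anything with a .close() method works."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

h = Handle()
with contextlib.closing(h) as f:
    assert f is h        # closing() yields the wrapped object unchanged
print(h.closed)  # True: close() was called on exit, even after an error
```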

Andrew Collette

Sep 27, 2012, 11:33:13 PM
to h5...@googlegroups.com
> For our applications we wrap opening of h5py files with a context manager
> that closes the file once the context exits.
>
> If you want to do that without writing any code I believe that simply using
> contextlib.closing() should be enough.

You can also use File objects directly as context managers, e.g.:

with File('name.hdf5') as f:
    print f.keys()

Andrew

Kamil Kisiel

Sep 28, 2012, 12:36:58 AM
to h5...@googlegroups.com
Is that new in 2.0? I swear it wasn't there before. :)

Andrew Collette

Sep 28, 2012, 1:10:02 PM
to h5...@googlegroups.com
> Is that new in 2.0 ? I swear it wasn't there before :)

Since 1.1 at least, but it could be documented better. :)

Andrew

Darren Dale

Oct 4, 2012, 10:28:45 AM
to h5...@googlegroups.com
On Tue, Sep 25, 2012 at 12:02 PM, Andrew Collette
<andrew....@gmail.com> wrote:
>> My team and I need to tweak our design to ensure that h5py.File instances
>> are automatically closed when they are no longer needed.
>>
>> Since we probably aren't the first h5py users to face this issue, we figured
>> we should ask if you can recommend any design patterns for handling it. If
>> we share h5py Files and Groups between objects that may be destroyed at
>> different times, is there an easy way to ensure that the last owner of the
>> file closes it?
>
> At the moment this is a really hard problem. If you only want to
> auto-close File objects, then as you say it's a simple addition of
> __del__, or use weakrefs which point to a master File object, which
> you then close with a callback. I think the best way to handle this
> on my side is to add an enhancement ticket to close files when there
> are no remaining identifiers open. I can think of a few ways this
> might be done, but they all require mucking around in the Cython layer
> underneath h5py, and I'm not sure at present how that would interact
> with the identifier system. It's something I would consider for h5py
> 2.2.

When an ObjectID (group, dataset, etc) goes out of scope and is
deallocated, we remove the low level hdf5 id from the object registry.
However, FileIDs set their self.locked property to True, such that
h5py does not tell hdf5 to reclaim the id in its own object registry
(and therefore the FileID is also not deleted from the h5py object
registry). It might be possible to do something like:

f=File(...)
f.fid.locked = False

to make h5py auto-close the file object when it goes out of scope.
This might require overriding FileID.__dealloc__, along the lines of

def __dealloc__(self):
    if not self.locked:
        self.close()
    super(FileID, self).__dealloc__()

in order to get the desired behavior. We might even expose an
"auto_close" kwarg in the high-level File constructor. On the other
hand, I'm sure we would get an endless series of questions like:

>>> f=File(..., auto_close=True)
>>> g=f['foo']
>>> g.file
>>> print f
"<Closed HDF5 file>"

"How come f is closed? It didn't go out of scope!"

Here is another half-baked idea. Any time a new ObjectID is created,
it calls h5i.inc_ref(h5i.get_file_id(...)). When any ObjectID goes out
of scope, it calls h5i.dec_ref(h5i.get_file_id(...)). Then we wouldn't
need to "lock" FileIDs, they would only be removed from the hdf5 and
h5py registries when their hdf5 reference count drops to zero. In
order for that to work, every time an ObjectID went out of scope, it
would have to call _objects.registry.cleanup() to find and remove any
potentially invalid FileIDs before the underlying hdf5 id was recycled
by the hdf5 library. I think this approach would probably yield ideal
behavior, but unsatisfactory performance.

Darren

Darren Dale

Oct 4, 2012, 11:38:42 AM
to h5...@googlegroups.com
On Thu, Oct 4, 2012 at 10:28 AM, Darren Dale <dsda...@gmail.com> wrote:
> On Tue, Sep 25, 2012 at 12:02 PM, Andrew Collette
> <andrew....@gmail.com> wrote:
>>> My team and I need to tweak our design to ensure that h5py.File instances
>>> are automatically closed when they are no longer needed.
>>>
>>> Since we probably aren't the first h5py users to face this issue, we figured
>>> we should ask if you can recommend any design patterns for handling it. If
>>> we share h5py Files and Groups between objects that may be destroyed at
>>> different times, is there an easy way to ensure that the last owner of the
>>> file closes it?
>>
>> At the moment this is a really hard problem. If you only want to
>> auto-close File objects, then as you say it's a simple addition of
>> __del__, or use weakrefs which point to a master File object, which
>> you then close with a callback. I think the best way to handle this
>> on my side is to add an enhancement ticket to close files when there
>> are no remaining identifiers open. I can think of a few ways this
>> might be done, but they all require mucking around in the Cython layer
>> underneath h5py, and I'm not sure at present how that would interact
>> with the identifier system. It's something I would consider for h5py
>> 2.2.
>
> Here is another half-baked idea. Any time a new ObjectID is created,
> it calls h5i.inc_ref(h5i.get_file_id(...)). When any ObjectID goes out
> of scope, it calls h5i.dec_ref(h5i.get_file_id(...)). Then we wouldn't
> need to "lock" FileIDs, they would only be removed from the hdf5 and
> h5py registries when their hdf5 reference count drops to zero. In
> order for that to work, every time an ObjectID went out of scope, it
> would have to call _objects.registry.cleanup() to find and remove any
> potentially invalid FileIDs before the underlying hdf5 id was recycled
> by the hdf5 library. I think this approach would probably yield ideal
> behavior, but unsatisfactory performance.

Actually, performance should not be an issue. Rather than having to
search the entire registry for invalid objects, in
_objects.registry.__delitem__(), we would simply do something like:

fid = h5i.get_file_id(key)
if h5i.get_ref(fid) == 0:
    del self._data[fid]

If this change in behavior is possible to implement, I agree it should
wait for a 2.2.0 feature release, rather than be included in a 2.1.1
bugfix.

Darren