RE: Memory "leak" when reading and writing chunked data [SEC=UNCLASSIFIED]

Stuart Baron-Hay

Jun 26, 2014, 6:37:26 PM
to h5...@googlegroups.com
Hi Andrew

I tried your suggestions below but the issue still occurs.

I then tried reproducing the situation in C with HDF5 API and found that memory is allocated to the btree regularly as new data is read (call to H5FL_blk_malloc from H5D__btree_idx_get_addr). I posted my findings to the Hdf-forum and have received some feedback. I don't know if this is the culprit in my tests with h5py, but I'm presuming so.

Having said that, if I loop through the file twice, reading each (1, 1, 300) slice, then with the HDF5 C API memory use grows during the first loop but is static during the second. With h5py, memory grows during both loops, although the pattern differs: during the first loop the growth is more sporadic, while during the second loop it is at a very consistent rate.
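
For anyone wanting to repeat the two-pass comparison from Python, it could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the script used for the measurements above: the helper names (peak_rss_kib, iter_slices, measure_passes, main) are hypothetical, the dataset is assumed to be 3-D, and the Unix-only resource module is used for the memory reading. It tracks peak RSS, so it can show growth between passes but not memory being released.

```python
import resource
import sys


def peak_rss_kib():
    """Peak resident set size of this process in KiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return peak // 1024 if sys.platform == "darwin" else peak


def iter_slices(shape):
    """Yield index tuples selecting every (1, 1, shape[2]) slice."""
    for i in range(shape[0]):
        for j in range(shape[1]):
            yield (slice(i, i + 1), slice(j, j + 1), slice(None))


def measure_passes(read_slice, shape, passes=2):
    """Read every slice `passes` times; return peak-RSS growth per pass."""
    growth = []
    for _ in range(passes):
        before = peak_rss_kib()
        for index in iter_slices(shape):
            read_slice(index)
        growth.append(peak_rss_kib() - before)
    return growth


def main(path, name):
    # h5py is imported here so the helpers above can be used without it.
    import h5py

    with h5py.File(path, "r") as f:
        dset = f[name]  # assumed to be a 3-D chunked dataset
        print(measure_passes(lambda index: dset[index], dset.shape))
```

Calling main with a file path and dataset name (both placeholders for your own data) prints one growth figure per pass; a second figure well above zero would match the h5py behaviour described above.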

Cheers
Stuart

-----Original Message-----
From: h5...@googlegroups.com [mailto:h5...@googlegroups.com] On Behalf Of Andrew Collette
Sent: Saturday, 21 June 2014 1:54 AM
To: h5...@googlegroups.com
Subject: Re: Memory "leak" when reading and writing chunked data

Hi Stuart,

> I have seen several references to "memory leaks" and the thread
> "Memory “leaking” when writing lots of data - Stefan Scherfke" is most
> similar to my situation.

Thanks for the extra data points. We weren't able to determine what the issue was last time, but I certainly would like to nail down what's happening.

Could I ask you to try the following:

1. Try the current master, although I think this is unlikely to help at the moment
2. Try with HDF5 1.8.13 (both master and the current 2.3 branch can now build against it)
3. Finally, try without gzip compression

I suspect the problem may be within HDF5, or possibly our use of HDF5.
If the above three don't help, there are some more advanced options we can try, relating to chunk allocation time/fill values.

Andrew

Andrew Collette

Jun 27, 2014, 11:25:53 AM
to h5...@googlegroups.com
Hi Stuart,

> I then tried reproducing the situation in C with HDF5 API and found that memory is allocated to the btree regularly as new data is read (call to H5FL_blk_malloc from H5D__btree_idx_get_addr). I posted my findings to the Hdf-forum and have received some feedback. I don't know if this is the culprit in my tests with h5py, but I'm presuming so.

Thanks for investigating further. There's nothing I can find in the
h5py code paths that would cause this only for chunked datasets, so I
suspect it really is a problem inside HDF5. Since you can reproduce
it from C, you might also file a bug directly with the HDF Group
(he...@hdfgroup.org) so it gets into their tracker.

Andrew