Memory "leak" when reading and writing chunked data


stuart bh

Jun 19, 2014, 9:40:08 PM
to h5...@googlegroups.com
Greetings,

I have seen several references to "memory leaks", and the thread "Memory “leaking” when writing lots of data" by Stefan Scherfke is the most similar to my situation.
When I use chunked storage, memory usage grows linearly with the number of read and write operations and is never released until the kernel kills the process.
When I run the same test without chunking, memory use (as measured) stays flat.
My system specs are:
- OS: Red Hat Linux Server 6.5
- Python: 2.7.6
- h5py: 2.3.0
- HDF5: 1.8.12
My test case uses 100 files, each with a dataset of shape (250, 400, 300), and iterates over every (x, y, :) slice, reading it from one file and writing it to another.
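The per-slice copy described above can be sketched as follows. This is an illustration only, not Stuart's actual code: the helper name `copy_pencils`, the dataset name `"data"` and the choice to reuse the input's chunk shape are assumptions. Each iteration performs one small read and one small write, which is the access pattern that shows the memory growth.

```python
import h5py

def copy_pencils(src_path, dst_path, name="data"):
    """Copy dataset `name` from src to dst one (x, y, :) slice at a time.

    Hypothetical helper: the output keeps the input's shape, dtype and chunk
    layout (chunks=None means contiguous) but not its compression filter.
    """
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        sds = src[name]
        dds = dst.create_dataset(name, shape=sds.shape, dtype=sds.dtype,
                                 chunks=sds.chunks)
        nx, ny, _ = sds.shape
        for x in range(nx):
            for y in range(ny):
                # One small read and one small write per (x, y) pencil
                dds[x, y, :] = sds[x, y, :]
```

For the real test this would be called once per file pair, 100 times in total.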
Attached are plots of memory usage for:
- chunked storage;
- chunked storage with the chunk cache size set to 0, which appears to have no effect;
- no chunking, i.e. contiguous storage.
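One way to set up the three configurations in the list above is sketched below. The layout labels and the one-pencil-per-chunk shape `(1, 1, 300)` are assumptions, since the chunk shape used in the test is not stated. Note that the `rdcc_nbytes` keyword of `h5py.File` only exists in h5py 2.9 and later, so it is not available in the h5py 2.3.0 used here. The chunk cache is also a property of the open file handle, not of the file on disk, so it has to be passed every time the file is opened.

```python
import h5py

LAYOUTS = ("chunked", "chunked_nocache", "contiguous")  # hypothetical labels

def file_kwargs(layout):
    """Keyword arguments for h5py.File for one of the three test layouts."""
    if layout not in LAYOUTS:
        raise ValueError(f"unknown layout: {layout!r}")
    # rdcc_nbytes=0 disables the raw-data chunk cache for this file handle
    # (requires h5py >= 2.9); other layouts keep the library defaults.
    return {"rdcc_nbytes": 0} if layout == "chunked_nocache" else {}

def dataset_kwargs(layout, shape=(250, 400, 300), dtype="f4"):
    """Keyword arguments for create_dataset; one (x, y, :) pencil per chunk (assumed)."""
    if layout not in LAYOUTS:
        raise ValueError(f"unknown layout: {layout!r}")
    # chunks=None with no filters gives contiguous storage
    chunks = None if layout == "contiguous" else (1, 1, shape[2])
    return {"shape": shape, "dtype": dtype, "chunks": chunks}
```

Usage: `h5py.File(path, "w", **file_kwargs(layout)).create_dataset("data", **dataset_kwargs(layout))`.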
Chunked storage is a requirement for our application, so I am hoping there is a solution, perhaps in the latest release?
I can post code, but essentially it just reads and writes enough times to make the memory growth obvious.
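The post does not say how memory was measured. A minimal Linux-only sketch (an assumption, using the standard library only) samples the process's resident set size from `/proc/self/statm` while a read/write step runs repeatedly. The names `current_rss_bytes`, `track_rss` and `step` are hypothetical. With a leak the samples keep rising instead of levelling off.

```python
import os

def current_rss_bytes():
    """Resident set size of this process in bytes (Linux only: reads /proc)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])  # 2nd field = resident pages
    return resident_pages * os.sysconf("SC_PAGE_SIZE")

def track_rss(step, n_iter, every=1000):
    """Call step(i) for i = 1..n_iter, sampling RSS every `every` calls.

    Returns a list of (iteration, rss_bytes), starting with a baseline at 0.
    """
    samples = [(0, current_rss_bytes())]
    for i in range(1, n_iter + 1):
        step(i)
        if i % every == 0:
            samples.append((i, current_rss_bytes()))
    return samples
```

Here `step` would perform one slice read and one slice write; plotting the returned samples gives curves like the attached ones.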
Any suggestions would be greatly appreciated.

Cheers
Stuart 
output_1_1.png
output_2_1.png
output_3_1.png

Andrew Collette

Jun 20, 2014, 11:53:38 AM
to h5...@googlegroups.com
Hi Stuart,

> I have seen several references to "memory leaks" and the thread "Memory
> “leaking” when writing lots of data - Stefan Scherfke" is most similar to my
> situation.

Thanks for the extra data points. We weren't able to determine what
the issue was last time, but I certainly would like to nail down
what's happening.

Could I ask you to try the following:

1. Try the current master, although I think this is unlikely to help
at the moment
2. Try with HDF5 1.8.13 (both master and the current 2.3 branch can
now build against it)
3. Finally, try without gzip compression
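Items 2 and 3 can be checked with a short script like the one below. It assumes, as item 3 implies, that the original datasets use gzip. It prints which h5py and HDF5 versions the running process actually links against, then creates otherwise identical chunked datasets with and without gzip so the two runs can be compared. The helper `make_dataset` and its chunk shape are hypothetical.

```python
import h5py

# Confirm which h5py and which HDF5 library this process is really using
print("h5py", h5py.version.version, "HDF5", h5py.version.hdf5_version)

def make_dataset(f, name, shape, gzip):
    """Chunked float32 dataset, with or without gzip (hypothetical helper)."""
    return f.create_dataset(
        name,
        shape=shape,
        dtype="f4",
        chunks=(1, 1, shape[2]),  # assumed: one (x, y, :) pencil per chunk
        compression="gzip" if gzip else None,
    )
```

Running the same read/write loop against each variant isolates the effect of the gzip filter from that of chunking itself.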

I suspect the problem may be within HDF5, or possibly our use of HDF5.
If the above three don't help, there are some more advanced options we
can try, relating to chunk allocation time/fill values.
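The chunk allocation time and fill-value options mentioned above are dataset creation properties. A sketch using h5py's low-level API (the helper name is hypothetical, and whether these settings change the memory behaviour discussed here is untested) allocates every chunk when the dataset is created and never writes fill values into new chunks.

```python
import h5py
import numpy as np

def create_early_alloc(f, name, shape, chunks, dtype="f4"):
    """Chunked dataset with early allocation and no fill writes (hypothetical)."""
    dcpl = h5py.h5p.create(h5py.h5p.DATASET_CREATE)
    dcpl.set_chunk(chunks)
    # Allocate all chunks at creation time instead of on first write
    dcpl.set_alloc_time(h5py.h5d.ALLOC_TIME_EARLY)
    # Never write fill values; unwritten elements then hold undefined data
    dcpl.set_fill_time(h5py.h5d.FILL_TIME_NEVER)
    space = h5py.h5s.create_simple(shape)
    tid = h5py.h5t.py_create(np.dtype(dtype))
    dsid = h5py.h5d.create(f.id, name.encode(), tid, space, dcpl=dcpl)
    return h5py.Dataset(dsid)  # wrap the low-level ID in the high-level API
```

Early allocation makes the file reach its full size immediately. This trades disk space up front for fewer allocations during the write loop.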

Andrew