HDF5 data corruption issues after a crash?


Jinghan Sun

Oct 7, 2020, 2:26:16 AM
to h5py
Hi,

I recently experienced HDF5 file corruption in some of my h5py programs. Because other users have reported similar problems, I wrote some simple sequential and parallel h5py benchmarks to test for it. It turned out that HDF5 operations can indeed leave files corrupted.

I wonder whether these are considered bugs, and whether the HDF5 community expects to fix them. I have studied them and hope this helps developers deal with them.

The benchmarks in my study involve simple operations such as 
  • dataset creation: foo.create_dataset()
  • dataset removal: del f['foo']
  • dataset rename, dataset resize
on an HDF5 file with several existing groups and datasets.
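
For readers who want to try this, the operations listed above can be sketched roughly as follows. This is a minimal illustration, not the actual benchmark code: the file layout, the names `group1`, `existing`, `resizable`, `foo` and `bar`, and the helper functions are all assumptions made for the example.

```python
import os
import tempfile

import h5py
import numpy as np


def make_initial_file(path):
    """Create a file with an existing group and datasets (hypothetical layout)."""
    with h5py.File(path, "w") as f:
        g = f.create_group("group1")
        g.create_dataset("existing", data=np.ones(5))
        # A dataset can only be resized if it was created with maxshape.
        f.create_dataset("resizable", shape=(10,), maxshape=(None,), dtype="f8")


def run_benchmark_ops(path):
    """Apply create, rename, resize and remove to the existing file."""
    with h5py.File(path, "a") as f:
        f.create_dataset("foo", data=np.arange(10))  # dataset creation
        f.move("foo", "bar")                         # dataset rename
        f["resizable"].resize((20,))                 # dataset resize
        del f["bar"]                                 # dataset removal


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "bench.h5")
    make_initial_file(path)
    run_benchmark_ops(path)
    with h5py.File(path, "r") as f:
        print(sorted(f.keys()), f["resizable"].shape)
```

When the program exits normally, as here, the `with` block closes the file and none of the problems described below should appear.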

I emulated crashes in the middle of execution and then recovered the file with h5clear. Many of these crashes cause crash-consistency problems. For instance, some existing datasets (which the benchmark does not modify) become inaccessible after a crash during the creation or deletion of another dataset. A dataset that is being renamed or resized can also be left inaccessible (e.g. Cannot read data from the resized dataset (wrong B-tree signature)).
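
One way to emulate such a crash is sketched below, under stated assumptions: the child script, the crash point, and the dataset names `existing` and `new` are hypothetical, and `os._exit` stands in for a segfault or `kill -9`. The post does not say how the crashes were actually injected, and a single crash point like this one will not necessarily reproduce the corruption.

```python
import os
import subprocess
import sys
import tempfile

import h5py
import numpy as np

# Child script: modifies the file, then dies without flushing or closing it.
CHILD = r"""
import os, sys
import h5py, numpy as np
f = h5py.File(sys.argv[1], "a")
f.create_dataset("new", data=np.arange(1000))
os._exit(9)  # hypothetical crash point: no flush, no close, no cleanup
"""


def emulate_crash(path):
    """Run the child against `path` and return its exit code."""
    return subprocess.run([sys.executable, "-c", CHILD, path]).returncode


def untouched_data_readable(path):
    """Check whether a dataset the child never modified can still be read."""
    try:
        with h5py.File(path, "r") as f:
            return bool(np.array_equal(f["existing"][...], np.ones(5)))
    except Exception:
        # Any failure to open or read counts as "inaccessible" here.
        return False


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "crash.h5")
    with h5py.File(path, "w") as f:
        f.create_dataset("existing", data=np.ones(5))
    print("child exit code:", emulate_crash(path))
    print("untouched dataset readable:", untouched_data_readable(path))
```

A recovery step such as h5clear would run between the two calls; it is left out here because the exact invocation is not given above.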

Many of these problems remain even with SWMR mode turned on.

I also tried to identify the root causes at the level of low-level objects, e.g. a parent B-tree node not being modified together with its child node. I will not go into the details here, but feel free to contact me if you find this interesting or useful.

Sincerely,
Jinghan




Thomas Kluyver

Oct 7, 2020, 4:05:35 AM
to h5...@googlegroups.com
Hi Jinghan,

If there's a Python-level crash (an uncaught exception), h5py & HDF5 should clean up after themselves and leave files in a valid state. If they don't, I'd consider that a bug in h5py. But if there's a crash which kills your program without cleanup happening, like a segfault or doing 'kill -9' on it, I wouldn't be surprised if files are left in an invalid state. I don't know how far HDF5 tries to protect against that. You could ask on the HDF forum (https://forum.hdfgroup.org/) if you're interested.
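
The difference between the two cases can be illustrated without HDF5 at all. In the sketch below, writing a marker file in a `finally` block is only a stand-in for the cleanup h5py and HDF5 would do (flushing and closing the file), and `os._exit` stands in for a segfault or 'kill -9'; the `CHILD` script and the `cleanup_ran` helper are made up for the illustration.

```python
import os
import subprocess
import sys
import tempfile

# Child script: either raises an exception or exits hard, inside try/finally.
CHILD = r"""
import os, sys
marker, mode = sys.argv[1], sys.argv[2]
try:
    if mode == "exception":
        raise RuntimeError("simulated Python-level crash")
    os._exit(1)  # hard exit: skips finally blocks and atexit handlers
finally:
    # Stand-in for the cleanup h5py/HDF5 would do on the way out.
    with open(marker, "w") as fh:
        fh.write("cleaned up")
"""


def cleanup_ran(mode):
    """Run the child in the given mode; report whether its cleanup code ran."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "marker")
        subprocess.run(
            [sys.executable, "-c", CHILD, marker, mode],
            stderr=subprocess.DEVNULL,  # hide the child's traceback
        )
        return os.path.exists(marker)


if __name__ == "__main__":
    print("uncaught exception:", cleanup_ran("exception"))
    print("hard exit:", cleanup_ran("hard"))
```

With an uncaught exception the interpreter still unwinds the stack, so the cleanup runs; with a hard exit nothing in the process gets a chance to run.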

Best wishes,
Thomas


Jinghan Sun

Oct 7, 2020, 4:12:58 AM
to h5py
Hi Thomas,

Thank you so much! The crashes I emulated belong to the second case. I will also contact the HDF5 community on this.

Jinghan