Hi,
I recently experienced HDF5 file corruption with some of my h5py programs. Since other users have reported similar problems, I wrote some simple sequential and parallel h5py benchmarks to test for them. It turned out that HDF5 operations can indeed leave a file corrupted.
I wonder whether these are considered bugs, and whether the HDF5 community expects to fix them. I studied them in some detail and hope this can help developers deal with them.
The benchmarks in my study involve simple operations such as
- dataset creation: foo.create_dataset()
- dataset removal: del f['foo']
- dataset rename, dataset resize
on an HDF5 file with several existing groups and datasets.
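
A minimal sketch of what such a benchmark might look like with h5py. The file layout and the names g1, g2, bar and baz are assumptions made for illustration, not the exact benchmark used in the study.

```python
import h5py
import numpy as np


def build_initial_file(path):
    """Create a file that already holds a few groups and datasets."""
    with h5py.File(path, "w") as f:
        for name in ("g1", "g2"):  # hypothetical group names
            grp = f.create_group(name)
            grp.create_dataset("data", data=np.arange(10))
        # maxshape=(None,) makes the dataset chunked, hence resizable
        f.create_dataset("bar", data=np.arange(100), maxshape=(None,))


def run_operations(path):
    """Run the four operations: create, remove, rename, resize."""
    with h5py.File(path, "a") as f:
        f.create_dataset("foo", data=np.zeros(50))  # dataset creation
        del f["foo"]                                # dataset removal
        f.move("bar", "baz")                        # dataset rename
        f["baz"].resize((200,))                     # dataset resize
        return sorted(f.keys()), f["baz"].shape
```

Each operation updates several pieces of file metadata, so a crash between any two of those updates is a candidate point for inconsistency.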
I emulated crashes in the middle of execution and then recovered the file with h5clear; many of these crashes lead to crash-consistency problems. For instance, some existing datasets (which are not modified by the benchmark) become inaccessible after a crash during the creation or deletion of another dataset. A dataset that is being renamed or resized can also be left inaccessible (e.g. "Cannot read data from the resized dataset (wrong B-tree signature)").
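
One possible way to reproduce this kind of experiment is sketched below. It is an assumption, not the method used in the study: the crash is emulated by a child process that exits via os._exit() without closing the file, h5clear -s is run if the tool is on the PATH, and every dataset is then read back. The helper names are hypothetical.

```python
import os
import shutil
import subprocess
import sys

import h5py

# Child script: start creating a dataset, then die without closing the file.
CRASH_SCRIPT = """
import os, sys, h5py
f = h5py.File(sys.argv[1], "a")
f.create_dataset("foo", shape=(1000,), dtype="f8")
os._exit(1)  # skips all cleanup, so HDF5 never flushes or closes the file
"""


def crash_during_create(path):
    """Run the crashing child and return its exit code."""
    return subprocess.run([sys.executable, "-c", CRASH_SCRIPT, path]).returncode


def try_h5clear(path):
    """Clear the superblock status flags if h5clear is installed."""
    tool = shutil.which("h5clear")
    if tool is None:
        return False
    return subprocess.run([tool, "-s", path]).returncode == 0


def check_datasets(path):
    """Read every dataset; map its name to an error string, or None if OK."""
    results = {}

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            try:
                obj[()]  # force a full read of the raw data
                results[name] = None
            except Exception as exc:
                results[name] = str(exc)

    try:
        with h5py.File(path, "r") as f:
            f.visititems(visit)
    except Exception as exc:  # the file or a link could not be opened at all
        results["<file>"] = str(exc)
    return results
```

Calling crash_during_create(), then try_h5clear(), then check_datasets() on a copy of the file gives one data point. Exiting a process this way only loses data still buffered inside the process, so it is a weaker fault model than a power failure or a crash injected between individual writes.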
Many of these problems remain even with SWMR mode turned on.
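
For reference, a minimal sketch of a resize under SWMR with h5py, assuming the file is created with libver="latest" and the dataset exists before SWMR is enabled (new objects cannot be created once the file is in SWMR write mode). The dataset name bar and the sizes are illustrative assumptions.

```python
import h5py
import numpy as np


def resize_under_swmr(path, new_len):
    """Create a resizable dataset, switch to SWMR mode, then resize it."""
    with h5py.File(path, "w", libver="latest") as f:
        # All objects must be created before SWMR write mode is enabled
        dset = f.create_dataset("bar", data=np.arange(100), maxshape=(None,))
        f.swmr_mode = True
        dset.resize((new_len,))
        dset.flush()  # make the new extent visible to SWMR readers
        return f.swmr_mode, dset.shape
```

Dataset creation, removal and rename cannot be issued after SWMR write mode is enabled, so in a setup like this only the resize actually runs under SWMR.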
I also tried to identify the root causes at the level of low-level file objects, e.g. a parent B-tree node that is not updated together with its child node. I will not go into the details here, but please contact me if you find this interesting or useful.
Sincerely,
Jinghan