truncating a partially corrupt h5 file

Chloe Sheen

May 17, 2023, 7:20:17 PM
to westpa-users

Hello WESTPA community,

This may be more of an HDF5/h5py question, but I wanted to ask if anyone else has dealt with this issue. I realized that a previous iteration was corrupted due to a disk storage issue, and I attempted to roll back to the iteration before the corruption. I tried w_truncate, but it fails (KeyError: "Couldn't delete link (callback link pointer is NULL (specified link may be '.' or not exist))") because multiple iterations were corrupted. I also tried w_truncate -n 1999, but iterations 2126 and 2128 still persist, and west_current_iteration won’t change from 2183, the iteration at which I detected the corruption. I’ve attached a portion of the h5 file.

How should I go about fixing this and continuing my run from iteration 1999 without having to restart everything? 

Thanks,

Chloe

west_test.h5.zip

Jeremy Leung

May 17, 2023, 10:04:18 PM
to westpa-users
Hi Chloe,

Thanks for providing the test file. `west_current_iteration` is an attribute of the file, and I was able to modify it with the following code:
```
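import h5py

# Open in append mode so the attribute can be modified in place;
# the context manager closes the file even if something fails.
with h5py.File('west_test.h5', 'a') as f:
    print(list(f.attrs.keys()))               # List of all the attributes
    print(f.attrs['west_current_iteration'])  # Should print 2183
    f.attrs['west_current_iteration'] = 1999
    print(f.attrs['west_current_iteration'])  # Should print 1999 now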
```

As for iter_00002126 and iter_00002128, they appear to be corrupted, and they will be difficult to clean up manually. I suggest creating a duplicate h5 file and copying over everything except those iterations.
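If it helps to confirm which iterations are affected before copying, here is a rough sketch of a scan that tries to read every top-level dataset in each iteration group and records the ones that raise. The function names `find_unreadable` and `scan_file` are made up for illustration, and the sketch assumes the usual layout of `iter_XXXXXXXX` groups under `iterations`. It does not descend into nested groups, and a clean read is not proof that the data are intact.

```python
def find_unreadable(iters, first_iter, last_iter):
    """Return [(group_name, error)] for iteration groups that cannot be read.

    `iters` is the open `iterations` group (or anything dict-like).
    """
    bad = []
    for n_iter in range(first_iter, last_iter + 1):
        name = f"iter_{n_iter:08d}"
        try:
            group = iters[name]          # raises KeyError if the link is missing
            for key in group:            # iterate over member names
                item = group[key]
                if hasattr(item, "shape"):  # looks like a dataset
                    item[()]             # force a full read of the data
        except Exception as exc:         # corruption can surface as several error types
            bad.append((name, repr(exc)))
    return bad


def scan_file(path, first_iter, last_iter):
    # Imported here so find_unreadable can be exercised without h5py installed.
    import h5py

    with h5py.File(path, "r") as f:      # read-only: never scan the only copy in write mode
        return find_unreadable(f["iterations"], first_iter, last_iter)
```

For example, `scan_file('west_test.h5', 2100, 2183)` would return a list of the groups in that range that could not be opened or read.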

To speed things up, I would do the following steps. Make sure you keep duplicate copies so you can always roll back changes. Some of the HDF5 command-line tools will help, like h5copy and h5repack.
1) Delete the `iterations` group and repack the file with h5repack.

2) Make the `iterations` group in the repacked file by copying it from a freshly initialized simulation (or from the same setup that ran for 1-2 iterations) so you have all the attributes set as well. You can use h5copy to move things over, or make it from scratch.

3) Copy each iteration one by one, except for the two broken iterations. You can probably do it using h5copy or in Python.
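As a minimal sketch of step 3 in Python, assuming you have an untouched backup of the original file and a repacked file that needs its iterations restored: the helper names (`iter_group_name`, `copy_iterations`, `rebuild`) are hypothetical, and only `h5py.File`, `Group.require_group`, and `Group.copy` are actual h5py calls. It does not touch `west_current_iteration` or any other top-level data, so those still need to be handled separately.

```python
def iter_group_name(n_iter):
    """Group name for an iteration, e.g. 1999 -> 'iter_00001999'."""
    return f"iter_{n_iter:08d}"


def copy_iterations(src_iters, dst_iters, last_iter, skip=()):
    """Copy iterations 1..last_iter from src_iters into dst_iters.

    Iterations listed in `skip`, missing from the source, or already present
    in the destination are left alone. Returns (copied, failed).
    """
    copied, failed = [], []
    for n_iter in range(1, last_iter + 1):
        if n_iter in skip:
            continue
        name = iter_group_name(n_iter)
        try:
            if name not in src_iters or name in dst_iters:
                continue
            # Group.copy recurses into the group and keeps its attributes.
            src_iters.copy(name, dst_iters, name=name)
            copied.append(n_iter)
        except (KeyError, OSError, RuntimeError, ValueError) as exc:
            failed.append((n_iter, str(exc)))   # record and keep going
    return copied, failed


def rebuild(backup_path, repacked_path, last_iter, skip=()):
    # Imported here so copy_iterations can be exercised without h5py installed.
    import h5py

    with h5py.File(backup_path, "r") as src, h5py.File(repacked_path, "a") as dst:
        dst_iters = dst.require_group("iterations")  # create only if absent
        return copy_iterations(src["iterations"], dst_iters, last_iter, skip)
```

Calling something like `rebuild('west_backup.h5', 'west_repacked.h5', last_iter=1999)` (both file names are placeholders) would then copy iterations 1 through 1999 and report any that failed, without opening the backup for writing.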

Here are some related references that might help:

All the best,

JL