h5dump error: internal error (file h5dump.c:line 1471)


Sreyoshi Sur

May 21, 2018, 12:41:42 PM5/21/18
to westpa-users
Hi,

I have been running WESTPA on a cluster. I was wondering if anyone has seen this error while running h5dump on a west.h5 file:
h5dump error: internal error (file h5dump.c:line 1471)
I cannot open the west.h5 file using the Python module h5py either. I keep getting this other error:
Unable to open object (Bad object header version number)
If anyone can look at my west.h5 file, I can provide a Google Drive link.
Is there another way I can extract the weights of each trajectory? This simulation has been running for 500 iterations and I can't access the data right now.

Thanks
Sreyoshi Sur
Grossfield Lab
University of Rochester


Ali Sinan Saglam

May 21, 2018, 8:26:04 PM5/21/18
to westpa...@googlegroups.com
Hi Sreyoshi,

Can you send the Google Drive link so I can take a look? The h5 file may have been corrupted if a process writing to it was stopped abruptly. Do you have a backup?

Best, 



--
Ali Sinan Saglam, PhD
Faeder lab, Room 3070, BST3
University of Pittsburgh
Pittsburgh, PA 15260

Sreyoshi Sur

May 22, 2018, 8:23:46 AM5/22/18
to westpa-users
Hi Ali,

I don't have a backup of this h5 file. Here is the link:

https://drive.google.com/file/d/1lnupt3Ng7L2zDlHxHI0Hq1QtzMfsJhik/view?usp=sharing

The issue is that I am generating so much data that I exceed my scratch-space quota on the cluster. I think that's when the processes stop, and west.h5 starts showing this kind of error.


Thanks
Sreyoshi Sur
Grossfield Lab
University of Rochester

Matthew Zwier

May 22, 2018, 8:43:06 AM5/22/18
to westpa...@googlegroups.com
Yes, that would likely result in a corrupted HDF5 file. Unfortunately, the HDF5 library does not support transactional storage that would allow for a graceful failure in this circumstance. You’ll have to exit if you’re about to fill up scratch. The quickest way would be to have your runseg.sh exit with a failure code if you’re about to overrun your space. That would trigger a graceful shutdown of WESTPA that you could restart.
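
A minimal sketch of that check, as a small Python helper that runseg.sh could call before propagating a segment. The script name, the WEST_SIM_ROOT fallback, and the 5 GB threshold are all illustrative placeholders, not part of WESTPA itself:

```python
# check_space.py -- hypothetical helper; runseg.sh would run it before dynamics.
import os
import shutil
import sys

def enough_space(path, needed_bytes):
    """True if the file system holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes

def main():
    # WESTPA exports WEST_SIM_ROOT to its runner scripts; fall back to the CWD.
    sim_root = os.environ.get("WEST_SIM_ROOT", ".")
    # Placeholder threshold: roughly one iteration of output plus some leeway.
    if not enough_space(sim_root, 5 * 1024**3):
        sys.exit(1)  # nonzero exit -> the segment fails and WESTPA shuts down

if __name__ == "__main__":
    main()
```

runseg.sh could then start with something like `python check_space.py || exit 1`, so a low-space condition looks like a failed segment and triggers the graceful shutdown Matt describes.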

Cheers,
Matt Z.

Sent from my mobile device. Please forgive any unusual brevity or egregious spelling errors.

Ali Sinan Saglam

May 22, 2018, 3:02:26 PM5/22/18
to westpa...@googlegroups.com
Hi Sreyoshi,

I see that the h5 file is not fully corrupted: I can open it with h5py and read most of it using h5ls. However, iterations 449 to 452 are corrupted somehow, and I can't open them.

My primary suggestion is to restart the simulation from iteration 448. Unfortunately, to do this you will have to make a manual copy of the h5 file. We normally have a tool to delete iterations (w_truncate), but I can't seem to delete your faulty iterations, so you will want to make an exact copy of the original file with iteration data up to 448 and use that to continue your simulation. This involves writing a script that copies each dataset in the original file to a new file, in the same format. If you haven't used h5py to make h5 files before, I suggest checking the documentation, especially the section on datasets.
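
For what it's worth, here is a rough sketch of that copy script (untested against your file; the zero-padded `iter_%08d` group names under /iterations match the usual WESTPA layout, but double-check yours with h5ls first):

```python
def iter_group(n):
    """Group name for iteration n (WESTPA zero-pads to 8 digits)."""
    return "iter_%08d" % n

def copy_good_iterations(src_path, dst_path, last_good=448):
    import h5py  # imported here so the helper above stays importable without h5py

    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        # Preserve top-level attributes (WESTPA stores file metadata here).
        for key, val in src.attrs.items():
            dst.attrs[key] = val
        # Copy every top-level object except /iterations wholesale.
        for name in src:
            if name != "iterations":
                src.copy(name, dst)
        # Copy iteration groups one at a time, stopping before the bad ones.
        dst.create_group("iterations")
        for n in range(1, last_good + 1):
            src["iterations"].copy(iter_group(n), dst["iterations"])

# copy_good_iterations("west.h5", "west_fixed.h5")
```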

Like Matt mentioned, I also suggest writing a post_iter script that (a) makes a copy of your h5 file every X iterations, where X depends on your situation (space constraints, how long each iteration takes, etc.), and (b) monitors the space left so you can exit the simulation if you can't complete a full iteration. I suggest estimating how big one iteration is, giving yourself 10-20% leeway on top of that (depending on how many people are writing to the same file system), and exiting the simulation if you have less than that.
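
A sketch of the backup half of that post_iter script, in Python. WEST_CURRENT_ITER is a variable WESTPA exports to its runner scripts; the interval and file names below are placeholders to tune for your situation:

```python
import os
import shutil

BACKUP_EVERY = 25  # placeholder; tune to your space limits and iteration time

def should_backup(cur_iter, every=BACKUP_EVERY):
    """Back up on every `every`-th completed iteration."""
    return cur_iter % every == 0

def backup_if_due(h5_path="west.h5"):
    # WESTPA sets WEST_CURRENT_ITER in the environment of its runner scripts.
    cur_iter = int(os.environ.get("WEST_CURRENT_ITER", 0))
    if cur_iter and should_backup(cur_iter):
        shutil.copy2(h5_path, h5_path + ".bak")
```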

Best, 

Sreyoshi Sur

May 23, 2018, 1:43:21 PM5/23/18
to westpa-users
Hi,

Yes, I have used h5py to create new h5 files, so I think I will be able to copy the old h5 file into a new one. I was trying to access iterations after 449, so I guess that is why I kept getting the error.
I will add a few failsafe lines at the end of runseg.sh so that I don't get corrupted datasets in west.h5. Thank you for your input.

Regards,
Sreyoshi Sur
University of Rochester


Adam

May 23, 2018, 1:55:17 PM5/23/18
to westpa...@googlegroups.com
Hi Sreyoshi,

As an extra measure, please try and run one of the analysis tools on
your new west.h5 file, such as w_assign. This will help ensure that
you didn't miss anything when copying.

You may also need to copy/modify the metadata on your west.h5, as
WESTPA stores information about the current iteration there. It may
complain that you're 'missing' data, as the metadata would still say
you're on some iteration past 449.
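
For the metadata, something along these lines with h5py may do it. The attribute name `west_current_iteration` is what I have seen on the root of WESTPA-generated files, but confirm with `h5dump -A` before editing, and work on a copy:

```python
def set_current_iteration(h5_path, n):
    """Point the file's current-iteration marker at iteration n."""
    import h5py  # imported here so the snippet is copy-pasteable on its own

    with h5py.File(h5_path, "r+") as f:
        # Assumed attribute name -- verify it against your file first.
        f.attrs["west_current_iteration"] = n

# e.g. after copying iterations 1..448:
# set_current_iteration("west_fixed.h5", 448)
```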

Best,

A.


---
Adam Pratt
Graduate Student in Chemistry
Chong Lab, Room 338, Eberly Hall
University of Pittsburgh
Pittsburgh, PA 15260

