Westpa HDF5 Framework with GROMACS?


Hayden Scheiber

Aug 16, 2023, 2:04:38 PM
to westpa-users
Hello WESTPA community,

I would like to try using the HDF5 framework for saving trajectory and restart files. To do this, I have tried adapting tutorial 7.6 from the Combined Suite of WESTPA Tutorials for use with my system. However, I am running my dynamics with GROMACS, not OpenMM as in the tutorial.

  • In my west.cfg file, I have added to the west.data.data_refs:
iteration:     $WEST_SIM_ROOT/traj_segs/iter_{n_iter:06d}.h5

  • In my get_pcoord.sh and runseg.sh files, I have tried adding (note I convert my solute trajectory output to .h5 format):
# This is the solute trajectory file saved to h5 format
cp $WEST_STRUCT_DATA_REF/seg.h5 $WEST_TRAJECTORY_RETURN

# These are the restart files needed by GROMACS for the next iteration
cp $WEST_SIM_ROOT/common_files/ref.pdb $WEST_RESTART_RETURN/ref.pdb
cp $WEST_STRUCT_DATA_REF/seg.trr $WEST_RESTART_RETURN/parent.trr

I also removed the copying of the parent.trr and ref.pdb files in runseg.sh, which is done when not using the HDF5 framework. However, it seems that WESTPA is not properly saving the trajectory or restart files. When I run init.sh, the iter_000000.h5 file ends up without any data, and WESTPA does not copy over the restart files. This is despite my get_pcoord.sh log file showing that the trajectory and restart files are properly copied over to the $WEST_TRAJECTORY_RETURN and $WEST_RESTART_RETURN locations during ./init.sh.

Running ./run.sh then leads to "parent.trr not found" errors in all the segments. What am I doing wrong here? Do the restart files need to be in .xml format as in the tutorial?

Thanks,
Hayden

Anthony Bogetti

Aug 16, 2023, 2:21:57 PM
to westpa...@googlegroups.com
Hi Hayden,

I have adapted the basic_nacl tutorial for use with GROMACS and the HDF5 framework, but need to get that example fully tested and uploaded. I was able to get everything to work with the following in my get_pcoord.sh:

cp $WEST_SIM_ROOT/common_files/nacl.top $WEST_TRAJECTORY_RETURN
cp $WEST_SIM_ROOT/bstates/bstate.gro $WEST_TRAJECTORY_RETURN
cp $WEST_SIM_ROOT/bstates/bstate.edr $WEST_TRAJECTORY_RETURN
cp $WEST_SIM_ROOT/bstates/bstate.trr $WEST_TRAJECTORY_RETURN

cp $WEST_SIM_ROOT/common_files/nacl.top $WEST_RESTART_RETURN
cp $WEST_SIM_ROOT/bstates/bstate.gro $WEST_RESTART_RETURN/parent.gro
cp $WEST_SIM_ROOT/bstates/bstate.edr $WEST_RESTART_RETURN/parent.edr
cp $WEST_SIM_ROOT/bstates/bstate.trr $WEST_RESTART_RETURN/parent.trr

and the following in my runseg.sh:

cp nacl.top $WEST_TRAJECTORY_RETURN
cp seg.gro $WEST_TRAJECTORY_RETURN
cp seg.trr $WEST_TRAJECTORY_RETURN
cp seg.edr $WEST_TRAJECTORY_RETURN

cp nacl.top $WEST_RESTART_RETURN
cp seg.gro $WEST_RESTART_RETURN/parent.gro
cp seg.trr $WEST_RESTART_RETURN/parent.trr
cp seg.edr $WEST_RESTART_RETURN/parent.edr

cp seg.log $WEST_LOG_RETURN

I found that all of these were needed to prevent errors when GROMACS restarts. Please try the above and let me know if you run into any further errors. I can provide the entire directory that has worked for me if you would like a more in-depth example.
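The runseg.sh copy commands above follow a regular pattern (each seg.* file goes to the trajectory return unchanged and to the restart return as parent.*), so they could be collapsed into a small helper. The following is only a minimal sketch: the function name `stage_segment_outputs` and its argument order are hypothetical, and the file names are the ones from the listing above.

```shell
# Hypothetical helper: copy seg.<ext> files from a segment directory into the
# trajectory return directory (names unchanged) and into the restart return
# directory (renamed to parent.<ext>), mirroring the cp commands above.
stage_segment_outputs() {
  local seg_dir="$1" traj_return="$2" restart_return="$3" top_file="$4"
  local ext

  # The topology file goes to both locations under its own name.
  cp "$seg_dir/$top_file" "$traj_return/" || return 1
  cp "$seg_dir/$top_file" "$restart_return/" || return 1

  for ext in gro trr edr; do
    cp "$seg_dir/seg.$ext" "$traj_return/" || return 1
    cp "$seg_dir/seg.$ext" "$restart_return/parent.$ext" || return 1
  done
}
```

Inside runseg.sh this might be called as `stage_segment_outputs . "$WEST_TRAJECTORY_RETURN" "$WEST_RESTART_RETURN" nacl.top`. Returning non-zero on the first failed copy makes a missing output file visible immediately rather than at the next iteration.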

Let me know if you have any other questions.
Anthony

--
You received this message because you are subscribed to the Google Groups "westpa-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to westpa-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/westpa-users/f4fe4cc3-914a-48a4-84d8-dd8c4825e400n%40googlegroups.com.

Hayden Scheiber

Aug 16, 2023, 4:05:30 PM
to westpa-users
Hi Anthony,

Thank you for the quick reply. 
I was actually just trying to create a minimal working example. I had previously created my runseg.sh script to manually copy over all necessary restart files for GROMACS (.trr, .gro, .edr, .top, etc.).
While trying to get the HDF5 framework working, I started by only copying over a reference pdb file (which can be used as a topology in MDTraj) and the seg.trr file to $WEST_RESTART_RETURN.

In any case, I tried your exact suggestion but WESTPA is still producing an empty iter_000000.h5 file during init.sh, and it is not properly passing the restart files to the first iteration. 
Any chance I could view the config file for the tutorial you are developing?

Cheers,
Hayden

Hayden Scheiber

Aug 17, 2023, 1:14:01 PM
to westpa-users

I have tried to adapt Anthony's example to my case, but I believe there is a bug present in WESTPA preventing the uptake of trajectory and restart data into HDF5 format. Additionally, I am seeing very unexpected behavior from the MAB scheme.

It likely has something to do with my particular setup, which uses advanced WESTPA modules and plugins. I have attached my west.cfg, get_pcoord.sh, and runseg.sh files, as well as the four log files generated after running the simulation for 32 iterations (note that the run failed due to an OS-level error unrelated to the simulation; please ignore iteration 33). The simulation was run on 2 nodes using the ZMQ work manager, so there is a log file for the master process, the MPI process, and the two node processes. Finally, I have attached the west.h5 output file as well as two iteration-level h5 files which are supposed to contain trajectory data, one from the 0th iteration (basis states) and one from the 1st iteration.

 

Description of my calculation:

  • I am attempting to perform an equilibrium weighted ensemble simulation of two proteins binding, similar to what was done in the barnase-barstar WE paper. I'm interested in calculating the on and off rates, as well as the binding affinity and comparing it with experiment.
  • For now, I am using the WEED plugin every 10 steps to help accelerate my system's probability distribution towards equilibrium.
  • I have defined three progress coordinates: the minimal protein-protein distance; the "binding RMSD" (similar to what is used in the barnase-barstar paper); and an integer progress coordinate that tracks history (it is 1 if a given trajectory was most recently in the initial state and -1 if the trajectory was most recently in the bound state, which I've defined to be small values of both other progress coordinates).
  • I have defined an outer Rectilinear bin scheme which contains two bins: one bin for trajectories most recently in the initial state and one bin for trajectories most recently in the bound state. Within each outer bin is a 4x7x1 MAB that ignores the history progress coordinate.
  • I have defined 256 basis states which include the two proteins spaced between 3.5 - 20 Angstroms apart, given random rotations, sampled configurations from a previous WE simulation, solvated in water/ions.
  • I am using GROMACS to propagate the dynamics.

 Issues in my simulation:

The MAB scheme is not mapping properly. For some reason, only 5 of 70 bins are occupied. I expected that around half my MAB bins would be occupied from the start, as all trajectories start out with history progress coordinate = 1. I believe this issue is likely related to using the 3rd history progress coordinate as it is not a continuous progress coordinate. It would be nice to have this issue fixed, but for now I can probably update my simulation to remove this third progress coordinate and save it instead as auxiliary data.

 

Restart and trajectory files are not being stored in HDF5 format, either during initialization or during segment propagation. This is despite following all available tutorials on how to do this. I am properly copying the trajectory and restart files to the $WEST_TRAJECTORY_RETURN and $WEST_RESTART_RETURN locations, but WESTPA is complaining with warnings like:

 

-- WARNING  [westpa.core.propagators.executable] -- could not write restart data for <Segment(0x7feae62f9c10) n_iter=1 seg_id=2 weight=0.0007677671665986625 parent_id=-238 wtg_parent_ids=(-238,) pcoord[0]=array([ 3.4883447, 44.749756 ,  1.       ], dtype=float32) pcoord[-1]=array([0., 0., 0.], dtype=float32)>: restart data is not present

-- WARNING  [westpa.core.propagators.executable] -- In iteration 1. Assuming this is a start state and proceeding to skip reading restart from per-iteration HDF5 file for <Segment(0x7feae62f9c10) n_iter=1 seg_id=2 weight=0.0007677671665986625 parent_id=-238 wtg_parent_ids=(-238,) pcoord[0]=array([ 3.4883447, 44.749756 ,  1.       ], dtype=float32) pcoord[-1]=array([0., 0., 0.], dtype=float32)>



Could this issue be related to the fact that I'm running a multi-node simulation, and each node has its own local temp file space? It's possible, although WESTPA is failing to save any trajectory or restart data at all. Note that to get around this issue, I continued to copy my restart files manually just to see what would happen after iteration 0. I found that WESTPA fails to save any trajectory/restart data at any iteration, including the 0th iteration. Any other advice?


Thanks,

Hayden

west-5008012-mpirun.log.txt
west-5008012.log.txt
runseg.sh
get_pcoord.sh
west-5008012-node-0.log.txt
west-5008012-node-1.log.txt

Hayden Scheiber

Aug 17, 2023, 1:28:39 PM
to westpa-users
Some files failed to attach in the first message. Also the west.h5 file is too large, but it suffices to say it does not contain trajectory data, despite containing entries that point to where the trajectory data should be.

iter_000000.h5
iter_000001.h5
west.cfg

Hayden Scheiber

Aug 17, 2023, 1:56:36 PM
to westpa-users
I can confirm that removing the 3rd "history" progress coordinate obviates the issue with MAB mis-mapping the pcoord space. After removing the third progress coordinate and re-running ./init.sh, 30/32 bins are occupied:

0 target state(s) present
Calculating progress coordinate values for basis states.
256 basis state(s) present
Calculating progress coordinate values for start states.
0 start state(s) present
Preparing initial states
################ MAB stats ################
minima in each dimension:      [3.4883447, 26.51096]
maxima in each dimension:      [28.935612, 85.02036]
direction in each dimension:   [-1, -1]
skip in each dimension:        [0, 0]
###########################################

        Total bins:            32
        Initial replicas:      240 in 30 bins, total weight = 1
        Total target replicas: 256
       
1-prob: -4.4409e-16
Simulation prepared.
30 of 32 (93.750000%) active bins are populated
per-bin minimum non-zero probability:       0.00251097
per-bin maximum probability:                0.119252
per-bin probability dynamic range (kT):     3.86057
per-segment minimum non-zero probability:   0.000274869
per-segment maximum non-zero probability:   0.0221154
per-segment probability dynamic range (kT): 4.38773
norm = 1, error in norm = 4.44089e-16 (2*epsilon)

Hayden Scheiber

Sep 20, 2023, 8:37:13 PM
to westpa-users
Hi All,

Just an update here so anyone in the future dealing with this issue is aware. 
Turns out that in order to use the HDF5 framework for the 0th iteration, istates must be enabled. This was the problem I had. The HDF5 framework can indeed work with GROMACS.

If istates are not used, trajectory, restart, and log files will not be picked up by WESTPA into the iter_000000.h5 file, even if you pass them properly to $WEST_TRAJECTORY_RETURN, $WEST_RESTART_RETURN and $WEST_LOG_RETURN (respectively) in your get_pcoord.sh script.


Cheers,
Hayden

Leung, Jeremy

Sep 21, 2023, 9:46:55 AM
to westpa...@googlegroups.com
Hi Hayden,

Thanks for the update. I believe the istates generation step bypasses the HDF5 Framework, which is probably why it worked.

I was also able to get the HDF5 framework working without istates, though it required fixing the HDF5 framework's file-format recognition. A .gro file, for example, can serve as both topology and coordinates, but the WESTPA code previously treated it as topology only. Those changes are on the WESTPA GitHub and I expect them to be released with 2022.06.

-- JL

---
Jeremy M. G. Leung
PhD Candidate, Chemistry
Graduate Student Researcher, Chemistry (Chong Lab)
University of Pittsburgh | 219 Parkman Avenue, Pittsburgh, PA 15260
jml...@pitt.edu | [He, Him, His]


Hayden Scheiber

Sep 21, 2023, 2:07:41 PM
to westpa-users
Hi Jeremy,

Thanks for your response! I understand the issue now: when not using istates, two files must be returned to $WEST_TRAJECTORY_RETURN, the first being a topology file. It seems that in WESTPA 2022.04 the HDF5 framework treats iteration 0 with bstates differently from iteration 0 with istates (and all later iterations), since the latter can accept a single file containing both topology and coordinates. I can confirm that passing topology + trajectory files to $WEST_TRAJECTORY_RETURN allows WESTPA to properly save the iter_000000.h5 data.
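A pre-flight check in get_pcoord.sh could catch the "only one file returned" case before w_init silently writes an empty iter_000000.h5. The sketch below is hypothetical: the function name `check_trajectory_return` is invented for illustration, and the two extension lists are illustrative assumptions drawn from the file types mentioned in this thread, not WESTPA's own format-recognition logic.

```shell
# Hypothetical pre-flight check: succeed only if the given directory holds
# both a topology-like file and a coordinate-like file.
# The extension lists are illustrative assumptions, not WESTPA's own logic.
check_trajectory_return() {
  local dir="$1" f has_top=0 has_coords=0
  for f in "$dir"/*; do
    [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
    case "${f##*.}" in
      pdb|gro)    has_top=1 ;;
      xtc|trr|nc) has_coords=1 ;;
    esac
  done
  if [ "$has_top" -eq 1 ] && [ "$has_coords" -eq 1 ]; then
    return 0
  fi
  echo "missing topology or coordinate file in $dir" >&2
  return 1
}
```

Calling `check_trajectory_return "$WEST_TRAJECTORY_RETURN" || exit 1` at the end of get_pcoord.sh would turn a silent omission into an explicit failure in the log.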

Glad to hear you have resolved the issue in version 2022.06. I will upgrade once it's released.

By the way, in your tutorial example you copy a seg.edr file to $WEST_TRAJECTORY_RETURN. I found this is unnecessary, as the data stored in seg.edr is not saved in the iter_XXXXXX.h5 files. It would be nice if it were, since the energy information in this file can be valuable. As a workaround, I plan to save data from the seg.edr files (e.g. temperature, pressure, density, potential energy) into WESTPA auxiliary datasets. Note that copying seg.edr to $WEST_RESTART_RETURN/parent.edr does work as expected, but according to the GROMACS manual, passing parent.edr to gmx grompp is only useful with the Nose-Hoover thermostat and/or Parrinello-Rahman barostat (I recommend the v-rescale thermostat and c-rescale barostat instead, as they are ergodic, give correct fluctuations, and are stochastic).
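One way the seg.edr workaround might look in practice is to extract a quantity with `gmx energy` and strip the .xvg header before handing the values to WESTPA. This is a sketch under assumptions: the helper name `xvg_values` is hypothetical, and the return variable `WEST_TEMPERATURE_RETURN` assumes an auxiliary dataset named "temperature" has been declared in west.cfg.

```shell
# Hypothetical helper: print the first data column (second field) of an .xvg
# file, skipping the '#' comment and '@' plot-directive header lines.
xvg_values() {
  awk '!/^[#@]/ && NF >= 2 { print $2 }' "$1"
}

# Possible use inside runseg.sh (not executed here; assumes an auxiliary
# dataset named "temperature" so that WEST_TEMPERATURE_RETURN is defined):
#   echo "Temperature" | gmx energy -f seg.edr -o temperature.xvg
#   xvg_values temperature.xvg > "$WEST_TEMPERATURE_RETURN"
```

The same pattern would apply to pressure, density, or potential energy, with one dataset and one return file per quantity.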

Cheers,
Hayden
Message has been deleted

Jeremy Leung

Sep 22, 2023, 10:55:34 AM
to westpa-users
Hi Hayden,

I'll require a little more information before I move on.

Since you're using multiple bstates, could you share your bstate.txt (the file you use with `--bstate-file` when you run `w_init`) and also share your gen_istate.sh as well? I want to take a look at how you organized your bstates folder.

Also I'm a little confused with the `$WEST_BSTATE_DATA_REF` variable you use in your get_pcoord.sh. Is it defined in your env.sh?

Best,

JL

Hayden Scheiber

Sep 22, 2023, 12:24:01 PM
to westpa-users
Hi Jeremy,

Regarding my last message (now deleted): it turned out to be an unrelated problem on my end, not a WESTPA bug. I have been having issues with overloading my cluster filesystem, so I'm trying to optimize my calculations to write trajectory files to the local node SSD and then copy only the combined per-iteration h5 files to the shared filesystem. I had not pointed WESTPA to the correct location for the restart files in the west.cfg file (west.data.data_refs.segment). Please excuse my error and disregard my previous email.

To answer your questions:
  • In testing mode I was actually just generating 1 basis state, so the bstate.txt file was very straightforward. Just one line with the usual three numbers, given below. My bstates folder has one subfolder called 000000 which would contain the necessary .trr file to initiate the next iteration, and pre-calculated progress coordinates in a pcoords.txt file.
0 1.0 000000
  • When testing the effect of enabling/disabling istates, I actually didn't manipulate the bstate to generate an istate. In the west.cfg file I would set west.propagation.gen_istates to true, set west.data.data_refs.initial_state to $WEST_SIM_ROOT/bstates/{initial_state.basis_state_id:06d}, and set west.executable.gen_istate to /usr/bin/true. This has the effect of "enabling" istates for testing without actually making new istates folders with added links (thus incurring unnecessary filesystem I/O) or running a gen_istates.sh script.
  • $WEST_BSTATE_DATA_REF is defined by WESTPA when running get_pcoord.sh, see: https://westpa.github.io/westpa/users_guide/west/setup.html#programs-executed-for-a-single-point
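For reference, the "istates enabled but nothing generated" settings described in the second bullet can be collected into one fragment. The snippet below only writes that fragment to a scratch file for review. The key nesting, and in particular the `executable:` sub-key under `gen_istate`, is an assumption based on typical west.cfg layouts, and the output path is arbitrary.

```shell
# Write the settings described above to a standalone fragment for review.
# The nesting (notably the `executable:` sub-key under gen_istate) is an
# assumption based on typical west.cfg layouts; merge by hand into west.cfg.
fragment="${TMPDIR:-/tmp}/west_istates_fragment.yaml"
cat > "$fragment" <<'EOF'
west:
  propagation:
    gen_istates: true
  data:
    data_refs:
      initial_state: $WEST_SIM_ROOT/bstates/{initial_state.basis_state_id:06d}
  executable:
    gen_istate:
      executable: /usr/bin/true   # no-op: istates are enabled, nothing is generated
EOF
echo "wrote $fragment"
```

Because `/usr/bin/true` exits successfully without doing anything, each initial state simply points back at its basis state directory, which avoids the extra filesystem I/O mentioned above.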



Best,
Hayden

Hayden Scheiber

Sep 22, 2023, 11:37:31 PM
to westpa-users
Hi Jeremy,

Sorry to bother you again. Found what seems to be another bug regarding the per-iteration HDF5 framework. In testing with a single basis state, the HDF5 framework was working for iteration 0 (basis states) using the threads work manager (as in the tutorial), so I tried scaling my job up on my HPC cluster.

In scaling up from 1 to 256 basis states, I found that I had to switch work managers from threads to processes, as I was getting segmentation faults with the threads work manager. As soon as I switched to the processes work manager, the per-iteration framework failed to store any trajectory info or restart files in the iter_000000.h5 file. This is after making sure I was uploading the seg.h5 trajectory files to $WEST_TRAJECTORY_RETURN twice in get_pcoord.sh. It also does not work with istates enabled. The processes work manager seems to simply break the functionality for iteration 0. Other iterations continue to work, though.

Best,
Hayden

Jeremy Leung

Sep 25, 2023, 10:53:55 AM
to westpa-users
Hi Hayden,

Unfortunately work manager errors are notoriously hard to debug, especially with seg faults. Plus, they are often hardware/simulation dependent.

Have you tried passing the `--n-workers XX` option to `w_init`? Sometimes the segfault occurs because all processes (256 in your case) are running at the same time. Limiting the number of workers to XX allows more breathing room for the hardware.

I'm also not sure if it's a typo ('seg.h5' trajectory files), but h5 trajectory file outputs are intentionally ignored by our HDF5 Framework (https://github.com/westpa/westpa/blob/westpa2/src/westpa/core/trajectory.py#L10-L13). You will have to save them in a different format, like .xtc or .nc.

Hopefully that helps!

-- JL

Hayden Scheiber

Sep 25, 2023, 1:53:03 PM
to westpa-users
Hi Jeremy,

Thanks for your reply. I understand the seg faults with the threads work manager are not something you could easily debug. What seems to be a more important bug is that the processes work manager does not work with the HDF5 framework at iteration 0 (i.e. with w_init), although it does for other iterations (i.e. with w_run).

Thanks for informing me about the --n-workers option, I didn't know about it. I managed to get my scaled up (256 bstates) calculation to correctly run w_init with the HDF5 framework for iteration 0 by setting the work manager to threads and --n-workers=1. However, even --n-workers=2 results in a segfault. I've pasted the error trace below when using --n-workers=2 for interest, but I do not expect it's something that can easily be tracked down and fixed; it might be specific to my HPC environment.

Regarding your comment about seg.h5 files. That has not been my experience using westpa version 2022.05. I have been able to save my trajectory data (actually just my solute trajectory data) to the per-iteration h5 files in .h5 format (by first loading the xtc and topology files into mdtraj, then passing this data to westpa by saving to $WEST_TRAJECTORY_RETURN/seg.h5). This works for iteration 0 and other iterations as well, as long as I use the threads work manager for iteration 0. I can also confirm I have access to all the coordinates and topologies within the per-iteration HDF5 files.


Cheers,
Hayden


Calculating progress coordinate values for basis states.
-- WARNING  [westpa.core.propagators.executable] -- could not read any data for trajectory: Problems reading the array data.
-- WARNING  [westpa.core.propagators.executable] -- could not read any data for trajectory: Problems reading the array data.
-- WARNING  [westpa.core.propagators.executable] -- could not read any data for trajectory: Problems reading the array data.
[5173940:40268:0:40291] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8)
==== backtrace (tid:  40291) ====
 0  /usr/local/ucx/lib/libucs.so.0(ucs_handle_error+0x2dc) [0x7ffbb0d22acc]
 1  /usr/local/ucx/lib/libucs.so.0(+0x2fcd7) [0x7ffbb0d22cd7]
 2  /usr/local/ucx/lib/libucs.so.0(+0x2ffb6) [0x7ffbb0d22fb6]
 3  /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(+0x243fe8) [0x7ffba5c06fe8]
 4  /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5SL_insert+0x21) [0x7ffba5c090d1]
 5  /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5C__tag_entry+0xcf) [0x7ffba5a63d7f]
 6  /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5C_protect+0x53f) [0x7ffba5a561ef]
 7  /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5AC_protect+0xa8) [0x7ffba5a34cc8]
 8  /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5O_protect+0x135) [0x7ffba5b80595]
 9  /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5O_get_info+0x9c) [0x7ffba5b821ec]
10  /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(+0x1454c7) [0x7ffba5b084c7]
11  /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(+0x15b496) [0x7ffba5b1e496]
12  /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5G_traverse+0xd1) [0x7ffba5b1e951]
13  /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5G__get_objinfo+0x8d) [0x7ffba5b0b2cd]
14  /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5VL__native_group_optional+0x1eb) [0x7ffba5cd65ab]
15  /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5VL_group_optional+0x121) [0x7ffba5cc2c01]
16  /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5Gget_objinfo+0xd0) [0x7ffba5b0b030]
17  /root/gmxenv/lib/python3.8/site-packages/tables/hdf5extension.cpython-38-x86_64-linux-gnu.so(get_objinfo+0x50) [0x7ffba5276660]
18  /root/gmxenv/lib/python3.8/site-packages/tables/hdf5extension.cpython-38-x86_64-linux-gnu.so(+0x148854) [0x7ffba527e854]
19  /root/gmxenv/bin/python3() [0x5c4c80]
20  /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x5796) [0x570556]
21  /root/gmxenv/bin/python3() [0x50b07e]
22  /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x5796) [0x570556]
23  /root/gmxenv/bin/python3() [0x50b07e]
24  /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x5796) [0x570556]
25  /root/gmxenv/bin/python3(_PyFunction_Vectorcall+0x1b6) [0x5f6ce6]
26  /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x859) [0x56b619]
27  /root/gmxenv/bin/python3(_PyFunction_Vectorcall+0x1b6) [0x5f6ce6]
28  /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x859) [0x56b619]
29  /root/gmxenv/bin/python3(_PyEval_EvalCodeWithName+0x26a) [0x5697da]
30  /root/gmxenv/bin/python3() [0x50b1f0]
31  /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x1910) [0x56c6d0]
32  /root/gmxenv/bin/python3(_PyEval_EvalCodeWithName+0x26a) [0x5697da]
33  /root/gmxenv/bin/python3(_PyFunction_Vectorcall+0x393) [0x5f6ec3]
34  /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x1910) [0x56c6d0]
35  /root/gmxenv/bin/python3(_PyEval_EvalCodeWithName+0x26a) [0x5697da]
36  /root/gmxenv/bin/python3() [0x50b1f0]
37  /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x1910) [0x56c6d0]
38  /root/gmxenv/bin/python3(_PyEval_EvalCodeWithName+0x26a) [0x5697da]
39  /root/gmxenv/bin/python3() [0x50b1f0]
40  /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x1910) [0x56c6d0]
41  /root/gmxenv/bin/python3(_PyEval_EvalCodeWithName+0x26a) [0x5697da]
42  /root/gmxenv/bin/python3(_PyFunction_Vectorcall+0x393) [0x5f6ec3]
43  /root/gmxenv/bin/python3(PyObject_Call+0x62) [0x5f60b2]
44  /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x1f3c) [0x56ccfc]
45  /root/gmxenv/bin/python3(_PyEval_EvalCodeWithName+0x26a) [0x5697da]
46  /root/gmxenv/bin/python3(_PyFunction_Vectorcall+0x393) [0x5f6ec3]
47  /root/gmxenv/bin/python3(PyObject_Call+0x62) [0x5f60b2]
48  /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x1f3c) [0x56ccfc]
49  /root/gmxenv/bin/python3(_PyEval_EvalCodeWithName+0x26a) [0x5697da]
50  /root/gmxenv/bin/python3(_PyFunction_Vectorcall+0x393) [0x5f6ec3]
51  /root/gmxenv/bin/python3(PyObject_Call+0x62) [0x5f60b2]
52  /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x1f3c) [0x56ccfc]
53  /root/gmxenv/bin/python3(_PyEval_EvalCodeWithName+0x26a) [0x5697da]
54  /root/gmxenv/bin/python3(_PyFunction_Vectorcall+0x393) [0x5f6ec3]
55  /root/gmxenv/bin/python3(PyObject_Call+0x62) [0x5f60b2]
56  /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x1f3c) [0x56ccfc]
57  /root/gmxenv/bin/python3(_PyFunction_Vectorcall+0x1b6) [0x5f6ce6]
58  /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x72d) [0x56b4ed]
59  /root/gmxenv/bin/python3(_PyEval_EvalCodeWithName+0x26a) [0x5697da]
60  /root/gmxenv/bin/python3(_PyFunction_Vectorcall+0x393) [0x5f6ec3]
=================================
[5173940:40268] *** Process received signal ***
[5173940:40268] Signal: Segmentation fault (11)
[5173940:40268] Signal code:  (-6)
[5173940:40268] Failing at address: 0x9d4c
[5173940:40268] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7ffbc3057090]
[5173940:40268] [ 1] /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(+0x243fe8)[0x7ffba5c06fe8]
[5173940:40268] [ 2] /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5SL_insert+0x21)[0x7ffba5c090d1]
[5173940:40268] [ 3] /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5C__tag_entry+0xcf)[0x7ffba5a63d7f]
[5173940:40268] [ 4] /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5C_protect+0x53f)[0x7ffba5a561ef]
[5173940:40268] [ 5] /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5AC_protect+0xa8)[0x7ffba5a34cc8]
[5173940:40268] [ 6] /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5O_protect+0x135)[0x7ffba5b80595]
[5173940:40268] [ 7] /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5O_get_info+0x9c)[0x7ffba5b821ec]
[5173940:40268] [ 8] /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(+0x1454c7)[0x7ffba5b084c7]
[5173940:40268] [ 9] /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(+0x15b496)[0x7ffba5b1e496]
[5173940:40268] [10] /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5G_traverse+0xd1)[0x7ffba5b1e951]
[5173940:40268] [11] /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5G__get_objinfo+0x8d)[0x7ffba5b0b2cd]
[5173940:40268] [12] /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5VL__native_group_optional+0x1eb)[0x7ffba5cd65ab]
[5173940:40268] [13] /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5VL_group_optional+0x121)[0x7ffba5cc2c01]
[5173940:40268] [14] /root/gmxenv/lib/python3.8/site-packages/tables/../tables.libs/libhdf5-f789316a.so.200.2.0(H5Gget_objinfo+0xd0)[0x7ffba5b0b030]
[5173940:40268] [15] /root/gmxenv/lib/python3.8/site-packages/tables/hdf5extension.cpython-38-x86_64-linux-gnu.so(get_objinfo+0x50)[0x7ffba5276660]
[5173940:40268] [16] /root/gmxenv/lib/python3.8/site-packages/tables/hdf5extension.cpython-38-x86_64-linux-gnu.so(+0x148854)[0x7ffba527e854]
[5173940:40268] [17] /root/gmxenv/bin/python3[0x5c4c80]
[5173940:40268] [18] /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x5796)[0x570556]
[5173940:40268] [19] /root/gmxenv/bin/python3[0x50b07e]
[5173940:40268] [20] /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x5796)[0x570556]
[5173940:40268] [21] /root/gmxenv/bin/python3[0x50b07e]
[5173940:40268] [22] /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x5796)[0x570556]
[5173940:40268] [23] /root/gmxenv/bin/python3(_PyFunction_Vectorcall+0x1b6)[0x5f6ce6]
[5173940:40268] [24] /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x859)[0x56b619]
[5173940:40268] [25] /root/gmxenv/bin/python3(_PyFunction_Vectorcall+0x1b6)[0x5f6ce6]
[5173940:40268] [26] /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x859)[0x56b619]
[5173940:40268] [27] /root/gmxenv/bin/python3(_PyEval_EvalCodeWithName+0x26a)[0x5697da]
[5173940:40268] [28] /root/gmxenv/bin/python3[0x50b1f0]
[5173940:40268] [29] /root/gmxenv/bin/python3(_PyEval_EvalFrameDefault+0x1910)[0x56c6d0]
[5173940:40268] *** End of error message ***

Jeremy Leung

Sep 27, 2023, 1:10:09 PM
to westpa-users
Hi Hayden,

Reading the code again, it seems you can't use the h5 file as topology, but it is OK as coordinates, I think. I assume you can rectify that by passing in something else as topology? I've been able to initialize tutorial 5.5 with multiple basis states fine with either work manager, so it's definitely something about your setup. `w_init` runs `get_pcoord.sh`; `w_run` runs `runseg.sh`, so it's possible one works and the other doesn't.

Looking at the backtrace, it's HDF5 and Pytables related (which is what mdtraj uses to read trajectories). I guess there are a couple things to test:
1. Have you tried passing a different format to $WEST_TRAJECTORY_RETURN to see if it's caused by the seg.h5 file you wrote out? If so, does it happen while you're creating that file or when WESTPA tries to read `seg.h5`?
2. Are multiple basis states trying to read the same H5 file at the same time, causing it to segfault because of race conditions?

Hope this helps.

-- JL

Hayden Scheiber

Sep 27, 2023, 4:30:59 PM
to westpa-users
Hi Jeremy,

I did some testing to pinpoint exactly when the seg faults occur. Below is what I found. I looked at 3 options for returning data to $WEST_TRAJECTORY_RETURN (see my get_pcoord.sh script below).
  • In all situations, the processes work manager fails to capture any trajectory and restart data into the iter_000000.h5 file, leaving the file totally empty. Thus, the threads work manager must be used with w_init if you want to use the HDF5 framework for iteration 0.
  • As I found before, using w_init with --work-manager=threads and --n-workers=1, I could run any of options 1, 2, or 3. All 3 options correctly produce the iter_000000.h5 file with topology, coordinates, and restart data!
  • If I use --work-manager=threads and --n-workers=N for N>1, I cannot use option 2 or 3, as these result in a segfault. However, option 1 correctly produces the iter_000000.h5 file with topology, coordinates, and restart data!
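The observations in the list above could be encoded in init.sh so that the worker count follows the trajectory format being returned. This is only a sketch of those empirical findings on one HPC setup, not a documented WESTPA rule, and the helper name `pick_n_workers` is hypothetical.

```shell
# Hypothetical helper encoding the observations above: with the threads work
# manager, .h5 trajectory returns only worked with a single worker, while
# pdb + xtc returns also worked with more workers.
pick_n_workers() {
  local traj_file="$1" requested="$2"
  case "$traj_file" in
    *.h5) echo 1 ;;
    *)    echo "$requested" ;;
  esac
}

# Possible use in init.sh (not executed here; other w_init arguments elided):
#   w_init ... --work-manager=threads --n-workers="$(pick_n_workers seg.xtc 8)"
```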

 
Below is my get_pcoord.sh script used for testing:

#!/bin/bash
if [ -n "$SEG_DEBUG" ] ; then
  set -x
  env | sort
fi

# Return the progress coordinate value for this segment
cat $WEST_BSTATE_DATA_REF/pcoords.txt > $WEST_PCOORD_RETURN

# Now we save the solute trajectory into the HDF5 framework for storage
# Note the seg.xtc trajectory is a compressed file with only the solute.
# ref.pdb also only contains the solute.
# seg.h5 is a HDF5 trajectory file with the solute.

# Option 1: use pdb + xtc files
cp $WEST_SIM_ROOT/common_files/ref.pdb $WEST_TRAJECTORY_RETURN/seg.pdb
cp $WEST_BSTATE_DATA_REF/seg.xtc $WEST_TRAJECTORY_RETURN/seg.xtc

# # Option 2: use a single seg.h5 file
# cp $WEST_BSTATE_DATA_REF/seg.h5 $WEST_TRAJECTORY_RETURN/seg.h5

# # Option 3: upload two seg.h5 files
# cp $WEST_BSTATE_DATA_REF/seg.h5 $WEST_TRAJECTORY_RETURN/seg.h5
# cp $WEST_BSTATE_DATA_REF/seg.h5 $WEST_TRAJECTORY_RETURN/seg.h5


# This is the only restart file actually needed by gromacs for the next iteration
# Note the seg.trr trajectory file is a full precision file with solute and solvent
# The seg.gro file is not needed as a common file is used for all segments
# An input seg.edr file is not read by gmx grompp unless using the Nose-Hoover thermostat
# and/or Parrinello-Rahman barostat
cp $WEST_BSTATE_DATA_REF/seg.trr $WEST_RESTART_RETURN/parent.trr

# Return the bstate log file to westpa
cp $WEST_BSTATE_DATA_REF/gen_bstates.log $WEST_LOG_RETURN


Cheers,
Hayden