issues with propagating after restart

Neville Bethel

Sep 1, 2021, 3:45:03 PM
to westpa...@googlegroups.com
Hi WESTPA users,

I am coming across issues when restarting WESTPA simulations.
WESTPA exited after reaching the maximum wall clock time at iteration 29.
I am noticing that WESTPA is throwing out (merging?) most of the segments at the low and high ends of the progress coordinate:

WESTPA_color$ cat traj_segs/000029/*/seg.txt
38.103   0.000
37.228   0.000
39.010   0.000
38.049   0.000
38.264   0.000
38.919   0.000
36.943   0.000
36.942   0.000
38.425   0.000
37.789   0.000
39.917   0.000
38.874   0.000
38.443   0.000
39.231   0.000
38.891   0.000
38.368   0.000
39.483   0.000
39.508   0.000
39.368   0.000
38.715   0.000
39.613   0.000
39.012   0.000
38.802   0.000
39.951   0.000
39.591   0.000
40.029   0.000
41.012   0.000
43.068   0.000
42.259   0.000
39.908   0.000
40.246   0.000
40.299   0.000
45.843   0.000
46.778   0.000
45.445   0.000
41.071   0.000
46.131   0.000
47.141   0.000
49.723   0.000
49.851   0.000
50.237   0.000
51.045   0.000
51.530   0.000
51.020   0.000
55.020   0.000
54.866   0.000
54.768   0.000
55.213   0.000
54.542   0.000
55.345   0.000
55.242   0.000
56.461   0.000
55.450   0.000
55.797   0.000
56.121   0.000
56.309   0.000
55.436   0.000
55.265   0.000
54.823   0.000
55.387   0.000
58.252   0.000
59.382   0.000
58.768   0.000
60.007   1.000
59.610   0.000
59.059   0.000
57.298   0.000
59.516   0.000
60.253   1.000
59.975   0.000
60.753   1.000
60.691   1.000
59.704   0.000
58.951   0.000
59.323   0.000
60.067   1.000

WESTPA_color$ cat traj_segs/000030/*/parent.txt
37.789   0.000
37.789   0.000
37.789   0.000
37.789   0.000
37.789   0.000
37.789   0.000
37.789   0.000
37.789   0.000
38.874   0.000
38.368   0.000
38.891   0.000
38.425   0.000
38.802   0.000
38.715   0.000
38.715   0.000
38.443   0.000
39.908   0.000
39.951   0.000
39.591   0.000
39.908   0.000
39.591   0.000
39.908   0.000
39.368   0.000
39.591   0.000
40.246   0.000
41.071   0.000
40.299   0.000
40.029   0.000
41.071   0.000
41.071   0.000
41.071   0.000
41.012   0.000
45.445   0.000
45.843   0.000
45.843   0.000
45.843   0.000
46.778   0.000
46.131   0.000
47.141   0.000
49.851   0.000
51.045   0.000
51.530   0.000
51.020   0.000
50.237   0.000
55.213   0.000
54.823   0.000
55.265   0.000
54.542   0.000
54.768   0.000
55.345   0.000
55.242   0.000
54.866   0.000
56.309   0.000
56.461   0.000
56.461   0.000
56.461   0.000
56.121   0.000
56.309   0.000
56.461   0.000
56.461   0.000
59.059   0.000
59.382   0.000
59.059   0.000
58.252   0.000
59.610   0.000
58.768   0.000
59.516   0.000
58.252   0.000
60.007   1.000
60.007   1.000
60.007   1.000
60.007   1.000
60.007   1.000
60.007   1.000
60.007   1.000
60.007   1.000

For example, the 36-38 bin has segments ending at 37.228, 36.943, 36.942 and 37.789, but only 37.789 is carried forward to iteration 30: all eight iteration-30 segments in that bin have it as their parent. The same thing happens in the 60-62 bin, where only 60.007 is kept. I attached the west.cfg and system.py files.
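To make the per-bin bookkeeping in this example explicit, the sketch below groups the iteration-29 endpoints by bin and reports which of them reappear as iteration-30 parents. It assumes uniform 2-unit bins with edges at even values, inferred only from the 36-38 and 60-62 bins mentioned above (the real boundaries are in the attached west.cfg/system.py), and it matches parents to walkers by their printed pcoord value; `BIN_WIDTH`, `bin_of` and `survivors_by_bin` are hypothetical names.

```python
import math
from collections import defaultdict

# Hypothetical bin width: assumes uniform 2-unit bins with edges at even values
# (36-38, 38-40, ...), inferred from the bins named in the post.
BIN_WIDTH = 2.0


def bin_of(value, width=BIN_WIDTH):
    """Return the (lower, upper) edges of the bin that contains value."""
    lower = math.floor(value / width) * width
    return (lower, lower + width)


def survivors_by_bin(iter_values, parent_values, width=BIN_WIDTH):
    """Group end-of-iteration pcoords by bin and list which of them were used
    as parents in the next iteration (matched by pcoord value only)."""
    parents = set(parent_values)
    report = defaultdict(lambda: {"walkers": [], "carried": []})
    for value in iter_values:
        entry = report[bin_of(value, width)]
        entry["walkers"].append(value)
        if value in parents:
            entry["carried"].append(value)
    return dict(report)


iter29_36_38 = [37.228, 36.943, 36.942, 37.789]
iter30_parents = [37.789] * 8
print(survivors_by_bin(iter29_36_38, iter30_parents))
```

Fed the four 36-38 values and the eight 37.789 parents, it reports four walkers but a single survivor for that bin. Matching on printed values is only a rough check; the `parent_id` field in west.h5 is the reliable link between iterations.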
Is this a bug, or am I missing something here?

Best,
Neville
system.py
west.cfg

Jeremy Leung

Sep 1, 2021, 4:51:35 PM
to westpa...@googlegroups.com
Hi Neville,

Interesting setup for your WESTPA simulation. I assume this is an equilibrium simulation?

To understand the splitting and merging, I suggest running the simulation in debug mode (w_run --debug); the log file will then show what is being split and merged.

If you would also like to email me the west.h5 file directly, I can take a look at the parent ids and weights to understand what is happening. It's really difficult to tell what is causing it just by reading your text outputs.

Regards,

Jeremy L.

P.S. I also suggest saving the pcoord data as a dataset in the h5 file in addition to a txt file. It would help with analysis and save space.
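As a minimal sketch of reading the same endpoints straight from west.h5 instead of from seg.txt: it assumes the default WESTPA HDF5 layout, where each iteration's progress coordinates are stored in `iterations/iter_XXXXXXXX/pcoord` with shape (segments, pcoord_len, pcoord_ndim). `final_pcoords` is a hypothetical helper, and `west_h5` can be an open `h5py.File` or any nested mapping with the same layout.

```python
import numpy as np


def final_pcoords(west_h5, n_iter, iter_prec=8):
    """Return the last pcoord point of every segment in iteration n_iter.

    Assumes the default layout iterations/iter_XXXXXXXX/pcoord with shape
    (n_segments, pcoord_len, pcoord_ndim). west_h5 may be an open h5py.File
    or a nested dict of arrays with the same structure.
    """
    group_name = f"iter_{n_iter:0{iter_prec}d}"  # e.g. iter_00000029
    pcoord = np.asarray(west_h5["iterations"][group_name]["pcoord"])
    return pcoord[:, -1, :]  # final time point, all pcoord dimensions
```

Called on an open `h5py.File("west.h5", "r")` with `n_iter=29`, this should give one row per segment, and the returned array's `dtype` also shows which pcoord type was actually written to the file.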
--
Jeremy M. G. Leung
PhD Candidate, Chemistry
Graduate Student Researcher, Chemistry 
University of Pittsburgh | 219 Parkman Avenue, Pittsburgh, PA 15260
jml...@pitt.edu | [He, Him, His]

Neville Bethel

Sep 1, 2021, 4:57:15 PM
to westpa...@googlegroups.com
Thanks Jeremy. This is an equilibrium simulation with color as an extra progress coordinate. I'll try debug mode to see what's going on.
The pcoord data should also be stored in the h5 file. I normally just use the text files for a quick check while I am running other things, and I clear them away when I am cleaning up.

I attached the h5 file.

-Neville

west.h5

Jeremy Leung

Sep 1, 2021, 5:26:42 PM
to westpa...@googlegroups.com
Hi Neville,

One thing I noticed was that the pcoord in the west.h5 file does not correspond with what you posted in the text files. The pcoord in iteration 31 also "jumped" (compared to iter_29 and iter_30), which suggests some reading/writing discrepancies before and after the restart.
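One way to check a discrepancy like this is to compare the text-file values against the endpoints stored in west.h5, segment by segment. The sketch below assumes both inputs are lists of (pcoord, color) rows in the same segment order, for example zero-padded segment directories read in glob order; `mismatched_segments` and its default tolerance are hypothetical choices.

```python
import numpy as np


def mismatched_segments(txt_values, h5_values, atol=1e-3):
    """Return indices of segments whose text-file pcoord and west.h5 pcoord
    disagree by more than atol in any dimension.

    Both inputs are assumed to be 2-D (n_segments x pcoord_ndim) and in the
    same segment order.
    """
    txt = np.asarray(txt_values, dtype=float)
    h5 = np.asarray(h5_values, dtype=float)
    if txt.ndim != 2 or txt.shape != h5.shape:
        raise ValueError(f"expected matching 2-D shapes, got {txt.shape} vs {h5.shape}")
    # Absolute tolerance only: the text files are printed to three decimals.
    bad = ~np.isclose(txt, h5, atol=atol, rtol=0.0)
    return np.flatnonzero(bad.any(axis=1)).tolist()
```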

Another thing I noticed was that the pcoord is saved as an integer (as defined in your west.cfg), which might contribute to some rounding error. That might affect how WESTPA processes some of the trajectories, especially after a restart where it's starting from a blank slate.
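The sketch below illustrates the resolution lost with an integer pcoord type, using the iteration-29 values from the 36-38 bin in the first message. It relies on numpy's `astype`, which truncates toward zero when casting floats to integers. It only shows how distinct walkers collapse onto the same integer value; it does not reproduce WESTPA's internals or explain the larger discrepancies reported later in the thread. `as_stored` is a hypothetical helper.

```python
import numpy as np


def as_stored(values, dtype):
    """Hypothetical helper: cast pcoord values to dtype, as if storing them in an
    array of that type. Float -> integer casts truncate toward zero."""
    return np.asarray(values, dtype=np.float64).astype(dtype)


# Four distinct iteration-29 endpoints from the 36-38 bin.
bin_36_38 = [37.228, 36.943, 36.942, 37.789]

as_int = as_stored(bin_36_38, np.int32)
as_float = as_stored(bin_36_38, np.float32)

print(as_int.tolist())               # [37, 36, 36, 37]
print(len(set(as_int.tolist())))     # 2 distinct values survive
print(len(set(as_float.tolist())))   # 4 distinct values survive
```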

If you could double check to make sure your runseg.sh and get_pcoord.sh are passing the correct terms to $WEST_PCOORD_RETURN, that'll hopefully solve it.
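As a sanity check on what runseg.sh and get_pcoord.sh hand back, a small parser like the one below can confirm that the data copied to $WEST_PCOORD_RETURN has the expected number of columns and contains only numeric values. It assumes whitespace-separated text with one pcoord point per line and `ndim` columns (2 here: the main coordinate and color); `check_pcoord_return` is a hypothetical helper, not part of WESTPA.

```python
def check_pcoord_return(text, ndim):
    """Parse pcoord text destined for $WEST_PCOORD_RETURN and check its shape.

    Assumes whitespace-separated columns, one pcoord point per line and ndim
    columns per line. Returns the rows as lists of floats; raises ValueError
    on malformed input.
    """
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != ndim:
            raise ValueError(f"line {lineno}: expected {ndim} columns, got {len(fields)}")
        try:
            rows.append([float(field) for field in fields])
        except ValueError as exc:
            raise ValueError(f"line {lineno}: non-numeric value ({exc})") from exc
    if not rows:
        raise ValueError("no pcoord data found")
    return rows
```

For example, it could be run on the text of a segment's pcoord output file (whatever file your scripts write) before that file is copied to $WEST_PCOORD_RETURN.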

-- JL

Neville Bethel

Sep 3, 2021, 12:10:56 PM
to westpa...@googlegroups.com
Hi Jeremy, 

Changing the pcoord type from int to float fixed everything. It's very strange that the pcoord values in the h5 file were wildly different rather than just rounded off, but now they match exactly.

I had recently changed from an int to float pcoord and forgot to change the format in the config, so it makes sense that that was what was causing the issues.
Maybe WESTPA should throw an error if the pcoord data type doesn't match.
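A guard along the lines suggested here could compare the parsed pcoord values with the configured dtype before anything is stored. In WESTPA 2.x the dtype is usually set in west.cfg under `west: system: system_options:` as `pcoord_dtype: !!python/name:numpy.float32`. The function below is a hypothetical sketch of such a check, not an existing WESTPA API, and its relative tolerance is an assumption chosen so that float32 round-off passes while integer truncation does not.

```python
import numpy as np


def check_pcoord_dtype(values, pcoord_dtype, rtol=1e-6):
    """Hypothetical guard: raise if casting pcoord values to the configured
    dtype would change them (e.g. float data with an integer pcoord_dtype).
    Returns the cast array when the values survive the round trip."""
    original = np.asarray(values, dtype=np.float64)
    cast = original.astype(pcoord_dtype)
    round_trip = cast.astype(np.float64)
    if not np.allclose(round_trip, original, rtol=rtol, atol=0.0):
        raise TypeError(
            f"pcoord values do not fit dtype {np.dtype(pcoord_dtype).name}; "
            "check pcoord_dtype in west.cfg/system.py"
        )
    return cast
```

Integer-valued data such as the 0.000/1.000 color column still passes with an integer dtype, since nothing is lost; values like 37.789 raise instead of being silently truncated.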

Thanks for catching that.

Best,
Neville


Jeremy Leung

Sep 3, 2021, 12:41:07 PM
to westpa...@googlegroups.com
Hi Neville,

Great! Very glad I could help.

We'll consider your suggestion about verifying pcoord data type. Better error messaging is something we're looking to improve in the near future.

-- JL