restarting MAB simulation raises AssertionError: assert abs(1 - norm) < EPS*(len(segments)+n_active_bins)

Rebecca Walters

Nov 19, 2021, 5:41:42 AM
to westpa-users
Hi,

I am running a 2-D steady-state MAB simulation on my university HPC. When the simulation runs out of time, I try to restart it by just submitting the run.slurm file again. However, it immediately fails as soon as it starts running and I get this error (from the west.log):

System is being built only off of the system driver
Maximum wallclock time: 7 days, 0:00:00
Fri Nov 19 03:35:58 2021
Iteration 5 (100 requested)
Beginning iteration 5
88 segments remain in iteration 5 (88 total)
exception caught; shutting down
-- ERROR    [w_run] -- Traceback (most recent call last):
  File "/mnt/storage/scratch/ft18983/Neuraminidase/westpa/src/westpa/cli/core/w_run.py", line 56, in entry_point
    sim_manager.run()
  File "/mnt/storage/scratch/ft18983/Neuraminidase/WESTPA_1.0/NA_OSE_unbinding/manager.py", line 622, in run
    self.prepare_iteration()
  File "/mnt/storage/scratch/ft18983/Neuraminidase/WESTPA_1.0/NA_OSE_unbinding/manager.py", line 368, in prepare_iteration
    self.report_bin_statistics(initial_binning, save_summary=True)
  File "/mnt/storage/scratch/ft18983/Neuraminidase/WESTPA_1.0/NA_OSE_unbinding/manager.py", line 141, in report_bin_statistics
    assert abs(1 - norm) < EPS*(len(segments)+n_active_bins)
AssertionError

Can someone help me understand why this isn't working? I am also running another 1D steady state MAB simulation where I have restarted it by submitting to the queue again, and I do not get this error.

I am attaching my west.log, west.cfg, tstate.file, and adaptive.py. Please let me know if you need any more files or information.

Thanks very much!

Kind regards, 

Rebecca Walters

adaptive.py
tstate.file
west.log
west.cfg

Leung, Jeremy

Nov 19, 2021, 8:24:06 AM
to westpa...@googlegroups.com
Hi Rebecca,

You will have to remove binbounds.txt every time before you restart.

I suggest doing the following:
1) Add `rm binbounds.txt` to your run.slurm submit script, as the first command after sourcing env.sh
2) Run `w_truncate -n 4` to remove the wrongly restarted iterations where weights are not conserved (remove two just in case; -n 4 will delete iterations 4 and 5)
3) Remove the seg_logs and traj_segs for those iterations
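
If it helps, steps 1 and 3 can be sketched in Python as below. This is only a rough sketch, not part of WESTPA: `clean_for_restart` is a hypothetical helper, and it assumes that entries under traj_segs and seg_logs start with a zero-padded iteration number (for example traj_segs/000004 or seg_logs/000004-000012.log). Check that against your own directory layout first. It does not touch west.h5, so step 2 (`w_truncate -n 4`) still has to be run separately.

```python
import shutil
from pathlib import Path


def clean_for_restart(sim_root, first_bad_iter, dry_run=True):
    """Collect (and optionally delete) stale files before a restart.

    Hypothetical helper: assumes names under traj_segs/ and seg_logs/
    begin with the iteration number, e.g. 000004 or 000004-000012.log.
    """
    root = Path(sim_root)
    targets = []

    # Step 1: the stale bin boundaries written by the previous run.
    binbounds = root / "binbounds.txt"
    if binbounds.exists():
        targets.append(binbounds)

    # Step 3: per-iteration trajectory and log output.
    for sub in ("traj_segs", "seg_logs"):
        base = root / sub
        if not base.is_dir():
            continue
        for entry in sorted(base.iterdir()):
            # Leading digits before any "-" or "." are taken as the iteration.
            prefix = entry.name.split("-")[0].split(".")[0]
            if prefix.isdigit() and int(prefix) >= first_bad_iter:
                targets.append(entry)

    if not dry_run:
        for path in targets:
            if path.is_dir():
                shutil.rmtree(path)
            else:
                path.unlink()
    return targets
```

Calling `clean_for_restart(".", 4)` with the default `dry_run=True` only returns the list of paths it would remove, so you can inspect it before deleting anything.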

Hopefully that will fix it!
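
For context on the error itself: the assertion in your traceback checks that the total weight is still 1 to within floating-point tolerance. Below is a minimal sketch of that check. It assumes that `norm` is the sum of the segment weights and that `EPS` is double-precision machine epsilon; both are my assumptions rather than something read from your manager.py, and `weights_conserved` is a hypothetical name.

```python
import sys


def weights_conserved(weights, n_active_bins, eps=sys.float_info.epsilon):
    """Mirror of: assert abs(1 - norm) < EPS*(len(segments)+n_active_bins).

    Assumption: norm is the plain sum of all segment weights.
    """
    norm = sum(weights)
    # The tolerance scales with the number of segments and active bins.
    tolerance = eps * (len(weights) + n_active_bins)
    return abs(1 - norm) < tolerance


# Four equal weights sum to exactly 1, so the check passes.
print(weights_conserved([0.25, 0.25, 0.25, 0.25], n_active_bins=2))  # True

# If a quarter of the weight is lost, the check fails.
print(weights_conserved([0.25, 0.25, 0.25], n_active_bins=2))  # False
```

The tolerance is tiny (a few hundred times machine epsilon for a run of your size), so any real loss of weight trips the assertion immediately.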

Best,

Jeremy Leung
-- 
Jeremy M. G. Leung
PhD Candidate, Chemistry
Graduate Student Researcher, Chemistry (Chong Lab)
University of Pittsburgh | 219 Parkman Avenue, Pittsburgh, PA 15260
jml...@pitt.edu | [He, Him, His]


Rebecca Walters

Nov 19, 2021, 9:54:49 AM
to westpa-users

Thank you for your speedy reply Jeremy, that seems to have fixed the problem!

Kind regards, 
Rebecca 