Hi Anthony,
I’m using our local cluster. So far I have been running this test on two CPU nodes with 64 processors in total. I suspected a node communication issue and wanted to confirm that, so I ran another test requesting only 1 node (32 processors). I also changed the number of bins and bin_target_counts accordingly. Now I get a new error message:
Mon Aug 15 12:58:40 2022
Iteration 42 (100 requested)
Beginning iteration 42
8 segments remain in iteration 42 (8 total)
4 of 15 (26.666667%) active bins are populated
per-bin minimum non-zero probability: 0.00296617
per-bin maximum probability: 0.356194
per-bin probability dynamic range (kT): 4.78821
per-segment minimum non-zero probability: 0.00148308
per-segment maximum non-zero probability: 0.310059
per-segment probability dynamic range (kT): 5.34264
norm = 1, error in norm = 0 (0*epsilon)
Waiting for segments to complete...
-- ERROR [westpa.core.propagators.executable] -- could not read pcoord from '/tmp/tmpmay36ihx': ValueError('cannot reshape array of size 3 into shape (11,1)')
-- ERROR [westpa.core.sim_manager] -- propagation failed for 1 segment(s):
0
exception caught; shutting down
-- ERROR [w_run] -- error message: propagation failed for 1 segments
-- ERROR [w_run] -- Traceback (most recent call last):
File "/data/yge3/.conda/envs/westpa2/lib/python3.9/site-packages/westpa/cli/core/w_run.py", line 62, in run_simulation
sim_manager.run()
File "/data/yge3/.conda/envs/westpa2/lib/python3.9/site-packages/westpa/core/sim_manager.py", line 752, in run
self.check_propagation()
File "/data/yge3/.conda/envs/westpa2/lib/python3.9/site-packages/westpa/core/sim_manager.py", line 650, in check_propagation
raise PropagationError('propagation failed for {:d} segments'.format(len(failed_segments)))
westpa.core.sim_manager.PropagationError: propagation failed for 1 segments
I think the error message is about the number of frames in the finished segment. I set pcoord_len to 11 so that it matches the simulation length and saving frequency in my OpenMM simulation script, so each finished segment should return 11 frames. For some reason this segment returned only 3 values, and I believe that caused the error. This is only my guess. Could you suggest what might be causing this and how to fix it?
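To illustrate my guess, here is a minimal sketch that reproduces the same kind of ValueError with plain NumPy. It is not WESTPA's actual code: check_pcoord is a hypothetical helper I made up, the three input values are placeholders, and the second dimension of 1 is taken from the shape (11,1) in the error message.

```python
import numpy as np

PCOORD_LEN = 11   # pcoord_len as I set it in my configuration
PCOORD_NDIM = 1   # assumed from the shape (11,1) in the error message


def check_pcoord(values, pcoord_len=PCOORD_LEN, pcoord_ndim=PCOORD_NDIM):
    """Hypothetical helper: reshape returned values to (pcoord_len, pcoord_ndim).

    Raises ValueError when the number of values does not match.
    """
    arr = np.asarray(values, dtype=float)
    return arr.reshape(pcoord_len, pcoord_ndim)


# A segment that returned only 3 values instead of 11 (placeholder numbers).
try:
    check_pcoord([1.0, 2.0, 3.0])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this is really what is happening, then the segment wrote fewer pcoord values than pcoord_len, and the thing to check would be why that one segment stopped early or why its pcoord file was incomplete.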
Thanks!
Yunhui