Hi everyone,
A single WESTPA iteration takes about 3 min, which is significantly longer than a single segment takes to finish (~56 s). Since I have more GPUs (204 in total, 4 on each of 51 nodes) than segments (195), I would expect all segments to run concurrently and the whole iteration to take ~1 min instead of 3 min. What am I missing? Why are my iterations 3x longer than expected?
Here is some log output. A single segment takes 56 s to finish, plus 2 s to compute the progress coordinate (p_coord):
#"Step","Potential Energy (kJ/mole)","Temperature (K)","Speed (ns/day)"
1930000,-303022.625,302.42474136859636,0
1935000,-302605.1875,300.0807577990081,174
1940000,-303047.84375,300.0887716341635,173
1945000,-302578.65625,300.1127171038857,173
1950000,-301680.375,302.25395961573724,173
elapsed seconds: 55.99628233909607
+ date
Wed Jan 11 09:19:12 CST 2023
+ python /u/rmarinescu/work/10mer2D/common_files/dist.py
+ date
Wed Jan 11 09:19:14 CST 2023
Time taken by iterations:
grep "wallclock" slurm.out
Iteration wallclock: 0:03:21.727373, cputime: 3:14:25.149217
Iteration wallclock: 0:03:09.230570, cputime: 3:07:03.777573
Iteration wallclock: 0:03:08.426127, cputime: 2:59:31.713428
Iteration wallclock: 0:02:55.408825, cputime: 2:44:50.820108
Iteration wallclock: 0:03:01.559169, cputime: 2:52:22.652424
Iteration wallclock: 0:03:08.952174, cputime: 2:59:49.063276
Iteration wallclock: 0:03:10.041508, cputime: 3:03:40.220779
Iteration wallclock: 0:03:07.729556, cputime: 3:00:02.322551
Iteration wallclock: 0:03:11.174920, cputime: 3:07:35.708657
Iteration wallclock: 0:03:16.932580, cputime: 3:11:16.898027
Iteration wallclock: 0:03:14.564068, cputime: 3:07:34.667934
Iteration wallclock: 0:03:09.452174, cputime: 3:03:45.795260
Iteration wallclock: 0:03:08.356321, cputime: 3:07:48.013928
Iteration wallclock: 0:03:12.781218, cputime: 3:11:36.007694
Iteration wallclock: 0:03:12.135257, cputime: 3:07:48.813627
Iteration wallclock: 0:03:14.538703, cputime: 3:07:52.203024
Iteration wallclock: 0:03:09.185754, cputime: 3:00:22.352625
Iteration wallclock: 0:03:08.892456, cputime: 3:04:15.003890
Iteration wallclock: 0:03:11.210650, cputime: 3:04:15.699921
Iteration wallclock: 0:03:07.959955, cputime: 3:00:33.297622
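As a rough sanity check, dividing cputime by wallclock should estimate how many segments were effectively running at once. Below is a minimal sketch of that arithmetic, assuming the "Iteration wallclock" log format shown above. The helper names (parse_hms, effective_parallelism) are my own and not part of WESTPA, and the 58 s per segment is the 56 s + 2 s measured above.

```python
import math
import re
from datetime import timedelta

# Matches lines such as:
# "Iteration wallclock: 0:03:21.727373, cputime: 3:14:25.149217"
LINE = re.compile(r"Iteration wallclock: ([\d:.]+), cputime: ([\d:.]+)")


def parse_hms(text):
    """Convert an 'H:MM:SS.ffffff' string into seconds (hypothetical helper)."""
    hours, minutes, seconds = text.split(":")
    return timedelta(
        hours=int(hours), minutes=int(minutes), seconds=float(seconds)
    ).total_seconds()


def effective_parallelism(log_line):
    """Return cputime / wallclock for one iteration log line."""
    match = LINE.search(log_line)
    if match is None:
        raise ValueError(f"not an iteration timing line: {log_line!r}")
    wallclock, cputime = (parse_hms(group) for group in match.groups())
    return cputime / wallclock


def expected_rounds(n_segments, n_workers):
    """Number of sequential batches needed if each worker runs one segment at a time."""
    return math.ceil(n_segments / n_workers)


if __name__ == "__main__":
    first = "Iteration wallclock: 0:03:21.727373, cputime: 3:14:25.149217"
    segment_seconds = 58  # 56 s of dynamics + 2 s for the p_coord (from the log above)

    print("effective parallelism:", round(effective_parallelism(first), 1))
    print("expected rounds with 204 workers:", expected_rounds(195, 204))
    print("expected wallclock (s):", expected_rounds(195, 204) * segment_seconds)
```

For the first iteration this ratio comes out at roughly 58, far below the 195 segments I expected to run simultaneously, so it looks as if only about a quarter to a third of the GPUs are busy at any one time.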
And I'm currently running on 51 nodes, with 4 GPUs each:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1277965 gpuA40x4 10mer rmarines R 4:06:36 51 gpub[001,003-005,007-009,014,016,018-021,025-027,029-031,035,037,039,045-048,050-057,059-062,065,069,074-075,078-079,081,083,085,087-089,092]
Any ideas?
Many thanks,
Razvan