Hi and thanks for reading this!
I am trying to estimate the queue time of a job of a given size and walltime limit. Our project considers multiple HPC resources and needs estimated queue times to decide where to actually submit each job.
From the man page of ‘sbatch’, I found that the ‘--test-only’ option can be used to “validate the batch script and return an estimate of when a job would be scheduled to run given the current job queue and all the other arguments specifying the job requirements”. This looked very promising to us.
I tried several submissions on the IU BigRed3 and TACC Stampede2 HPC systems; the recorded results are shown below (the last two columns are the estimated and the actual queue time). From the results, the estimate appears to be quite inaccurate, and it can be either an over-estimate or an under-estimate:
-----start of output
| site | slurm version | partition | JobID | node | np | walltime_mins | timestamp_estimate | estimated_start | submit_time | actual_start | estimated_wait | actual_wait |
| stampede2 | 18.08.5-2 | skx-normal | 8436162 | 1 | 48 | 10 | 9/9/2021 16:05 | 9/11/2021 23:29 | 9/9/2021 16:08 | 9/9/2021 16:11 | 55:23:56 | 0:02:49 |
| Stampede2 | 18.08.5-2 | skx-normal | 8436369 | 1 | 48 | 10 | 9/9/2021 16:51 | 9/12/2021 0:04 | 9/9/2021 16:51 | 9/9/2021 16:52 | 55:13:00 | 0:00:58 |
| Stampede2 | 18.08.5-2 | normal | 8436193 | 1 | 48 | 10 | 9/9/2021 16:17 | 9/9/2021 18:02 | 9/9/2021 16:19 | 9/9/2021 16:19 | 1:45:26 | 0:00:02 |
| Stampede2 | 18.08.5-2 | normal | 8436308 | 2 | 48 | 10 | 9/9/2021 16:40 | 9/9/2021 18:25 | 9/9/2021 16:41 | 9/9/2021 16:41 | 1:45:00 | 0:00:04 |
| Bigred3 | 20.11.7 | general | 1727144 | 1 | 24 | 10 | 9/9/2021 17:57 | 9/10/2021 12:39 | 9/9/2021 17:59 | 9/9/2021 17:59 | 18:42:00 | 0:00:00 |
| Bigred3 | 20.11.7 | general | 1734075 | 1 | 24 | 60 | 9/15/2021 14:54 | 9/15/2021 14:54 | 9/15/2021 14:54 | 9/15/2021 15:01 | 0:00:00 | 0:07:11 |
| Bigred3 | 20.11.7 | general | 1734079 | 1 | 24 | 20 | 9/15/2021 15:09 | 9/15/2021 15:09 | 9/15/2021 15:09 | 9/15/2021 15:09 | 0:00:00 | 0:00:01 |
| Bigred3 | 20.11.7 | general | 1734081 | 4 | 24 | 60 | 9/15/2021 15:11 | 9/15/2021 15:11 | 9/15/2021 15:11 | 9/15/2021 15:34 | 0:00:00 | 0:22:15 |
-----end of output
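To be explicit about how the two wait columns were derived from the recorded timestamps, here is a minimal sketch in Python (the `parse` helper is mine; timestamps in the table are truncated to whole minutes, so the seconds differ slightly from the recorded values):

```python
from datetime import datetime

def parse(ts):
    """Parse the m/d/yyyy hh:mm timestamps used in the table above."""
    return datetime.strptime(ts, "%m/%d/%Y %H:%M")

# Job 8436162: estimated_wait = estimated_start - timestamp_estimate,
# actual_wait = actual_start - submit_time.  Because the table keeps
# only whole minutes, these come out as 2 days 7:24 (= 55:24) and 0:03
# rather than the recorded 55:23:56 and 0:02:49.
est_wait = parse("9/11/2021 23:29") - parse("9/9/2021 16:05")
act_wait = parse("9/9/2021 16:11") - parse("9/9/2021 16:08")
print(est_wait, act_wait)  # 2 days, 7:24:00 0:03:00
```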
Could you suggest better ways to estimate the queue time? Or are there specific configurations/situations on those systems that might affect the queue time estimation (e.g. fair-share or site-specific QoS settings)?
Below is an example of my measurement, for your information:
-----begin of example
lifen@elogin1(:):~$date && sbatch --test-only -n 24 -N 4 -p general -t 00:60:00 --wrap "hostname"
Wed Sep 15 15:11:49 EDT 2021
sbatch: Job 1734080 to start at 2021-09-15T15:11:49 using 24 processors on nodes nid00[935-938] in partition general
lifen@elogin1(:):~$date && sbatch -n 24 -N 4 -p general -t 00:60:00 --wrap "hostname"
Wed Sep 15 15:11:58 EDT 2021
Submitted batch job 1734081
lifen@elogin1(:):~$sacct --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist -j 1734081
User JobID JobName Partition State Timelimit Start End Elapsed MaxRSS MaxVMSize NNodes NCPUS NodeList
--------- ------------ ---------- ---------- ---------- ---------- ------------------- ------------------- ---------- ---------- ---------- -------- ---------- ---------------
lifen 1734081 wrap general COMPLETED 01:00:00 2021-09-15T15:34:13 2021-09-15T15:34:13 00:00:00 4 24 nid00[169,883,+
1734081.bat+ batch COMPLETED 2021-09-15T15:34:13 2021-09-15T15:34:13 00:00:00 2136K 226420K 1 18 nid00169
1734081.ext+ extern COMPLETED 2021-09-15T15:34:13 2021-09-15T15:34:13 00:00:00 4K 4K 4 24 nid00[169,883,+
-----end of example
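For automation, the `sbatch: Job … to start at …` message can be parsed into an estimated wait. Here is a sketch in Python, assuming the message format shown in the session above (the `estimated_wait` function and regex are mine, not part of Slurm):

```python
import re
from datetime import datetime

# Example `sbatch --test-only` message, copied from the session above.
MSG = ("sbatch: Job 1734080 to start at 2021-09-15T15:11:49 "
       "using 24 processors on nodes nid00[935-938] in partition general")

def estimated_wait(test_only_msg, now):
    """Extract the estimated start time from an `sbatch --test-only`
    message and return the wait relative to `now` as a timedelta."""
    m = re.search(r"to start at (\S+)", test_only_msg)
    if m is None:
        raise ValueError("unrecognized sbatch --test-only output")
    start = datetime.fromisoformat(m.group(1))
    return start - now

# The command above was issued at 15:11:49, so the estimated wait is zero.
print(estimated_wait(MSG, datetime(2021, 9, 15, 15, 11, 49)))  # 0:00:00
```

Since we run this immediately before the real submission, the gap between this estimate and the start time later reported by `sacct` is the estimation error we tabulated.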
Thanks,
Feng Li
I can imagine at least the following causing differences between the estimated time and the actual start time:
I haven't looked at the code to see whether the test-only parameter goes through a complete scheduling cycle before returning the estimate, but I can guarantee that the first two items above happen all the time on my much simpler cluster here.
Hi Michael,
Thanks for your quick response. All the factors you mention look valid to me. Regarding the first one, a similar situation can also arise when a large queued job has just failed.
It could be quite bad if the backfill scheduler is enabled by default but the ‘--test-only’ estimate does not account for backfill. (I have not checked the details in the source code yet.)
I am still wondering what a better way to do such queue estimation would be. I searched the mailing list archive and saw many suggestions to use the ‘--test-only’ option, but not much discussion of how reliable it actually is.
Best,
Feng