[slurm-dev] srun and queues

Michael Di Domenico
Nov 24, 2009, 11:13:05 AM
to slur...@lists.llnl.gov
I read through the scheduling docs on the SLURM website, but I'm not
sure I understand the relationship between srun, sbatch and queues.
Here's a scenario:

A cluster with 100 nodes / 8 cores each
Between 8am and 8pm, users are submitting jobs using `srun -n <num>`,
where <num> is variable
Jobs are short-lived, but the number being submitted keeps the
cluster at about 80% utilization all day

Occasionally, someone wants to run a large job (i.e., one that will
consume 80-90% of the whole cluster); this usually happens in the
morning (between 8am and 12pm).

This example also assumes that there are no priorities or special
queues set up, just simple select/linear and sched/backfill.
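
The scenario's numbers can be turned into a quick estimate of how far short a large job falls at a typical moment. This is a sketch with assumed figures: 85% stands in for the "80-90%" job, utilization is counted in whole nodes because select/linear allocates entire nodes, and the names are hypothetical:

```python
# Back-of-the-envelope arithmetic for the scenario above. The 80% and 85%
# figures are illustrative; utilization is assumed to be counted in whole
# nodes because select/linear allocates entire nodes to each job.
NODES = 100
CORES_PER_NODE = 8


def nodes_for(percent: int) -> int:
    """Whole nodes needed to cover `percent` of the cluster (rounded up)."""
    return (NODES * percent + 99) // 100


total_cores = NODES * CORES_PER_NODE  # 800 cores in total
busy_nodes = nodes_for(80)            # nodes kept busy by the stream of sruns
free_nodes = NODES - busy_nodes       # nodes idle at a typical moment
large_job = nodes_for(85)             # a job wanting ~85% of the cluster
shortfall = large_job - free_nodes    # nodes that must be held back for it

print(f"{total_cores} cores, {free_nodes} nodes idle, "
      f"large job needs {large_job} nodes, short by {shortfall}")
```

With only about 20 nodes idle at any moment, the large job cannot start unless roughly 65 more nodes are kept free for it, which is the crux of the question.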

So my question is: when would the job actually run, using either srun
or sbatch?

Ideally, I'd like to see someone submit their job using sbatch and,
when the job bubbles to the top of the list in the midst of the sruns,
have SLURM start holding resources until it has enough to run the job
and then continue the sruns.

But I'm not sure that's the resulting behavior. A short test I did
seemed to indicate that it runs all the jobs it can at any given
time, which would mean that my full-cluster job would not get run
until 8pm, when the sruns stop and the cluster utilization drops to
less than 10%.
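
The behavior the short test seemed to show can be modelled as a scheduler that starts every queued job that currently fits and never holds nodes back. The toy simulation below is hypothetical and is not SLURM code: time is discrete (one step standing in for an hour of the 8am-8pm day), runtimes are assumed known, and the job sizes and the `greedy_schedule` and `daytime_workload` helpers are made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int    # whole nodes (select/linear style)
    runtime: int  # time steps; assumed known exactly


def greedy_schedule(total_nodes, arrivals, horizon):
    """Start every queued job that fits right now, with no reservations.

    `arrivals` maps a time step to the jobs submitted at that step.
    Returns a dict of job name -> start time.
    """
    queue, running, starts = [], [], {}
    for t in range(horizon):
        # Jobs whose end time has been reached release their nodes.
        running = [(end, job) for end, job in running if end > t]
        queue.extend(arrivals.get(t, []))
        free = total_nodes - sum(job.nodes for _, job in running)
        waiting = []
        for job in queue:
            if job.nodes <= free:
                free -= job.nodes
                running.append((t + job.runtime, job))
                starts[job.name] = t
            else:
                waiting.append(job)  # skipped; later jobs may jump ahead
        queue = waiting
    return starts


def daytime_workload(day_end=12):
    """Hypothetical workload: four short 10-node jobs per step (~80 nodes
    busy) until `day_end`, plus one 85-node job submitted at step 1."""
    arrivals = {t: [Job(f"s{t}_{i}", 10, 2) for i in range(4)]
                for t in range(day_end)}
    arrivals[1].insert(0, Job("large", 85, 3))
    return arrivals


starts = greedy_schedule(100, daytime_workload(), horizon=20)
print("large job starts at step", starts["large"])
```

About 60 nodes are free at every step while the short jobs keep arriving, never 85, so the large job only starts at step 13, after the last daytime jobs (submitted at step 11) finish.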

Don Lipari
Nov 24, 2009, 4:28:50 PM
to slur...@lists.llnl.gov
Michael,

With backfill scheduling enabled, jobs that are not backfilled will
bubble to the top of the queue on a simple FIFO basis. When a large
job makes it to the top of the queue, SLURM will begin withholding
from other jobs the nodes it reserves for that top job. Smaller jobs
will still be backfilled, as long as they end before the top-priority
job is slated to begin.
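
The rule just described can be sketched as a toy simulation: jobs start in FIFO order, the first job that does not fit gets a reservation at the earliest time enough nodes will be free, and later jobs start only if they fit now and end before that reservation. This is a simplified, hypothetical model rather than SLURM's implementation: it assumes each job's runtime equals its time limit, uses discrete time steps, and implements only the rule stated above (real backfill can also start jobs on nodes the reservation does not need). The workload and helper names are made up:

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int
    runtime: int  # assumed equal to the job's time limit


def earliest_start(t, free, running, needed):
    """Earliest step at which `needed` nodes are free, given running jobs."""
    if needed <= free:
        return t
    for end, job in sorted(running, key=lambda r: r[0]):
        free += job.nodes
        if free >= needed:
            return end
    raise ValueError("job is larger than the cluster")


def backfill_schedule(total_nodes, arrivals, horizon):
    """FIFO with a reservation for the first job that does not fit.

    Later jobs are backfilled only if they fit now and end before the
    reserved job is slated to begin.
    """
    queue, running, starts = [], [], {}
    for t in range(horizon):
        running = [(end, job) for end, job in running if end > t]
        queue.extend(arrivals.get(t, []))
        free = total_nodes - sum(job.nodes for _, job in running)
        reserved_start = None  # start time promised to the top blocked job
        waiting = []
        for job in queue:
            fits = job.nodes <= free
            if reserved_start is None and not fits:
                # Top job cannot start yet: reserve nodes for it.
                reserved_start = earliest_start(t, free, running, job.nodes)
                waiting.append(job)
            elif fits and (reserved_start is None
                           or t + job.runtime <= reserved_start):
                free -= job.nodes
                running.append((t + job.runtime, job))
                starts[job.name] = t
            else:
                waiting.append(job)
        queue = waiting
    return starts


def daytime_workload(day_end=12):
    """Hypothetical workload: four short 10-node jobs per step until
    `day_end`, plus one 85-node job submitted at step 1."""
    arrivals = {t: [Job(f"s{t}_{i}", 10, 2) for i in range(4)]
                for t in range(day_end)}
    arrivals[1].insert(0, Job("large", 85, 3))
    return arrivals


starts = backfill_schedule(100, daytime_workload(), horizon=20)
print("large job starts at step", starts["large"])
```

At step 1 the large job gets a reservation for step 2; the short jobs submitted at step 1 would run past it, so they are held back, and the large job starts at step 2 instead of waiting for the end of the day.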

This is true regardless of whether the jobs are submitted using
salloc, sbatch, or srun. The reason your full-cluster job does not run
until 8pm is that this is when the last higher-priority running job
completes and releases its nodes. Cluster utilization will decrease
toward zero percent as 8pm approaches, since fewer and fewer small
jobs are available to backfill into the ever-shrinking time window.
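
The shrinking window can be illustrated by counting which queued jobs would still finish before a reservation at 8pm. The time limits, the date, and the `backfill_candidates` helper below are hypothetical; the sketch only applies the "must end before the reserved job starts" test at a few points in the afternoon:

```python
from datetime import datetime, timedelta


def backfill_candidates(now, reservation_start, time_limits):
    """Time limits short enough to finish before the reservation begins."""
    window = reservation_start - now
    return [limit for limit in time_limits if limit <= window]


# Hypothetical time limits of queued small jobs and an 8pm reservation.
limits = [timedelta(minutes=m) for m in (30, 60, 120, 240)]
reservation = datetime(2009, 11, 24, 20, 0)

for hour, minute in [(16, 0), (19, 0), (19, 45)]:
    now = datetime(2009, 11, 24, hour, minute)
    fits = backfill_candidates(now, reservation, limits)
    print(f"{now:%H:%M}: {len(fits)} of {len(limits)} jobs can be backfilled")
```

At 16:00 all four jobs fit in the four-hour window, at 19:00 only the 30- and 60-minute jobs do, and at 19:45 none do, so idle nodes accumulate as the reservation approaches.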

Don