[slurm-dev] Preemption, signals and Gracetime

Near-Ansari, Naveed

Sep 13, 2016, 6:49:35 AM

to slurm-dev

We are setting up preemption using QOS on our cluster. The documentation seems to say that a job should receive SIGCONT and SIGTERM when it is selected for preemption, and then SIGTERM, SIGCONT, and SIGKILL at the end of GraceTime.

We have checked all of this, and our jobs receive the signals at the end of GraceTime, but not when they are selected for preemption. We are listening for these signals so that we can checkpoint when preempted. We check for the signals both in the batch script and in the launched executables, in case we are wrong about which one catches them.
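
For reference, one way to listen for SIGTERM in a batch script can be sketched as follows. This is a minimal sketch under assumptions, not the setup from this cluster: the names checkpoint_requested and on_preempt are made up for illustration, sleep stands in for the real executable, and the script sends SIGTERM to itself in place of the scheduler. The payload runs in the background because a shell defers its trap handlers until the foreground command finishes.

```shell
#!/bin/sh
# Sketch only: checkpoint_requested and on_preempt are illustrative names.
checkpoint_requested=0

on_preempt() {
    # Record the request; a real job would trigger its checkpoint here.
    checkpoint_requested=1
    echo "SIGTERM received, checkpoint requested"
}

trap on_preempt TERM

# Start the payload in the background so the shell stays free to run the trap.
sleep 1 &
payload_pid=$!

# Stand-in for the scheduler: deliver SIGTERM to this shell.
kill -TERM $$

# Wait for the payload to finish before reporting.
wait "$payload_pid"
echo "checkpoint_requested=$checkpoint_requested"
```

In a real job the self-signal line would be dropped, and the script would keep waiting on the payload after the handler has run.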

I am unclear whether the problem lies in my understanding of how it is supposed to work, my configuration, or the documentation.

The doc that mentions the signals sent is http://slurm.schedmd.com/preempt.html.

This is the qos setup:

      Name  GraceTime    Preempt PreemptMode
---------- ---------- ---------- -----------
    normal   00:00:00                cluster
    sxs-lo   00:20:00                 cancel
    sxs-hi   00:20:00     sxs-lo      cancel

/etc/slurm/slurm.conf:

…
PreemptType=preempt/qos
PreemptMode=CANCEL
…

What am I doing wrong on this?

Thanks,

Naveed

Near-Ansari, Naveed

Sep 15, 2016, 2:32:41 PM

to slurm-dev

My reading of it is that this was added in Slurm 14.11.0pre1 and I don’t see any changes to it later, though I could have missed it:

-- Sent SIGCONT/SIGTERM when a job is selected for preemption with GraceTime
configured rather than waiting for GraceTime to be reached before notifying
the job.

Does anyone have a similar setup that is working?

Naveed

Bill Broadley

Sep 29, 2016, 8:47:46 PM

to slurm-dev

Has anyone gotten the High, Medium, and Low example at
http://slurm.schedmd.com/preempt.html working with slurm-16.05?

It looks pretty simple, though there have been some changes from previous
versions of Slurm, mainly that it's PriorityTier instead of Priority and
OverSubscribe instead of SHARED.

So their new example (on 3 lines, sorry for the wrap):
PartitionName=low Nodes=lnx Default=YES OverSubscribe=NO PriorityTier=10 \
PreemptMode=requeue
PartitionName=med Nodes=lnx Default=NO OverSubscribe=FORCE:1 PriorityTier=20 \
PreemptMode=suspend
PartitionName=hi Nodes=lnx Default=NO OverSubscribe=FORCE:1 PriorityTier=30 \
PreemptMode=off

I pored over that page and came up with this:

$ cat slurm.conf | egrep -v "^#" | egrep -i "selecttype|SelectTypeParameter|DefMemPerCPU|JobAcctGatherType|PreemptMode|PreemptType|PriorityTier|SchedulerType"
DefMemPerCPU=2000
SchedulerType=sched/builtin
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
PreemptType=preempt/partition_prio
PreemptMode=GANG,SUSPEND
JobAcctGatherType=jobacct_gather/linux

PartitionName=low Nodes=linux Default=NO OverSubscribe=NO \
PriorityTier=10 PreemptMode=requeue MaxTime=INFINITE State=UP DefMemPerCPU=2000
PartitionName=med Nodes=linux Default=NO OverSubscribe=FORCE:1 \
PriorityTier=20 PreemptMode=suspend MaxTime=INFINITE State=UP DefMemPerCPU=2000
PartitionName=high Nodes=linux Default=YES OverSubscribe=FORCE:1 \
PriorityTier=30 PreemptMode=off MaxTime=INFINITE State=UP DefMemPerCPU=2000
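
As a quick sanity check on the partition lines above, the tier ordering can be listed highest first. This is a minimal sketch, assuming the three definitions are pasted in as text (abbreviated here to the relevant fields, with the continuation lines joined); the variable names conf and tiers are illustrative. On a live system, `scontrol show partition` reports the values the controller actually loaded.

```shell
#!/bin/sh
# Partition lines from the post, abbreviated to the fields of interest.
conf='PartitionName=low Nodes=linux PriorityTier=10 PreemptMode=requeue
PartitionName=med Nodes=linux PriorityTier=20 PreemptMode=suspend
PartitionName=high Nodes=linux PriorityTier=30 PreemptMode=off'

# Pull out tier, name and preempt mode, then sort by tier, highest first.
tiers=$(printf '%s\n' "$conf" | awk '
{
    name = ""; tier = ""; mode = ""
    for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "PartitionName") name = kv[2]
        if (kv[1] == "PriorityTier")  tier = kv[2]
        if (kv[1] == "PreemptMode")   mode = kv[2]
    }
    print tier, name, mode
}' | sort -rn)

printf '%s\n' "$tiers"
```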

Each node has swap = twice ram.

I noticed the pending job in high:
$ squeue -p high --format="%.8i %.9P %.8T %.10M %.9l %.6D %12R %8Q" | head -10
JOBID PARTITION STATE TIME TIME_LIMI NODES NODELIST(REA PRIORITY
90204 high PENDING 0:00 UNLIMITED 1 (Resources) 2583

Has lower priority than the running jobs in med:
$ squeue -p med --format="%.8i %.9P %.8T %.10M %.9l %.6D %12R %8Q" | grep RUNNING
89647 med RUNNING 2:34:26 UNLIMITED 1 c4-68 9945
89648 med RUNNING 2:30:15 UNLIMITED 1 c4-69 9926

Do I need to set Priority *and* PriorityTier maybe? Any other suggestions?
