[slurm-users] Suspend QOS help


Walls, Mitchell

Feb 18, 2022, 10:20:51 AM
to slurm...@lists.schedmd.com
Hello,

I'm hoping someone can shed some light on why jobs are running on the same nodes simultaneously, rather than the lower-priority job actually being suspended. I can provide more info if someone can think of something that would help!

# Relevant config.
PreemptType=preempt/qos
PreemptMode=SUSPEND,GANG

PartitionName=general Default=YES Nodes=general OverSubscribe=FORCE:1 MaxTime=30-00:00:00 Qos=general AllowQos=general
PartitionName=suspend Default=NO Nodes=general OverSubscribe=FORCE:1 MaxTime=30-00:00:00 Qos=suspend AllowQos=suspend

# Qoses
Name       Priority   Preempt    PreemptMode
---------- ---------- ---------- -----------
general    1000       suspend    cluster
suspend    100                   cluster

# squeue (note: in htop I also see both processes actually running at the same time, not being timesliced)
$ squeue
JOBID PARTITION NAME     USER  ST TIME NODES NODELIST(REASON)
45085 general   stress.s user2 R  7:33 2     node[04-05]
45084 suspend   stress-s user1 R  7:40 2     node[04-05]

Thanks!

Brian Andrus

Feb 18, 2022, 10:37:41 AM
to slurm...@lists.schedmd.com
At first look, I would guess that there are enough resources to satisfy
the requests of both jobs, so there is no need to suspend.

Having the node info and the job info to compare would be the next step.

Brian Andrus

Walls, Mitchell

Feb 18, 2022, 10:54:42 AM
to slurm...@lists.schedmd.com
Both jobs use whole nodes, the same as in the scripts below but with two nodes. I've reduced the problem space to two isolated partitions on just node04.
NodeName=node04 CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=257476 Features=cpu

# qoses have stayed the same.
Name       Priority   Preempt    PreemptMode
---------- ---------- ---------- -----------
general    1000       suspend    cluster
suspend    100                   cluster

# test partitions
PartitionName=test Default=NO Nodes=cc-cpu-04 OverSubscribe=FORCE:1 MaxTime=30-00:00:00 Qos=general AllowQos=general
PartitionName=suspend Default=NO Nodes=cc-cpu-04 OverSubscribe=FORCE:1 MaxTime=30-00:00:00 Qos=suspend AllowQos=suspend

# stress-suspend.sh
#!/bin/bash
#SBATCH -p suspend
#SBATCH -C cpu
#SBATCH -q suspend
#SBATCH -c 32
#SBATCH --ntasks-per-node=1
#SBATCH -N 1
# Spawn 32 CPU-bound workers; $1 is the timeout passed to stress.
stress -c 32 -t "$1"

# stress.sh
#!/bin/bash
#SBATCH -p test
#SBATCH -C cpu
#SBATCH -q general
#SBATCH -c 32
#SBATCH --ntasks-per-node=1
#SBATCH -N 1
# Spawn 32 CPU-bound workers; $1 is the timeout passed to stress.
stress -c 32 -t "$1"
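
The claim that the two jobs cannot share node04 side by side can be checked with simple arithmetic on the numbers in this message. This is a rough sketch under stated assumptions: `node_cpus` and `fits_together` are hypothetical helper names, and only CPU counts are compared (memory and other resources are ignored).

```shell
# Node definition line copied from this message.
node_line='NodeName=node04 CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=257476 Features=cpu'

# Extract the CPUs= value from a slurm.conf-style node line.
node_cpus() {
  printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^CPUs=//p'
}

# Succeeds (exit 0) if the summed CPU requests fit on the node at once.
# Usage: fits_together NODE_CPUS REQUEST [REQUEST...]
fits_together() {
  ft_total=$1
  shift
  ft_requested=0
  for ft_c in "$@"; do
    ft_requested=$((ft_requested + ft_c))
  done
  [ "$ft_requested" -le "$ft_total" ]
}

cpus=$(node_cpus "$node_line")
# Each script above requests -c 32 on a single node.
if fits_together "$cpus" 32 32; then
  echo "both jobs fit side by side on $cpus CPUs"
else
  echo "jobs cannot both fit on $cpus CPUs; one must wait or be suspended"
fi
```

With the values from this message the second branch is taken, since 32 + 32 exceeds the node's 32 CPUs.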



Walls, Mitchell

Feb 18, 2022, 10:55:35 AM
to slurm...@lists.schedmd.com
Whoops, the Nodes name was wrong. Here is the correction for the partitions:
# test partitions
PartitionName=test Default=NO Nodes=node04 OverSubscribe=FORCE:1 MaxTime=30-00:00:00 Qos=general AllowQos=general
PartitionName=suspend Default=NO Nodes=node04 OverSubscribe=FORCE:1 MaxTime=30-00:00:00 Qos=suspend AllowQos=suspend


Walls, Mitchell

Feb 18, 2022, 11:25:15 AM
to slurm...@lists.schedmd.com
Timeslicing-based suspend works when both QOSes are submitted to the same partition, so I at least think the configuration is close. It just doesn't seem to work when the suspend and general QOSes use separate partitions. I'd prefer not to timeslice in the separate-partition configuration, but there both jobs simply run at once, with neither timeslicing nor suspending.

45112 test stress.s user2 R 0:02 1 node04
45110 test stress-s user1 S 1:00 1 node04

# Some time later
45112 test stress.s user2 R 1:17 1 node04
45110 test stress-s user1 S 2:00 1 node04

# partition
PartitionName=test Default=NO Nodes=node04 OverSubscribe=FORCE:1 MaxTime=30-00:00:00 AllowQos=general,suspend


Walls, Mitchell

Feb 18, 2022, 12:38:59 PM
to slurm...@lists.schedmd.com
It looks as if this is a bug, or at least odd behavior that is definitely not mentioned in the docs: QOS-based suspend across separate partitions needs PriorityTier set on the partitions. I found this in https://bugs.schedmd.com/show_bug.cgi?id=13410

PartitionName=test Default=NO Nodes=node04 OverSubscribe=FORCE:1 MaxTime=30-00:00:00 AllowQos=general PriorityTier=2
PartitionName=suspend Default=NO Nodes=node04 OverSubscribe=FORCE:1 MaxTime=30-00:00:00 AllowQos=suspend PriorityTier=1

With the above, I see it working:
45123 suspend stress-s user1 S 2:09 1 node04
45124 test stress.s user2 R 4:24 1 node04
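
To see at a glance which tier each partition ends up in, the partition lines can be parsed as below. This is only a sketch: `partition_tiers` is a hypothetical helper name, and it reports a tier of 1 for lines that omit PriorityTier, on the assumption that 1 is the slurm.conf default.

```shell
# Partition lines copied from the working configuration above.
conf='PartitionName=test Default=NO Nodes=node04 OverSubscribe=FORCE:1 MaxTime=30-00:00:00 AllowQos=general PriorityTier=2
PartitionName=suspend Default=NO Nodes=node04 OverSubscribe=FORCE:1 MaxTime=30-00:00:00 AllowQos=suspend PriorityTier=1'

# Print "name tier" for every PartitionName= line read from stdin.
# Lines without PriorityTier are reported as tier 1 (assumed default).
partition_tiers() {
  awk '
    /^PartitionName=/ {
      name = ""; tier = 1
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "PartitionName") name = kv[2]
        if (kv[1] == "PriorityTier")  tier = kv[2]
      }
      print name, tier
    }
  '
}

printf '%s\n' "$conf" | partition_tiers
```

For the lines above this prints `test 2` and `suspend 1`. Run against the earlier partition definitions, which have no PriorityTier, both partitions come out as tier 1, which matches the case where the jobs ran simultaneously.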
