[slurm-users] Job preempts entire host instead of single job

Michał Kadlof

Jan 17, 2023, 7:04:44 AM
to Slurm User Community List

Hi,

I am struggling to configure job preemption. I have nodes with 8 NVIDIA A100 GPUs each, and two partitions: short (lower priority) and sfglab (higher priority). I want higher-priority jobs to preempt lower-priority jobs in REQUEUE mode. It appears to work, but it works too well.

A job from the higher-priority partition preempts every job on the host, instead of only the single job whose release would free enough resources for it. What's more, it locks the rest of the host's resources until the high-priority job ends. What am I doing wrong?

Here is an example:

$ srun --test-only -G1 -c1 --mem 1M -p sfglab
srun: Job 501151 to start at 2023-01-17T12:46:01 using 1 processors on nodes dgx-1 in partition sfglab
srun:   Preempts: 363278,501001,501029,501075,501076,501077,501120,501121

Preempting a single job, rather than all eight, would be enough to release the resources this request needs.
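
To quantify how many jobs a test run would displace, the `Preempts:` line can be parsed. This is only a minimal Python sketch: the function name `parse_preempted_jobs` is hypothetical, and it assumes the `srun --test-only` output looks exactly like the two lines shown above.

```python
import re


def parse_preempted_jobs(srun_output: str) -> list[int]:
    """Return the job IDs listed on the 'Preempts:' line, or [] if there is none."""
    for line in srun_output.splitlines():
        match = re.search(r"Preempts:\s*([\d,\s]+)", line)
        if match:
            # Split the comma-separated list and skip any empty fragments.
            return [int(job_id) for job_id in match.group(1).split(",") if job_id.strip()]
    return []


# Sample text copied from the srun output in this message.
sample = (
    "srun: Job 501151 to start at 2023-01-17T12:46:01 using 1 processors "
    "on nodes dgx-1 in partition sfglab\n"
    "srun:   Preempts: 363278,501001,501029,501075,501076,501077,501120,501121\n"
)

preempted = parse_preempted_jobs(sample)
print(f"{len(preempted)} jobs would be preempted: {preempted}")
```

For the output above this reports eight job IDs, although the request asks for only one GPU, one core, and 1M of memory.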


Here is my config:

slurm.conf

(...)

DefMemPerCPU            = 100
JobAcctGatherFrequency  = 30
JobAcctGatherType       = jobacct_gather/linux
PreemptMode             = REQUEUE
PreemptType             = preempt/partition_prio
PreemptExemptTime       = 00:00:00
SelectType              = select/cons_tres
SelectTypeParameters    = CR_CORE_MEMORY

(...)

PartitionName=short Nodes=dgx-[1-4],sr-[1-3] MaxTime=1-0 State=UP PriorityTier=10000 Default=YES DefaultTime=0-01:00:00 OverSubscribe=NO PreemptMode=requeue

PartitionName=sfglab Nodes=dgx-1 MaxTime=10-0 State=UP PriorityTier=20000 PreemptMode=off OverSubscribe=NO AllowAccounts=sfglab

--
best regards | pozdrawiam serdecznie
Michał Kadlof
Head of the high performance computing center
EdenN cluster administrator
Faculty of Mathematics and Computer Science
Warsaw University of Technology

Michael Gutteridge

Jan 17, 2023, 9:00:02 AM
to Slurm User Community List
Hi

I believe this is how the preemption algorithm works: it selects the entire node's resources. From the documentation:

> For performance reasons, the backfill scheduler reserves whole nodes for jobs, not partial nodes.


However, that does specifically call out the backfill scheduler.  Is that the scheduler type you're using?
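
One way to check is to read the `SchedulerType` setting out of a `Key = Value` configuration dump, such as the one `scontrol show config` prints. The following is a minimal Python sketch under that format assumption; `parse_slurm_config` is a hypothetical helper, and the `SchedulerType` value in the sample is a placeholder, not a value reported in this thread.

```python
def parse_slurm_config(dump: str) -> dict[str, str]:
    """Parse 'Key = Value' lines into a dict, ignoring lines without '='."""
    settings: dict[str, str] = {}
    for line in dump.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            settings[key.strip()] = value.strip()
    return settings


# The first three lines come from the posted config; SchedulerType is a placeholder.
sample_dump = """\
PreemptMode             = REQUEUE
PreemptType             = preempt/partition_prio
SelectType              = select/cons_tres
SchedulerType           = sched/backfill
"""

config = parse_slurm_config(sample_dump)
uses_backfill = config.get("SchedulerType") == "sched/backfill"
print("SchedulerType:", config.get("SchedulerType", "<not set>"))
print("Backfill scheduler in use:", uses_backfill)
```

On a real cluster the text passed in would be the actual output of `scontrol show config` rather than the sample string.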

 - Michael
