I'm using the slurm.conf file from the attachment. Several partitions, e.g. "long", share the same list of nodes, and that list is completely disjoint from the nodes in the "highmem" partition. "highmem" only allows a user to use one node at a time. One user has
submitted many jobs to "highmem". Now other users wait a long time (sometimes 8+ minutes) for their interactive jobs to be scheduled, while some interactive jobs start much sooner.
Looking at "sprio", it appears that the jobs in "long" have lower priority than the jobs in the "highmem" partition and have to wait behind them. This surprises me, since the two partitions' node sets are completely disjoint.
# scontrol show job 1860360
JobId=1860360 JobName=bash
UserId=holtgrem_c(100131) GroupId=hpc-ag-cubi(1005272) MCS_label=N/A
Priority=1254 Nice=0 Account=hpc-ag-cubi QOS=normal
JobState=PENDING Reason=Priority Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=28-00:00:00 TimeMin=N/A
SubmitTime=2021-03-23T13:39:18 EligibleTime=2021-03-23T13:39:18
AccrueTime=2021-03-23T13:39:18
StartTime=Unknown EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-03-23T13:39:18
Partition=long AllocNode:Sid=172.16.35.153:21670
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null)
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,mem=8G,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=8G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=bash
WorkDir=/fast/home/users/holtgrem_c
Power=
NtasksPerTRES:0
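To illustrate what I suspect is happening: as I understand it, the main scheduler walks the pending queue in strict priority order and only considers a bounded number of jobs per pass (the `default_queue_depth` / `partition_job_depth` knobs in `SchedulerParameters`). The toy sketch below is NOT Slurm's actual code, just a model of that effect under my assumptions; the priorities are made up, and a flood of higher-priority jobs in one partition keeps a lower-priority job in a disjoint partition from even being evaluated:

```python
# Toy model (not Slurm code) of one priority-ordered scheduling pass
# with a bounded queue depth. Many high-priority jobs in one partition
# exhaust the depth before a lower-priority job in a *disjoint*
# partition is ever looked at.
from dataclasses import dataclass

@dataclass
class Job:
    id: int
    partition: str
    priority: int

def scheduling_pass(pending, queue_depth):
    """Consider jobs in descending priority; stop after queue_depth jobs."""
    return sorted(pending, key=lambda j: -j.priority)[:queue_depth]

# Hypothetical numbers: 100 highmem jobs at priority 2000,
# one "long" job at priority 1254 (like job 1860360 above).
pending = [Job(i, "highmem", 2000) for i in range(100)]
pending.append(Job(999, "long", 1254))

seen = scheduling_pass(pending, queue_depth=100)
print(any(j.partition == "long" for j in seen))  # -> False: never considered
```

If that model is right, the disjoint node sets don't matter at this stage, because the cut happens before node eligibility is ever checked; raising the queue depth (or relying on backfill reaching deeper into the queue) would be what lets the "long" job get evaluated.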