I'm using the slurm.conf file from the attachment. Several partitions, e.g. "long", share the same list of nodes, and that list is completely disjoint from the nodes in the "highmem" partition. "highmem" only allows a user to use one node at a time. One user has
submitted many jobs to "highmem". Now other users wait a long time (sometimes 8+ minutes) for their interactive jobs to be scheduled, while some interactive jobs start much sooner.
Looking at "sprio", it appears that the jobs in "long" have lower priority than the jobs in the "highmem" partition and have to wait behind them. This surprises me, since the two partitions' node sets are completely disjoint.
# scontrol show job 1860360
JobId=1860360 JobName=bash
UserId=holtgrem_c(100131) GroupId=hpc-ag-cubi(1005272) MCS_label=N/A
Priority=1254 Nice=0 Account=hpc-ag-cubi QOS=normal
JobState=PENDING Reason=Priority Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=28-00:00:00 TimeMin=N/A
SubmitTime=2021-03-23T13:39:18 EligibleTime=2021-03-23T13:39:18
AccrueTime=2021-03-23T13:39:18
StartTime=Unknown EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-03-23T13:39:18
Partition=long AllocNode:Sid=172.16.35.153:21670
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null)
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,mem=8G,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=8G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=bash
WorkDir=/fast/home/users/holtgrem_c
Power=
NtasksPerTRES:0
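To illustrate what I suspect is happening: as I understand it, the main scheduler walks the pending queue in strict priority order and only considers a bounded number of jobs per pass (the `default_queue_depth` / `partition_job_depth` knobs in `SchedulerParameters`). The toy sketch below is NOT Slurm's actual code, just a model of that effect under my assumptions; the priorities are made up, and a flood of higher-priority jobs in one partition keeps a lower-priority job in a disjoint partition from even being evaluated:

```python
# Toy model (not Slurm code) of one priority-ordered scheduling pass
# with a bounded queue depth. Many high-priority jobs in one partition
# exhaust the depth before a lower-priority job in a *disjoint*
# partition is ever looked at.
from dataclasses import dataclass

@dataclass
class Job:
    id: int
    partition: str
    priority: int

def scheduling_pass(pending, queue_depth):
    """Consider jobs in descending priority; stop after queue_depth jobs."""
    return sorted(pending, key=lambda j: -j.priority)[:queue_depth]

# Hypothetical numbers: 100 highmem jobs at priority 2000,
# one "long" job at priority 1254 (like job 1860360 above).
pending = [Job(i, "highmem", 2000) for i in range(100)]
pending.append(Job(999, "long", 1254))

seen = scheduling_pass(pending, queue_depth=100)
print(any(j.partition == "long" for j in seen))  # -> False: never considered
```

If that model is right, the disjoint node sets don't matter at this stage, because the cut happens before node eligibility is ever checked; raising the queue depth (or relying on backfill reaching deeper into the queue) would be what lets the "long" job get evaluated.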