[slurm-users] job stuck as pending - reason "PartitionConfig"


byron

Sep 29, 2021, 10:35:14 AM
to Slurm User Community List
Hi

When I try to submit a job to one of our partitions, it just stays in the pending state with the reason "PartitionConfig".  Can someone point me in the right direction for how to troubleshoot this?  I'm a bit stumped.

Some details of the setup

The version is 19.05.7

This is the job that is stuck in state pending
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          10860160   highmem MooseBen byron PD       0:00     16 (PartitionConfig)

$ sinfo -p highmem
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
highmem      up   infinite      1  drain intel-0012
highmem      up   infinite     19   idle intel-[0001-0011,0013-0020]

The output from  scontrol show part
PartitionName=highmem
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=02:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=intel-00[01-20]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=EXCLUSIVE
   OverTimeLimit=NONE PreemptMode=REQUEUE
   State=UP TotalCPUs=320 TotalNodes=20 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

Paul Brunk

Sep 29, 2021, 2:47:27 PM
to Slurm User Community List

Hello Byron:

I'm guessing that your job is asking for more HW than the highmem_p has in it, or more cores or RAM within a node than any of the nodes have, or something like that.  'scontrol show job 10860160' might help.  You can also look in slurmctld.log for that jobid.
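
As a rough illustration of that kind of check, here is a minimal Python sketch that reads the TotalCPUs and TotalNodes fields from the 'scontrol show part' output quoted earlier in the thread and compares them with a job request. The helper names (parse_fields, cpus_per_node, fits) are made up for this example, the per-node figure is only an average that holds if every node in the partition is identical, and the 24-core request is hypothetical since the thread does not say what the job actually asked for.

```python
import re

# Partition summary copied from the `scontrol show part` output in this thread.
PARTITION_TEXT = """
PartitionName=highmem
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   State=UP TotalCPUs=320 TotalNodes=20 SelectTypeParameters=NONE
"""


def parse_fields(text):
    """Turn whitespace-separated Key=Value tokens into a dict of strings."""
    return dict(re.findall(r"(\S+?)=(\S+)", text))


def cpus_per_node(fields):
    """Average CPUs per node; exact only if the partition is homogeneous."""
    return int(fields["TotalCPUs"]) // int(fields["TotalNodes"])


def fits(fields, nodes, cpus_requested_per_node):
    """Rough check of a request against node count and per-node CPUs."""
    return (nodes <= int(fields["TotalNodes"])
            and cpus_requested_per_node <= cpus_per_node(fields))


fields = parse_fields(PARTITION_TEXT)
print(cpus_per_node(fields))   # 320 CPUs over 20 nodes
print(fits(fields, 16, 16))    # 16 nodes, as in the pending job
print(fits(fields, 16, 24))    # hypothetical request sized for larger nodes
```

This only covers CPU counts; memory, time limits and node counts would need the same kind of comparison against the job's own 'scontrol show job' output.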

 

--
Paul Brunk, system administrator
Georgia Advanced Computing Resource Center
Enterprise IT Svcs, the University of Georgia

 


byron

Sep 30, 2021, 8:06:00 AM
to Slurm User Community List
Bingo!

You were right, I was asking for more cores than were available (our highmem nodes have fewer than our standard nodes).  I was so convinced that the problem was related to my upgrading the OS on those nodes that it never crossed my mind that it was something as straightforward as that.

Thanks for your help.

