[slurm-users] job not running because of "Resources", but resources are available


Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)

Mar 19, 2021, 8:14:06 PM
to Slurm User Community List
Can anyone explain why job 1908239 is not running, or what else I can check? squeue gives the reason as "Resources", and the estimated start time is always the current time, no matter when I run "squeue --start", yet idle nodes are available according to "sinfo ... state=idle". It's only a 1-minute job, so the problem can't be that the nodes won't be free long enough for it to be backfilled.

The Slurm version is admittedly a bit old: 19.05.7.


> squeue -p n2019 --state=PD -l
Fri Mar 19 20:09:17 2021
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
           1908239     n2019 LiCu_SPA bernstei  PENDING       0:00      1:00      1 (Resources)
           1908236     n2019 cspbbr3-  jllyons  PENDING       0:00 2-16:00:00      2 (Priority)
           1908227     n2019 Cy3_dupl    yckim  PENDING       0:00 33-08:00:00      4 (Priority)
           1908231 n2019,n20 sGC_Fe_N bernstei  PENDING       0:00 7-00:00:00      4 (JobHeldUser)
           1908238     n2019 LiCu_SPA bernstei  PENDING       0:00   1:00:00      1 (JobHeldUser)

> squeue -j 1908239 --start
             JOBID PARTITION     NAME     USER ST          START_TIME  NODES SCHEDNODES           NODELIST(REASON)
           1908239     n2019 LiCu_SPA bernstei PD 2021-03-19T20:09:17      1 compute-4-[18-19]    (Resources)

> sinfo -p n2019 state=idle
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
n2019        up   infinite     43  alloc compute-4-[0-11,13-17,20-26,28-39,41-47]
n2019        up   infinite      5   idle compute-4-[12,18-19,27,40]

Prentice Bisbal

Mar 21, 2021, 9:56:35 PM
to slurm...@lists.schedmd.com

Please post the output of 'scontrol show job 1908239', and also the output of 'scontrol show node' for one of the idle compute nodes.

Prentice 
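
The output requested above could be collected with something like the following. This is only a sketch: the job ID comes from the original post, compute-4-18 is picked as an assumption from the idle nodes listed by sinfo (any of the other idle nodes would do), and the variable names are illustrative.

```shell
#!/bin/sh
# Job ID from the original post; node chosen (as an assumption) from the
# idle list compute-4-[12,18-19,27,40] reported by sinfo.
job_id=1908239
node=compute-4-18

if command -v scontrol >/dev/null 2>&1; then
    # Full job record: requested CPUs, memory, features, and pending reason.
    scontrol show job "$job_id"
    # Full node record: configured and allocated CPUs/memory, state, features.
    scontrol show node "$node"
else
    echo "scontrol not found; run this on a host with the Slurm client tools" >&2
fi
```

Comparing what the job record requests against what the node record offers is usually the quickest way to see whether an idle node can actually satisfy the request.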