[slurm-users] Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions

sportlecon sportlecon via slurm-users

Jan 4, 2025, 3:13:15 AM
to slurm...@lists.schedmd.com
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
26 cpu myscript user1 PD 0:00 4 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
Can anyone help to fix this?

--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

John Hearns via slurm-users

Jan 4, 2025, 6:43:12 AM
to sportlecon sportlecon, Slurm User Community List
Please share the output of sinfo and squeue.

Also look at the slurmd log on an example node; tail -f is your friend.
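For example, something like the following (the log path is an assumption; check the SlurmdLogFile setting in your slurm.conf for the actual location):

  sinfo
  squeue -u user1
  tail -f /var/log/slurm/slurmd.log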

Brian Andrus via slurm-users

Jan 4, 2025, 6:17:23 PM
to slurm...@lists.schedmd.com

Run 'sinfo -R' to see the reason any nodes may be down.

If they are down, it may be as simple as running 'scontrol update nodename=xxxx state=resume' to bring them back; it depends on the reason they went down (if that is the issue).

Otherwise, check the job requirements to see what it is asking for that does not exist: 'scontrol show job xxx'.
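A minimal sketch of that sequence, assuming one of the affected nodes is called node001 (substitute the real name reported by 'sinfo -R'):

  sinfo -R
  scontrol show node node001
  scontrol update nodename=node001 state=resume
  scontrol show job 26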

Brian Andrus

Steffen Grunewald via slurm-users

Jan 7, 2025, 3:29:38 AM
to sportlecon sportlecon, slurm...@lists.schedmd.com
On Sat, 2025-01-04 at 08:11:21 -0000, Slurm users wrote:
> JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
> 26 cpu myscript user1 PD 0:00 4 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
> Anyone can help to fix this?

Not without a little bit of extra information,
e.g. "sinfo -p cpu" and maybe "scontrol show job=26"

Best,
Steffen

--
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~