[slurm-users] GPU-node not waking up after power-save

Loris Bennett

Oct 13, 2022, 2:13:08 AM
to Slurm Users Mailing List
Hi,

We use Slurm's power-saving mechanism to switch off idle nodes. However,
we don't currently use it for our GPU nodes. This is because in the
past these nodes failed to wake up again when jobs were submitted to the
GPU partition. Given the current energy situation, we want to look at
the issue again, but before we do, I was wondering whether this is a
problem others have (had).

So does power-saving work in general for GPU nodes and, if so, are there
any extra steps one needs to take in order to set things up properly?

Cheers,

Loris

--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris....@fu-berlin.de

Ümit Seren

Oct 13, 2022, 3:43:49 AM
to Slurm User Community List

We use power saving with our GPU nodes and they power up fine. They take a bit longer to boot but that’s it.

What do you mean by not waking up?

Is the power-on script not being called?

Best

Ümit

Loris Bennett

Oct 13, 2022, 4:47:51 AM
to Slurm User Community List
Hi Ümit,

Ümit Seren <uemit...@gmail.com> writes:

> We use power saving with our GPU nodes and they power up fine. They take a bit longer to boot but that’s it.
>
> What do you mean by not waking up?
>
> Is the power-on script not being called?

The power-on script is called, but the boot process sometimes fails to
complete. To be honest, I can't recall the exact details of why we gave
up on power saving for these nodes, but I think it was some timing
problem in the way systemd was starting the services. We probably just
need to compare the systemd configuration on the GPU nodes with that on
the non-GPU nodes, which do wake up properly.
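
One possible way to do that comparison is sketched below. It is only an
illustration: it assumes the output of `systemctl list-unit-files --no-legend`
has been saved to a text file on one GPU node and one non-GPU node, and the
function names `parse_unit_states` and `diff_unit_states` are made up for
this example. It compares only the enablement state of each unit, not
ordering dependencies or timeouts, so it may well not be enough to find a
timing problem.

```python
def parse_unit_states(listing: str) -> dict[str, str]:
    """Map unit name -> state from `systemctl list-unit-files --no-legend` text.

    Only the first two whitespace-separated columns of each line are used;
    any further columns (such as the vendor preset) are ignored.
    """
    states = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            states[fields[0]] = fields[1]
    return states


def diff_unit_states(gpu: dict[str, str], cpu: dict[str, str]) -> dict[str, tuple]:
    """Return the units whose state differs between the two nodes.

    The result maps unit name -> (state on GPU node, state on non-GPU node),
    where None means the unit file does not exist on that node.
    """
    diff = {}
    for name in sorted(set(gpu) | set(cpu)):
        pair = (gpu.get(name), cpu.get(name))
        if pair[0] != pair[1]:
            diff[name] = pair
    return diff
```

With the two listings read into strings, `diff_unit_states(parse_unit_states(gpu_text), parse_unit_states(cpu_text))`
gives just the units that are present or enabled on only one kind of node,
which might be a reasonable starting point before looking at the ordering of
individual services.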

Thanks for confirming that there is no fundamental issue.

Cheers,

Loris