[slurm-users] How to exclude master from computing? Set to DRAINED?


Xaver Stiensmeier via slurm-users

Jun 24, 2024, 7:56:55 AM
to slurm...@lists.schedmd.com

Dear Slurm users,

In our project we exclude the master from computing before starting slurmctld. We used to do this by simply not mentioning the master in the configuration, i.e. by not having a line such as:

    PartitionName=SomePartition Nodes=master

or something similar. Apparently, this is no longer the way to do it, as it now causes a fatal error:

fatal: Unable to determine this slurmd's NodeName

Therefore, my question:

What is the best practice for excluding the master node from work?

I primarily see the options of setting the node to DOWN, DRAINED or RESERVED. Since we use ReturnToService=2, I guess DOWN is not the way to go. RESERVED fits the second part of its description ("The node is in an advanced reservation and not generally available."), and DRAINED ("The node is unavailable for use per system administrator request.") fits completely. So is DRAINED the correct setting in such a case?
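
For context, a minimal sketch of the kind of configuration we had in mind (all node and partition names here are hypothetical):

```conf
# slurm.conf fragment -- hypothetical names.
# The master runs slurmctld only and is deliberately absent
# from all NodeName and PartitionName lines.
SlurmctldHost=master
ReturnToService=2

NodeName=worker[1-4] CPUs=8 RealMemory=16000 State=UNKNOWN
PartitionName=SomePartition Nodes=worker[1-4] Default=YES State=UP
```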

Best regards,
Xaver

Hermann Schwärzler via slurm-users

Jun 24, 2024, 8:13:19 AM
to slurm...@lists.schedmd.com
Dear Xaver,

we have a similar setup and yes, we have set the node to "state=DRAIN".
Slurm keeps it this way until you manually change it to e.g. "state=RESUME".
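
As a rough sketch, assuming the node is called "master", the commands look like this:

```shell
# Take the node out of service; Slurm keeps it drained across restarts.
scontrol update NodeName=master State=DRAIN Reason="reserved for controller"

# Later, to put it back into service:
scontrol update NodeName=master State=RESUME
```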

Regards,
Hermann


--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

Steffen Grunewald via slurm-users

Jun 24, 2024, 8:25:35 AM
to Xaver Stiensmeier, slurm...@lists.schedmd.com
On Mon, 2024-06-24 at 13:54:43 +0200, Slurm users wrote:
> [...]
>
> fatal: Unable to determine this slurmd's NodeName

You're attempting to start the slurmd - which isn't required on this
machine, as you say. Disable it. Keep slurmctld enabled (and declared
in the config).
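
With systemd, a sketch of this (assuming the stock unit names slurmd.service and slurmctld.service):

```shell
# On the controller: stop slurmd and keep it from starting at boot.
systemctl disable --now slurmd

# Leave the controller daemon enabled and running.
systemctl enable --now slurmctld
```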

> therefore, my *question:*
>
> What is the best practice for excluding the master node from work?

Not defining it as a worker node.

> I personally primarily see the option to set the node into DOWN, DRAINED
> or RESERVED.

These are slurmd states, and therefore meaningless for a machine
that doesn't have a running slurmd. (It's the nodes that are defined in
the config that are supposed to be able to run slurmd.)

> So is *DRAINED* the correct setting in such a case?

Since this only applies to a node that has been defined in the config,
and you (correctly) didn't do so, there's no need (and no means) to
"drain" it.

Best
Steffen

--
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~

Loris Bennett via slurm-users

Jun 24, 2024, 8:26:26 AM
to Slurm Users Mailing List
Hi Xaver,

You just don't configure the head node in any partition.

You are getting the error because you are starting 'slurmd' on the node,
which implies you do want to run jobs there. Normally you would run only
'slurmctld' and possibly also 'slurmdbd' on your head node.

Cheers,

Loris

--
Dr. Loris Bennett (Herr/Mr)
FUB-IT (ex-ZEDAT), Freie Universität Berlin

Stephan Roth via slurm-users

Jun 24, 2024, 8:34:39 AM
to Xaver Stiensmeier, slurm...@lists.schedmd.com
Dear Xaver,

Could you clarify the function of what you call "master"?

If it's the Slurm controller, i.e. running slurmctld: Why do you need
slurmd running on it as well?

Best,
Stephan


Xaver Stiensmeier via slurm-users

Jun 24, 2024, 9:11:36 AM
to slurm...@lists.schedmd.com
Thanks Steffen,

That makes a lot of sense. I will simply not start slurmd in the master
Ansible role when the master is not to be used for computing.

Best regards,
Xaver
