[slurm-dev] Best practices for configurations with login nodes

Carlos Aguado Sanchez

Nov 18, 2011, 12:52:58 PM
to slur...@lists.llnl.gov
Dear all,


We would like to use slurm to start managing the compute resources of a
small cluster of nodes, O(10). Let me check with you what the correct way
is to set up a cluster where users log in to a number of separate nodes
to submit their jobs. Ideally, those login nodes are not part of the
compute pool.

Additionally, gres is used to control gpu resources. The hardware of the
compute and login pools is different (e.g. lack of GPUs on the login part).

We have got a working configuration where all compute nodes share the
same slurm.conf file. On the login nodes, slurm.conf is slightly modified
because Gres fails to load in the absence of the GPU dev files. That is,
login nodes have the option GresTypes commented out.


I have seen the option NO_CONF_HASH, which suppresses the logging of
configuration-related error messages. I'm not sure this is desirable,
though. Could you please shed some light on this?


I have also seen the --enable-front-end option, but I'm not sure it
applies to this case, does it?


Thank you!
Carlos

Moe Jette

Nov 18, 2011, 1:22:32 PM
to slur...@lists.llnl.gov, Carlos Aguado Sanchez
Quoting Carlos Aguado Sanchez <carlos...@epfl.ch>:

> Dear all,
>
>
> We would like to use slurm to start managing the compute resources of a
> small cluster of nodes, O(10). Let me check with you what the correct way
> is to set up a cluster where users log in to a number of separate nodes
> to submit their jobs. Ideally, those login nodes are not part of the
> compute pool.

Do not define the login nodes in slurm.conf, but do install SLURM on
those nodes.
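
As an illustration of that layout, here is a minimal sketch of a shared
slurm.conf fragment that defines only the compute nodes. The host names
(ctl01, node[01-10], login01), the partition name, and the GPU count are
all hypothetical. The script only writes the fragment to a temporary file
and checks that the login host is not named in it:

```shell
#!/bin/sh
# Hypothetical names throughout: ctl01, node[01-10], login01, batch.
conf=$(mktemp)
cat > "$conf" <<'EOF'
ControlMachine=ctl01
GresTypes=gpu
NodeName=node[01-10] Gres=gpu:2 State=UNKNOWN
PartitionName=batch Nodes=node[01-10] Default=YES State=UP
EOF

# The login host should not appear in any NodeName or PartitionName line.
# This is a literal match only; it does not expand bracketed host ranges.
if grep -Eq '^(NodeName|PartitionName)=.*login01' "$conf"; then
    login_defined=yes
else
    login_defined=no
fi
echo "login01 defined in slurm.conf: $login_defined"
rm -f "$conf"
```

The same file would then be copied unchanged to the controller, the
compute nodes, and the login nodes.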

> Additionally, gres is used to control gpu resources. The hardware of the
> compute and login pools is different (e.g. lack of GPUs on the login part).
>
> We have got a working configuration where all compute nodes share the
> same slurm.conf file. On the login nodes, slurm.conf is slightly modified
> because Gres fails to load in the absence of the GPU dev files. That is,
> login nodes have the option GresTypes commented out.

That was fixed very recently. The fix is perhaps in version 2.3.2, or in
a release that is not yet available. For now, you can configure a
gres.conf file like this on those nodes:
name=gpu count=0
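
One way to produce the per-node gres.conf could look like the sketch
below. It assumes the GPUs show up as device files named nvidia0,
nvidia1, ... numbered contiguously from 0, which is an assumption about
the driver and not something stated in this thread. The helper name
gres_conf_for is hypothetical. Keywords are capitalized as in the
gres.conf man page:

```shell
#!/bin/sh
# Print a gres.conf line for the GPU device files found in a directory.
# Falls back to the Count=0 form suggested above when none are present.
# Assumes device files are numbered contiguously from 0.
gres_conf_for() {
    devdir=$1
    n=$(ls "$devdir" 2>/dev/null | grep -c '^nvidia[0-9][0-9]*$')
    if [ "$n" -gt 1 ]; then
        echo "Name=gpu File=$devdir/nvidia[0-$((n - 1))]"
    elif [ "$n" -eq 1 ]; then
        echo "Name=gpu File=$devdir/nvidia0"
    else
        echo "Name=gpu Count=0"
    fi
}

gres_conf_for /dev
```

On a login node without GPUs this prints the Count=0 line, so the same
script could be run on every node type.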

> I have seen the option NO_CONF_HASH, which suppresses the logging of
> configuration-related error messages. I'm not sure this is desirable,
> though. Could you please shed some light on this?

I would recommend defining different gres.conf files per node type and
using the same slurm.conf everywhere, without NO_CONF_HASH.
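
To confirm that the slurm.conf copies really are identical, rather than
silencing the mismatch message, the copies can be compared directly. The
sketch below is only an illustration: the function name conf_mismatches
and the file names are hypothetical, and the throwaway files stand in for
copies fetched from each node.

```shell
#!/bin/sh
# Print the name of every file whose contents differ from the reference copy.
# Usage: conf_mismatches REFERENCE COPY...
conf_mismatches() {
    ref=$1
    shift
    for f in "$@"; do
        cmp -s "$ref" "$f" || echo "$f"
    done
}

# Demo with throwaway files standing in for copies fetched from each node.
tmp=$(mktemp -d)
printf 'GresTypes=gpu\n' > "$tmp/controller.conf"
printf 'GresTypes=gpu\n' > "$tmp/node01.conf"
printf '#GresTypes=gpu\n' > "$tmp/login01.conf"
mismatched=$(conf_mismatches "$tmp/controller.conf" \
    "$tmp/node01.conf" "$tmp/login01.conf")
echo "differs from controller copy: ${mismatched:-none}"
rm -rf "$tmp"
```

Here the login node copy with GresTypes commented out is the one
reported, which is the situation described in the original question.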

> I have also seen the --enable-front-end option, but I'm not sure it
> applies to this case, does it?

That would normally be used only on IBM BlueGene or Cray computers.

>
> Thank you!
> Carlos
>

Carlos Aguado Sanchez

Nov 21, 2011, 4:28:05 AM
to Moe Jette, slur...@lists.llnl.gov
Moe,


All worked as suggested, thanks. Just a side note: on login nodes,
without further options, the slurm daemon stops because it is unable to
find the local node name. slurmd -N <nodename> did the trick.


Much appreciated, thanks!
Carlos

Peter Kjellström

Nov 22, 2011, 12:25:21 PM
to slur...@lists.llnl.gov
On Monday, November 21, 2011 10:28:05 AM Carlos Aguado Sanchez wrote:
> Moe,
>
>
> All worked as suggested, thanks. Just a side note: on login nodes,
> without further options, the slurm daemon stops because it is unable to
> find the local node name. slurmd -N <nodename> did the trick.

That is expected. Compute nodes run slurmd, the controller runs
slurmctld, and login nodes run no daemons.
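
That split could be written down in a provisioning script roughly as
follows. The role names and the function daemon_for_role are
hypothetical; only the daemon names come from the description above.

```shell
#!/bin/sh
# Map a node role to the SLURM daemon it should run ("none" for login nodes).
daemon_for_role() {
    case $1 in
        compute)    echo slurmd ;;
        controller) echo slurmctld ;;
        login)      echo none ;;
        *)          echo "unknown role: $1" >&2; return 1 ;;
    esac
}

for role in compute controller login; do
    echo "$role: $(daemon_for_role "$role")"
done
```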

/Peter

> Much appreciated, thanks!
> Carlos

Carlos Aguado Sanchez

Nov 22, 2011, 7:12:43 PM
to slur...@lists.llnl.gov, Peter Kjellström
Moe, Peter,


Thanks for the clarification. I somehow got confused in the trial and
error. It's all right now.


Regards,
Carlos
