[slurm-users] How to launch slurm services after installation


刘 博涵

Nov 27, 2022, 10:34:50 PM
to slurm...@lists.schedmd.com
Hi all,

I'm a newcomer to cluster computing and have been trying to set up a Slurm cluster myself. Right now I'm stuck at starting Slurm's systemd services. I checked the following tutorials:
  1. Slurm Workload Manager - Quick Start Administrator Guide (schedmd.com) <https://slurm.schedmd.com/quickstart_admin.html>
  2. https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/
  3. https://wiki.bkslab.org/index.php/Slurm_Installation_Guide
  4. Slurm installation (southgreenplatform.github.io) <https://southgreenplatform.github.io/trainings/hpc/slurminstallation/>
All of them say to run systemctl enable/start slurmd/slurmdbd/slurmctld after installation, but those commands always fail because the corresponding systemd unit files do not exist, whether I install Slurm from source or from the EPEL repos. All my systems run CentOS 7.9, fully updated before the Slurm installation, and I was trying to install Slurm 22.05.6 from source.

My question: are the systemd unit files actually created during installation, as the tutorials imply, or do I have to write them myself? If the latter, how should I write them (which parameters should I set, etc.)? Are there any templates I can follow?

Many thanks,

Steve

Brian Andrus

Nov 27, 2022, 10:59:00 PM
to slurm...@lists.schedmd.com

Steve,


I suspect you did not install the packages.


You need to install slurm-slurmctld to get the slurmctld systemd files:

# rpm -qlp slurm-slurmctld-20.11.9-1.el7.x86_64.rpm
/run/slurm/slurmctld.pid
/usr/lib/systemd/system/slurmctld.service
/usr/sbin/slurmctld
/usr/share/man/man8/slurmctld.8.gz


The same goes for slurm-slurmdbd. Both of those are management daemons and should run on only one system (two if you configure failover).

Your compute nodes need slurm-slurmd, which will provide the systemd files for slurmd.
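
To make the split by node type concrete, here is a minimal sketch that maps a role to the packages named above. The role names, the function name, and the inclusion of a base "slurm" package alongside each daemon package are assumptions for illustration; the script only prints an install command and does not run it.

```shell
#!/bin/sh
# Sketch only: print the package set for a node role.
# slurm-slurmctld, slurm-slurmdbd and slurm-slurmd are the packages named in
# this reply; the base "slurm" package and the role names are assumptions.

packages_for_role() {
    case "$1" in
        controller) echo "slurm slurm-slurmctld" ;;
        database)   echo "slurm slurm-slurmdbd" ;;
        compute)    echo "slurm slurm-slurmd" ;;
        *)          echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Dry run: show the install command for each role instead of executing it.
for role in controller database compute; do
    echo "$role: yum install $(packages_for_role "$role")"
done
```

After installing, "rpm -ql slurm-slurmctld" on the controller should list slurmctld.service, as in the listing above, and "systemctl enable slurmctld" followed by "systemctl start slurmctld" should then find the unit.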

Kamil Wilczek

Nov 28, 2022, 3:50:09 AM
to Slurm User Community List, Brian Andrus
Hello,

all supported build flags are listed by the "./configure --help"
command. One of them is "--with-systemdsystemunitdir=DIR", which
lets you specify the directory for the systemd service
files of all Slurm daemons. The most important flag is, imho,
"--prefix", which sets the installation directory.
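
As a rough illustration of combining the two flags, the sketch below assembles a configure command line. The function name, the version number, and the choice of /etc/systemd/system as the unit directory are assumptions; the script prints the command so it can be reviewed before being run from the unpacked source tree.

```shell
#!/bin/sh
# Sketch: assemble a ./configure command line for a versioned prefix.
# The version and the unit directory below are illustrative assumptions.

configure_cmd() {
    version="$1"      # e.g. 22.05.6
    unit_dir="$2"     # directory where the systemd unit files should go
    echo "./configure --prefix=/opt/slurm_${version}" \
         "--with-systemdsystemunitdir=${unit_dir}"
}

# Print the command rather than running it.
configure_cmd 22.05.6 /etc/systemd/system
```

Check "./configure --help" in your own source tree to confirm the flags before relying on this.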

I'll briefly describe my build setup (sorry for the length);
it might be helpful to someone who is just starting. I remember that when
I was trying to set this up for the first time, it was hell ;)

Of course there are multiple approaches and this is only one of
them. Mine is probably not especially well designed or optimal ;),
but after several years of using Ubuntu's builds I tried this
approach and it works quite well.

If you are building from source, consider using Ansible or another
automation tool; the whole process becomes much easier and
repeatable, and I highly recommend it. Otherwise, making all the changes
manually is error-prone and a world of pain ;)

Try setting "--prefix" to "/opt/slurm_version_build_version".
This way you can try to build with different options many times and it
will be easy to test and delete old/bad versions. When you decide that
the outcome is what you want, you can set the "production" prefix to
"/opt/slurm_version". I think it is also the common approach advised in
the official docs:
https://slurm.schedmd.com/quickstart_admin.html#upgrade

This way you will have separate binaries for each Slurm version,
and in case of problems with the new build you can always return to the
previous one by pointing the "/opt/slurm" symlink back at it.
Slurm is designed not to introduce breaking changes across at least one
major version, if I remember correctly, so changing between versions
should work without problems.
I also set separate state and log directories.

# ls -l /opt

...
root root 18 Nov 3 11:07 slurm -> /opt/slurm_22.05.5
root root 94 Aug 12 11:31 slurm_22.05.2
root root 94 Nov 3 11:05 slurm_22.05.5
slurm slurm 39 Aug 12 11:36 slurm_log_dir
slurm slurm 24 Aug 12 11:36 slurm_state_dir
...
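
Switching the symlink between installed versions could be scripted roughly as follows. This is a sketch under assumptions: the function name and the OPT_ROOT variable are hypothetical, and the directory naming follows the listing above.

```shell
#!/bin/sh
# Sketch: repoint the "slurm" symlink at one versioned install directory.
# OPT_ROOT defaults to /opt; the function name is hypothetical.

switch_slurm_version() {
    version="$1"
    root="${OPT_ROOT:-/opt}"
    target="${root}/slurm_${version}"
    if [ ! -d "$target" ]; then
        echo "no such install: $target" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat an existing link to a directory as a file, not a destination
    ln -sfn "$target" "${root}/slurm"
}
```

For example, "switch_slurm_version 22.05.2" as root would point /opt/slurm back at the older build; the daemons still need a restart afterwards to pick up the other binaries.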

All the systemd service files should use "/opt/slurm/..." paths
in this case, and each build should have separate config files.
This is a bit complicated at first and requires solving several
management problems, but after some time I think it allows for easier
upgrades.
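
For anyone who ends up writing a unit by hand, here is a minimal sketch of a slurmctld unit that goes through the "/opt/slurm" symlink. It is not the unit file shipped with Slurm: the function name, the munge.service ordering, the slurm.conf location under the prefix, and the file-descriptor limit are all assumptions to adapt to your own build.

```shell
#!/bin/sh
# Sketch: write a minimal slurmctld unit that points at the /opt/slurm symlink.
# Not the unit shipped with Slurm; paths and settings are assumptions.

write_slurmctld_unit() {
    out_dir="$1"                  # e.g. /etc/systemd/system
    prefix="${2:-/opt/slurm}"     # stable symlink, not a versioned directory
    cat > "${out_dir}/slurmctld.service" <<EOF
[Unit]
Description=Slurm controller daemon
After=network-online.target munge.service
Wants=network-online.target

[Service]
Type=simple
ExecStart=${prefix}/sbin/slurmctld -D -f ${prefix}/etc/slurm.conf
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
}
```

After writing the file into the unit directory, "systemctl daemon-reload" makes systemd pick it up, and "systemctl enable slurmctld" and "systemctl start slurmctld" then work as the tutorials describe. Because ExecStart goes through the symlink, switching versions does not require editing the unit.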

Kind regards
--
Kamil Wilczek [https://keys.openpgp.org/]
[6C4BE20A90A1DBFB3CBE2947A832BF5A491F9F2A]
