[slurm-dev] Installing SLURM locally on Ubuntu 16.04

Will L

Oct 29, 2017, 10:14:19 PM
to slurm-dev
To slurm-dev,


I am trying to install SLURM 15.08.7 locally on an Ubuntu 16.04 machine. In my case, the master and worker nodes are the same. All I want is to test out some tiny examples locally on my desktop computer. I do not actually have a cluster.

I tried following the guide at https://github.com/superphy/semantic/wiki/SLURM-install-guide, and I am stuck at `sudo /etc/init.d/slurmd start`. The log shows
[2017-10-29T21:57:31.074] slurmctld version 15.08.7 started on cluster cluster
[2017-10-29T21:57:31.075] layouts: no layout to initialize
[2017-10-29T21:57:31.075] fatal: Frontend not configured correctly in slurm.conf.  See man slurm.conf look for frontendname.

`slurmd -C` shows:

ClusterName=(null) NodeName=Haggunenon CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=16004 TmpDisk=223645
UpTime=0-00:29:17

I also submitted posts to Stack Overflow:
And here is my /etc/slurm-llnl/slurm.conf. I generated it from /usr/share/doc/slurmctld/slurm-wlm-configurator.html, supplying the information from `slurmd -C`. I also changed the user name to wlandau (my own user name) and set ControlMachine and NodeName to Haggunenon (the hostname).

# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=Haggunenon
#ControlAddr=
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=wlandau
#SlurmdUser=root
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
#SchedulerPort=7321
SelectType=select/linear
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
#SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
#SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#
#
# COMPUTE NODES
NodeName=Haggunenon CPUs=4 RealMemory=16004 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
PartitionName=DEFAULT Nodes=Haggunenon Default=YES MaxTime=INFINITE State=UP

Will

Benjamin Redling

Nov 5, 2017, 2:17:54 PM
to slurm-dev

Hi Will,

Looking at your Stack Overflow postings, there doesn't seem to be anything
helpful there. Did you solve your problem in the meantime?

On 30.10.2017 at 03:12, Will L wrote:
> I am trying to install SLURM 15.08.7 locally on an Ubuntu 16.04 machine.
> In my case, the master and worker nodes are the same.
[...]

Have you tried starting both slurmctld and slurmd in the foreground (-D)?
When I have real trouble with a cluster I open two terminals
side by side and set debugging in slurm.conf to something reasonably
high. Then I start...
... one with: slurmctld -D -f <path_to_config>
... another with: slurmd -D -f <path_to_config>

(I only remember one case where that wasn't helpful: a seemingly random
"user unknown" file access problem)
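
A minimal sketch of the two-terminal approach above, assuming the Ubuntu package config path used elsewhere in this thread. The helper name `start_foreground` is hypothetical; `-D`, `-v`, and `-f` are the daemons' own options. Treat it as an illustration rather than a tested recipe.

```shell
# Hypothetical helper: run one Slurm daemon in the foreground with extra verbosity.
# Usage: start_foreground slurmctld|slurmd [path_to_config]
start_foreground() {
    daemon=$1
    conf=${2:-/etc/slurm-llnl/slurm.conf}   # assumption: Ubuntu package default
    case "$daemon" in
        slurmctld|slurmd) ;;
        *) echo "unknown daemon: $daemon" >&2; return 2 ;;
    esac
    if [ ! -r "$conf" ]; then
        echo "cannot read config: $conf" >&2
        return 1
    fi
    # -D keeps the daemon in the foreground, each -v raises verbosity,
    # -f names the configuration file to load.
    sudo "$daemon" -D -vvv -f "$conf"
}
```

Run `start_foreground slurmctld` in one terminal and `start_foreground slurmd` in the other, then read the first error or fatal line each one prints.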

Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
☎ +49 3641 9 44323

Will L

Nov 8, 2017, 8:40:23 AM
to slurm-users
Benjamin,


Thanks for following up. I just tried again as you said, with the following result.

$ sudo slurmctld -D -f /etc/slurm-llnl/slurm.conf
slurmctld: slurmctld version 17.02.9 started on cluster cluster
slurmctld: error: Couldn't find the specified plugin name for crypto/munge looking at all files
slurmctld: error: cannot find crypto plugin for crypto/munge
slurmctld: error: cannot create crypto context for crypto/munge
slurmctld: fatal: slurm_cred_creator_ctx_create((null)): Operation not permitted

Will

Manuel Rodríguez Pascual

Nov 8, 2017, 8:47:20 AM
to Slurm User Community List, slurm-users
It looks like munge is not correctly configured, or you have some kind of permissions problem. This manual explains how to configure and test it: https://github.com/dun/munge/wiki/Installation-Guide

Good luck!

Douglas Jacobsen

Nov 8, 2017, 9:02:27 AM
to Slurm User Community List, slurm-users
Also please make sure you have the slurm-munge package installed (at least for the RPMs this is the name of the package, I'm unsure if that packaging layout was conserved for Debian)

----
Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer

------------- __o
---------- _ '\<,_
----------(_)/  (_)__________________________


Gennaro Oliva

Nov 8, 2017, 10:10:09 AM
to Slurm User Community List, slurm-users
Hi Will,

On Wed, Nov 08, 2017 at 01:38:18PM +0000, Will L wrote:
> $ sudo slurmctld -D -f /etc/slurm-llnl/slurm.conf
> slurmctld: slurmctld version 17.02.9 started on cluster cluster
> slurmctld: error: Couldn't find the specified plugin name for crypto/munge
> looking at all files
> slurmctld: error: cannot find crypto plugin for crypto/munge
> slurmctld: error: cannot create crypto context for crypto/munge
> slurmctld: fatal: slurm_cred_creator_ctx_create((null)): Operation not
> permitted

Can you please send your current slurm.conf?
Best regards
--
Gennaro Oliva

Benjamin Redling

Nov 8, 2017, 11:34:56 AM
to slurm...@lists.schedmd.com
On 11/8/17 3:01 PM, Douglas Jacobsen wrote:
> Also please make sure you have the slurm-munge package installed (at
> least for the RPMs this is the name of the package, I'm unsure if that
> packaging layout was conserved for Debian)
nope, it's just "munge"

Douglas Jacobsen

Nov 8, 2017, 12:49:54 PM
to Slurm User Community List
Hi,

Sorry, to clarify: when the RPM spec file is used, it separates the slurm/crypto_munge.so plugin out into the slurm-munge RPM. I wasn't sure whether the Debian packaging does something similar. To me, the log output indicates that slurm/crypto_munge.so does not exist. If you are building with ./configure && make && make install instead of using a package manager, then perhaps ./configure did not pick up the munge development libraries. Perhaps you need munge-dev?
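
One way to test the "plugin file does not exist" theory is to look for it directly. The sketch below uses a hypothetical helper, `has_crypto_munge`, and the directories in the comments are assumptions (the default libdir of a source build, and a multiarch directory for the Ubuntu packages); adjust them to whatever your installation actually uses.

```shell
# Hypothetical helper: report whether a plugin directory contains crypto_munge.so.
has_crypto_munge() {
    plugin_dir=$1
    if [ -f "$plugin_dir/crypto_munge.so" ]; then
        echo "found: $plugin_dir/crypto_munge.so"
    else
        echo "missing: $plugin_dir/crypto_munge.so" >&2
        return 1
    fi
}

# Example calls (both paths are assumptions, not taken from this thread):
#   has_crypto_munge /usr/local/lib/slurm                  # default prefix of a source build
#   has_crypto_munge /usr/lib/x86_64-linux-gnu/slurm-wlm   # Debian/Ubuntu packages
```

For a packaged install, listing the files a plugin package owns with `dpkg -L` and searching the result for `crypto_munge` answers the same question without guessing the directory.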

----
Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer

------------- __o
---------- _ '\<,_
----------(_)/  (_)__________________________



Will L

Nov 8, 2017, 10:03:32 PM
to Slurm User Community List
Thanks for the suggestions. Munge seems to be working just fine. At one point I tried to build SLURM from the source, but when I could not make it work, I `sudo make uninstall`ed it and opted for the pre-built apt version all over again. Maybe that made a mess. What should I do to make SLURM notice munge and other utilities?

Also, here is my current slurm.conf.

ControlMachine=Haggunenon
AuthType=auth/munge
CacheGroups=0
CryptoType=crypto/munge
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=1
MailProg=/usr/bin/mail
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=wlandau
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
FastSchedule=1
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/linear
AccountingStorageType=accounting_storage/none
AccountingStoreJobComment=YES
ClusterName=cluster
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
NodeName=Haggunenon CPUs=1 State=UNKNOWN
PartitionName=partition Nodes=Haggunenon Default=YES MaxTime=INFINITE State=UP

Gennaro Oliva

Nov 9, 2017, 2:44:47 AM
to Slurm User Community List
Hi Will,

On Wed, Nov 08, 2017 at 10:01:31PM -0500, Will L wrote:
> SlurmUser=wlandau

you need to change this to:

SlurmUser=slurm
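
A small sketch of making that change without hand-editing, assuming a `slurm` account already exists on the machine. The helper name `with_slurm_user` is hypothetical; it only prints the modified config, so nothing is overwritten until you decide to copy it into place.

```shell
# Hypothetical helper: print a slurm.conf with SlurmUser set to the given account.
# Usage: with_slurm_user path_to_config [user]
with_slurm_user() {
    conf=$1
    user=${2:-slurm}
    # Only a line that starts with "SlurmUser=" is rewritten; commented-out
    # settings such as "#SlurmdUser=root" are left untouched.
    sed "s/^SlurmUser=.*/SlurmUser=$user/" "$conf"
}

# Example:
#   with_slurm_user /etc/slurm-llnl/slurm.conf slurm > /tmp/slurm.conf.new
```

After reviewing the output, copy it over the original and make sure the state and log directories named in the file are writable by the new user.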

Raymond Wan

Nov 9, 2017, 3:05:20 AM
to Slurm User Community List
Hi Will,


On Thu, Nov 9, 2017 at 11:01 AM, Will L <will....@gmail.com> wrote:
>
> Thanks for the suggestions. Munge seems to be working just fine. At one point I tried to build SLURM from the source, but when I could not make it work, I `sudo make uninstall`ed it and opted for the pre-built apt version all over again. Maybe that made a mess. What should I do to make SLURM notice munge and other utilities?


Yes, that probably wasn't a good idea. I've had SLURM working on a
single computer since Ubuntu 15.04 or 15.10 using the packages without
a lot of problems. I haven't had to turn to installing from source
[yet]...

During the setup of munge, you ran commands such as this:

sudo create-munge-key -f -r
sudo systemctl enable munge
sudo systemctl start munge

(I guess the third line doesn't matter if you reboot.)

So, after you rebooted, did you see /usr/sbin/munged running and owned
by the munge user?


> Also, here is my current slurm.conf.


One issue I had with the SLURM packages for Ubuntu (especially 1-2
years ago) was that the configurator at
/usr/share/doc/slurmctld/slurm-wlm-configurator.html did *not* match
the version I was installing. So I actually ended up using a
web-based configurator.

I'm not sure if that's a big problem...

Another problem with the "older" [*] SLURM packages for Ubuntu is that
many directories are not created during the installation process. So,
in your configuration file, make sure all of the directories
/var/run/... /var/log/... have been created and are accessible by the
slurm user, at least. First ensure that the log directories are
created...once they are, watch the log files when you do:

sudo service slurmctld start
sudo service slurmd start

and it'll tell you what directories are missing. Actually, once you
get to the point where there are log files being generated, you're not
only close, but posting the error message might help us help you
better.
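
The directory check described above can be sketched as follows. Both helper names (`conf_dirs`, `check_conf_dirs`) are hypothetical, and the list of settings is limited to the path-valued ones that appear in the slurm.conf posted in this thread; it checks existence only, not ownership or permissions.

```shell
# Hypothetical helper: print each directory that the given slurm.conf expects to exist.
conf_dirs() {
    while IFS='=' read -r key value; do
        case "$key" in
            SlurmctldPidFile|SlurmdPidFile|SlurmctldLogFile|SlurmdLogFile)
                dirname "$value" ;;          # files: the parent directory must exist
            SlurmdSpoolDir|StateSaveLocation)
                printf '%s\n' "$value" ;;    # these settings name directories directly
        esac
    done < "$1" | sort -u
}

# Hypothetical helper: mark each of those directories as present or missing.
check_conf_dirs() {
    conf_dirs "$1" | while read -r dir; do
        if [ -d "$dir" ]; then
            echo "ok $dir"
        else
            echo "MISSING $dir"
        fi
    done
}

# Example:
#   check_conf_dirs /etc/slurm-llnl/slurm.conf
```

Any directory reported as missing can then be created and handed to the Slurm user before starting the services.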

This is what comes to mind; I hope this helps!

Ray

[*] I'm currently on Ubuntu 17.10 and the SLURM packages for that
version. Ubuntu 16.04 is fine, but I haven't kept track of what has
changed / improved in terms of the SLURM packages...

Will L

Nov 11, 2017, 7:15:02 AM
to Slurm User Community List
I am trying what you said, but I am running into problems, both new ones and earlier ones. For example, now munge does not start.

$ sudo systemctl enable munge
Synchronizing state of munge.service with SysV init with /lib/systemd/systemd-sysv-install...
Executing /lib/systemd/systemd-sysv-install enable munge
Failed to execute operation: File exists

I really think I mangled my system because I tried so many different things. Is there a way to start fresh without reinstalling my OS?

Gennaro Oliva

Nov 12, 2017, 8:16:05 AM
to Slurm User Community List
Hi Will,

On Sat, Nov 11, 2017 at 07:12:59AM -0500, Will L wrote:
> I really think I mangled my system because I tried so many different
> things. Is there a way to start fresh without reinstalling my OS?

Try purging and reinstalling munge, test it, and then go on with slurm.
Remember to save your slurm.conf and to change the SlurmUser to slurm.

sudo apt-get remove --purge munge
sudo apt-get install munge
munge -n | unmunge
sudo apt-get install slurm-wlm

Rajiv Nishtala

Nov 12, 2017, 8:47:51 AM
to Slurm User Community List
It may be that I'm missing context here, but with regard to munge, it makes much more sense to follow the instructions from munge's GitHub page.
Remember, the permissions for the key and for the /var/ and /etc folders are important.

Best wishes,
Rajiv

Raymond Wan

Nov 12, 2017, 9:33:52 AM
to Slurm User Community List
Hi Rajiv,

On Sun, Nov 12, 2017 at 9:47 PM, Rajiv Nishtala <nishta...@gmail.com> wrote:
> It may be that I'm missing context here, but with regard to munge, it makes much more sense to follow the instructions from munge's GitHub page.
> Remember, the permissions for the key and for the /var/ and /etc folders are important.
>
> Best wishes,
> Rajiv


Hmmmm, I actually don't agree since sometimes the distribution (i.e.,
Debian, Ubuntu, CentOS, etc.) has made some changes to a piece of
software to integrate it with the rest of the system. At least with
some programs (I think it was the case with slurm, but I can't quite
remember), following general information, information for another
distribution, or even information for an older OS version caused me to
bark up the wrong tree...

For munge, this would be the files in /usr/share/doc/munge/ and
whatever's on-line.

But you are right that the upstream instructions cannot be ignored and
should be consulted in addition to the distribution-relevant
documentation, in case the latter is wrong (which does happen).

Ray

Rajiv Nishtala

Nov 12, 2017, 10:02:16 AM
to Slurm User Community List


> On 12-Nov-2017, at 3:33 PM, Raymond Wan <rwan...@gmail.com> wrote:
>
> Hi Rajiv,
>
> On Sun, Nov 12, 2017 at 9:47 PM, Rajiv Nishtala <nishta...@gmail.com> wrote:
>> It may be that I'm missing context here, but with regard to munge, it makes much more sense to follow the instructions from munge's GitHub page.
>> Remember, the permissions for the key and for the /var/ and /etc folders are important.
>>
>> Best wishes,
>> Rajiv
>
>
> Hmmmm, I actually don't agree since sometimes the distribution (i.e.,
> Debian, Ubuntu, CentOS, etc.) has made some changes to a piece of
> software to integrate it with the rest of the system. At least with
> some programs (I think it was the case with slurm, but I can't quite
> remember), following general information, information for another
> distribution, or even information for an older OS version caused me to
> bark up the wrong tree…
I set up Slurm recently on Ubuntu 14.x and 16.x, and both seemed to work fine with the instructions given. Like I said, I may be missing the information needed to give a more concrete answer.
>
> For munge, this would be the files in /usr/share/doc/munge/ and
> whatever's on-line.
>
> But you are right that the upstream instructions cannot be ignored and
> should be consulted in addition to the distribution-relevant
> documentation, in case the latter is wrong (which does happen).
I agree with that.
>
> Ray
>


Will L

Nov 12, 2017, 10:04:21 AM
to Slurm User Community List
I just tried `sudo apt-get remove --purge munge`, etc., and munge itself seems to be working fine. But I still get `slurmctld: error: Couldn't find the specified plugin name for crypto/munge looking at all files`. Is there a way to get around munge altogether? I am just testing on my local machine, so I do not actually need any more special authentication.

Will L

Nov 12, 2017, 10:08:51 AM
to Slurm User Community List
In general, is Debian more cooperative with HPC systems than Ubuntu? Because I may just kill my KDE Neon installation and try Debian + KDE Plasma.

Raymond Wan

Nov 12, 2017, 10:22:12 AM
to Slurm User Community List
Hi Will,


On Sun, Nov 12, 2017 at 11:07 PM, Will L <will....@gmail.com> wrote:
> In general, is Debian more cooperative with HPC systems than Ubuntu? Because
> I may just kill my KDE Neon installation and try Debian + KDE Plasma.


By "HPC" systems, do you mean a server? Or a supercomputer??

As far as I know, no, it wouldn't be more cooperative. Both have
similar packages, of course. But Debian is updated less than Ubuntu
(assuming you're not using the LTS versions). If you're ok with that,
then you can give Debian a try. Especially if you're doing this on a
server that has to go into production use and won't be updated very
often.

As for the munge steps, after doing them, did you restart your system?
Is there a munge process running after your server comes back?

By the way, I was skimming over your old messages and I think
something has changed. Or perhaps I've lost some context...

You first said you were installing slurm 15.08.7 locally on an Ubuntu
16.04 machine. But you then copy and pasted some text that indicated
slurm 17.02.9 was started. Can you clarify what version of slurm and
Ubuntu you're using? I think 17.02.9 must have come from source, since
even Ubuntu 17.10 is "just" on 17.02.6.

Ray

Gennaro Oliva

Nov 12, 2017, 10:52:54 AM
to Slurm User Community List
Hi Will,

On Sun, Nov 12, 2017 at 10:03:18AM -0500, Will L wrote:
> I just tried `sudo apt-get remove --purge munge`, etc., and munge itself

This should have uninstalled slurm-wlm as well; did you reinstall it with apt?

> seems to be working fine. But I still get `slurmctld: error: Couldn't find
> the specified plugin name for crypto/munge looking at all files`. Is there

if you didn't reinstall slurm with apt you may be using the slurmctld
executable from a failed source installation, and for some reason this
can't find the corresponding plugin directory.
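
To see whether a leftover binary from the source build is the one being picked up, it can help to list every `slurmctld` on the PATH in lookup order. This is only a sketch: `find_on_path` is a hypothetical helper, and the idea that a source build lives under /usr/local is an assumption based on the default `./configure` prefix.

```shell
# Hypothetical helper: list every executable with the given name on PATH, in lookup order.
find_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH component means the current directory.
        if [ -x "${dir:-.}/$name" ]; then
            echo "${dir:-.}/$name"
        fi
    done
    IFS=$old_ifs
}

# Example:
#   find_on_path slurmctld
#   find_on_path slurmd
```

The first path printed is the one the shell runs. If it sits under /usr/local while the apt packages install under /usr, the source build was probably not fully removed. Running `slurmctld -V` on each path shows which version it is, and `dpkg -S` on a path shows whether any package owns that file.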

I suggest installing the slurm-wlm package with:

apt-get install slurm-wlm

and copy the slurm.conf you posted previously under /etc/slurm-llnl
but remember to set the SlurmUser to slurm

> a way to get around munge altogether? I am just testing on my local
> machine, so I do not actually need any more special authentication.

It's not a problem to set up munge authentication on a single machine;
it's a problem if you don't have all the packages installed, and
slurm-wlm-basic-plugins must be installed.
Regards
--
Gennaro Oliva

Benjamin Redling

Nov 13, 2017, 10:16:30 AM
to slurm...@lists.schedmd.com
On 11/12/17 4:52 PM, Gennaro Oliva wrote:
> On Sun, Nov 12, 2017 at 10:03:18AM -0500, Will L wrote:

>> I just tried `sudo apt-get remove --purge munge`, etc., and munge itself

> This should have uninstalled slurm-wlm as well; did you reinstall it with apt?

>> seems to be working fine. But I still get `slurmctld: error: Couldn't find
>> the specified plugin name for crypto/munge looking at all files`. Is there

> if you didn't reinstall slurm with apt you may be using the slurmctld
> executable from a failed source installation, and for some reason this
> can't find the corresponding plugin directory.

> I suggest to try to install the slurm-wlm package with:

> apt-get install slurm-wlm
I would *currently* avoid the Debian (Stretch) packages like the plague:
the last update tried to (re)start slurmctld which -- surprise, surprise
-- fails on every node that's not the master with an exit code that
leaves the packages unconfigured.

That raises the question of whether anyone bothered to test them on a
multi-node cluster.

I'm still hoping I messed that up and not the maintainer. Maybe I expect
too much from an "apt upgrade" nowadays...

E V

Nov 13, 2017, 1:37:19 PM
to Slurm User Community List
For Debian Stretch, the Slurm daemons have been broken out into different packages, so the compute nodes only need to install the slurmd package. slurm-wlm installs everything and thus is only needed on the controller nodes. Or skip slurm-wlm altogether and just install slurmctld on the controller nodes and slurmd on the compute nodes.
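
The package split described above can be written down as a small lookup. The helper name `packages_for_role` and the role labels are hypothetical; the package names are the ones mentioned in this thread.

```shell
# Hypothetical helper: map a node role to the Debian/Ubuntu package to install.
packages_for_role() {
    case "$1" in
        controller) echo "slurmctld" ;;   # controller daemon only
        compute)    echo "slurmd" ;;      # compute-node daemon only
        all)        echo "slurm-wlm" ;;   # installs everything
        *) echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Example:
#   sudo apt-get install "$(packages_for_role compute)"
```

For a single desktop acting as both controller and compute node, as in the original question, the `all` role matches the `apt-get install slurm-wlm` suggestion made earlier in the thread.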