[slurm-users] Configuration issue on Ubuntu


Umut Arus

Aug 28, 2018, 9:45:02 AM
to slurm...@lists.schedmd.com
Hi,

I'm trying to install and configure the slurm-wlm 17.11.2 package. First I wanted to configure it as a single host. munge, slurmd and slurmctld were installed. munge and slurmd come up and run properly, but I couldn't get slurmctld up and running!

It seems the main problem is: slurmctld: fatal: No front end nodes defined


Some output is below:
------------
slurmctld: debug3: Success.
slurmctld: debug3: not enforcing associations and no list was given so we are giving a blank list
slurmctld: debug2: No Assoc usage file (/var/lib/slurm-llnl/slurmctld/assoc_usage) to recover
slurmctld: debug:  Reading slurm.conf file: /etc/slurm-llnl/slurm.conf
slurmctld: debug3: layouts: layouts_init()...
slurmctld: layouts: no layout to initialize
slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/topology_none.so
slurmctld: topology NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug:  No DownNodes
slurmctld: fatal: No front end nodes defined

root@umuta:/etc/slurm-llnl# srun -N1 /bin/hostname
srun: error: Unable to allocate resources: Unable to contact slurm controller (connect failure)
root@umuta:/etc/slurm-llnl# slurmd -C
NodeName=umuta CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=2 RealMemory=7880
UpTime=69-02:45:50
root@umuta:/etc/slurm-llnl#
root@umuta:/etc/slurm-llnl# systemctl restart slurmctld
root@umuta:/etc/slurm-llnl# systemctl status slurmctld
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2018-08-28 16:22:01 +03; 5s ago
     Docs: man:slurmctld(8)
  Process: 30779 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 30793 (code=exited, status=1/FAILURE)

Aug 28 16:22:01 umuta systemd[1]: Starting Slurm controller daemon...
Aug 28 16:22:01 umuta systemd[1]: slurmctld.service: New main PID 28172 does not exist or is a zombie.
Aug 28 16:22:01 umuta systemd[1]: Started Slurm controller daemon.
Aug 28 16:22:01 umuta systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Aug 28 16:22:01 umuta systemd[1]: slurmctld.service: Failed with result 'exit-code'.
root@umuta:/etc/slurm-llnl#


Configured with the configurator:
root@umuta:/etc/slurm-llnl# cat slurm.conf
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=umuta
#ControlAddr=
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
#SchedulerPort=7321
SelectType=select/linear
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
#SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
#SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#
#
# COMPUTE NODES
NodeName=umuta CPUs=1 State=UNKNOWN
PartitionName=debug Nodes=umuta Default=YES MaxTime=INFINITE State=UP

thanks.


--
Umut A.

Raymond Wan

Aug 28, 2018, 11:27:39 AM
to Slurm User Community List, Umut Arus

Hi,


On Tuesday, August 28, 2018 09:43 PM, Umut Arus wrote:
> # COMPUTE NODES
> NodeName=umuta CPUs=1 State=UNKNOWN


I'm not sure what the cause of your problem is, but one thing
I noticed is that the line above should be replaced with the
first line of output from "slurmd -C".

The man page of slurmd says:

-----
-C Print actual hardware configuration and exit.
The format of output is the same as used in slurm.conf to
describe a node's configuration plus its uptime.
-----
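
As a rough sketch (assuming the Debian config path used earlier
in this thread), that could be something like:

-----
# Append the detected hardware line, then remove the old
# "NodeName=umuta CPUs=1" line by hand:
slurmd -C | head -n 1 >> /etc/slurm-llnl/slurm.conf
-----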

Oh! I presume you have a value for ControlMachine in your
slurm.conf file? I think that is your "front end node".

Ray



Umut Arus

Aug 28, 2018, 4:34:42 PM
to rwan...@gmail.com, slurm...@lists.schedmd.com
Thanks for your reply. Well, I'll change the NodeName line to the output of slurmd -C.

Yes, the compute node and the ControlMachine are the same machine for this first test setup. Are any other config parameters needed in the config file?

thanks...
--
Umut A.

Chris Samuel

Aug 28, 2018, 5:05:05 PM
to slurm...@lists.schedmd.com
On Tuesday, 28 August 2018 11:43:54 PM AEST Umut Arus wrote:

> It seems the main problem is: slurmctld: fatal: No front end nodes defined

Frontend nodes are for IBM BlueGene and Cray systems where you cannot run
slurmd on the compute nodes themselves so a proxy system must be used instead
(at $JOB-1 we used this on our BG/Q system). I strongly suspect you are not
running on either of those!

https://slurm.schedmd.com/slurm.conf.html

# These options may only be used on systems configured and built with the
# appropriate parameters (--have-front-end, --enable-bluegene-emulation)
# or a system determined to have the appropriate architecture by the
# configure script (BlueGene or Cray systems).

If you built Slurm yourself you'll need to check you didn't use those
arguments by mistake or that configure didn't enable them in error, and if
this is an Ubuntu package then it's probably a bug in how they packaged it!

Best of luck,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC




Umut Arus

Aug 29, 2018, 6:25:10 AM
to slurm...@lists.schedmd.com
Thank you Chris. After your suggestion I compiled the latest stable version on CentOS, and first installed the Munge packages from the CentOS repository. Now I'm getting the error below.

slurmd is successfully working on the same machine.

./slurmctld -Dcvvvv
slurmctld: debug:  Log file re-opened
slurmctld: error: Unable to open pidfile `/var/run/slurm-llnl/slurmctld.pid': No such file or directory
slurmctld: slurmctld version 17.11.9-2 started on cluster cluster
slurmctld: debug3: Trying to load plugin /root/sl/sl2/lib/slurm/crypto_munge.so
slurmctld: error: Couldn't find the specified plugin name for crypto/munge looking at all files

slurmctld: debug3: accept_path_paranoia: stat(/root/sl/sl2/lib/slurm) failed
slurmctld: error: cannot find crypto plugin for crypto/munge
slurmctld: error: cannot create crypto context for crypto/munge

slurmctld: fatal: slurm_cred_creator_ctx_create((null)): Permission denied

What is your suggestion for the "Permission denied" and munge errors?
--
Umut A.

Chris Samuel

Aug 29, 2018, 6:36:39 AM
to slurm...@lists.schedmd.com
On Wednesday, 29 August 2018 8:23:43 PM AEST Umut Arus wrote:

> Thank you Chris. After your suggestion I compiled the latest stable version
> on CentOS, and first installed the Munge packages from the CentOS
> repository. Now I'm getting the error below.
[...]
> slurmctld: debug3: Trying to load plugin /root/sl/sl2/lib/slurm/crypto_munge.so

To me that looks like you managed to compile Slurm against a
version of Munge installed under root's home directory.

This is unlikely to be what you want.

If you build Slurm as a non-root user then it won't find that.
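
As a sketch (the install prefix here is just an example), a build
against the distribution's Munge would look something like:

-----
# Configure as an unprivileged user, pointing at the system Munge:
./configure --prefix=/usr/local/slurm --with-munge=/usr
make
sudo make install
-----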

All the best,

Gennaro Oliva

Sep 5, 2018, 3:49:38 AM
to Slurm User Community List
Hi Chris,

On Wed, Aug 29, 2018 at 07:04:27AM +1000, Chris Samuel wrote:
> On Tuesday, 28 August 2018 11:43:54 PM AEST Umut Arus wrote:
>
> > It seems the main problem is: slurmctld: fatal: No front end nodes defined
>
> Frontend nodes are for IBM BlueGene and Cray systems where you cannot run
> slurmd on the compute nodes themselves so a proxy system must be used instead
> (at $JOB-1 we used this on our BG/Q system). I strongly suspect you are not
> running on either of those!

The option --enable-front-end to configure is also needed to emulate a
really large cluster:

https://slurm.schedmd.com/faq.html#multi_slurmd
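
As a sketch, an emulation-only build (not something to run real
batch jobs on) would be configured along these lines:

-----
./configure --enable-front-end --enable-multiple-slurmd
-----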

> If you built Slurm yourself you'll need to check you didn't use those
> arguments by mistake or that configure didn't enable them in error, and if
> this is an Ubuntu package then it's probably a bug in how they packaged it!

This option is enabled only in the slurmctld daemon contained in
the slurm-wlm-emulator package, which is not intended to be used for
batch jobs.

vagrant@ubuntu-bionic:~$ grep 'No front end nodes defined' /usr/sbin/slurmctld-wlm-emulator
Binary file /usr/sbin/slurmctld-wlm-emulator matches
vagrant@ubuntu-bionic:~$ grep 'No front end nodes defined' /usr/sbin/slurmctld-wlm
vagrant@ubuntu-bionic:~$

It is possible that Umut installed the slurm-wlm-emulator package
together with the regular package and the emulated daemon was picked
up by the alternatives system.
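
You can check which daemon was selected, e.g. (assuming the
alternative is registered under the name slurmctld):

vagrant@ubuntu-bionic:~$ update-alternatives --display slurmctld
vagrant@ubuntu-bionic:~$ sudo update-alternatives --set slurmctld /usr/sbin/slurmctld-wlm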

Best regards,
--
Gennaro Oliva

John Hearns

Sep 5, 2018, 4:33:13 AM
to Slurm User Community List
Following on from what Chris Samuel says,
/root/sl/sl2 kinda suggests Scientific Linux to me (SL - the Red Hat-alike distribution used by Fermilab and CERN)
Or it could just be sl = slurm

I would run  ldd `which slurmctld`  and let us know what libraries it is linked to
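
Something along these lines (the library path in the comment is
just illustrative):

-----
ldd `which slurmctld` | grep -i munge
#   libmunge.so.2 => /usr/lib64/libmunge.so.2 (0x...)
-----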



Chris Samuel

Sep 5, 2018, 7:06:51 AM
to slurm...@lists.schedmd.com
On Wednesday, 5 September 2018 5:48:25 PM AEST Gennaro Oliva wrote:

> It is possible that Umut installed the slurm-wlm-emulator package
> together with the regular package and the emulated daemon was picked
> up by the alternatives system.

That sounds eminently possible; that's a great catch, Gennaro!

Ah, just noticed you're the Debian package maintainer for Slurm. :-)

All the best,