[slurm-users] Slurm setup question

Matt Hohmeister

Apr 11, 2018, 8:27:22 AM
to slurm...@lists.schedmd.com

I’m brand-new to Slurm and am setting it up on a single RHEL 7.4 VM as a proof of concept before I deploy it. After following the instructions at https://www.slothparadise.com/how-to-install-slurm-on-centos-7-cluster/ (sorry, the site isn’t working right now), I can get slurmd to start without problems, but slurmctld fails to start. The journalctl -xe output is below. Has anyone run into this, or can anyone shed some light on it? Thanks in advance!

 

Apr 11 08:18:30 psy-slurm polkitd[680]: Registered Authentication Agent for unix-process:1779:31362 (system bus name :1.26 [/usr/bin/pkttyagent --notify-fd 5 --fallbac
Apr 11 08:18:30 psy-slurm systemd[1]: Starting Slurm controller daemon...
-- Subject: Unit slurmctld.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit slurmctld.service has begun starting up.
Apr 11 08:18:30 psy-slurm systemd[1]: PID file /var/run/slurmctld.pid not readable (yet?) after start.
Apr 11 08:18:30 psy-slurm systemd[1]: Started Slurm controller daemon.
-- Subject: Unit slurmctld.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit slurmctld.service has finished starting up.
--
-- The start-up result is done.
Apr 11 08:18:30 psy-slurm polkitd[680]: Unregistered Authentication Agent for unix-process:1779:31362 (system bus name :1.26, object path /org/freedesktop/PolicyKit1/A
Apr 11 08:18:30 psy-slurm slurmctld[1787]: fatal: Incorrect permissions on state save loc: /var/spool
Apr 11 08:18:30 psy-slurm systemd[1]: slurmctld.service: main process exited, code=exited, status=1/FAILURE
Apr 11 08:18:30 psy-slurm systemd[1]: Unit slurmctld.service entered failed state.
Apr 11 08:18:30 psy-slurm systemd[1]: slurmctld.service failed.

 

Matt Hohmeister
Systems and Network Administrator
Department of Psychology
Florida State University
PO Box 3064301
Tallahassee, FL 32306-4301
Phone: +1 850 645 1902
Fax: +1 850 644 7739

Ole Holm Nielsen

Apr 11, 2018, 8:44:46 AM
to slurm...@lists.schedmd.com
Hi Matt,

You might want to take a look at my Slurm Wiki, which focuses on
CentOS/RHEL 7: https://wiki.fysik.dtu.dk/niflheim/SLURM. Complete
instructions for Slurm installation, configuration, etc. are in the Wiki.

/Ole

Douglas Jacobsen

Apr 11, 2018, 10:41:06 AM
to Slurm User Community List
It looks like your slurm.conf specifies /var/spool as your StateSaveLocation, and `fatal: Incorrect permissions on state save loc: /var/spool` indicates that SlurmUser (another slurm.conf setting) does not have write access to it. It would be a good idea to make a directory dedicated to this purpose, e.g. /var/spool/slurm/<clustername>_state, and then make sure that SlurmUser (usually either "slurm" or root, depending on your needs) can write to that directory.
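To make that suggestion concrete, here is a minimal Python sketch, assuming a Linux host, root privileges, and a SlurmUser named "slurm". The helper names prepare_state_dir and is_writable_by are hypothetical, not part of Slurm. The permission check only looks at the classic owner/group/other mode bits plus root's override, so it ignores ACLs, SELinux and supplementary groups, and it is not taken from slurmctld's source.

```python
import os
import pwd
import stat


def prepare_state_dir(path: str, slurm_user: str) -> None:
    """Create a dedicated StateSaveLocation directory owned by SlurmUser.

    Needs root unless slurm_user is the user running this code.
    """
    pw = pwd.getpwnam(slurm_user)          # raises KeyError if the user is unknown
    os.makedirs(path, exist_ok=True)       # create parent directories as needed
    os.chown(path, pw.pw_uid, pw.pw_gid)   # owner: SlurmUser, group: its primary group
    os.chmod(path, 0o755)                  # rwxr-xr-x: only the owner may write


def is_writable_by(path: str, user: str) -> bool:
    """Rough check: could `user` create files in directory `path`?

    Looks only at owner/group/other mode bits and root's override;
    ACLs, SELinux and supplementary groups are ignored.
    """
    pw = pwd.getpwnam(user)
    if pw.pw_uid == 0:
        return True                        # root bypasses the mode bits
    st = os.stat(path)
    if st.st_uid == pw.pw_uid:
        need = stat.S_IWUSR | stat.S_IXUSR  # owner needs write + search
    elif st.st_gid == pw.pw_gid:
        need = stat.S_IWGRP | stat.S_IXGRP  # primary group needs write + search
    else:
        need = stat.S_IWOTH | stat.S_IXOTH  # everyone else
    return (st.st_mode & need) == need
```

For example, as root you could call prepare_state_dir("/var/spool/slurm/mycluster_state", "slurm"), where "mycluster" is a placeholder for your cluster name, then point StateSaveLocation at that path and restart slurmctld. On a stock install /var/spool is typically owned by root with mode 755, so is_writable_by("/var/spool", "slurm") would return False, which matches the fatal error above.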

----
Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer

------------- __o
---------- _ '\<,_
----------(_)/  (_)__________________________


Matt Hohmeister

Apr 11, 2018, 11:23:20 AM
to Slurm User Community List

Thanks; I just set StateSaveLocation=/var/spool/slurm.state, and that went away. Of course, another error popped up:

 

Apr 11 11:19:24 psy-slurm slurmctld[1772]: fatal: Invalid node names in partition slurm

 

Here’s the relevant section from slurm.conf; IP address changed to protect the innocent. This is a single-node cluster that I’m using just to make a working proof-of-concept.

 

# COMPUTE NODES
NodeName=psy-slurm NodeAddr=192.0.2.157
PartitionName=slurm Nodes= Default=YES MaxTime=INFINITE State=UP


Lachlan Musicman

Apr 11, 2018, 7:55:07 PM
to Slurm User Community List
The error message says it all.

Change


PartitionName=slurm Nodes= Default=YES MaxTime=INFINITE State=UP

to

PartitionName=slurm Nodes=psy-slurm Default=YES MaxTime=INFINITE State=UP

Note that:

NodeName describes the nodes.
PartitionName describes a partition and lists the nodes that belong to it.
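As a rough illustration of that relationship, here is a hypothetical Python checker (parse_conf and check_partitions are made-up names) that reads NodeName and PartitionName lines and flags any partition whose Nodes= list is empty or names a node with no NodeName entry. It only mimics the message slurmctld printed in this thread; it is not Slurm's own parser, and it ignores hostlist expressions such as node[01-04], NodeName=DEFAULT, quoting, and case-insensitive keys.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeConfig:
    nodes: set[str] = field(default_factory=set)                    # from NodeName= lines
    partitions: dict[str, list[str]] = field(default_factory=dict)  # PartitionName -> Nodes


def parse_pairs(line: str) -> dict[str, str]:
    """Split 'Key=Value Key=Value ...' into a dict (no quoting support)."""
    pairs = {}
    for token in line.split():
        key, _, value = token.partition("=")
        pairs[key] = value
    return pairs


def parse_conf(text: str) -> NodeConfig:
    conf = NodeConfig()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()    # drop comments and blank lines
        if not line:
            continue
        kv = parse_pairs(line)
        if "NodeName" in kv:
            conf.nodes.update(n for n in kv["NodeName"].split(",") if n)
        elif "PartitionName" in kv:
            members = [n for n in kv.get("Nodes", "").split(",") if n]
            conf.partitions[kv["PartitionName"]] = members
    return conf


def check_partitions(conf: NodeConfig) -> list[str]:
    """Report partitions whose Nodes= list is empty or names undefined nodes."""
    problems = []
    for name, members in conf.partitions.items():
        if members == ["ALL"]:
            continue                           # Nodes=ALL means every defined node
        unknown = [n for n in members if n not in conf.nodes]
        if not members or unknown:
            problems.append(f"Invalid node names in partition {name}")
    return problems
```

Run against Matt's original snippet, check_partitions(parse_conf(text)) returns ["Invalid node names in partition slurm"]; with Nodes=psy-slurm it returns an empty list.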

Cheers
L.
