[slurm-users] systemctl enable slurmd.service Failed to execute operation: No such file or directory


Nousheen

Jan 27, 2022, 4:54:15 AM
to Slurm User Community List

Hello everyone,

I am installing Slurm on CentOS 7 following this tutorial: https://www.slothparadise.com/how-to-install-slurm-on-centos-7-cluster/

I am at the step where we start slurm but it gives me the following error:

[root@exxact slurm-21.08.5]# systemctl enable slurmd.service
Failed to execute operation: No such file or directory

I have run the following command to check whether Slurm is configured properly:

[root@exxact slurm-21.08.5]# slurmd -C
NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889
UpTime=19-16:06:00

I am new to this and unable to understand the problem. Kindly help me resolve this.

My slurm.conf file is as follows:

# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=cluster194
SlurmctldHost=192.168.60.194
#SlurmctldHost=
#
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=67043328
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=lua
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=10000
#MaxStepCount=40000
#MaxTasksPerNode=512
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/cgroup
#Prolog=
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=nousheen
#SlurmdUser=root
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/home/nousheen/Documents/SILICS/slurm-21.08.5/slurmctld
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/affinity
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
#
#
# JOB PRIORITY
#PriorityFlags=
#PriorityType=priority/basic
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
#AccountingStorageUser=
#AccountingStoreFlags=
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
#JobContainerType=job_container/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#DebugFlags=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=linux[1-32] CPUs=11 State=UNKNOWN
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP  


Best Regards,
Nousheen Parvaiz

Ole Holm Nielsen

Jan 27, 2022, 5:05:30 AM
to slurm...@lists.schedmd.com
Maybe my Slurm Wiki pages will help you get started:
https://wiki.fysik.dtu.dk/niflheim/SLURM

Best regards,
Ole

Jeffrey R. Lang

Jan 27, 2022, 10:23:47 AM
to Slurm User Community List

The missing-file error has nothing to do with Slurm itself. The systemctl command is part of systemd, the system's service manager.

The error message indicates that you haven’t copied the slurmd.service file on your compute node to /etc/systemd/system or /usr/lib/systemd/system. /etc/systemd/system is usually used when a user adds a new service to a machine.

Depending on your version of Linux, you may also need to run systemctl daemon-reload so that systemd picks up slurmd.service.

Once slurmd.service is copied over, the systemctl command should work just fine.

Remember:

    slurmd.service: only on compute nodes
    slurmctld.service: only on your cluster management node
    slurmdbd.service: only on your cluster management node
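
The copy step described above can be sketched as follows. This is only a sketch under stated assumptions, not an official procedure: install_unit is a hypothetical helper, and the example path assumes a from-source build that has left a generated slurmd.service under etc/ in the build tree. Adjust it to wherever your unit file actually lives.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): copy a systemd unit file into a
# unit directory so that "systemctl enable" can find it.
install_unit() {
    src=$1                               # path to the unit file to install
    dest_dir=${2:-/etc/systemd/system}   # usual place for locally added units

    if [ ! -f "$src" ]; then
        echo "unit file not found: $src" >&2
        return 1
    fi
    cp "$src" "$dest_dir/" || return 1

    # Make systemd re-read its unit files, but only when installing into the
    # real system directory on a machine that has systemctl.
    if [ "$dest_dir" = /etc/systemd/system ] && command -v systemctl >/dev/null 2>&1; then
        systemctl daemon-reload
    fi
}

# Example (as root, from the Slurm build directory; the path is an assumption):
#   install_unit etc/slurmd.service && systemctl enable slurmd.service
```

The second argument exists only so the helper can be tried against a scratch directory before touching /etc/systemd/system.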

 


Nousheen

Jan 31, 2022, 12:09:46 AM
to Slurm User Community List
Dear Jeffrey,

Thank you for your response. I have followed the steps as instructed. After copying the files to their respective locations, the "systemctl status slurmctld.service" command gives me the following error:

(base) [nousheen@exxact system]$ systemctl daemon-reload
(base) [nousheen@exxact system]$ systemctl enable slurmctld.service
(base) [nousheen@exxact system]$ systemctl start slurmctld.service
(base) [nousheen@exxact system]$ systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2022-01-31 10:04:31 PKT; 3s ago
  Process: 18114 ExecStart=/usr/local/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 18114 (code=exited, status=1/FAILURE)

Jan 31 10:04:31 exxact systemd[1]: Started Slurm controller daemon.
Jan 31 10:04:31 exxact systemd[1]: slurmctld.service: main process exited, code=exited, status=1/FAILURE
Jan 31 10:04:31 exxact systemd[1]: Unit slurmctld.service entered failed state.
Jan 31 10:04:31 exxact systemd[1]: slurmctld.service failed.
 

Kindly guide me. Thank you so much for your time. 

Best Regards,
Nousheen Parvaiz
 

Nousheen

Jan 31, 2022, 12:24:47 AM
to Slurm User Community List
The same error shows up on the compute node, as follows:

[root@c103008 ~]# systemctl enable slurmd.service
[root@c103008 ~]# systemctl start slurmd.service
[root@c103008 ~]# systemctl status slurmd.service
● slurmd.service - Slurm node daemon
   Loaded: loaded (/etc/systemd/system/slurmd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2022-01-31 00:22:42 EST; 2s ago
  Process: 11505 ExecStart=/usr/local/sbin/slurmd -D -s $SLURMD_OPTIONS (code=exited, status=203/EXEC)
 Main PID: 11505 (code=exited, status=203/EXEC)

Jan 31 00:22:42 c103008 systemd[1]: Started Slurm node daemon.
Jan 31 00:22:42 c103008 systemd[1]: slurmd.service: main process exited, code=exited, status=203/EXEC
Jan 31 00:22:42 c103008 systemd[1]: Unit slurmd.service entered failed state.
Jan 31 00:22:42 c103008 systemd[1]: slurmd.service failed.


Best Regards,
Nousheen Parvaiz


Hermann Schwärzler

Jan 31, 2022, 3:47:01 AM
to slurm...@lists.schedmd.com
Dear Nousheen,

I guess there is something missing in your installation - probably your
slurm.conf?

Do you have logging enabled for slurmctld? If yes what do you see in
that log?
Or what do you get if you run slurmctld manually like this:

/usr/local/sbin/slurmctld -D
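
A minimal sketch of that manual check, with one extra step added as an assumption: before starting the daemon by hand, confirm that the binary named in the unit's ExecStart line exists and is executable. systemd reports exit status 203/EXEC when it cannot execute that binary at all, which is the status shown for slurmd on the compute node earlier in this thread. check_daemon_binary is a hypothetical helper name; the path in the example is the one from the unit file quoted above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a daemon binary can be executed.
check_daemon_binary() {
    bin=$1
    if [ ! -e "$bin" ]; then
        echo "missing: $bin"
        return 1
    elif [ -d "$bin" ] || [ ! -x "$bin" ]; then
        echo "not executable: $bin"
        return 1
    fi
    echo "ok: $bin"
}

# Example (on the controller; note the leading slash in the path):
#   check_daemon_binary /usr/local/sbin/slurmctld && /usr/local/sbin/slurmctld -D
# The systemd journal usually holds the daemon's last messages as well:
#   journalctl -u slurmctld.service
```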

Regards,
Hermann

Ole Holm Nielsen

Jan 31, 2022, 4:04:42 AM
to slurm...@lists.schedmd.com
Hi Nousheen,

I again recommend that you follow the steps for installing Slurm on a CentOS
7 cluster:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation

Maybe you will need to start installation from scratch, but the steps are
guaranteed to work if followed correctly.

IHTH,
Ole


Nousheen

Jan 31, 2022, 10:20:58 PM
to Slurm User Community List

Best Regards,
Nousheen Parvaiz
Ph.D. Scholar
National Center For Bioinformatics
Quaid-i-Azam University, Islamabad 


Dear Hermann,

Thank you for your reply. I have given below my slurm.conf and log file.

# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=Nousheen1
SlurmctldHost=192.168.60.194
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
TaskPlugin=task/affinity
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING

SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
#SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
#SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log

#
#
# COMPUTE NODES

NodeName=Nousheen1 NodeAddr=192.168.60.194 CPUs=1 State=UNKNOWN
NodeName=Nousheen2 NodeAddr=192.168.60.104 CPUs=1 State=UNKNOWN


PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP


Slurmctld.log
[2022-01-31T11:42:26.293] error: chdir(/var/log): Permission denied
[2022-01-31T11:42:26.294] slurmctld version 21.08.5 started on cluster cluster194
[2022-01-31T11:42:26.294] error: Couldn't find the specified plugin name for cred/munge looking at all files
[2022-01-31T11:42:26.295] error: cannot find cred plugin for cred/munge
[2022-01-31T11:42:26.295] error: cannot create cred context for cred/munge
[2022-01-31T11:42:26.295] fatal: slurm_cred_creator_ctx_create((null)): Operation not permitted

(base) [nousheen@exxact Documents]$ usr/local/sbin/slurmctld -D
bash: usr/local/sbin/slurmctld: No such file or directory


Best Regards,
Nousheen Parvaiz

Nousheen

Jan 31, 2022, 11:07:08 PM
to Slurm User Community List
Dear Ole,

Thank you for your response.
I am doing it again using your suggested link.

Best Regards,
Nousheen Parvaiz


Nousheen

Jan 31, 2022, 11:57:54 PM
to Slurm User Community List
Dear Ole and Hermann,

I have reinstalled slurm from scratch now following this link:

The error remains the same. Kindly guide me on where I can find this cred/munge plugin. Please help me resolve this issue.

[root@exxact slurm]# slurmd -C

NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889
UpTime=0-22:06:45
[root@exxact slurm]# systemctl enable slurmctld.service
[root@exxact slurm]# systemctl start slurmctld.service
[root@exxact slurm]# systemctl status slurmctld.service

● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2022-02-01 09:46:20 PKT; 8s ago
  Process: 27530 ExecStart=/usr/local/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 27530 (code=exited, status=1/FAILURE)

Feb 01 09:46:20 exxact systemd[1]: Started Slurm controller daemon.
Feb 01 09:46:20 exxact systemd[1]: slurmctld.service: main process exited, ...RE
Feb 01 09:46:20 exxact systemd[1]: Unit slurmctld.service entered failed state.
Feb 01 09:46:20 exxact systemd[1]: slurmctld.service failed.


[root@exxact slurm]# /usr/local/sbin/slurmctld -D
slurmctld: slurmctld version 21.08.5 started on cluster cluster194
slurmctld: error: Couldn't find the specified plugin name for cred/munge looking at all files
slurmctld: error: cannot find cred plugin for cred/munge
slurmctld: error: cannot create cred context for cred/munge
slurmctld: fatal: slurm_cred_creator_ctx_create((null)): Operation not permitted



Best Regards,
Nousheen Parvaiz

Sean Crosby

Feb 1, 2022, 2:29:56 AM
to Slurm User Community List
Did you build Slurm yourself from source? If so, the munge development package must be installed on the build node before you run configure (munge-devel on EL systems, libmunge-dev on Debian); otherwise the cred/munge plugin is not built.
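
One way to check whether the build produced the plugin is to look for it in the plugin directory. The sketch below assumes the default layout of a source build with prefix /usr/local (plugins under /usr/local/lib/slurm) and that the plugin file is named cred_munge.so. Both are assumptions to verify against your own install, and has_cred_munge is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the cred/munge plugin file exists in the
# given Slurm plugin directory. The default directory is an assumption that
# matches a build configured with --prefix=/usr/local.
has_cred_munge() {
    plugin_dir=${1:-/usr/local/lib/slurm}
    [ -f "$plugin_dir/cred_munge.so" ]
}

# Example on an EL system if the plugin is missing (run as root; shown as
# comments because these commands change the system, and the source path is
# a placeholder):
#   yum install munge munge-libs munge-devel
#   cd /path/to/slurm-21.08.5 && ./configure && make && make install
#   has_cred_munge && echo "cred/munge plugin present"
```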

You then need to set up munge with a shared munge key between the nodes, and have the munge daemon running.
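
A quick way to confirm that munge is working is to encode a credential and decode it again, first locally and then on another node. The sketch below assumes the munge and unmunge command-line tools that ship with MUNGE are on PATH. munge_roundtrip is a hypothetical wrapper, and the remote check relies on working ssh access between the nodes, which is also an assumption.

```shell
#!/bin/sh
# Hypothetical wrapper: encode and decode a MUNGE credential.
# With no argument the credential is decoded locally; with a hostname it is
# decoded on that host over ssh, which only succeeds if both nodes share the
# same munge key and have the munge daemon running.
# Returns 2 if the munge tools are not installed.
munge_roundtrip() {
    host=${1:-}
    if ! command -v munge >/dev/null 2>&1 || ! command -v unmunge >/dev/null 2>&1; then
        echo "munge/unmunge not found on PATH" >&2
        return 2
    fi
    if [ -n "$host" ]; then
        munge -n | ssh "$host" unmunge
    else
        munge -n | unmunge
    fi
}

# Example (the address is the compute node from the slurm.conf in this thread):
#   munge_roundtrip                  # local check
#   munge_roundtrip 192.168.60.104   # cross-node check
```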

This is all detailed on Ole's wiki which was linked previously - https://wiki.fysik.dtu.dk/niflheim/Slurm_installation

Sean
