[slurm-users] slurmctld failed to start

Dhumal, Dr. Nilesh via slurm-users

Jul 31, 2025, 10:28:40 PM
to slurm...@lists.schedmd.com
Hello, 
We recently installed Slurm 25 on Red Hat Linux.
The slurmctld service fails to start:
sudo systemctl start slurmctld
Job for slurmctld.service failed because the control process exited with error code.
See "systemctl status slurmctld.service" and "journalctl -xeu slurmctld.service" for details.

sudo systemctl status slurmctld
× slurmctld.service - Slurm controller daemon
     Loaded: loaded (/usr/local/lib/systemd/system/slurmctld.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Thu 2025-07-31 22:23:18 EDT; 1min 3s ago
    Process: 44317 ExecStart=/usr/local/sbin/slurmctld --systemd $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
   Main PID: 44317 (code=exited, status=1/FAILURE)
        CPU: 35ms

Jul 31 22:22:29 fgcu-compute01 systemd[1]: Starting Slurm controller daemon...
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] error: If munged is up, restart with --num-threads=10
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] error: Munge encode failed: Failed to connect to "/run/munge/mung>
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] error: Failed to create MUNGE Credential
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] error: Couldn't load specified plugin name for auth/munge: Plugin>
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] error: cannot create auth context for auth/munge
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] fatal: failed to initialize auth plugin
Jul 31 22:23:18 fgcu-compute01 systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Jul 31 22:23:18 fgcu-compute01 systemd[1]: slurmctld.service: Failed with result 'exit-code'.
Jul 31 22:23:18 fgcu-compute01 systemd[1]: Failed to start Slurm controller daemon.
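The status output above cuts the two longest lines off at the terminal width (they end in ">"). A minimal sketch for reading the complete messages, assuming a bash shell on the controller host: "systemctl status --full --no-pager" and "journalctl -u ... --no-pager" print whole lines, and slurm_errors is a hypothetical helper (not part of Slurm) that keeps only the "error:" and "fatal:" lines.

```shell
#!/usr/bin/env bash
# Print the full slurmctld status and the last 50 journal entries without
# pager truncation (--full and --no-pager keep long lines intact).
# Hypothetical helper name; the commands themselves are standard systemd tools.
show_slurmctld_log() {
    systemctl status --full --no-pager slurmctld.service
    journalctl -u slurmctld.service --no-pager -n 50
}

# Filter a slurmctld log read on stdin down to its "error:" and "fatal:" lines.
# (Hypothetical helper; the name is illustrative only.)
slurm_errors() {
    grep -E '(error|fatal): '
}
```

Typical use is "show_slurmctld_log | slurm_errors". Running the daemon in the foreground with verbose logging (sudo /usr/local/sbin/slurmctld -D -vvv) is another common way to see the untruncated error text directly on the terminal.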

Here is the munge service status:
munge.service - MUNGE authentication service
     Loaded: loaded (/usr/local/lib/systemd/system/munge.service; enabled; preset: disabled)
     Active: active (running) since Thu 2025-07-31 22:06:14 EDT; 19min ago
       Docs: man:munged(8)
   Main PID: 44039 (munged)
      Tasks: 4 (limit: 606218)
     Memory: 1.4M
        CPU: 18ms
     CGroup: /system.slice/munge.service
             └─44039 /usr/local/sbin/munged

Jul 31 22:06:14 fgcu-compute01 systemd[1]: Starting MUNGE authentication service...
Jul 31 22:06:14 fgcu-compute01 systemd[1]: Started MUNGE authentication service.
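munged is running (as /usr/local/sbin/munged), yet slurmctld reports that it could not connect to a socket under /run/munge (the exact filename is cut off in the log). A minimal sketch for checking whether a socket actually exists in that directory; find_munge_socket is a hypothetical helper, and /run/munge is taken from the error message, not from any configuration shown in the thread.

```shell
#!/usr/bin/env bash
# Report whether a directory (default /run/munge, the directory named in the
# slurmctld error) contains a UNIX-domain socket. Exit status 0 if one is found.
find_munge_socket() {
    local dir="${1:-/run/munge}"
    local entry found=1
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    for entry in "$dir"/*; do
        # -S is true only for sockets; an unmatched glob stays literal and fails it.
        if [ -S "$entry" ]; then
            echo "socket: $entry"
            found=0
        fi
    done
    if [ "$found" -ne 0 ]; then
        echo "no socket in $dir" >&2
    fi
    return "$found"
}
```

If nothing turns up in /run/munge, listing the listening UNIX sockets together with their owning processes (sudo ss -xlp | grep munged) shows which path munged actually opened. One possibility worth ruling out, not confirmed anywhere in this thread, is that a munged built under /usr/local with default configure options listens below /usr/local/var/run/munge while the MUNGE library used by Slurm expects /run/munge.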

Any suggestions for resolving this issue would be appreciated.

Thanks,


Nilesh Dhumal  

Associate Professor of Chemistry,

 http://faculty.fgcu.edu/ndhumal/

Coordinator, FGCU Computational Facility, 

https://www.fgcu.edu/cas/facultyresources/computationalfacility/
SH-430; Department of Chemistry and Physics
Florida Gulf Coast University
10501 FGCU Boulevard South
Fort Myers, FL 33965-6565
Phone: (239) 745-4394
Email: ndh...@fgcu.edu 


Ole Holm Nielsen via slurm-users

Aug 1, 2025, 1:57:05 AM
to slurm...@lists.schedmd.com
Hi Nilesh,

It seems that your Munge setup isn't working. Maybe the munge.key file
isn't shared on all nodes?
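Whether the key is shared can be checked by comparing a checksum of the key file on every host. A minimal sketch, assuming root access and working ssh to the nodes; compare_munge_keys is a hypothetical helper, and the default path /etc/munge/munge.key is an assumption (a MUNGE build installed under /usr/local may keep its key under a different configuration directory), so it can be overridden with KEY=...

```shell
#!/usr/bin/env bash
# Compare the SHA-256 checksum of the MUNGE key on this host with the same
# file on each node given as an argument (reached via ssh).
# KEY defaults to /etc/munge/munge.key; this path is an assumption, override it
# for installations that keep the key elsewhere. Run as root: the key is
# normally readable only by the munge user.
compare_munge_keys() {
    local key="${KEY:-/etc/munge/munge.key}"
    local ref node sum rc=0
    ref=$(sha256sum "$key" | awk '{print $1}')
    if [ -z "$ref" ]; then
        echo "cannot read $key on this host" >&2
        return 2
    fi
    for node in "$@"; do
        sum=$(ssh "$node" sha256sum "$key" | awk '{print $1}')
        if [ "$sum" = "$ref" ]; then
            echo "$node: key matches"
        else
            echo "$node: key differs or could not be read"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, "KEY=/etc/munge/munge.key compare_munge_keys node01 node02", where the node names are hypothetical. Comparing checksums avoids copying the secret key itself across the network.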

I recommend that you take a look at this Wiki page:
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/
to get a complete overview of the tasks involved in setting up a Slurm
cluster.

IHTH,
Ole

--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

Dhumal, Dr. Nilesh via slurm-users

Aug 1, 2025, 3:37:32 AM
to Ole Holm Nielsen, slurm...@lists.schedmd.com
Thanks. I was testing on the master node without connecting it to the compute nodes.


Ole Holm Nielsen via slurm-users

Aug 1, 2025, 4:07:00 AM
to slurm...@lists.schedmd.com
On 8/1/25 09:34, Dhumal, Dr. Nilesh wrote:
> Thanks.  I was testing on master node without connecting to computer nodes.

So you need to test your Munge setup; see
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#munge-configuration-and-testing
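The usual MUNGE test is to encode a credential with no payload and decode it, first locally ("munge -n | unmunge") and then on another node ("munge -n | ssh node unmunge"). A minimal sketch wrapping both checks, assuming the munge and unmunge client programs are on the PATH and ssh to the nodes works; munge_roundtrip is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Encode a MUNGE credential with no payload (munge -n) and decode it locally,
# then decode a fresh credential on each node given as an argument via ssh.
# Exit status 0 only if every decode succeeds.
munge_roundtrip() {
    local node rc=0
    if munge -n | unmunge >/dev/null; then
        echo "local: OK"
    else
        echo "local: FAILED (is munged reachable on this host?)"
        return 1
    fi
    for node in "$@"; do
        if munge -n | ssh "$node" unmunge >/dev/null; then
            echo "$node: OK"
        else
            echo "$node: FAILED (different munge.key or munged not running?)"
            rc=1
        fi
    done
    return "$rc"
}
```

On the master node alone, "munge_roundtrip" with no arguments performs a local encode/decode similar to what slurmctld attempted at startup; once compute nodes are connected, "munge_roundtrip node01 node02" (hypothetical names) covers the cross-node case.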

/Ole
