[slurm-users] cannot find auth plugin for auth/munge

~Stack~

Jun 15, 2018, 6:37:56 PM
to slurm...@lists.schedmd.com
Greetings,

I've got Slurm 17.11.7 running on a Scientific Linux 6 cluster. Things are
working great.

I have a Scientific Linux 7 system that I just want to be able to run
sinfo/squeue/sacct on. I installed 17.11.7 from the OpenHPC repo (it's
what we have running on the other SL7 cluster).

The munge.key and the slurm.conf file are the exact same as on the rest
of the system.

I can communicate easily between the slurmd/slurmdbd cluster host and
the new system. I can run 'munge -n | ssh <host> unmunge' in both
directions and have it all work. The munge service is running and just
to be sure I've restarted it several times.
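
The round-trip test mentioned above can be wrapped in a small function so it is easy to repeat after each munge restart. This is only a sketch: check_munge_to is a hypothetical helper name, and it assumes ssh access to the remote host. munge -n and unmunge are the same commands used above.

```shell
# Check that a credential created locally decodes on a remote host.
# Usage: check_munge_to HOST
check_munge_to() {
    host=$1
    # The pipeline's exit status is that of the remote unmunge (via ssh).
    if munge -n | ssh "$host" unmunge >/dev/null 2>&1; then
        echo "local -> $host: ok"
    else
        echo "local -> $host: FAILED"
        return 1
    fi
}
```

Running the same function on the other host, pointed back at this one, covers the second direction.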

I've disabled firewalls and SELinux is in permissive mode (just in case,
it will go back on after I figure it out).

When I run sinfo I get the following output:

sinfo: error: Couldn't find the specified plugin name for auth/munge looking at all files
sinfo: error: cannot find auth plugin for auth/munge
sinfo: error: cannot create auth context for auth/munge
sinfo: error: Couldn't find the specified plugin name for auth/munge looking at all files
sinfo: error: cannot find auth plugin for auth/munge
sinfo: error: cannot create auth context for auth/munge
sinfo: error: Couldn't find the specified plugin name for auth/munge looking at all files
sinfo: error: cannot find auth plugin for auth/munge
sinfo: error: cannot create auth context for auth/munge
sinfo: error: authentication: authentication initialization failure
slurm_load_partitions: Protocol authentication error


When I look at the slurmctld.log file I see this error:
error: slurm_receive_msg [IP:49916]: Zero Bytes were transmitted or received

I do have the slurm-munge package installed on the client (as well as
almost all of the other slurm packages).

I suspect it is something with the OHPC rpms, but I'm not sure.

Any thoughts on how to fix this, or should I just rebuild from source?

Thanks!
~Stack~


~Stack~

Jun 20, 2018, 12:13:40 PM
to slurm...@lists.schedmd.com
Greetings,
An update: I was unable to get any further. I removed all of the OHPC
packages and built the exact same version, Slurm 17.11.7, from source on
the EL7 system, matching what I have on the EL6 cluster. I end up with the
EXACT same error. The build was done with `rpmbuild -ta`.

Here is what I find weird. In the build output I can SEE where it finds
the munge-devel package. However, it doesn't generate a slurm-munge rpm
like it does on the EL6 system. I think the docs are wrong in saying that
this package is generated for EL7 systems.

I did some poking around and found that I do indeed have
/usr/lib64/slurm/auth_munge.so. Passing it to `yum provides` shows that it
is in the slurm-17.11.7-1.el7.x86_64.rpm package!
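
The check above (does the plugin file exist, and where) can be sketched as a small shell function. This is only an illustration: find_plugin is a hypothetical helper, not part of Slurm, and the directories passed to it are up to the caller.

```shell
# Print the full path of the named plugin in every listed directory
# that contains it. Returns non-zero if none of them do.
# Usage: find_plugin PLUGIN_FILE DIR...
find_plugin() {
    plugin=$1
    shift
    found=1
    for dir in "$@"; do
        if [ -e "$dir/$plugin" ]; then
            printf '%s\n' "$dir/$plugin"
            found=0
        fi
    done
    return "$found"
}
```

For example, `find_plugin auth_munge.so /usr/lib64/slurm /usr/local/lib/slurm` prints whichever of those two paths exist on the machine. Running `rpm -qf` on a printed path reports the owning package, much like the `yum provides` lookup.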

So I have it installed. Why then do I still have the error?
sinfo: error: Couldn't find the specified plugin name for auth/munge looking at all files
sinfo: error: cannot find auth plugin for auth/munge
sinfo: error: cannot create auth context for auth/munge
sinfo: error: Couldn't find the specified plugin name for auth/munge looking at all files
sinfo: error: cannot find auth plugin for auth/munge
sinfo: error: cannot create auth context for auth/munge
sinfo: error: Couldn't find the specified plugin name for auth/munge looking at all files
sinfo: error: cannot find auth plugin for auth/munge
sinfo: error: cannot create auth context for auth/munge
sinfo: error: authentication: authentication initialization failure
slurm_load_partitions: Protocol authentication error
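
Since the plugin file exists but the client still cannot find it, one thing that may be worth checking is which directory the client searches. slurm.conf has a PluginDir parameter for this. Whether it is set in the file copied from the other cluster is not stated in this thread, so treat the following as a guess rather than a diagnosis. get_plugindir is a hypothetical helper name.

```shell
# Print the PluginDir value(s) set in a slurm.conf-style file, if any.
# Prints nothing when the parameter is absent or commented out.
# Usage: get_plugindir CONF_FILE
get_plugindir() {
    # Match the key case-insensitively at the start of a line and print
    # what follows "=", stopping at whitespace or a trailing "#" comment.
    sed -n 's/^[[:space:]]*[Pp][Ll][Uu][Gg][Ii][Nn][Dd][Ii][Rr][[:space:]]*=[[:space:]]*\([^#[:space:]]*\).*$/\1/p' "$1"
}
```

A call such as `get_plugindir /etc/slurm/slurm.conf` (the path is an assumption and may differ per install) would show whether the configured directory matches where auth_munge.so actually lives.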


Any thoughts?
Thanks!
~Stack~




~Stack~

Jun 20, 2018, 1:19:40 PM
to slurm...@lists.schedmd.com
Greetings,
I found the issue. It seems that the EL7 rpms do not properly configure
the auth_munge.so location. Here is the workaround for this problem:

ln -s /usr/lib64/slurm/auth_munge.so /usr/local/lib/slurm/auth_munge.so
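
A slightly more defensive version of the same workaround might look like the sketch below. It assumes the target directory may not exist yet on a fresh system, so it creates it first, and it refuses to link a source file that is missing. link_plugin is a hypothetical helper name, not something shipped with Slurm.

```shell
# Symlink a plugin file into the directory the client searches.
# Usage: link_plugin SOURCE_FILE TARGET_DIR
link_plugin() {
    src=$1
    target_dir=$2
    if [ ! -e "$src" ]; then
        echo "missing source: $src" >&2
        return 1
    fi
    # Create the target directory if needed, then (re)create the link.
    mkdir -p "$target_dir" || return 1
    ln -sf "$src" "$target_dir/$(basename "$src")"
}
```

Run as root, `link_plugin /usr/lib64/slurm/auth_munge.so /usr/local/lib/slurm` would produce the same link as the ln -s command above.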

Thanks!
~Stack~
