[slurm-users] auth_munge.so: Incompatible Slurm plugin version (21.08.8)


Julien Rey

Oct 4, 2023, 1:05:03 PM
to slurm...@lists.schedmd.com
Hello,

I upgraded Slurm this week (20.11 to 21.08.8). While everything seems to
work with the srun and sbatch commands, here is what I get when I try to
launch jobs from the DRMAA library:


python: /usr/local/lib/slurm/auth_munge.so: Incompatible Slurm plugin version (21.08.8)
python: error: Couldn't load specified plugin name for auth/munge: Incompatible plugin version
python: error: cannot create auth context for auth/munge
python: error: slurm_send_node_msg: g_slurm_auth_create: REQUEST_SUBMIT_BATCH_JOB has authentication error: No such device or address
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/drmaa/session.py", line 340, in runBulkJobs
    return list(run_bulk_job(jobTemplate, beginIndex, endIndex, step))
  File "/usr/lib/python2.7/site-packages/drmaa/helpers.py", line 286, in run_bulk_job
    c(drmaa_run_bulk_jobs, jids, jt, start, end, incr)
  File "/usr/lib/python2.7/site-packages/drmaa/helpers.py", line 302, in c
    return f(*(args + (error_buffer, sizeof(error_buffer))))
  File "/usr/lib/python2.7/site-packages/drmaa/errors.py", line 151, in error_check
    raise _ERRORS[code - 1](error_string)
drmaa.errors.InternalException: code 1: slurm_submit_batch_job error (1007): Protocol authentication error
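
The first line of that log is the key one: it names the plugin that failed to load and the version reported for it. A minimal Python 3 sketch for pulling such lines out of a longer log; the regular expression is modeled only on the line quoted above (after unwrapping), so it is an assumption that other Slurm releases word the message identically.

```python
import re
from typing import List, Tuple

# Pattern modeled on the "Incompatible Slurm plugin version" line above.
# Assumption: the wording is the same in other Slurm releases.
INCOMPATIBLE_RE = re.compile(
    r"^(?P<program>[^:]+): (?P<plugin>\S+\.so): "
    r"Incompatible Slurm plugin version \((?P<version>[^)]+)\)$"
)


def find_incompatible_plugins(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (program, plugin path, reported version) for each matching line."""
    hits = []
    for line in log_text.splitlines():
        m = INCOMPATIBLE_RE.match(line.strip())
        if m:
            hits.append((m["program"], m["plugin"], m["version"]))
    return hits


if __name__ == "__main__":
    log = (
        "python: /usr/local/lib/slurm/auth_munge.so: "
        "Incompatible Slurm plugin version (21.08.8)\n"
        "python: error: cannot create auth context for auth/munge\n"
    )
    for program, plugin, version in find_incompatible_plugins(log):
        print(f"{program} could not load {plugin} (reported version {version})")
```

The "python" prefix identifies the process doing the loading, which is what points away from srun/sbatch and towards whatever library the Python process links against.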


We are running CentOS 7, and the following munge packages, including the
development libraries, are installed on all the nodes:

munge-devel-0.5.11-3.el7.x86_64
munge-0.5.11-3.el7.x86_64
munge-libs-0.5.11-3.el7.x86_64


Here are the commands I used to compile Slurm, so I think the munge plugin
was built correctly:

./configure --sysconfdir=/etc/slurm --enable-pam
make -j $(nproc)
make install
ldconfig


I don't know whether this is a Slurm or a DRMAA bug, so any advice would
be welcome.


Best.

--
Julien Rey

Plate-forme RPBS
Unité BFA - CMPLI
Université de Paris
tel: 01 57 27 83 95


Rémi Palancher

Oct 5, 2023, 4:15:42 AM
to Slurm User Community List
Hello Julien,

On Wednesday, October 4, 2023 at 19:04, Julien Rey <julie...@univ-paris-diderot.fr> wrote:

> Hello,
>
> I did an upgrade of Slurm this week (20.11 to 21.08.8) and while
> everything seems to be working with srun and sbatch commands, here is
> what I get when I try to launch jobs from drmaa library:
>
> …
>
> I don't know if this is a slurm or a drmaa bug. So any advice would be
> welcome.

Slurm daemons, binaries and libraries check at load time that the version of each plugin matches their own. The plugin version is bumped with every major Slurm release (e.g. 21.08), so plugins compiled with 21.08 cannot be loaded by programs linked with libslurm from Slurm 20.11.
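
A simplified Python model of that rule, for illustration only: it assumes compatibility is decided on the major release ("YY.MM", e.g. 21.08) alone and ignores the maintenance number. The function names are hypothetical, and this is not Slurm's actual implementation.

```python
from typing import Tuple


def major_release(version: str) -> Tuple[int, int]:
    """Return the (year, month) major release of a version like '21.08.8'."""
    year, month, *_ = version.split(".")
    return int(year), int(month)


def plugin_loadable(libslurm_version: str, plugin_version: str) -> bool:
    """True if a plugin built for plugin_version would load in a program
    linked against libslurm_version, under the major-release assumption."""
    return major_release(libslurm_version) == major_release(plugin_version)


if __name__ == "__main__":
    # DRMAA still linked against a 20.11 libslurm, plugins from 21.08.8:
    print(plugin_loadable("20.11", "21.08.8"))
    # After rebuilding DRMAA against 21.08.8:
    print(plugin_loadable("21.08.8", "21.08.8"))
```

Under this model, srun and sbatch keep working after the upgrade because they were rebuilt with 21.08.8, while any separately built consumer of libslurm is left on the old major release.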

I suspect that in this case DRMAA was compiled and linked against libslurm from Slurm 20.11, and is trying (and failing) to load the newer plugins provided with Slurm 21.08.

Did you try recompiling your DRMAA layer against the Slurm 21.08.8 headers and library?

--
Rémi Palancher
Rackslab: Open Source Solutions for HPC Operations
https://rackslab.io


Julien Rey

Oct 6, 2023, 12:38:51 PM
to slurm...@lists.schedmd.com
Hello Rémi,

Indeed, libdrmaa was linked against the wrong version of libslurm:

ldd /usr/local/lib/libdrmaa.so.1.0.8
    linux-vdso.so.1 =>  (0x00007ffe17b8b000)
    libslurm.so.36 => /usr/local/lib/libslurm.so.36 (0x00007f237179f000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f237159b000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f237137f000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f237107d000)
    libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f2370e63000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f2370a95000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f2371d7d000)
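
The same check can be scripted. A minimal Python 3 sketch that runs `ldd` on a shared library and reports which libslurm soname it resolves to; it assumes, as is usual, that the libslurm soname number changes with each major Slurm release, so a number that differs from the libslurm installed by the new release points to stale linkage. The function names are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple

# Matches ldd lines such as:
#   libslurm.so.36 => /usr/local/lib/libslurm.so.36 (0x00007f237179f000)
#   libslurm.so.36 => not found
LIBSLURM_RE = re.compile(
    r"^(libslurm\.so\.\d+)\s+=>\s+(.+?)\s*(?:\(0x[0-9a-fA-F]+\))?$"
)


def parse_libslurm(ldd_output: str) -> Optional[Tuple[str, str]]:
    """Return (soname, resolved path or 'not found') from ldd output, or None."""
    for line in ldd_output.splitlines():
        m = LIBSLURM_RE.match(line.strip())
        if m:
            return m.group(1), m.group(2)
    return None


def libslurm_of(library: str) -> Optional[Tuple[str, str]]:
    """Run ldd on a shared library and report which libslurm it links to."""
    out = subprocess.run(
        ["ldd", library], capture_output=True, text=True, check=True
    ).stdout
    return parse_libslurm(out)


if __name__ == "__main__":
    print(libslurm_of("/usr/local/lib/libdrmaa.so.1.0.8"))
```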

I just had to recompile DRMAA against the new libslurm to make it work again.

The error message I was getting was somewhat misleading; the fix was
actually very simple.

Thanks for your help.

Best.

J.