[slurm-users] Trouble Running Slurm C Extension Plugin


Glen MacLachlan via slurm-users

Apr 9, 2024, 12:22:16 PM
to slurm...@schedmd.com

We have a plugin in Lua that mostly does what we want, but there are features available in the C extension that are not available to Lua. For that reason, we are attempting to convert to C using the guidance found here: https://slurm.schedmd.com/job_submit_plugins.html#building

We arrived here because the Lua plugins don't seem to stretch far enough to cover the use case we were looking at, i.e., branching on the value of alloc_id or, for that matter, get_sid().

The goal is to disallow interactive allocations (i.e., salloc) on specific partitions while allowing it on others. However, we've run into an issue with our C plugin right out of the gate and I've included a minimal reproducer as an example which is basically a "Hello World" type of test (job_submit_disallow_salloc.c, see attached). 

What we expect to happen is a sort of hello-world result, with a message being written to /tmp/min_repo.log, but that does not occur. It seems that the plugin does not get run at all when jobs are submitted. Jobs still run as expected, but the plugin seems to be ignored.

We compile with:
gcc -fPIC -DHAVE_CONFIG_H -I /modules/source/slurm-23.02.4 -g -O2 -pthread -fno-gcse -Werror -Wall -g -O0 -fno-strict-aliasing -MT job_submit_disallow_salloc.lo -MD -MP -MF .deps/job_submit_disallow_salloc.Tpo -c job_submit_disallow_salloc.c -o .libs/job_submit_disallow_salloc.o

mv .deps/job_submit_disallow_salloc.Tpo .deps/job_submit_disallow_salloc.Plo

and link with:
gcc -shared -fPIC -DPIC .libs/job_submit_disallow_salloc.o -O2 -pthread -O0 -pthread -Wl,-soname -Wl,job_submit_disallow_salloc.so    -o job_submit_disallow_salloc.so

Check links after copying to /usr/lib64/slurm:
ldd /usr/lib64/slurm/job_submit_disallow_salloc.so
linux-vdso.so.1 (0x00007ffe467aa000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1c02095000)
libc.so.6 => /lib64/libc.so.6 (0x00007f1c01cd0000)
/lib64/ld-linux-x86-64.so.2 (0x00007f1c024b7000)

Can someone point out what we are doing incorrectly or how we might troubleshoot this issue?

Kindest regards, 

The minimal reproducer is basically a "hello world" for C extensions which I've pasted below (I've also attached it for convenience):

#include <slurm/slurm.h>
#include <slurm/slurm_errno.h>
#include <stdio.h>
#include "src/slurmctld/slurmctld.h"

const char plugin_name[] = "Min Reproducer";
const char plugin_type[] = "job_submit/disallow_salloc";
const uint32_t plugin_version = SLURM_VERSION_NUMBER;

/* Called by slurmctld for each job submission. */
extern int job_submit(job_desc_msg_t *job_desc, uint32_t submit_uid,
                      char **err_msg)
{
        /* Leave a marker in the log file so we can tell the plugin ran. */
        FILE *fp = fopen("/tmp/min_repo.log", "w");
        if (fp) {
                fputs("job_submit called\n", fp);
                fclose(fp);
        }
        return SLURM_SUCCESS;
}

/* Called by slurmctld for each job modification request. */
extern int job_modify(job_desc_msg_t *job_desc, job_record_t *job_ptr,
                      uint32_t submit_uid, char **err_msg)
{
        return SLURM_SUCCESS;
}
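
For the stated goal (refusing salloc on some partitions but not others), the decision itself can be kept separate from the Slurm types. Below is a rough sketch of that logic only, not a tested plugin. The partition names in blocked_partitions and the helper names partition_is_blocked() and should_reject() are made up for illustration. It also assumes that an interactive allocation can be recognized by the absence of a batch script, which is worth verifying against your Slurm version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical partitions on which interactive allocations are refused. */
static const char *const blocked_partitions[] = { "batch", "gpu", NULL };

/*
 * Return true if any name in the comma-separated list `requested`
 * exactly matches an entry of the NULL-terminated array `blocked`.
 * Whitespace around names is not trimmed.
 */
static bool partition_is_blocked(const char *requested,
                                 const char *const *blocked)
{
        assert(blocked != NULL);
        if (requested == NULL)
                return false;   /* no partition requested; nothing to match */

        const char *p = requested;
        while (*p != '\0') {
                size_t len = strcspn(p, ",");   /* length of current name */
                for (size_t i = 0; blocked[i] != NULL; i++) {
                        if (strlen(blocked[i]) == len &&
                            strncmp(p, blocked[i], len) == 0)
                                return true;
                }
                p += len;
                if (*p == ',')
                        p++;    /* step over the separator */
        }
        return false;
}

/*
 * ASSUMPTION: has_script is false for interactive allocations (salloc)
 * and true for batch jobs. Only script-less jobs are ever rejected.
 */
static bool should_reject(const char *partition, bool has_script)
{
        return !has_script &&
               partition_is_blocked(partition, blocked_partitions);
}
```

Inside job_submit() this would presumably be called as something like should_reject(job_desc->partition, job_desc->script != NULL), returning an error code instead of SLURM_SUCCESS when it is true. The partition and script field names are my assumption about job_desc_msg_t and should be checked against slurm.h. Note that a job submitted without an explicit partition passes through unchecked here, so the default partition would need separate handling.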


Ryan Cox via slurm-users

Apr 9, 2024, 4:48:22 PM
to Glen MacLachlan, slurm...@schedmd.com

I don't think I see it in your message, but are you pointing to the plugin in slurm.conf with JobSubmitPlugins=?  I assume you are but it's worth checking.

Ryan Cox
Office of Research Computing
Brigham Young University

Glen MacLachlan via slurm-users

May 10, 2024, 8:51:33 AM
to Ryan Cox, slurm...@schedmd.com
Hi Ryan, 

My apologies for letting this reply languish. Thank you for your reply - we have a working plugin now. 

I believe the issue was that using the plugin without first restarting slurmctld was causing slurmctld to crash (for some reason I still haven't figured out), and I had attributed the crash to a problem with the plugin itself.

I found that restarting slurmctld was required. Without restarting, even if I ran scontrol reconfigure, I was getting:
salloc: error: Job submit/allocate failed: Unexpected message received. 
It's consistent - I just tested it again to double check before sending this reply and the smallest change to the plugin will cause slurmctld to crash if I don't restart it first. Maybe that was mentioned somewhere in the job_submit_plugins documentation but if so I missed it and that's pretty much all that we needed. 

Thanks again!

Kind Regards,

Glen MacLachlan, PhD
Cyberinfrastructure Specialist 
Research Technology Services
The George Washington University
44983 Knoll Square
Enterprise Hall, 328L
Ashburn, VA 20147

slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com