[slurm-users] 20.11.1 on Cray: job_submit.lua: SO loaded on CtlD restart: script skipped when job submitted

Kevin Buckley

Dec 16, 2020, 9:23:03 PM
to Slurm User Community List
Probably not specific to 20.11.1, nor to a Cray, but has anyone out there seen anything like this?

As the slurmctld restarts, after upping the debug level, it all looks hunky dory:

[2020-12-17T09:23:46.204] debug3: Trying to load plugin /opt/slurm/20.11.1/lib64/slurm/job_submit_cray_aries.so
[2020-12-17T09:23:46.205] debug3: Success.
[2020-12-17T09:23:46.206] debug3: Trying to load plugin /opt/slurm/20.11.1/lib64/slurm/job_submit_lua.so
[2020-12-17T09:23:46.207] debug3: slurm_lua_loadscript: job_submit/lua: loading Lua script: /etc/opt/slurm/job_submit.lua
[2020-12-17T09:23:46.208] debug3: Success.
[2020-12-17T09:23:46.209] debug3: Trying to load plugin /opt/slurm/20.11.1/lib64/slurm/prep_script.so
[2020-12-17T09:23:46.210] debug3: Success.

but at the point where a submitted job should pass through the job_submit script, I see this:

[2020-12-17T09:26:06.806] debug3: job_submit/lua: slurm_lua_loadscript: skipping loading Lua script: /etc/opt/slurm/job_submit.lua
[2020-12-17T09:26:06.807] debug3: assoc_mgr_fill_in_user: found correct user: someuser(12345)
[2020-12-17T09:26:06.808] debug5: assoc_mgr_fill_in_assoc: looking for assoc of user=someuser(12345), acct=accnts0001, cluster=clust, partition=acceptance
[2020-12-17T09:26:06.809] debug3: assoc_mgr_fill_in_assoc: found correct association of user=someuser(12345), acct=accnts0001, cluster=clust, partition=acceptance to assoc=67 acct=accnts0001


Reason I went looking is that the job_submit.lua should be telling
me, the job submitter, to "sling my hook" as I have, deliberately,
left something out.

FWIW, the debug level here goes all the way to 5, so I was hoping
for a little more info as to why it is skipping it.

The skip is occurring, in src/lua/slurm_lua.c, because of this trap:

if (st.st_mtime <= *load_time) {
    debug3("%s: %s: skipping loading Lua script: %s", plugin,
           __func__, script_path);
    return SLURM_SUCCESS;
}
debug3("%s: %s: loading Lua script: %s", __func__, plugin, script_path);

where "st" is a stat struct, but I am currently none the wiser as to why
such a condition would be (or maybe even would need to be) triggered.
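To make the trap concrete: the skip fires whenever the script's modification time is not newer than the load time recorded at the previous load, so for a script that has not been touched, every call after the first one is a skip. The C sketch below mirrors only that comparison; `would_skip_reload()` and `check_script()` are hypothetical names, not Slurm functions, and the sketch does not reproduce slurm_lua.c's error handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* True when the quoted trap would fire: the script has not been modified
 * since the recorded load time, so the copy already in memory is kept. */
static bool would_skip_reload(time_t script_mtime, time_t last_load_time)
{
    return script_mtime <= last_load_time;
}

/* stat()s the script and applies the same test (hypothetical helper).
 * Returns 1 if the load would be skipped, 0 if the script would be
 * (re)loaded, and -1 if stat() fails. */
static int check_script(const char *path, time_t last_load_time)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return would_skip_reload(st.st_mtime, last_load_time) ? 1 : 0;
}
```

For example, `check_script("/etc/opt/slurm/job_submit.lua", t)` returns 1 for any `t` at or after the file's mtime, which is the state slurmctld is in once it has loaded a script that has not changed since.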

The job submit script is certainly "younger" than the time of the slurmctld
restart, and of the job submission, but then, why wouldn't it be?

Kevin
--
Supercomputing Systems Administrator
Pawsey Supercomputing Centre

Chris Samuel

Dec 16, 2020, 10:34:38 PM
to slurm...@lists.schedmd.com
On 16/12/20 6:21 pm, Kevin Buckley wrote:

> The skip is occurring, in src/lua/slurm_lua.c, because of this trap

That looks right to me; that's Doug's code, which checks whether the
file has been updated since slurmctld last read it in. If it has,
it'll reload it; if it hasn't, it'll skip it (and if you've got
debugging up high, you'll see that message).

So if you see that message, the Lua script has already been read into
slurmctld and should get called. You might want to check the log for
when it was last read in, just in case some error was detected at that point.
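What Chris describes amounts to a small cache keyed on the file's modification time. The sketch below models it with a hypothetical `struct script_cache` and `cache_refresh()` (neither is Slurm code): the first call, standing in for the slurmctld restart, loads the script; a later call, standing in for a job submission, prints a "skipping" message while the cached copy stays in use. Recording the file's mtime as the load time is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical stand-in for the Lua state slurmctld keeps in memory. */
struct script_cache {
    time_t load_time; /* mtime recorded at the last (re)load; 0 = never loaded */
    int loads;        /* how many times the script was actually (re)read */
};

/* Re-reads the script only if it changed since the last load; otherwise
 * keeps the cached copy, which is all the "skipping" message means.
 * Returns 1 if (re)loaded, 0 if skipped, -1 if stat() fails. */
static int cache_refresh(struct script_cache *c, const char *path)
{
    struct stat st;

    assert(c != NULL && path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    if (st.st_mtime <= c->load_time) {
        printf("skipping loading Lua script: %s\n", path);
        return 0;
    }
    printf("loading Lua script: %s\n", path);
    /* A real implementation would parse and compile the script here. */
    c->load_time = st.st_mtime; /* assumption: the file's mtime is recorded */
    c->loads++;
    return 1;
}
```

Calling `cache_refresh()` twice on an unchanged file returns 1 and then 0, with `loads` staying at 1: the skip only means the copy loaded earlier is being reused, not that the script is being ignored.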

You can also use luac to run a check over the script you've got like this:

luac -p /etc/opt/slurm/job_submit.lua

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Kevin Buckley

Dec 17, 2020, 9:05:24 PM
to slurm...@lists.schedmd.com, Chris Samuel
On 2020/12/17 11:34, Chris Samuel wrote:
> On 16/12/20 6:21 pm, Kevin Buckley wrote:
>
>> The skip is occurring, in src/lua/slurm_lua.c, because of this trap
>
> That looks right to me; that's Doug's code, which checks whether the
> file has been updated since slurmctld last read it in. If it has,
> it'll reload it; if it hasn't, it'll skip it (and if you've got
> debugging up high, you'll see that message).

OK. That makes sense.

> So if you see that message, the Lua script has already been read into
> slurmctld and should get called. You might want to check the log for
> when it was last read in, just in case some error was detected at that point.

Well, in the log snippet I provided, the implication is: Success.

> You can also use luac to run a check over the script you've got like this:
>
> luac -p /etc/opt/slurm/job_submit.lua

There's no luac in the Cray SDB images by default, only the
supporting libs. The syntax-checking functionality is clearly
there, though, viz. the very first load had already picked up
a "missing end", hence my assumption that the Success seen
implied "deep joy".

Will keep playing: cheers for the info,