[slurm-dev] SPANK plugin to access job info at submission stage


Yong Qin

Jul 19, 2016, 1:53:56 AM
to slurm-dev
Hi,

I'm trying to write a plugin to filter jobs at submission time (accept or deny with an error msg). I have to admit that I have not started reading the job submission plugin architecture yet and I will do that if there's really no way to implement it as a SPANK plugin.

My understanding so far is that the most likely callback for this is slurm_spank_init() (in local context). However, at that stage no job-related information seems to be accessible until the job is allocated. Ideally I would like to access the job submission line in its original form (-n 4 -t 20:0:0 --mem 2g, etc.) so that I can be as thorough as possible when parsing it. Is there any way to access that information as I describe? Thanks for shedding some light on this.

Yong Qin

Marcin Stolarek

Jul 19, 2016, 3:40:03 AM
to slurm-dev
check this:

However, to access the options exactly as the user specified them, you probably need to use a wrapper. Inside the plugin you can work on the job structure.

cheers,
marcin

Nicholas McCollum

Jul 19, 2016, 1:22:39 PM
to slurm-dev

If you compile Slurm with Lua support you will have access to the
job_submit.lua plugin. You don't get to see the exact syntax that a user
used to submit a job (a feature I miss from Torque/Moab), but you can see
what Slurm interpreted it as. The documentation for the plugin is pretty poor
and I am no Lua expert, but it does 99% of what I need it to do.
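To make "what Slurm interpreted it as" concrete: for a submission such as -n 4 -t 20:0:0, the plugin does not see those strings, only parsed fields of job_desc, e.g. time_limit as a number of minutes (1200) and num_tasks (4). The sketch below only writes a few of these fields to the slurmctld log with slurm.log_info and accepts every job. It assumes the field names time_limit, num_tasks and qos as they appear in job_submit_lua.c (check them against your Slurm version); the helpers format_minutes and describe_job are hypothetical names used for illustration.

```lua
-- Sketch of a job_submit.lua that only logs how Slurm parsed a job.
-- 4294967294 is Slurm's NO_VAL (0xfffffffe), used for "not set".
local NO_VAL = 4294967294

-- Hypothetical helper: render a limit given in minutes as H:MM:SS.
local function format_minutes(minutes)
    if minutes == nil or minutes == NO_VAL then
        return "unset"
    end
    return string.format("%d:%02d:00", math.floor(minutes / 60), minutes % 60)
end

-- Hypothetical helper: summarise a few parsed job_desc fields.
local function describe_job(job_desc)
    return string.format("time_limit=%s num_tasks=%s qos=%s",
        format_minutes(job_desc.time_limit),
        tostring(job_desc.num_tasks),
        tostring(job_desc.qos))
end

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- log_info goes to the slurmctld log; the user does not see it.
    slurm.log_info("job_submit: uid=%d %s", submit_uid, describe_job(job_desc))
    return slurm.SUCCESS
end

-- job_submit.lua is expected to define this hook too; accept every update.
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

With -n 4 -t 20:0:0 and no --qos, the log line would read along the lines of "time_limit=20:00:00 num_tasks=4 qos=nil", which shows that the original option strings are gone by this point.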

Below is a tiny snippet of my job_submit.lua script. Here I require users to
specify a time limit as well as a QoS; if they don't, I want the job to be
rejected.

Since my supercomputers are used for educational purposes, I also want the user
to know why their job was rejected.

function slurm_job_submit(job_desc, part_list, submit_uid)
    --[[ Start with an error count of 0 ]]--
    local asc_error = 0
    local asc_error_verbose = ""

    --[[ Copy the fields into locals; local access is cheaper than
         repeated lookups through job_desc ]]--
    local asc_job_time = job_desc.time_limit
    local asc_qos = job_desc.qos

    --[[ If the user doesn't ask for a time limit, boot them.
         4294967294 is Slurm's NO_VAL (0xfffffffe): no --time was given ]]--
    if asc_job_time == 4294967294 then
        asc_error = asc_error + 1
        asc_error_verbose = string.format("%s\nJob must request a time limit using the --time= flag.\n", asc_error_verbose)
    end

    --[[ If the user does not specify a QoS, boot them ]]--
    if asc_qos == nil then
        asc_error = asc_error + 1
        asc_error_verbose = string.format("%s\nJob must request a QoS using the --qos= flag.\n", asc_error_verbose)
        asc_qos = "invalid"
    end

    --[[ Spit out errors if found: log_user sends the text back to the
         submitting user, and slurm.ERROR rejects the job ]]--
    if asc_error > 0 then
        slurm.log_user("\n%s", asc_error_verbose)
        return slurm.ERROR
    end
    return slurm.SUCCESS
end

If you look at the source of the plugin:
https://github.com/SchedMD/slurm/blob/32dbffb237c0882a99758748b07fc1abfe352d06/src/plugins/job_submit/lua/job_submit_lua.c
you'll see that it has access to a ton of variables that are set during job
submission. Many of these variables can also be modified before the job gets
sent to the queue.
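Because fields can be written as well as read, the QoS check in the snippet above could instead fill in a default rather than reject the job. A minimal sketch, assuming that job_desc.qos is one of the writable fields in your Slurm version and that a QoS named "normal" exists on the cluster (DEFAULT_QOS is a hypothetical site setting):

```lua
-- Hypothetical site default; use a QoS that exists in your accounting database.
local DEFAULT_QOS = "normal"

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Instead of rejecting a job without --qos, fill in the default.
    -- Assigning to job_desc.qos changes the request before it is queued.
    if job_desc.qos == nil then
        job_desc.qos = DEFAULT_QOS
        -- Tell the submitting user what was changed.
        slurm.log_user("No --qos given, using %s", DEFAULT_QOS)
    end
    return slurm.SUCCESS
end

-- Called when an existing job is modified (e.g. scontrol update); accept all.
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

Rejecting, as the original script does, is the better fit when users are meant to think about their QoS; filling in a default is friendlier when most jobs belong in a single QoS.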

Let me know if this is helpful or you need any pointers. I'm not an expert in
this, but I feel like this plugin could use better documentation as it is quite
flexible and powerful.

-------------------
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority