[slurm-users] Block jobs on GPU partition when GPU is not specified


Ratnasamy, Fritz

Sep 25, 2021, 1:22:32 AM
to Slurm User Community List
Hi, 

I would like to block jobs submitted to our GPU partition when gres=gpu:1 (or any number between 1 and 4) is not specified, whether the job is submitted through sbatch or requested as an interactive session with srun.
Currently, /etc/slurm/slurm.conf has JobSubmitPlugins=lua commented out.
The liblua.so library is now installed.
I would like to use something similar to the example at the end of this page:
https://slurm.schedmd.com/resource_limits.html
Can I use the following code?
function slurm_job_submit(job_desc, part_list, submit_uid)
   if (job_desc.gres ~= nil)
   then
      for g in job_desc.gres:gmatch("[^,]+")
      do
         bad = string.match(g,'^gpu[:]*[0-9]*$')
         if (bad ~= nil)
         then
            slurm.log_info("User specified gpu GRES without type: %s", bad)
            slurm.user_msg("You must always specify a type when requesting gpu GRES")
            return slurm.ERROR
         end
      end
   end
end
I do not need to check whether the GPU model is specified, though. In that case:
1/ Should I change the line bad = string.match(g,'^gpu[:]*[0-9]*$') to bad = string.match(g,'^gpu[:]*[0-9]')?
2/ Do I need to uncomment JobSubmitPlugins=lua?
3/ Where do I call the function slurm_job_submit, so that the check for gres=gpu:1 is guaranteed to run?
4/ I would need job_submit_lua.so. Where can I find that library, and if it is not there, how can I download it?

Thanks for your help. I am new to regular expressions, lua and Slurm so I apologize if my questions do not make sense. 


Fritz Ratnasamy

Data Scientist

Information Technology

The University of Chicago

Booth School of Business

5807 S. Woodlawn

Chicago, Illinois 60637

Phone: +(1) 773-834-4556
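
For question 1, it may help to see what each pattern actually matches. The sketch below runs the two patterns quoted in the message against a few sample GRES entries. The sample strings (including the `tesla` type name) and the helper name `match_both` are illustrative assumptions, not taken from the thread; the snippet is meant for a stand-alone `lua` interpreter, outside Slurm.

```lua
-- Patterns quoted in the message above.
DOC_PATTERN      = '^gpu[:]*[0-9]*$'  -- from the SchedMD example
PROPOSED_PATTERN = '^gpu[:]*[0-9]'    -- variant asked about in question 1

-- Hypothetical helper: return what each pattern matches for one GRES entry.
function match_both(entry)
   return string.match(entry, DOC_PATTERN), string.match(entry, PROPOSED_PATTERN)
end

-- Illustrative inputs; "tesla" is a made-up GPU type name.
for _, entry in ipairs({ "gpu", "gpu:2", "gpu:22", "gpu:tesla:2" }) do
   local doc, proposed = match_both(entry)
   print(entry, tostring(doc), tostring(proposed))
end
-- entry         DOC_PATTERN   PROPOSED_PATTERN
-- gpu           gpu           nil
-- gpu:2         gpu:2         gpu:2
-- gpu:22        gpu:22        gpu:2   (unanchored, stops after the first digit)
-- gpu:tesla:2   nil           nil
```

In the example function a match means rejection, so both patterns would reject gpu:2, and the proposed one would additionally let a bare gpu through. Neither rejects a job that carries no GRES at all, because the loop is skipped when job_desc.gres is nil, and that is the case the question is really about.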

Renfro, Michael

Sep 25, 2021, 12:08:51 PM
to Slurm User Community List

If you haven't already seen it, there's an example Lua script from SchedMD at [1], and I've got a copy of our local script at [2]. Otherwise, in the order you asked:

 

  1. That seems reasonable, but our script just checks if there's a gres at all. I don't *think* any gres other than gres=gpu would let the job run, since our GPU nodes only have Gres=gpu:2 entries. Same thing for asking for more GPUs than are in the node: if someone asked for gres=gpu:3 or higher, the job would get blocked.

    The above might be an annoyance to your users if their job just sits in the queue with no other notice, but it hasn't really been an issue here. The big benefit from your side would be that you could simplify the if statement down to something like 'if (job_desc.gres ~= nil)'.

  2. yes, uncomment JobSubmitPlugins=lua

  3. As far as I know, if you uncomment the JobSubmitPlugins line and have a job_submit.lua file in the same folder as your slurm.conf, the Lua script should get executed automatically.

  4. Our RPM installations of Slurm contained the job_submit_lua.so, both for Bright 8 and for OpenHPC.

 

[1] https://github.com/SchedMD/slurm/blob/master/contribs/lua/job_submit.lua

[2] https://gist.github.com/mikerenfro/df89fac5052a45cc2c1651b9a30978e0
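
Points 2 and 3 can be sketched as a minimal job_submit.lua skeleton. To the best of my knowledge the Lua plugin expects both `slurm_job_submit` and `slurm_job_modify` to be defined, each returning a status such as `slurm.SUCCESS`; the linked SchedMD example [1] is the authoritative reference. The stub `slurm` table at the top is an assumption added only so the file can also be loaded by a stand-alone `lua` interpreter; inside slurmctld the real table is supplied and the stub is skipped.

```lua
-- job_submit.lua: placed in the same directory as slurm.conf (see point 3).

-- Stand-in for the table slurmctld provides. Assumption for offline testing
-- only; under Slurm the real `slurm` table already exists and is kept.
slurm = slurm or {
   SUCCESS  = 0,
   ERROR    = -1,
   log_info = function(fmt, ...) print(string.format(fmt, ...)) end,
   user_msg = function(msg) print(msg) end,
}

-- Called for each new job submission (sbatch, srun, salloc).
function slurm_job_submit(job_desc, part_list, submit_uid)
   -- Site-specific checks go here; return slurm.ERROR to reject the job.
   return slurm.SUCCESS
end

-- Called when an existing job is modified (for example with scontrol update).
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end
```

This skeleton accepts every job; it only shows the two entry points and the explicit return value.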

 

From: slurm-users <slurm-use...@lists.schedmd.com> on behalf of Ratnasamy, Fritz <fritz.r...@chicagobooth.edu>
Date: Saturday, September 25, 2021 at 12:23 AM
To: Slurm User Community List <slurm...@lists.schedmd.com>
Subject: [slurm-users] Block jobs on GPU partition when GPU is not specified



Ratnasamy, Fritz

Sep 27, 2021, 1:31:54 PM
to Slurm User Community List
Hi Michael Renfro, 

Thanks for your reply. Based on your answers, would this work?
1/ A file job_submit.lua with the following contents (I just need a function that returns an error when gres=gpu is not specified in srun or sbatch):

function slurm_job_submit(job_desc, part_list, submit_uid)
   if job_desc.partition == 'gpu' then
      if job_desc.gres == nil then
         slurm.log_info("User did not specify gres=gpu")
         slurm.user_msg("You have to specify gres=gpu:x, where x is the number of GPUs.")
         return slurm.ERROR
      end
   end
end


4/ I found the file job_submit_lua.so on our controller in /lib64/slurm/, and the Lua libraries also seem to be installed:
 sudo rpm -qa | grep lua

lua-5.3.4-11.el8.x86_64
lua-libs-5.3.4-11.el8.x86_64
lua-devel-5.3.4-11.el8.x86_64

So I guess for now I just need to create job_submit.lua and uncomment the job submit plugin line in slurm.conf. Is there any Slurm service to restart after that?

Thanks again
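
A slightly more defensive variant of the script in this message, offered as a sketch rather than a tested production script. It differs in ways that are assumptions about what is wanted: it returns slurm.SUCCESS explicitly, tolerates a missing or comma-separated job_desc.partition, enforces the 1 to 4 GPU range from the original question, and defines slurm_job_modify. The partition name gpu and the limit of 4 come from the thread; the helper names, the stub `slurm` table, and the stripping of a leading `gres:` or `gres/` prefix (a precaution in case the running Slurm version formats entries that way) are hypothetical. Requests made with options such as --gpus are likely carried in other job_desc fields and are not covered here.

```lua
-- Stand-in for the table slurmctld provides (assumption, offline testing only).
slurm = slurm or {
   SUCCESS  = 0,
   ERROR    = -1,
   log_info = function(fmt, ...) print(string.format(fmt, ...)) end,
   user_msg = function(msg) print(msg) end,
}

GPU_PARTITION = "gpu"  -- partition name used in the thread
MAX_GPUS      = 4      -- upper bound mentioned in the original question

-- Hypothetical helper: does the (possibly comma-separated) partition
-- request include the GPU partition? nil means no partition was named.
function wants_gpu_partition(partition)
   if partition == nil then return false end
   for p in partition:gmatch("[^,]+") do
      if p == GPU_PARTITION then return true end
   end
   return false
end

-- Hypothetical helper: total GPUs requested in a GRES string such as
-- "gpu:2" or "gpu:tesla:2,mps:50". A gpu entry without a count counts as 1.
function gpu_count(gres)
   if gres == nil then return 0 end
   local total = 0
   for entry in gres:gmatch("[^,]+") do
      entry = entry:gsub("^gres[:/]", "")  -- precaution, see lead-in
      if entry == "gpu" or entry:match("^gpu:") then
         local n = entry:match(":(%d+)$")
         total = total + (tonumber(n) or 1)
      end
   end
   return total
end

function slurm_job_submit(job_desc, part_list, submit_uid)
   if wants_gpu_partition(job_desc.partition) then
      local n = gpu_count(job_desc.gres)
      if n < 1 or n > MAX_GPUS then
         slurm.log_info("Rejecting job from uid %s: gres=%s",
                        tostring(submit_uid), tostring(job_desc.gres))
         slurm.user_msg("You have to specify gres=gpu:x, where x is between 1 and 4.")
         return slurm.ERROR
      end
   end
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end
```

Because of the stub table, the checks can be exercised with a plain `lua` interpreter before the file is copied next to slurm.conf, for example by calling slurm_job_submit({partition = "gpu"}, {}, 1000) and confirming that it returns slurm.ERROR.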


Renfro, Michael

Sep 27, 2021, 2:40:38 PM
to Slurm User Community List

Might need a restart of slurmctld at most, I expect.

Ratnasamy, Fritz

Sep 27, 2021, 2:59:08 PM
to Slurm User Community List
Does the script below look correct? 

function slurm_job_submit(job_desc, part_list, submit_uid)
   if job_desc.partition == 'gpu' then
      if job_desc.gres == nil then
         slurm.log_info("User did not specify gres=gpu")
         slurm.user_msg("You have to specify gres=gpu:x, where x is the number of GPUs.")
         return slurm.ERROR
      end
   end
end



Renfro, Michael

Sep 27, 2021, 3:04:09 PM
to Slurm User Community List

On a quick read, it did look correct.
