[slurm-users] Keep CPU Jobs Off GPU Nodes

Frank Pari

Mar 28, 2023, 4:07:25 PM

to slurm...@lists.schedmd.com

Hi all,

First, thank you all for participating in this list. I've learned so much just by following others' threads. =)

I'm looking at creating a scavenger partition from idle resources on both CPU and GPU nodes, and I'd like to keep this to one partition. But I don't want CPU-only jobs using up resources on the GPU nodes.

I've seen suggestions for job_submit/lua scripts. But I'm wondering if there's any other way to ensure that a job has requested at least one GPU before the scheduler assigns it to a GPU node.

Thanks in advance!

-Frank

Frank Pari

Mar 28, 2023, 7:24:48 PM

to slurm...@lists.schedmd.com

Well, I wanted to avoid using Lua. But it looks like that's going to be the easiest way to do this without having to create a separate partition for the GPUs. Basically, check for at least one GPU in the job submission and, if there is none, exclude all GPU nodes for the job.

Now I'm wondering how to auto-generate the list of nodes with GPUs, so I don't have to remember to update job_submit.lua every time we get new GPU nodes.
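One way this could be automated, sketched under assumptions: ask sinfo for each node's GRES and collect the nodes whose GRES mentions "gpu". The helper names parse_gpu_nodes and gpu_node_list are hypothetical, and both the sinfo invocation (-h -N -o '%N %G') and the idea of assigning the result to job_desc.exc_nodes should be verified against the Slurm version in use. Running an external command on every submission blocks slurmctld, so in practice the list would probably be cached or generated outside the plugin.

```lua
-- Hypothetical helper: turn "node gres" lines (one per node) into a
-- comma-separated list of nodes whose GRES string mentions "gpu".
local function parse_gpu_nodes(sinfo_output)
    local seen, nodes = {}, {}
    for line in sinfo_output:gmatch("[^\n]+") do
        local node, gres = line:match("^(%S+)%s+(%S+)")
        -- A node can be listed once per partition, so skip duplicates.
        if node and gres:find("gpu", 1, true) and not seen[node] then
            seen[node] = true
            nodes[#nodes + 1] = node
        end
    end
    return table.concat(nodes, ",")
end

-- Hypothetical wrapper: query sinfo and parse its output.
-- Assumes sinfo is on the PATH of the slurmctld user.
local function gpu_node_list()
    local pipe = io.popen("sinfo -h -N -o '%N %G'")
    if not pipe then
        return ""
    end
    local output = pipe:read("*a")
    pipe:close()
    return parse_gpu_nodes(output)
end

-- Inside slurm_job_submit one might then write (assumption: exc_nodes is
-- writable, and any exclusion list set by the user should be merged, not lost):
--   if total_gpus == 0 then
--       job_desc.exc_nodes = gpu_node_list()
--   end
```

Keeping the parsing separate from the sinfo call makes it possible to test the parser without a running cluster.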

-F

Ward Poelmans

Mar 29, 2023, 2:58:21 AM

to slurm...@lists.schedmd.com

Hi,

We have dedicated partitions for GPUs (their names end with _gpu) and simply forbid jobs that do not request GPU resources from using these partitions:

local function job_total_gpus(job_desc)
    -- Return the total number of GPUs requested by the job.
    -- There are many ways to request a GPU. This comes from the job_submit example in the Slurm source.
    -- A GPU resource is either nil or "gres:gpu:N", with N the number of GPUs requested.

    -- Multiplier for each GPU spec field (undefined resources can show limit values, so default to 1)
    local gpu_specs = {
        ['tres_per_node'] = 1,
        ['tres_per_task'] = 1,
        ['tres_per_socket'] = 1,
        ['tres_per_job'] = 1,
    }

    -- number of nodes
    if job_desc['min_nodes'] < 0xFFFFFFFE then gpu_specs['tres_per_node'] = job_desc['min_nodes'] end
    -- number of tasks
    if job_desc['num_tasks'] < 0xFFFFFFFE then gpu_specs['tres_per_task'] = job_desc['num_tasks'] end
    -- number of sockets (per node, so multiply by the number of nodes)
    if job_desc['sockets_per_node'] < 0xFFFE then gpu_specs['tres_per_socket'] = job_desc['sockets_per_node'] end
    gpu_specs['tres_per_socket'] = gpu_specs['tres_per_socket'] * gpu_specs['tres_per_node']

    -- number of GPUs named in each field (0 if the field is unset or has no GPU)
    local gpu_options = {}
    for tres_name, _ in pairs(gpu_specs) do
        local num_gpus = string.match(tostring(job_desc[tres_name]), "^gres:gpu:([0-9]+)") or 0
        gpu_options[tres_name] = tonumber(num_gpus)
    end
    -- calculate total GPUs from the first field that requests any
    for tres_name, job_res in pairs(gpu_specs) do
        local num_gpus = gpu_options[tres_name]
        if num_gpus > 0 then
            local total_gpus = num_gpus * tonumber(job_res)
            return total_gpus
        end
    end
    return 0
end

function slurm_job_submit(job_desc, part_list, submit_uid)
    local total_gpus = job_total_gpus(job_desc)
    slurm.log_debug("Job total number of GPUs: %s", tostring(total_gpus))

    if total_gpus == 0 then
        -- reject the job if any requested partition is a GPU partition
        for partition in string.gmatch(tostring(job_desc.partition), '([^,]+)') do
            if string.match(partition, '_gpu$') then
                slurm.log_user(string.format('ERROR: GPU partition %s is not allowed for non-GPU jobs.', partition))
                return slurm.ESLURM_INVALID_GRES
            end
        end
    end

    return slurm.SUCCESS
end

Ward

On 29/03/2023 01:24, Frank Pari wrote:
> Well, I wanted to avoid using lua. But, it looks like that's going to be the easiest way to do this without having to create a separate partition for the GPUs. Basically, check for at least one gpu in the job submission and if none exclude all GPU nodes for the job.
>

> Now I'm wondering how to auto-gen the list of nodes with GPUs, so I don't have to remember to update job_submit.lua everytime we get new GPU nodes.
>
> -F
>

Wagner, Marcus

Mar 29, 2023, 3:35:27 AM

to slurm...@lists.schedmd.com

Hi Frank,

Use Features on the nodes: every CPU node gets, e.g., "cpu" and every GPU node gets, e.g., "gpu".

If a job asks for no GPUs, set an additional constraint "cpu" for the job.
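A minimal sketch of that idea, assuming it runs inside job_submit.lua and that the job's GPU count has already been determined elsewhere. The helper name with_cpu_feature is hypothetical, and it assumes any existing constraint is a single feature or a plain AND list; constraints using "|" or node counts would need more careful handling.

```lua
-- Hypothetical helper: return the constraint string with the "cpu" feature added.
-- Assumes `features` is nil, empty, or a simple "a&b" style AND list.
local function with_cpu_feature(features)
    if features == nil or features == "" then
        return "cpu"
    end
    -- Avoid adding the feature twice if the user already asked for it.
    for name in features:gmatch("[^&]+") do
        if name == "cpu" then
            return features
        end
    end
    return features .. "&cpu"
end

-- Inside slurm_job_submit one might then write (assuming total_gpus was computed):
--   if total_gpus == 0 then
--       job_desc.features = with_cpu_feature(job_desc.features)
--   end
```

With this approach nothing has to change in the plugin when new GPU nodes arrive, as long as they are given the "gpu" feature and not the "cpu" one.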

Best
Marcus

René Sitt

Mar 29, 2023, 4:08:45 AM

to slurm...@lists.schedmd.com

Hello,

maybe some additional notes:

While the cited procedure works great in general, it gets more
complicated for heterogeneous setups, i.e. if you have several GPU types
defined in gres.conf, since the 'tres_per_<x>' fields can then take the
form of either 'gres:gpu:N' or 'gres:gpu:<type>:N' - depending on
whether the job script specifies a GPU type or not.
Of course, you could omit the GPU type definition in gres.conf and
define the type as a node feature instead, as long as no nodes contain
multiple different GPU types.
Since the latter is the case in our cluster, I instead opted to check
only for the existence of 'gpu' in the 'tres_per_<x>' fields and to not
bother with parsing the actual number of GPUs. However, there is an
interesting edge case here, as users are free to set --gpus=0 - either
one has to filter for that specifically, or instruct one's users to not
do that.
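As a rough sketch of such an existence check, assuming the 'tres_per_<x>' fields are either nil or comma-separated strings of the forms described above: the helper name requests_gpu is hypothetical, and treating an entry without a trailing count as a request for at least one GPU is an assumption.

```lua
-- Fields that can carry a GPU request.
local TRES_FIELDS = { "tres_per_node", "tres_per_task", "tres_per_socket", "tres_per_job" }

-- Hypothetical helper: true if any field mentions "gpu" with a non-zero count.
-- An explicit count of 0 (as produced by something like --gpus=0) is ignored.
local function requests_gpu(job_desc)
    for _, field in ipairs(TRES_FIELDS) do
        local spec = job_desc[field]
        if type(spec) == "string" then
            for entry in spec:gmatch("[^,]+") do
                if entry:find("gpu", 1, true) then
                    -- A trailing ":N" is the count; no count is assumed to mean >= 1.
                    local count = entry:match(":(%d+)$")
                    if count == nil or tonumber(count) > 0 then
                        return true
                    end
                end
            end
        end
    end
    return false
end
```

Because the check only looks for "gpu" and a trailing count, it works the same whether or not the job names a GPU type.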

Kind Regards,
René Sitt


--
Dipl.-Chem. René Sitt
Hessisches Kompetenzzentrum für Hochleistungsrechnen
Philipps-Universität Marburg
Hans-Meerwein-Straße
35032 Marburg

Tel. +49 6421 28 23523
si...@hrz.uni-marburg.de
www.hkhlr.de

Markus Kötter

Mar 29, 2023, 5:01:40 AM

to slurm...@lists.schedmd.com

Hello,

On 29.03.23 10:08, René Sitt wrote:
> While the cited procedure works great in general, it gets more
> complicated for heterogeneous setups, i.e. if you have several GPU types
> defined in gres.conf, since the 'tres_per_<x>' fields can then take the
> form of either 'gres:gpu:N' or 'gres:gpu:<type>:N' - depending on
> whether the job script specifies a GPU type or not.

Using Lua pattern matching:

for g in job_desc.gres:gmatch("[^,]+") do
    local count = g:match("gres:gpu:%w+:(%d+)$") or g:match("gres:gpu:(%d+)$")
    if count then
        -- this entry requests tonumber(count) GPUs
    end
end
Kind regards,
--
Markus Kötter, +49 681 870832434
30159 Hannover, Lange Laube 6
Helmholtz Center for Information Security
