[slurm-users] Is it possible to define multiple partitions for the same node, but each one having a different subset of GPUs?


Cristóbal Navarro

Mar 31, 2021, 11:22:03 AM
to slurm...@lists.schedmd.com
Hi Community,
I was checking the documentation but could not find clear information about what I am trying to do.
Here at the university we have a large compute node with 3 classes of GPUs. Let's say the node's hostname is "gpuComputer"; it is composed of:
  • 4x large GPUs
  • 4x medium GPUs (MIG devices)
  • 16x small GPUs (MIG devices)
Our plan is to have one partition for each class of GPU.
So if a user chooses the "small" partition, they would only see the 16x small GPUs and would not interfere with jobs running in the "medium" or "large" partitions.

Can I create three partitions and specify the corresponding subset of GPUs for each one?

If not, would NodeName and NodeHostname serve as an alternative? That is, specify the node three times with different NodeNames but the same NodeHostname=gpuComputer, give each one the corresponding subset of "Gres" resources, and then point each partition at the corresponding NodeName.
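
In slurm.conf terms, I imagine something roughly like this (completely untested, and the node, partition, and Gres names are just placeholders):

## One physical host, three node definitions (sketch)
NodeName=gpuLarge  NodeHostname=gpuComputer Gres=gpu:large:4
NodeName=gpuMedium NodeHostname=gpuComputer Gres=gpu:medium:4
NodeName=gpuSmall  NodeHostname=gpuComputer Gres=gpu:small:16

## One partition per node definition
PartitionName=large  Nodes=gpuLarge  State=UP
PartitionName=medium Nodes=gpuMedium State=UP
PartitionName=small  Nodes=gpuSmall  State=UP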

Any feedback or advice on the best way to accomplish this would be much appreciated.
Best regards,



--
Cristóbal A. Navarro

Brian Andrus

Mar 31, 2021, 1:47:12 PM
to slurm...@lists.schedmd.com

So the node definition is separate from the partition definition.

You would need to define all the GPUs as part of the node. Partitions do not have physical characteristics, but they do support QOS limits that you may be able to use. You could also use a job_submit lua script to reject jobs that request resources you do not want used in a particular queue.
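
For the lua route, a rough sketch (untested; the partition name and GRES type below are hypothetical) might look like:

-- job_submit.lua: reject jobs in a "small" partition that request
-- anything other than the small GPU type. Note that the job_desc
-- field names vary by Slurm version (tres_per_node vs. the older gres).
function slurm_job_submit(job_desc, part_list, submit_uid)
    local gres = job_desc.tres_per_node or ""
    if job_desc.partition == "small" and
       not string.find(gres, "gpu:small", 1, true) then
        slurm.log_user("partition 'small' only allows gpu:small")
        return slurm.ERROR
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end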

Both would take some research to find the best approach, but I think those are the two options available that may do what you are looking for.

Brian Andrus

Cristóbal Navarro

Mar 31, 2021, 9:36:35 PM
to Sarlo, Jeffrey S, slurm...@lists.schedmd.com
Many thanks, Brian and Jeffrey, for your ideas.
Yes, at the moment I have all resources listed in the node's definition line, and just one partition (see below).
Indeed, this config could work, provided users cooperate and do not request all of the existing GPUs for their jobs.
But one thing is still not 100% clear to me: will it allow multiple jobs to run at the same time if they request different GPUs?

## Nodes List
NodeName=nodeGPU01 SocketsPerBoard=8 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=1024000 State=UNKNOWN Gres=gpu:a100:4,gpu:a100_20g:2,gpu:a100_10g:2,gpu:a100_5g:16 Feature=ht,gpu

## Partitions list
PartitionName=gpu MaxTime=INFINITE State=UP Nodes=nodeGPU01  Default=YES
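
If I read the docs right, sharing a node at the GPU level also depends on the select plugin; I assume something like the following is needed so that GRES are treated as consumable and jobs asking for different GPUs can run side by side:

## Assumed scheduler settings (my understanding, not verified)
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory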




On Wed, Mar 31, 2021 at 3:16 PM Sarlo, Jeffrey S <JSa...@central.uh.edu> wrote:

I think when you define the node in your slurm.conf, you can specify the different GPU types you have and the count of each. Then when users submit a job, they can specify the number and type they want, and that would all work in one partition. I have never done it myself because each of our nodes has only one type.

 

For example, we have V100 and P100 GPUs and decided on the type names volta and tesla:

 

GresTypes=gpu

NodeName=compute-0-[36-43] Gres=gpu:tesla:2 Feature=gen9

NodeName=compute-4-[0-3]   Gres=gpu:volta:8 Feature=gen9

 

The user then just uses the SBATCH directive --gpus=tesla:1 to request one P100 GPU.
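
So a job script could look something like this (the partition name and the application are just placeholders):

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gpus=tesla:1
#SBATCH --time=01:00:00
# note: --gpus needs Slurm >= 19.05; older versions use --gres=gpu:tesla:1
srun ./my_gpu_app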

 

This is an example from https://slurm.schedmd.com/slurm.conf.html:

 

(e.g."Gres=gpu:tesla:1,gpu:kepler:1,bandwidth:lustre:no_consume:4G")



--
Cristóbal A. Navarro