[slurm-users] how to locate the problem when slurm failed to restrict gpu usage of user jobs


taleint...@sjtu.edu.cn

Mar 23, 2022, 10:43:07 AM
to slurm...@lists.schedmd.com

Hi, all:

 

We found that Slurm jobs submitted with an option such as --gres gpu:1 are not restricted in their GPU usage: the user can still see all GPU cards on the allocated nodes.

Our GPU node has 4 cards, and its gres.conf is:

> cat /etc/slurm/gres.conf
Name=gpu Type=NVlink_A100_40GB File=/dev/nvidia0 CPUs=0-15
Name=gpu Type=NVlink_A100_40GB File=/dev/nvidia1 CPUs=16-31
Name=gpu Type=NVlink_A100_40GB File=/dev/nvidia2 CPUs=32-47
Name=gpu Type=NVlink_A100_40GB File=/dev/nvidia3 CPUs=48-63

 

For testing, we submitted a simple batch job like this:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --partition=a100
#SBATCH --nodes=1
#SBATCH --ntasks=6
#SBATCH --gres=gpu:1
#SBATCH --reservation="gpu test"

hostname
nvidia-smi
echo end

 

In the output file, nvidia-smi showed all 4 GPU cards, but we expected to see only the 1 allocated card.

 

Slurm's official documentation says it sets the CUDA_VISIBLE_DEVICES environment variable to restrict which GPU cards are available to the user. However, we did not find this variable in the job environment. We only confirmed that it exists in the prolog script environment, by adding the debug command “echo $CUDA_VISIBLE_DEVICES” to the Slurm prolog script.
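A minimal diagnostic batch script can show whether the GPU-related variables reach the job environment at all. This is a sketch, not a tested site script: it drops the site-specific partition and reservation options from the test job, and the function name report_gpu_env is hypothetical. SLURM_JOB_GPUS is documented as being set in batch jobs that were allocated GPUs; CUDA_VISIBLE_DEVICES is what the job is expected to see when GPUs come from a GRES request.

```shell
#!/bin/bash
#SBATCH --job-name=gpu-env-check
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1

# Hypothetical helper: print the GPU-related variables Slurm is expected to
# export for a GPU GRES allocation, or "<unset>" if a variable is missing.
report_gpu_env() {
    echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
    echo "SLURM_JOB_GPUS=${SLURM_JOB_GPUS:-<unset>}"
}

echo "Running on $(uname -n)"
report_gpu_env

# nvidia-smi -L prints one line per GPU this process can see.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L || echo "nvidia-smi failed"
fi
```

If CUDA_VISIBLE_DEVICES prints as "<unset>" here while SLURM_JOB_GPUS is set, something between Slurm and the job (for example a login script) is probably clearing it.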

 

So how does Slurm cooperate with the NVIDIA tools so that a job's user sees only the allocated GPU card? What are the requirements on the NVIDIA GPU driver, the CUDA toolkit, or any other component for Slurm to restrict GPU usage correctly?

John Hanks

Mar 23, 2022, 10:57:06 AM
to Slurm User Community List
Do you have a matching Gres=gpu:4 or similar in your node config lines? I'm not sure if that is still required, but we have it in our config which does work to isolate GPUs to jobs they are assigned to. 
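For reference, the matching slurm.conf entries for the gres.conf in the original post might look roughly like this. It is a minimal sketch: the node name, CPU count and memory size are hypothetical placeholders, and the type in Gres= has to match the Type= used in gres.conf.

```
# /etc/slurm/slurm.conf (excerpt)
# Declare gpu as a generic resource type for the cluster.
GresTypes=gpu
# Hypothetical node definition: NodeName, CPUs and RealMemory are placeholders.
NodeName=gpunode01 CPUs=64 RealMemory=500000 Gres=gpu:NVlink_A100_40GB:4 State=UNKNOWN
```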

griznog

Brian Andrus

Mar 23, 2022, 10:57:45 AM
to slurm...@lists.schedmd.com

It should exist in the user environment as well.

I would check the user's .bashrc and .bash_profile settings to see if they are doing anything that changes that.
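That check can be scripted. The following is a sketch with a hypothetical function name, find_cuda_overrides; the set of startup files it inspects is an assumption, and a site may also set variables elsewhere (for example in system-wide profile scripts).

```shell
#!/bin/bash
# Hypothetical helper: list lines in a user's shell startup files that mention
# CUDA_VISIBLE_DEVICES, since setting or unsetting it there would override the
# value Slurm exports into the job. The file list below is an assumption.
find_cuda_overrides() {
    local home_dir="${1:-$HOME}"
    local f
    for f in .bashrc .bash_profile .profile; do
        if [ -f "$home_dir/$f" ]; then
            # grep -n adds the line number; sed prefixes the file name.
            grep -n 'CUDA_VISIBLE_DEVICES' "$home_dir/$f" | sed "s|^|$f:|" || true
        fi
    done
}
```

Usage: `find_cuda_overrides /home/someuser` (placeholder path) prints lines such as `.bashrc:2:export CUDA_VISIBLE_DEVICES=0,1,2,3`, and prints nothing if no startup file mentions the variable.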

Brian Andrus

Sean Maxwell

Mar 23, 2022, 11:06:10 AM
to Slurm User Community List
Hi,

If you are using cgroups for task/process management, you should verify that your /etc/slurm/cgroup.conf has the following line:

ConstrainDevices=yes

I'm not sure about the missing environment variable, but the absence of the above in cgroup.conf is one way the GPU devices can be unconstrained in the jobs.
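A minimal sketch of the settings involved, assuming a cgroup-based setup. As I understand the Slurm documentation, ConstrainDevices in cgroup.conf only has an effect when the task/cgroup plugin is enabled in slurm.conf, and proctrack/cgroup is its usual companion. Listing task/affinity next to task/cgroup is a common site choice, not a requirement. After editing these files, slurmd on the compute nodes (and slurmctld for slurm.conf changes) will probably need a restart.

```
# /etc/slurm/cgroup.conf (excerpt)
# Restrict each job to the device files of its allocated GRES (e.g. /dev/nvidia0).
ConstrainDevices=yes

# /etc/slurm/slurm.conf (excerpt)
# cgroup.conf is only honoured when the cgroup-based plugins are in use.
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup,task/affinity
```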

-Sean


Tina Friedrich

Mar 23, 2022, 11:09:34 AM
to slurm...@lists.schedmd.com
What does your cgroup.conf look like on the GPU nodes? (I don't think
it's possible to properly hide the unallocated GPUs without using
cgroup restrictions.)

Tina



--
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator

Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk http://www.it.ox.ac.uk

Greg Wickham

Mar 23, 2022, 11:11:13 AM
to Slurm User Community List

If it’s possible to see other GPUs within a job, that means cgroups aren’t being used to constrain devices.

 

Look at Slurm's cgroup documentation (https://slurm.schedmd.com/cgroup.conf.html)

 

With cgroups activated an `nvidia-smi` will only show the GPU allocated to the job.
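Before relying on ConstrainDevices, one might first confirm that the cgroup plugins are active at all. This is a sketch with a hypothetical helper, cgroup_plugins_active, which parses `scontrol show config` output from stdin; it assumes the usual "Key = value" layout of that output.

```shell
#!/bin/bash
# Hypothetical helper: read `scontrol show config` output on stdin and report
# whether both the cgroup process-tracking and cgroup task plugins are configured.
cgroup_plugins_active() {
    local config proctrack task
    config=$(cat)
    # Lines look like "ProctrackType           = proctrack/cgroup".
    proctrack=$(printf '%s\n' "$config" | awk -F' *= *' '$1 == "ProctrackType" {print $2}')
    task=$(printf '%s\n' "$config" | awk -F' *= *' '$1 == "TaskPlugin" {print $2}')
    if [[ $proctrack == *proctrack/cgroup* && $task == *task/cgroup* ]]; then
        echo "yes"
    else
        echo "no"
    fi
}
```

Usage: `scontrol show config | cgroup_plugins_active`. A "no" would suggest that cgroup.conf settings, including ConstrainDevices, are not being applied.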

 

   -greg

taleint...@sjtu.edu.cn

Mar 24, 2022, 4:27:45 AM
to Sean Maxwell, Slurm User Community List

Well, that was indeed the problem: we had not set ConstrainDevices=yes in cgroup.conf. After adding it, GPU restriction works as expected.

But what is the relationship between GPU restriction and cgroups? I had never heard that cgroups can limit GPU card usage. Isn’t that a feature of CUDA or the NVIDIA driver?

 

From: Sean Maxwell <s...@case.edu>
Sent: March 23, 2022 23:05
To: Slurm User Community List <slurm...@lists.schedmd.com>
Subject: Re: [slurm-users] how to locate the problem when slurm failed to restrict gpu usage of user jobs

Sean Maxwell

Mar 24, 2022, 7:18:03 AM
to taleint...@sjtu.edu.cn, Slurm User Community List
cgroups can control access to devices (e.g. /dev/nvidia0), which is how I understand it to work.
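One way to observe this from inside a job is to try opening each GPU device file: the /dev/nvidia* nodes still exist, but under a device cgroup that denies them, open() should be refused (typically with "Operation not permitted"). This is a rough probe, not an official Slurm tool; the function name probe_devices is hypothetical.

```shell
#!/bin/bash
# Hypothetical probe: report whether each given device file can be opened.
# Inside a job with ConstrainDevices=yes, unallocated GPUs are expected to
# report "cannot open" even though their /dev entries are still present.
probe_devices() {
    local dev
    for dev in "$@"; do
        # Open the file on fd 3 in a subshell; failure means access was refused.
        if ( exec 3<"$dev" ) 2>/dev/null; then
            echo "$dev: can open"
        else
            echo "$dev: cannot open"
        fi
    done
}

# nullglob: if no /dev/nvidiaN files exist, the probe simply prints nothing.
shopt -s nullglob
probe_devices /dev/nvidia[0-9]*
```

Run inside a job with `--gres=gpu:1`, one would expect exactly one of the four /dev/nvidiaN files to report "can open" once ConstrainDevices=yes is in effect.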

-Sean