Take a look at https://github.com/SchedMD/slurm/search?q=dri%2F
If the ROCM-SMI API is present, using AutoDetect=rsmi in gres.conf might be enough, if I'm reading this right.
Of course, this assumes the cards in question are AMD and not NVIDIA.
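For reference, a minimal sketch of what that could look like, assuming AMD cards and the ROCm SMI library present on the node (the node names are just examples):

# gres.conf
AutoDetect=rsmi

# with a matching definition in slurm.conf, e.g.:
# GresTypes=gpu
# NodeName=gpu[01-04] Gres=gpu:2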
Just a quick addendum: rsmi_dev_drm_render_minor_get used in the plugin references the ROCM-SMI lib from https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/2e8dc4f2a91bfa7661f4ea289736b12153ce23c2/src/rocm_smi.cc#L1689 , so the library (as an .so file) has to be installed for this to work.
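A quick generic way to check whether the dynamic linker can find it (not Slurm-specific, just ldconfig):

ldconfig -p | grep -i rocm_smi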
Hello, I'm reviving a bit of an old thread. I just noticed that my January 2021 message never made it to the archives, so I'm sending it again now that the issue has come up again on our side.
To quickly recap: we want to grant permissions not only to the /dev/nvidia* devices matching the requested gres, but also to the corresponding /dev/dri/card* and /dev/dri/renderD* devices. They all belong to the same GPU, but the additional two allow using the card for rendering rather than CUDA computations etc. I had an idea for how to achieve this without changing the SLURM codebase, and I got something that could almost work; it probably just needs some polishing. Could anybody please comment on whether the proposed solution is a good idea?
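To illustrate, these are the three device nodes that belong to one physical GPU (the minor numbers vary per card; /dev/nvidia* uses char major 195, the DRM nodes char major 226):

ls -l /dev/nvidia0 /dev/dri/card0 /dev/dri/renderD128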
The 15 Jan 2021 message:
So I started thinking whether this could be handled by a prologue script and direct cgroup manipulation. I'm no expert in either, so please check my line of thought.
#!/bin/bash
PATH=/usr/bin/:/bin

gpus=${SLURM_STEP_GPUS:-$SLURM_JOB_GPUS}  # or CUDA_VISIBLE_DEVICES when run inside the cgroup?
cgroup=$(cat /proc/self/cgroup | grep devices | cut -d: -f3)  # or something else?

# blacklist all DRM devices (major 226)
cgset -r devices.deny="a 226:* rwm" devices:${cgroup}

for NVIDIA_SMI_ID in ${gpus//,/ }; do
    # find on which PCI path this device sits; strip the extra leading zeros
    # and lowercase it so it matches the sysfs path format (0000:65:00.0)
    pci_id=$(nvidia-smi -i $NVIDIA_SMI_ID --query-gpu=pci.bus_id --format=csv,noheader | tail -c+5 | tr '[:upper:]' '[:lower:]')
    # find the DRM devices sitting on the same PCI bus
    card=$(ls /sys/bus/pci/devices/${pci_id}/drm/ | grep card | xargs basename)
    render=$(ls /sys/bus/pci/devices/${pci_id}/drm/ | grep renderD | xargs basename)
    # allow access to the DRM devices (/sys/class/drm/*/dev holds "major:minor")
    [ -n "${card}" ] && cgset -r devices.allow="c $(cat /sys/class/drm/${card}/dev) rw" devices:${cgroup} \
        && echo "Allowed /dev/dri/${card} DRI device access"
    [ -n "${render}" ] && cgset -r devices.allow="c $(cat /sys/class/drm/${render}/dev) rw" devices:${cgroup} \
        && echo "Allowed /dev/dri/${render} render node access"
done
Now I wonder whether this should be Prolog=, TaskProlog= or something else (that would also change whether I look at CUDA_VISIBLE_DEVICES or SLURM_STEP_GPUS, and how I figure out the cgroup name). I guess that if this script were run as the invoking user, nothing would prevent them from regaining access to all the devices. So I'd be inclined to treat it as a Prolog= script run by root. But how would I get the cgroup ID then? Compose it from parts as mentioned in the slurm cgroups docs (/cgroup/cpuset/slurm/uid_100/job_123/step_0/task_2)? Or is there a more reliable way?
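If it ends up as a root-run Prolog=, an untested sketch of composing the path from the variables slurmd exports to the prolog (SLURM_JOB_UID, SLURM_JOB_ID) might be; this assumes the layout from the cgroups docs and that the job cgroup already exists at prolog time (e.g. with PrologFlags=Contain):

# untested sketch for a root-run Prolog=
cgroup="slurm/uid_${SLURM_JOB_UID}/job_${SLURM_JOB_ID}"
cgset -r devices.deny="a 226:* rwm" devices:${cgroup}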
A related but off-topic idea popped into my head when thinking about GPUs. Most of them are actually a consolidation of multiple devices: stream processors, encoders, decoders, ray-tracing units, shaders, memory etc. Could it be possible (in the future) to offer each of these pieces as a separate gres? The problem is that most of them have no special file the user could lock to tell others they are using it. So it would probably require support at the level of the cgroup implementation, which, in turn, would require changing all GPU drivers. And it would require being able to request just chunks of GPU memory (not sure whether that's possible right now, but I think I saw some pull request about that).
Thank you for hints!
Martin
On 21 Oct 2020 at 19:09, Martin Pecka wrote:
Or maybe could this be "emulated" by a set of 3 GRES per card that are "linked" together? I.e. a rule like "if the user requests the GRES /dev/dri/card0, they also automatically claim /dev/dri/renderD128 and /dev/nvidia0"?
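Purely as an illustration of that idea, each device file could be declared as its own GRES (hypothetical gres.conf sketch; Slurm has no built-in way to link these, so the "claim all three together" logic would have to live elsewhere):

# hypothetical gres.conf: one GRES name per device node of the same card
Name=gpu    File=/dev/nvidia0
Name=dri    File=/dev/dri/card0
Name=render File=/dev/dri/renderD128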
On 21 Oct 2020 at 18:52, Daniel Letai wrote: