Hello,
I would like to know if it is possible to use “sacctmgr” to limit the use of a certain type of GPU, according to the name I have assigned to it in the “gres.conf” file. For example, my small cluster has 3 GPU nodes with 2 GPUs each. Two of those GPUs are the same model but are located in different servers. Given this scenario, I would like to limit users to using only one GPU of that type, not both of them.
For example, what I want is that users can use all of the GPUs but, at any one time, a user can only use one of the RTX3080s.
Using QoS, I have created a new QoS with “sacctmgr add qos test-gpu-limit MaxTRESPerUser=gres/gpu=1”, but with this QoS users are limited to a single GPU, even when they need GPUs of different models. I have also tried “sacctmgr add qos test-gpu-limit MaxTRESPerUser=gres/gpu:RTX3080:1”, but the system returns this error: “sacctmgr: error: slurmdb_format_tres_str: no TRES id found for gres/gpu:RTX3080:1”.
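A hedged sketch, not tested on this cluster: as far as the Slurm documentation describes it, a limit on a typed GPU only works if that typed GRES is tracked as its own TRES in slurm.conf, and the count is written after “=”, not after “:”. The type name RTX3080 is taken from the message above and must match the Type in gres.conf; the QoS name is the one already created.

```shell
# slurm.conf (assumption): track the typed GPU as a separate TRES.
# slurmctld must be restarted so the new TRES is registered in slurmdbd.
AccountingStorageTRES=gres/gpu,gres/gpu:RTX3080

# The count follows "=", so the TRES name is "gres/gpu:RTX3080"
sacctmgr modify qos test-gpu-limit set MaxTRESPerUser=gres/gpu:RTX3080=1
```

With this form, other GPU models would not count against the RTX3080 limit, unlike the plain gres/gpu=1 limit.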
So, is it possible to apply the limits I want?
Thanks.
--
Daniel Ruiz Molina
Hi,
Because of my real scenario (in my first post I described my test scenario), with several different types of users (researchers, university students and/or teachers, etc.), I have distributed my GPUs across 3 different partitions:
Explanation:
Until now I have not used QoS, but I am going to install a new data/user server and I want to reconfigure Slurm. If I distributed the GPUs the way Gerhard Strangar suggests (both identical RTX3080s in one partition with a restricted QoS, and all other GPUs in another partition), some GPUs that are currently restricted to “inside lab” users would become accessible to “outside lab” users. So I think (all the teachers want it this way ☹) I must keep this partition layout. However, if I apply a QoS limiting users to one GPU in each partition, a user could still use one RTX3080 from the outside-lab partition and the other RTX3080 from the inside-lab partition at the same time… and this is what I want to prevent.
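To make the difference between the two kinds of limit concrete, here is a hypothetical Python model of the rule (not Slurm code). The partition names “inside-lab”/“outside-lab”, the user “alice”, the type “other-gpu” and the function allowed() are all illustrative. A per-partition count leaves the loophole described above, while a per-user count across partitions closes it.

```python
# Hypothetical model of the GPU policy; NOT how Slurm enforces limits.
def allowed(running, user, partition, gpu_type, cap=1, per_partition=False):
    """Return True if `user` may be allocated one more `gpu_type` GPU.

    running: list of (user, partition, gpu_type) tuples, one per allocated GPU.
    per_partition=True only counts GPUs in the requested partition
    (a separate limit per partition); False counts across all partitions.
    """
    held = sum(
        1
        for u, p, t in running
        if u == user and t == gpu_type and (not per_partition or p == partition)
    )
    return held < cap


# alice already holds the RTX3080 in the outside-lab partition
running = [("alice", "outside-lab", "RTX3080")]
print(allowed(running, "alice", "inside-lab", "RTX3080", per_partition=True))
print(allowed(running, "alice", "inside-lab", "RTX3080"))
```

The first call prints True (the unwanted case), the second prints False (the desired behaviour). Assuming a QoS limit is counted over every job running under that QoS, sharing one QoS between both partitions would correspond to the per_partition=False case; this should be checked against the Slurm QOS documentation.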
I will reread the documentation.
Any help will be appreciated, of course!!!!
Thanks.