Hello,
I have three nodes, each serving 2 GPUs. I would like to limit (with a QoS?) each user to only one GPU per server, while still allowing a user to use three GPUs simultaneously if each GPU belongs to a different server. With this QoS, “sacctmgr add qos test-limit-GPUs MaxJobsPerUser=3 MaxTRESPerUser=gres/gpu=1”, I can limit a user to one GPU, but then the user can’t run another job on a GPU from another server. How must I configure the QoS (or use some other method) to allow more than one job requesting GPUs, but never on the same server?
Thanks.
You may want to look at MaxTRESPerNode and possibly MaxTRESPerJob. A PerUser limit applies across all of that user's running jobs, which may not be what you want.
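A minimal, untested sketch of that suggestion, reusing the QoS name from the question. It assumes the QoS already exists, that gres/gpu is a tracked TRES, and that limit enforcement is switched on in slurm.conf. As the sacctmgr documentation describes it, MaxTRESPerNode caps what each node in a single job allocation can use, so it is worth checking whether two separate jobs from the same user can still land on the same node.

```shell
# Drop the per-user GPU cap (-1 clears a limit) and cap GPUs per node instead.
sacctmgr modify qos test-limit-GPUs set MaxTRESPerUser=gres/gpu=-1 MaxTRESPerNode=gres/gpu=1

# MaxJobsPerUser=3 from the original command is left untouched; confirm the result.
sacctmgr show qos test-limit-GPUs
```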
Brian Andrus
MaxTRESPerNode will achieve the desired job distribution because the limits are applied per job, not across all of a user's jobs. But… I’ve never used this limit, and I may be interpreting the docs incorrectly. Definitely worth testing.
SelectTypeParameters=CR_LLN would help distribute the workloads across the least loaded nodes, but also wouldn’t guarantee one job per user per node.
Sebastian Smith
Seattle Children’s Hospital
DevOps Engineer, Principal
Email: sebasti...@seattlechildrens.org
Web: https://seattlechildrens.org
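For reference, a hedged sketch of how the CR_LLN suggestion above might be checked and applied. The slurm.conf values in the comments are assumptions, not taken from this cluster. Per the slurm.conf documentation, CR_LLN ranks nodes by idle CPUs rather than by free GPUs, so it only nudges jobs apart and does not enforce one job per user per node.

```shell
# Check what the cluster currently uses (scontrol show config prints slurm.conf values).
scontrol show config | grep -i '^SelectType'

# Lines one might then set in slurm.conf (assumed values, shown as comments;
# CR_LLN is combined with a resource option such as CR_Core):
#   SelectType=select/cons_tres
#   SelectTypeParameters=CR_Core,CR_LLN
```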