Hello,
I have added a new QOS with these parameters: sacctmgr add qos test-GPUs MaxJobsPerUser=6 MaxTRESPerUser=gres/gpu=1 MaxSubmitJobsPerUser=25. With it I allow each user only 6 running jobs, a total of 25 pending+running jobs, and only 1 GPU. I have applied this QOS directly to a partition in slurm.conf (see the sketch below).
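For reference, the relevant slurm.conf line looks roughly like this (the partition name and node list here are placeholders, not my real values):

    PartitionName=gpu-part Nodes=node[01-04] QOS=test-GPUs State=UP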
When a user submits to that partition requesting 2 or more GPUs, the job remains “PD” (pending) and shows “QOSMaxGRESPerUser” in the NODELIST(REASON) column. Would it be possible to reject the job directly instead of leaving it in the queue? The submit limit already behaves that way: if I submit 50 jobs, after number 25 I get the message “sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits) sbatch: error: QOSMaxSubmitJobPerUserLimit” 25 times.
Thanks.
DenyOnLimit:
https://slurm.schedmd.com/qos.html#qos_other. Setting it on your QOS will cause Slurm to reject a job at submission if it exceeds Max or Grp limits. I think setting it will achieve the behavior you're after.
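Something along these lines should do it, using the QOS name from your example:

    sacctmgr modify qos test-GPUs set Flags=DenyOnLimit

With that flag set, submissions that would exceed the QOS limits (e.g. requesting 2 GPUs against MaxTRESPerUser=gres/gpu=1) should be refused at sbatch time rather than sitting in the queue as pending.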
Sebastian Smith
Seattle Children’s Hospital
DevOps Engineer, Principal
Email: sebasti...@seattlechildrens.org
Web: https://seattlechildrens.org