[slurm-users] Enforcing -c and -t for fairshare scheduling and other settings


r

May 13, 2022, 10:55:28 AM
to slurm...@schedmd.com
Hi,

We've deployed a Slurm cluster and it works well. However, I would like to encourage users to conserve resources and to distribute jobs more fairly.

Below are some ideas I'd like to implement. Please let me know whether they are feasible and, if so, point me in the right direction, or let me know if there are better ways of achieving this goal.

I would like to:
- Require users to specify the -c and -t options, i.e. reject any job that does not specify them. Optionally also --mem, but that is a low priority for us. (See the job_submit.lua sketch after this list.)
- Forbid the use of --cpu-bind=no, or treat it as -c 64. (See the cgroup sketch after this list.)
- Set up a fairshare scheduler and assign weights to the values requested via -c and -t. (Sketch after this list.)
- Enforce the resource limits requested via -c, -t and --mem (-t and -c already work, at least without --cpu-bind=no).
- Either limit the overall number of CPU slots per partition, or test for license availability before jobs are released from the queue. This is to prevent jobs from waiting for licenses at run time and potentially being killed when the -t limit is exceeded. (Sketch after this list.)
- Ideally, force jobs to queue for a certain period of time (a small fraction of -c * -t) even if the partition still has resources available. This is to prevent large jobs from being submitted and dispatched ahead of smaller jobs, and to further reward conserving resources. (Sketch after this list.)
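
Here is roughly what I have pieced together from the documentation so far; corrections are very welcome.

For the first item, my understanding is that a job_submit.lua plugin (enabled with JobSubmitPlugins=lua in slurm.conf) can reject jobs at submission time. A minimal, untested sketch; the NO_VAL sentinels are what I believe our Slurm version uses for "unset", please verify against yours:

    -- /etc/slurm/job_submit.lua (sketch): reject jobs without -t and -c
    local NO_VAL   = 4294967294  -- "unset" sentinel for time_limit (assumption)
    local NO_VAL16 = 65534       -- "unset" sentinel for cpus_per_task (assumption)

    function slurm_job_submit(job_desc, part_list, submit_uid)
        if job_desc.time_limit == nil or job_desc.time_limit == NO_VAL then
            slurm.log_user("Please request a time limit with -t/--time")
            return slurm.ERROR
        end
        if job_desc.cpus_per_task == nil or job_desc.cpus_per_task == NO_VAL16 then
            slurm.log_user("Please request CPUs with -c/--cpus-per-task")
            return slurm.ERROR
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end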
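
For --cpu-bind=no and for actually enforcing -c/--mem at run time, I suspect the intended mechanism is cgroup confinement rather than policing the flag itself, i.e. let the kernel pin each job to the cores and memory it was allocated. Something like (paraphrased from the docs, not yet tried here):

    # slurm.conf
    TaskPlugin=task/cgroup,task/affinity
    ProctrackType=proctrack/cgroup
    JobAcctGatherType=jobacct_gather/cgroup

    # cgroup.conf
    ConstrainCores=yes
    ConstrainRAMSpace=yes
    ConstrainSwapSpace=yes

If ConstrainCores works the way I hope, --cpu-bind=no could no longer let a job spill onto CPUs it did not request, so treating it as -c 64 might become unnecessary.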
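
For fairshare, I assume the multifactor priority plugin plus slurmdbd accounting is the way to go, with TRESBillingWeights to weight what -c and --mem request (the -t side enters through the run time that actually gets charged). The partition name and all of the weights below are placeholders:

    # slurm.conf (requires AccountingStorageType=accounting_storage/slurmdbd)
    PriorityType=priority/multifactor
    PriorityDecayHalfLife=7-0
    PriorityFavorSmall=YES
    PriorityWeightFairshare=100000
    PriorityWeightAge=1000
    PriorityWeightJobSize=10000
    PriorityWeightPartition=0
    PriorityWeightQOS=0

    # charge 1 "billing unit" per CPU and per 4 GB of memory (made-up weights)
    PartitionName=main Nodes=... TRESBillingWeights="CPU=1.0,Mem=0.25G"

Per-account shares would then presumably be set with something like "sacctmgr modify account <acct> set fairshare=<n>".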
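
For licenses, it looks like slurm.conf can define a local license pool that jobs request with -L, and the scheduler holds jobs until enough seats are free; as far as I can tell Slurm only counts what is requested through -L and never queries the license server, so users would have to request them honestly. Alternatively (or additionally), a QOS attached to the partition could cap the total CPUs in use. The names and numbers below are made up:

    # slurm.conf: an 8-seat license pool, requested by jobs with "-L matlab:1"
    Licenses=matlab:8

    # cap the CPUs usable through the partition with a partition QOS
    # (created once with sacctmgr; needs slurmdbd):
    #   sacctmgr add qos cpu640
    #   sacctmgr modify qos cpu640 set GrpTRES=cpu=640
    PartitionName=main Nodes=... QOS=cpu640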
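
For the last item I have not found a built-in knob. My best (and least certain) idea is to set begin_time from the same job_submit.lua, assuming that field is actually writable from the plugin; the scaling factor is invented:

    -- inside slurm_job_submit(), after the checks above (untested idea):
    -- delay dispatch by ~1 minute per 100 requested CPU-minutes, capped at 1 hour
    local cpus  = job_desc.cpus_per_task   -- ignores --ntasks; single-task jobs assumed
    local mins  = job_desc.time_limit      -- minutes
    local delay = math.min(3600, math.floor(cpus * mins * 60 / 100))
    if job_desc.begin_time == nil or job_desc.begin_time == 0 then
        job_desc.begin_time = os.time() + delay
    end

Does anything like this sound workable, or is there a cleaner way to discourage large last-minute jobs from jumping the queue?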

Many thanks,
-R

