[slurm-users] backfill on overlapping partitions problem


Andrej Filipcic

Oct 26, 2021, 10:42:04 AM
to slurm...@lists.schedmd.com

Hi,

We have a strange problem with backfilling. There is a large partition
"cpu" and an overlapping partition "largemem" whose nodes are a subset
of the "cpu" nodes.

Now, user A is submitting low-priority jobs to "cpu" and user B
high-priority jobs to "largemem". If there are queued jobs in
"largemem" (draining nodes there), slurmctld never backfills "cpu". In
the extreme case, the non-overlapping "cpu" nodes sit empty until all
the higher-priority jobs in "largemem" are running.

Any hint or workaround here? Backfill works quite well if all the jobs
are submitted to the "cpu" partition. User A typically has smaller and
shorter jobs, which are good candidates for backfilling.

We use these settings with Slurm:
PriorityType=priority/multifactor
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_CORE_MEMORY,CR_CORE_DEFAULT_DIST_BLOCK
SchedulerParameters=bf_max_job_test=2000,bf_window=1440,default_queue_depth=1000,bf_continue
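
Roughly, the partition layout looks like this (node ranges and
PriorityTier values here are only illustrative, not the exact
configuration):

  PartitionName=cpu      Nodes=node[001-120] Default=YES PriorityTier=1
  PartitionName=largemem Nodes=node[081-120]             PriorityTier=10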

Best regards,
Andrej

--
_____________________________________________________________
prof. dr. Andrej Filipcic, E-mail: Andrej....@ijs.si
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674 Fax: +386-1-425-7074
-------------------------------------------------------------


Matt Jay

Oct 26, 2021, 1:05:35 PM
to Slurm User Community List
Hi Andrej,

Take a look at this, and see if it matches up with your issue (I'm not 100% sure based on your description):
https://bugs.schedmd.com/show_bug.cgi?id=3881

The takeaway from that is the following (quote from SchedMD): " If there are _any_ jobs pending (regardless of the reason for the job still pending) in a partition with a higher Priority, no jobs from a lower Priority will be launched on nodes that are shared in common."

The above is apparently pretty intrinsic to how Slurm scheduling works, and is unlikely to change.

We worked around this by keeping all partitions at the same priority and using QOS instead for priority/preemption. That has the unfortunate side effect of tying up your QOSes for that purpose, but it works for our situation; a rough sketch is below.
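
For what it's worth, a minimal sketch of that kind of setup (node ranges, partition/QOS names, and values below are just placeholders, not our production config):

  # slurm.conf: partitions share the same PriorityTier; QOS drives preemption
  PreemptType=preempt/qos
  PreemptMode=REQUEUE
  PartitionName=cpu      Nodes=node[001-120] Default=YES PriorityTier=1
  PartitionName=largemem Nodes=node[081-120] PriorityTier=1 AllowQos=normal,largemem

  # sacctmgr: create a high-priority QOS that can preempt the default one
  sacctmgr add qos largemem
  sacctmgr modify qos largemem set Priority=1000 Preempt=normal PreemptMode=requeue

Users then submit their large-memory work with --qos=largemem.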

Best of luck,
-Matt

Matt Jay
Sr. HPC Systems Engineer - Hyak
Research Computing
University of Washington Information Technology