[slurm-users] clarification on slurm scheduler and the "nice" parameter

Matteo F

Apr 14, 2020, 3:01:08 AM
to Slurm User Community List
Hello there,
I am having trouble understanding the Slurm scheduler with regard to the "nice" parameter.

I have two types of job: a low-priority one that uses 4 CPUs (--nice=20), and a high-priority one that uses 24 CPUs (--nice=10).
When I submit, say, 50 low-priority jobs, only 6 run - this is fine, since each job uses 4 CPUs and the node has 24.
However, when I submit my high-priority job, which needs all 24 CPUs, things get strange.

What I was expecting:
- Slurm stops starting queued low-priority jobs (no more PD -> R transitions)
- waits until 24 CPUs are free (in this case, until no jobs are running)
- runs the high-priority job
- once that job completes, resumes starting the low-priority jobs as usual

What I observed instead:
- Slurm keeps starting queued jobs as if I hadn't specified a nice parameter.


(partial) slurm config:
SwitchType=switch/none
TaskPlugin=task/none
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
NodeName=node01 CPUs=24 RealMemory=120000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2   State=UNKNOWN

Low priority job:
#SBATCH --job-name=task4
#SBATCH --ntasks=4
#SBATCH --mem=1gb
#SBATCH --time=10:00:00
#SBATCH --output=%j.out
#SBATCH --error=%j.err
#SBATCH --partition=ogre
#SBATCH --account=ogre
#SBATCH --nice=20

High priority job:
#SBATCH --job-name=task24
#SBATCH --ntasks=24
#SBATCH --mem=1gb
#SBATCH --time=10:00:00
#SBATCH --output=%j.out
#SBATCH --error=%j.err
#SBATCH --partition=ogre
#SBATCH --account=ogre
#SBATCH --nice=10

Do you have any idea of what I am missing?

Thanks a lot.
Matteo

Lyn Gerner

Apr 14, 2020, 8:46:17 AM
to Slurm User Community List
Hi Matteo,

Hard to say without seeing your priority config values, but I'm guessing you want to take a look at https://slurm.schedmd.com/priority_multifactor.html.

Regards,
Lyn

Matteo F

Apr 14, 2020, 9:19:35 AM
to Slurm User Community List
Hello Lyn, thanks for your reply.
I checked my configuration; PriorityType was initially set to "PriorityType=priority/basic", so my tests refer to that configuration.
After your post I set it to "PriorityType=priority/multifactor" and ran the tests again: the results are the same.
I was hoping for a quick fix, but it seems I need to dig into the docs and play with the "factors".
Matteo

mercan

Apr 14, 2020, 9:57:34 AM
to Slurm User Community List, Matteo F
Hi;

Did you restart slurmctld after changing
"PriorityType=priority/multifactor"?

Also, your nice values are too small. This is not the Unix nice: its range is
+/-2147483645, and it competes with the other priority factors in the
priority formula. See the priority factor formula at
https://slurm.schedmd.com/priority_multifactor.html

Because of that, I think you should test with a very large nice value for the
low-priority jobs, such as --nice=100000.
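The point about nice competing with the other factors can be sketched with a toy version of the multifactor formula (weighted factors summed, nice subtracted at the end); the weights and factors below are made up for illustration and are not taken from the thread's configuration:

```python
# Toy sketch of Slurm's multifactor priority (simplified):
#   job_priority = sum(weight_i * factor_i) - nice
# Each factor is normalized to [0, 1]; the weights are large
# site-configured integers; nice is subtracted from the weighted sum as-is.

def job_priority(weights, factors, nice=0):
    """Simplified multifactor priority: weighted factors minus nice."""
    return sum(w * factors.get(name, 0.0) for name, w in weights.items()) - nice

# Hypothetical site weights and per-job factors
weights = {"age": 1000, "fairshare": 10000, "jobsize": 1000}
factors = {"age": 0.5, "fairshare": 0.75, "jobsize": 0.25}

print(job_priority(weights, factors, nice=20))      # 8230.0: nice=20 is lost in the noise
print(job_priority(weights, factors, nice=100000))  # -91750.0: a large nice dominates
```

With weights in the thousands, a nice of 10 or 20 barely moves the result, which is consistent with the behaviour Matteo observed.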

Regards;

Ahmet M.


On 14.04.2020 16:18, Matteo F wrote:

Matteo F

Apr 15, 2020, 2:56:15 AM
to Slurm User Community List
Hi, 
just wanted to update this thread in case someone has a similar problem. 

I managed to achieve this result by:
- changing SelectTypeParameters=CR_Core_Memory to SelectTypeParameters=CR_CPU_Memory
- giving the low-priority jobs a very high "nice" value, near the top of the range

With this configuration, when I submit a high-priority job, Slurm stops starting the other already-queued low-priority jobs and runs my new job instead.
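For reference, the two changes boil down to something like the following; the exact nice value is illustrative, it just needs to be large relative to the site's priority weights:

```
# slurm.conf: allocate by logical CPU (thread) instead of by core
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory

# low-priority batch script: a nice value near the top of the +/-2147483645 range
#SBATCH --nice=2000000000
```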

Thanks everyone
Matteo