[slurm-users] multifactor priority calculation


z1...@arcor.de

Jun 13, 2022, 8:53:39 AM
to slurm-users

Dear all,

I noticed differing priority calculations while running a pipeline; the
relevant settings are, for example:

PriorityType=priority/multifactor
PriorityWeightJobSize=100000
AccountingStorageTRES=cpu,mem,gres/gpu
PriorityWeightTRES=cpu=1000,mem=2000,gres/gpu=3000

No age factor or any other factor from the plugin is configured.
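
For reference, as I understand the multifactor plugin, with all other
weights at zero the priority should reduce to roughly:

Job_priority = PriorityWeightJobSize * job_size_factor
             + 1000 * TRES_factor_cpu
             + 2000 * TRES_factor_mem
             + 3000 * TRES_factor_gpu

where each TRES factor is the amount of that TRES the job requests on a
partition divided by the total configured in the partition (so at most 1.0).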


The calculated priority for memory and cpus:

    mem  cpus  priority
   1024     1     25932
 123904    14     38499
 251904    28     20652
 251904    28     14739


No GRES/GPU was used by these jobs/instances.


Does anyone know why the priority changes with the same CPU and memory input?

With these settings the priority should be descending: the highest
priority for mem 251904 with 28 CPUs and the lowest for mem 1024 with 1 CPU.
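
As a worked example (with a hypothetical partition of 28 CPUs and
256000 MB of memory in total), the TRES contribution alone would be:

mem 251904, cpus 28:  1000*(28/28) + 2000*(251904/256000) ~= 2968
mem 1024,   cpus 1:   1000*(1/28)  + 2000*(1024/256000)   ~= 44

so the larger request should never end up with the lower priority.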


Many thanks,

Mike

Lyn Gerner

Jun 13, 2022, 2:51:08 PM
to Slurm User Community List
Mike, it feels like there may be other PriorityWeight terms that are non-zero in your config. QoS or partition-related, perhaps?
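
A quick way to check is

scontrol show config | grep -i PriorityWeight

which lists every weight the controller actually has set.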

Regards,
Lyn

z1...@arcor.de

Jun 13, 2022, 4:03:39 PM
to slurm...@lists.schedmd.com

Hello Lyn,
only the priority settings I gave as an example are in the Slurm config.

Maybe I have found the missing piece.
It looks like the priority (for some jobs?) in the Slurm (19.05.5)
database is not updated. I retrieve these values from slurmdbd via
pyslurm.
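
Roughly, the comparison works like this (a simplified sketch; the
dict-style pyslurm calls and the field names are how I read the
19.05-era API and may differ between versions):

import pyslurm

# Jobs as the controller currently sees them (squeue's view).
ctl_jobs = pyslurm.job().get()          # dict keyed by job id
# The same jobs as recorded by slurmdbd.
db_jobs = pyslurm.slurmdb_jobs().get()  # dict keyed by job id

for jid, job in ctl_jobs.items():
    if jid not in db_jobs:
        print("KeyError! job id from squeue %s not in db" % jid)
        continue
    db_prio = db_jobs[jid].get("priority")
    ctl_prio = job.get("priority")
    state = "up-to-date" if db_prio == ctl_prio else "outdated"
    print("job priority for id %s %s %s match with %s" % (jid, state, db_prio, ctl_prio))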

This would be a problem for my purposes; the priority values from squeue
seem to be correct.

Is this a bug?

Regards,
Mike

Williams, Gareth (IM&T, Black Mountain)

Jun 13, 2022, 7:10:08 PM
to Slurm User Community List
Perhaps run 'sprio -l' and 'sprio -lw' to get more insight into the current priority calculation for pending jobs.

Gareth

z1...@arcor.de

Jun 14, 2022, 3:28:04 AM
to slurm...@lists.schedmd.com
Thank you.

The output from sprio -lw:
---
  JOBID PARTITION USER PRIORITY  SITE  AGE  ASSOC  FAIRSHARE  JOBSIZE  PARTITION  QOS  NICE  TRES
Weights                             1    0      0          0   100000          0    0        cpu=1000,mem=2000
---

The GPU is now removed from the TRES weights; I changed this setting.

Checking the DB data against the squeue data, the results are not encouraging.

It seems that the priority changes over time but is not updated in the
DB for jobs that are already pending.

---
job priority for id 299185 up-to-date 38499 match with 38499 - req_cpus: 14 req_mem: 123904
job priority for id 299187 up-to-date 38499 match with 38499 - req_cpus: 14 req_mem: 123904
job priority for id 299189 up-to-date 38499 match with 38499 - req_cpus: 14 req_mem: 123904
<cut>
KeyError! job id from squeue 299250 not in db
KeyError! job id from squeue 299251 not in db
job priority for id 299177 up-to-date 25932 match with 25932 - req_cpus: 1 req_mem: 1024
job priority for id 299179 up-to-date 25932 match with 25932 - req_cpus: 1 req_mem: 1024
job priority for id 299181 up-to-date 25932 match with 25932 - req_cpus: 1 req_mem: 1024
<cut>
job priority for id 299248 outdated 25932 to 17282 - req_cpus: 1 req_mem: 1024
job priority for id 299249 outdated 25932 to 17282 - req_cpus: 1 req_mem: 1024
job priority for id 299252 outdated 25932 to 17282 - req_cpus: 1 req_mem: 1024
job priority for id 299178 outdated 38499 to 25581 - req_cpus: 14 req_mem: 123904
job priority for id 299180 outdated 38499 to 25581 - req_cpus: 14 req_mem: 123904
---


Following job id 299178 as a short example:
---
[06:33:00] job priority for id 299178 up-to-date 38499 match with 38499 - req_cpus: 14 req_mem: 123904
[06:33:16] job priority for id 299178 up-to-date 38499 match with 38499 - req_cpus: 14 req_mem: 123904
[06:33:28] job priority for id 299178 up-to-date 38499 match with 38499 - req_cpus: 14 req_mem: 123904
[06:33:47] job priority for id 299178 up-to-date 38499 match with 38499 - req_cpus: 14 req_mem: 123904
[06:34:05] job priority for id 299178 up-to-date 38499 match with 38499 - req_cpus: 14 req_mem: 123904
[06:36:10] job priority for id 299178 up-to-date 38499 match with 38499 - req_cpus: 14 req_mem: 123904
[06:43:29] job priority for id 299178 outdated 38499 to 25581 - req_cpus: 14 req_mem: 123904
[06:43:29] job id 299178 pending
[06:43:49] job priority for id 299178 outdated 38499 to 25581 - req_cpus: 14 req_mem: 123904
[06:43:49] job id 299178 running
[06:46:30] job priority for id 299178 outdated 38499 to 25581 - req_cpus: 14 req_mem: 123904
[06:46:42] job priority for id 299178 outdated 38499 to 25581 - req_cpus: 14 req_mem: 123904
[06:47:03] job priority for id 299178 outdated 38499 to 25581 - req_cpus: 14 req_mem: 123904
---

For other jobs I can see that the priority value is never updated in the DB at all.
Why are the values changing in the first place?


Regards,
Mike

Loris Bennett

Jun 14, 2022, 4:12:19 AM
to Slurm User Community List
I may be stating the obvious, but if a user has running jobs consuming
resources, then the priority of the waiting jobs for that user will
fall. If not, the priority will stay the same.
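
You can watch the accumulated usage and the resulting fair-share factor
per association with, e.g.,

sshare -l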

Cheers,

Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris....@fu-berlin.de
