[slurm-users] Understanding fairshare factor


Michał Kadlof

Jan 12, 2022, 6:01:50 AM
to Slurm User Community List

Hello,

I'm trying to understand the behavior of the fairshare factor. I set up Munin monitoring for several accounts to observe how it changes over time, and the changes are not clear to me.

Some background:
My users are split into two groups: sfglab and faculty.
In sfglab everyone is equal; in faculty they are additionally split into project accounts, within which they are also equal.

For example:
root
    sfglab
        sfglab_user_1
        sfglab_user_2
        ...
    faculty
        project_1
            faculty_user_1
            faculty_user_2
        project_2
            faculty_user_1
            faculty_user_3
        ...

I have 2 particularly active users who run large jobs in sfglab, while the activity of faculty users varies widely, from very active to dead souls.

Here is the fairshare factor for the last week:

https://i.imgur.com/2sfOUFn.png


I would expect rather smooth changes instead of those abrupt jumps.
I would also expect to see small continuous changes from PriorityDecayHalfLife, which is set to two weeks, with priorities recalculated every 5 minutes.
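
To put a number on the decay expected from these settings, here is a minimal sketch. It assumes plain exponential decay with the configured half-life (the real plugin may compute this slightly differently); the function name and the sample usage figures are made up for illustration.

```python
# Assumption: recorded usage decays as a plain exponential with the
# configured half-life. Names and sample numbers are illustrative only.
HALF_LIFE_S = 14 * 24 * 3600   # PriorityDecayHalfLife = 14-00:00:00
CALC_PERIOD_S = 5 * 60         # PriorityCalcPeriod    = 00:05:00


def decay_factor(elapsed_s, half_life_s=HALF_LIFE_S):
    """Fraction of recorded usage that remains after elapsed_s seconds."""
    return 0.5 ** (elapsed_s / half_life_s)


per_step = decay_factor(CALC_PERIOD_S)

# Hypothetical raw usage for two sibling accounts.
usage_a, usage_b = 5000.0, 3000.0
ratio_before = usage_a / usage_b
ratio_after = (usage_a * per_step) / (usage_b * per_step)

print(f"usage remaining after one 5-minute step: {per_step:.6f}")
print(f"usage ratio before / after the step: {ratio_before:.4f} / {ratio_after:.4f}")
```

Under this assumption each 5-minute step removes only about 0.02% of the recorded usage, and because every account is scaled by the same factor, decay alone does not change the ratio between two accounts.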

It would be great if someone could comment on that before my users start complaining about the low priority of their jobs.

Here is my priority config:

PriorityParameters      = (null)
PrioritySiteFactorParameters = (null)
PrioritySiteFactorPlugin = (null)
PriorityDecayHalfLife   = 14-00:00:00
PriorityCalcPeriod      = 00:05:00
PriorityFavorSmall      = No
PriorityFlags           =
PriorityMaxAge          = 7-00:00:00
PriorityUsageResetPeriod = NONE
PriorityType            = priority/multifactor
PriorityWeightAge       = 100000
PriorityWeightAssoc     = 0
PriorityWeightFairShare = 200000
PriorityWeightJobSize   = 0
PriorityWeightPartition = 0
PriorityWeightQOS       = 0
PriorityWeightTRES      = (null)
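
With only the age and fairshare weights non-zero, the job priority from this config reduces to a two-term weighted sum. A rough sketch, assuming the age factor grows linearly up to PriorityMaxAge and ignoring any site factor or nice value; the function name and the sample inputs are illustrative only:

```python
# Weights and limits copied from the config above.
WEIGHT_AGE = 100000        # PriorityWeightAge
WEIGHT_FAIRSHARE = 200000  # PriorityWeightFairShare
MAX_AGE_DAYS = 7.0         # PriorityMaxAge = 7-00:00:00


def job_priority(age_days, fairshare_factor):
    """Weighted sum of the two non-zero factors (both in the range 0..1).

    Assumption: the age factor rises linearly and saturates at MAX_AGE_DAYS.
    """
    age_factor = min(age_days / MAX_AGE_DAYS, 1.0)
    return WEIGHT_AGE * age_factor + WEIGHT_FAIRSHARE * fairshare_factor


# Same job age, fairshare factor at the two levels seen in the plot.
high = job_priority(age_days=7, fairshare_factor=0.9)
low = job_priority(age_days=7, fairshare_factor=0.2)
print(f"priority at fairshare 0.9: {high:.0f}")
print(f"priority at fairshare 0.2: {low:.0f}")
```

Since the fairshare weight is twice the age weight, a jump of the fairshare factor between the two levels moves the priority by more than a full week of queue age can.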


--
best regards | pozdrawiam serdecznie
Michał Kadlof

Ewan Roche

Jan 14, 2022, 10:14:40 AM
to Slurm User Community List
Hello Michał,
the behaviour is what I’d expect from the fair-tree algorithm which is based on a binary search. Fair-tree has been the default for the past few Slurm releases.

Here the algorithm has to decide which of sfglab or faculty (they’re on the same level in the hierarchy) has the higher priority and so goes to the front of the queue. It’s more or less 1 or 0 (numerically 0.9 vs 0.2 in your results).

When sfglab is at 0.9 it means that their accumulated usage, taking the decay into account, is less than that of all the faculty users combined; when they have used more than all the faculty it flips to 0.2.

If you were looking at the level fair-share then there might be a more gradual change but ultimately it’s a question of, at any one time, who has a higher priority. With only two accounts at this level in competition the result will always look extreme as it flips between who is ahead and who is behind.
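
The flip can be reproduced with a small toy model. This is only a sketch of the ranking idea, not the plugin's actual code: it assumes level fair-share is the normalized shares divided by the normalized usage among siblings, that siblings are visited depth-first in descending order of that value, and that the final factor is rank / number of users. Ties are not handled, and the tree, shares, and usage numbers are hypothetical.

```python
import math


def total_usage(node):
    """Usage of a user, or summed usage of everything below an account."""
    if "children" not in node:
        return node["usage"]
    return sum(total_usage(c) for c in node["children"].values())


def level_fairshare(node, siblings):
    """Normalized shares divided by normalized usage among siblings."""
    shares = node["shares"] / sum(s["shares"] for s in siblings)
    all_usage = sum(total_usage(s) for s in siblings)
    usage = total_usage(node)
    if usage == 0 or all_usage == 0:
        return math.inf  # no usage yet: best possible level fair-share
    return shares / (usage / all_usage)


def rank_users(account, ordered=None):
    """Depth-first walk, visiting siblings by descending level fair-share."""
    ordered = [] if ordered is None else ordered
    siblings = list(account["children"].values())
    by_lf = sorted(account["children"].items(),
                   key=lambda kv: level_fairshare(kv[1], siblings),
                   reverse=True)
    for name, child in by_lf:
        if "children" in child:
            rank_users(child, ordered)
        else:
            ordered.append(name)
    return ordered


def fairshare_factors(root):
    ordered = rank_users(root)
    n = len(ordered)
    return {name: (n - i) / n for i, name in enumerate(ordered)}


def make_tree(sfglab_usage, faculty_usage):
    """Hypothetical two-account tree; usage split unevenly to avoid ties."""
    def user(usage):
        return {"shares": 1, "usage": usage}
    return {"children": {
        "sfglab": {"shares": 1, "children": {
            "sfglab_user_1": user(0.6 * sfglab_usage),
            "sfglab_user_2": user(0.4 * sfglab_usage)}},
        "faculty": {"shares": 1, "children": {
            "project_1": {"shares": 1, "children": {
                "faculty_user_1": user(0.7 * faculty_usage),
                "faculty_user_2": user(0.3 * faculty_usage)}}}},
    }}


before = fairshare_factors(make_tree(sfglab_usage=100, faculty_usage=200))
after = fairshare_factors(make_tree(sfglab_usage=300, faculty_usage=200))
print("sfglab behind faculty in usage:", before)
print("sfglab ahead of faculty in usage:", after)
```

In this toy model every sfglab user outranks every faculty user while sfglab's usage is below faculty's, and the whole group drops behind as soon as it crosses over, so the per-user factor jumps rather than drifts.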

The original presentation about fair-tree from Ryan Cox and Levi Morrison is well worth reading and is at

https://slurm.schedmd.com/SUG14/fair_tree.pdf



Ewan Roche

Division Calcul et Soutien à la Recherche
UNIL | Université de Lausanne

