[slurm-users] memory limits:: why job is not killed but oom-killer steps up?


Adrian Sevcenco

Jan 12, 2022, 5:05:18 PM
to Slurm User Community List

Hi! I have a problem with enforcing memory limits.
I'm using cgroups to enforce the limits, and I expected that when the
cgroup memory limit is reached the job would be killed.
Instead I see a lot of oom-killer reports in the log, and each one acts only on a
single process from the cgroup.

Did I miss anything in my configuration? I have the following:

SelectType=select/cons_res
SelectTypeParameters=CR_CPU_MEMORY,CR_LLN

The partition has:
DefMemPerCPU=3950 MaxMemPerCPU=4010 (I understand that these are in MiB, and physically I have 4 GiB/thread)
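
As a quick sanity check of those numbers, the arithmetic can be sketched as follows. It assumes the values are in MiB and that each hardware thread has exactly 4 GiB behind it, as stated above; the variable names are only illustrative.

```python
# Sanity check of the partition memory settings quoted above.
# Assumption: DefMemPerCPU / MaxMemPerCPU are in MiB and each
# hardware thread has exactly 4 GiB of physical memory.
MIB_PER_GIB = 1024

physical_mib_per_thread = 4 * MIB_PER_GIB
def_mem_per_cpu = 3950  # DefMemPerCPU, MiB
max_mem_per_cpu = 4010  # MaxMemPerCPU, MiB

# Memory per thread left for the OS and daemons at each setting.
headroom_default = physical_mib_per_thread - def_mem_per_cpu
headroom_max = physical_mib_per_thread - max_mem_per_cpu

print(f"physical per thread:      {physical_mib_per_thread} MiB")
print(f"headroom at DefMemPerCPU: {headroom_default} MiB")
print(f"headroom at MaxMemPerCPU: {headroom_max} MiB")
```

Both limits stay below the 4096 MiB that is physically available per thread, so the limits themselves do not overcommit the node.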

cat cgroup.conf
CgroupAutomount=yes
TaskAffinity=no
ConstrainCores=yes
ConstrainRAMSpace=yes

ProctrackType=proctrack/cgroup

JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=task=15,filesystem=120
JobAcctGatherParams=UsePss

TaskPlugin=task/affinity,task/cgroup
TaskPluginParam=autobind=threads

Is the problem with my expectation that I should not see the oom-killer,
or with my configuration?

Thank you!
Adrian

Hermann Schwärzler

Jan 13, 2022, 4:00:06 AM
to slurm...@lists.schedmd.com
Hi Adrian,

ConstrainRAMSpace=yes

has the effect that when the memory the job requested is exhausted the
processes of the job will start paging/swapping.

If you want to stop jobs that use more memory (RSS to be precise) than
they requested, you have to add this to your cgroup.conf:

ConstrainSwapSpace=yes
AllowedSwapSpace=0
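
To see the difference between the two configurations at a glance, here is a minimal sketch that checks a cgroup.conf text for the three relevant settings. The helpers `parse_conf` and `swap_is_blocked` are hypothetical names made up for this illustration and are not part of Slurm; the parser only handles plain Key=Value lines and treats a missing AllowedSwapSpace as 0.

```python
def parse_conf(text):
    """Parse simple Key=Value lines, ignoring blanks and # comments.

    Hypothetical helper for illustration; not a full cgroup.conf parser.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def swap_is_blocked(settings):
    """True if RAM and swap are both constrained and no extra swap is allowed."""
    return (
        settings.get("constrainramspace", "no").lower() == "yes"
        and settings.get("constrainswapspace", "no").lower() == "yes"
        and float(settings.get("allowedswapspace", "0")) == 0
    )


# The cgroup.conf from the original question.
original = """
CgroupAutomount=yes
TaskAffinity=no
ConstrainCores=yes
ConstrainRAMSpace=yes
"""

# The same file with the two suggested lines appended.
suggested = original + "ConstrainSwapSpace=yes\nAllowedSwapSpace=0\n"

print(swap_is_blocked(parse_conf(original)))   # False
print(swap_is_blocked(parse_conf(suggested)))  # True
```

With only ConstrainRAMSpace=yes the check fails, because swap use is left unconstrained; with the two extra lines it passes.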

Regards,
Hermann

Adrian Sevcenco

Jan 13, 2022, 5:50:50 AM
to slurm...@lists.schedmd.com
On 13.01.2022 10:59, Hermann Schwärzler wrote:
> Hi Adrian,
Hi!

> ConstrainRAMSpace=yes
>
> has the effect that when the memory the job requested is exhausted the processes of the job will start paging/swapping.
>
> If you want to stop jobs that use more memory (RSS to be precise) than they requested, you have to add this to your
> cgroup.conf:
>
> ConstrainSwapSpace=yes
> AllowedSwapSpace=0
Ooh, thanks a lot! Now I see that only AllowedSwapSpace has the comment:
"If the limit is exceeded, the job steps will be killed"

Thanks a lot!!
Adrian
--
----------------------------------------------
Adrian Sevcenco, Ph.D. |
Institute of Space Science - ISS, Romania |
adrian.sevcenco at {cern.ch,spacescience.ro} |
----------------------------------------------