Drexel Internal Data
Keep in mind that sacct results for memory usage are not accurate for Out Of Memory (OoM) jobs. The job is typically terminated before the next sacct polling period, and also before it reaches its full memory allocation. So I wouldn't trust any of the memory usage results for a job that was terminated by OoM. sacct just can't pick up a sudden memory spike like that, and even if it did, it would not record the correct peak memory, because the job was terminated before reaching that point.
-Paul Edmon-
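To illustrate the point above, here is a minimal sketch in Python. It is not Slurm code and not from the original message: the function names, the 30-second polling interval, the 4000 MB limit, and the shape of the memory curve are all assumptions chosen for illustration. It only shows why a value sampled at fixed intervals can sit far below the real peak when the job is killed between two polls.

```python
def build_usage_trace(limit_mb, steady_mb=500, steady_seconds=40, growth_mb_per_s=1000):
    """Hypothetical per-second memory trace (MB) for a job that idles, then spikes.

    The trace ends at the second the job hits its limit, which is where an
    OOM kill would stop it in this toy model.
    """
    usage = [steady_mb] * steady_seconds
    current = steady_mb
    while current < limit_mb:
        current = min(current + growth_mb_per_s, limit_mb)
        usage.append(current)
    return usage


def polled_max(usage, poll_interval_s):
    """Largest value seen when sampling only every poll_interval_s seconds."""
    return max(usage[::poll_interval_s])


# Assumed values: 4000 MB limit, accounting poll every 30 s.
trace = build_usage_trace(limit_mb=4000)
recorded = polled_max(trace, poll_interval_s=30)
true_peak = max(trace)

print(f"recorded by polling: {recorded} MB, peak at kill: {true_peak} MB")
```

In this toy trace the polls land at 0 s and 30 s, both during the steady phase, and the job is gone before the poll at 60 s, so the recorded maximum never sees the spike.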
On Mar 15, 2021, at 12:53 PM, Chin,David <dw...@drexel.edu> wrote:
------------------------------------------------------------
Chad DeWitt, CISSP | University Research Computing
UNC Charlotte | Office of OneIT
ccde...@uncc.edu | https://oneit.uncc.edu
------------------------------------------------------------
Hi, Sean:
Slurm version 20.02.6 (via Bright Cluster Manager)
ProctrackType=proctrack/cgroup
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherParams=UsePss,NoShared
I just skimmed https://bugs.schedmd.com/show_bug.cgi?id=5549 because this job appeared to have left two slurmstepd zombie processes running at 100% CPU each, and I changed the settings to:
ProctrackType=proctrack/cgroup
JobAcctGatherType=jobacct_gather/cgroup
JobAcctGatherParams=UsePss,NoShared,NoOverMemoryKill
I have asked the user to re-run the job, but that has not happened yet.
cgroup.conf:
CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=yes
TaskAffinity=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=no
ConstrainDevices=yes
ConstrainKmemSpace=yes
AllowedRamSpace=100.00
AllowedSwapSpace=0.00
MinKmemSpace=200
MaxKmemPercent=100.00
MemorySwappiness=100
MaxRAMPercent=100.00
MaxSwapPercent=100.00
MinRAMSpace=200