[slurm-users] High log rate on messages like "Node nodeXX has low real_memory size"


Per Lönnborg

May 12, 2022, 7:33:38 AM
to slurm...@lists.schedmd.com
Greetings,

is there a way to lower the log rate on error messages in slurmctld for nodes with hardware errors? 

We see for example this for a node that has DIMM errors:

[2022-05-12T07:07:34.757] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:35.760] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:36.763] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:37.766] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:38.769] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:39.773] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:40.776] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:41.779] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:42.781] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:45.143] error: Node node37 has low real_memory size (257642 < 257660)

The log warning is correct, since the node does have DIMM errors, but that's one log entry per second. Such a high log rate doesn't seem right?

Thanks,
/ Per Lonnborg





Bjørn-Helge Mevik

May 12, 2022, 8:19:22 AM
to slurm...@schedmd.com
Per Lönnborg <per...@passagen.se> writes:

> Greetings,

God dag!

> is there a way to lower the log rate on error messages in slurmctld for nodes with hardware errors?

You don't say which version of Slurm you are running, but I think this
was changed in 21.08, so the node will only try to register once if it
has too little memory, thus only giving one such message. (The node
will then have state "inval" in sinfo.)

--
Cheers,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo


Per Lönnborg

May 12, 2022, 8:41:27 AM
to slurm...@lists.schedmd.com

Ok, that sounds promising!

I "forgot" to mention our version because it's a bit embarrassing - 19.05.8...

Thanks,
/ Per

Bjørn-Helge Mevik

May 12, 2022, 8:57:36 AM
to slurm...@schedmd.com
Per Lönnborg <per.lo...@fra.se> writes:

> I "forgot" to mention our version because it's a bit embarrassing - 19.05.8...

Haha! :D

--
B/H

Paul Edmon

May 12, 2022, 9:41:59 AM
to slurm...@lists.schedmd.com

They fixed this in newer versions of Slurm. We had the same issue with older versions, so we had to run with the config_override option on to keep the logs quiet. They changed the way logging is done in the more recent releases, and it's not as chatty.

-Paul Edmon-
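
For anyone stuck on an older release in the meantime, one stopgap is to collapse the repeated lines when reading the log rather than changing what slurmctld writes. The following is only a minimal Python sketch, assuming the "[timestamp] message" line format shown in the original post; the function name collapse_repeats and the regular expression are illustrative and not part of Slurm.

```python
import re

# Assumed line format, taken from the log excerpt in the original post:
# [2022-05-12T07:07:34.757] error: Node node37 has low real_memory size (257642 < 257660)
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")


def collapse_repeats(lines):
    """Group identical messages and report how often each occurred.

    Returns a list of (message, count, first_timestamp, last_timestamp)
    tuples in the order each message was first seen. Lines that do not
    match the assumed format are skipped.
    """
    seen = {}  # message -> [message, count, first_ts, last_ts]
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        ts, msg = match.group("ts"), match.group("msg")
        if msg in seen:
            seen[msg][1] += 1   # one more occurrence
            seen[msg][3] = ts   # update the last-seen timestamp
        else:
            seen[msg] = [msg, 1, ts, ts]
    return [tuple(entry) for entry in seen.values()]


# Three of the lines from the original post, used as sample input.
sample = [
    "[2022-05-12T07:07:34.757] error: Node node37 has low real_memory size (257642 < 257660)",
    "[2022-05-12T07:07:35.760] error: Node node37 has low real_memory size (257642 < 257660)",
    "[2022-05-12T07:07:36.763] error: Node node37 has low real_memory size (257642 < 257660)",
]

for msg, count, first, last in collapse_repeats(sample):
    print(f"{count}x between {first} and {last}: {msg}")
```

This only tidies up what you read; it does not reduce what slurmctld writes to disk, so log rotation still matters until the upgrade.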
