[mej/nhc] 73d3c3: lbnl_cmd.nhc: Remove line numbers from dmesg check

Michael Jennings
Sep 18, 2023, 5:27:06 PM
to nhc-...@lbl.gov
Branch: refs/heads/mej/fix/143-nuke-dmesg-line-numbers
Home: https://github.com/mej/nhc
Commit: 73d3c399b607461b0ea5eebf8d08ecea08d3ecd9
https://github.com/mej/nhc/commit/73d3c399b607461b0ea5eebf8d08ecea08d3ecd9
Author: Michael Jennings <m...@lanl.gov>
Date: 2023-09-15 (Fri, 15 Sep 2023)

Changed paths:
M scripts/common.nhc
M scripts/lbnl_cmd.nhc
M test/nhc-test
M test/test_common.nhc
M test/test_lbnl_cmd.nhc

Log Message:
-----------
lbnl_cmd.nhc: Remove line numbers from dmesg check

When using `check_cmd_dmesg()` directly (as written in `scripts/lbnl_cmd.nhc`)
with a negated match string, the default error-reporting behavior of
`check_cmd_output()` (which `check_cmd_dmesg()` wraps) causes the
"Reason" field to contain not only the match
string that was found (and shouldn't have been) but also the **_line
number_** where the match was found. In the case of `dmesg` output,
the line number is almost completely useless; moreover, it prevents
Slurm and other schedulers/RMs from being able to group all the
affected nodes together -- because the line numbers almost always
differ!

Granted, users/admins can override the default failure message
generation behavior (via `-M` entries, all of which are passed
directly to `check_cmd_output()`), but in the specific case of
`check_cmd_dmesg()`, I think the default behavior should suppress the
line numbers and use a simpler, more concise message instead.

This changeset adds a bit of pre-processing to `check_cmd_dmesg()`.
Each "mstr/message" pair is examined, and each match string (`-m`
argument) that doesn't have a corresponding message (`-M` argument)
overriding the default is given a new default that omits the
extraneous information.
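
As a rough illustration of that pre-processing, here is a minimal
sketch in bash. It is not the code from this changeset: the function
name `add_default_dmesg_messages`, the `NEW_ARGS` array, the wording of
the default message, and the rule that a `-M` belongs to the `-m`
immediately before it are all assumptions made for the example.

```bash
#!/bin/bash
# Hypothetical sketch only; names and message wording are illustrative.
#
# Walk the argument list. Every "-m MSTR" that is NOT immediately followed
# by "-M MESSAGE" gets a default "-M" appended that names the match string
# but carries no line number. All other arguments pass through unchanged.
# The rewritten list is left in the global array NEW_ARGS.
add_default_dmesg_messages() {
    NEW_ARGS=()
    while (( $# > 0 )); do
        if [[ "$1" == "-m" && $# -ge 2 ]]; then
            local mstr="$2"
            NEW_ARGS+=("-m" "$mstr")
            shift 2
            if [[ "${1:-}" == "-M" && $# -ge 2 ]]; then
                # An explicit message was supplied; keep it untouched.
                NEW_ARGS+=("-M" "$2")
                shift 2
            else
                # No override present: supply a concise default (assumed wording).
                NEW_ARGS+=("-M" "dmesg output matched \"$mstr\"")
            fi
        else
            # Anything that is not a "-m MSTR" pair is copied as-is.
            NEW_ARGS+=("$1")
            shift
        fi
    done
}

# Example: the first match string has no message, the second one does.
add_default_dmesg_messages -m '!/Hardware Error/' -m '!/I\/O error/' -M 'I/O errors seen'
printf '%s\n' "${NEW_ARGS[@]}"
```

In this sketch only the first match string receives the generated
message; the second keeps the message supplied with it. Because the
generated text contains no line number, nodes that hit the same match
string would report identical reasons.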

Addresses #143.

