Branch: refs/heads/1.4.4-dev
Home: https://github.com/mej/nhc
Commit: b08769bbe0e7c93626eebd57dfdbfddd4ad5cac3
https://github.com/mej/nhc/commit/b08769bbe0e7c93626eebd57dfdbfddd4ad5cac3
Author: Michael Jennings <m...@lanl.gov>
Date: 2023-04-17 (Mon, 17 Apr 2023)
Changed paths:
M scripts/common.nhc
M scripts/lbnl_cmd.nhc
Log Message:
-----------
scripts/lbnl_cmd.nhc: Use consistent spawn code
For some reason (probably simplicity), the `check_cmd_status()` check
function was using a different method of spawning a subprocess with a
timeout than `check_cmd_output()` was. The `nhc_cmd_with_timeout()`
function was written specifically to facilitate consistency in coding
that exact functionality, but only `check_cmd_output()` was using it.
The `check_cmd_status()` check function was launching the subcommand
directly and trying to use `nhcmain_watchdog_timer()` to create its
watchdog timer process.
As observed in #104, `check_cmd_status()` (but *not*
`check_cmd_output()`) was leaving behind watchdog timer and `sleep`
processes that should have been terminated when the subcommand
exited. Unfortunately, `nhcmain_watchdog_timer()` was not written
with this use case in mind, nor was `kill_watchdog()` expecting to
have to clean up multiple child processes.
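The leftover-process failure mode described above is a general pitfall of shell-based timeouts: sending a signal to a watchdog subshell does not also kill the `sleep` it spawned, so that `sleep` lingers until its timeout expires. The following is a minimal sketch of one way to avoid this, assuming a hypothetical `run_with_watchdog TIMEOUT CMD...` helper. It is not NHC's `nhc_cmd_with_timeout()`, `nhcmain_watchdog_timer()`, or `kill_watchdog()`, and it does not reflect how those functions are actually implemented.

```shell
# Minimal sketch (hypothetical helper, NOT NHC's nhc_cmd_with_timeout()):
# run a command with a timeout via a background watchdog, and make sure
# stopping the watchdog also stops the `sleep` it spawned.

run_with_watchdog() {
    local timeout="$1"
    shift
    local cmd_pid wd_pid rc

    # Launch the subcommand in the background.
    "$@" &
    cmd_pid=$!

    # Watchdog subshell: sleep for the timeout, then terminate the command.
    # If the watchdog itself receives TERM (because the command finished
    # first), the trap kills its own `sleep` child so neither process is
    # left behind. A narrow race remains between starting `sleep` and
    # recording its PID; this is a sketch, not hardened code.
    (
        sleep_pid=""
        trap '[ -n "$sleep_pid" ] && kill "$sleep_pid" 2>/dev/null; exit 0' TERM
        sleep "$timeout" &
        sleep_pid=$!
        wait "$sleep_pid"
        sleep_pid=""
        kill -TERM "$cmd_pid" 2>/dev/null
    ) &
    wd_pid=$!

    # Wait for the subcommand; if the watchdog fired, this is 128+15=143.
    wait "$cmd_pid"
    rc=$?

    # Stop and reap the watchdog (harmless if it has already exited).
    kill -TERM "$wd_pid" 2>/dev/null
    wait "$wd_pid" 2>/dev/null
    return "$rc"
}
```

For example, `run_with_watchdog 5 true` returns 0 right away without waiting out the timeout, while `run_with_watchdog 1 sleep 10` returns 143 after about one second. The design choice is that cleanup responsibility sits with the process that created the child: the watchdog reaps its own `sleep`, and the caller only has to stop and reap the watchdog.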
This will be addressed in 2 ways. For the "fix-only" 1.4.4 tree, I've
switched `check_cmd_status()` to using `nhc_cmd_with_timeout()` since
it uses a different mechanism and has not displayed this behavior.
The longer term fix will be to refactor the watchdog code in `nhc`
itself and use that code whenever launching subcommands is needed.
This should address #104 for the 1.4.4 branch, but once tested and
verified it will be applied to both branches for now.
(cherry picked from commit 1dd668e57eaabd621153b08771c224ada6d806c0)