I would like to add the command “/usr/lpp/mmfs/bin/mmfsadm test verbs status” to NHC and match the string “started”, to verify that systems are using RDMA for GPFS and not falling back to Ethernet. It seems that setting any value for the environment variable DEBUG causes the GPFS command to dump a huge amount of output, much like “set -x”. If I set DEBUG= (no value) before the command, the normal output is given. I tried putting DEBUG= at the beginning of the command passed to -e with check_cmd_output, but that didn’t work [1]. Is there a way to set DEBUG= with check_cmd_output that won’t break normal debugging, i.e. without adding something like “* || export DEBUG=”? I have verified that putting the mmfsadm command in a wrapper script and using that for -e works around the problem, but I would prefer not to use a wrapper script if possible.
Thanks,
- Trey
[1]:
Running check: "check_cmd_output -t 3 -m /started/ -e 'DEBUG= /usr/lpp/mmfs/bin/mmfsadm test verbs status'"
DEBUG=: line 0: exec: DEBUG=: not found
ERROR: nhc-monitor: Health check failed: check_cmd_output: 127 returned by "DEBUG= /usr/lpp/mmfs/bin/mmfsadm test verbs status".
--
Trey Dockendorf
HPC Systems Engineer
Ohio Supercomputer Center
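
The error in [1] suggests that the first word of the -e string is handed to exec, so a shell-style "DEBUG=" prefix is taken as the program name. Below is a minimal sketch of how DEBUG could be cleared for a single command using env(1). The helper name run_without_debug is hypothetical and is not part of NHC or GPFS, and the direct check_cmd_output form in the comments is an untested assumption, not something confirmed in this thread.

```shell
#!/bin/bash
# Sketch only: run_without_debug is a hypothetical helper name.
# It runs the given command with DEBUG set to the empty string for that one
# process, leaving DEBUG in the calling shell untouched.
run_without_debug() {
    env DEBUG= "$@"
}

# As a standalone wrapper script, the body could reduce to a single line:
#   exec env DEBUG= /usr/lpp/mmfs/bin/mmfsadm test verbs status
#
# Untested assumption: because env is a real executable, the same prefix
# might also work directly in the NHC check, avoiding the wrapper:
#   check_cmd_output -t 3 -m /started/ -e 'env DEBUG= /usr/lpp/mmfs/bin/mmfsadm test verbs status'
```

Because env changes the variable only for the child process, any DEBUG value that NHC itself relies on stays as it was.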
Thanks for pointing me at mmhealth. It looks like a system not using RDMA will show as unhealthy, which is good. The DEBUG issue exists with mmhealth too: any value set for the DEBUG environment variable produces huge amounts of output, and I’m guessing NHC sets DEBUG=0 internally.
Thanks,
- Trey