Hello!
A while ago (before GCS 5.4.78 came out), I had reason to watch the status of the GCS services a little more closely. We put our DTNs into SLURM, even though we don't run jobs on them, so that node problems (typically identified by NHC) can be flagged to us through SLURM by draining the node.
While doing some NHC configuration updates, I thought others might be interested in the Globus-related NHC entries I have, so I’m sharing them here!
Here are the nhc.conf entries I’m using:
HOSTNAME || check_ps_service globus-gridftp-server
HOSTNAME || check_ps_service -u apache httpd
HOSTNAME || check_ps_service -f -m "* /opt/globus/bin/gunicorn *" -u gcsweb gcs_manager
HOSTNAME || check_file_test -S /run/gcs_manager.sock
HOSTNAME || check_file_test -S /var/run/globus-connect-server/control
HOSTNAME || check_file_test -S /var/run/globus-connect-server/ipc
HOSTNAME || check_ps_service -f -m "* /opt/globus/bin/globus-connect-server assistant" -u gcsweb gcs_manager_assistant
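Lines 1, 2, 3-6, and 7, respectively, are used to monitor GridFTP, Apache, the GCS Manager, and the GCS Manager Assistant. The GCS Manager and GCS Manager Assistant are Python-based services, so matching the correct process is a little complicated: instead of relying on the process name, those checks use -f to match a -m glob pattern against each process's full command line. Also, this is on an Enterprise Linux system; your Apache process and/or user names may be different (for example, Debian-based systems use apache2 and www-data).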
We also don't run any other web services on our DTNs, so checking "Is Apache running?" is enough for us.
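To illustrate what the -f/-m combination is doing, here is a small Python sketch that checks the -m glob patterns above against full command lines. It approximates NHC's glob matching with Python's fnmatch.fnmatchcase (an assumption; NHC does its matching in bash), and the sample command lines are hypothetical, not captured from a real DTN.

```python
from fnmatch import fnmatchcase

# The -m patterns from the nhc.conf entries above, keyed by service name.
PATTERNS = {
    "gcs_manager": "* /opt/globus/bin/gunicorn *",
    "gcs_manager_assistant": "* /opt/globus/bin/globus-connect-server assistant",
}


def cmdline_matches(service: str, cmdline: str) -> bool:
    """Return True if the whole command line matches the service's glob.

    fnmatchcase() is anchored at both ends and case-sensitive, and '*'
    matches any run of characters (including '/' and spaces), roughly like
    a bash glob comparison. Treating this as equivalent to NHC's matching
    is an assumption.
    """
    return fnmatchcase(cmdline, PATTERNS[service])


# Hypothetical command lines: a Python interpreter launching each service.
MANAGER = "/opt/globus/bin/python3 /opt/globus/bin/gunicorn --config /etc/example.py"
ASSISTANT = "/opt/globus/bin/python3 /opt/globus/bin/globus-connect-server assistant"

for name, line in [("manager", MANAGER), ("assistant", ASSISTANT)]:
    hits = [svc for svc in PATTERNS if cmdline_matches(svc, line)]
    print(f"{name}: {hits}")
# manager: ['gcs_manager']
# assistant: ['gcs_manager_assistant']
```

Because each pattern begins with "* " (a star followed by a space), it only matches when something, such as an interpreter path, comes before the Globus path; "/opt/globus/bin/gunicorn --config x" on its own would not match. The assistant pattern has no trailing star, so it would also miss a command line with extra arguments after "assistant".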
Finally, the socket-file checks on Lines 4-6 might not really be needed: NHC only checks that each socket file exists, not whether anything is actually listening on it. Still, I decided to add them.
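If you want a check that does look for a listener, one option is to try connecting to each socket. The Python sketch below does that outside of NHC. It assumes the three sockets are Unix stream (SOCK_STREAM) sockets, which is unverified for the control and ipc sockets, and that it runs as a user permitted to open them (probably root). The function names and the exit-code convention are illustrative choices, not part of NHC or GCS.

```python
import socket
import sys

# Socket paths from the check_file_test entries above.
SOCKETS = [
    "/run/gcs_manager.sock",
    "/var/run/globus-connect-server/control",
    "/var/run/globus-connect-server/ipc",
]


def socket_is_listening(path: str, timeout: float = 2.0) -> bool:
    """Return True if something accepts a stream connection on the Unix socket at path."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            s.connect(path)
        except OSError:
            # Covers a missing file, a stale socket file with no listener
            # (connection refused), permission errors, timeouts, and a socket
            # of a different type (e.g. datagram), which this sketch treats as dead.
            return False
    return True


def main() -> int:
    """Report sockets with no listener on stderr; return 1 if any, else 0."""
    dead = [path for path in SOCKETS if not socket_is_listening(path)]
    for path in dead:
        print(f"no listener on {path}", file=sys.stderr)
    return 1 if dead else 0
```

Calling sys.exit(main()) from cron or another monitoring tool would give a non-zero exit status when any socket has no listener; hooking it into NHC itself would need a custom check, which isn't shown here. Each check opens and immediately closes a real connection, which the services might log.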
Hopefully these NHC entries are useful to others!
--
A. Karl Kornel | Info. Sys. Specialist
UIT Research Computing | Stanford University
+1 (650) 736-932