Using NetCheck to check server health on a distributed system

rsignell

Aug 12, 2009, 6:52:24 AM
to IOOS Model Data Interoperability Working Group
In any distributed system (especially one involving a bunch of new
TDS4 installations!), it's a good idea to have something periodically
checking to make sure servers are up and delivering datasets as
intended.

There is a very nice Java program that Bob Simons (in Roy
Mendelssohn's group at NOAA's Southwest Fisheries Science Center)
wrote specifically for this purpose:

http://coastwatch.pfeg.noaa.gov/coastwatch/NetCheck.html

I have this running to check a number of IOOS TDS4 top-level catalogs
as well as some specific OPeNDAP responses for certain datasets. The
way I'm doing this (suggested by Bob Simons) is to simply request a
small subset of a certain variable (say a 10x10 lon/lat region at a
single level and a single time step) as an ASCII response from
OPeNDAP. I then check whether the response contains an array of the
right type and size, and if so, the test succeeds.
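
As a rough illustration of this kind of check (this is not NetCheck's
own implementation or config syntax), here is a minimal Python sketch.
It assumes the usual OPeNDAP convention of appending ".ascii" and an
index constraint to the dataset URL, and that the ASCII response
starts with a header declaring each returned array, followed by a
dashed line and the data. The function names, the variable "temp" and
the example URL are all hypothetical.

```python
import re
from urllib.request import urlopen


def subset_url(dataset_url, var, index_ranges):
    """Build an OPeNDAP ASCII request for a hyperslab of one variable.

    index_ranges holds one inclusive (start, stop) index pair per dimension.
    """
    constraint = "".join(
        f"[{start}]" if start == stop else f"[{start}:{stop}]"
        for start, stop in index_ranges
    )
    return f"{dataset_url}.ascii?{var}{constraint}"


def fetch_ascii(url, timeout=30):
    """Download the ASCII response as text (needs network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("ascii", errors="replace")


# Matches a declaration such as: Float32 temp[time = 1][lat = 10];
_DECL = re.compile(r"(\w+)\s+(\w+)((?:\[[^\]]*\])+);")
# Matches one dimension, named ("[lat = 10]") or anonymous ("[10]").
_DIM = re.compile(r"\[(?:\s*\w+\s*=\s*)?(\d+)\s*\]")


def declared_arrays(ascii_text):
    """Return {name: (type, shape)} from the header above the dashed line."""
    header = ascii_text.split("-----", 1)[0]
    return {
        name: (dtype, tuple(int(n) for n in _DIM.findall(dims)))
        for dtype, name, dims in _DECL.findall(header)
    }


def response_ok(ascii_text, var, expected_type, expected_shape):
    """True if var came back with the expected type and array size."""
    expected = (expected_type, tuple(expected_shape))
    return declared_arrays(ascii_text).get(var) == expected
```

A 10x10 region at the first level and first time step of a
hypothetical 4-D variable would be requested with
subset_url("http://example.invalid/thredds/dodsC/model.nc", "temp",
[(0, 0), (0, 0), (0, 9), (0, 9)]), and the downloaded text passed to
response_ok(text, "temp", "Float32", (1, 1, 10, 10)). Some servers may
need the square brackets percent-encoded.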

I have it set up to ping all the servers listed on
http://coast-enviro.er.usgs.gov/thredds/ioos_catalog_top.html
every 5 minutes, and I get a report on the entire system only when
the status changes (from "All PASSED" to "Some FAILED" or vice
versa). For one specific test, I currently have status changes sent
as a text message to my cell phone, just as a trial. In the near
future, I plan to add the e-mail addresses of the folks responsible
for those systems so that they are notified when their individual
servers fail.
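
The "report only on a change of status" behaviour can be sketched in
a few lines of Python. Again, this is only an illustration of the
idea and not how NetCheck does it internally. The class name and the
notify callback are hypothetical, and treating the very first run as
a silent baseline is an assumption.

```python
def overall_status(results):
    """Collapse per-test results ({test name: passed?}) into one status."""
    return "All PASSED" if all(results.values()) else "Some FAILED"


class ChangeNotifier:
    """Call notify(message) only when the overall status flips."""

    def __init__(self, notify):
        self.notify = notify  # e.g. a function that sends e-mail or a text
        self.last_status = None  # nothing seen yet

    def update(self, results):
        status = overall_status(results)
        # Assumption: the first run only records a baseline, no report.
        changed = self.last_status is not None and status != self.last_status
        self.last_status = status
        if changed:
            failed = sorted(name for name, ok in results.items() if not ok)
            detail = ", ".join(failed) if failed else "no failures"
            self.notify(f"{status}: {detail}")
        return changed
```

Calling update() once per polling cycle (every 5 minutes in the setup
above) with the latest test results then produces a message only on
a transition between "All PASSED" and "Some FAILED".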

I've uploaded my sample "NetCheck.xml" config file to the files
section:
http://groups.google.com/group/ioos_model_data_interop/files

-Rich