Hey Cameron,
Monitoring the health of a cluster is rather tricky. A good first step is to check that
the /v1/status/leader endpoint returns a leader. If it does not, your cluster is in an outage
and the check should be critical.
After that, you can check for failed servers: query the /v1/health/service/consul endpoint
to list all the known Consul servers, then query /v1/health/service/consul?passing=1
to list only the ones passing their health checks.
Comparing the two results tells you how many servers have failed, so you can warn or fail as appropriate.
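Here is a rough Python sketch of those checks, in case it is useful. It assumes an agent
reachable at http://127.0.0.1:8500 and that each health entry carries the node name under
["Node"]["Node"]. The function names and the warn/critical thresholds are made up for
illustration, so adjust them to your cluster size.

```python
import json
from urllib.request import urlopen

# Assumption: a local Consul agent on the default HTTP port.
CONSUL = "http://127.0.0.1:8500"


def get_json(path, base=CONSUL, timeout=5):
    """GET a Consul HTTP API path and decode the JSON body."""
    with urlopen(base + path, timeout=timeout) as resp:
        return json.load(resp)


def classify(leader, all_servers, passing_servers, warn_at=1, crit_at=2):
    """Turn the three API responses into a (status, message) pair.

    warn_at / crit_at are hypothetical thresholds on the number of
    failed servers; they are not Consul settings.
    """
    if not leader:
        # No leader reported: treat the cluster as being in an outage.
        return "CRITICAL", "no leader"

    known = {entry["Node"]["Node"] for entry in all_servers}
    healthy = {entry["Node"]["Node"] for entry in passing_servers}
    failed = sorted(known - healthy)

    if len(failed) >= crit_at:
        return "CRITICAL", "failed servers: " + ", ".join(failed)
    if len(failed) >= warn_at:
        return "WARNING", "failed servers: " + ", ".join(failed)
    return "OK", "leader %s, %d servers passing" % (leader, len(healthy))


def check_cluster(base=CONSUL):
    """Run the leader check, then compare all servers against passing ones."""
    leader = get_json("/v1/status/leader", base)
    all_servers = get_json("/v1/health/service/consul", base)
    passing = get_json("/v1/health/service/consul?passing=1", base)
    return classify(leader, all_servers, passing)


if __name__ == "__main__":
    status, message = check_cluster()
    print(status, message)
```

Keeping classify() separate from the HTTP calls means you can feed it canned responses
and test your thresholds without a running cluster.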
Hope that helps!
Best Regards,
Armon Dadgar