Hey Darron,
Many of the metrics aren’t immediately useful, and are mostly there to assist with debugging.
The key ones I would watch:
* consul.raft.commitTime - The number of write transactions and associated latency
* consul.serf.queue.Event - The backlog of events in the queue, useful for catching a bad client that is flooding the cluster
* The ACL metrics (cache_hits, cache_miss, fault, resolveToken) are useful if ACLs are enabled
1) Ultimately, however, we do much higher-level monitoring of health. We treat Consul as a black box.
Just attempting a periodic read/write of a key is enough for our monitoring of the cluster.
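
A minimal sketch of such a black-box check, assuming a local agent at 127.0.0.1:8500 and a throwaway key named monitoring/probe (both are placeholders, as is the function name probe_kv). It uses only the standard KV HTTP endpoints (PUT /v1/kv/<key> and GET /v1/kv/<key>?raw) and is an illustration, not the monitoring code referred to above:

```python
import time
import urllib.request


def probe_kv(base_url="http://127.0.0.1:8500", key="monitoring/probe",
             timeout=5.0, urlopen=urllib.request.urlopen):
    """Write a fresh value to `key`, read it back, return (ok, seconds).

    `urlopen` is injectable so the probe can be exercised without a cluster.
    """
    url = f"{base_url}/v1/kv/{key}"
    value = str(time.time()).encode()  # unique payload for this probe run
    start = time.monotonic()
    try:
        # The KV PUT endpoint answers with the literal body "true" on success.
        put = urllib.request.Request(url, data=value, method="PUT")
        with urlopen(put, timeout=timeout) as resp:
            wrote = resp.read().strip() == b"true"
        # ?raw returns the stored bytes without the JSON/base64 wrapper.
        with urlopen(url + "?raw", timeout=timeout) as resp:
            read_back = resp.read()
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False, time.monotonic() - start
    return wrote and read_back == value, time.monotonic() - start
```

Running probe_kv() on a schedule (cron, or whatever monitoring system is already in place) and alerting when it returns False, or when the elapsed time climbs, would cover both availability and write latency from the outside.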
2) Yes, MeasureSince is reported in milliseconds.
3) No, there isn’t a metric for this. You can always query the nodes to ask them, but
it’s not generally an actionable metric.
4) Not really, since a commit only requires that a quorum of nodes agree. It could take indefinitely
long for all the peers to commit a change (for example, if 1 of 3 servers has failed).
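
To make the quorum arithmetic concrete, here is a small sketch using the usual majority rule (the helper names are made up for illustration). With 3 servers the quorum is 2, so writes keep committing with one server down, even though that server never applies them until it recovers:

```python
def quorum(servers: int) -> int:
    """Smallest majority of a cluster with `servers` voting members."""
    return servers // 2 + 1


def can_commit(servers: int, failed: int) -> bool:
    """True if the remaining healthy servers still form a quorum."""
    return servers - failed >= quorum(servers)


print(quorum(3))         # 2
print(can_commit(3, 1))  # True: 2 healthy servers are a majority of 3
print(can_commit(3, 2))  # False: 1 healthy server is not a majority
```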
Hope that helps!
Best Regards,
Armon Dadgar