Monitoring consul agent health


cosmop...@gmail.com

Apr 23, 2017, 11:36:46 PM
to Prometheus Users
Hi,

Is there a suggested method for monitoring the health of consul agents to catch instances that haven't joined the consul cluster, or have left but not rejoined?

Our Prometheus deployment uses Consul for service discovery and primarily monitors infrastructure on AWS. My current thinking is that we should use EC2 service discovery to enumerate instances and scrape an exporter on some predefined port that exposes metrics that indicate whether consul is up and whether it has joined a cluster. Is that a sane approach, or am I thinking of this the wrong way?
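A minimal sketch of such a per-instance exporter, assuming a Python stdlib HTTP server on a hypothetical port (19500) and a caller-supplied `probe` function that reports whether the agent is up and whether it has joined. The metric names `consul_agent_up` and `consul_agent_joined` are invented for illustration and are not exported by any existing exporter:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Tuple

# Hypothetical port for the per-instance exporter; it would have to match the
# port used in the EC2 service-discovery scrape config.
EXPORTER_PORT = 19500


def render_metrics(up: bool, joined: bool) -> str:
    """Render two gauges in the Prometheus text exposition format."""
    lines = [
        "# HELP consul_agent_up Whether the local Consul agent answered the probe.",
        "# TYPE consul_agent_up gauge",
        f"consul_agent_up {int(up)}",
        "# HELP consul_agent_joined Whether the local agent appears to be in a cluster.",
        "# TYPE consul_agent_joined gauge",
        f"consul_agent_joined {int(joined)}",
    ]
    return "\n".join(lines) + "\n"


def make_handler(probe: Callable[[], Tuple[bool, bool]]):
    """Build a request handler that calls `probe` on every scrape of /metrics."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path.split("?")[0] != "/metrics":
                self.send_error(404)
                return
            up, joined = probe()
            body = render_metrics(up, joined).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler


def serve(probe: Callable[[], Tuple[bool, bool]], port: int = EXPORTER_PORT) -> None:
    """Serve /metrics on all interfaces until the process is stopped (blocking)."""
    HTTPServer(("", port), make_handler(probe)).serve_forever()
```

Running it would look like `serve(lambda: (True, True))`, with the stub replaced by a real check against the local agent. Alerting on `consul_agent_joined == 0` would then flag instances that are running but outside the cluster, while a failed scrape flags instances where the exporter itself is missing.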

If my thinking is correct, should I be deploying the regular consul exporter for this purpose? It seems like it's intended to be run only on consul servers, not on regular agents[0] and it exports quite a few metrics which aren't meaningful in the agent context. Is support for a flag that exposes only agent-specific metrics something that would be considered if I put in a PR, or would that be best left to a separate consul_agent_exporter?

Tobias Schmidt

Apr 24, 2017, 3:35:38 PM
to cosmop...@gmail.com, Prometheus Users
Your thinking is sound. Unfortunately I believe there isn't public code supporting that so far.

At SoundCloud, before the consul_exporter existed, I built a custom module which executed `consul info` (or whatever the command is called) and parsed the text output to provide such metrics. That module is embedded in a private project of our custom service discovery layer.
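Since that module is private, the following is only a hypothetical reconstruction of the idea, not the SoundCloud code. It assumes `consul info` prints `section:` headers followed by indented `key = value` lines (the same sections that show up in the `/v1/agent/self` Stats output later in this thread) and turns the numeric values into gauge lines under an invented `consul_info_` prefix:

```python
import re
from typing import Dict, List, Optional

# Assumed layout: section headers look like "agent:", entries like "\tchecks = 1".
SECTION_RE = re.compile(r"^(\S+):\s*$")
ENTRY_RE = re.compile(r"^\s+(\S+)\s*=\s*(.*?)\s*$")


def parse_consul_info(text: str) -> Dict[str, Dict[str, str]]:
    """Parse `consul info`-style text into {section: {key: raw_value}}."""
    sections: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for line in text.splitlines():
        header = SECTION_RE.match(line)
        if header:
            current = sections.setdefault(header.group(1), {})
            continue
        entry = ENTRY_RE.match(line)
        if entry and current is not None:
            current[entry.group(1)] = entry.group(2)
    return sections


def _to_number(value: str):
    """Prefer int so large counters keep full precision; fall back to float."""
    try:
        return int(value)
    except ValueError:
        return float(value)  # raises ValueError for non-numeric strings


def numeric_metrics(sections: Dict[str, Dict[str, str]]) -> List[str]:
    """Emit one "name value" line per numeric entry; skip strings like versions."""
    lines = []
    for section, entries in sorted(sections.items()):
        for key, raw in sorted(entries.items()):
            try:
                number = _to_number(raw)
            except ValueError:
                continue
            name = re.sub(r"[^a-zA-Z0-9_]", "_", f"consul_info_{section}_{key}")
            lines.append(f"{name} {number}")
    return lines
```

Non-numeric values such as `server = false` or a version string are simply dropped here; a fuller version might map booleans to 0/1 or expose strings as labels on an info-style metric.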

Ideally we would have similar functionality in consul_exporter, but unfortunately consul doesn't export the required metrics via its HTTP stats interface, see https://github.com/prometheus/consul_exporter/issues/29.

You might be able to achieve something similar by using the blackbox_exporter.

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To post to this group, send email to promethe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/a92cb428-9881-49da-9edb-f5ea363bb03c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

cosmop...@gmail.com

Apr 25, 2017, 5:13:28 AM
to Prometheus Users, cosmop...@gmail.com
Thanks for the quick reply Tobias.


On Tuesday, 25 April 2017 05:35:38 UTC+10, Tobias Schmidt wrote:
> Ideally we would have similar functionality in consul_exporter, but unfortunately consul doesn't export the required metrics via its HTTP stats interface, see https://github.com/prometheus/consul_exporter/issues/29.

I believe it provides an HTTP endpoint for the metrics exposed via `consul info` and the old status RPC interface:

curl localhost:8500/v1/agent/self | jq .Stats

    "Stats": {
        "agent": {
            "check_monitors": "0",
            "check_ttls": "0",
            "checks": "1",
            "services": "1"
        },
        "build": {
            "prerelease": "",
            "revision": "'21f2d5a",
            "version": "0.7.5"
        },
        "consul": {
            "known_servers": "3",
            "server": "false"
        },
        "runtime": {
            "arch": "amd64",
            "cpu_count": "1",
            "goroutines": "34",
            "max_procs": "2",
            "os": "linux",
            "version": "go1.7.5"
        },
        "serf_lan": {
            "encrypted": "true",
            "event_queue": "0",
            "event_time": "13",
            "failed": "0",
            "health_score": "0",
            "intent_queue": "0",
            "left": "0",
            "member_time": "928",
            "members": "75",
            "query_queue": "0",
            "query_time": "22"
        }
    }

That may be a new feature, possibly due to the recent deprecation/removal of the RPC interface. Unfortunately it doesn't expose the full telemetry that Consul can publish to statsd (https://www.consul.io/docs/agent/telemetry.html), but most of that looks to be server-specific anyway.
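Building on that endpoint, a rough sketch of the agent-side check: it reads `Stats` from `/v1/agent/self`, converts the string values it needs, and treats an agent as joined if it knows at least one server and sees more than one LAN member. That joined rule, the metric names, and the default URL are assumptions for illustration, not anything Consul or consul_exporter defines:

```python
import json
import urllib.request
from typing import Dict, List

# Assumed default address of the local agent's HTTP API.
AGENT_SELF_URL = "http://localhost:8500/v1/agent/self"


def fetch_stats(url: str = AGENT_SELF_URL, timeout: float = 2.0) -> Dict:
    """Return the "Stats" object from the agent's /v1/agent/self response."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["Stats"]


def health_metrics(stats: Dict) -> List[str]:
    """Turn the relevant Stats fields into gauge lines."""
    # Every value under Stats is a string (e.g. "3"), so convert explicitly.
    known_servers = int(stats["consul"]["known_servers"])
    members = int(stats["serf_lan"]["members"])
    failed = int(stats["serf_lan"]["failed"])
    # Assumed heuristic: knowing a server and seeing other LAN members means joined.
    joined = known_servers > 0 and members > 1
    return [
        f"consul_agent_known_servers {known_servers}",
        f"consul_agent_serf_lan_members {members}",
        f"consul_agent_serf_lan_failed {failed}",
        f"consul_agent_joined {int(joined)}",
    ]


def collect(url: str = AGENT_SELF_URL) -> List[str]:
    """Report the agent as down if it can't be reached or its Stats can't be read."""
    try:
        return ["consul_agent_up 1"] + health_metrics(fetch_stats(url))
    except (OSError, ValueError, KeyError):
        # URLError/HTTPError are OSError subclasses; bad JSON raises ValueError.
        return ["consul_agent_up 0"]
```

Returning `consul_agent_up 0` on any fetch or parse error keeps a dead or unreachable agent visible as a series with value 0 rather than as a missing metric, which is easier to alert on.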

Tobias Schmidt

May 13, 2017, 2:51:50 PM
to cosmop...@gmail.com, Prometheus Users
