Consul ESM (External Service Monitor)

Amal Gangodkar

Apr 26, 2018, 1:09:42 AM
to Consul
We are in the process of using Consul to monitor external services. We had earlier set up Consul in the traditional way, with three Consul servers and three Consul agents. The agents were monitoring external services, with the checks being performed via the HTTP API. All of this seems to be working fine, except for one issue: when an agent node suddenly goes down or loses its connectivity to the server nodes, the service status is incorrectly shown as failing on the Consul server.

consul_esm has been called out for such a use case; however, other than the GitHub repository (https://github.com/hashicorp/consul-esm/), I have not found any document that describes the setup.

Here is what we did so far: a) spun up a new instance and registered it with Consul as an external node, along with the services and checks to be monitored; b) set up the consul_esm daemon on a Consul agent node. The checks appear to execute against the remote node and pass. As I understand it, multiple such ESM daemons can be set up, and they would ensure that there is a leader to execute the checks. If the ESM daemon is unable to reach the node for a specified period of time, the services are deregistered. But if the node itself is down, the service is marked as failing, which is identical to the normal Consul setup.
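For anyone trying to reproduce step a), here is a rough sketch of what the external-node registration could look like against the catalog endpoint (`PUT /v1/catalog/register`). The node name, address, service name, and health URL are made-up placeholders, and the helper names `build_external_registration` and `register` are hypothetical. The `external-node` / `external-probe` node metadata keys are, as far as I understand the consul-esm README, what makes ESM pick the node up; treat the exact field set as an assumption and check it against your Consul version.

```python
import json
import urllib.request


def build_external_registration(node, address, service, port, check_url, interval="30s"):
    """Build a catalog registration body for a node that runs no Consul agent.

    All argument values are supplied by the caller; nothing here is specific
    to the setup described in this thread.
    """
    service_id = f"{service}-1"
    return {
        "Node": node,
        "Address": address,
        # Assumption: these metadata keys mark the node for consul-esm.
        "NodeMeta": {"external-node": "true", "external-probe": "true"},
        "Service": {"ID": service_id, "Service": service, "Port": port},
        "Checks": [
            {
                "Name": f"{service}-http",
                "ServiceID": service_id,
                # Start as critical until a check run reports otherwise.
                "Status": "critical",
                "Definition": {
                    "HTTP": check_url,
                    "Interval": interval,
                    "Timeout": "5s",
                },
            }
        ],
    }


def register(payload, consul_addr="http://127.0.0.1:8500"):
    """Send the registration to a Consul agent or server; returns the HTTP status."""
    req = urllib.request.Request(
        f"{consul_addr}/v1/catalog/register",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Placeholder values; printing only, no network call is made here.
    body = build_external_registration(
        "ext-web", "10.0.0.5", "web", 80, "http://10.0.0.5/health"
    )
    print(json.dumps(body, indent=2))
```

Calling `register(body)` would need a reachable Consul HTTP API (and an ACL token header if ACLs are enabled, which this sketch leaves out).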

I am a bit confused about what value consul_esm adds over a traditional Consul setup, where agent nodes perform the checks and communicate with the Consul servers, as opposed to going through the ESM daemon.

The main question is: has anyone implemented consul_esm on a large scale? Is there any architecture recommendation for this kind of setup? Is there any detailed discussion or documentation about such a use case, using Consul more for (external) service monitoring rather than as a service discovery tool?


pba...@hashicorp.com

Apr 26, 2018, 7:48:15 AM
to Consul
One reason to use ESM is that it doesn't participate in Gossip. That means you can run it on an instance that for some reason can't run a full agent, and still have local-only check scripts.

Some people use this if they have legacy infra that for some reason can't run an agent but still want to expose services via Consul. Reasons it can't run an agent might include a restrictive network policy that won't allow gossip traffic to that host, or the legacy service running on an OS or CPU architecture that Consul can't be built for (ESM is much simpler and can be built on several platforms that Consul itself doesn't support).

Honestly, ESM has pretty niche use cases that have been important to a few people, which is one reason there aren't lots of guides etc. for it.

> The main question is has anyone implemented consul_esm on a large scale?

I believe it is in use in production by at least one large HashiCorp customer. I can't comment on who or what as I don't believe they've chosen to make that info public anywhere.

> Is there any architecture recommendation for this kind of setup? Is there any detailed discussion or documentation about such a use case, using Consul more for (external) service monitoring rather than as a service discovery tool?

I think you've found all that's been published. We could certainly explain a little more about the cases where ESM is useful in the README and/or the external services guide. In general it's been pretty low-key, as it largely solves for some fairly specific use cases.

I don't think ESM changes the focus of Consul to be "more" monitoring-focussed; it is a solution for extending the existing feature set in specific situations.