Docker Health Checks in Consul


Brandon Okert

Feb 25, 2016, 10:26:37 PM
to Consul
The Consul documentation describes a Docker health check, as defined here: https://www.consul.io/docs/agent/checks.html. However, it does not make clear how to make this check accessible to the Consul agent. To clarify: the Consul client runs in a container, and the other services run in separate containers. I have a script on the service containers that serves as the health check script, and I want the Consul client container to be able to execute it.

The description says you can either use the Docker HTTP API or use the Docker unix socket. The latter seems like a bad idea, due to the common warning that passing the socket to your containers enables them to gain root access (https://www.lvh.io/posts/dont-expose-the-docker-socket-not-even-to-a-container.html).

The former looks more desirable, but the Docker documentation seems to imply that you need to expose the socket (or override it with your own) in order to use the API. This doesn't seem to avoid the security issue above.

Is there another way to enable Consul to run health checks on containers? I'm toying with the idea of just setting up a Python simple webserver on each container that serves the output of a health check result file, and letting a local service run periodically to update the file. This obviously seems dirty (though it's basically how Sensu and Nagios and other related checks work).
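To sketch what I mean (all names, paths, and the port here are hypothetical; the real check command would be the container's own script):

```shell
#!/bin/sh
# Hypothetical sketch of the "health file + web server" workaround.
# A periodic local job runs the real check and writes its result to a file:
check_cmd="true"                 # stand-in for the container's real health check script
health_dir="/tmp/health"         # assumed directory that the web server would serve
mkdir -p "$health_dir"
if $check_cmd; then
  echo "passing" > "$health_dir/status"
else
  echo "critical" > "$health_dir/status"
fi
cat "$health_dir/status"
# A trivial web server would then expose the file, e.g.:
#   (cd /tmp/health && python -m SimpleHTTPServer 8080) &
```

Consul could then poll the file over HTTP with an ordinary HTTP check instead of exec'ing into the container.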

Currently when I run the health check, I just get:
Unable to create Exec, error: Post http://unix.sock/containers/haproxy_1/exec: dial unix /var/run/docker.sock: connect: no such file or directory

If mounting the socket is the only way, then one last question: how do I mount a file from a docker-machine VM into a container, from the host? I run docker run... from my OS X machine, and /var/run/docker.sock does not exist there.

Cheers!

James Phillips

Feb 25, 2016, 10:38:46 PM
to consu...@googlegroups.com
Hi Brandon,

You can get Consul to use Docker's HTTP API by setting DOCKER_HOST=tcp://127.0.0.1:2375 in Consul's environment. I'm definitely not a Docker security expert so maybe folks on the list can chime in, but it looks like TLS with cert verification is an option for securing that interface - https://docs.docker.com/engine/security/https/.
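For example (a sketch; the port is Docker's conventional plaintext port, and the agent invocation is just illustrative):

```shell
# Point Consul's Docker client at the daemon's HTTP endpoint instead of
# the unix socket (assumes Docker is listening on tcp://127.0.0.1:2375):
export DOCKER_HOST=tcp://127.0.0.1:2375
echo "$DOCKER_HOST"
# ...then start the agent in that environment, e.g.:
#   consul agent -config-dir=/etc/consul.d
```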

That's probably simpler than the embedded web server if you can get it working. Hope that helps!

-- James

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/consul/issues
IRC: #consul on Freenode
---
You received this message because you are subscribed to the Google Groups "Consul" group.
To unsubscribe from this group and stop receiving emails from it, send an email to consul-tool...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/consul-tool/957ebc50-53ef-40c1-b394-08cd6127bc5c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Brandon Okert

Feb 25, 2016, 11:56:18 PM
to consu...@googlegroups.com
Thanks for the quick reply James!

I set the variable, and the situation has improved marginally: I no longer get the same error, but now I get: "Unable to create Exec, error: cannot connect to Docker endpoint"

I assumed that, like everywhere else, I could use the container name instead of the ID, but trying with the actual ID gives the same error.

Here's the full json payload to register the check:

{
  "id": "elophant_haproxy_1",
  "name": "elophant-haproxy",
  "address": "elophant_haproxy_1",
  "port": 81,
  "check": {
    "id": "elophant_haproxy_1",
    "name": "Container Local Health Check",
    "docker_container_id": "349d188308d1c2766a49649f511a8b1031ed03e9e96fae19a98bf044b6984243",
    "shell": "/bin/bash",
    "script": "/usr/bin/health-check.sh",
    "interval": "15s",
    "timeout": "5s"
  }
}

And the request on the container:
curl -X PUT -H "content-type:application/json" --data "<json from above>" ${CONSUL_CLIENT_ADDRESS}/v1/agent/service/register

The service registers fine, and the check too. Consul just can't call the script.

Are there further consul logs I can enable to see what's wrong, or a way to test the check locally? The consul client logs don't seem all that useful:

2016-02-26_04:13:10.68805     2016/02/26 04:13:10 [INFO] agent: Synced service 'elophant_haproxy_1'
2016-02-26_04:13:11.01571     2016/02/26 04:13:11 [INFO] agent: Synced service 'elophant_haproxy_2'
2016-02-26_04:16:22.26907     2016/02/26 04:16:22 [INFO] agent: Synced check 'service:elophant_haproxy_1'
2016-02-26_04:19:48.61722     2016/02/26 04:19:48 [INFO] agent: Synced check 'service:elophant_haproxy_2'




James Phillips

Feb 26, 2016, 12:08:34 AM
to consu...@googlegroups.com
There's a -log-level=debug option you can set, but in this case the useful information ends up in the health check output - it looks like Consul is trying to use the HTTP interface, but nothing is listening there. I think you have to tell Docker to listen on the HTTP interface. Looking at my development setup, I had this in the bootstrapping script to do that:

echo 'DOCKER_OPTS="-H tcp://127.0.0.1:2375"' >> /etc/default/docker
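You can also read a check's captured output directly from the local agent API rather than the logs. A sketch (it assumes the agent's standard HTTP endpoint; substitute your agent's address):

```shell
# The /v1/agent/checks endpoint returns each check's Status and Output
# fields; Output carries the error text from the failed exec attempt.
CONSUL_CLIENT_ADDRESS=${CONSUL_CLIENT_ADDRESS:-127.0.0.1:8500}
cmd="curl -s http://$CONSUL_CLIENT_ADDRESS/v1/agent/checks"
echo "$cmd"
# $cmd   # uncomment when an agent is actually reachable
```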

Brandon Okert

Feb 26, 2016, 12:32:53 AM
to Consul
I'm afraid I'm not familiar enough with docker-machine to know where to put that, or how to set up a bootstrapping script. Is there a command I can run inside the machine to set it temporarily for testing?

Also, I thought docker-machine VMs automatically listen on the HTTP interface to enable Docker environment setup. Here's some more info on the machine:

$ docker-machine env default
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.99.101:2376"
export DOCKER_CERT_PATH="XXX/.docker/machine/machines/default"
export DOCKER_MACHINE_NAME="default"

$ docker-machine ls
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER    ERRORS
default   *        virtualbox   Running   tcp://192.168.99.101:2376           v1.10.1

Thanks again for all your help!

Brandon Okert

Feb 26, 2016, 2:18:16 AM
to Consul
I checked inside the machine, and it's already passing -H tcp://0.0.0.0:2376 to the daemon.

I tried updating the container to use port 2376 but still no luck...
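For reference, 2376 is Docker's TLS port, so a plain tcp:// DOCKER_HOST probably isn't enough on its own - the client side would also need the machine's certs. A sketch mirroring the values docker-machine env printed above (whether Consul's Docker client honors all of these variables is an assumption on my part):

```shell
# Docker client TLS settings, mirroring `docker-machine env default`
# (the cert path is hypothetical; use the one docker-machine prints):
export DOCKER_HOST="tcp://192.168.99.101:2376"
export DOCKER_TLS_VERIFY="1"
export DOCKER_CERT_PATH="$HOME/.docker/machine/machines/default"
echo "$DOCKER_HOST"
```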

James Phillips

Mar 10, 2016, 1:21:57 AM
to consu...@googlegroups.com
Hi Brandon,

Not sure if you figured this out yet but we did find an issue with newer versions of Docker that will be fixed in the next release of Consul - https://github.com/hashicorp/consul/issues/1706 - that might be related to what you were seeing.

-- James

Brandon Okert

Mar 30, 2016, 5:21:44 PM
to consu...@googlegroups.com
Hey James,

I ended up finding a different solution for running health checks. I did manage to set up TLS exposure of the Docker socket in a separate instance, though - there was probably a misconfiguration in how I originally had it.
