Command health-checks with docker


David McKinley

Apr 23, 2015, 3:26:32 PM
to marathon-...@googlegroups.com
I have been trying to figure out a practical way to execute a health-check command within an application task that is deployed via the docker containerizer.  I have not seen any discussions on this topic, and it seems like something of a challenge.  The "obvious" solution would seem to be to have the command that is listed within the health-check specification execute within the application container - i.e., via a "docker exec" operation.  Unfortunately, what seems to happen is that the command is executed within the context of the mesos-slave - which itself is running in a docker container in my CoreOS environment.

I can run a "docker exec" command from there, but the problem is not knowing the container ID for the application.  The command can include references to the IP address and assigned PORTs of the application task, but I have not seen that there is a similar substitution variable for container ID -- and I can imagine it would be more of a challenge to have such a thing, as I guess the HOST and PORTs are part of the resource offer made by mesos to marathon, whereas Container ID would not be.

I have found a way to do this, assuming that there is a mapped port that can be used to identify which container is which on the mesos-slave.  In my particular case, I have a mapping for port 22 (sshd) defined to an allocated port number.  With this, I can create a command line attached to the health-check specification that extracts the container ID from a "docker ps" command output, and then use it in a "docker exec" command.  Here is the JSON file that creates my application:

{
  "id":"hc-test",
  "cpus":0.1,
  "mem":16,
  "instances":4,
  "container": {
    "type":"DOCKER",
    "docker": {
      "image": "registry.cgbu/ubuntu_sshd",
      "network":"BRIDGE",
      "portMappings":[
        { "containerPort":22, "hostPort":0 }
      ]
    }
  },
  "healthChecks": [
    {
      "protocol":"COMMAND",
      "command": { "value":"docker exec $(docker ps | grep 0\\.0\\.0\\.0:$PORT-\\>22/tcp | cut -f1 -d\\ ) /root/bin/health-check" },
      "gracePeriodSeconds":30,
      "intervalSeconds":20,
      "timeoutSeconds":5,
      "maxConsecutiveFailures":3
    }
  ]
}
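For readability, the lookup embedded in the command above can be pulled out into a small shell function. This is only a sketch: the name find_container_by_port is made up for illustration, and it assumes the PORTS column of "docker ps" shows the mapping in the form 0.0.0.0:<hostPort>-><containerPort>/tcp, as the grep in the JSON above does.

```shell
# Hypothetical helper (name is illustrative, not part of Marathon or Docker).
# Reads `docker ps` output on stdin and prints the ID of the first container
# whose port mapping is 0.0.0.0:<host_port>-><container_port>/tcp.
#   $1 = host port allocated by Marathon (e.g. the value of $PORT)
#   $2 = container port (22 in the example above)
find_container_by_port() {
  # grep -F matches the string literally, so the dots and "->" need no escaping.
  # awk prints the first column (the container ID) of the first match only.
  grep -F -- "0.0.0.0:$1->$2/tcp" | awk '{ print $1; exit }'
}
```

If a function like this were available on the slave (for example in a script the health check calls), the command could read: docker exec "$(docker ps | find_container_by_port "$PORT" 22)" /root/bin/health-check. When no container matches, the ID is empty and "docker exec" fails, so the check reports unhealthy.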

The problem with this approach, it seems to me, is that it is a rather convoluted way to accomplish what seems like a common need: the ability to run a command within the application task container that reports on its health.

Have I missed something that would make this simpler?  Several things could potentially simplify it:

a) The COMMAND health check, when run for a task started by the docker containerizer could by default execute the command within the appropriate container.  That would be a change in behavior, it seems, but I haven't found anything that actually documents what context the command runs in today (through experimentation, I found it runs in the mesos-slave container).

b) There could be an option added to the healthCheck input in the JSON file, something like "context":"container" or "context":"host", to identify where the command line should be executed - within the container or in the host.

c) There could be an environment variable set with the container ID, which could be referenced by the health-check command, so the command could be simplified to something like, "docker exec $CONTAINER_ID /root/bin/health-check".

Have there been any previous discussions about this?

Jakub Veverka

Apr 26, 2015, 5:26:58 PM
to marathon-...@googlegroups.com
Hi David, 

I am also using coreos+mesos stack ;) 

I haven't looked into this myself, but I think you did some nice research, and I agree that the docker ps ... approach is definitely not pretty.

I believe you could use the Marathon task-specific environment variable MESOS_TASK_ID (https://mesosphere.github.io/marathon/docs/task-environment-vars.html) to compose the docker container name.
I haven't tested it though, just throwing an idea in your direction ;)

Let me know how it went.

Jakub

Dario Rexin

Apr 26, 2015, 5:49:37 PM
to Jakub Veverka, marathon-...@googlegroups.com
Hi,

Unfortunately, Marathon does not get any information about the docker ID from Mesos. I think it would be a great addition, so please check whether there are open tickets on Mesos and +1 them, or create a new one if it doesn't exist yet. When this feature lands in Mesos, we will add it to Marathon asap.

Thanks,
Dario



Bekir Dogan

Jul 10, 2015, 2:20:17 PM
to marathon-...@googlegroups.com, veverk...@gmail.com
Hi,

I just ran into the very same problem, but I can't use a port mapping to find the docker instance because I don't have any port mappings.

But thanks to Jakub's suggestion, I was able to write the similarly hacky, but somewhat more proper, health checks below.
I'm basically iterating over all running docker instances to find the one with a matching MESOS_TASK_ID.
I'm running mesos-slaves on Ubuntu 14.04.

==> healthy.json <==
{
  "cpus": 0.01,
  "mem": 100,
  "id": "healthy",
  "instances": 15,
  "cmd": "sleep 1000",

  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "ubuntu"

    }
  },
  "healthChecks": [{
    "protocol": "COMMAND",
    "command": {"value": "docker exec $(for i in $(docker ps -q --no-trunc); do docker inspect $i | grep -sq MESOS_TASK_ID=${MESOS_TASK_ID:?} && echo $i; done) ls /"}
  }]
}

==> nonhealthy.json <==
{
  "cpus": 0.01,
  "mem": 100,
  "id": "nonhealthy",
  "instances": 15,
  "cmd": "sleep 1000",

  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "ubuntu"

    }
  },
  "healthChecks": [{
    "protocol": "COMMAND",
    "command": {"value": "docker exec $(for i in $(docker ps -q --no-trunc); do docker inspect $i | grep -sq MESOS_TASK_ID=${MESOS_TASK_ID:?} && echo $i; done) ls /ali"}
  }]
}
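The loop inside the two commands above is easier to follow when written out as functions. This is a sketch rather than a tested drop-in: the function names are made up, and it assumes "docker inspect" lists the container environment as quoted "KEY=value" strings, which is what the grep in the JSON relies on as well.

```shell
# Hypothetical helpers (names are illustrative only).

# Succeeds if the `docker inspect` output on stdin contains the exact
# environment entry "MESOS_TASK_ID=<task id>". Matching the surrounding
# quotes avoids hitting a longer task ID that merely starts the same way.
has_task_id() {
  grep -Fq -- "\"MESOS_TASK_ID=$1\""
}

# Prints the full ID of the first running container whose environment
# carries the given task ID; returns non-zero if there is none.
find_container_by_task_id() {
  for id in $(docker ps -q --no-trunc); do
    if docker inspect "$id" | has_task_id "$1"; then
      echo "$id"
      return 0
    fi
  done
  return 1
}
```

With these defined, the check would become something like: docker exec "$(find_container_by_task_id "${MESOS_TASK_ID:?}")" ls /. Stopping at the first match also avoids passing several IDs to "docker exec" if more than one container happens to match.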

Gareth Kirwan

Mar 2, 2016, 9:07:17 PM
to marathon-framework, veverk...@gmail.com, bek...@gmail.com
I had the same issue.

I finally settled on:
if [[ -z $(docker ps -q --filter name=$MARATHON_CONTAINER_ID) ]]; then exit 1; fi

Which seems to work.
It relies on the MARATHON_CONTAINER_ID env var, which seems undocumented, but exists.
Failing that there were decreasingly sane options, but I didn't need to go that far.

I can't see any value in execing in the container, *unless* you have a healthcheck to run in the container itself like the OP.
If you just want to know it's up, then ps only shows running containers, and that's good enough.
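The same liveness test can be written with the POSIX "[" instead of "[[", in case the health check is run by a shell that lacks "[[". A minimal sketch, with a made-up function name; MARATHON_CONTAINER_ID is taken from the post above and is not something I can point to in the docs.

```shell
# Hypothetical helper (name is illustrative only).
# Succeeds if `docker ps` lists at least one running container whose
# name matches the given filter value, and fails otherwise.
container_is_running() {
  [ -n "$(docker ps -q --filter "name=$1")" ]
}
```

Used as a health check this would be: container_is_running "$MARATHON_CONTAINER_ID" || exit 1. Note that the name filter matches substrings, so it only says that some container with a matching name is up.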

Darshan Zend

Mar 16, 2016, 9:24:38 AM
to marathon-framework
Looks like the behaviour has changed in mesos 0.27.0. Now the COMMAND executes inside the docker container. I don't see it documented anywhere.