Orphaned docker containers

David Challoner

May 24, 2015, 1:42:03 PM
to coreo...@googlegroups.com
We occasionally get orphaned containers that don't get shut down when their corresponding fleet service is destroyed. For these environments we do deployments with a wrapper tool that just restarts the service to pull latest (i.e., fleetctl destroy, then fleetctl start). They're development environments, so users are constantly bouncing the services.

Our clusters are 3 nodes of identical CoreOS VMs, running fleet 0.8.3 on CoreOS 522.6.0.

One example service file looks like:
[Unit]
Description=worker-sk_v1.cmd.%i
After=docker.service

[Service]
User=core
EnvironmentFile=/etc/environment
TimeoutStartSec=10m
Restart=on-failure
RestartSec=5
ExecStartPre=/bin/sh -c "/usr/bin/docker pull our_org/our-container:latest"
ExecStartPre=/bin/sh -c "docker inspect worker-sk_v1.cmd.%i >/dev/null && /usr/bin/docker rm -f worker-sk_v1.cmd.%i || true"
ExecStart=/bin/sh -c "/usr/bin/docker run --hostname=$COLOR-`hostname` --dns-search=$CLUSTER_FQDN --env-file=/etc/environment --env-file=/etc/cluster_environment --name worker-sk_v1.cmd.%i -P our_org/our-container:latest foreman start"
ExecStopPost=/usr/bin/docker rm -f worker-sk_v1.cmd.%i

[X-Fleet]
X-Conflicts=worker-sk_v1.cmd.%i.service

Any ideas? Are there any timeouts to tweak, or would upgrading to the latest CoreOS/fleet help?

Thanks,
David

Jakub Veverka

May 24, 2015, 2:55:27 PM
to coreo...@googlegroups.com
Hi, 

I am solving a similar problem by running docker rm before starting the unit, using

ExecStartPre=/bin/sh -c "docker kill container && docker rm container"

Another solution would be running a while loop on stop until there's no container.

Unfortunately I don't have any more elegant solution...
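
The "while loop on stop" idea could look roughly like the shell sketch below. It is only an illustration: the function name wait_until_removed, the retry limit, and the DOCKER and SLEEP_SECS override variables are assumptions made for this example, not something from the thread. It relies only on docker inspect exiting non-zero once the container no longer exists, and on docker rm -f.

```shell
#!/bin/sh
# Hypothetical helper: keep removing a container until Docker no longer knows it.
# DOCKER and SLEEP_SECS are assumed override knobs, read each time the function runs.
wait_until_removed() {
    name="$1"
    max_tries="${2:-30}"   # assumed upper bound so the stop step cannot hang forever
    tries=0
    # `docker inspect` exits non-zero once the container is gone.
    while "${DOCKER:-/usr/bin/docker}" inspect "$name" >/dev/null 2>&1; do
        if [ "$tries" -ge "$max_tries" ]; then
            echo "giving up on $name after $tries attempts" >&2
            return 1
        fi
        # Force-remove the container; ignore failures and re-check on the next pass.
        "${DOCKER:-/usr/bin/docker}" rm -f "$name" >/dev/null 2>&1 || true
        tries=$((tries + 1))
        sleep "${SLEEP_SECS:-1}"
    done
    return 0
}
```

Saved as a standalone script with a final line such as wait_until_removed "$1", something like this could be called from ExecStopPost= with the container name as its argument.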

Alex Polvi

May 24, 2015, 3:04:31 PM
to Jakub Veverka, coreos-user

If you are up for a little weekend experimentation, one of the driving design decisions behind rkt was to allow the container to be managed by the init system correctly.

You should be able to construct an ExecStart= that has a /usr/bin/rkt run that pulls a docker container and avoids all the extra systemd hacks. The container will be a direct child of systemd so it will be able to clean up appropriately.

Would appreciate feedback on your experience with this if you give it a shot.

Alex

Salvatore Poliandro

May 24, 2015, 4:15:41 PM
to David Challoner, coreo...@googlegroups.com
The default timeout for docker shutdown may not be enough for your process, or foreman is not trapping docker's TERM signal properly. We have found that docker stop doesn't really play nice with a lot of apps or scripts, because after the 10-second default timeout it sends a kill to PID 1 regardless of what PID your app is at, so getting a trap on the first process that docker launches is essential.
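
As a rough illustration of "getting a trap on the first process", here is a minimal shell sketch of an entrypoint that forwards TERM to the real workload and waits for it to exit. The function name run_with_term_forwarding is made up for this example, and whether foreman needs such a wrapper at all is an assumption that would have to be checked.

```shell
#!/bin/sh
# Hypothetical entrypoint sketch: run as the container's first process,
# pass SIGTERM on to the real workload, and return the workload's exit status.
run_with_term_forwarding() {
    "$@" &                        # start the workload in the background
    child=$!
    got_signal=0
    # When TERM arrives, remember it and forward it to the workload.
    trap 'got_signal=1; kill -TERM "$child" 2>/dev/null || true' TERM
    status=0
    wait "$child" || status=$?    # a trapped signal makes wait return early
    if [ "$got_signal" -eq 1 ]; then
        # The first wait was interrupted by the trap; wait again to reap the workload.
        status=0
        wait "$child" || status=$?
    fi
    trap - TERM
    return "$status"
}
```

Inside an image this would be invoked as the container command, for example run_with_term_forwarding foreman start, so that the shell handling the signal is the first process docker launches.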

Also, you can do a fleetctl stop or unload and then a start, which, if you set docker rm as a pre-start command, will prevent orphans and allow you to troubleshoot the signals more. Fleet will then just stop the service and start it on the same node.
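
For a deploy wrapper, the stop-then-start sequence might be sketched like this. The function name bounce_unit and the FLEETCTL override variable are hypothetical; only the fleetctl stop and fleetctl start subcommands come from the suggestion above.

```shell
#!/bin/sh
# Hypothetical wrapper: bounce a unit with stop/start instead of destroy/start.
# FLEETCTL is an assumed override for the fleetctl binary, read at call time.
bounce_unit() {
    unit="$1"
    # Stop first; only start again if the stop was accepted.
    "${FLEETCTL:-fleetctl}" stop "$unit" || return 1
    "${FLEETCTL:-fleetctl}" start "$unit"
}
```

A call would look like bounce_unit followed by the unit name used with fleetctl, with the docker rm pre-start command in the unit taking care of any container left behind.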

Alex Polvi

May 25, 2015, 1:05:22 PM
to David Challoner, coreos-user
I realized part of this thread went off-list. 

Using a tool that directly forks the container will allow systemd to clean up and track state correctly for you. The systemd hacks get out of control because "docker run" is an HTTP client. If the connection is broken or anything goes wrong, systemd loses track of the state of the actual container, because the state it is tracking is that of the HTTP client, not the container.

rkt fixes this, and thus something like:

ExecStart=/usr/bin/rkt --insecure-skip-verify run docker://coreos/etcd 

...will act how you expect. rkt also supports auth, so you can talk to private repos and natively download docker images. 


One of the core reasons we built rkt was to fix these issues with Docker's process model and avoid all these crazy systemd hacks.

-Alex
