Hello,
I've got a "single node cluster" running on AWS on the stable channel, and my machine rebooted on 2016-03-26 due to a CoreOS upgrade.
After the reboot, docker.service failed to start, and so did all the dependent units.
I had to start all of them manually via systemctl, and now everything is up and running again.
This machine is managed with Ansible, so it has minimal AWS user-data plus a cloud-config.yml uploaded to the user's home directory.
There are other systemd units that are loaded and enabled by Ansible roles.
I think the relevant part of the journal is here:
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: Failed to start Network fabric for containers.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd-resolved[831]: System hostname changed to 'ip-172-17-0-21'.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd-hostnamed[838]: Changed static host name to 'ip-172-17-0-21.service.consul'
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd-hostnamed[838]: Changed host name to 'ip-172-17-0-21.service.consul'
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: Dependency failed for internal Custom Fields Manager Docker Container.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: production-internal-cfm-web.service: Job production-internal-cfm-web.service/start failed with result 'dependency'.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: Dependency failed for internal Kerberos Docker Container.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: vpn.service: Job vpn.service/start failed with result 'dependency'.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: Dependency failed for bolton Monitor Workers Docker Container.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: production-bolton-monitor-workers.service: Job production-bolton-monitor-workers.service/start failed with result 'dependency'.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: Dependency failed for nginx Docker Container.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: nginx.service: Job nginx.service/start failed with result 'dependency'.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: Dependency failed for bolton Redis Docker Container.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: production-bolton-redis.service: Job production-bolton-redis.service/start failed with result 'dependency'.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: Dependency failed for Docker Application Container Engine.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: Dependency failed for Consul container.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: Dependency failed for Registrator container.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: docker-registrator.service: Job docker-registrator.service/start failed with result 'dependency'.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: docker-consul.service: Job docker-consul.service/start failed with result 'dependency'.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: docker.service: Job docker.service/start failed with result 'dependency'.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: flanneld.service: Unit entered failed state.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: flanneld.service: Failed with result 'exit-code'.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: Reloading.
Mar 26 01:30:34 ip-172-17-0-21.service.consul dockerd[760]: time="2016-03-26T01:30:34.451055460Z" level=info msg="Firewalld running: false"
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: libcontainer-760-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: Starting etcd2...
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: Starting Garbage Collection for rkt...
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: Starting Generate /run/coreos/motd...
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: Reached target Run Docker containers before main Docker starts up.
Mar 26 01:30:34 ip-172-17-0-21.service.consul coreos-cloudinit[809]: 2016/03/26 01:30:34 Ensuring runtime unit file "etcd2.service" is unmasked
Mar 26 01:30:34 ip-172-17-0-21.service.consul coreos-cloudinit[809]: 2016/03/26 01:30:34 Ensuring runtime unit file "fleet.service" is unmasked
Mar 26 01:30:34 ip-172-17-0-21.service.consul coreos-cloudinit[809]: 2016/03/26 01:30:34 Ensuring runtime unit file "locksmithd.service" is unmasked
Mar 26 01:30:34 ip-172-17-0-21.service.consul coreos-cloudinit[809]: 2016/03/26 01:30:34 Calling unit command "restart" on "oem-cloudinit.service"
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: Started Garbage Collection for rkt.
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: Started Generate /run/coreos/motd.
Mar 26 01:30:34 ip-172-17-0-21.service.consul dockerd[760]: time="2016-03-26T01:30:34.690490617Z" level=info msg="Loading containers: start."
Mar 26 01:30:34 ip-172-17-0-21.service.consul dockerd[760]: time="2016-03-26T01:30:34.691007939Z" level=info msg="Loading containers: done."
Mar 26 01:30:34 ip-172-17-0-21.service.consul dockerd[760]: time="2016-03-26T01:30:34.691024843Z" level=info msg="Daemon has completed initialization"
Mar 26 01:30:34 ip-172-17-0-21.service.consul dockerd[760]: time="2016-03-26T01:30:34.691074403Z" level=info msg="Docker daemon" commit=9894698 execdriver=native-0.2 graphdriver=overlay version=1.9.1
Mar 26 01:30:34 ip-172-17-0-21.service.consul dockerd[760]: time="2016-03-26T01:30:34.696872089Z" level=info msg="API listen on /var/run/early-docker.sock"
Mar 26 01:30:34 ip-172-17-0-21.service.consul systemd[1]: Reached target Multi-User System.
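From what I can tell, flanneld.service is the first unit to fail (result 'exit-code'), and everything that has Requires=flanneld.service or Requires=docker.service then fails with 'dependency'. As a stopgap I'm considering letting systemd retry flanneld instead of leaving it in a failed state; a sketch of a drop-in (hypothetical path and values, not something I'm running yet):

```ini
# /etc/systemd/system/flanneld.service.d/20-restart.conf  (hypothetical)
[Service]
Restart=on-failure
RestartSec=5
```

If I understand systemd correctly, this keeps flanneld retrying after a failed start, but it would not re-queue the dependent start jobs that already failed with 'dependency', so it may only be half a fix.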
$ curl http://169.254.169.254/2009-04-04/user-data
#cloud-config

coreos:
  etcd2:
    #single node cluster setup
    name: core0
    initial-cluster: core0=http://$private_ipv4:2380
    initial-advertise-peer-urls: http://$private_ipv4:2380
    advertise-client-urls: http://$private_ipv4:2379
    # listen on both the official ports and the legacy ports
    # legacy ports can be omitted if your application doesn't depend on them
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://0.0.0.0:2380
    # generate a new token for each unique cluster from https://discovery.etcd.io/new?size=3
    # specify the initial size of your cluster with ?size=X
    #discovery: https://discovery.etcd.io/e42cedcdc7c8df727706971000a973ad
    # multi-region and multi-cloud deployments need to use $public_ipv4
    #advertise-client-urls: http://$private_ipv4:2379,http://$public_ipv4:2379
    #initial-advertise-peer-urls: http://$private_ipv4:2380,http://$public_ipv4:7001
    # listen on both the official ports and the legacy ports
    # legacy ports can be omitted if your application doesn't depend on them
    #listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    #listen-peer-urls: http://$private_ipv4:2380,http://$public_ipv4:7001
  update:
    reboot-strategy: "etcd-lock"
  fleet:
    public-ip: "$public_ipv4"
  flannel:
    public-ip: "$public_ipv4"
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
    - name: var-lib-docker.mount
      command: start
      content: |
        [Unit]
        Description=Mount ephemeral0 to /var/lib/docker
        Requires=dev-xvdb.device

        [Mount]
        What=/dev/xvdb
        Where=/var/lib/docker
        Type=ext3
    - name: cache.mount
      command: start
      content: |
        [Unit]
        Description=Mount ephemeral1 to /cache
        Requires=dev-xvdc.device

        [Mount]
        What=/dev/xvdc
        Where=/cache
        Type=ext3
    - name: flanneld.service
      drop-ins:
        - name: 50-network-config.conf
          content: |
            [Service]
            ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{ "Network": "10.17.0.0/16", "Backend": {"Type": "aws-vpc"} }'
      command: start
    - name: docker.service
      command: start
      drop-ins:
        - name: 10-wait-docker.conf
          content: |
            [Unit]
            After=var-lib-docker.mount
            After=cache.mount
            Requires=var-lib-docker.mount
            Requires=cache.mount
            Requires=flanneld.service
            After=flanneld.service
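The 10-wait-docker.conf drop-in above hard-couples docker.service to flanneld via Requires=, which is why a single flanneld failure at boot cascade-failed the whole container stack. An alternative I've been wondering about (just a sketch, not something I've deployed) would keep the ordering but drop the hard requirement on flannel:

```ini
# hypothetical variant of 10-wait-docker.conf: keep the start ordering,
# but a flanneld failure no longer cascade-fails docker.service
[Unit]
Requires=var-lib-docker.mount
Requires=cache.mount
After=var-lib-docker.mount
After=cache.mount
Wants=flanneld.service
After=flanneld.service
```

The trade-off is that docker could come up without the flannel network, so this only makes sense if the services can tolerate that until flannel recovers.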
$ cat /home/core/cloud-config.yml
#cloud-config

hostname: core-0

coreos:
  etcd2:
    #single node cluster setup
    name: core0
    initial-cluster: core0=http://$private_ipv4:2380
    initial-advertise-peer-urls: http://$private_ipv4:2380
    advertise-client-urls: http://$private_ipv4:2379
    # listen on both the official ports and the legacy ports
    # legacy ports can be omitted if your application doesn't depend on them
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://0.0.0.0:2380
  fleet:
    public-ip: "$public_ipv4"
    metadata: "dc=aws-eu-west-1,availability_zone=eu-west-1a"
  flannel:
    public-ip: "$public_ipv4"
  update:
    reboot-strategy: "best-effort"
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
    - name: "docker-consul.service"
      command: "start"
      enable: true
      content: |
        [Unit]
        Description=Consul container
        After=docker.service
        Requires=docker.service

        [Service]
        TimeoutStartSec=0
        ExecStartPre=-/usr/bin/docker kill consul
        ExecStartPre=-/usr/bin/docker rm consul
        ExecStartPre=/usr/bin/docker pull gliderlabs/consul-agent:0.6
        ExecStart=/usr/bin/docker run --rm --name consul \
          -v /data/consul:/data -v /etc/consul.d/:/etc/consul.d:ro \
          -v /var/run/docker.sock:/var/run/docker.sock --net=host \
          -p 8300:8300 -p 8301:8301 -p 8301:8301/udp \
          -p 8400:8400 -p 127.0.0.1:8500:8500 -p 53:8600/udp \
          -e SERVICE_8500_NAME=consul -e SERVICE_TAGS="http,monitoring" \
          -e SERVICE_8300_IGNORE=true -e SERVICE_8301_IGNORE=true \
          -e SERVICE_8400_IGNORE=true -e SERVICE_8600_IGNORE=true \
          gliderlabs/consul-agent:0.6 -ui-dir /ui -config-dir=/etc/consul.d \
          -advertise=$private_ipv4 -node=core-0
        ExecStop=/usr/bin/docker stop consul

        [Install]
        WantedBy=multi-user.target
    - name: "docker-registrator.service"
      command: "start"
      enable: true
      content: |
        [Unit]
        Description=Registrator container
        After=docker-consul.service
        Requires=docker-consul.service

        [Service]
        TimeoutStartSec=0
        ExecStartPre=-/usr/bin/docker kill registrator
        ExecStartPre=-/usr/bin/docker rm registrator
        ExecStartPre=/usr/bin/docker pull gliderlabs/registrator:latest
        ExecStart=/usr/bin/docker run --rm --name registrator \
          -v /var/run/docker.sock:/tmp/docker.sock --net=host \
          gliderlabs/registrator -internal -resync 60 consul://127.0.0.1:8500
        ExecStop=/usr/bin/docker stop registrator

        [Install]
        WantedBy=multi-user.target
Thank you,
Alessio