We’re seeing some weird dependency issues with regard to mounting ephemeral storage on EC2 at /var/lib/docker:
---
coreos:
units:
- name: format-ephemeral.service
command: start
content: |
[Unit]
Description=Formats the ephemeral drive
ConditionPathExists=/dev/xvdb
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/wipefs -f /dev/xvdb
ExecStart=/usr/sbin/mkfs.ext4 /dev/xvdb
- name: var-lib-docker.mount
command: start
content: |
[Unit]
Description=Mount ephemeral to /var/lib/docker
After=format-ephemeral.service
Before=docker.service
Requires=format-ephemeral.service
[Mount]
What=/dev/xvdb
Where=/var/lib/docker
Type=ext4
% journalctl -f -u docker -n2000000 | head -n20
-- Logs begin at Wed 2015-05-13 11:58:51 UTC. --
May 13 11:59:44 systemd[1]: Dependency failed for Docker Application Container Engine.
May 13 11:59:44 systemd[1]: Job docker.service/start failed with result 'dependency'.
May 13 12:00:05 systemd[1]: Starting Docker Application Container Engine...
May 13 12:00:05 systemd[1]: Started Docker Application Container Engine.
May 13 12:00:05 dockerd[861]: time="2015-05-13T12:00:05Z" level=info msg="+job serveapi(fd://)"
May 13 12:00:05 dockerd[861]: time="2015-05-13T12:00:05Z" level=info msg="Listening for HTTP on fd ()"
May 13 12:00:05 dockerd[861]: time="2015-05-13T12:00:05Z" level=info msg="+job init_networkdriver()"
May 13 12:00:05 dockerd[861]: time="2015-05-13T12:00:05Z" level=info msg="-job init_networkdriver() = OK (0)"
% journalctl -f -u var-lib-docker.mount -n2000000 | head -n20
-- Logs begin at Wed 2015-05-13 11:58:51 UTC. --
May 13 12:01:04 systemd[1]: var-lib-docker.mount: Directory /var/lib/docker to mount over is not empty, mounting anyway.
May 13 12:01:04 systemd[1]: Mounting Mount ephemeral to /var/lib/docker...
May 13 12:01:04 systemd[1]: Mounted Mount ephemeral to /var/lib/docker.
% journalctl -f -u format-ephemeral.service -n2000000 | head -n20
-- Logs begin at Wed 2015-05-13 11:58:51 UTC. --
May 13 12:01:03 systemd[1]: Starting Formats the ephemeral drive...
May 13 12:01:03 wipefs[1526]: offset type
May 13 12:01:03 wipefs[1526]: ----------------------------------------------------------------
May 13 12:01:03 wipefs[1526]: 0x438 ext3 [filesystem]
May 13 12:01:03 wipefs[1526]: UUID: b47e48a5-7828-419c-a040-1f6f205dcaa0
May 13 12:01:03 mkfs.ext4[1527]: mke2fs 1.42.10 (18-May-2014)
May 13 12:01:03 mkfs.ext4[1527]: Creating filesystem with 7862400 4k blocks and 1966080 inodes
May 13 12:01:03 mkfs.ext4[1527]: Filesystem UUID: 6945c573-7c00-4ad0-bddb-e0a089adaeb3
May 13 12:01:03 mkfs.ext4[1527]: Superblock backups stored on blocks:
May 13 12:01:03 mkfs.ext4[1527]: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
May 13 12:01:03 mkfs.ext4[1527]: 4096000
May 13 12:01:03 mkfs.ext4[1527]: [57B blob data]
May 13 12:01:03 mkfs.ext4[1527]: [54B blob data]
May 13 12:01:03 mkfs.ext4[1527]: Creating journal (32768 blocks): done
May 13 12:01:04 mkfs.ext4[1527]: [91B blob data]
May 13 12:01:04 systemd[1]: Started Formats the ephemeral drive.
As you can see in the above logs, Docker actually starts before the drive is
formatted, even though format-ephemeral.service is of type oneshot, and
should be blocking during the formatting.
As seen by the logs, Docker eventually crashes when its data is suddenly gone:
May 13 12:01:48 dockerd[861]: time="2015-05-13T12:01:48Z" level=info msg="-job execCreate(consul-server) = OK (0)"
May 13 12:01:48 dockerd[861]: time="2015-05-13T12:01:48Z" level=info msg="POST /v1.18/exec/0c97267d6d224df170c601f40d0fc08a1d7e2b80886fd5356dc4e272c3a2bf63/start"
May 13 12:01:48 dockerd[861]: time="2015-05-13T12:01:48Z" level=info msg="+job execStart(0c97267d6d224df170c601f40d0fc08a1d7e2b80886fd5356dc4e272c3a2bf63)"
May 13 12:01:48 dockerd[861]: time="2015-05-13T12:01:48Z" level=info msg="+job log(exec_start: consul join 10.13.1.117, 983b33e36c8974895e48bc5cae555b5c727ce42a253bababc0d2229599bb8e7e, progrium/consul:latest)"
May 13 12:01:48 dockerd[861]: time="2015-05-13T12:01:48Z" level=info msg="-job log(exec_start: consul join 10.13.1.117, 983b33e36c8974895e48bc5cae555b5c727ce42a253bababc0d2229599bb8e7e, progrium/consul:latest) = OK (0)"
May 13 12:01:48 dockerd[861]: panic: runtime error: invalid memory address or nil pointer dereference
May 13 12:01:48 dockerd[861]: [signal 0xb code=0x1 addr=0x0 pc=0x46ab82]
May 13 12:01:48 dockerd[861]: goroutine 1693 [running]:
May 13 12:01:48 systemd[1]: docker.service: main process exited, code=exited, status=2/INVALIDARGUMENT
May 13 12:01:48 systemd[1]: Unit docker.service entered failed state.
May 13 12:01:48 systemd[1]: docker.service failed.
May 13 12:01:48 dockerd[861]: github.com/docker/docker/daemon.(*Daemon).Run(0xc208088ea0, 0xc20821e000, 0xc2083ef170, 0xc20822b070, 0x0, 0xc20849dc00, 0x0, 0x0)
May 13 12:01:48 dockerd[861]: /build/amd64-usr/var/tmp/portage/app-emulation/docker-1.6.0-r1/work/docker-1.6.0/.gopath/src/github.com/docker/docker/daemon/daemon.go:1097 +0x95
And an ls /var/lib/docker confirms the data directory is empty (except for the
standard lost+found). The mount is also confirmed:
% df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.7G 0 3.7G 0% /dev
tmpfs 3.7G 0 3.7G 0% /dev/shm
tmpfs 3.7G 1.1M 3.7G 1% /run
tmpfs 3.7G 0 3.7G 0% /sys/fs/cgroup
/dev/xvda9 29G 533M 27G 2% /
/dev/xvda3 985M 353M 581M 38% /usr
tmpfs 3.7G 0 3.7G 0% /tmp
tmpfs 3.7G 0 3.7G 0% /media
/dev/xvda6 108M 56K 99M 1% /usr/share/oem
/dev/xvdb 30G 524M 28G 2% /var/lib/docker
A manual restart using systemctl restart docker brings
it back online.