Docker starting before /var/lib/docker mount is ready

685 views
Skip to first unread message

Jean Mertz

unread,
May 13, 2015, 8:19:50 AM5/13/15
to coreo...@googlegroups.com

We’re seeing some weird dependency issues with regard to mounting ephemeral storage on EC2 at /var/lib/docker:

---
coreos:
  units:
  - name: format-ephemeral.service
    command: start
    content: |
      [Unit]
      Description=Formats the ephemeral drive
      ConditionPathExists=/dev/xvdb

      [Service]
      Type=oneshot
      RemainAfterExit=yes

      ExecStart=/usr/sbin/wipefs -f /dev/xvdb
      ExecStart=/usr/sbin/mkfs.ext4 /dev/xvdb
  - name: var-lib-docker.mount
    command: start
    content: |
      [Unit]
      Description=Mount ephemeral to /var/lib/docker
      After=format-ephemeral.service
      Before=docker.service
      Requires=format-ephemeral.service

      [Mount]
      What=/dev/xvdb
      Where=/var/lib/docker
      Type=ext4
% journalctl -f -u docker -n2000000 | head -n20 
-- Logs begin at Wed 2015-05-13 11:58:51 UTC. --
May 13 11:59:44 systemd[1]: Dependency failed for Docker Application Container Engine.
May 13 11:59:44 systemd[1]: Job docker.service/start failed with result 'dependency'.
May 13 12:00:05 systemd[1]: Starting Docker Application Container Engine...
May 13 12:00:05 systemd[1]: Started Docker Application Container Engine.
May 13 12:00:05 dockerd[861]: time="2015-05-13T12:00:05Z" level=info msg="+job serveapi(fd://)"
May 13 12:00:05 dockerd[861]: time="2015-05-13T12:00:05Z" level=info msg="Listening for HTTP on fd ()"
May 13 12:00:05 dockerd[861]: time="2015-05-13T12:00:05Z" level=info msg="+job init_networkdriver()"
May 13 12:00:05 dockerd[861]: time="2015-05-13T12:00:05Z" level=info msg="-job init_networkdriver() = OK (0)"
% journalctl -f -u var-lib-docker.mount -n2000000 | head -n20
-- Logs begin at Wed 2015-05-13 11:58:51 UTC. --
May 13 12:01:04 systemd[1]: var-lib-docker.mount: Directory /var/lib/docker to mount over is not empty, mounting anyway.
May 13 12:01:04 systemd[1]: Mounting Mount ephemeral to /var/lib/docker...
May 13 12:01:04 systemd[1]: Mounted Mount ephemeral to /var/lib/docker.
% journalctl -f -u format-ephemeral.service -n2000000 | head -n20
-- Logs begin at Wed 2015-05-13 11:58:51 UTC. --
May 13 12:01:03 systemd[1]: Starting Formats the ephemeral drive...
May 13 12:01:03 wipefs[1526]: offset               type
May 13 12:01:03 wipefs[1526]: ----------------------------------------------------------------
May 13 12:01:03 wipefs[1526]: 0x438                ext3   [filesystem]
May 13 12:01:03 wipefs[1526]: UUID:  b47e48a5-7828-419c-a040-1f6f205dcaa0
May 13 12:01:03 mkfs.ext4[1527]: mke2fs 1.42.10 (18-May-2014)
May 13 12:01:03 mkfs.ext4[1527]: Creating filesystem with 7862400 4k blocks and 1966080 inodes
May 13 12:01:03 mkfs.ext4[1527]: Filesystem UUID: 6945c573-7c00-4ad0-bddb-e0a089adaeb3
May 13 12:01:03 mkfs.ext4[1527]: Superblock backups stored on blocks:
May 13 12:01:03 mkfs.ext4[1527]: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
May 13 12:01:03 mkfs.ext4[1527]: 4096000
May 13 12:01:03 mkfs.ext4[1527]: [57B blob data]
May 13 12:01:03 mkfs.ext4[1527]: [54B blob data]
May 13 12:01:03 mkfs.ext4[1527]: Creating journal (32768 blocks): done
May 13 12:01:04 mkfs.ext4[1527]: [91B blob data]
May 13 12:01:04 systemd[1]: Started Formats the ephemeral drive.

As you can see in the above logs, Docker actually starts before the drive is
formatted, even though format-ephemeral.service is of type oneshot, and
should be blocking during the formatting.

As seen by the logs, Docker eventually crashes when its data is suddenly gone:

May 13 12:01:48 dockerd[861]: time="2015-05-13T12:01:48Z" level=info msg="-job execCreate(consul-server) = OK (0)"
May 13 12:01:48 dockerd[861]: time="2015-05-13T12:01:48Z" level=info msg="POST /v1.18/exec/0c97267d6d224df170c601f40d0fc08a1d7e2b80886fd5356dc4e272c3a2bf63/start"
May 13 12:01:48 dockerd[861]: time="2015-05-13T12:01:48Z" level=info msg="+job execStart(0c97267d6d224df170c601f40d0fc08a1d7e2b80886fd5356dc4e272c3a2bf63)"
May 13 12:01:48 dockerd[861]: time="2015-05-13T12:01:48Z" level=info msg="+job log(exec_start: consul join 10.13.1.117, 983b33e36c8974895e48bc5cae555b5c727ce42a253bababc0d2229599bb8e7e, progrium/consul:latest)"
May 13 12:01:48 dockerd[861]: time="2015-05-13T12:01:48Z" level=info msg="-job log(exec_start: consul join 10.13.1.117, 983b33e36c8974895e48bc5cae555b5c727ce42a253bababc0d2229599bb8e7e, progrium/consul:latest) = OK (0)"
May 13 12:01:48 dockerd[861]: panic: runtime error: invalid memory address or nil pointer dereference
May 13 12:01:48 dockerd[861]: [signal 0xb code=0x1 addr=0x0 pc=0x46ab82]
May 13 12:01:48 dockerd[861]: goroutine 1693 [running]:
May 13 12:01:48 systemd[1]: docker.service: main process exited, code=exited, status=2/INVALIDARGUMENT
May 13 12:01:48 systemd[1]: Unit docker.service entered failed state.
May 13 12:01:48 systemd[1]: docker.service failed.
May 13 12:01:48 dockerd[861]: github.com/docker/docker/daemon.(*Daemon).Run(0xc208088ea0, 0xc20821e000, 0xc2083ef170, 0xc20822b070, 0x0, 0xc20849dc00, 0x0, 0x0)
May 13 12:01:48 dockerd[861]: /build/amd64-usr/var/tmp/portage/app-emulation/docker-1.6.0-r1/work/docker-1.6.0/.gopath/src/github.com/docker/docker/daemon/daemon.go:1097 +0x95

And an ls /var/lib/docker confirms the data directory is empty (except for the
standard lost+found). The mount is also confirmed:

% df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.7G     0  3.7G   0% /dev
tmpfs           3.7G     0  3.7G   0% /dev/shm
tmpfs           3.7G  1.1M  3.7G   1% /run
tmpfs           3.7G     0  3.7G   0% /sys/fs/cgroup
/dev/xvda9       29G  533M   27G   2% /
/dev/xvda3      985M  353M  581M  38% /usr
tmpfs           3.7G     0  3.7G   0% /tmp
tmpfs           3.7G     0  3.7G   0% /media
/dev/xvda6      108M   56K   99M   1% /usr/share/oem
/dev/xvdb        30G  524M   28G   2% /var/lib/docker

A manual restart using systemctl restart docker brings
it back online.

Jason Waldrip

unread,
May 20, 2015, 2:11:17 AM5/20/15
to coreo...@googlegroups.com
This is happening to us as well. It seems as though the mount is not complete by the time docker starts. docker reboot does fix it

Rimas Mocevicius

unread,
May 20, 2015, 3:19:31 PM5/20/15
to coreo...@googlegroups.com
It does happen for me on GCE too
Tested on beta and alpha latest CoreOS versions
Reply all
Reply to author
Forward
0 new messages