Cannot start containers and lost docker volumes after reboot

Marco Monteiro

Sep 11, 2015, 12:50:23 PM
to coreo...@googlegroups.com
My CoreOS cluster is composed of 3 nodes, 2 older ones with btrfs root and a newer one with ext4.

After rebooting them to update to 766.3.0, the node with ext4 file systems has a few problems.
The other two nodes seem fine.

I'm not sure what exactly the problem is, but some of the containers cannot be started; the message I get is something like:

Cannot start container <container-name>: no such file or directory

It's as if the container volumes don't have any files. Other containers start fine, so the problem does not affect all of them.

Another issue is that some of the volume containers seem to have lost their volumes.
These containers were created with, for example,

/usr/bin/docker create --name postgres-volumes --volume /var/lib/postgresql/data busybox

When I start a container with --volumes-from postgres-volumes, the volume is not available in the new container. If I try to run the images that the containers were created from, all of them fail with the "no such file or directory" message from above.

I've had to manually recreate the containers, which I could only do after removing some of the images, as starting containers from some of the images would give me the "no such file" error.

Now I'm a little afraid of restarting that node because I'll need to spend another afternoon cleaning up and recreating stuff if this happens again.

It seems as if the layers for some of the images/containers were lost, so there are no files. Does this make sense?

Does anyone have any idea what could be happening? And what can I do to figure out what the problem was, so I can restart the machine more confidently?

Also, any idea how I can access some of the lost volumes, to try to recover any data?

Thanks,
Marco

Brandon Philips

Sep 14, 2015, 11:29:41 AM
to Marco Monteiro, coreo...@googlegroups.com
Hello Marco-

Can you attach more details? For example, how did you set up these containers? How is your filesystem set up? Can you attach a more complete log from docker? What tool is giving you "Cannot start container <container-name>: no such file or directory"?

Brandon

Marco Monteiro

Sep 20, 2015, 4:02:31 AM
to Brandon Philips, coreo...@googlegroups.com
Hi, Brandon!

Here is a concrete example. That host has been running a set of containers that provide the GitLab service. One of the containers is postgres. It is run from gitlab-postgres.service, which contains:

---
[Unit]
Description=PostgreSQL server for GitLab
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=-/usr/bin/docker create --name %p-volumes --volume /var/lib/postgresql/data busybox
ExecStartPre=-/usr/bin/docker kill %p
ExecStartPre=-/usr/bin/docker rm --volumes %p
ExecStartPre=/usr/bin/docker create                     \
                             --env SERVICE_5432_NAME=%p \
                             --name %p                  \
                             --volumes-from %p-volumes  \
                             postgres:9.4
ExecStartPre=/usr/bin/docker start %p
ExecStart=/usr/bin/docker wait %p
ExecStop=/usr/bin/docker stop %p
Restart=on-failure
RestartSec=5s
TimeoutStartSec=0
User=core

[X-Fleet]
MachineMetadata="hostname=worker4" "type=worker"
---

Now, from the shell:

---
core@worker4 ~ $ docker ps -a |grep gitlab-postgres-volumes
2fc6ccec730c        busybox                                                                     "/bin/sh"              11 weeks ago        Exited (0) 8 days ago                       gitlab-postgres-volumes                    
core@worker4 ~ $ docker run --rm -it --volumes-from gitlab-postgres-volumes masm/tools /bin/bash
[root@fef870635eb0 /]# ls /var/lib/postgres
ls: cannot access /var/lib/postgres: No such file or directory
[root@fef870635eb0 /]#
---

Notice that there is no /var/lib/postgresql folder, which should have been provided by the volume. The volume container was created 3 months ago and had always worked until last week, but it seems that the volume has been lost:

---
core@worker4 ~ $ docker inspect gitlab-postgres-volumes
...
    "Volumes": {},
    "VolumesRW": {},
...
    "Config": {
        "Volumes": {
            "/var/lib/postgresql/data": {}
        },
...
---
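
The mismatch in that output (a path declared under Config.Volumes while the top-level Volumes map is empty) could be checked for every container at once. Below is a minimal sketch in Python, assuming the `docker inspect` JSON layout shown above (a top-level "Volumes" map next to "Config"."Volumes"); other docker versions may lay this out differently. The helper name `missing_volumes` and the sample JSON are hypothetical, made up for illustration.

```python
import json


def missing_volumes(inspect_json):
    """Map container name -> declared volume paths with no backing entry.

    Expects the JSON array printed by `docker inspect <container>...`.
    Assumes the layout shown in the thread: declared paths live in
    Config.Volumes, actual host mappings in the top-level Volumes map.
    """
    missing = {}
    for container in json.loads(inspect_json):
        # Either map may be null in the inspect output, hence the `or {}`.
        declared = (container.get("Config") or {}).get("Volumes") or {}
        mounted = container.get("Volumes") or {}
        lost = sorted(path for path in declared if path not in mounted)
        if lost:
            missing[container.get("Name", "<unknown>")] = lost
    return missing


# Hypothetical sample mirroring the inspect output quoted above.
SAMPLE = """
[{"Name": "/gitlab-postgres-volumes",
  "Volumes": {},
  "VolumesRW": {},
  "Config": {"Volumes": {"/var/lib/postgresql/data": {}}}}]
"""

print(missing_volumes(SAMPLE))
```

To run it against a real host, the output of something like `docker inspect $(docker ps -aq)` could be saved to a file and its contents passed to `missing_volumes` in place of `SAMPLE`. An empty result would only mean that every declared path still has an entry, not that the data behind it is intact.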

One possible explanation is that docker removes the volume when the service file runs /usr/bin/docker rm --volumes gitlab-postgres, ignoring the fact that the volume is also used by gitlab-postgres-volumes.

As for your other questions: the filesystem being used is ext4, and I didn't change anything from what CoreOS set up. I cannot paste any relevant log, as I don't see any error besides what I already mentioned. The "Cannot start container <container-name>: no such file or directory" message came from running "docker run <args>" for some of the containers. To fix those, I had to remove the images, pull them again, and then it worked.

Thanks,
Marco