How to mount Ceph RBD from docker container on CoreOS


Satoru Funai

Oct 3, 2014, 8:41:17 AM
to coreo...@googlegroups.com
Hi all,
I found that the CoreOS kernel has included the Ceph RBD (block device) driver since release 440.
I installed CoreOS 444.3.0 (beta) and tried to mount a Ceph RBD from a container.
I built an Ubuntu 14.04 Docker container, installed ceph-common, and set it up to connect to my Ceph cluster.
The container can connect to Ceph, for example:
root@9106b80bf5e9:~# rbd --pool test-pool ls -l
2014-10-03 12:33:36.753167 7fa7ceb72700  0 -- :/1000172 >> 192.168.100.114:6789/0 pipe(0x7fa7d0d8f340 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fa7d0d8f5b0).fault
NAME        SIZE PARENT FMT PROT LOCK
test-image 1024M          1

But I could not map it; the map fails with a read-only file system error:

root@9106b80bf5e9:~# rbd map test-image --pool test-pool
rbd: add failed: (30) Read-only file system

Please help me.
Any suggestions or ideas would be appreciated.

Best regards,
Satoru Funai

Brian Harrington

Oct 3, 2014, 11:36:51 AM
to coreo...@googlegroups.com
Satoru,

The container will need to be run in "privileged" mode so that it doesn't have CAP_SYS_ADMIN stripped away.

--Brian 'redbeard' Harrington
--CoreOS

Satoru Funai

Oct 3, 2014, 10:57:49 PM
to coreo...@googlegroups.com
Hi Brian,
Thanks for your suggestion.
I tried privileged mode, but it still fails, as shown below:

On CoreOS:
core@core002 ~ $ docker run --privileged -v /etc/ceph:/etc/ceph -d -p 1022:22 -p 6789:6789 496d183593fb /usr/sbin/sshd -D
792fa908455f514e4133b3c0e1fe5cb855ead033ed18a588977148563bbe9905
core@core002 ~ $ docker ps
CONTAINER ID        IMAGE               COMMAND               CREATED             STATUS              PORTS                                          NAMES
792fa908455f        ossl/ubuntu1:0.1    "/usr/sbin/sshd -D"   4 seconds ago       Up 1 seconds        0.0.0.0:1022->22/tcp, 0.0.0.0:6789->6789/tcp   naughty_brown
core@core002 ~ $ docker inspect 792fa908455f
[{
    "Args": [
        "-D"
    ],
    "Config": {
        "AttachStderr": false,
        "AttachStdin": false,
        "AttachStdout": false,
        "Cmd": [
            "/usr/sbin/sshd",
            "-D"
        ],
        "CpuShares": 0,
        "Cpuset": "",
        "Domainname": "",
        "Entrypoint": null,
        "Env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
        ],
        "ExposedPorts": {
            "22/tcp": {},
            "6789/tcp": {}
        },
        "Hostname": "792fa908455f",
        "Image": "496d183593fb",
        "Memory": 0,
        "MemorySwap": 0,
        "NetworkDisabled": false,
        "OnBuild": null,
        "OpenStdin": false,
        "PortSpecs": null,
        "StdinOnce": false,
        "Tty": false,
        "User": "",
        "Volumes": null,
        "WorkingDir": ""
    },
    "Created": "2014-10-04T02:46:27.461827338Z",
    "Driver": "btrfs",
    "ExecDriver": "native-0.2",
    "HostConfig": {
        "Binds": [
            "/etc/ceph:/etc/ceph"
        ],
        "CapAdd": null,
        "CapDrop": null,
        "ContainerIDFile": "",
        "Devices": [],
        "Dns": null,
        "DnsSearch": null,
        "Links": null,
        "LxcConf": [],
        "NetworkMode": "bridge",
        "PortBindings": {
            "22/tcp": [
                {
                    "HostIp": "",
                    "HostPort": "1022"
                }
            ],
            "6789/tcp": [
                {
                    "HostIp": "",
                    "HostPort": "6789"
                }
            ]
        },
        "Privileged": true,
        "PublishAllPorts": false,
        "RestartPolicy": {
            "MaximumRetryCount": 0,
            "Name": ""
        },
        "VolumesFrom": null
    },
    "HostnamePath": "/var/lib/docker/containers/792fa908455f514e4133b3c0e1fe5cb855ead033ed18a588977148563bbe9905/hostname",
    "HostsPath": "/var/lib/docker/containers/792fa908455f514e4133b3c0e1fe5cb855ead033ed18a588977148563bbe9905/hosts",
    "Id": "792fa908455f514e4133b3c0e1fe5cb855ead033ed18a588977148563bbe9905",
    "Image": "496d183593fbbbce9ba3b49b6beab39dc2ef6e43efbccaa71287822888ef329a",
    "MountLabel": "",
    "Name": "/naughty_brown",
    "NetworkSettings": {
        "Bridge": "docker0",
        "Gateway": "172.17.42.1",
        "IPAddress": "172.17.0.26",
        "IPPrefixLen": 16,
        "PortMapping": null,
        "Ports": {
            "22/tcp": [
                {
                    "HostIp": "0.0.0.0",
                    "HostPort": "1022"
                }
            ],
            "6789/tcp": [
                {
                    "HostIp": "0.0.0.0",
                    "HostPort": "6789"
                }
            ]
        }
    },
    "Path": "/usr/sbin/sshd",
    "ProcessLabel": "",
    "ResolvConfPath": "/var/lib/docker/containers/792fa908455f514e4133b3c0e1fe5cb855ead033ed18a588977148563bbe9905/resolv.conf",
    "State": {
        "ExitCode": 0,
        "FinishedAt": "0001-01-01T00:00:00Z",
        "Paused": false,
        "Pid": 12835,
        "Restarting": false,
        "Running": true,
        "StartedAt": "2014-10-04T02:46:29.488055642Z"
    },
    "Volumes": {
        "/etc/ceph": "/etc/ceph"
    },
    "VolumesRW": {
        "/etc/ceph": true
    }
}


In the Docker container:
root@792fa908455f:~# rbd --pool test-pool ls -l
NAME        SIZE PARENT FMT PROT LOCK
test-image 1024M          1
root@792fa908455f:~# rbd map --image test-image --pool test-pool
rbd: add failed: (22) Invalid argument

Any ideas?
Satoru Funai

On Saturday, October 4, 2014 at 0:36:51 UTC+9, Brian Harrington wrote:

Satoru Funai

Oct 6, 2014, 4:37:35 AM
to coreo...@googlegroups.com
I also found an error when loading the rbd module in an Ubuntu 14.04 container on CoreOS 444.3.0:
root@792fa908455f:~# modprobe rbd
modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/3.16.2+/modules.dep.bin'

root@792fa908455f:~# ls /lib/modules/*/modules.dep.bin
ls: cannot access /lib/modules/*/modules.dep.bin: No such file or directory

root@792fa908455f:~# uname -a
Linux 792fa908455f 3.16.2+ #2 SMP Wed Oct 1 23:00:59 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
root@792fa908455f:~# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.1 LTS"
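
(The moddep error just means the container image has no module tree for the host's 3.16.2+ kernel under /lib/modules. One possible workaround, untested in this thread, is to bind-mount the host's modules into the container:

      docker run --privileged -v /etc/ceph:/etc/ceph -v /lib/modules:/lib/modules -d 496d183593fb /usr/sbin/sshd -D

Alternatively, the module can simply be loaded on the CoreOS host with sudo modprobe rbd.)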



On Friday, October 3, 2014 at 21:41:17 UTC+9, Satoru Funai wrote:

Christopher Armstrong

Oct 7, 2014, 12:05:35 AM
to Satoru Funai, coreos-user
Hi Satoru,

I've been fighting this as well. As far as I can tell, it's a known issue with mounting RBD volumes from within a Docker container. See: https://lists.linuxcontainers.org/pipermail/lxc-users/2013-October/005795.html

Some related issues in the Ceph tracker come up in the IRC excerpt below, notably http://tracker.ceph.com/issues/8818 and http://tracker.ceph.com/issues/9355.

The bug appears to be with the `rbd map` command. When debugging with joshd in IRC, we actually got the volume to mount, but only by completely disabling Ceph authentication and, on the CoreOS host, echoing directly into the rbd bus. The volume was then accessible from within the container, but this felt like a pretty hacky workaround.

Our IRC conversation can be seen here: http://irclogs.ceph.widodh.nl/index.php?date=2014-09-04

Excerpts are as follows - sorry for the wall of text, but I thought it was important to preserve this on the coreos-user list. It seems there's a lot of Ceph interest as of late:

[2:16] <carmstrong> well, running the container as --privileged allowed me to get back the device creation error, but now I'm getting `rbd: add failed: (22) Invalid argument`
[2:16] <carmstrong> I'm running `rbd map $pool/$name`
[2:16] <carmstrong> tried also specifying --pool separately
[3:23] <joshd> carmstrong: to get more logging out of the kernel you can 'mount -t debugfs none /sys/kernel/debug', run https://raw.githubusercontent.com/ceph/ceph/master/src/script/kcon_all.sh and then try rbd map again - it'll appear in dmesg
[4:05] <carmstrong> ok. [ 874.254853] rbd: Error adding device 172.17.8.100:6789 name=admin,key=client.admin deis db
[4:05] <carmstrong> potentially an auth issue?
[4:06] <joshd> anything before that?
[4:13] <carmstrong> joshd: some docker-related networking, but that's about it
[4:33] <joshd> carmstrong: that means it's failing very early, not even talking over the network
[4:34] <carmstrong> joshd: not a good sign :(
[4:34] <joshd> I'm suspicious it may have to do with the way auth info is passed to the kernel (maybe some extra capability is needed inside a container)
[4:34] <joshd> can you map one outside of a container with this kernel?
[4:35] <carmstrong> I'm unable to install the ceph packages in the root CoreOS machine. do you know of another way to test it?
[4:35] <joshd> which version is the kernel?
[4:37] <carmstrong> 3.15.8
[4:37] <joshd> it's worth trying in the container with cephx auth disabled ('auth supported = none' in the [global] section of /etc/ceph/ceph.conf on every node and restart the cluster)
[4:40] <carmstrong> that's just where I was headed! ok lemme try
[4:44] <joshd> carmstrong: with auth disabled it's also easy to try out on the host with echo "172.17.8.100:6789 name=admin deis db" > /sys/bus/rbd/add
[4:45] <joshd> it'll show up as /dev/rbd0 if it works, and you can remove it with echo 0 > /sys/bus/rbd/remove
[21:46] <carmstrong> joshd1: so with auth disabled, doing an echo "172.17.8.100:6789 name=admin deis test" > /sys/bus/rbd/add as root causes a hang for a few minutes, then the machine crashes and reboots
[21:46] <carmstrong> that's on the host machine
[21:50] <joshd1> carmstrong: any log of the crash in syslog or dmesg or anything?
[21:50] <carmstrong> joshd1: dmesg only has the most recent boot, nothing before. it mentioned that the system journal wasn't closed correctly because of the crash, but that's about it
[1:25] <joshd1> carmstrong: that hang on the host may have been http://tracker.ceph.com/issues/8818, which started occurring in 3.15 (fixed in the stable kernel trees now). it's certainly separate from the issue inside the container, so it seems there'd need to be extra debugging added to the rbd module to figure out what the EINVAL is coming from
[1:25] <carmstrong> joshd1: gotcha. thanks for all your help
[1:25] <carmstrong> I'm going down the route of just using the radosgw for now for blob storage
[1:25] <carmstrong> and we'll revisit the RBD volume in the future
[1:29] <joshd1> carmstrong: you're welcome, that makes sense for now. I'll add a bug about the container issue
[1:37] <joshd1> carmstrong: http://tracker.ceph.com/issues/9355
[1:39] <carmstrong> joshd1: I also commented on 8818, in case anyone else with my kernel and coreos stumbles across it



Chris Armstrong
Head of Services
OpDemand / Deis.io

GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/



Christopher Armstrong

Oct 7, 2014, 12:09:13 AM
to Satoru Funai, coreos-user
My comment about getting it to mount and be accessible may have been on the 3.16.2 kernel - it looks like my debugging that day just resulted in the machine hanging and rebooting, which looks like http://tracker.ceph.com/issues/8818

I DO recall it working by probing the kernel module, disabling ceph authentication and restarting all daemons, then on the CoreOS host: echo "172.17.8.100:6789 name=admin deis db" > /sys/bus/rbd/add (substituting your pool name and volume name). 
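
Spelled out as host-side commands, with the monitor address, pool ("deis") and image ("db") from that debugging session:

      sudo modprobe rbd
      # requires 'auth supported = none' in the [global] section of ceph.conf on every node, and a cluster restart
      echo "172.17.8.100:6789 name=admin deis db" | sudo tee /sys/bus/rbd/add
      # the image appears as /dev/rbd0; remove it again with: echo 0 | sudo tee /sys/bus/rbd/remove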

I am *very* interested in getting this working as we need it for Deis, so please let me know if you make any progress or if I can help!

Chris Armstrong
Head of Services
OpDemand / Deis.io

GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/


Satoru Funai

Oct 7, 2014, 7:56:33 AM
to coreo...@googlegroups.com, satoru...@gmail.com
Hi Chris,
Thank you very much for your info.
It works in my environment without the machine hanging or rebooting.
The volume mounted on CoreOS can also be accessed from containers with the docker run -v option:
root@974c0461a790:~# rbd showmapped
id pool      image      snap device
0  test-pool test-image -    /dev/rbd0
root@974c0461a790:~# ls -l /mnt/ceph
total 4
-rw-r--r-- 1 root root 12 Oct  7 10:34 test.txt
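
(The docker run command for that container isn't shown above; presumably it passes the host mount point through with -v, something like:

      docker run --privileged -v /etc/ceph:/etc/ceph -v /mnt/ceph:/mnt/ceph -d 496d183593fb /usr/sbin/sshd -D

reusing the image and command from the earlier examples.)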
It's enough for our experimental environment, but NOT enough for production due to the lack of cephx auth.
Is there any information on fixing this problem?
Thanks again,
Satoru Funai


On Tuesday, October 7, 2014 at 13:09:13 UTC+9, Christopher Armstrong wrote:

Satoru Funai

Oct 7, 2014, 8:06:12 AM
to coreo...@googlegroups.com, satoru...@gmail.com
BTW, I run the RBD operations on CoreOS as below:
core@core002 ~ $ sudo modprobe rbd
core@core002 ~ $ sudo echo "192.168.100.112 name=admin test-pool test-image" | sudo tee /sys/bus/rbd/add
core@core002 ~ $ sudo mkdir /mnt/ceph
core@core002 ~ $ sudo mount /dev/rbd0 /mnt/ceph



On Tuesday, October 7, 2014 at 20:56:33 UTC+9, Satoru Funai wrote:

Christopher Armstrong

Oct 7, 2014, 12:30:55 PM
to Satoru Funai, coreos-user
Hi Satoru,

Glad you're at least able to play around with it. I posted to the Ceph mailing list last night after updating my ticket with a lot more debugging information: http://tracker.ceph.com/issues/9355

Ilya Dryomov changed my ticket status to "verified" just a few minutes ago, so I'm hoping that means they're working on a fix.

Chris Armstrong
Head of Services
OpDemand / Deis.io

GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/


Steve

Oct 7, 2014, 12:56:12 PM
to coreo...@googlegroups.com, satoru...@gmail.com
Hi all,

    We've been experimenting with using Ceph RBD volumes as our persistent storage. While I probably can't release the solution we created, I can give you an overview of how we 'solved' the issue.

- The CoreOS clusters we run are bare-metal installs that we provision with our internal iPXE/FAI setup.

- Using this we drop a custom daemon/webservice that uses the /sys/bus/rbd/* interface to mount RBD volumes on the CoreOS host (see the sketch after this list). This service is a static Go binary, but you could probably do a similar thing with a script deployed via cloudinit.

- Docker containers that need persistent storage mount/umount the volume they need via their systemd/fleet unit file, eg:

      ExecStartPre=/usr/bin/curl -d '{"volume":"mypersistentvolume"}' -X POST http://localhost:3000/mount
      ExecStopPost=/usr/bin/curl -d '{"volume":"mypersistentvolume"}' -X POST http://localhost:3000/unmount

- The docker container accesses the volume via the -v flag
- The volume moves around with the container as fleet reschedules it.
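
To make that concrete, here is a rough sketch of what a mount request could end up doing on the host. The monitor address, key, pool, and mount point are all placeholders, and the real service additionally handles device-id lookup, locking, and error cases:

      # map the image via the rbd sysfs interface (rbd module already loaded)
      echo "192.168.100.112:6789 name=admin,secret=<admin-key> test-pool mypersistentvolume" > /sys/bus/rbd/add
      # the new device id appears under /sys/bus/rbd/devices; assuming id 0 here
      mkdir -p /mnt/mypersistentvolume
      mount /dev/rbd0 /mnt/mypersistentvolume
      # unmount path: umount /mnt/mypersistentvolume, then echo the id into /sys/bus/rbd/remove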

We did spend some time trying to get the containers to mount the volumes directly, but we ran into the same problems as above. Obviously the service is made a little more complicated by authentication, locking, crashes, etc., but that's the fun bit :)

Cheers
    Steve



Satoru Funai

Oct 8, 2014, 1:30:45 AM
to coreo...@googlegroups.com, satoru...@gmail.com
Thanks a lot!!
Satoru

On Wednesday, October 8, 2014 at 1:30:55 UTC+9, Christopher Armstrong wrote:

Satoru Funai

Oct 8, 2014, 1:31:29 AM
to coreo...@googlegroups.com, satoru...@gmail.com
Hi Steve,
Did you disable cephx auth?
Satoru

On Wednesday, October 8, 2014 at 1:56:12 UTC+9, Steve wrote:

Steve

Oct 8, 2014, 4:54:04 AM
to coreo...@googlegroups.com, satoru...@gmail.com
No. The service is a little more complicated than the example I showed; in addition to "volume" we can pass a lot more parameters: pool name, monitors, secrets, etc.



Satoru Funai

Oct 8, 2014, 11:28:53 PM
to coreo...@googlegroups.com, satoru...@gmail.com
Hi Steve,
Could you tell me how to map an RBD image on CoreOS, please?
It always fails with cephx auth enabled, as below:
core@core004 ~ $ sudo modprobe rbd
core@core004 ~ $ sudo echo "192.168.100.112 name=admin test-pool test-image"|sudo tee /sys/bus/rbd/add
192.168.100.112 name=admin test-pool test-image
tee: /sys/bus/rbd/add: Invalid argument
It looks like http://tracker.ceph.com/issues/9355 as Chris posted.
Satoru Funai

On Wednesday, October 8, 2014 at 17:54:04 UTC+9, Steve wrote:

Steve

Oct 9, 2014, 4:41:47 AM
to coreo...@googlegroups.com, satoru...@gmail.com
I think your problem might be that you aren't passing in the 'password'. Try it with name=admin,secret=XXX, where XXX is the key from the admin keyring. You should probably also pass in a list of monitors rather than just one. Remember, we are doing this on CoreOS itself rather than inside the container.
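
For example, with placeholder monitors and key (the admin key can be read from the admin keyring, e.g. with ceph auth get-key client.admin):

      sudo modprobe rbd
      echo "192.168.100.112:6789,192.168.100.113:6789 name=admin,secret=<admin-key> test-pool test-image" | sudo tee /sys/bus/rbd/add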

Satoru Funai

Oct 9, 2014, 5:22:11 AM
to coreo...@googlegroups.com
Hi Steve,
Thank you very much!!! It works fine!
I will try to make units and use fleet for auto-mounting as the next step.
Thanks again and best regards,
Satoru Funai

Christopher Armstrong

Oct 9, 2014, 12:33:35 PM
to Satoru Funai, coreos-user
Hey guys,

Good news!! Ilya investigated the ticket and gave me a hint as to the issue - we need to use `--net host` on the consuming container so that the network context is what Ceph expects. I am now running my test container like so:
docker run -i -v /sys:/sys --net host 172.21.12.100:5000/deis/store-base:git-3d4ca8f /bin/bash
Note that we also had to bind-mount /sys so that it's not read-only within the container. And I can confirm that it works!
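
For anyone adapting this: combined with Brian's earlier note about privileged mode, a generic invocation would presumably look something like (the image here is just an example):

      docker run -i --privileged --net host -v /sys:/sys -v /etc/ceph:/etc/ceph ubuntu:14.04 /bin/bash

after which rbd map test-image --pool test-pool should behave as it does on the host.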

Chris Armstrong
Head of Services
OpDemand / Deis.io

GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/


Satoru Funai

Oct 11, 2014, 1:38:41 AM
to coreo...@googlegroups.com, satoru...@gmail.com
Hi Chris,
Thanks for your info.
But I don't understand how to use the "--net host" option.
Description from "Docker run options" (https://docs.docker.com/reference/run/):
"Mode: host

With the networking mode set to host a container will share the host's network stack and all interfaces from the host will be available to the container. The container's hostname will match the hostname on the host system. Publishing ports and linking to other containers will not work when sharing the host's network stack."

So it shares all of the host's network stack; the container can't isolate its network settings from the host.
Please help me.
Satoru Funai

On Friday, October 10, 2014 at 1:33:35 UTC+9, Christopher Armstrong wrote: