Shipping kernel modules into docker containers


Debab Ramzi

Sep 16, 2014, 12:28:24 PM
to docke...@googlegroups.com
Hi,
I have a question concerning the additional kernel modules needed in a given container. Let us say that the host is based on a Linux distribution kernel (CoreOS, for example). I create a container based on another Linux distribution (Fedora, for example) running an application that needs a custom kernel module M1. How will this module be executed? Will it be integrated into the original host kernel (CoreOS), or will it be integrated into another layer? Any additional links explaining deep Docker internals will be welcome.
Regards.
Ramzi.

Greg KH

Sep 16, 2014, 1:18:04 PM
to Debab Ramzi, docke...@googlegroups.com
On Tue, Sep 16, 2014 at 09:28:24AM -0700, Debab Ramzi wrote:
> Hi,
> I have a question please concerning the additional kernel modules needed in a
> given container for example. Let us say that the host is based on a linux
> distribution kernel (CoreOS for example). I create a container based on an
> other linux distribution (Fedora for example) running an application needing a
> custom kernel module M1.

What type of application needs a custom kernel module? Specifics
please.

> How this module will be executed?

It will not be, unless it matches the "host" kernel exactly.

> It will be integrated into the original host kernel (CoreOS) or it
> will be integrated into another layer ? Any additional links
> explaining deep docker internals will be welcome.

Docker containers are not virtual machines with different kernels and
modules, think of them as just another application running on the same
machine.
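A quick way to see this for yourself, as a minimal sketch (the Fedora
image is just an example):

    # on the CoreOS host
    uname -r

    # inside a Fedora container on that host: prints the exact same
    # version string, because there is only one kernel on the machine
    docker run --rm fedora uname -r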

If you "need" different kernels and kernel modules, just use a virtual
machine.

Or re-evaluate your "need" for a custom kernel module; getting rid of
that dependency is a better idea over the long term...

hope this helps,

greg k-h

Phil Estes

Sep 16, 2014, 1:45:35 PM
to Greg KH, Debab Ramzi, docke...@googlegroups.com


Greg KH wrote:
> On Tue, Sep 16, 2014 at 09:28:24AM -0700, Debab Ramzi wrote:
> > Hi,
> > I have a question please concerning the additional kernel modules needed in a
> > given container for example. Let us say that the host is based on a linux
> > distribution  kernel (CoreOS for example). I create a container based on an
> > other linux distribution (Fedora for example) running an application needing a
> > custom  kernel module M1.
>
> What type of application needs a custom kernel module?  Specifics
> please.
>
> > How this module will be executed?
>
> It will not be, unless it matches the "host" kernel exactly.
>
> > It will be integrated into the original host kernel (CoreOS) or it
> > will be integrated into another layer ? Any additional links
> > explaining deep docker internals will be welcome.
>
> Docker containers are not virtual machines with different kernels and
> modules, think of them as just another application running on the same
> machine.
>
> If you "need" different kernels and kernel modules, just use a virtual
> machine.
In addition, I believe you will immediately lose a couple key properties containers offer:

- portability: you potentially need to "ship" a version of your kernel module for every possible host!

- isolation/protection: you are asking to perform a privileged operation on the host that could affect other containers

Debab Ramzi

Sep 16, 2014, 4:00:38 PM
to docke...@googlegroups.com, r_d...@esi.dz
Hi Greg,

Thanks a lot for your response.

> What type of application needs a custom kernel module?  Specifics
> please.

Let us suppose a custom nginx server based on a custom firewall developed as a kernel module.

Regards.

Ramzi.

Greg KH

Sep 16, 2014, 4:14:40 PM
to Debab Ramzi, docke...@googlegroups.com
That's a really funny supposition, it's as if people never learn from
history :)

You are seriously on your own here, good luck with that.

greg k-h

p.s. if you want to run something like that, you really don't want to
use containers.


Sven Dowideit

Sep 16, 2014, 10:37:56 PM
to docke...@googlegroups.com, r_d...@esi.dz
I'm tempted to think the idea of building and loading the virtualbox / vmware etc kernel modules from within a container FS might be mildly useful.

That way, the compiler tools needed are not installed on the host - the module could actually be built in a known distro (and then installed and run on something more primitive), the kernel sources can be contained, and when the module is built it can be thrown at the host OS.
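A rough sketch of that workflow, assuming a build image that already has
gcc/make installed and the host's kernel headers mounted in (the image
and module names below are placeholders):

    # build the module inside a throwaway container, against the *host's*
    # kernel headers; $(uname -r) is the host kernel, which is also the
    # kernel the container sees
    docker run --rm \
        -v /usr/src:/usr/src:ro \
        -v /lib/modules:/lib/modules:ro \
        -v "$PWD":/build -w /build \
        module-builder \
        make -C /lib/modules/$(uname -r)/build M=/build modules

    # then load the resulting .ko on the host itself, not in the container
    sudo insmod ./mymodule.ko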

in the boot2docker case, we're using kernel.org releases patched with random versions of aufs - so without a copy of the container used to build the iso, this will get a little complicated - but we're working on making those builds more repeatable too.

Sven

Greg KH

Sep 17, 2014, 12:48:08 AM
to Sven Dowideit, docke...@googlegroups.com, r_d...@esi.dz
On Tue, Sep 16, 2014 at 07:37:55PM -0700, Sven Dowideit wrote:
> I'm tempted to think the idea of building and loading the virtualbox / vmware
> etc kernel modules from within a container FS might be mildly useful.

Not really.

> That way, the compiler tools needed are not installed in the host - and could
> actually be built in a known distro (and then installed and run to something
> more primitive), the kernel sources can be contained, and then when the module
> is built it can be thrown at the host OS.

Why does it matter?  The kernel modules still need to be built against
the host kernel / OS; it doesn't matter at all what is in the
container.

Unless you just want to use docker as a way to ship kernel modules as a
"package". If so, sure, use it that way, but realize it's just a fancy
way to circumvent the distro's "native" way to package things up. :)

> in the boot2docker case, we're using kernel.org releases patched with random
> versions of aufs - so without a copy of the container used to build the iso,
> this will get a little complicated - but we're working on making those builds
> more repeatable too.

Ick, aufs, please please stay away from that thing, can't you just use
btrfs or dm now?

Good luck,

greg k-h

Debab Ramzi

Sep 17, 2014, 4:03:07 AM
to docke...@googlegroups.com, r_d...@esi.dz
I think that Docker as an idea can be used in kernel space. What about adding a layer in the container that is executed in kernel space? The container would then be divided into two spaces: user space and kernel space. Doing so, we can ensure the isolation of this kernel module, for example. Shipping device drivers could be very useful, I suppose. What about containers in kernel space as well?

Ramzi.

Greg KH

Sep 17, 2014, 12:14:36 PM
to Debab Ramzi, docke...@googlegroups.com
On Wed, Sep 17, 2014 at 01:03:07AM -0700, Debab Ramzi wrote:
> I think that Docker as an idea can be used in the kernel space. What about
> adding a layer in the container that is executed in the kernel space. So the
> container will be divided into two spaces; User space and kernel space. Doing
> so, we can ensure the isolation of this kernel module for example. Shipping
> device drivers could be very useful I suppose. What about containers in the
> kernel space also?

No one is working on containers in kernel space, sorry, the idea doesn't
really work there.

greg k-h

Leen Besselink

Sep 17, 2014, 12:32:26 PM
to docke...@googlegroups.com
If you really want your own bit of kernel space maybe User-mode Linux is a better idea.
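For what it's worth, a UML guest kernel is just an ordinary user-space
process on the host, started roughly like this (file names are
placeholders):

    # 'linux' is a kernel built with ARCH=um; it boots from a root
    # filesystem image and runs entirely in user space
    ./linux ubd0=rootfs.img mem=256M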


Sven Dowideit

Sep 23, 2014, 1:31:08 AM
to Greg KH, docker-dev, r_d...@esi.dz
On Wed, Sep 17, 2014 at 2:47 PM, Greg KH <gre...@linuxfoundation.org> wrote:


> Ick, aufs, please please stay away from that thing, can't you just use
> btrfs or dm now?


it seems that those 2 don't allow processes to share memory, whereas aufs can -
so running 1000 containers running the same code will use 1000 times the memory?

damn.
 




--
Docker Support Engineer
Ask me anything ...
Brisbane, Australia (UTC+10)

Greg KH

Sep 23, 2014, 9:15:08 AM
to Sven Dowideit, docker-dev, r_d...@esi.dz
On Tue, Sep 23, 2014 at 03:31:03PM +1000, Sven Dowideit wrote:
> On Wed, Sep 17, 2014 at 2:47 PM, Greg KH <gre...@linuxfoundation.org> wrote:
>
>
>
> Ick, aufs, please please stay away from that thing, can't you just use
> btrfs or dm now?
>
>
>
> it seems that those 2 don't allow processes to share memory, whereas aufs can -

aufs "might", but I don't think we are using it that way in Docker,
right?

> so running 1000 containers running the same code will use 1000 times the
> memory?

I don't know, if this is a real issue, it can be worked on. Are there
people with this issue?

thanks,

greg k-h

Jérôme Petazzoni

Sep 23, 2014, 1:24:32 PM
to Greg KH, Sven Dowideit, docker-dev, r_d...@esi.dz
On Tue, Sep 23, 2014 at 6:14 AM, Greg KH <gre...@linuxfoundation.org> wrote:
> On Tue, Sep 23, 2014 at 03:31:03PM +1000, Sven Dowideit wrote:
> > On Wed, Sep 17, 2014 at 2:47 PM, Greg KH <gre...@linuxfoundation.org> wrote:
> >
> >     Ick, aufs, please please stay away from that thing, can't you just use
> >     btrfs or dm now?
> >
> > it seems that those 2 don't allow processes to share memory, whereas aufs can -
>
> aufs "might", but I don't think we are using it that way in Docker,
> right?

Actually, AUFS does share memory correctly.
We've been using that for dotCloud over 4+ years and it works beautifully.
(For a liberal interpretation of "beautifully" :-))

I can't blame people for using AUFS, because:
- it's much more memory efficient
- it's also much faster (since it doesn't incur as many reads from disk)
- people have reported data corruption with DM
- BTRFS has serious performance degradation issues (after using it for a few months for my Docker storage, I had to give up because of that, and that was with a post-3.14 kernel)
- BTRFS also has serious garbage collection issues: you have to run a "rebalance" once in a while (sketch below), otherwise you experience disk full issues even though you only use a few GB of a 250 GB partition...
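For reference, the periodic rebalance mentioned above looks roughly like
this (the mount point and usage threshold are just examples):

    # show how much space is allocated to chunks vs. actually used
    sudo btrfs filesystem df /var/lib/docker

    # rewrite data chunks that are less than 50% full and free the rest
    sudo btrfs balance start -dusage=50 /var/lib/docker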

That being said, I'm pretty sure that we could get all the advantages of AUFS (efficiency) without the inconvenience (not being in the vanilla tree) by working on e.g. overlayfs :-)
 
 
> > so running 1000 containers running the same code will use 1000 times the
> > memory?
>
> I don't know, if this is a real issue, it can be worked on.  Are there
> people with this issue?

People doing PAAS or CI/CD.
E.g. dotCloud was running hundreds of containers per instance (with 32 GB of RAM) thanks to AUFS. 
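A crude way to see the difference, as a sketch (the image and the count
are arbitrary):

    # note the baseline buff/cache figure
    free -m

    # start many containers from the same image
    for i in $(seq 1 100); do
        docker run -d --name web$i nginx > /dev/null
    done

    # check again: with a sharing-friendly driver (aufs, overlayfs) the
    # page cache grows far less than 100x a single container, because
    # identical files in the image layers are served from the same cache
    free -m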



Greg KH

Sep 23, 2014, 6:05:13 PM
to Jérôme Petazzoni, Sven Dowideit, docker-dev, r_d...@esi.dz
On Tue, Sep 23, 2014 at 10:24:30AM -0700, Jérôme Petazzoni wrote:
>
>
> On Tue, Sep 23, 2014 at 6:14 AM, Greg KH <gre...@linuxfoundation.org> wrote:
>
> On Tue, Sep 23, 2014 at 03:31:03PM +1000, Sven Dowideit wrote:
> > On Wed, Sep 17, 2014 at 2:47 PM, Greg KH <gre...@linuxfoundation.org>
> wrote:
> >
> >
> >
> >     Ick, aufs, please please stay away from that thing, can't you just
> use
> >     btrfs or dm now?
> >
> >
> >
> > it seems that those 2 don't allow processes to share memory, whereas aufs
> can -
>
> aufs "might", but I don't think we are using it that way in Docker,
> right?
>
>
> Actually, AUFS does share memory correctly.
> We've been using that for dotCloud over 4+ years and it works beautifully.
> (For a liberal interpretation of "beautifully" :-))
>
> I can't blame people for using AUFS, because:
> - it's much more memory efficient
> - it's also much faster (since it doesn't incur as much reads from disk)
> - people have reported data corruption with DM

You forgot:
- locking is wrong in aufs, causing bad kernel bugs at times.
:)

> - BTRFS has serious performance degradation issues (after using it for a few
> months for my Docker storage, I had to give up because of that, and that was
> with a post-3.14 kernel)
> - BTRFS also has serious garbage collection issues, you have to run a
> "rebalance" once in a while, otherwise you experience disk full issues even
> though you only use a few GB over a 250 GB partition...
>
> That being said, I'm pretty sure that we could get all the advantages of AUFS
> (efficiency) without the inconvenience (not in vanilla tree) by working on e.g.
> overlayfs :-)

I totally agree.  Please help out with the overlayfs/unionfs solution
upstream, that is greatly needed.

thanks,

greg k-h

Jeremy Eder

Sep 30, 2014, 11:17:59 AM
to Greg KH, Jérôme Petazzoni, Sven Dowideit, docker-dev, r debab
Greg/Jerome, we have some indicative scalability numbers and page cache sharing data available for the various permutations of union/non-union filesystems and raw block storage:

https://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/

For OverlayFS, it's based on 3.17-rc1+overlayfs v23+Alex Larsson's experimental docker/overlayfs tree.
More detail in the post.

Jérôme Petazzoni

Sep 30, 2014, 11:44:27 AM
to Jeremy Eder, Greg KH, Sven Dowideit, docker-dev, r debab
Neat.  Kudos for doing those benchmarks; I believe that they will be very useful!
A couple of remarks:
- too bad you didn't test with AUFS as well ;-) I believe it will have the same profile as overlayfs
- too bad you didn't show memory usage when running 100x the same container in parallel (to highlight the scalability issues of thinp and btrfs)
If you scripted those tests, it'd be awesome to share the methodology so others can reproduce (in particular, after tweaking Docker, to see performance improvements?)

Thanks!

Jeremy Eder

Sep 30, 2014, 12:14:12 PM
to Jérôme Petazzoni, Greg KH, Sven Dowideit, docker-dev, r debab
Understood; as I tried to explain in the blog, AUFS is not a target for us at the moment.  Perhaps things will change, but anyway, that's why we're looking at OverlayFS.

> - too bad you didn't show memory usage when running 100x the same container in parallel (to highlight that scalability issues of thinp and btrfs)

If I understand you correctly, that's precisely what the 2nd graph "Docker Page Cache Usage Test" shows (well, 3 containers vs 100, but same effect).

> If you scripted those tests, it'd be awesome to share the methodology so others can reproduce (in particular, after tweaking Docker, to see performance improvements?)

Everything's automated, but the scripts aren't out in the open at the moment.  Where would one contribute performance regression tests to Docker?  We're maintaining all this internally at the moment.  I've seen stuff from Duke and IBM that are independently developed and on Github.  No coordinated effort/location yet?

The problem, as always, with perf scalability and regression tests is repeatability/hardware requirements and tuning.  Look at the hoops Intel's LKP goes through to improve the reliability of test results.

I submitted a DockerCon EU proposal on the broader container performance topic.



Jérôme Petazzoni

Sep 30, 2014, 12:47:47 PM
to Jeremy Eder, Greg KH, Sven Dowideit, docker-dev, r debab
Yes, I totally understand. As a long-time user of AUFS, I get that it's a pain to use outside of Debian/Ubuntu environments; but at the same time, with all its flaws, it has been far more reliable and dependable than all the other options over the last 4 years. (We've been running the dotCloud PAAS on AUFS because basically nothing else remotely worked under real stress conditions.)
 
> > - too bad you didn't show memory usage when running 100x the same container in parallel (to highlight that scalability issues of thinp and btrfs)
>
> If I understand you correctly, that's precisely what the 2nd graph "Docker Page Cache Usage Test" shows (well, 3 containers vs 100, but same effect).

Yes, but IMVHO, it lacks "impact". I mean -- I was definitely looking for that graph, and I was expecting to see a huge difference (basically, OverlayFS flatlining, while the others would just increase linearly). Instead, it gives the impression that "oh, OverlayFS is just 3 times more efficient, that's it" -- which is already quite good, of course, but doesn't reflect the fact that it has a completely different scaling model. I hope this makes sense. And again, I don't want to sound like I'm downplaying your work here. I know benchmarks are hard and tedious, and I appreciate immensely what has been done there!

 

> > If you scripted those tests, it'd be awesome to share the methodology so others can reproduce (in particular, after tweaking Docker, to see performance improvements?)
>
> Everything's automated; but the scripts aren't out in the open at the moment.  Where would one contribute performance regression tests to Docker ?  We're maintaining all this internally at the moment.  I've seen stuff from Duke and IBM that are independently developed and on Github.  No coordinated effort/location yet ?
>
> Problem as always with perf scalability and regression tests is repeatability/hardware requirements and tuning.  Look at the hoops Intel's LKP goes through to improve reliability of test results.
>
> I submitted a DockerCon EU proposal on the broader container performance topic.

Great! I'll look forward to it.


Thanks again!
 




