[PATCH v4 0/4] cgroups: support for module-loadable subsystems

Ben Blum

unread,

Dec 31, 2009, 12:11:33 AM12/31/09

to linux-...@vger.kernel.org, conta...@lists.linux-foundation.org, li...@cn.fujitsu.com, ak...@linux-foundation.org, men...@google.com, bb...@andrew.cmu.edu

[This is a revision of http://lkml.org/lkml/2009/12/21/211 ]

This patch series implements support for building, loading, and
unloading subsystems as modules, both within and outside the kernel
source tree. It provides an interface cgroup_load_subsys() and
cgroup_unload_subsys() which modular subsystems can use to register and
depart during runtime. The net_cls classifier subsystem serves as the
example for a subsystem which can be converted into a module using these
changes.

Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear. Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section
of the array is protected by cgroup_mutex.

Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.

Patch #3 adds a mechanism for unloading modular subsystems, which
includes a more advanced rework of the rudimentary reference counting
introduced in patch 2.

Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.

Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves. In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy). The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys
was the only one in the dummy cgroup (and therefore all css_sets should
be unlinked from it) - however, as all css_sets always stay attached to
the dummy cgroup anyway, these cases are ignored. Any fix that addresses
this issue should also make sure these cases are addressed in the
subsystem loading and unloading code.

-- bblum

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Ben Blum

unread,

Dec 31, 2009, 12:13:05 AM12/31/09

to linux-...@vger.kernel.org, conta...@lists.linux-foundation.org, li...@cn.fujitsu.com, ak...@linux-foundation.org, men...@google.com, bb...@andrew.cmu.edu

cgroups-revamp-subsys-array.patch

Ben Blum

unread,

Dec 31, 2009, 12:13:55 AM12/31/09

to linux-...@vger.kernel.org, conta...@lists.linux-foundation.org, li...@cn.fujitsu.com, ak...@linux-foundation.org, men...@google.com, bb...@andrew.cmu.edu

cgroups-subsys-module-interface.patch

Ben Blum

unread,

Dec 31, 2009, 12:14:57 AM12/31/09

to linux-...@vger.kernel.org, conta...@lists.linux-foundation.org, li...@cn.fujitsu.com, ak...@linux-foundation.org, men...@google.com, bb...@andrew.cmu.edu

cgroups-subsys-unloading.patch

Ben Blum

unread,

Dec 31, 2009, 12:15:35 AM12/31/09

to linux-...@vger.kernel.org, conta...@lists.linux-foundation.org, li...@cn.fujitsu.com, ak...@linux-foundation.org, men...@google.com, bb...@andrew.cmu.edu, net...@vger.kernel.org

cgroups-net_cls-as-module.patch

Andrew Morton

unread,

Jan 6, 2010, 7:04:34 PM1/6/10

to Ben Blum, linux-...@vger.kernel.org, conta...@lists.linux-foundation.org, li...@cn.fujitsu.com, men...@google.com

On Thu, 31 Dec 2009 00:10:50 -0500
Ben Blum <bb...@andrew.cmu.edu> wrote:

> This patch series implements support for building, loading, and
> unloading subsystems as modules, both within and outside the kernel
> source tree. It provides an interface cgroup_load_subsys() and
> cgroup_unload_subsys() which modular subsystems can use to register and
> depart during runtime. The net_cls classifier subsystem serves as the
> example for a subsystem which can be converted into a module using these
> changes.

What is the value in this? What are the usage scenarios? Why does the
benefit of this change exceed the cost/risk/etc of merging it?

Ben Blum

unread,

Jan 6, 2010, 8:26:39 PM1/6/10

to Andrew Morton, linux-...@vger.kernel.org, conta...@lists.linux-foundation.org, li...@cn.fujitsu.com, men...@google.com

On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> On Thu, 31 Dec 2009 00:10:50 -0500
> Ben Blum <bb...@andrew.cmu.edu> wrote:
>
> > This patch series implements support for building, loading, and
> > unloading subsystems as modules, both within and outside the kernel
> > source tree. It provides an interface cgroup_load_subsys() and
> > cgroup_unload_subsys() which modular subsystems can use to register and
> > depart during runtime. The net_cls classifier subsystem serves as the
> > example for a subsystem which can be converted into a module using these
> > changes.
>
> What is the value in this? What are the usage scenarios? Why does the
> benefit of this change exceed the cost/risk/etc of merging it?

As discussed in the first posting of these patches, this provides the
ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
would have already been a module except for a lack of support from
cgroups, and the change also allows other module-loadable classifiers
to add subsystems of their own.

KAMEZAWA Hiroyuki

unread,

Jan 6, 2010, 10:10:56 PM1/6/10

to Ben Blum, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org

On Wed, 6 Jan 2010 20:26:06 -0500
Ben Blum <bb...@andrew.cmu.edu> wrote:

> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> > On Thu, 31 Dec 2009 00:10:50 -0500
> > Ben Blum <bb...@andrew.cmu.edu> wrote:
> >
> > > This patch series implements support for building, loading, and
> > > unloading subsystems as modules, both within and outside the kernel
> > > source tree. It provides an interface cgroup_load_subsys() and
> > > cgroup_unload_subsys() which modular subsystems can use to register and
> > > depart during runtime. The net_cls classifier subsystem serves as the
> > > example for a subsystem which can be converted into a module using these
> > > changes.
> >
> > What is the value in this? What are the usage scenarios? Why does the
> > benefit of this change exceed the cost/risk/etc of merging it?
>
> As discussed in the first posting of these patches, this provides the
> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> would have already been a module except for a lack of support from
> cgroups, and the change also allows other module-loadable classifiers
> to add subsystems of their own.

Hmm, do you have your own module in plan ?

Thanks,
-Kame

Li Zefan

unread,

Jan 7, 2010, 1:42:31 AM1/7/10

to KAMEZAWA Hiroyuki, Ben Blum, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org

KAMEZAWA Hiroyuki wrote:
> On Wed, 6 Jan 2010 20:26:06 -0500
> Ben Blum <bb...@andrew.cmu.edu> wrote:
>
>> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
>>> On Thu, 31 Dec 2009 00:10:50 -0500
>>> Ben Blum <bb...@andrew.cmu.edu> wrote:
>>>
>>>> This patch series implements support for building, loading, and
>>>> unloading subsystems as modules, both within and outside the kernel
>>>> source tree. It provides an interface cgroup_load_subsys() and
>>>> cgroup_unload_subsys() which modular subsystems can use to register and
>>>> depart during runtime. The net_cls classifier subsystem serves as the
>>>> example for a subsystem which can be converted into a module using these
>>>> changes.
>>> What is the value in this? What are the usage scenarios? Why does the
>>> benefit of this change exceed the cost/risk/etc of merging it?
>> As discussed in the first posting of these patches, this provides the
>> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
>> would have already been a module except for a lack of support from
>> cgroups, and the change also allows other module-loadable classifiers
>> to add subsystems of their own.
>
> Hmm, do you have your own module in plan ?
>

Maybe the new blkio_cgroup can also be made module-able.

KAMEZAWA Hiroyuki

unread,

Jan 7, 2010, 2:19:52 AM1/7/10

to Li Zefan, Ben Blum, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org

On Thu, 07 Jan 2010 14:42:19 +0800
Li Zefan <li...@cn.fujitsu.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> > On Wed, 6 Jan 2010 20:26:06 -0500
> > Ben Blum <bb...@andrew.cmu.edu> wrote:
> >
> >> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> >>> On Thu, 31 Dec 2009 00:10:50 -0500
> >>> Ben Blum <bb...@andrew.cmu.edu> wrote:
> >>>
> >>>> This patch series implements support for building, loading, and
> >>>> unloading subsystems as modules, both within and outside the kernel
> >>>> source tree. It provides an interface cgroup_load_subsys() and
> >>>> cgroup_unload_subsys() which modular subsystems can use to register and
> >>>> depart during runtime. The net_cls classifier subsystem serves as the
> >>>> example for a subsystem which can be converted into a module using these
> >>>> changes.
> >>> What is the value in this? What are the usage scenarios? Why does the
> >>> benefit of this change exceed the cost/risk/etc of merging it?
> >> As discussed in the first posting of these patches, this provides the
> >> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> >> would have already been a module except for a lack of support from
> >> cgroups, and the change also allows other module-loadable classifiers
> >> to add subsystems of their own.
> >
> > Hmm, do you have your own module in plan ?
> >
>
> Maybe the new blkio_cgroup can also be made module-able.
>

Hmm, I read the patch slightly. I'm not enough expert to review this patch..

I requst following as TODO.
(No objection to the direction/patch.)

1. Add documentation about load/unlod module.
It seems module unloading will not succuess while subsystem is mounted.
Right ?

2. Making this to be reasonable value.
#define CGROUP_SUBSYS_COUNT (BITS_PER_BYTE*sizeof(unsigned long))
I can't find why.

3. show whehter a subsys is a loadable module or not via /proc/cgroups

4. how to test ? load/unload NET_CLS is enough ?

Last one is question.

5. Is following path is safe ?

find_css_set() {
....
read_lock(&css_set_lock);
get template including pointer
read_unlock(&css_set_lock);

use template to build new css_set.

Thanks,
-Kame

Ben Blum

unread,

Jan 7, 2010, 2:48:50 AM1/7/10

to KAMEZAWA Hiroyuki, Li Zefan, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org, bb...@andrew.cmu.edu

could perhaps use more, i suppose.

> It seems module unloading will not succuess while subsystem is mounted.
> Right ?

yes, when you mount it, it takes a reference on the module, so you get
"module is in use".

> 2. Making this to be reasonable value.
> #define CGROUP_SUBSYS_COUNT (BITS_PER_BYTE*sizeof(unsigned long))
> I can't find why.

"We limit to this many since cgroupfs_root has subsys_bits to keep track
of all of them." should it be less, perhaps? the memory footprint is not
great, it is true, but implementing dynamically sized subsystem tracking
data structures requires much cleverer code in many places.

> 3. show whehter a subsys is a loadable module or not via /proc/cgroups

with just "y" or "n"? possible, and probably easy. do note that since
they are sorted by subsys_id, all the ones above a certain line will be
"n" and all below will be "y".

> 4. how to test ? load/unload NET_CLS is enough ?

load, mount, remount, unmount, mount with different combinations, unload
was the general approach I took, plus using gdb to examine state.

>
> Last one is question.
>
> 5. Is following path is safe ?
>
> find_css_set() {
> ....
> read_lock(&css_set_lock);
> get template including pointer
> read_unlock(&css_set_lock);
>
> use template to build new css_set.

should be, because that code is dealing with a cgrp's/css's specific
->subsys array, which state is protected by refcounts held by the
already mounted hierarchy, and the other entries in the array are not
cared about by the particular css in question.

KAMEZAWA Hiroyuki

unread,

Jan 7, 2010, 2:54:40 AM1/7/10

to Ben Blum, Li Zefan, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org

On Thu, 7 Jan 2010 02:48:12 -0500
Ben Blum <bb...@andrew.cmu.edu> wrote:

> > 2. Making this to be reasonable value.
> > #define CGROUP_SUBSYS_COUNT (BITS_PER_BYTE*sizeof(unsigned long))
> > I can't find why.
>
> "We limit to this many since cgroupfs_root has subsys_bits to keep track
> of all of them." should it be less, perhaps?

It's ok if it's clear that
"this decistion is done by implementation choice, not by cgroup's nature"

> the memory footprint is not
> great, it is true, but implementing dynamically sized subsystem tracking
> data structures requires much cleverer code in many places.
>

yes. I don't request that.

> > 3. show whehter a subsys is a loadable module or not via /proc/cgroups
>
> with just "y" or "n"? possible, and probably easy. do note that since
> they are sorted by subsys_id, all the ones above a certain line will be
> "n" and all below will be "y".
>

yes.

#subsys_name hierarchy num_cgroups enabled module
cpuset 0 1 1 0

and 0/1 to show y/n ? (but this cause interface incompatibility...)

Ben Blum

unread,

Jan 7, 2010, 3:06:13 AM1/7/10

to KAMEZAWA Hiroyuki, Li Zefan, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org

On Thu, Jan 07, 2010 at 04:51:17PM +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 7 Jan 2010 02:48:12 -0500
> Ben Blum <bb...@andrew.cmu.edu> wrote:
>
> > > 2. Making this to be reasonable value.
> > > #define CGROUP_SUBSYS_COUNT (BITS_PER_BYTE*sizeof(unsigned long))
> > > I can't find why.
> >
> > "We limit to this many since cgroupfs_root has subsys_bits to keep track
> > of all of them." should it be less, perhaps?
>
> It's ok if it's clear that
> "this decistion is done by implementation choice, not by cgroup's nature"

mhm, well, it is the upper limit due to nature, but why it and not a
smaller number is by choice.

>
> > the memory footprint is not
> > great, it is true, but implementing dynamically sized subsystem tracking
> > data structures requires much cleverer code in many places.
> >
> yes. I don't request that.

it might be possible to have a config option as CGROUP_EXTRA_SUBSYSTEMS
(with max being 64 or 32) which would add slots for subsystems outside
of the kernel tree, to avoid using up a lot of blank slots in typical
use cases. not entirely sure how to implement it in the scope of the
configuration world, just speculation.

> > > 3. show whehter a subsys is a loadable module or not via /proc/cgroups
> >
> > with just "y" or "n"? possible, and probably easy. do note that since
> > they are sorted by subsys_id, all the ones above a certain line will be
> > "n" and all below will be "y".
> >
> yes.
>
> #subsys_name hierarchy num_cgroups enabled module
> cpuset 0 1 1 0
>
> and 0/1 to show y/n ? (but this cause interface incompatibility...)

well, format should be agreed upon. 1/0 would be consistent with the
rest of the output.

Ben Blum

unread,

Jan 7, 2010, 3:15:17 AM1/7/10

to Li Zefan, KAMEZAWA Hiroyuki, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org

On Thu, Jan 07, 2010 at 02:42:19PM +0800, Li Zefan wrote:
> KAMEZAWA Hiroyuki wrote:
> > On Wed, 6 Jan 2010 20:26:06 -0500
> > Ben Blum <bb...@andrew.cmu.edu> wrote:
> >
> >> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> >>> On Thu, 31 Dec 2009 00:10:50 -0500
> >>> Ben Blum <bb...@andrew.cmu.edu> wrote:
> >>>
> >>>> This patch series implements support for building, loading, and
> >>>> unloading subsystems as modules, both within and outside the kernel
> >>>> source tree. It provides an interface cgroup_load_subsys() and
> >>>> cgroup_unload_subsys() which modular subsystems can use to register and
> >>>> depart during runtime. The net_cls classifier subsystem serves as the
> >>>> example for a subsystem which can be converted into a module using these
> >>>> changes.
> >>> What is the value in this? What are the usage scenarios? Why does the
> >>> benefit of this change exceed the cost/risk/etc of merging it?
> >> As discussed in the first posting of these patches, this provides the
> >> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> >> would have already been a module except for a lack of support from
> >> cgroups, and the change also allows other module-loadable classifiers
> >> to add subsystems of their own.
> >
> > Hmm, do you have your own module in plan ?
> >
>
> Maybe the new blkio_cgroup can also be made module-able.

certainly looks promising. one thing I note while looking over it is
that it wants .use_id = 1, and the load_subsys interface neglects to
make a call to init_idr. adding calls to cgroup_init_idr has a few
possible complications... what use does blkio have for use_id?

Ben Blum

unread,

Jan 7, 2010, 3:23:28 AM1/7/10

to Li Zefan, KAMEZAWA Hiroyuki, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org, bb...@andrew.cmu.edu

oh, oops, while tracing the calls that init_idr does i ended up looking
at the wrong functions and made it out to be harder than it was (in
several places, even). still it would need to be included in load_subsys
and the single error case handled appropriately.

Ben Blum

unread,

Jan 8, 2010, 12:28:48 AM1/8/10

to Li Zefan, ax...@kernel.dk, ry...@valinux.co.jp, vgo...@redhat.com, KAMEZAWA Hiroyuki, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org, bb...@andrew.cmu.edu

On Thu, Jan 07, 2010 at 02:42:19PM +0800, Li Zefan wrote:

> KAMEZAWA Hiroyuki wrote:
> > On Wed, 6 Jan 2010 20:26:06 -0500
> > Ben Blum <bb...@andrew.cmu.edu> wrote:
> >
> >> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> >>> On Thu, 31 Dec 2009 00:10:50 -0500
> >>> Ben Blum <bb...@andrew.cmu.edu> wrote:
> >>>
> >>>> This patch series implements support for building, loading, and
> >>>> unloading subsystems as modules, both within and outside the kernel
> >>>> source tree. It provides an interface cgroup_load_subsys() and
> >>>> cgroup_unload_subsys() which modular subsystems can use to register and
> >>>> depart during runtime. The net_cls classifier subsystem serves as the
> >>>> example for a subsystem which can be converted into a module using these
> >>>> changes.
> >>> What is the value in this? What are the usage scenarios? Why does the
> >>> benefit of this change exceed the cost/risk/etc of merging it?
> >> As discussed in the first posting of these patches, this provides the
> >> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> >> would have already been a module except for a lack of support from
> >> cgroups, and the change also allows other module-loadable classifiers
> >> to add subsystems of their own.
> >
> > Hmm, do you have your own module in plan ?
> >
>
> Maybe the new blkio_cgroup can also be made module-able.

Ok, the following two patches make this happen (or at least pretend to
well enough to fool me). The first one adds use_id initialization in
cgroup_load_subsys, and the second rearranges config options and some
code as appropriate in block/ and adds EXPORT_SYMBOLs in cgroup.c.

-- bblum

Ben Blum

unread,

Jan 8, 2010, 12:29:43 AM1/8/10

to Li Zefan, KAMEZAWA Hiroyuki, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org, bb...@andrew.cmu.edu

cgroups-module-use_id-support.patch

Ben Blum

unread,

Jan 8, 2010, 12:31:20 AM1/8/10

to Li Zefan, ax...@kernel.dk, ry...@valinux.co.jp, vgo...@redhat.com, KAMEZAWA Hiroyuki, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org, bb...@andrew.cmu.edu

cgroups-blkio-as-module.patch

Vivek Goyal

unread,

Jan 8, 2010, 10:11:54 AM1/8/10

to Li Zefan, ax...@kernel.dk, ry...@valinux.co.jp, KAMEZAWA Hiroyuki, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org

On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
> Convert blk-cgroup to be buildable as a module
>
> From: Ben Blum <bb...@andrew.cmu.edu>
>
> This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> enhanced to support the new module dependency.
>

Hi Ben,

I will give this patch a try.

So from blk-cgroup perspective, the advantage of allowing it as module
will be that we can save some memory if we are not using the controller?

> Signed-off-by: Ben Blum <bb...@andrew.cmu.edu>
> ---
> block/Kconfig | 2 +-
> block/Kconfig.iosched | 2 +-
> block/blk-cgroup.c | 53 +++++++++++++++++++++++++++++++++++----------
> block/blk-cgroup.h | 10 +++++++-
> include/linux/iocontext.h | 2 +-
> kernel/cgroup.c | 8 +++++++
> 6 files changed, 60 insertions(+), 17 deletions(-)
>
> diff --git a/block/Kconfig b/block/Kconfig
> index e20fbde..62a5921 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -78,7 +78,7 @@ config BLK_DEV_INTEGRITY
> Protection. If in doubt, say N.
>
> config BLK_CGROUP
> - bool
> + tristate
> depends on CGROUPS
> default n
> ---help---
> diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
> index b71abfb..fc71cf0 100644
> --- a/block/Kconfig.iosched
> +++ b/block/Kconfig.iosched
> @@ -23,6 +23,7 @@ config IOSCHED_DEADLINE
>
> config IOSCHED_CFQ
> tristate "CFQ I/O scheduler"
> + select BLK_CGROUP if CFQ_GROUP_IOSCHED
> default y
> ---help---
> The CFQ I/O scheduler tries to distribute bandwidth equally
> @@ -35,7 +36,6 @@ config IOSCHED_CFQ
> config CFQ_GROUP_IOSCHED
> bool "CFQ Group Scheduling support"
> depends on IOSCHED_CFQ && CGROUPS
> - select BLK_CGROUP
> default n
> ---help---
> Enable group IO scheduling in CFQ.
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index 1fa2654..6c73380 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -23,6 +23,31 @@ static LIST_HEAD(blkio_list);
> struct blkio_cgroup blkio_root_cgroup = { .weight = 2*BLKIO_WEIGHT_DEFAULT };
> EXPORT_SYMBOL_GPL(blkio_root_cgroup);
>
> +static struct cgroup_subsys_state *blkiocg_create(struct cgroup_subsys *,
> + struct cgroup *);
> +static int blkiocg_can_attach(struct cgroup_subsys *, struct cgroup *,
> + struct task_struct *, bool);
> +static void blkiocg_attach(struct cgroup_subsys *, struct cgroup *,
> + struct cgroup *, struct task_struct *, bool);
> +static void blkiocg_destroy(struct cgroup_subsys *, struct cgroup *);
> +static int blkiocg_populate(struct cgroup_subsys *, struct cgroup *);
> +
> +struct cgroup_subsys blkio_subsys = {
> + .name = "blkio",
> + .create = blkiocg_create,
> + .can_attach = blkiocg_can_attach,
> + .attach = blkiocg_attach,
> + .destroy = blkiocg_destroy,
> + .populate = blkiocg_populate,
> +#ifdef CONFIG_BLK_CGROUP
> + /* note: blkio_subsys_id is otherwise defined in blk-cgroup.h */
> + .subsys_id = blkio_subsys_id,
> +#endif
> + .use_id = 1,
> + .module = THIS_MODULE,
> +};
> +EXPORT_SYMBOL_GPL(blkio_subsys);

Why are you moving blkio_subsys declaration up in the file. That way now
you have to declare all the functions before you can instanciate
blkio_subsys?

> +
> bool blkiocg_css_tryget(struct blkio_cgroup *blkcg)
> {
> if (!css_tryget(&blkcg->css))
> @@ -267,7 +292,8 @@ remove_entry:
> done:
> free_css_id(&blkio_subsys, &blkcg->css);
> rcu_read_unlock();
> - kfree(blkcg);
> + if (blkcg != &blkio_root_cgroup)
> + kfree(blkcg);
> }
>
> static struct cgroup_subsys_state *
> @@ -333,17 +359,6 @@ static void blkiocg_attach(struct cgroup_subsys *subsys, struct cgroup *cgroup,
> task_unlock(tsk);
> }
>
> -struct cgroup_subsys blkio_subsys = {
> - .name = "blkio",
> - .create = blkiocg_create,
> - .can_attach = blkiocg_can_attach,
> - .attach = blkiocg_attach,
> - .destroy = blkiocg_destroy,
> - .populate = blkiocg_populate,
> - .subsys_id = blkio_subsys_id,
> - .use_id = 1,
> -};
> -
> void blkio_policy_register(struct blkio_policy_type *blkiop)
> {
> spin_lock(&blkio_list_lock);
> @@ -359,3 +374,17 @@ void blkio_policy_unregister(struct blkio_policy_type *blkiop)
> spin_unlock(&blkio_list_lock);
> }
> EXPORT_SYMBOL_GPL(blkio_policy_unregister);
> +
> +static int __init init_cgroup_blkio(void)
> +{
> + return cgroup_load_subsys(&blkio_subsys);
> +}
> +
> +static void __exit exit_cgroup_blkio(void)
> +{
> + cgroup_unload_subsys(&blkio_subsys);
> +}
> +
> +module_init(init_cgroup_blkio);
> +module_exit(exit_cgroup_blkio);
> +MODULE_LICENSE("GPL");
> diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
> index 4d316df..57648c6 100644
> --- a/block/blk-cgroup.h
> +++ b/block/blk-cgroup.h
> @@ -15,7 +15,13 @@
>
> #include <linux/cgroup.h>
>
> -#ifdef CONFIG_BLK_CGROUP
> +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> +
> +#ifndef CONFIG_BLK_CGROUP
> +/* When blk-cgroup is a module, its subsys_id isn't a compile-time constant */
> +extern struct cgroup_subsys blkio_subsys;
> +#define blkio_subsys_id blkio_subsys.subsys_id
> +#endif
>

Could above if conditions be simplified to just check for BLK_CGROUP_MODULE.

#ifdef CONFIG_BLK_CGROUP_MODULE
extern struct cgroup_subsys blkio_subsys;
#define blkio_subsys_id blkio_subsys.subsys_id
#endif

Thanks
Vivek

> struct blkio_cgroup {
> struct cgroup_subsys_state css;
> @@ -94,7 +100,7 @@ static inline void blkiocg_update_blkio_group_dequeue_stats(
> struct blkio_group *blkg, unsigned long dequeue) {}
> #endif
>
> -#ifdef CONFIG_BLK_CGROUP
> +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> extern struct blkio_cgroup blkio_root_cgroup;
> extern struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup);
> extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
> diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
> index a632359..b9f109d 100644
> --- a/include/linux/iocontext.h
> +++ b/include/linux/iocontext.h
> @@ -68,7 +68,7 @@ struct io_context {
> unsigned short ioprio;
> unsigned short ioprio_changed;
>
> -#ifdef CONFIG_BLK_CGROUP
> +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> unsigned short cgroup_changed;
> #endif
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 5af59eb..391ff41 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -690,6 +690,7 @@ void cgroup_lock(void)
> {
> mutex_lock(&cgroup_mutex);
> }
> +EXPORT_SYMBOL_GPL(cgroup_lock);
>
> /**
> * cgroup_unlock - release lock on cgroup changes
> @@ -700,6 +701,7 @@ void cgroup_unlock(void)
> {
> mutex_unlock(&cgroup_mutex);
> }
> +EXPORT_SYMBOL_GPL(cgroup_unlock);
>
> /*
> * A couple of forward declarations required, due to cyclic reference loop:
> @@ -1762,6 +1764,7 @@ bool cgroup_lock_live_group(struct cgroup *cgrp)
> }
> return true;
> }
> +EXPORT_SYMBOL_GPL(cgroup_lock_live_group);
>
> static int cgroup_release_agent_write(struct cgroup *cgrp, struct cftype *cft,
> const char *buffer)
> @@ -4034,6 +4037,7 @@ void __css_put(struct cgroup_subsys_state *css)
> rcu_read_unlock();
> WARN_ON_ONCE(val < 1);
> }
> +EXPORT_SYMBOL_GPL(__css_put);
>
> /*
> * Notify userspace when a cgroup is released, by running the
> @@ -4149,6 +4153,7 @@ unsigned short css_id(struct cgroup_subsys_state *css)
> return cssid->id;
> return 0;
> }
> +EXPORT_SYMBOL_GPL(css_id);
>
> unsigned short css_depth(struct cgroup_subsys_state *css)
> {
> @@ -4158,6 +4163,7 @@ unsigned short css_depth(struct cgroup_subsys_state *css)
> return cssid->depth;
> return 0;
> }
> +EXPORT_SYMBOL_GPL(css_depth);
>
> bool css_is_ancestor(struct cgroup_subsys_state *child,
> const struct cgroup_subsys_state *root)
> @@ -4194,6 +4200,7 @@ void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css)
> spin_unlock(&ss->id_lock);
> call_rcu(&id->rcu_head, __free_css_id_cb);
> }
> +EXPORT_SYMBOL_GPL(free_css_id);
>
> /*
> * This is called by init or create(). Then, calls to this function are
> @@ -4310,6 +4317,7 @@ struct cgroup_subsys_state *css_lookup(struct cgroup_subsys *ss, int id)
>
> return rcu_dereference(cssid->css);
> }
> +EXPORT_SYMBOL_GPL(css_lookup);
>
> /**
> * css_get_next - lookup next cgroup under specified hierarchy.

Vivek Goyal

unread,

Jan 8, 2010, 11:34:28 AM1/8/10

to Li Zefan, ax...@kernel.dk, ry...@valinux.co.jp, KAMEZAWA Hiroyuki, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org

On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
> Convert blk-cgroup to be buildable as a module
>
> From: Ben Blum <bb...@andrew.cmu.edu>
>
> This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> enhanced to support the new module dependency.
>

> Signed-off-by: Ben Blum <bb...@andrew.cmu.edu>

Two quick observations with testing.

You need to EXPORT cgroup_path.

Second, after loading the module, I mounted the blkio controller. But creating
a cgroup directory crashed.

Vivek

BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [<ffffffff8106bcb4>] cgroup_mkdir+0x17e/0x350
PGD 1a7fe0067 PUD 1a0f19067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
CPU 3
Pid: 3984, comm: mkdir Not tainted 2.6.33-rc3-cgroup-module #2 /ProLiant DL380 G5
RIP: 0010:[<ffffffff8106bcb4>] [<ffffffff8106bcb4>] cgroup_mkdir+0x17e/0x350
RSP: 0018:ffff8801a0f17e38 EFLAGS: 00010203
RAX: ffff8801a8cbde00 RBX: ffffffffa02cc8e0 RCX: ffff8801a8cbde00
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffff8801a770c000 R08: 0000000000000040 R09: ffff8801a0f17d48
R10: ffff8801a0f17ec8 R11: ffffffff8113efb4 R12: 0000000000000001
R13: ffff8801a0f22000 R14: ffff8801948aa980 R15: ffff8801a0f22030
FS: 00007ff9ecf5a710(0000) GS:ffff8800282c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000020 CR3: 00000001a257e000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mkdir (pid: 3984, threadinfo ffff8801a0f16000, task ffff8801a8e2c040)
Stack:
0000000000000003 000001ed948a8db0 ffff8801a0e1e800 0000000000000000
<0> ffff8801a7745940 ffff8801948a8db0 ffff8801948aa980 0000000000000000
<0> 00000000000001ed 00007fff661cb6eb 0000000000000000 ffffffff810d837e
Call Trace:
[<ffffffff810d837e>] ? vfs_mkdir+0xd5/0x164
[<ffffffff810da743>] ? sys_mkdirat+0x85/0xd1
[<ffffffff810773be>] ? audit_syscall_entry+0x1b9/0x1e4
[<ffffffff8100286b>] ? system_call_fastpath+0x16/0x1b
Code: 4c 24 18 e8 82 f3 ff ff 31 f6 48 3d 00 f0 ff ff 48 89 c1 76 20 85 c0 74 35 45 31 e4 e9 5a 01 00 00 48 8b 7c 24 18 48 63 d6 ff c6 <66> 8b 44 57 20 66 89 44 51 20 44 39 e6 7c e7 8b 51 08 49 63 c4
RIP [<ffffffff8106bcb4>] cgroup_mkdir+0x17e/0x350
RSP <ffff8801a0f17e38>
CR2: 0000000000000020
---[ end trace 09377bc5e5c2b563 ]---

KAMEZAWA Hiroyuki

unread,

Jan 11, 2010, 7:24:30 PM1/11/10

to Vivek Goyal, Li Zefan, ax...@kernel.dk, ry...@valinux.co.jp, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org

On Fri, 8 Jan 2010 10:10:38 -0500
Vivek Goyal <vgo...@redhat.com> wrote:

> On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
> > Convert blk-cgroup to be buildable as a module
> >
> > From: Ben Blum <bb...@andrew.cmu.edu>
> >
> > This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> > module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> > options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> > enhanced to support the new module dependency.
> >
>
> Hi Ben,
>
> I will give this patch a try.
>
> So from blk-cgroup perspective, the advantage of allowing it as module
> will be that we can save some memory if we are not using the controller?
>

Is "moduled" blkio cgroup safe after page-tracking by page_cgroup is
introduced ?

Thanks,
-Kame

Ben Blum

unread,

Jan 12, 2010, 7:35:18 PM1/12/10

to Vivek Goyal, Li Zefan, ax...@kernel.dk, ry...@valinux.co.jp, KAMEZAWA Hiroyuki, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org, bb...@andrew.cmu.edu

On Fri, Jan 08, 2010 at 11:33:52AM -0500, Vivek Goyal wrote:
> On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
> > Convert blk-cgroup to be buildable as a module
> >
> > From: Ben Blum <bb...@andrew.cmu.edu>
> >
> > This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> > module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> > options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> > enhanced to support the new module dependency.
> >
> > Signed-off-by: Ben Blum <bb...@andrew.cmu.edu>
>
> Two quick observations with testing.
>
> You need to EXPORT cgroup_path.
>
> Second, after loading the module, I mounted the blkio controller. But creating
> a cgroup directory crashed.
>
> Vivek

argh, good catches on both of them. didn't test with DEBUG_CFQ_IOSCHED
(for cgroup_path) or with making a sub-cgroup (for the crash); shame on
me. turns out it crashed because I had init_idr before init_css_set, and
init_css_set sets css->id = NULL explicitly. fixed patches forthcoming.

-- bblum

kernel/cgroup.c | 34 ++++++++++++++++++++++++-----
6 files changed, 80 insertions(+), 23 deletions(-)

Ben Blum

unread,

Jan 12, 2010, 7:35:22 PM1/12/10

to Ben Blum, Vivek Goyal, Li Zefan, ax...@kernel.dk, ry...@valinux.co.jp, KAMEZAWA Hiroyuki, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org

cgroups-blkio-as-module.patch

Ben Blum

unread,

Jan 12, 2010, 7:35:50 PM1/12/10

to Ben Blum, Vivek Goyal, Li Zefan, ax...@kernel.dk, ry...@valinux.co.jp, KAMEZAWA Hiroyuki, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org

cgroups-module-use_id-support.patch

Jens Axboe

unread,

Jan 14, 2010, 4:16:11 AM1/14/10

to Ben Blum, Vivek Goyal, Li Zefan, ry...@valinux.co.jp, KAMEZAWA Hiroyuki, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org

On Tue, Jan 12 2010, Ben Blum wrote:
> Convert blk-cgroup to be buildable as a module
>
> From: Ben Blum <bb...@andrew.cmu.edu>
>
> This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> enhanced to support the new module dependency.

I'd like to add this for 2.6.34, but I'd like some sign-off(s) on the
cgroup side of things (for patch 1/2).

--
Jens Axboe

Li Zefan

unread,

Jan 14, 2010, 4:29:44 AM1/14/10

to Jens Axboe, Ben Blum, Vivek Goyal, ry...@valinux.co.jp, KAMEZAWA Hiroyuki, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org

Jens Axboe wrote:
> On Tue, Jan 12 2010, Ben Blum wrote:
>> Convert blk-cgroup to be buildable as a module
>>
>> From: Ben Blum <bb...@andrew.cmu.edu>
>>
>> This patch modifies the Block I/O cgroup subsystem to be able to be built as a
>> module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
>> options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
>> enhanced to support the new module dependency.
>
> I'd like to add this for 2.6.34, but I'd like some sign-off(s) on the
> cgroup side of things (for patch 1/2).
>

Yes, it's for .34. But these 2 patches can't be applied to block-tree,
because the patches that implement cgroup modular feature are in -mm,
and they will go into .34.

Balbir Singh

unread,

Jan 14, 2010, 4:32:24 AM1/14/10

to KAMEZAWA Hiroyuki, Vivek Goyal, Li Zefan, ax...@kernel.dk, ry...@valinux.co.jp, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org

On Tue, Jan 12, 2010 at 5:51 AM, KAMEZAWA Hiroyuki
<kamezaw...@jp.fujitsu.com> wrote:
> On Fri, 8 Jan 2010 10:10:38 -0500
> Vivek Goyal <vgo...@redhat.com> wrote:
>
>> On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
>> > Convert blk-cgroup to be buildable as a module
>> >
>> > From: Ben Blum <bb...@andrew.cmu.edu>
>> >
>> > This patch modifies the Block I/O cgroup subsystem to be able to be built as a
>> > module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
>> > options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
>> > enhanced to support the new module dependency.
>> >
>>
>> Hi Ben,
>>
>> I will give this patch a try.
>>
>> So from blk-cgroup perspective, the advantage of allowing it as module
>> will be that we can save some memory if we are not using the controller?
>>
> Is "moduled" blkio cgroup safe after page-tracking by page_cgroup is
> introduced ?
>

My guess is it won't be, unless we start exposing page_cgroup API and
then make the module depend on memcg.

Balbir

Vivek Goyal

unread,

Jan 14, 2010, 6:43:45 AM1/14/10

to Balbir Singh, KAMEZAWA Hiroyuki, Li Zefan, ax...@kernel.dk, ry...@valinux.co.jp, Andrew Morton, men...@google.com, conta...@lists.linux-foundation.org, linux-...@vger.kernel.org

On Thu, Jan 14, 2010 at 03:02:09PM +0530, Balbir Singh wrote:
> On Tue, Jan 12, 2010 at 5:51 AM, KAMEZAWA Hiroyuki
> <kamezaw...@jp.fujitsu.com> wrote:
> > On Fri, 8 Jan 2010 10:10:38 -0500
> > Vivek Goyal <vgo...@redhat.com> wrote:
> >
> >> On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
> >> > Convert blk-cgroup to be buildable as a module
> >> >
> >> > From: Ben Blum <bb...@andrew.cmu.edu>
> >> >
> >> > This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> >> > module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> >> > options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> >> > enhanced to support the new module dependency.
> >> >
> >>
> >> Hi Ben,
> >>
> >> I will give this patch a try.
> >>
> >> So from blk-cgroup perspective, the advantage of allowing it as module
> >> will be that we can save some memory if we are not using the controller?
> >>
> > Is "moduled" blkio cgroup safe after page-tracking by page_cgroup is
> > introduced ?
> >
>
> My guess is it won't be, unless we start exposing page_cgroup API and
> then make the module depend on memcg.

I think I agree. When we introduce page_cgroup based page tracking, either
we need to export page_cgroup API or we can force blkio controller to
compile as in-kernel if user selects the CONFIG_PAGE_TRACKING option.

So as of now, I can't think why we should not we allow compiling blkio as
module as long as core cgroup functionality supports it safely.

Thanks
Vivek