Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH 3/3] mm: audit/fix non-modular users of module_init in core code

2 views
Skip to first unread message

Paul Gortmaker

unread,
Jan 14, 2014, 3:50:02 PM1/14/14
to
Code that is obj-y (always built-in) or dependent on a bool Kconfig
(built-in or absent) can never be modular. So using module_init as
an alias for __initcall can be somewhat misleading.

Fix these up now, so that we can relocate module_init from
init.h into module.h in the future. If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.

The audit targets the following module_init users for change:
mm/ksm.c bool KSM
mm/mmap.c bool MMU
mm/huge_memory.c bool TRANSPARENT_HUGEPAGE
mm/mmu_notifier.c bool MMU_NOTIFIER

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups. As __initcall gets
mapped onto device_initcall, our use of subsys_initcall (which
makes sense for these files) will thus change this registration
from level 6-device to level 4-subsys (i.e. slightly earlier).

However no observable impact of that difference has been observed
during testing.

One might think that core_initcall (l2) or postcore_initcall (l3)
would be more appropriate for anything in mm/ but if we look
at some actual init functions themselves, we see things like:

mm/huge_memory.c --> hugepage_init --> hugepage_init_sysfs
mm/mmap.c --> init_user_reserve --> sysctl_user_reserve_kbytes
mm/ksm.c --> ksm_init --> sysfs_create_group

and hence the choice of subsys_initcall (l4) seems reasonable,
and at the same time minimizes the risk of changing the priority
too drastically all at once. We can adjust further in the future.

Also, several instances of missing ";" at EOL are fixed.

Cc: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>
---
mm/huge_memory.c | 2 +-
mm/ksm.c | 2 +-
mm/mmap.c | 6 +++---
mm/mmu_notifier.c | 3 +--
4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 95d1acb0f3d2..c6e7e0f60708 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -655,7 +655,7 @@ out:
hugepage_exit_sysfs(hugepage_kobj);
return err;
}
-module_init(hugepage_init)
+subsys_initcall(hugepage_init);

static int __init setup_transparent_hugepage(char *str)
{
diff --git a/mm/ksm.c b/mm/ksm.c
index 175fff79dc95..dce1ad1cb696 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2438,4 +2438,4 @@ out_free:
out:
return err;
}
-module_init(ksm_init)
+subsys_initcall(ksm_init);
diff --git a/mm/mmap.c b/mm/mmap.c
index 834b2d785f1e..0a7717664c9e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3140,7 +3140,7 @@ static int init_user_reserve(void)
sysctl_user_reserve_kbytes = min(free_kbytes / 32, 1UL << 17);
return 0;
}
-module_init(init_user_reserve)
+subsys_initcall(init_user_reserve);

/*
* Initialise sysctl_admin_reserve_kbytes.
@@ -3161,7 +3161,7 @@ static int init_admin_reserve(void)
sysctl_admin_reserve_kbytes = min(free_kbytes / 32, 1UL << 13);
return 0;
}
-module_init(init_admin_reserve)
+subsys_initcall(init_admin_reserve);

/*
* Reinititalise user and admin reserves if memory is added or removed.
@@ -3231,4 +3231,4 @@ static int __meminit init_reserve_notifier(void)

return 0;
}
-module_init(init_reserve_notifier)
+subsys_initcall(init_reserve_notifier);
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 93e6089cb456..41cefdf0aadd 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -329,5 +329,4 @@ static int __init mmu_notifier_init(void)
{
return init_srcu_struct(&srcu);
}
-
-module_init(mmu_notifier_init);
+subsys_initcall(mmu_notifier_init);
--
1.8.5.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Paul Gortmaker

unread,
Jan 14, 2014, 3:50:03 PM1/14/14
to
The use of __initcall is to be eventually replaced by choosing
one from the prioritized groupings laid out in init.h header:

pure_initcall 0
core_initcall 1
postcore_initcall 2
arch_initcall 3
subsys_initcall 4
fs_initcall 5
device_initcall 6
late_initcall 7

In the interim, all __initcall are mapped onto device_initcall,
which as can be seen above, comes quite late in the ordering.

Currently the mm_kobj is created with __initcall in mm_sysfs_init().
This means that any other initcalls that want to reference the
mm_kobj have to be device_initcall (or later), otherwise we will
for example, trip the BUG_ON(!kobj) in sysfs's internal_create_group().
This unfairly restricts those users; for example something that clearly
makes sense to be an arch_initcall will not be able to choose that.

However, upon examination, it is only this way for historical
reasons (i.e. simply not reprioritized yet). We see that sysfs is
ready quite earlier in init/main.c via:

vfs_caches_init
|_ mnt_init
|_ sysfs_init

well ahead of the processing of the prioritized calls listed above.

So we can recategorize mm_sysfs_init to be a pure_initcall, which
in turn allows any mm_kobj initcall users a wider range (1 --> 7)
of initcall priorities to choose from.

Cc: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>
---
mm/mm_init.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 68562e92d50c..857a6434e3a5 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -202,5 +202,4 @@ static int __init mm_sysfs_init(void)

return 0;
}
-
-__initcall(mm_sysfs_init);
+pure_initcall(mm_sysfs_init);

Paul Gortmaker

unread,
Jan 14, 2014, 3:50:03 PM1/14/14
to
Code that is obj-y (always built-in) or dependent on a bool Kconfig
(built-in or absent) can never be modular. So using module_init as
an alias for __initcall can be somewhat misleading.

Fix these up now, so that we can relocate module_init from
init.h into module.h in the future. If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.

The audit targets the following module_init users for change:
kernel/user.c obj-y
kernel/kexec.c bool KEXEC (one instance per arch)
kernel/profile.c bool PROFILING
kernel/hung_task.c bool DETECT_HUNG_TASK
kernel/sched/stats.c bool SCHEDSTATS
kernel/user_namespace.c bool USER_NS

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups. As __initcall gets
mapped onto device_initcall, our use of subsys_initcall (which
makes sense for these files) will thus change this registration
from level 6-device to level 4-subsys (i.e. slightly earlier).
However no observable impact of that difference has been observed
during testing.

Also, two instances of missing ";" at EOL are fixed in kexec.

Cc: Ingo Molnar <mi...@redhat.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Eric Biederman <ebie...@xmission.com>
Cc: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>
---
kernel/hung_task.c | 3 +--
kernel/kexec.c | 4 ++--
kernel/profile.c | 2 +-
kernel/sched/stats.c | 2 +-
kernel/user.c | 3 +--
kernel/user_namespace.c | 2 +-
6 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 9328b80eaf14..7899ee9dd212 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -244,5 +244,4 @@ static int __init hung_task_init(void)

return 0;
}
-
-module_init(hung_task_init);
+subsys_initcall(hung_task_init);
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 9c970167e402..418f069b0314 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1234,7 +1234,7 @@ static int __init crash_notes_memory_init(void)
}
return 0;
}
-module_init(crash_notes_memory_init)
+subsys_initcall(crash_notes_memory_init);


/*
@@ -1628,7 +1628,7 @@ static int __init crash_save_vmcoreinfo_init(void)
return 0;
}

-module_init(crash_save_vmcoreinfo_init)
+subsys_initcall(crash_save_vmcoreinfo_init);

/*
* Move into place and start executing a preloaded standalone
diff --git a/kernel/profile.c b/kernel/profile.c
index 6631e1ef55ab..b37576b22acc 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -604,5 +604,5 @@ int __ref create_proc_profile(void) /* false positive from hotcpu_notifier */
hotcpu_notifier(profile_cpu_callback, 0);
return 0;
}
-module_init(create_proc_profile);
+subsys_initcall(create_proc_profile);
#endif /* CONFIG_PROC_FS */
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index da98af347e8b..a476bea17fbc 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -142,4 +142,4 @@ static int __init proc_schedstat_init(void)
proc_create("schedstat", 0, NULL, &proc_schedstat_operations);
return 0;
}
-module_init(proc_schedstat_init);
+subsys_initcall(proc_schedstat_init);
diff --git a/kernel/user.c b/kernel/user.c
index c006131beb77..294fc6a94168 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -222,5 +222,4 @@ static int __init uid_cache_init(void)

return 0;
}
-
-module_init(uid_cache_init);
+subsys_initcall(uid_cache_init);
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 240fb62cf394..4f211868e6a2 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -902,4 +902,4 @@ static __init int user_namespaces_init(void)
user_ns_cachep = KMEM_CACHE(user_namespace, SLAB_PANIC);
return 0;
}
-module_init(user_namespaces_init);
+subsys_initcall(user_namespaces_init);

Paul Gortmaker

unread,
Jan 14, 2014, 3:50:03 PM1/14/14
to
This series had an interesting genesis in chain on effects, typical
of how things can creep and spill over.

I wanted to clobber pointless instances of #include <linux/init.h>
mostly left behind from __cpuinit and __devinit removal. But to
fully complete that, I had to plan to move module_init into module.h;
which meant to get rid of the non-modular callers of module_init().

But I couldn't replace them with the 1:1 mapping to __initcall(),
because we aren't supposed to use that anymore. And finally, I
couldn't just use the 1:1 mapping of __initcall to device_initcall(),
because it looks like crap to be using device_initcall in what is
clearly not device/driver code. And hence we end up being faced with
checking and/or changing initcall ordering.

Here we fix up kernel/ and mm/ -- the one point of interest was
uncovering an oops when trying to make ksm_init a subsys_initcall,
which in turn led to the 1st patch in the series, which reprioritized
creation of the mm_kobj.

There are other __initcall in mm/ that we probably want to look at
and reprioritize in the future. But for now I just fixed the one that
was obviously problematic and blocking the other required priority
changes in ksm.c and huge_memory.c

Boot tested on today's master, with x86-32 and powerpc (sbc8548).

For completeness, here is what happens when one tries to make ksm_init
as subsys_initcall, if the mm_kobj is created too late in device_initcall.

Paul.

------------[ cut here ]------------
kernel BUG at fs/sysfs/group.c:94!
invalid opcode: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc8+ #12
Hardware name: Dell Computer Corporation OptiPlex GX270 /0Y1057, BIOS A07 06/26/2006
task: f5860000 ti: f5846000 task.ti: f5846000
EIP: 0060:[<c11884d4>] EFLAGS: 00010246 CPU: 0
EIP is at internal_create_group+0x244/0x270
EAX: 00000000 EBX: c199e5c9 ECX: c1929f60 EDX: 00000000
ESI: f59947c0 EDI: 00000000 EBP: f5847ed8 ESP: f5847ea4
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: ffe14000 CR3: 01a2b000 CR4: 000007d0
Stack:
f59947c0 f5847ed4 c106fdfb 00000000 34561000 00000000 00000000 00000000
c1929f60 f59947c0 c199e5c9 f59947c0 00000000 f5847ee0 c118852c f5847f00
c199e68c c111cde0 00000000 ffffffff c1856da4 c199e5c9 00000004 f5847f70
Call Trace:
[<c106fdfb>] ? try_to_wake_up+0x18b/0x230
[<c199e5c9>] ? procswaps_init+0x27/0x27
[<c118852c>] sysfs_create_group+0xc/0x10
[<c199e68c>] ksm_init+0xc3/0x15b
[<c111cde0>] ? run_store+0x2b0/0x2b0
[<c199e5c9>] ? procswaps_init+0x27/0x27
[<c10004b2>] do_one_initcall+0xe2/0x140
[<c1061463>] ? parameq+0x13/0x70
[<c197e4cc>] ? do_early_param+0x7a/0x7a
[<c1061699>] ? parse_args+0x1d9/0x340
[<c107c6a0>] ? __wake_up+0x40/0x50
[<c197eb73>] kernel_init_freeable+0x129/0x1cd
[<c197e4cc>] ? do_early_param+0x7a/0x7a
[<c16bd19b>] kernel_init+0xb/0x100
[<c16d44b7>] ret_from_kernel_thread+0x1b/0x28
[<c16bd190>] ? rest_init+0x60/0x60

---

Paul Gortmaker (3):
mm: make creation of the mm_kobj happen earlier than device_initcall
kernel: audit/fix non-modular users of module_init in core code
mm: audit/fix non-modular users of module_init in core code

kernel/hung_task.c | 3 +--
kernel/kexec.c | 4 ++--
kernel/profile.c | 2 +-
kernel/sched/stats.c | 2 +-
kernel/user.c | 3 +--
kernel/user_namespace.c | 2 +-
mm/huge_memory.c | 2 +-
mm/ksm.c | 2 +-
mm/mm_init.c | 3 +--
mm/mmap.c | 6 +++---
mm/mmu_notifier.c | 3 +--
11 files changed, 14 insertions(+), 18 deletions(-)
0 new messages