[PATCH] lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()


Vlastimil Babka

Oct 7, 2021, 5:58:43 AM
to Andrew Morton, linu...@kvack.org, linux-...@vger.kernel.org, dri-...@lists.freedesktop.org, inte...@lists.freedesktop.org, kasa...@googlegroups.com, Vlastimil Babka, Marco Elver, Vijayanand Jitta, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Geert Uytterhoeven, Oliver Glitta, Imran Khan
Currently, enabling CONFIG_STACKDEPOT means its stack_table will be allocated
from memblock, even if stack depot ends up not actually used. The default size
of stack_table is 4MB on 32-bit, 8MB on 64-bit.
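Those defaults follow from the hash table being an array of 2^STACK_HASH_ORDER bucket pointers, with STACK_HASH_ORDER defaulting to 20. A minimal sketch of the arithmetic (the helper name is illustrative, not kernel code):

```c
#include <stddef.h>

/* Size in MB of the stack depot hash table for a given pointer size,
 * assuming the default STACK_HASH_ORDER of 20 (2^20 buckets). */
static unsigned long stack_table_mb(size_t ptr_size)
{
	unsigned long buckets = 1UL << 20;

	return buckets * ptr_size / (1024 * 1024);
}
```

With 4-byte pointers (32-bit) this gives 4 MB; with 8-byte pointers (64-bit), 8 MB.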

This is fine for use-cases such as KASAN which is also a config option and
has overhead on its own. But it's an issue for functionality that has to be
actually enabled on boot (page_owner) or depends on hardware (GPU drivers)
and thus the memory might be wasted. This was raised as an issue [1] when trying
to add stackdepot support for SLUB's debug object tracking functionality. It's
common to build kernels with CONFIG_SLUB_DEBUG and enable slub_debug on boot
only when needed, or create specific kmem caches with debugging for testing
purposes.

It would thus be more efficient if stackdepot's table was allocated only when
actually going to be used. This patch thus makes the allocation (and whole
stack_depot_init() call) optional:

- Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current
well-defined point of allocation as part of mem_init(). Make CONFIG_KASAN
select this flag.
- Other users have to call stack_depot_init() as part of their own init when
it's determined that stack depot will actually be used. This may depend on
both config and runtime conditions. Convert current users which are
page_owner and several in the DRM subsystem. Same will be done for SLUB
later.
- Because the init might now be called after the boot-time memblock allocation
has given all memory to the buddy allocator, change stack_depot_init() to
allocate stack_table with kvmalloc() when memblock is no longer available.
Also handle allocation failure by disabling stackdepot (could have
theoretically happened even with memblock allocation previously), and don't
unnecessarily align the memblock allocation to its own size anymore.

[1] https://lore.kernel.org/all/CAMuHMdW=eoVzM1Re5FVoEN87nKfiL...@mail.gmail.com/

Signed-off-by: Vlastimil Babka <vba...@suse.cz>
Cc: Marco Elver <el...@google.com>
Cc: Vijayanand Jitta <vji...@codeaurora.org>
Cc: Maarten Lankhorst <maarten....@linux.intel.com>
Cc: Maxime Ripard <mri...@kernel.org>
Cc: Thomas Zimmermann <tzimm...@suse.de>
Cc: David Airlie <air...@linux.ie>
Cc: Daniel Vetter <dan...@ffwll.ch>
Cc: Andrey Ryabinin <ryabin...@gmail.com>
Cc: Alexander Potapenko <gli...@google.com>
Cc: Andrey Konovalov <andre...@gmail.com>
Cc: Dmitry Vyukov <dvy...@google.com>
Cc: Geert Uytterhoeven <ge...@linux-m68k.org>
Cc: Oliver Glitta <gli...@gmail.com>
Cc: Imran Khan <imran....@oracle.com>
---
Hi, I'd appreciate review of the DRM parts - namely whether I've got it right
that stack_depot_init() is called from the proper init functions and iff
stack_depot_save() is going to be used later. Thanks!
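For reviewers less familiar with the pattern, the core of the change is an idempotent, mutex-protected lazy init that falls back to permanently disabling the feature on allocation failure. It can be modeled in plain userspace C (all names here are illustrative stand-ins, not the kernel API; the kernel version additionally picks memblock_alloc() vs. kvmalloc() at run time):

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool depot_disabled;
static void **table;

/* Safe to call from several subsystems; only the first call allocates. */
static int depot_init(void)
{
	int ret = 0;

	pthread_mutex_lock(&init_mutex);
	if (!depot_disabled && table == NULL) {
		table = calloc(1UL << 20, sizeof(*table));
		if (!table) {
			depot_disabled = true;	/* fail once, stay disabled */
			ret = -1;
		}
	}
	pthread_mutex_unlock(&init_mutex);
	return ret;
}
```

Every later call is a cheap no-op, so each interested subsystem can call it unconditionally from its own init path.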

drivers/gpu/drm/drm_dp_mst_topology.c | 1 +
drivers/gpu/drm/drm_mm.c | 4 ++++
drivers/gpu/drm/i915/intel_runtime_pm.c | 3 +++
include/linux/stackdepot.h | 19 ++++++++-------
init/main.c | 3 ++-
lib/Kconfig | 3 +++
lib/Kconfig.kasan | 1 +
lib/stackdepot.c | 32 +++++++++++++++++++++----
mm/page_owner.c | 2 ++
9 files changed, 53 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 2d1adab9e360..bbe972d59dae 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -5490,6 +5490,7 @@ int drm_dp_mst_topology_mgr_init(struct drm_dp_mst_topology_mgr *mgr,
mutex_init(&mgr->probe_lock);
#if IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS)
mutex_init(&mgr->topology_ref_history_lock);
+ stack_depot_init();
#endif
INIT_LIST_HEAD(&mgr->tx_msg_downq);
INIT_LIST_HEAD(&mgr->destroy_port_list);
diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index 7d1c578388d3..8257f9d4f619 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -980,6 +980,10 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size)
add_hole(&mm->head_node);

mm->scan_active = 0;
+
+#ifdef CONFIG_DRM_DEBUG_MM
+ stack_depot_init();
+#endif
}
EXPORT_SYMBOL(drm_mm_init);

diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index 0d85f3c5c526..806c32ab410b 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -68,6 +68,9 @@ static noinline depot_stack_handle_t __save_depot_stack(void)
static void init_intel_runtime_pm_wakeref(struct intel_runtime_pm *rpm)
{
spin_lock_init(&rpm->debug.lock);
+
+ if (rpm->available)
+ stack_depot_init();
}

static noinline depot_stack_handle_t
diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index c34b55a6e554..60ba99a43745 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -15,6 +15,16 @@

typedef u32 depot_stack_handle_t;

+/*
+ * Every user of stack depot has to call this during its own init when it's
+ * decided that it will be calling stack_depot_save() later.
+ *
+ * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot
+ * enabled as part of mm_init(), for subsystems where it's known at compile time
+ * that stack depot will be used.
+ */
+int stack_depot_init(void);
+
depot_stack_handle_t __stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
gfp_t gfp_flags, bool can_alloc);
@@ -30,13 +40,4 @@ int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size,

void stack_depot_print(depot_stack_handle_t stack);

-#ifdef CONFIG_STACKDEPOT
-int stack_depot_init(void);
-#else
-static inline int stack_depot_init(void)
-{
- return 0;
-}
-#endif /* CONFIG_STACKDEPOT */
-
#endif
diff --git a/init/main.c b/init/main.c
index ee4d3e1b3eb9..b6a5833d98f5 100644
--- a/init/main.c
+++ b/init/main.c
@@ -844,7 +844,8 @@ static void __init mm_init(void)
init_mem_debugging_and_hardening();
kfence_alloc_pool();
report_meminit();
- stack_depot_init();
+ if (IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT))
+ stack_depot_init();
mem_init();
mem_init_print_info();
/* page_owner must be initialized after buddy is ready */
diff --git a/lib/Kconfig b/lib/Kconfig
index 5e7165e6a346..df6bcf0a4cc3 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -671,6 +671,9 @@ config STACKDEPOT
bool
select STACKTRACE

+config STACKDEPOT_ALWAYS_INIT
+ bool
+
config STACK_HASH_ORDER
int "stack depot hash size (12 => 4KB, 20 => 1024KB)"
range 12 20
diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index cdc842d090db..695deb603c66 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -39,6 +39,7 @@ menuconfig KASAN
HAVE_ARCH_KASAN_HW_TAGS
depends on (SLUB && SYSFS) || (SLAB && !DEBUG_SLAB)
select STACKDEPOT
+ select STACKDEPOT_ALWAYS_INIT
help
Enables KASAN (KernelAddressSANitizer) - runtime memory debugger,
designed to find out-of-bounds accesses and use-after-free bugs.
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index b437ae79aca1..a4f449ccd0dc 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -23,6 +23,7 @@
#include <linux/jhash.h>
#include <linux/kernel.h>
#include <linux/mm.h>
+#include <linux/mutex.h>
#include <linux/percpu.h>
#include <linux/printk.h>
#include <linux/slab.h>
@@ -145,6 +146,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
#define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
#define STACK_HASH_SEED 0x9747b28c

+DEFINE_MUTEX(stack_depot_init_mutex);
static bool stack_depot_disable;
static struct stack_record **stack_table;

@@ -161,18 +163,38 @@ static int __init is_stack_depot_disabled(char *str)
}
early_param("stack_depot_disable", is_stack_depot_disabled);

-int __init stack_depot_init(void)
+/*
+ * __ref because of memblock_alloc(), which will not be actually called after
+ * the __init code is gone
+ */
+__ref int stack_depot_init(void)
{
- if (!stack_depot_disable) {
+ mutex_lock(&stack_depot_init_mutex);
+ if (!stack_depot_disable && stack_table == NULL) {
size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
int i;

- stack_table = memblock_alloc(size, size);
- for (i = 0; i < STACK_HASH_SIZE; i++)
- stack_table[i] = NULL;
+ if (slab_is_available()) {
+ pr_info("Stack Depot allocating hash table with kvmalloc\n");
+ stack_table = kvmalloc(size, GFP_KERNEL);
+ } else {
+ pr_info("Stack Depot allocating hash table with memblock_alloc\n");
+ stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
+ }
+ if (stack_table) {
+ for (i = 0; i < STACK_HASH_SIZE; i++)
+ stack_table[i] = NULL;
+ } else {
+ pr_err("Stack Depot failed hash table allocation, disabling\n");
+ stack_depot_disable = true;
+ mutex_unlock(&stack_depot_init_mutex);
+ return -ENOMEM;
+ }
}
+ mutex_unlock(&stack_depot_init_mutex);
return 0;
}
+EXPORT_SYMBOL_GPL(stack_depot_init);

/* Calculate hash for a stack */
static inline u32 hash_stack(unsigned long *entries, unsigned int size)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index a83f546c06b5..a48607b51a97 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -80,6 +80,8 @@ static void init_page_owner(void)
if (!page_owner_enabled)
return;

+ stack_depot_init();
+
register_dummy_stack();
register_failure_stack();
register_early_stack();
--
2.33.0

Marco Elver

Oct 7, 2021, 7:01:59 AM
to Vlastimil Babka, Andrew Morton, linu...@kvack.org, linux-...@vger.kernel.org, dri-...@lists.freedesktop.org, inte...@lists.freedesktop.org, kasa...@googlegroups.com, Vijayanand Jitta, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Geert Uytterhoeven, Oliver Glitta, Imran Khan
On Thu, Oct 07, 2021 at 11:58AM +0200, Vlastimil Babka wrote:
[...]
> - Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current
> well-defined point of allocation as part of mem_init(). Make CONFIG_KASAN
> select this flag.
> - Other users have to call stack_depot_init() as part of their own init when
> it's determined that stack depot will actually be used. This may depend on
> both config and runtime conditions. Convert current users which are
> page_owner and several in the DRM subsystem. Same will be done for SLUB
> later.
> - Because the init might now be called after the boot-time memblock allocation
> has given all memory to the buddy allocator, change stack_depot_init() to
> allocate stack_table with kvmalloc() when memblock is no longer available.
> Also handle allocation failure by disabling stackdepot (could have
> theoretically happened even with memblock allocation previously), and don't
> unnecessarily align the memblock allocation to its own size anymore.
...
> Hi, I'd appreciate review of the DRM parts - namely whether I've got it right
> that stack_depot_init() is called from the proper init functions and iff
> stack_depot_save() is going to be used later. Thanks!

For ease of review between stackdepot and DRM changes, I thought it'd be
nice to split into 2 patches, but not sure it'll work, because you're
changing the semantics of the normal STACKDEPOT.

One option would be to flip it around, and instead have
STACKDEPOT_LAZY_INIT, but that seems counter-intuitive if the majority
of STACKDEPOT users are LAZY_INIT users.

On the other hand, the lazy initialization mode you're introducing
requires an explicit stack_depot_init() call somewhere and isn't as
straightforward as before.

Not sure what is best. My intuition tells me STACKDEPOT_LAZY_INIT would
be safer as it's a deliberate opt-in to the lazy initialization
behaviour.

Preferences?

[...]
> --- a/drivers/gpu/drm/drm_mm.c
> +++ b/drivers/gpu/drm/drm_mm.c
> @@ -980,6 +980,10 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size)
> add_hole(&mm->head_node);
>
> mm->scan_active = 0;
> +
> +#ifdef CONFIG_DRM_DEBUG_MM
> + stack_depot_init();
> +#endif

DRM_DEBUG_MM implies STACKDEPOT. Not sure what is more readable to drm
maintainers, but perhaps it'd be nicer to avoid the #ifdef here, and
instead just keep the no-op version of stack_depot_init() in
<linux/stackdepot.h>. I don't have a strong preference.
Could we avoid the IS_ENABLED() in init/main.c by adding a wrapper here:

+#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
+static inline int stack_depot_early_init(void) { return stack_depot_init(); }
+#else
+static inline int stack_depot_early_init(void) { return 0; }
+#endif /* CONFIG_STACKDEPOT_ALWAYS_INIT */

> #endif
> diff --git a/init/main.c b/init/main.c
> index ee4d3e1b3eb9..b6a5833d98f5 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -844,7 +844,8 @@ static void __init mm_init(void)
> init_mem_debugging_and_hardening();
> kfence_alloc_pool();
> report_meminit();
> - stack_depot_init();
> + if (IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT))
> + stack_depot_init();

I'd push the decision of when to call this into <linux/stackdepot.h> via
wrapper stack_depot_early_init().

> mem_init();
> mem_init_print_info();
> /* page_owner must be initialized after buddy is ready */
> diff --git a/lib/Kconfig b/lib/Kconfig
> index 5e7165e6a346..df6bcf0a4cc3 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -671,6 +671,9 @@ config STACKDEPOT
> bool
> select STACKTRACE
>
> +config STACKDEPOT_ALWAYS_INIT
> + bool

It looks like every users of STACKDEPOT_ALWAYS_INIT will also select
STACKDEPOT, so we could just make this:

+config STACKDEPOT_ALWAYS_INIT
+ bool
+ select STACKDEPOT

And remove the redundant 'select STACKDEPOT' in Kconfig.kasan.

> config STACK_HASH_ORDER
> int "stack depot hash size (12 => 4KB, 20 => 1024KB)"
> range 12 20
> diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
> index cdc842d090db..695deb603c66 100644
> --- a/lib/Kconfig.kasan
> +++ b/lib/Kconfig.kasan
> @@ -39,6 +39,7 @@ menuconfig KASAN
> HAVE_ARCH_KASAN_HW_TAGS
> depends on (SLUB && SYSFS) || (SLAB && !DEBUG_SLAB)
> select STACKDEPOT
> + select STACKDEPOT_ALWAYS_INIT

[...]
>
> -int __init stack_depot_init(void)
> +/*
> + * __ref because of memblock_alloc(), which will not be actually called after
> + * the __init code is gone

The reason is that after __init code is gone, slab_is_available() will
be true (might be worth adding to the comment).

> + */
> +__ref int stack_depot_init(void)
> {
> - if (!stack_depot_disable) {
> + mutex_lock(&stack_depot_init_mutex);
> + if (!stack_depot_disable && stack_table == NULL) {
> size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));

Thanks,
-- Marco

Dmitry Vyukov

Oct 7, 2021, 7:01:59 AM
to Vlastimil Babka, Andrew Morton, linu...@kvack.org, linux-...@vger.kernel.org, dri-...@lists.freedesktop.org, inte...@lists.freedesktop.org, kasa...@googlegroups.com, Marco Elver, Vijayanand Jitta, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Geert Uytterhoeven, Oliver Glitta, Imran Khan
Acked-by: Dmitry Vyukov <dvy...@google.com>

Vlastimil Babka

Oct 11, 2021, 1:02:26 PM
to Marco Elver, Andrew Morton, linu...@kvack.org, linux-...@vger.kernel.org, dri-...@lists.freedesktop.org, inte...@lists.freedesktop.org, kasa...@googlegroups.com, Vijayanand Jitta, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Geert Uytterhoeven, Oliver Glitta, Imran Khan
On 10/7/21 13:01, Marco Elver wrote:
> On Thu, Oct 07, 2021 at 11:58AM +0200, Vlastimil Babka wrote:
> [...]
>> - Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current
>> well-defined point of allocation as part of mem_init(). Make CONFIG_KASAN
>> select this flag.
>> - Other users have to call stack_depot_init() as part of their own init when
>> it's determined that stack depot will actually be used. This may depend on
>> both config and runtime conditions. Convert current users which are
>> page_owner and several in the DRM subsystem. Same will be done for SLUB
>> later.
>> - Because the init might now be called after the boot-time memblock allocation
>> has given all memory to the buddy allocator, change stack_depot_init() to
>> allocate stack_table with kvmalloc() when memblock is no longer available.
>> Also handle allocation failure by disabling stackdepot (could have
>> theoretically happened even with memblock allocation previously), and don't
>> unnecessarily align the memblock allocation to its own size anymore.
> ...
>> Hi, I'd appreciate review of the DRM parts - namely whether I've got it right
>> that stack_depot_init() is called from the proper init functions and iff
>> stack_depot_save() is going to be used later. Thanks!
>
> For ease of review between stackdepot and DRM changes, I thought it'd be
> nice to split into 2 patches, but not sure it'll work, because you're
> changing the semantics of the normal STACKDEPOT.

Yeah, that's why it's a single patch. As the DRM parts are clearly separated
to their files, I think review should be fine.

> One option would be to flip it around, and instead have
> STACKDEPOT_LAZY_INIT, but that seems counter-intuitive if the majority
> of STACKDEPOT users are LAZY_INIT users.

Agree.

> On the other hand, the lazy initialization mode you're introducing
> requires an explicit stack_depot_init() call somewhere and isn't as
> straightforward as before.
>
> Not sure what is best. My intuition tells me STACKDEPOT_LAZY_INIT would
> be safer as it's a deliberate opt-in to the lazy initialization
> behaviour.

I think it should be fine with ALWAYS_INIT. There are not many stackdepot
users being added, and anyone developing a new one will very quickly find
out if they forget to call stack_depot_init()?

> Preferences?
>
> [...]
>> --- a/drivers/gpu/drm/drm_mm.c
>> +++ b/drivers/gpu/drm/drm_mm.c
>> @@ -980,6 +980,10 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size)
>> add_hole(&mm->head_node);
>>
>> mm->scan_active = 0;
>> +
>> +#ifdef CONFIG_DRM_DEBUG_MM
>> + stack_depot_init();
>> +#endif
>
> DRM_DEBUG_MM implies STACKDEPOT. Not sure what is more readable to drm
> maintainers, but perhaps it'd be nicer to avoid the #ifdef here, and
> instead just keep the no-op version of stack_depot_init() in
> <linux/stackdepot.h>. I don't have a strong preference.

Hm, but in case STACKDEPOT is also selected by something else (e.g.
CONFIG_PAGE_OWNER) which uses lazy init but isn't enabled on boot, then
without #ifdef CONFIG_DRM_DEBUG_MM above, this code would call a
stack_depot_init() (that's not a no-op) even in case it's not going to be
using it, so not what we want to achieve.
But it could be changed to use IS_ENABLED() if that's preferred by DRM folks.
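For illustration, the two guard styles behave the same when the option is on; the difference is that the IS_ENABLED() form always compiles (and type-checks) the guarded call. A minimal userspace model, with IS_ENABLED() simplified to a plain macro rather than the kernel's real implementation:

```c
static int init_calls;
static int stack_depot_init(void) { init_calls++; return 0; }

#define CONFIG_DRM_DEBUG_MM 1		/* pretend the option is enabled */
#define IS_ENABLED(option) (option)	/* simplified stand-in for the kernel macro */

static void init_with_ifdef(void)
{
#ifdef CONFIG_DRM_DEBUG_MM
	stack_depot_init();
#endif
}

static void init_with_is_enabled(void)
{
	/* the call is compiled even when the option is off,
	 * and the optimizer drops the dead branch */
	if (IS_ENABLED(CONFIG_DRM_DEBUG_MM))
		stack_depot_init();
}
```

With the option off, both variants call nothing; the choice is purely one of style and build coverage.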

BTW it's possible that there won't be any DRM review because this failed to
apply:
https://patchwork.freedesktop.org/series/95549/
DRM folks, any hint how to indicate that the base was next-20211001?

>> @@ -30,13 +40,4 @@ int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size,
>>
>> void stack_depot_print(depot_stack_handle_t stack);
>>
>> -#ifdef CONFIG_STACKDEPOT
>> -int stack_depot_init(void);
>> -#else
>> -static inline int stack_depot_init(void)
>> -{
>> - return 0;
>> -}
>> -#endif /* CONFIG_STACKDEPOT */
>> -
>
> Could we avoid the IS_ENABLED() in init/main.c by adding a wrapper here:
>
> +#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
> +static inline int stack_depot_early_init(void) { return stack_depot_init(); }
> +#else
> +static inline int stack_depot_early_init(void) { return 0; }
> +#endif /* CONFIG_STACKDEPOT_ALWAYS_INIT */

We could, but it's a wrapper made for only a single caller...

>> #endif
>> diff --git a/init/main.c b/init/main.c
>> index ee4d3e1b3eb9..b6a5833d98f5 100644
>> --- a/init/main.c
>> +++ b/init/main.c
>> @@ -844,7 +844,8 @@ static void __init mm_init(void)
>> init_mem_debugging_and_hardening();
>> kfence_alloc_pool();
>> report_meminit();
>> - stack_depot_init();
>> + if (IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT))
>> + stack_depot_init();
>
> I'd push the decision of when to call this into <linux/stackdepot.h> via
> wrapper stack_depot_early_init().

No strong preferences, if you think it's worth it.

>> mem_init();
>> mem_init_print_info();
>> /* page_owner must be initialized after buddy is ready */
>> diff --git a/lib/Kconfig b/lib/Kconfig
>> index 5e7165e6a346..df6bcf0a4cc3 100644
>> --- a/lib/Kconfig
>> +++ b/lib/Kconfig
>> @@ -671,6 +671,9 @@ config STACKDEPOT
>> bool
>> select STACKTRACE
>>
>> +config STACKDEPOT_ALWAYS_INIT
>> + bool
>
> It looks like every users of STACKDEPOT_ALWAYS_INIT will also select
> STACKDEPOT, so we could just make this:
>
> +config STACKDEPOT_ALWAYS_INIT
> + bool
> + select STACKDEPOT
>
> And remove the redundant 'select STACKDEPOT' in Kconfig.kasan.

Right, will do, if the Kconfig resolver doesn't bite me.

>> config STACK_HASH_ORDER
>> int "stack depot hash size (12 => 4KB, 20 => 1024KB)"
>> range 12 20
>> diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
>> index cdc842d090db..695deb603c66 100644
>> --- a/lib/Kconfig.kasan
>> +++ b/lib/Kconfig.kasan
>> @@ -39,6 +39,7 @@ menuconfig KASAN
>> HAVE_ARCH_KASAN_HW_TAGS
>> depends on (SLUB && SYSFS) || (SLAB && !DEBUG_SLAB)
>> select STACKDEPOT
>> + select STACKDEPOT_ALWAYS_INIT
>
> [...]
>>
>> -int __init stack_depot_init(void)
>> +/*
>> + * __ref because of memblock_alloc(), which will not be actually called after
>> + * the __init code is gone
>
> The reason is that after __init code is gone, slab_is_available() will
> be true (might be worth adding to the comment).

OK

Thanks for the review!

Marco Elver

Oct 11, 2021, 1:08:14 PM
to Vlastimil Babka, Andrew Morton, linu...@kvack.org, linux-...@vger.kernel.org, dri-...@lists.freedesktop.org, inte...@lists.freedesktop.org, kasa...@googlegroups.com, Vijayanand Jitta, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Geert Uytterhoeven, Oliver Glitta, Imran Khan
On Mon, 11 Oct 2021 at 19:02, Vlastimil Babka <vba...@suse.cz> wrote:
[...]
> > On the other hand, the lazy initialization mode you're introducing
> > requires an explicit stack_depot_init() call somewhere and isn't as
> > straightforward as before.
> >
> > Not sure what is best. My intuition tells me STACKDEPOT_LAZY_INIT would
> > be safer as it's a deliberate opt-in to the lazy initialization
> > behaviour.
>
> I think it should be fine with ALWAYS_INIT. There are not many stackdepot
> users being added, and anyone developing a new one will very quickly find
> out if they forget to call stack_depot_init()?

I think that's fine.

> > Preferences?
> >
> > [...]
> >> --- a/drivers/gpu/drm/drm_mm.c
> >> +++ b/drivers/gpu/drm/drm_mm.c
> >> @@ -980,6 +980,10 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size)
> >> add_hole(&mm->head_node);
> >>
> >> mm->scan_active = 0;
> >> +
> >> +#ifdef CONFIG_DRM_DEBUG_MM
> >> + stack_depot_init();
> >> +#endif
> >
> > DRM_DEBUG_MM implies STACKDEPOT. Not sure what is more readable to drm
> > maintainers, but perhaps it'd be nicer to avoid the #ifdef here, and
> > instead just keep the no-op version of stack_depot_init() in
> > <linux/stackdepot.h>. I don't have a strong preference.
>
> Hm, but in case STACKDEPOT is also selected by something else (e.g.
> CONFIG_PAGE_OWNER) which uses lazy init but isn't enabled on boot, then
> without #ifdef CONFIG_DRM_DEBUG_MM above, this code would call a
> stack_depot_init() (that's not a no-op) even in case it's not going to be
> using it, so not what we want to achieve.
> But it could be changed to use IS_ENABLED() if that's preferred by DRM folks.

You're right -- but I'll leave this to DRM folks.

> BTW it's possible that there won't be any DRM review because this failed to
> apply:
> https://patchwork.freedesktop.org/series/95549/
> DRM folks, any hint how to indicate that the base was next-20211001?
>
[...]
> > +#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
> > +static inline int stack_depot_early_init(void) { return stack_depot_init(); }
> > +#else
> > +static inline int stack_depot_early_init(void) { return 0; }
> > +#endif /* CONFIG_STACKDEPOT_ALWAYS_INIT */
>
> We could, but it's a wrapper made for only a single caller...
>
> >> #endif
> >> diff --git a/init/main.c b/init/main.c
> >> index ee4d3e1b3eb9..b6a5833d98f5 100644
> >> --- a/init/main.c
> >> +++ b/init/main.c
> >> @@ -844,7 +844,8 @@ static void __init mm_init(void)
> >> init_mem_debugging_and_hardening();
> >> kfence_alloc_pool();
> >> report_meminit();
> >> - stack_depot_init();
> >> + if (IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT))
> >> + stack_depot_init();
> >
> > I'd push the decision of when to call this into <linux/stackdepot.h> via
> > wrapper stack_depot_early_init().
>
> No strong preferences, if you think it's worth it.

All the other *init() functions seem to follow the same idiom as there
are barely any IS_ENABLED() in init/main.c. So I think it's just
about being consistent with the rest.

Thanks,
-- Marco

Vlastimil Babka

Oct 12, 2021, 4:31:46 AM
to Marco Elver, Andrew Morton, linu...@kvack.org, linux-...@vger.kernel.org, dri-...@lists.freedesktop.org, inte...@lists.freedesktop.org, kasa...@googlegroups.com, Vijayanand Jitta, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Geert Uytterhoeven, Oliver Glitta, Imran Khan
Ah, the file only includes stackdepot.h in a #ifdef CONFIG_DRM_DEBUG_MM
section so I will keep the #ifdef here for a minimal change, unless
requested otherwise.

Vlastimil Babka

Oct 12, 2021, 5:06:30 AM
to Andrew Morton, linu...@kvack.org, linux-...@vger.kernel.org, dri-...@lists.freedesktop.org, inte...@lists.freedesktop.org, kasa...@googlegroups.com, Vlastimil Babka, Dmitry Vyukov, Marco Elver, Vijayanand Jitta, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Geert Uytterhoeven, Oliver Glitta, Imran Khan
Currently, enabling CONFIG_STACKDEPOT means its stack_table will be allocated
from memblock, even if stack depot ends up not actually used. The default size
of stack_table is 4MB on 32-bit, 8MB on 64-bit.

This is fine for use-cases such as KASAN which is also a config option and
has overhead on its own. But it's an issue for functionality that has to be
actually enabled on boot (page_owner) or depends on hardware (GPU drivers)
and thus the memory might be wasted. This was raised as an issue [1] when
attempting to add stackdepot support for SLUB's debug object tracking
functionality. It's common to build kernels with CONFIG_SLUB_DEBUG and enable
slub_debug on boot only when needed, or create only specific kmem caches with
debugging for testing purposes.

It would thus be more efficient if stackdepot's table was allocated only when
actually going to be used. This patch thus makes the allocation (and whole
stack_depot_init() call) optional:

- Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current
well-defined point of allocation as part of mem_init(). Make CONFIG_KASAN
select this flag.
- Other users have to call stack_depot_init() as part of their own init when
it's determined that stack depot will actually be used. This may depend on
both config and runtime conditions. Convert current users which are
page_owner and several in the DRM subsystem. Same will be done for SLUB
later.
- Because the init might now be called after the boot-time memblock allocation
has given all memory to the buddy allocator, change stack_depot_init() to
allocate stack_table with kvmalloc() when memblock is no longer available.
Also handle allocation failure by disabling stackdepot (could have
theoretically happened even with memblock allocation previously), and don't
unnecessarily align the memblock allocation to its own size anymore.

[1] https://lore.kernel.org/all/CAMuHMdW=eoVzM1Re5FVoEN87nKfiL...@mail.gmail.com/

Acked-by: Dmitry Vyukov <dvy...@google.com>
Cc: Marco Elver <el...@google.com>
Cc: Vijayanand Jitta <vji...@codeaurora.org>
Cc: Maarten Lankhorst <maarten....@linux.intel.com>
Cc: Maxime Ripard <mri...@kernel.org>
Cc: Thomas Zimmermann <tzimm...@suse.de>
Cc: David Airlie <air...@linux.ie>
Cc: Daniel Vetter <dan...@ffwll.ch>
Cc: Andrey Ryabinin <ryabin...@gmail.com>
Cc: Alexander Potapenko <gli...@google.com>
Cc: Andrey Konovalov <andre...@gmail.com>
Cc: Dmitry Vyukov <dvy...@google.com>
Cc: Geert Uytterhoeven <ge...@linux-m68k.org>
Cc: Oliver Glitta <gli...@gmail.com>
Cc: Imran Khan <imran....@oracle.com>
---
Changes in v2:
- Rebase to v5.15-rc5.
- Stylistic changes suggested by Marco Elver.
drivers/gpu/drm/drm_dp_mst_topology.c | 1 +
drivers/gpu/drm/drm_mm.c | 4 ++++
drivers/gpu/drm/i915/intel_runtime_pm.c | 3 +++
include/linux/stackdepot.h | 25 ++++++++++++-------
init/main.c | 2 +-
lib/Kconfig | 4 ++++
lib/Kconfig.kasan | 2 +-
lib/stackdepot.c | 32 +++++++++++++++++++++----
mm/page_owner.c | 2 ++
9 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 86d13d6bc463..b0ebdc843a00 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -5493,6 +5493,7 @@ int drm_dp_mst_topology_mgr_init(struct drm_dp_mst_topology_mgr *mgr,
mutex_init(&mgr->probe_lock);
#if IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS)
mutex_init(&mgr->topology_ref_history_lock);
+ stack_depot_init();
#endif
INIT_LIST_HEAD(&mgr->tx_msg_downq);
INIT_LIST_HEAD(&mgr->destroy_port_list);
diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index 93d48a6f04ab..5916228ea0c9 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -983,6 +983,10 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size)
add_hole(&mm->head_node);

mm->scan_active = 0;
+
+#ifdef CONFIG_DRM_DEBUG_MM
+ stack_depot_init();
+#endif
}
EXPORT_SYMBOL(drm_mm_init);

diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index eaf7688f517d..d083506986e1 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -78,6 +78,9 @@ static void __print_depot_stack(depot_stack_handle_t stack,
static void init_intel_runtime_pm_wakeref(struct intel_runtime_pm *rpm)
{
spin_lock_init(&rpm->debug.lock);
+
+ if (rpm->available)
+ stack_depot_init();
}

static noinline depot_stack_handle_t
diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 6bb4bc1a5f54..40fc5e92194f 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -13,6 +13,22 @@

typedef u32 depot_stack_handle_t;

+/*
+ * Every user of stack depot has to call this during its own init when it's
+ * decided that it will be calling stack_depot_save() later.
+ *
+ * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot
+ * enabled as part of mm_init(), for subsystems where it's known at compile time
+ * that stack depot will be used.
+ */
+int stack_depot_init(void);
+
+#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
+static inline int stack_depot_early_init(void) { return stack_depot_init(); }
+#else
+static inline int stack_depot_early_init(void) { return 0; }
+#endif
+
depot_stack_handle_t stack_depot_save(unsigned long *entries,
unsigned int nr_entries, gfp_t gfp_flags);

@@ -21,13 +37,4 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle,

unsigned int filter_irq_stacks(unsigned long *entries, unsigned int nr_entries);

-#ifdef CONFIG_STACKDEPOT
-int stack_depot_init(void);
-#else
-static inline int stack_depot_init(void)
-{
- return 0;
-}
-#endif /* CONFIG_STACKDEPOT */
-
#endif
diff --git a/init/main.c b/init/main.c
index 81a79a77db46..ca2765c8e45c 100644
--- a/init/main.c
+++ b/init/main.c
@@ -842,7 +842,7 @@ static void __init mm_init(void)
init_mem_debugging_and_hardening();
kfence_alloc_pool();
report_meminit();
- stack_depot_init();
+ stack_depot_early_init();
mem_init();
mem_init_print_info();
/* page_owner must be initialized after buddy is ready */
diff --git a/lib/Kconfig b/lib/Kconfig
index 5e7165e6a346..9d0569084152 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -671,6 +671,10 @@ config STACKDEPOT
bool
select STACKTRACE

+config STACKDEPOT_ALWAYS_INIT
+ bool
+ select STACKDEPOT
+
config STACK_HASH_ORDER
int "stack depot hash size (12 => 4KB, 20 => 1024KB)"
range 12 20
diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index cdc842d090db..879757b6dd14 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -38,7 +38,7 @@ menuconfig KASAN
CC_HAS_WORKING_NOSANITIZE_ADDRESS) || \
HAVE_ARCH_KASAN_HW_TAGS
depends on (SLUB && SYSFS) || (SLAB && !DEBUG_SLAB)
- select STACKDEPOT
+ select STACKDEPOT_ALWAYS_INIT
help
Enables KASAN (KernelAddressSANitizer) - runtime memory debugger,
designed to find out-of-bounds accesses and use-after-free bugs.
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 0a2e417f83cb..9bb5333bf02f 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -24,6 +24,7 @@
#include <linux/jhash.h>
#include <linux/kernel.h>
#include <linux/mm.h>
+#include <linux/mutex.h>
#include <linux/percpu.h>
#include <linux/printk.h>
#include <linux/slab.h>
@@ -146,6 +147,7 @@ static struct stack_record *depot_alloc_stack(unsigned long *entries, int size,
#define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
#define STACK_HASH_SEED 0x9747b28c

+DEFINE_MUTEX(stack_depot_init_mutex);
static bool stack_depot_disable;
static struct stack_record **stack_table;

@@ -162,18 +164,38 @@ static int __init is_stack_depot_disabled(char *str)
}
early_param("stack_depot_disable", is_stack_depot_disabled);

-int __init stack_depot_init(void)
+/*
+ * __ref because of memblock_alloc(), which will not be actually called after
+ * the __init code is gone, because at that point slab_is_available() is true
index 62402d22539b..16a0ef903384 100644

Marco Elver

Oct 12, 2021, 5:58:03 AM
to Vlastimil Babka, Andrew Morton, linu...@kvack.org, linux-...@vger.kernel.org, dri-...@lists.freedesktop.org, inte...@lists.freedesktop.org, kasa...@googlegroups.com, Dmitry Vyukov, Vijayanand Jitta, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Geert Uytterhoeven, Oliver Glitta, Imran Khan
Reviewed-by: Marco Elver <el...@google.com> # stackdepot

Thanks!
> --
> To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/20211012090621.1357-1-vbabka%40suse.cz.

kernel test robot

Oct 12, 2021, 5:52:58 PM
to Vlastimil Babka, Andrew Morton, kbuil...@lists.01.org, Linux Memory Management List, linux-...@vger.kernel.org, dri-...@lists.freedesktop.org, inte...@lists.freedesktop.org, kasa...@googlegroups.com, Vlastimil Babka, Dmitry Vyukov, Marco Elver, Vijayanand Jitta
lib/stackdepot.c:150:1: warning: symbol 'stack_depot_init_mutex' was not declared. Should it be static?

Reported-by: kernel test robot <l...@intel.com>
Signed-off-by: kernel test robot <l...@intel.com>
---
stackdepot.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 9bb5333bf02f61..89b67aef9b320b 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -147,7 +147,7 @@ static struct stack_record *depot_alloc_stack(unsigned long *entries, int size,
#define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
#define STACK_HASH_SEED 0x9747b28c

-DEFINE_MUTEX(stack_depot_init_mutex);
+static DEFINE_MUTEX(stack_depot_init_mutex);

kernel test robot

Oct 12, 2021, 5:53:00 PM
to Vlastimil Babka, Andrew Morton, kbuil...@lists.01.org, Linux Memory Management List, linux-...@vger.kernel.org, dri-...@lists.freedesktop.org, inte...@lists.freedesktop.org, kasa...@googlegroups.com, Vlastimil Babka, Dmitry Vyukov, Marco Elver, Vijayanand Jitta
Hi Vlastimil,

I love your patch! Perhaps something to improve:

[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on linux/master linus/master v5.15-rc5]
[cannot apply to hnaz-mm/master next-20211012]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Vlastimil-Babka/lib-stackdepot-allow-optional-init-and-stack_table-allocation-by-kvmalloc/20211012-170816
base: git://anongit.freedesktop.org/drm-intel for-linux-next
config: x86_64-randconfig-s021-20211012 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.4-dirty
# https://github.com/0day-ci/linux/commit/1cd8ce52c520c26c513899fb5aee42b8e5f60d0d
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Vlastimil-Babka/lib-stackdepot-allow-optional-init-and-stack_table-allocation-by-kvmalloc/20211012-170816
git checkout 1cd8ce52c520c26c513899fb5aee42b8e5f60d0d
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <l...@intel.com>


sparse warnings: (new ones prefixed by >>)
>> lib/stackdepot.c:150:1: sparse: sparse: symbol 'stack_depot_init_mutex' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuil...@lists.01.org
.config.gz

Vlastimil Babka

Oct 13, 2021, 3:30:12 AM
to Andrew Morton, linu...@kvack.org, linux-...@vger.kernel.org, dri-...@lists.freedesktop.org, inte...@lists.freedesktop.org, kasa...@googlegroups.com, Vlastimil Babka, Dmitry Vyukov, Marco Elver, Vijayanand Jitta, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Geert Uytterhoeven, Oliver Glitta, Imran Khan
Reviewed-by: Marco Elver <el...@google.com> # stackdepot
Cc: Marco Elver <el...@google.com>
Cc: Vijayanand Jitta <vji...@codeaurora.org>
Cc: Maarten Lankhorst <maarten....@linux.intel.com>
Cc: Maxime Ripard <mri...@kernel.org>
Cc: Thomas Zimmermann <tzimm...@suse.de>
Cc: David Airlie <air...@linux.ie>
Cc: Daniel Vetter <dan...@ffwll.ch>
Cc: Andrey Ryabinin <ryabin...@gmail.com>
Cc: Alexander Potapenko <gli...@google.com>
Cc: Andrey Konovalov <andre...@gmail.com>
Cc: Dmitry Vyukov <dvy...@google.com>
Cc: Geert Uytterhoeven <ge...@linux-m68k.org>
Cc: Oliver Glitta <gli...@gmail.com>
Cc: Imran Khan <imran....@oracle.com>
---
Changes in v3:
- stack_depot_init_mutex made static and moved inside stack_depot_init()
Reported-by: kernel test robot <l...@intel.com>
- use !stack_table condition instead of stack_table == NULL
reported by checkpatch on freedesktop.org patchwork
drivers/gpu/drm/drm_dp_mst_topology.c | 1 +
drivers/gpu/drm/drm_mm.c | 4 +++
drivers/gpu/drm/i915/intel_runtime_pm.c | 3 +++
include/linux/stackdepot.h | 25 ++++++++++++-------
init/main.c | 2 +-
lib/Kconfig | 4 +++
lib/Kconfig.kasan | 2 +-
lib/stackdepot.c | 33 +++++++++++++++++++++----
mm/page_owner.c | 2 ++
9 files changed, 60 insertions(+), 16 deletions(-)
index 0a2e417f83cb..049d7d025d78 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -24,6 +24,7 @@
#include <linux/jhash.h>
#include <linux/kernel.h>
#include <linux/mm.h>
+#include <linux/mutex.h>
#include <linux/percpu.h>
#include <linux/printk.h>
#include <linux/slab.h>
@@ -162,18 +163,40 @@ static int __init is_stack_depot_disabled(char *str)
}
early_param("stack_depot_disable", is_stack_depot_disabled);

-int __init stack_depot_init(void)
+/*
+ * __ref because of memblock_alloc(), which will not be actually called after
+ * the __init code is gone, because at that point slab_is_available() is true
+ */
+__ref int stack_depot_init(void)
{
- if (!stack_depot_disable) {
+ static DEFINE_MUTEX(stack_depot_init_mutex);
+

kernel test robot

Oct 14, 2021, 4:35:04 AM
to Vlastimil Babka, 0day robot, Dmitry Vyukov, Marco Elver, Vijayanand Jitta, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Geert Uytterhoeven, Oliver Glitta, Imran Khan, LKML, l...@lists.01.org, Andrew Morton, linu...@kvack.org, dri-...@lists.freedesktop.org, inte...@lists.freedesktop.org, kasa...@googlegroups.com, Vlastimil Babka


Greeting,

FYI, we noticed the following commit (built with gcc-9):

commit: 1cd8ce52c520c26c513899fb5aee42b8e5f60d0d ("[PATCH v2] lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()")
in testcase: rcutorture
version:
with following parameters:

runtime: 300s
test: cpuhotplug
torture_type: srcud

test-description: rcutorture is rcutorture kernel module load/unload test.
test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt


on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+---------------------------------------------+------------+------------+
| | a94a6d76c9 | 1cd8ce52c5 |
+---------------------------------------------+------------+------------+
| boot_successes | 30 | 0 |
| boot_failures | 0 | 7 |
| BUG:kernel_NULL_pointer_dereference,address | 0 | 2 |
| Oops:#[##] | 0 | 7 |
| EIP:stack_depot_save | 0 | 7 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 7 |
| BUG:unable_to_handle_page_fault_for_address | 0 | 5 |
+---------------------------------------------+------------+------------+


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <olive...@intel.com>



[ 319.147926][ T259] BUG: unable to handle page fault for address: 0ec74110
[ 319.149309][ T259] #PF: supervisor read access in kernel mode
[ 319.150362][ T259] #PF: error_code(0x0000) - not-present page
[ 319.151372][ T259] *pde = 00000000
[ 319.151964][ T259] Oops: 0000 [#1] SMP
[ 319.152617][ T259] CPU: 0 PID: 259 Comm: systemd-rc-loca Not tainted 5.15.0-rc1-00270-g1cd8ce52c520 #1
[ 319.154514][ T259] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 319.156200][ T259] EIP: stack_depot_save+0x12a/0x4d0
[ 319.157287][ T259] Code: ff 0f 00 8d 04 90 89 45 dc 8b 18 85 db 0f 84 0d 01 00 00 8b 55 e8 eb 12 8d b4 26 00 00 00 00 90 8b 1b 85 db 0f 84 f6 00 00 00 <39> 73 04 75 f1 3b 53 08 75 ec 8b 4d e4 31 c0 8d b4 26 00 00 00 00
[ 319.161025][ T259] EAX: f286870c EBX: 0ec7410c ECX: ae94980e EDX: 00000010
[ 319.163557][ T259] ESI: ca0ea9c3 EDI: 6e32801a EBP: bec0bc90 ESP: bec0bc5c
[ 319.164952][ T259] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010206
[ 319.166533][ T259] CR0: 80050033 CR2: 0ec74110 CR3: 0373f000 CR4: 00000690
[ 319.167965][ T259] Call Trace:
[ 319.168625][ T259] save_stack+0x66/0x90
[ 319.169561][ T259] ? free_pcp_prepare+0x192/0x340
[ 319.170597][ T259] ? free_unref_page+0x18/0x210
[ 319.171478][ T259] ? __free_pages+0xa7/0xd0
[ 319.172294][ T259] ? put_task_stack+0x9d/0x140
[ 319.173115][ T259] ? finish_task_switch+0x180/0x240
[ 319.174197][ T259] ? __schedule+0x39a/0xc00
[ 319.175268][ T259] ? preempt_schedule_common+0x1c/0x30
[ 319.176344][ T259] ? __cond_resched+0x25/0x30
[ 319.177302][ T259] ? unmap_page_range+0x366/0x7a0
[ 319.178325][ T259] ? unmap_single_vma+0x55/0xc0
[ 319.179247][ T259] ? unmap_vmas+0x35/0x50
[ 319.180072][ T259] ? exit_mmap+0x72/0x1c0
[ 319.180894][ T259] ? mmput+0x61/0x100
[ 319.181663][ T259] ? do_exit+0x296/0xa50
[ 319.182511][ T259] ? do_group_exit+0x31/0x90
[ 319.183380][ T259] ? __ia32_sys_exit_group+0x10/0x10
[ 319.184357][ T259] __reset_page_owner+0x36/0x90
[ 319.185331][ T259] free_pcp_prepare+0x192/0x340
[ 319.186292][ T259] free_unref_page+0x18/0x210
[ 319.187183][ T259] __free_pages+0xa7/0xd0
[ 319.188035][ T259] put_task_stack+0x9d/0x140
[ 319.188928][ T259] finish_task_switch+0x180/0x240
[ 319.189949][ T259] ? finish_task_switch+0x52/0x240
[ 319.190896][ T259] __schedule+0x39a/0xc00
[ 319.191645][ T259] ? find_held_lock+0x2a/0x90
[ 319.192566][ T259] preempt_schedule_common+0x1c/0x30
[ 319.193495][ T259] __cond_resched+0x25/0x30
[ 319.194320][ T259] unmap_page_range+0x366/0x7a0
[ 319.195237][ T259] unmap_single_vma+0x55/0xc0
[ 319.196144][ T259] unmap_vmas+0x35/0x50
[ 319.196942][ T259] exit_mmap+0x72/0x1c0
[ 319.197742][ T259] ? up_read+0x16/0x240
[ 319.198527][ T259] mmput+0x61/0x100
[ 319.199208][ T259] do_exit+0x296/0xa50
[ 319.199930][ T259] do_group_exit+0x31/0x90
[ 319.200757][ T259] ? __might_fault+0x79/0x80
[ 319.201653][ T259] __ia32_sys_exit_group+0x10/0x10
[ 319.202662][ T259] __do_fast_syscall_32+0x5b/0xd0
[ 319.203658][ T259] do_fast_syscall_32+0x32/0x70
[ 319.204650][ T259] do_SYSENTER_32+0x15/0x20
[ 319.205571][ T259] entry_SYSENTER_32+0x98/0xe7
[ 319.206581][ T259] EIP: 0x37f47549
[ 319.207276][ T259] Code: Unable to access opcode bytes at RIP 0x37f4751f.
[ 319.208586][ T259] EAX: ffffffda EBX: 00000000 ECX: 37d181d8 EDX: 00000000
[ 319.209955][ T259] ESI: 00000000 EDI: 37d152f0 EBP: 37d181e0 ESP: 3fc3cf2c
[ 319.211250][ T259] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 007b EFLAGS: 00000216
[ 319.212737][ T259] Modules linked in:
[ 319.213484][ T259] CR2: 000000000ec74110
[ 319.214357][ T259] ---[ end trace d840069cc585ecdc ]---
[ 319.215361][ T259] EIP: stack_depot_save+0x12a/0x4d0
[ 319.216296][ T259] Code: ff 0f 00 8d 04 90 89 45 dc 8b 18 85 db 0f 84 0d 01 00 00 8b 55 e8 eb 12 8d b4 26 00 00 00 00 90 8b 1b 85 db 0f 84 f6 00 00 00 <39> 73 04 75 f1 3b 53 08 75 ec 8b 4d e4 31 c0 8d b4 26 00 00 00 00
[ 319.219967][ T259] EAX: f286870c EBX: 0ec7410c ECX: ae94980e EDX: 00000010
[ 319.221339][ T259] ESI: ca0ea9c3 EDI: 6e32801a EBP: bec0bc90 ESP: bec0bc5c
[ 319.222743][ T259] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010206
[ 319.224002][ T259] CR0: 80050033 CR2: 0ec74110 CR3: 0373f000 CR4: 00000690
[ 319.225147][ T259] Kernel panic - not syncing: Fatal exception
[ 319.226616][ T259] Kernel Offset: disabled



To reproduce:

# build kernel
cd linux
cp config-5.15.0-rc1-00270-g1cd8ce52c520 .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=i386 olddefconfig prepare modules_prepare bzImage

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/l...@lists.01.org Intel Corporation

Thanks,
Oliver Sang

config-5.15.0-rc1-00270-g1cd8ce52c520
job-script
dmesg.xz

Vlastimil Babka

Oct 14, 2021, 5:33:05 AM
to kernel test robot, 0day robot, Dmitry Vyukov, Marco Elver, Vijayanand Jitta, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Geert Uytterhoeven, Oliver Glitta, Imran Khan, LKML, l...@lists.01.org, Andrew Morton, linu...@kvack.org, dri-...@lists.freedesktop.org, inte...@lists.freedesktop.org, kasa...@googlegroups.com, Mike Rapoport
Cc Mike Rapoport, looks like:
- memblock_alloc() should have failed (I think, because page allocator
already took over?), but didn't. So apparently we got some area that wasn't
fully mapped.
- using slab_is_available() is not accurate enough to detect when to use
memblock or page allocator (kvmalloc in case of my patch). I have used it
because memblock_alloc_internal() checks the same condition to issue a warning.

Relevant part of dmesg.xz that was attached:
[ 1.589075][ T0] Dentry cache hash table entries: 524288 (order: 9, 2097152 bytes, linear)
[ 1.592396][ T0] Inode-cache hash table entries: 262144 (order: 8, 1048576 bytes, linear)
[ 2.916844][ T0] allocated 31496920 bytes of page_ext

- this means we were allocating from page allocator by alloc_pages_exact_nid() already

[ 2.918197][ T0] mem auto-init: stack:off, heap alloc:off, heap free:on
[ 2.919683][ T0] mem auto-init: clearing system memory may take some time...
[ 2.921239][ T0] Initializing HighMem for node 0 (000b67fe:000bffe0)
[ 23.023619][ T0] Initializing Movable for node 0 (00000000:00000000)
[ 245.194520][ T0] Checking if this processor honours the WP bit even in supervisor mode...Ok.
[ 245.196847][ T0] Memory: 2914460K/3145208K available (20645K kernel code, 5953K rwdata, 12624K rodata, 760K init, 8112K bss, 230748K reserved, 0K cma-reserved, 155528K highmem)
[ 245.200521][ T0] Stack Depot allocating hash table with memblock_alloc

- initializing stack depot as part of initializing page_owner, uses memblock_alloc()
because slab_is_available() is still false

[ 245.212005][ T0] Node 0, zone Normal: page owner found early allocated 0 pages
[ 245.213867][ T0] Node 0, zone HighMem: page owner found early allocated 0 pages
[ 245.216126][ T0] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1

- printed by SLUB's kmem_cache_init() after create_kmalloc_caches() sets slab_state
to UP, making slab_is_available() true, but too late

In my local testing of the patch, when stackdepot was initialized through
page owner init, it was using kvmalloc() because slab_is_available() was
already true. It looks like the exact order of slab vs page_owner init is
non-deterministic; it could be arch-dependent or just random ordering of init
calls. A wrong order exposes the fact that slab_is_available() is not a
reliable indicator of whether to use memblock or the page allocator, and we
would need a better one.
Thoughts?

Mike Rapoport

Oct 14, 2021, 6:17:02 AM
to Vlastimil Babka, kernel test robot, 0day robot, Dmitry Vyukov, Marco Elver, Vijayanand Jitta, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Geert Uytterhoeven, Oliver Glitta, Imran Khan, LKML, l...@lists.01.org, Andrew Morton, linu...@kvack.org, dri-...@lists.freedesktop.org, inte...@lists.freedesktop.org, kasa...@googlegroups.com
The order of slab vs page_owner is deterministic, but it is different for
FLATMEM and SPARSEMEM. And page_ext_init_flatmem_late(), which initializes
page_ext for FLATMEM, is called exactly between buddy and slab setup:

static void __init mm_init(void)
{
...

mem_init();
mem_init_print_info();
/* page_owner must be initialized after buddy is ready */
page_ext_init_flatmem_late();
kmem_cache_init();

...
}

I've stared for a while at page_ext init and it seems that
page_ext_init_flatmem_late() can simply be dropped, because there is a call
to invoke_init_callbacks() in page_ext_init() anyway, which runs much later
in the boot process.

--
Sincerely yours,
Mike.

Vlastimil Babka

Oct 15, 2021, 4:27:19 AM
to Mike Rapoport, kernel test robot, 0day robot, Dmitry Vyukov, Marco Elver, Vijayanand Jitta, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Geert Uytterhoeven, Oliver Glitta, Imran Khan, LKML, l...@lists.01.org, Andrew Morton, linu...@kvack.org, dri-...@lists.freedesktop.org, inte...@lists.freedesktop.org, kasa...@googlegroups.com
Oh, so it was due to FLATMEM, thanks for figuring that out!

> static void __init mm_init(void)
> {
> ...
>
> mem_init();
> mem_init_print_info();
> /* page_owner must be initialized after buddy is ready */
> page_ext_init_flatmem_late();
> kmem_cache_init();
>
> ...
> }
>
> I've stared for a while at page_ext init and it seems that the
> page_ext_init_flatmem_late() can be simply dropped because there is anyway
> a call to invoke_init_callbacks() in page_ext_init() that is called much
> later in the boot process.

Yeah, but page_ext_init() only does something for SPARSEMEM, and is empty on
FLATMEM. Otherwise it would be duplicating all the work. So I'll just move
page_ext_init_flatmem_late() below kmem_cache_init() in mm_init(). Thanks
again!


Mike Rapoport

Oct 15, 2021, 4:56:12 AM
to Vlastimil Babka, kernel test robot, 0day robot, Dmitry Vyukov, Marco Elver, Vijayanand Jitta, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Geert Uytterhoeven, Oliver Glitta, Imran Khan, LKML, l...@lists.01.org, Andrew Morton, linu...@kvack.org, dri-...@lists.freedesktop.org, inte...@lists.freedesktop.org, kasa...@googlegroups.com
I hope at some point we'll have cleaner mm_init(), but for now simply
moving page_ext_init_flatmem_late() should work.

> Thanks again!

Welcome :)


--
Sincerely yours,
Mike.

Vlastimil Babka

Oct 15, 2021, 4:57:16 AM
to Andrew Morton, linu...@kvack.org, linux-...@vger.kernel.org, dri-...@lists.freedesktop.org, inte...@lists.freedesktop.org, kasa...@googlegroups.com, Dmitry Vyukov, Marco Elver, Vijayanand Jitta, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Geert Uytterhoeven, Oliver Glitta, Imran Khan, Mike Rapoport, kernel test robot
...
> ---
> Changes in v3:
> - stack_depot_init_mutex made static and moved inside stack_depot_init()
> Reported-by: kernel test robot <l...@intel.com>
> - use !stack_table condition instead of stack_table == NULL
> reported by checkpatch on freedesktop.org patchwork

The last change above was missing because I forgot git commit --amend before
git format-patch. More importantly, there was a bot report for FLATMEM. Please
add this fixup. Thanks.

----8<----
From a971a1670491f8fbbaab579eef3c756a5263af95 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vba...@suse.cz>
Date: Thu, 7 Oct 2021 10:49:09 +0200
Subject: [PATCH] lib/stackdepot: allow optional init and stack_table
allocation by kvmalloc() - fixup

On FLATMEM, we call page_ext_init_flatmem_late() just before kmem_cache_init(),
which means stack_depot_init() (called by page owner init) will not properly
recognize that it should use kvmalloc() and not memblock_alloc(). In that case
memblock_alloc() will also not issue a warning, and will return a block of
memory that can be invalid and cause a kernel page fault when saving stacks,
as reported by the kernel test robot [1].

Fix this by moving page_ext_init_flatmem_late() below kmem_cache_init(), so
that slab_is_available() is true during stack_depot_init(). SPARSEMEM doesn't
have this issue, as it doesn't call page_ext_init_flatmem_late() but instead a
different page_ext_init() even later in the boot process.

Thanks to Mike Rapoport for pointing out the FLATMEM init ordering issue.

While at it, also actually resolve a checkpatch warning in stack_depot_init()
from DRM CI, which was supposed to be in the original patch already.

[1] https://lore.kernel.org/all/20211014085450.GC18719@xsang-OptiPlex-9020/

Signed-off-by: Vlastimil Babka <vba...@suse.cz>
Reported-by: kernel test robot <olive...@intel.com>
---
init/main.c | 7 +++++--
lib/stackdepot.c | 2 +-
2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/init/main.c b/init/main.c
index ca2765c8e45c..0ab632f681c5 100644
--- a/init/main.c
+++ b/init/main.c
@@ -845,9 +845,12 @@ static void __init mm_init(void)
stack_depot_early_init();
mem_init();
mem_init_print_info();
- /* page_owner must be initialized after buddy is ready */
- page_ext_init_flatmem_late();
kmem_cache_init();
+ /*
+ * page_owner must be initialized after buddy is ready, and also after
+ * slab is ready so that stack_depot_init() works properly
+ */
+ page_ext_init_flatmem_late();
kmemleak_init();
pgtable_init();
debug_objects_mem_init();
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 049d7d025d78..1f8ea6d0899b 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -172,7 +172,7 @@ __ref int stack_depot_init(void)
static DEFINE_MUTEX(stack_depot_init_mutex);

mutex_lock(&stack_depot_init_mutex);
- if (!stack_depot_disable && stack_table == NULL) {
+ if (!stack_depot_disable && !stack_table) {
size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
int i;

--
2.33.0
