[PATCH RFC 00/15] mm/slab: introduce alloc_flags and slab_alloc_context


Vlastimil Babka (SUSE)

Jun 9, 2026, 5:17:57 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
This series is based on slab/for-next. If all goes well, it would
hopefully go to slab/for-next soon after the 7.2 merge window, so any
other work can be based on it to avoid conflicts, as it touches a lot
of parts of slab.

Git: https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/slab_alloc_flags

The slab implementation currently relies on gfp flags to convey
some context information internally:

- The absence of both __GFP_RECLAIM flags is interpreted as "cannot spin
on locks", which is intended for kmalloc_nolock(). However, false
positives are possible, e.g. during early boot, where gfp_allowed_mask
clears __GFP_RECLAIM from all allocations. This leads to unnecessary
allocation failures and workarounds such as fd3634312a04 ("debugobject:
Make it work with deferred page initialization - again").

- __GFP_NO_OBJ_EXT exists and takes up a valuable bit in the gfp flags
space, only to prevent recursive kmalloc() allocations for obj_ext
arrays and sheaves.

The page allocator uses its internal alloc_flags to convey various
context information, including ALLOC_TRYLOCK (meaning "cannot spin").
This series copies that concept for the slab allocator, with its own
slab-specific internal flags:

- SLAB_ALLOC_DEFAULT - no extra flags (the value is 0), but explicit
- SLAB_ALLOC_TRYLOCK - do not spin on locks (used by kmalloc_nolock())
- SLAB_ALLOC_NEW_SLAB - replacing existing 'bool new_slab' parameter
for allocating obj_ext arrays
- SLAB_ALLOC_NO_RECURSE - replacing usage of __GFP_NO_OBJ_EXT
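To illustrate why an explicit internal flag avoids the early-boot false
positive, here is a minimal user-space sketch. The SKETCH_GFP_* bit
values and the boot mask are hypothetical stand-ins (the real __GFP_*
values in include/linux/gfp_types.h differ); only SLAB_ALLOC_DEFAULT and
SLAB_ALLOC_TRYLOCK use the values this series defines in mm/slab.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical gfp bit values, for illustration only. */
#define SKETCH_GFP_DIRECT_RECLAIM 0x1u
#define SKETCH_GFP_KSWAPD_RECLAIM 0x2u
#define SKETCH_GFP_RECLAIM (SKETCH_GFP_DIRECT_RECLAIM | SKETCH_GFP_KSWAPD_RECLAIM)
/* A GFP_KERNEL-like request: in this sketch, just the reclaim bits. */
#define SKETCH_GFP_KERNEL SKETCH_GFP_RECLAIM
/* Hypothetical early-boot allowed mask that strips the reclaim bits. */
#define SKETCH_BOOT_ALLOWED_MASK (~SKETCH_GFP_RECLAIM)

/* Values as defined by this series in mm/slab.h. */
#define SLAB_ALLOC_DEFAULT 0x00u
#define SLAB_ALLOC_TRYLOCK 0x01u

/* Old heuristic: "may spin" is inferred from the presence of reclaim bits. */
static bool sketch_gfp_allows_spinning(unsigned int gfp)
{
	return (gfp & SKETCH_GFP_RECLAIM) != 0;
}

/* New approach: an explicit internal flag, independent of the gfp mask. */
static bool alloc_flags_allow_spinning(unsigned int alloc_flags)
{
	return !(alloc_flags & SLAB_ALLOC_TRYLOCK);
}
```

With the boot mask applied, a normal GFP_KERNEL-like request loses its
reclaim bits and the gfp-based check wrongly reports "cannot spin",
while the alloc_flags-based check still says spinning is allowed because
the caller passed SLAB_ALLOC_DEFAULT.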

To reduce the number of parameters in various internal functions, we
additionally introduce slab_alloc_context (also inspired by the page
allocator's alloc_context) for passing a number of existing arguments
and the new alloc_flags:

/* Structure holding extra parameters for slab allocations */
struct slab_alloc_context {
	unsigned long caller_addr;
	unsigned long orig_size;
	unsigned int alloc_flags;
	struct list_lru *lru;
};

This also replaces the existing struct partial_context.
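As a rough sketch of the intended calling convention (not the actual
mm/slub.c code), a fast path that misses can build the context on its
stack and hand a single pointer down the slow path. The helpers
sketch_slowpath() and sketch_fastpath_miss() are hypothetical, and
struct list_lru is left opaque because the sketch never dereferences it.

```c
#include <assert.h>
#include <stddef.h>

struct list_lru; /* opaque in this sketch */

/* Same layout as the struct shown above. */
struct slab_alloc_context {
	unsigned long caller_addr;
	unsigned long orig_size;
	unsigned int alloc_flags;
	struct list_lru *lru;
};

/*
 * Hypothetical slow-path helper: one pointer replaces four separate
 * parameters, so adding a field later does not change this signature.
 */
static unsigned long sketch_slowpath(int node, const struct slab_alloc_context *ac)
{
	(void)node;
	/* Return a value derived from the context so the call is observable. */
	return ac->orig_size;
}

/* Hypothetical fast-path miss: build the context only when needed. */
static unsigned long sketch_fastpath_miss(unsigned long addr, size_t size, int node)
{
	struct slab_alloc_context ac = {
		.caller_addr = addr,
		.orig_size = size,
		.alloc_flags = 0,
		.lru = NULL,
	};

	return sketch_slowpath(node, &ac);
}
```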

The last necessary piece is kmalloc_flags() which can take the
alloc_flags in addition to gfp flags and is intended for the recursive
allocations of sheaves and obj_ext arrays, so that both
SLAB_ALLOC_TRYLOCK and SLAB_ALLOC_NO_RECURSE can be communicated.
Internally, it decides between kmalloc_nolock() and normal kmalloc()
depending on SLAB_ALLOC_TRYLOCK.
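The dispatch described above can be sketched roughly as follows. This is
a hedged illustration, not the real kmalloc_flags(): sketch_kmalloc_nolock()
and sketch_kmalloc() are hypothetical stand-ins backed by malloc(), and
forwarding of other bits (such as SLAB_ALLOC_NO_RECURSE) and of the gfp
flags to the chosen path is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SLAB_ALLOC_DEFAULT 0x00u
#define SLAB_ALLOC_TRYLOCK 0x01u

/* Records which hypothetical path served the last request. */
enum sketch_path { SKETCH_PATH_NOLOCK, SKETCH_PATH_NORMAL };
static enum sketch_path last_path;

/* Hypothetical stand-in for kmalloc_nolock(). */
static void *sketch_kmalloc_nolock(size_t size, unsigned int gfp)
{
	(void)gfp;
	last_path = SKETCH_PATH_NOLOCK;
	return malloc(size);
}

/* Hypothetical stand-in for kmalloc(). */
static void *sketch_kmalloc(size_t size, unsigned int gfp)
{
	(void)gfp;
	last_path = SKETCH_PATH_NORMAL;
	return malloc(size);
}

/* The trylock bit in alloc_flags, not the gfp flags, selects the path. */
static void *sketch_kmalloc_flags(size_t size, unsigned int gfp,
				  unsigned int alloc_flags)
{
	if (alloc_flags & SLAB_ALLOC_TRYLOCK)
		return sketch_kmalloc_nolock(size, gfp);
	return sketch_kmalloc(size, gfp);
}
```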

The rest of the series gradually expands the usage of both alloc_flags
and slab_alloc_context as necessary, with bits of refactoring. Finally,
__GFP_NO_OBJ_EXT is removed completely.

Note that some uses of gfpflags_allow_spinning() relying on the absence
of __GFP_RECLAIM remain outside of slab (and the page allocator) in memcg,
page_owner and stackdepot code. These can thus yield false-positive
decisions that spinning is not allowed, but should not result in
important allocations failing anymore.

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
Vlastimil Babka (SUSE) (15):
mm/slab: always zero only requested size on alloc
mm/slab: stop inlining __slab_alloc_node()
mm/slab: introduce slab_alloc_context
mm/slab: introduce alloc_flags and SLAB_ALLOC_TRYLOCK
mm/slab: add alloc_flags to slab_alloc_context
mm/slab: replace struct partial_context with slab_alloc_context
mm/slab: pass alloc_flags to new slab allocation
mm/slab: pass alloc_flags through slab_post_alloc_hook() chain
mm/slab: replace slab_alloc_node() parameters with slab_alloc_context
mm/slab: allow kmem_cache_alloc_bulk() with any gfp flags
mm/slab: pass slab_alloc_context to __do_kmalloc_node()
mm/slab: introduce kmalloc_flags()
mm/slab: remove __GFP_NO_OBJ_EXT usage from alloc_slab_obj_exts()
mm/slab: replace __GFP_NO_OBJ_EXT with SLAB_ALLOC_NO_RECURSE for sheaves
mm: remove the __GFP_NO_OBJ_EXT flag

include/linux/gfp_types.h | 7 -
include/linux/slab.h | 14 +-
include/trace/events/mmflags.h | 10 +-
lib/alloc_tag.c | 2 +-
mm/kfence/core.c | 6 +-
mm/memcontrol.c | 5 +-
mm/slab.h | 16 +-
mm/slub.c | 423 ++++++++++++++++++++++++----------------
tools/include/linux/gfp_types.h | 7 -
9 files changed, 288 insertions(+), 202 deletions(-)
---
base-commit: 500b2c9755301742bdbb61249511ac11a4665dae
change-id: 20260601-slab_alloc_flags-25c782b0c57c

Best regards,
--
Vlastimil Babka (SUSE) <vba...@kernel.org>

Vlastimil Babka (SUSE)

Jun 9, 2026, 5:18:00 AM
When zeroing on alloc is requested (by __GFP_ZERO or the init_on_alloc
parameter), we have been trying to zero the whole kmalloc bucket size
and not just requested size, if possible.

This probably comes from the past where ksize() could be used to
discover the bucket size and use it opportunistically beyond the
requested size. This is now forbidden, and enabling debugging such as
KASAN or slab's red zoning would catch such misuse. Therefore, nobody
can be relying on __GFP_ZERO zeroing beyond the requested size.

Theoretically, it might still improve hardening against unintended
accesses beyond the requested size that could expose sensitive data from
a previous allocation. But then, init_on_free is probably also used for
hardening and would have cleared that.

So the usefulness of zeroing beyond the requested size is practically
none nowadays. The disadvantages of doing it are:

- Interaction with KFENCE, which performs the zeroing on its own because
it has its own redzone beyond the requested size. As a consequence,
slab_post_alloc_hook() has an 'init' parameter which has to be
evaluated in all callers (via slab_want_init_on_alloc()).

For kfence allocations in slab_alloc_node() this evaluation is subtly
skipped over in order to do the right thing. Other callers (i.e.
kmem_cache_alloc_bulk_noprof()) evaluate it unconditionally, even if
they do end up with a kfence allocation. This happens not to be a
problem, but only subtly: those are not kmalloc allocations and use
s->object_size as the requested size, so the zeroing doesn't interfere
with kfence's redzone. There's just an unnecessary double zeroing (in
both kfence and slab_post_alloc_hook()), but it's all very fragile and
contradicts the comment in kfence_guarded_alloc().

- Interaction with slab's redzoning where we have to limit the zeroing
to requested size.

We can make the code much simpler by always zeroing only up to the
requested size. Move the slab_want_init_on_alloc() call into
slab_post_alloc_hook(), removing the 'init' parameter. Remove the red
zone handling.
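A simplified user-space sketch of the resulting behavior: the init
decision is made inside the hook and only orig_size bytes are cleared,
whatever the bucket size. sketch_want_init() is a hypothetical,
heavily reduced stand-in for slab_want_init_on_alloc() (the real one
also considers cache properties), and sketch_init_on_alloc stands in for
the init_on_alloc boot parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the init_on_alloc boot parameter. */
static bool sketch_init_on_alloc;

/* Reduced stand-in for slab_want_init_on_alloc(). */
static bool sketch_want_init(bool gfp_zero)
{
	return gfp_zero || sketch_init_on_alloc;
}

/* The hook decides on init itself and zeroes only the requested size. */
static void sketch_post_alloc_hook(void **p, size_t count, bool gfp_zero,
				   size_t orig_size)
{
	bool init = sketch_want_init(gfp_zero);

	for (size_t i = 0; i < count; i++)
		if (p[i] && init)
			memset(p[i], 0, orig_size);
}
```

For example, a 40-byte request served from a 64-byte object gets bytes
0..39 cleared, while bytes 40..63 are left as they were.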

For kfence's zeroing code, update the comment. We could remove the
zeroing completely, but due to possible interactions with KASAN, there
are configurations where neither slab nor KASAN would zero the object,
so simply keep doing it in kfence. At worst the zeroing happens twice,
but kfence allocations are rare by design, so the cost is negligible.

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/kfence/core.c | 6 +++---
mm/slub.c | 35 +++++++----------------------------
2 files changed, 10 insertions(+), 31 deletions(-)

diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 655dc5ce3240..c765ba0a3a67 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -499,9 +499,9 @@ static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t g
set_canary(meta);

/*
- * We check slab_want_init_on_alloc() ourselves, rather than letting
- * SL*B do the initialization, as otherwise we might overwrite KFENCE's
- * redzone.
+ * SLUB will generally init kfence objects, but due to possible
+ * interactions with KASAN, it might not happen, so do it ourselves.
+ * In the worst case the init just happens twice.
*/
if (unlikely(slab_want_init_on_alloc(gfp, cache)))
memzero_explicit(addr, size);
diff --git a/mm/slub.c b/mm/slub.c
index 63c1ef998dd3..f787dc422d1b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4565,26 +4565,14 @@ struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags)

static __fastpath_inline
bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
- gfp_t flags, size_t size, void **p, bool init,
+ gfp_t flags, size_t size, void **p,
unsigned int orig_size)
{
- unsigned int zero_size = s->object_size;
+ bool init = slab_want_init_on_alloc(flags, s);
bool kasan_init = init;
size_t i;
gfp_t init_flags = flags & gfp_allowed_mask;

- /*
- * For kmalloc object, the allocated memory size(object_size) is likely
- * larger than the requested size(orig_size). If redzone check is
- * enabled for the extra space, don't zero it, as it will be redzoned
- * soon. The redzone operation for this extra space could be seen as a
- * replacement of current poisoning under certain debug option, and
- * won't break other sanity checks.
- */
- if (kmem_cache_debug_flags(s, SLAB_STORE_USER | SLAB_RED_ZONE) &&
- (s->flags & SLAB_KMALLOC))
- zero_size = orig_size;
-
/*
* When slab_debug is enabled, avoid memory initialization integrated
* into KASAN and instead zero out the memory via the memset below with
@@ -4607,7 +4595,7 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
p[i] = kasan_slab_alloc(s, p[i], init_flags, kasan_init);
if (p[i] && init && (!kasan_init ||
!kasan_has_integrated_init()))
- memset(p[i], 0, zero_size);
+ memset(p[i], 0, orig_size);
if (gfpflags_allow_spinning(flags))
kmemleak_alloc_recursive(p[i], s->object_size, 1,
s->flags, init_flags);
@@ -4908,7 +4896,6 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
{
void *object;
- bool init = false;

s = slab_pre_alloc_hook(s, gfpflags);
if (unlikely(!s))
@@ -4924,16 +4911,13 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);

maybe_wipe_obj_freeptr(s, object);
- init = slab_want_init_on_alloc(gfpflags, s);

out:
/*
- * When init equals 'true', like for kzalloc() family, only
- * @orig_size bytes might be zeroed instead of s->object_size
* In case this fails due to memcg_slab_post_alloc_hook(),
* object is set to NULL
*/
- slab_post_alloc_hook(s, lru, gfpflags, 1, &object, init, orig_size);
+ slab_post_alloc_hook(s, lru, gfpflags, 1, &object, orig_size);

return object;
}
@@ -5228,7 +5212,6 @@ kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
struct slab_sheaf *sheaf)
{
void *ret = NULL;
- bool init;

if (sheaf->size == 0)
goto out;
@@ -5238,10 +5221,8 @@ kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
if (likely(!ret))
ret = sheaf->objects[--sheaf->size];

- init = slab_want_init_on_alloc(gfp, s);
-
/* add __GFP_NOFAIL to force successful memcg charging */
- slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, init, s->object_size);
+ slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, s->object_size);
out:
trace_kmem_cache_alloc(_RET_IP_, ret, s, gfp, NUMA_NO_NODE);

@@ -5421,8 +5402,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in

success:
maybe_wipe_obj_freeptr(s, ret);
- slab_post_alloc_hook(s, NULL, alloc_gfp, 1, &ret,
- slab_want_init_on_alloc(alloc_gfp, s), orig_size);
+ slab_post_alloc_hook(s, NULL, alloc_gfp, 1, &ret, orig_size);

ret = kasan_kmalloc(s, ret, orig_size, alloc_gfp);
return ret;
@@ -7337,8 +7317,7 @@ bool kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags,

out:
/* memcg and kmem_cache debug support and memory initialization */
- return likely(slab_post_alloc_hook(s, NULL, flags, size, p,
- slab_want_init_on_alloc(flags, s), s->object_size));
+ return likely(slab_post_alloc_hook(s, NULL, flags, size, p, s->object_size));
}
EXPORT_SYMBOL(kmem_cache_alloc_bulk_noprof);


--
2.54.0

Vlastimil Babka (SUSE)

Jun 9, 2026, 5:18:04 AM
With sheaves, __slab_alloc_node() is no longer part of the allocation
fastpath, so stop forcing it to be inlined. For the same reason, also
mark the call to it from slab_alloc_node() as unlikely().
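The idea of this patch can be sketched in user space as below. The
likely()/unlikely() macros mirror the kernel's definitions in
include/linux/compiler.h, which wrap GCC/Clang's __builtin_expect(); the
functions are hypothetical stand-ins for slab_alloc_node() and
__slab_alloc_node(), with noinline keeping the rare slow path out of the
inlined fast path.

```c
#include <assert.h>
#include <stddef.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

static int slowpath_calls;

/* Rarely reached, so keep it out of line instead of bloating callers. */
static __attribute__((noinline)) void *sketch_slowpath(void)
{
	static int backing;

	slowpath_calls++;
	return &backing;
}

/* `cached` stands in for the result of alloc_from_pcs(). */
static inline void *sketch_alloc(void *cached)
{
	void *object = cached;

	if (unlikely(!object))
		object = sketch_slowpath();
	return object;
}
```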

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index f787dc422d1b..af85f338db4f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4519,8 +4519,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
return object;
}

-static __always_inline void *__slab_alloc_node(struct kmem_cache *s,
- gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
+static void *__slab_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node,
+ unsigned long addr, size_t orig_size)
{
void *object;

@@ -4907,7 +4907,7 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list

object = alloc_from_pcs(s, gfpflags, node);

- if (!object)
+ if (unlikely(!object))
object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);

maybe_wipe_obj_freeptr(s, object);

--
2.54.0

Vlastimil Babka (SUSE)

Jun 9, 2026, 5:18:08 AM
Similarly to the page allocator's struct alloc_context, introduce a
helper struct to hold some of the allocation arguments. This will allow
reducing the number of parameters in many functions of the
implementation, and extending them easily if needed.

For now, make it hold the caller address and the originally requested
allocation size.

Convert alloc_single_from_new_slab(), __slab_alloc_node() and
___slab_alloc(). No functional change intended.

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 46 +++++++++++++++++++++++++++++++++-------------
1 file changed, 33 insertions(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index af85f338db4f..06fc1656080f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -213,6 +213,12 @@ DEFINE_STATIC_KEY_FALSE(slub_debug_enabled);
static DEFINE_STATIC_KEY_FALSE(strict_numa);
#endif

+/* Structure holding extra parameters for slab allocations */
+struct slab_alloc_context {
+ unsigned long caller_addr;
+ unsigned long orig_size;
+};
+
/* Structure holding parameters for get_from_partial() call chain */
struct partial_context {
gfp_t flags;
@@ -3687,7 +3693,8 @@ static inline void init_slab_obj_iter(struct kmem_cache *s, struct slab *slab,
* and put the slab to the partial (or full) list.
*/
static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
- int orig_size, bool allow_spin)
+ struct slab_alloc_context *ac,
+ bool allow_spin)
{
struct kmem_cache_node *n;
struct slab_obj_iter iter;
@@ -3705,7 +3712,7 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
/* alloc_debug_processing() always expects a valid freepointer */
set_freepointer(s, object, slab->freelist);

- if (!alloc_debug_processing(s, slab, object, orig_size)) {
+ if (!alloc_debug_processing(s, slab, object, ac->orig_size)) {
/*
* It's not really expected that this would fail on a
* freshly allocated slab, but a concurrent memory
@@ -4443,7 +4450,7 @@ static unsigned int alloc_from_new_slab(struct kmem_cache *s, struct slab *slab,
* slab.
*/
static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
- unsigned long addr, unsigned int orig_size)
+ struct slab_alloc_context *ac)
{
bool allow_spin = gfpflags_allow_spinning(gfpflags);
void *object;
@@ -4476,7 +4483,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
pc.flags = GFP_NOWAIT | __GFP_THISNODE;
}

- pc.orig_size = orig_size;
+ pc.orig_size = ac->orig_size;
object = get_from_partial(s, node, &pc);
if (object)
goto success;
@@ -4496,7 +4503,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
stat(s, ALLOC_SLAB);

if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
- object = alloc_single_from_new_slab(s, slab, orig_size, allow_spin);
+ object = alloc_single_from_new_slab(s, slab, ac, allow_spin);

if (likely(object))
goto success;
@@ -4514,13 +4521,13 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,

success:
if (kmem_cache_debug_flags(s, SLAB_STORE_USER))
- set_track(s, object, TRACK_ALLOC, addr, gfpflags);
+ set_track(s, object, TRACK_ALLOC, ac->caller_addr, gfpflags);

return object;
}

static void *__slab_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node,
- unsigned long addr, size_t orig_size)
+ struct slab_alloc_context *ac)
{
void *object;

@@ -4545,7 +4552,7 @@ static void *__slab_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node,
}
#endif

- object = ___slab_alloc(s, gfpflags, node, addr, orig_size);
+ object = ___slab_alloc(s, gfpflags, node, ac);

return object;
}
@@ -4907,8 +4914,13 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list

object = alloc_from_pcs(s, gfpflags, node);

- if (unlikely(!object))
- object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);
+ if (unlikely(!object)) {
+ struct slab_alloc_context ac = {
+ .caller_addr = addr,
+ .orig_size = orig_size,
+ };
+ object = __slab_alloc_node(s, gfpflags, node, &ac);
+ }

maybe_wipe_obj_freeptr(s, object);

@@ -5373,13 +5385,18 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
if (ret)
goto success;

+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = orig_size,
+ };
+
/*
* Do not call slab_alloc_node(), since trylock mode isn't
* compatible with slab_pre_alloc_hook/should_failslab and
* kfence_alloc. Hence call __slab_alloc_node() (at most twice)
* and slab_post_alloc_hook() directly.
*/
- ret = __slab_alloc_node(s, alloc_gfp, node, _RET_IP_, orig_size);
+ ret = __slab_alloc_node(s, alloc_gfp, node, &ac);

/*
* It's possible we failed due to trylock as we preempted someone with
@@ -7221,10 +7238,13 @@ static bool __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
int i;

if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = s->object_size,
+ };
for (i = 0; i < size; i++) {

- p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE, _RET_IP_,
- s->object_size);
+ p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE, &ac);
if (unlikely(!p[i]))
goto error;


--
2.54.0

Vlastimil Babka (SUSE)

Jun 9, 2026, 5:18:12 AM
Similarly to the page allocator, introduce slab-allocator-specific
alloc flags that internally control allocation behavior in addition to
the gfp flags, without occupying the limited gfp flags space.

Introduce the first flag, SLAB_ALLOC_TRYLOCK, which behaves similarly to
the page allocator's ALLOC_TRYLOCK and will be used to reimplement
kmalloc_nolock()'s "!allow_spin" behavior. That currently relies on
gfpflags_allow_spinning() and thus the lack of both __GFP_RECLAIM flags,
importantly __GFP_KSWAPD_RECLAIM. This can give false-positive results
e.g. in early boot with a restricted gfp_allowed_mask.

Also introduce alloc_flags_allow_spinning() to replace the usage of
gfpflags_allow_spinning().

Start using alloc_flags and the new check first in alloc_from_pcs() and
__pcs_replace_empty_main(). This means some slab allocations that were
falsely treated as kmalloc_nolock() due to their gfp flags will now have
higher chances of succeeding, and these chances will increase further
with followup changes.

Remove a WARN_ON_ONCE() from refill_objects() as it's now legitimate to
reach it from a slab allocation that's not _nolock() and yet lacks
__GFP_KSWAPD_RECLAIM for other reasons.

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slab.h | 9 +++++++++
mm/slub.c | 17 ++++++++---------
2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 1bf9c3021ae3..3e75182ee144 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -16,6 +16,15 @@
* Internal slab definitions
*/

+/* slab's alloc_flags definitions */
+#define SLAB_ALLOC_DEFAULT 0x00
+#define SLAB_ALLOC_TRYLOCK 0x01
+
+static inline bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
+{
+ return !(alloc_flags & SLAB_ALLOC_TRYLOCK);
+}
+
#ifdef CONFIG_64BIT
# ifdef system_has_cmpxchg128
# define system_has_freelist_aba() system_has_cmpxchg128()
diff --git a/mm/slub.c b/mm/slub.c
index 06fc1656080f..278d8cbcc7ee 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4622,7 +4622,8 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
* unlocked.
*/
static struct slub_percpu_sheaves *
-__pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs, gfp_t gfp)
+__pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
+ gfp_t gfp, unsigned int alloc_flags)
{
struct slab_sheaf *empty = NULL;
struct slab_sheaf *full;
@@ -4648,7 +4649,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
return NULL;
}

- allow_spin = gfpflags_allow_spinning(gfp);
+ allow_spin = alloc_flags_allow_spinning(alloc_flags);

full = barn_replace_empty_sheaf(barn, pcs->main, allow_spin);

@@ -4734,7 +4735,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
}

static __fastpath_inline
-void *alloc_from_pcs(struct kmem_cache *s, gfp_t gfp, int node)
+void *alloc_from_pcs(struct kmem_cache *s, gfp_t gfp, unsigned int alloc_flags, int node)
{
struct slub_percpu_sheaves *pcs;
bool node_requested;
@@ -4779,7 +4780,7 @@ void *alloc_from_pcs(struct kmem_cache *s, gfp_t gfp, int node)
pcs = this_cpu_ptr(s->cpu_sheaves);

if (unlikely(pcs->main->size == 0)) {
- pcs = __pcs_replace_empty_main(s, pcs, gfp);
+ pcs = __pcs_replace_empty_main(s, pcs, gfp, alloc_flags);
if (unlikely(!pcs))
return NULL;
}
@@ -4912,7 +4913,7 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
if (unlikely(object))
goto out;

- object = alloc_from_pcs(s, gfpflags, node);
+ object = alloc_from_pcs(s, gfpflags, SLAB_ALLOC_DEFAULT, node);

if (unlikely(!object)) {
struct slab_alloc_context ac = {
@@ -5343,6 +5344,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
{
gfp_t alloc_gfp = __GFP_NOWARN | __GFP_NOMEMALLOC | gfp_flags;
size_t orig_size = size;
+ unsigned int alloc_flags = SLAB_ALLOC_TRYLOCK;
struct kmem_cache *s;
bool can_retry = true;
void *ret;
@@ -5381,7 +5383,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
*/
return NULL;

- ret = alloc_from_pcs(s, alloc_gfp, node);
+ ret = alloc_from_pcs(s, alloc_gfp, alloc_flags, node);
if (ret)
goto success;

@@ -7200,9 +7202,6 @@ refill_objects(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int min,
unsigned int refilled;
struct slab *slab;

- if (WARN_ON_ONCE(!gfpflags_allow_spinning(gfp)))
- return 0;
-
refilled = __refill_objects_node(s, p, gfp, min, max,
get_node(s, local_node),
/* allow_spin = */ true);

--
2.54.0

Vlastimil Babka (SUSE)

Jun 9, 2026, 5:18:16 AM
Add alloc_flags as a new field to the slab_alloc_context helper struct,
so we can pass it to more functions in the slab implementation without
adding another function parameter.

Start checking them via alloc_flags_allow_spinning() in
alloc_single_from_new_slab() (where we can drop the allow_spin
parameter) and ___slab_alloc(). This further reduces false-positive
spinning-not-allowed from allocations that are not kmalloc_nolock() but
lack __GFP_RECLAIM flags.

_kmalloc_nolock_noprof() initializes ac.alloc_flags from its local
alloc_flags variable, which is SLAB_ALLOC_TRYLOCK. slab_alloc_node() and
__kmem_cache_alloc_bulk() are not reachable from kmalloc_nolock() and all
their callers expect spinning to be allowed, so they can use
SLAB_ALLOC_DEFAULT. This is temporary, as the scope of slab_alloc_context
will be moved further to the callers, making the alloc_flags usage more
obvious.
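A side observation, ours rather than the patch's: because
SLAB_ALLOC_DEFAULT is 0 and C zero-initializes members omitted from a
designated initializer, a context that never mentions alloc_flags already
means "spinning allowed". Setting it explicitly, as the patch does,
documents intent at each call site. The sketch_ctx_*() helpers are
hypothetical.

```c
#include <assert.h>

#define SLAB_ALLOC_DEFAULT 0x00u
#define SLAB_ALLOC_TRYLOCK 0x01u

struct slab_alloc_context {
	unsigned long caller_addr;
	unsigned long orig_size;
	unsigned int alloc_flags;
};

/* alloc_flags omitted: zero-initialized, i.e. SLAB_ALLOC_DEFAULT. */
static struct slab_alloc_context sketch_ctx_implicit(unsigned long addr,
						     unsigned long size)
{
	struct slab_alloc_context ac = {
		.caller_addr = addr,
		.orig_size = size,
	};
	return ac;
}

/* The kmalloc_nolock()-style caller must opt in to trylock explicitly. */
static struct slab_alloc_context sketch_ctx_nolock(unsigned long addr,
						   unsigned long size)
{
	struct slab_alloc_context ac = {
		.caller_addr = addr,
		.orig_size = size,
		.alloc_flags = SLAB_ALLOC_TRYLOCK,
	};
	return ac;
}
```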

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 278d8cbcc7ee..b2a452dd70fa 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -217,6 +217,7 @@ static DEFINE_STATIC_KEY_FALSE(strict_numa);
struct slab_alloc_context {
unsigned long caller_addr;
unsigned long orig_size;
+ unsigned int alloc_flags;
};

/* Structure holding parameters for get_from_partial() call chain */
@@ -3693,9 +3694,9 @@ static inline void init_slab_obj_iter(struct kmem_cache *s, struct slab *slab,
* and put the slab to the partial (or full) list.
*/
static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
- struct slab_alloc_context *ac,
- bool allow_spin)
+ struct slab_alloc_context *ac)
{
+ bool allow_spin = alloc_flags_allow_spinning(ac->alloc_flags);
struct kmem_cache_node *n;
struct slab_obj_iter iter;
bool needs_add_partial;
@@ -4452,7 +4453,7 @@ static unsigned int alloc_from_new_slab(struct kmem_cache *s, struct slab *slab,
static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
struct slab_alloc_context *ac)
{
- bool allow_spin = gfpflags_allow_spinning(gfpflags);
+ bool allow_spin = alloc_flags_allow_spinning(ac->alloc_flags);
void *object;
struct slab *slab;
struct partial_context pc;
@@ -4503,7 +4504,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
stat(s, ALLOC_SLAB);

if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
- object = alloc_single_from_new_slab(s, slab, ac, allow_spin);
+ object = alloc_single_from_new_slab(s, slab, ac);

if (likely(object))
goto success;
@@ -4903,6 +4904,7 @@ unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, gfp_t gfp, size_t size,
static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
{
+ const unsigned int alloc_flags = SLAB_ALLOC_DEFAULT;
void *object;

s = slab_pre_alloc_hook(s, gfpflags);
@@ -4913,12 +4915,13 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
if (unlikely(object))
goto out;

- object = alloc_from_pcs(s, gfpflags, SLAB_ALLOC_DEFAULT, node);
+ object = alloc_from_pcs(s, gfpflags, alloc_flags, node);

if (unlikely(!object)) {
struct slab_alloc_context ac = {
.caller_addr = addr,
.orig_size = orig_size,
+ .alloc_flags = alloc_flags,
};
object = __slab_alloc_node(s, gfpflags, node, &ac);
}
@@ -5390,6 +5393,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
struct slab_alloc_context ac = {
.caller_addr = _RET_IP_,
.orig_size = orig_size,
+ .alloc_flags = alloc_flags,
};

/*
@@ -7240,6 +7244,7 @@ static bool __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
struct slab_alloc_context ac = {
.caller_addr = _RET_IP_,
.orig_size = s->object_size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
};
for (i = 0; i < size; i++) {


--
2.54.0

Vlastimil Babka (SUSE)

Jun 9, 2026, 5:18:19 AM
Refactor get_from_partial_node(), get_from_any_partial(),
get_from_partial() and ___slab_alloc().

Remove struct partial_context, which used to be more substantial but
shrank as part of the sheaves conversion. Instead, pass gfp_flags and a
pointer to the new slab_alloc_context, which together form a superset
of partial_context.

This means alloc_flags are now available and we can use them to
determine if spinning is allowed, further reducing false-positive
"spinning not allowed" decisions in the slow path due to gfp flags
lacking __GFP_RECLAIM.
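The locking pattern that get_from_partial_node() now keys off
alloc_flags (spin_lock_irqsave() when spinning is allowed,
spin_trylock_irqsave() otherwise) can be approximated in user space as
below. This is only an analogue: a pthread mutex blocks rather than
spins, and sketch_lock_list() is a hypothetical helper.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define SLAB_ALLOC_DEFAULT 0x00u
#define SLAB_ALLOC_TRYLOCK 0x01u

static inline bool alloc_flags_allow_spinning(unsigned int alloc_flags)
{
	return !(alloc_flags & SLAB_ALLOC_TRYLOCK);
}

/*
 * Take the lock unconditionally when allowed, otherwise try once and
 * report failure so the caller can bail out (e.g. return NULL).
 */
static bool sketch_lock_list(pthread_mutex_t *lock, unsigned int alloc_flags)
{
	if (alloc_flags_allow_spinning(alloc_flags)) {
		pthread_mutex_lock(lock);
		return true;
	}
	return pthread_mutex_trylock(lock) == 0;
}
```

With a default (non-recursive) mutex already held, the trylock variant
fails immediately instead of waiting, which mirrors the "cannot spin"
bail-out in the slab slow path.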

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 52 ++++++++++++++++++++++++----------------------------
1 file changed, 24 insertions(+), 28 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index b2a452dd70fa..0bde4f6d9126 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -220,12 +220,6 @@ struct slab_alloc_context {
unsigned int alloc_flags;
};

-/* Structure holding parameters for get_from_partial() call chain */
-struct partial_context {
- gfp_t flags;
- unsigned int orig_size;
-};
-
/* Structure holding parameters for get_partial_node_bulk() */
struct partial_bulk_context {
gfp_t flags;
@@ -3826,7 +3820,8 @@ static bool get_partial_node_bulk(struct kmem_cache *s,
*/
static void *get_from_partial_node(struct kmem_cache *s,
struct kmem_cache_node *n,
- struct partial_context *pc)
+ gfp_t gfp_flags,
+ struct slab_alloc_context *ac)
{
struct slab *slab, *slab2;
unsigned long flags;
@@ -3841,7 +3836,7 @@ static void *get_from_partial_node(struct kmem_cache *s,
if (!n || !n->nr_partial)
return NULL;

- if (gfpflags_allow_spinning(pc->flags))
+ if (alloc_flags_allow_spinning(ac->alloc_flags))
spin_lock_irqsave(&n->list_lock, flags);
else if (!spin_trylock_irqsave(&n->list_lock, flags))
return NULL;
@@ -3849,12 +3844,12 @@ static void *get_from_partial_node(struct kmem_cache *s,

struct freelist_counters old, new;

- if (!pfmemalloc_match(slab, pc->flags))
+ if (!pfmemalloc_match(slab, gfp_flags))
continue;

if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
object = alloc_single_from_partial(s, n, slab,
- pc->orig_size);
+ ac->orig_size);
if (object)
break;
continue;
@@ -3888,15 +3883,16 @@ static void *get_from_partial_node(struct kmem_cache *s,
/*
* Get an object from somewhere. Search in increasing NUMA distances.
*/
-static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *pc)
+static void *get_from_any_partial(struct kmem_cache *s, gfp_t gfp_flags,
+ struct slab_alloc_context *ac)
{
#ifdef CONFIG_NUMA
struct zonelist *zonelist;
struct zoneref *z;
struct zone *zone;
- enum zone_type highest_zoneidx = gfp_zone(pc->flags);
+ enum zone_type highest_zoneidx = gfp_zone(gfp_flags);
unsigned int cpuset_mems_cookie;
- bool allow_spin = gfpflags_allow_spinning(pc->flags);
+ bool allow_spin = alloc_flags_allow_spinning(ac->alloc_flags);

/*
* The defrag ratio allows a configuration of the tradeoffs between
@@ -3930,16 +3926,17 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
if (allow_spin)
cpuset_mems_cookie = read_mems_allowed_begin();

- zonelist = node_zonelist(mempolicy_slab_node(), pc->flags);
+ zonelist = node_zonelist(mempolicy_slab_node(), gfp_flags);
for_each_zone_zonelist(zone, z, zonelist, highest_zoneidx) {
struct kmem_cache_node *n;

n = get_node(s, zone_to_nid(zone));

- if (n && cpuset_zone_allowed(zone, pc->flags) &&
+ if (n && cpuset_zone_allowed(zone, gfp_flags) &&
n->nr_partial > s->min_partial) {

- void *object = get_from_partial_node(s, n, pc);
+ void *object = get_from_partial_node(s, n,
+ gfp_flags, ac);

if (object) {
/*
@@ -3961,8 +3958,8 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
/*
* Get an object from a partial slab
*/
-static void *get_from_partial(struct kmem_cache *s, int node,
- struct partial_context *pc)
+static void *get_from_partial(struct kmem_cache *s, int node, gfp_t flags,
+ struct slab_alloc_context *ac)
{
int searchnode = node;
void *object;
@@ -3970,11 +3967,11 @@ static void *get_from_partial(struct kmem_cache *s, int node,
if (node == NUMA_NO_NODE)
searchnode = numa_mem_id();

- object = get_from_partial_node(s, get_node(s, searchnode), pc);
- if (object || (node != NUMA_NO_NODE && (pc->flags & __GFP_THISNODE)))
+ object = get_from_partial_node(s, get_node(s, searchnode), flags, ac);
+ if (object || (node != NUMA_NO_NODE && (flags & __GFP_THISNODE)))
return object;

- return get_from_any_partial(s, pc);
+ return get_from_any_partial(s, flags, ac);
}

static bool has_pcs_used(int cpu, struct kmem_cache *s)
@@ -4454,16 +4451,16 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
struct slab_alloc_context *ac)
{
bool allow_spin = alloc_flags_allow_spinning(ac->alloc_flags);
+ gfp_t trynode_flags;
void *object;
struct slab *slab;
- struct partial_context pc;
bool try_thisnode = true;

stat(s, ALLOC_SLOWPATH);

new_objects:

- pc.flags = gfpflags;
+ trynode_flags = gfpflags;
/*
* When a preferred node is indicated but no __GFP_THISNODE
*
@@ -4479,17 +4476,16 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
&& try_thisnode)) {
if (unlikely(!allow_spin))
/* Do not upgrade gfp to NOWAIT from more restrictive mode */
- pc.flags = gfpflags | __GFP_THISNODE;
+ trynode_flags = gfpflags | __GFP_THISNODE;
else
- pc.flags = GFP_NOWAIT | __GFP_THISNODE;
+ trynode_flags = GFP_NOWAIT | __GFP_THISNODE;
}

- pc.orig_size = ac->orig_size;
- object = get_from_partial(s, node, &pc);
+ object = get_from_partial(s, node, trynode_flags, ac);
if (object)
goto success;

- slab = new_slab(s, pc.flags, node);
+ slab = new_slab(s, trynode_flags, node);

if (unlikely(!slab)) {
if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE)

--
2.54.0

Vlastimil Babka (SUSE)

unread,
Jun 9, 2026, 5:18:24 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
Add the alloc_flags parameter to allocate_slab() and new_slab()
so it can be used to determine whether spinning is allowed, independently
of the gfp flags.
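
A minimal userspace sketch (not kernel code) of why the gfp-derived check
gives false positives while an alloc_flags-derived one does not. The TOY_*
bit values are hypothetical, toy_gfpflags_allow_spinning() is only modelled
on gfpflags_allow_spinning(), and the body of the alloc_flags check is an
assumption based on SLAB_ALLOC_TRYLOCK meaning "do not spin on locks".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only (not the kernel's). */
#define TOY_GFP_DIRECT_RECLAIM	0x01u
#define TOY_GFP_KSWAPD_RECLAIM	0x02u
#define TOY_GFP_IO		0x04u
#define TOY_GFP_RECLAIM		(TOY_GFP_DIRECT_RECLAIM | TOY_GFP_KSWAPD_RECLAIM)
#define TOY_GFP_KERNEL		(TOY_GFP_RECLAIM | TOY_GFP_IO)

/* Values as defined in mm/slab.h by this series. */
#define SLAB_ALLOC_DEFAULT	0x00u
#define SLAB_ALLOC_TRYLOCK	0x01u

/* Old heuristic: "no reclaim bits" is interpreted as "cannot spin". */
static bool toy_gfpflags_allow_spinning(unsigned int gfp)
{
	return (gfp & TOY_GFP_RECLAIM) != 0;
}

/* Assumed body: only an explicit TRYLOCK request forbids spinning. */
static bool toy_alloc_flags_allow_spinning(unsigned int alloc_flags)
{
	return !(alloc_flags & SLAB_ALLOC_TRYLOCK);
}

/*
 * Early boot: a restricted allowed mask strips the reclaim bits from every
 * request, including requests from callers that are able to spin.
 */
static unsigned int toy_boot_gfp(unsigned int gfp)
{
	return gfp & ~TOY_GFP_RECLAIM;
}
```

With toy_boot_gfp(TOY_GFP_KERNEL) the gfp-based check reports "cannot
spin" although the caller could, whereas the alloc_flags-based check
depends only on what the caller explicitly passed.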

refill_objects() passes SLAB_ALLOC_DEFAULT because it can only be
reached from contexts that allow spinning.

Also change how trynode_flags are constructed in ___slab_alloc() to
achieve the same "do not upgrade to GFP_NOWAIT" behavior by masking
instead of branching. As a result, gfp is now also not upgraded when it
is weaker than GFP_NOWAIT (i.e. lacks __GFP_KSWAPD_RECLAIM) but does not
come from kmalloc_nolock(), which is more correct anyway.
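
A minimal userspace sketch contrasting the removed branch with the new
masking, using hypothetical TOY_* bit values. It assumes GFP_NOWAIT is
__GFP_KSWAPD_RECLAIM | __GFP_NOWARN, as in recent kernels; the point is
only that AND-ing with GFP_NOWAIT can drop reclaim bits but never add them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
#define TOY_GFP_DIRECT_RECLAIM	0x01u
#define TOY_GFP_KSWAPD_RECLAIM	0x02u
#define TOY_GFP_NOWARN		0x04u
#define TOY_GFP_THISNODE	0x08u
#define TOY_GFP_RECLAIM		(TOY_GFP_DIRECT_RECLAIM | TOY_GFP_KSWAPD_RECLAIM)
#define TOY_GFP_KERNEL		TOY_GFP_RECLAIM
/* Assumption: NOWAIT = kswapd reclaim + nowarn. */
#define TOY_GFP_NOWAIT		(TOY_GFP_KSWAPD_RECLAIM | TOY_GFP_NOWARN)

/* Removed approach: branch on whether the caller may spin. */
static unsigned int toy_trynode_branch(unsigned int gfp, bool allow_spin)
{
	if (!allow_spin)
		return gfp | TOY_GFP_THISNODE;
	/* Replaces gfp entirely, so a weak gfp gets upgraded here. */
	return TOY_GFP_NOWAIT | TOY_GFP_THISNODE;
}

/* New approach: keep at most the GFP_NOWAIT bits the caller already had. */
static unsigned int toy_trynode_mask(unsigned int gfp)
{
	gfp &= TOY_GFP_NOWAIT;
	gfp |= TOY_GFP_NOWARN | TOY_GFP_THISNODE;
	return gfp;
}
```

For a GFP_KERNEL-like request both variants yield NOWAIT | THISNODE. For a
request with no reclaim bits from a caller that may spin, the branch adds
__GFP_KSWAPD_RECLAIM while the mask does not.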

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 0bde4f6d9126..20df6b131f63 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3378,9 +3378,10 @@ static __always_inline void unaccount_slab(struct slab *slab, int order,
}

/* Allocate and initialize a slab without building its freelist. */
-static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
+static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags,
+ unsigned int alloc_flags, int node)
{
- bool allow_spin = gfpflags_allow_spinning(flags);
+ bool allow_spin = alloc_flags_allow_spinning(alloc_flags);
struct slab *slab;
struct kmem_cache_order_objects oo = s->oo;
gfp_t alloc_gfp;
@@ -3438,15 +3439,17 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
return slab;
}

-static struct slab *new_slab(struct kmem_cache *s, gfp_t flags, int node)
+static struct slab *new_slab(struct kmem_cache *s, gfp_t flags,
+ unsigned int alloc_flags, int node)
{
if (unlikely(flags & GFP_SLAB_BUG_MASK))
flags = kmalloc_fix_flags(flags);

WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));

- return allocate_slab(s,
- flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
+ flags &= GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK;
+
+ return allocate_slab(s, flags, alloc_flags, node);
}

static void __free_slab(struct kmem_cache *s, struct slab *slab, bool allow_spin)
@@ -4467,25 +4470,22 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
* 1) try to get a partial slab from target node only by having
* __GFP_THISNODE in pc.flags for get_from_partial()
* 2) if 1) failed, try to allocate a new slab from target node with
- * GPF_NOWAIT | __GFP_THISNODE opportunistically
+ * (at most) GPF_NOWAIT | __GFP_THISNODE opportunistically
* 3) if 2) failed, retry with original gfpflags which will allow
* get_from_partial() try partial lists of other nodes before
* potentially allocating new page from other nodes
*/
if (unlikely(node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE)
&& try_thisnode)) {
- if (unlikely(!allow_spin))
- /* Do not upgrade gfp to NOWAIT from more restrictive mode */
- trynode_flags = gfpflags | __GFP_THISNODE;
- else
- trynode_flags = GFP_NOWAIT | __GFP_THISNODE;
+ trynode_flags &= GFP_NOWAIT;
+ trynode_flags |= __GFP_NOWARN | __GFP_THISNODE;
}

object = get_from_partial(s, node, trynode_flags, ac);
if (object)
goto success;

- slab = new_slab(s, trynode_flags, node);
+ slab = new_slab(s, trynode_flags, ac->alloc_flags, node);

if (unlikely(!slab)) {
if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE)
@@ -7215,7 +7215,7 @@ refill_objects(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int min,

new_slab:

- slab = new_slab(s, gfp, local_node);
+ slab = new_slab(s, gfp, SLAB_ALLOC_DEFAULT, local_node);
if (!slab)
goto out;

@@ -7563,7 +7563,7 @@ static void early_kmem_cache_node_alloc(int node)

BUG_ON(kmem_cache_node->size < sizeof(struct kmem_cache_node));

- slab = new_slab(kmem_cache_node, GFP_NOWAIT, node);
+ slab = new_slab(kmem_cache_node, GFP_NOWAIT, SLAB_ALLOC_DEFAULT, node);

BUG_ON(!slab);
if (slab_nid(slab) != node) {

--
2.54.0

Vlastimil Babka (SUSE)

unread,
Jun 9, 2026, 5:18:27 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
Convert the whole call stack below to pass either slab_alloc_context
(thus including alloc_flags) or just alloc_flags, as necessary:

slab_post_alloc_hook()
  alloc_tagging_slab_alloc_hook()
    __alloc_tagging_slab_alloc_hook()
      prepare_slab_obj_exts_hook()
        alloc_slab_obj_exts()
  memcg_slab_post_alloc_hook()
    __memcg_slab_post_alloc_hook()
      alloc_slab_obj_exts()

Converting all these at once avoids unnecessary churn and is mostly
mechanical.

This ultimately allows deciding whether spinning is allowed based on
alloc_flags in alloc_slab_obj_exts(), as well as in slab_post_alloc_hook().
Aside from alloc_from_pcs_bulk() (to be handled next), nothing else in
slab itself relies on gfpflags_allow_spinning(), which can return false
even when not called from kmalloc_nolock().

A followup change will also use the alloc_flags availability in the call
stack above to remove the __GFP_NO_OBJ_EXT flag.

For alloc_slab_obj_exts(), also replace the suboptimal "bool new_slab"
parameter with a SLAB_ALLOC_NEW_SLAB flag with identical functionality.
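
A minimal userspace sketch of folding the boolean into the flags word.
toy_obj_exts_publish() is a hypothetical stand-in for the decision
alloc_slab_obj_exts() makes: a brand-new slab needs no synchronization to
install obj_exts, while an existing one does (the exact primitive is not
modelled). toy_account_slab_flags() mirrors how account_slab() ORs
SLAB_ALLOC_NEW_SLAB into the caller's flags, so TRYLOCK is preserved.

```c
#include <assert.h>

/* Values as defined in mm/slab.h by this series. */
#define SLAB_ALLOC_DEFAULT	0x00u
#define SLAB_ALLOC_TRYLOCK	0x01u
#define SLAB_ALLOC_NEW_SLAB	0x02u

enum toy_publish {
	TOY_PLAIN_STORE,	/* nobody else can see the slab yet */
	TOY_SYNCHRONIZED,	/* others may race to install the vector */
};

/* Hypothetical: formerly "if (new_slab)", now a flag test. */
static enum toy_publish toy_obj_exts_publish(unsigned int alloc_flags)
{
	if (alloc_flags & SLAB_ALLOC_NEW_SLAB)
		return TOY_PLAIN_STORE;
	return TOY_SYNCHRONIZED;
}

/* Mirrors account_slab(): add NEW_SLAB, keep whatever the caller passed. */
static unsigned int toy_account_slab_flags(unsigned int alloc_flags)
{
	return alloc_flags | SLAB_ALLOC_NEW_SLAB;
}
```

Unlike a separate bool, the combined word lets one parameter carry both
"brand-new slab" and "must not spin" down to alloc_slab_obj_exts().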

To further reduce the number of parameters of slab_post_alloc_hook(),
also make 'struct list_lru *lru' (which is NULL for most callers) a new
field of slab_alloc_context.

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/memcontrol.c | 5 +--
mm/slab.h | 6 ++--
mm/slub.c | 94 +++++++++++++++++++++++++++++++++------------------------
3 files changed, 62 insertions(+), 43 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c03d4787d466..29390ba13baa 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3424,7 +3424,8 @@ static inline size_t obj_full_size(struct kmem_cache *s)
}

bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
- gfp_t flags, size_t size, void **p)
+ gfp_t flags, unsigned int slab_alloc_flags,
+ size_t size, void **p)
{
size_t obj_size = obj_full_size(s);
struct obj_cgroup *objcg;
@@ -3472,7 +3473,7 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
slab = virt_to_slab(p[i]);

if (!slab_obj_exts(slab) &&
- alloc_slab_obj_exts(slab, s, flags, false)) {
+ alloc_slab_obj_exts(slab, s, flags, slab_alloc_flags)) {
continue;
}

diff --git a/mm/slab.h b/mm/slab.h
index 3e75182ee144..13517abcad21 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -19,6 +19,7 @@
/* slab's alloc_flags definitions */
#define SLAB_ALLOC_DEFAULT 0x00
#define SLAB_ALLOC_TRYLOCK 0x01
+#define SLAB_ALLOC_NEW_SLAB 0x02 /* a flag for alloc_slab_obj_exts() */

static inline bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
{
@@ -612,7 +613,7 @@ static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
}

int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
- gfp_t gfp, bool new_slab);
+ gfp_t gfp, unsigned int alloc_flags);

#else /* CONFIG_SLAB_OBJ_EXT */

@@ -642,7 +643,8 @@ static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)

#ifdef CONFIG_MEMCG
bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
- gfp_t flags, size_t size, void **p);
+ gfp_t flags, unsigned int slab_alloc_flags,
+ size_t size, void **p);
void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
void **p, int objects, unsigned long obj_exts);
#endif
diff --git a/mm/slub.c b/mm/slub.c
index 20df6b131f63..034f2cd1c1fd 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -218,6 +218,7 @@ struct slab_alloc_context {
unsigned long caller_addr;
unsigned long orig_size;
unsigned int alloc_flags;
+ struct list_lru *lru;
};

/* Structure holding parameters for get_partial_node_bulk() */
@@ -2155,9 +2156,9 @@ static inline size_t obj_exts_alloc_size(struct kmem_cache *s,
}

int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
- gfp_t gfp, bool new_slab)
+ gfp_t gfp, unsigned int alloc_flags)
{
- bool allow_spin = gfpflags_allow_spinning(gfp);
+ const bool allow_spin = alloc_flags_allow_spinning(alloc_flags);
unsigned int objects = objs_per_slab(s, slab);
unsigned long new_exts;
unsigned long old_exts;
@@ -2206,7 +2207,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
old_exts = READ_ONCE(slab->obj_exts);
handle_failed_objexts_alloc(old_exts, vec, objects);

- if (new_slab) {
+ if (alloc_flags & SLAB_ALLOC_NEW_SLAB) {
/*
* If the slab is brand new and nobody can yet access its
* obj_exts, no synchronization is required and obj_exts can
@@ -2331,7 +2332,7 @@ static inline void init_slab_obj_exts(struct slab *slab)
}

static int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
- gfp_t gfp, bool new_slab)
+ gfp_t gfp, unsigned int alloc_flags)
{
return 0;
}
@@ -2351,10 +2352,10 @@ static inline void alloc_slab_obj_exts_early(struct kmem_cache *s,

static inline unsigned long
prepare_slab_obj_exts_hook(struct kmem_cache *s, struct slab *slab,
- gfp_t flags, void *p)
+ gfp_t flags, unsigned int alloc_flags, void *p)
{
if (!slab_obj_exts(slab) &&
- alloc_slab_obj_exts(slab, s, flags, false)) {
+ alloc_slab_obj_exts(slab, s, flags, alloc_flags)) {
pr_warn_once("%s, %s: Failed to create slab extension vector!\n",
__func__, s->name);
return 0;
@@ -2366,7 +2367,8 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, struct slab *slab,

/* Should be called only if mem_alloc_profiling_enabled() */
static noinline void
-__alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
+__alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags,
+ unsigned int alloc_flags)
{
unsigned long obj_exts;
struct slabobj_ext *obj_ext;
@@ -2382,7 +2384,7 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
return;

slab = virt_to_slab(object);
- obj_exts = prepare_slab_obj_exts_hook(s, slab, flags, object);
+ obj_exts = prepare_slab_obj_exts_hook(s, slab, flags, alloc_flags, object);
/*
* Currently obj_exts is used only for allocation profiling.
* If other users appear then mem_alloc_profiling_enabled()
@@ -2401,10 +2403,11 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
}

static inline void
-alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
+alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags,
+ unsigned int alloc_flags)
{
if (mem_alloc_profiling_enabled())
- __alloc_tagging_slab_alloc_hook(s, object, flags);
+ __alloc_tagging_slab_alloc_hook(s, object, flags, alloc_flags);
}

/* Should be called only if mem_alloc_profiling_enabled() */
@@ -2443,7 +2446,8 @@ alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
#else /* CONFIG_MEM_ALLOC_PROFILING */

static inline void
-alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
+alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags,
+ unsigned int alloc_flags)
{
}

@@ -2461,8 +2465,9 @@ alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
static void memcg_alloc_abort_single(struct kmem_cache *s, void *object);

static __fastpath_inline
-bool memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
- gfp_t flags, size_t size, void **p)
+bool memcg_slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags,
+ size_t size, void **p,
+ struct slab_alloc_context *ac)
{
if (likely(!memcg_kmem_online()))
return true;
@@ -2470,7 +2475,8 @@ bool memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
if (likely(!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT)))
return true;

- if (likely(__memcg_slab_post_alloc_hook(s, lru, flags, size, p)))
+ if (likely(__memcg_slab_post_alloc_hook(s, ac->lru, flags,
+ ac->alloc_flags, size, p)))
return true;

if (likely(size == 1)) {
@@ -2558,14 +2564,15 @@ bool memcg_slab_post_charge(void *p, gfp_t flags)
put_slab_obj_exts(obj_exts);
}

- return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
+ return __memcg_slab_post_alloc_hook(s, NULL, flags, SLAB_ALLOC_DEFAULT,
+ 1, &p);
}

#else /* CONFIG_MEMCG */
static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s,
- struct list_lru *lru,
- gfp_t flags, size_t size,
- void **p)
+ gfp_t flags,
+ size_t size, void **p,
+ struct slab_alloc_context *ac)
{
return true;
}
@@ -3352,12 +3359,14 @@ static inline void init_freelist_randomization(void) { }
#endif /* CONFIG_SLAB_FREELIST_RANDOM */

static __always_inline void account_slab(struct slab *slab, int order,
- struct kmem_cache *s, gfp_t gfp)
+ struct kmem_cache *s, gfp_t gfp,
+ unsigned int alloc_flags)
{
if (memcg_kmem_online() &&
(s->flags & SLAB_ACCOUNT) &&
!slab_obj_exts(slab))
- alloc_slab_obj_exts(slab, s, gfp, true);
+ alloc_slab_obj_exts(slab, s, gfp,
+ alloc_flags | SLAB_ALLOC_NEW_SLAB);

mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
PAGE_SIZE << order);
@@ -3434,7 +3443,7 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags,
* to prevent the array from being overwritten.
*/
alloc_slab_obj_exts_early(s, slab);
- account_slab(slab, oo_order(oo), s, flags);
+ account_slab(slab, oo_order(oo), s, flags, alloc_flags);

return slab;
}
@@ -4568,9 +4577,8 @@ struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags)
}

static __fastpath_inline
-bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
- gfp_t flags, size_t size, void **p,
- unsigned int orig_size)
+bool slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags, size_t size,
+ void **p, struct slab_alloc_context *ac)
{
bool init = slab_want_init_on_alloc(flags, s);
bool kasan_init = init;
@@ -4599,15 +4607,15 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
p[i] = kasan_slab_alloc(s, p[i], init_flags, kasan_init);
if (p[i] && init && (!kasan_init ||
!kasan_has_integrated_init()))
- memset(p[i], 0, orig_size);
- if (gfpflags_allow_spinning(flags))
+ memset(p[i], 0, ac->orig_size);
+ if (alloc_flags_allow_spinning(ac->alloc_flags))
kmemleak_alloc_recursive(p[i], s->object_size, 1,
s->flags, init_flags);
kmsan_slab_alloc(s, p[i], init_flags);
- alloc_tagging_slab_alloc_hook(s, p[i], flags);
+ alloc_tagging_slab_alloc_hook(s, p[i], flags, ac->alloc_flags);
}

- return memcg_slab_post_alloc_hook(s, lru, flags, size, p);
+ return memcg_slab_post_alloc_hook(s, flags, size, p, ac);
}

/*
@@ -4902,6 +4910,12 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
{
const unsigned int alloc_flags = SLAB_ALLOC_DEFAULT;
void *object;
+ struct slab_alloc_context ac = {
+ .caller_addr = addr,
+ .orig_size = orig_size,
+ .alloc_flags = alloc_flags,
+ .lru = lru,
+ };

s = slab_pre_alloc_hook(s, gfpflags);
if (unlikely(!s))
@@ -4913,14 +4927,8 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list

object = alloc_from_pcs(s, gfpflags, alloc_flags, node);

- if (unlikely(!object)) {
- struct slab_alloc_context ac = {
- .caller_addr = addr,
- .orig_size = orig_size,
- .alloc_flags = alloc_flags,
- };
+ if (!object)
object = __slab_alloc_node(s, gfpflags, node, &ac);
- }

maybe_wipe_obj_freeptr(s, object);

@@ -4929,7 +4937,7 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
* In case this fails due to memcg_slab_post_alloc_hook(),
* object is set to NULL
*/
- slab_post_alloc_hook(s, lru, gfpflags, 1, &object, orig_size);
+ slab_post_alloc_hook(s, gfpflags, 1, &object, &ac);

return object;
}
@@ -5224,6 +5232,10 @@ kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
struct slab_sheaf *sheaf)
{
void *ret = NULL;
+ struct slab_alloc_context ac = {
+ .orig_size = s->object_size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };

if (sheaf->size == 0)
goto out;
@@ -5234,7 +5246,7 @@ kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
ret = sheaf->objects[--sheaf->size];

/* add __GFP_NOFAIL to force successful memcg charging */
- slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, s->object_size);
+ slab_post_alloc_hook(s, gfp | __GFP_NOFAIL, 1, &ret, &ac);
out:
trace_kmem_cache_alloc(_RET_IP_, ret, s, gfp, NUMA_NO_NODE);

@@ -5421,7 +5433,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in

success:
maybe_wipe_obj_freeptr(s, ret);
- slab_post_alloc_hook(s, NULL, alloc_gfp, 1, &ret, orig_size);
+ slab_post_alloc_hook(s, alloc_gfp, 1, &ret, &ac);

ret = kasan_kmalloc(s, ret, orig_size, alloc_gfp);
return ret;
@@ -7287,6 +7299,10 @@ bool kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags,
{
unsigned int i = 0;
void *kfence_obj;
+ struct slab_alloc_context ac = {
+ .orig_size = s->object_size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };

if (!size)
return false;
@@ -7337,7 +7353,7 @@ bool kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags,

out:
/* memcg and kmem_cache debug support and memory initialization */
- return likely(slab_post_alloc_hook(s, NULL, flags, size, p, s->object_size));
+ return likely(slab_post_alloc_hook(s, flags, size, p, &ac));
}
EXPORT_SYMBOL(kmem_cache_alloc_bulk_noprof);


--
2.54.0

Vlastimil Babka (SUSE)

unread,
Jun 9, 2026, 5:18:31 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
slab_alloc_node() takes all the parameters that exist as fields in
slab_alloc_context, except alloc_flags. Replace them with a single
pointer to the context.

This moves slab_alloc_context initialization to a number of callers,
which is more verbose, but arguably also clearer than a long list of
parameters, and most callers do not need to set the 'lru' field.

This will also allow kmalloc_nolock() to call slab_alloc_node() and
reduce the special open-coding it currently has.
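
A minimal userspace sketch of the calling convention. The field list
follows struct slab_alloc_context as shown in this series; the toy_*
functions are hypothetical stand-ins for slab_alloc_node() and one of its
callers. Fields a caller does not name in the designated initializer, such
as .lru here, are zero-initialized, i.e. NULL.

```c
#include <assert.h>
#include <stddef.h>

#define SLAB_ALLOC_DEFAULT	0x00u

struct list_lru;	/* opaque for this sketch */

/* Same fields as struct slab_alloc_context in this series. */
struct toy_slab_alloc_context {
	unsigned long caller_addr;
	unsigned long orig_size;
	unsigned int alloc_flags;
	struct list_lru *lru;
};

/* Hypothetical callee: reads what it needs through a single pointer. */
static unsigned long toy_slab_alloc_node(const struct toy_slab_alloc_context *ac)
{
	/* e.g. the post-alloc hook zeroes ac->orig_size bytes */
	return ac->orig_size;
}

static unsigned long toy_kmem_cache_alloc(unsigned long object_size,
					  unsigned long ret_ip)
{
	/* .lru is not named, so it is initialized to NULL. */
	struct toy_slab_alloc_context ac = {
		.caller_addr = ret_ip,
		.orig_size = object_size,
		.alloc_flags = SLAB_ALLOC_DEFAULT,
	};

	assert(ac.lru == NULL);
	return toy_slab_alloc_node(&ac);
}
```

Only a caller like kmem_cache_alloc_lru_noprof() needs to add a `.lru =`
line; all other callers simply leave it out.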

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 75 ++++++++++++++++++++++++++++++++++++++++++++-------------------
1 file changed, 53 insertions(+), 22 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 034f2cd1c1fd..b511d768e9b6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4905,30 +4905,23 @@ unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, gfp_t gfp, size_t size,
*
* Otherwise we can simply pick the next object from the lockless free list.
*/
-static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru,
- gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
+static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s,
+ gfp_t gfpflags, int node, struct slab_alloc_context *ac)
{
- const unsigned int alloc_flags = SLAB_ALLOC_DEFAULT;
void *object;
- struct slab_alloc_context ac = {
- .caller_addr = addr,
- .orig_size = orig_size,
- .alloc_flags = alloc_flags,
- .lru = lru,
- };

s = slab_pre_alloc_hook(s, gfpflags);
if (unlikely(!s))
return NULL;

- object = kfence_alloc(s, orig_size, gfpflags);
+ object = kfence_alloc(s, ac->orig_size, gfpflags);
if (unlikely(object))
goto out;

- object = alloc_from_pcs(s, gfpflags, alloc_flags, node);
+ object = alloc_from_pcs(s, gfpflags, ac->alloc_flags, node);

if (!object)
- object = __slab_alloc_node(s, gfpflags, node, &ac);
+ object = __slab_alloc_node(s, gfpflags, node, ac);

maybe_wipe_obj_freeptr(s, object);

@@ -4937,15 +4930,21 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
* In case this fails due to memcg_slab_post_alloc_hook(),
* object is set to NULL
*/
- slab_post_alloc_hook(s, gfpflags, 1, &object, &ac);
+ slab_post_alloc_hook(s, gfpflags, 1, &object, ac);

return object;
}

void *kmem_cache_alloc_noprof(struct kmem_cache *s, gfp_t gfpflags)
{
- void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE, _RET_IP_,
- s->object_size);
+ void *ret;
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = s->object_size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };
+
+ ret = slab_alloc_node(s, gfpflags, NUMA_NO_NODE, &ac);

trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);

@@ -4956,8 +4955,15 @@ EXPORT_SYMBOL(kmem_cache_alloc_noprof);
void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags)
{
- void *ret = slab_alloc_node(s, lru, gfpflags, NUMA_NO_NODE, _RET_IP_,
- s->object_size);
+ void *ret;
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = s->object_size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ .lru = lru,
+ };
+
+ ret = slab_alloc_node(s, gfpflags, NUMA_NO_NODE, &ac);

trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);

@@ -4989,7 +4995,14 @@ EXPORT_SYMBOL(kmem_cache_charge);
*/
void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t gfpflags, int node)
{
- void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
+ void *ret;
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = s->object_size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };
+
+ ret = slab_alloc_node(s, gfpflags, node, &ac);

trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, node);

@@ -5319,6 +5332,11 @@ void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
{
struct kmem_cache *s;
void *ret;
+ struct slab_alloc_context ac = {
+ .caller_addr = caller,
+ .orig_size = size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };

if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
ret = __kmalloc_large_node_noprof(size, flags, node);
@@ -5332,7 +5350,7 @@ void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,

s = kmalloc_slab(size, b, flags, token);

- ret = slab_alloc_node(s, NULL, flags, node, caller, size);
+ ret = slab_alloc_node(s, flags, node, &ac);
ret = kasan_kmalloc(s, ret, size, flags);
trace_kmalloc(caller, ret, size, s->size, flags, node);
return ret;
@@ -5451,8 +5469,14 @@ EXPORT_SYMBOL(__kmalloc_node_track_caller_noprof);

void *__kmalloc_cache_noprof(struct kmem_cache *s, gfp_t gfpflags, size_t size)
{
- void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE,
- _RET_IP_, size);
+ void *ret;
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };
+
+ ret = slab_alloc_node(s, gfpflags, NUMA_NO_NODE, &ac);

trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);

@@ -5464,7 +5488,14 @@ EXPORT_SYMBOL(__kmalloc_cache_noprof);
void *__kmalloc_cache_node_noprof(struct kmem_cache *s, gfp_t gfpflags,
int node, size_t size)
{
- void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
+ void *ret;
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };
+
+ ret = slab_alloc_node(s, gfpflags, node, &ac);

trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);


--
2.54.0

Vlastimil Babka (SUSE)

unread,
Jun 9, 2026, 5:18:35 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
The last user of gfpflags_allow_spinning() in slab is
alloc_from_pcs_bulk(), which is only called from
kmem_cache_alloc_bulk().

It turns out that gfpflags_allow_spinning() is not necessary, because
kmem_cache_alloc_bulk() is only expected to be called from contexts that
allow spinning, so simply replace it with 'true'.

With that, we can remove the "@flags must allow spinning" part of the
kernel-doc, as the slab implementation no longer derives the ability to
spin from the gfp flags.

Also remove a comment in alloc_slab_obj_exts(), because false positives
due to gfp_allowed_mask during early boot should no longer be possible.

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index b511d768e9b6..dee69e0b7780 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2171,12 +2171,6 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,

sz = obj_exts_alloc_size(s, slab, gfp);

- /*
- * Note that allow_spin may be false during early boot and its
- * restricted GFP_BOOT_MASK. Due to kmalloc_nolock() only supporting
- * architectures with cmpxchg16b, early obj_exts will be missing for
- * very early allocations on those.
- */
if (unlikely(!allow_spin))
vec = kmalloc_nolock(sz, __GFP_ZERO | __GFP_NO_OBJ_EXT,
slab_nid(slab));
@@ -4851,7 +4845,7 @@ unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, gfp_t gfp, size_t size,
}

full = barn_replace_empty_sheaf(barn, pcs->main,
- gfpflags_allow_spinning(gfp));
+ /* allow_spin = */ true);

if (full) {
stat(s, BARN_GET);
@@ -7317,8 +7311,7 @@ static bool __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
* Allocate @size objects from @s and places them into @p. @size must be larger
* than 0.
*
- * Interrupts must be enabled when calling this function and @flags must allow
- * spinning.
+ * Interrupts must be enabled when calling this function.
*
* Unlike alloc_pages_bulk(), this function does not check for already allocated
* objects in @p, and thus the caller does not need to zero it.

--
2.54.0

Vlastimil Babka (SUSE)

unread,
Jun 9, 2026, 5:18:39 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
With alloc_flags usage in slab, we can replace __GFP_NO_OBJ_EXT with an
alloc flag that prevents kmalloc recursion. For that we need a version
of kmalloc() that takes alloc_flags, and to use it in the places that
perform these potentially recursive kmalloc allocations (of sheaves or
obj_ext arrays).

As a preparatory step, make __do_kmalloc_node() take a pointer to
slab_alloc_context. This replaces the 'caller' parameter and includes
alloc_flags which we'll make use of.

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 47 ++++++++++++++++++++++++++++++++---------------
1 file changed, 32 insertions(+), 15 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index dee69e0b7780..c11edd58b52d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5322,19 +5322,14 @@ EXPORT_SYMBOL(__kmalloc_large_node_noprof);

static __always_inline
void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
- unsigned long caller, kmalloc_token_t token)
+ kmalloc_token_t token, struct slab_alloc_context *ac)
{
struct kmem_cache *s;
void *ret;
- struct slab_alloc_context ac = {
- .caller_addr = caller,
- .orig_size = size,
- .alloc_flags = SLAB_ALLOC_DEFAULT,
- };

if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
ret = __kmalloc_large_node_noprof(size, flags, node);
- trace_kmalloc(caller, ret, size,
+ trace_kmalloc(ac->caller_addr, ret, size,
PAGE_SIZE << get_order(size), flags, node);
return ret;
}
@@ -5344,22 +5339,34 @@ void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,

s = kmalloc_slab(size, b, flags, token);

- ret = slab_alloc_node(s, flags, node, &ac);
+ ret = slab_alloc_node(s, flags, node, ac);
ret = kasan_kmalloc(s, ret, size, flags);
- trace_kmalloc(caller, ret, size, s->size, flags, node);
+ trace_kmalloc(ac->caller_addr, ret, size, s->size, flags, node);
return ret;
}
void *__kmalloc_node_noprof(DECL_KMALLOC_PARAMS(size, b, token), gfp_t flags, int node)
{
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };
+
return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node,
- _RET_IP_, PASS_TOKEN_PARAM(token));
+ PASS_TOKEN_PARAM(token), &ac);
}
EXPORT_SYMBOL(__kmalloc_node_noprof);

void *__kmalloc_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t flags)
{
- return __do_kmalloc_node(size, NULL, flags, NUMA_NO_NODE, _RET_IP_,
- PASS_TOKEN_PARAM(token));
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };
+
+ return __do_kmalloc_node(size, NULL, flags, NUMA_NO_NODE,
+ PASS_TOKEN_PARAM(token), &ac);
}
EXPORT_SYMBOL(__kmalloc_noprof);

@@ -5455,9 +5462,14 @@ EXPORT_SYMBOL_GPL(_kmalloc_nolock_noprof);
void *__kmalloc_node_track_caller_noprof(DECL_KMALLOC_PARAMS(size, b, token), gfp_t flags,
int node, unsigned long caller)
{
- return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node,
- caller, PASS_TOKEN_PARAM(token));
+ struct slab_alloc_context ac = {
+ .caller_addr = caller,
+ .orig_size = size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };

+ return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node,
+ PASS_TOKEN_PARAM(token), &ac);
}
EXPORT_SYMBOL(__kmalloc_node_track_caller_noprof);

@@ -6858,6 +6870,11 @@ void *__kvmalloc_node_noprof(DECL_KMALLOC_PARAMS(size, b, token), unsigned long
{
bool allow_block;
void *ret;
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };

/*
* It doesn't really make sense to fallback to vmalloc for sub page
@@ -6865,7 +6882,7 @@ void *__kvmalloc_node_noprof(DECL_KMALLOC_PARAMS(size, b, token), unsigned long
*/
ret = __do_kmalloc_node(size, PASS_BUCKET_PARAM(b),
kmalloc_gfp_adjust(flags, size),
- node, _RET_IP_, PASS_TOKEN_PARAM(token));
+ node, PASS_TOKEN_PARAM(token), &ac);
if (ret || size <= PAGE_SIZE)
return ret;


--
2.54.0

Vlastimil Babka (SUSE)

Jun 9, 2026, 5:18:43 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
With alloc_flags usage in slab, we can replace __GFP_NO_OBJ_EXT with an
alloc flag that prevents kmalloc recursion. For that we need a version
of kmalloc() that takes alloc_flags and use it in places that perform
these potentially recursive kmalloc allocations (of sheaves or obj_ext
arrays).

Add this function, named kmalloc_flags(). Right now it's only useful for
these nested allocations, so it doesn't need to optimize build-time
constant sizes like kmalloc() or kmalloc_buckets.

Since we need it to support both normal and non-spinning
kmalloc_nolock() context through the SLAB_ALLOC_TRYLOCK flag, split out
most of the special _kmalloc_nolock_noprof() implementation to
__kmalloc_nolock_noprof() that takes a slab_alloc_context, and make
_kmalloc_nolock_noprof() a simple tail calling wrapper with the proper
context.

kmalloc_flags() can thus determine whether to call
__kmalloc_nolock_noprof() or __do_kmalloc_node(), based on the
given alloc_flags.

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
include/linux/slab.h | 12 +++++++++++
mm/slub.c | 56 ++++++++++++++++++++++++++++++++++++++++------------
2 files changed, 55 insertions(+), 13 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index ce1c867dc0ba..11e82fdbe8d3 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -944,6 +944,10 @@ void *__kmalloc_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t flags)
void *__kmalloc_node_noprof(DECL_KMALLOC_PARAMS(size, b, token), gfp_t flags, int node)
__assume_kmalloc_alignment __alloc_size(1);

+void *__kmalloc_flags_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t flags,
+ unsigned int alloc_flags, int node)
+ __assume_kmalloc_alignment __alloc_size(1);
+
void *__kmalloc_cache_noprof(struct kmem_cache *s, gfp_t flags, size_t size)
__assume_kmalloc_alignment __alloc_size(3);

@@ -1176,6 +1180,14 @@ static __always_inline __alloc_size(1) void *_kmalloc_node_noprof(size_t size, g
#define kmalloc_node_noprof(...) _kmalloc_node_noprof(__VA_ARGS__, __kmalloc_token(__VA_ARGS__))
#define kmalloc_node(...) alloc_hooks(kmalloc_node_noprof(__VA_ARGS__))

+static __always_inline __alloc_size(1) void *_kmalloc_flags_noprof(size_t size,
+ gfp_t flags, unsigned int alloc_flags, int node, kmalloc_token_t token)
+{
+ return __kmalloc_flags_noprof(PASS_TOKEN_PARAMS(size, token), flags, alloc_flags, node);
+}
+#define kmalloc_flags_noprof(...) _kmalloc_flags_noprof(__VA_ARGS__, __kmalloc_token(__VA_ARGS__))
+#define kmalloc_flags(...) alloc_hooks(kmalloc_flags_noprof(__VA_ARGS__))
+
static inline __alloc_size(1, 2) void *_kmalloc_array_noprof(size_t n, size_t size, gfp_t flags, kmalloc_token_t token)
{
size_t bytes;
diff --git a/mm/slub.c b/mm/slub.c
index c11edd58b52d..86691eb14002 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5370,15 +5370,15 @@ void *__kmalloc_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t flags)
}
EXPORT_SYMBOL(__kmalloc_noprof);

-void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, int node)
+static void *__kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags,
+ int node, struct slab_alloc_context *ac)
{
gfp_t alloc_gfp = __GFP_NOWARN | __GFP_NOMEMALLOC | gfp_flags;
- size_t orig_size = size;
- unsigned int alloc_flags = SLAB_ALLOC_TRYLOCK;
struct kmem_cache *s;
bool can_retry = true;
void *ret;

+ VM_WARN_ON_ONCE(alloc_flags_allow_spinning(ac->alloc_flags));
VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO |
__GFP_NO_OBJ_EXT));

@@ -5413,23 +5413,17 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
*/
return NULL;

- ret = alloc_from_pcs(s, alloc_gfp, alloc_flags, node);
+ ret = alloc_from_pcs(s, alloc_gfp, ac->alloc_flags, node);
if (ret)
goto success;

- struct slab_alloc_context ac = {
- .caller_addr = _RET_IP_,
- .orig_size = orig_size,
- .alloc_flags = alloc_flags,
- };
-
/*
* Do not call slab_alloc_node(), since trylock mode isn't
* compatible with slab_pre_alloc_hook/should_failslab and
* kfence_alloc. Hence call __slab_alloc_node() (at most twice)
* and slab_post_alloc_hook() directly.
*/
- ret = __slab_alloc_node(s, alloc_gfp, node, &ac);
+ ret = __slab_alloc_node(s, alloc_gfp, node, ac);

/*
* It's possible we failed due to trylock as we preempted someone with
@@ -5452,11 +5446,23 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in

success:
maybe_wipe_obj_freeptr(s, ret);
- slab_post_alloc_hook(s, alloc_gfp, 1, &ret, &ac);
+ slab_post_alloc_hook(s, alloc_gfp, 1, &ret, ac);

- ret = kasan_kmalloc(s, ret, orig_size, alloc_gfp);
+ ret = kasan_kmalloc(s, ret, ac->orig_size, alloc_gfp);
return ret;
}
+
+void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, int node)
+{
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = size,
+ .alloc_flags = SLAB_ALLOC_TRYLOCK,
+ };
+
+ return __kmalloc_nolock_noprof(PASS_TOKEN_PARAMS(size, token),
+ gfp_flags, node, &ac);
+}
EXPORT_SYMBOL_GPL(_kmalloc_nolock_noprof);

void *__kmalloc_node_track_caller_noprof(DECL_KMALLOC_PARAMS(size, b, token), gfp_t flags,
@@ -5510,6 +5516,30 @@ void *__kmalloc_cache_node_noprof(struct kmem_cache *s, gfp_t gfpflags,
}
EXPORT_SYMBOL(__kmalloc_cache_node_noprof);

+/*
+ * The only version of kmalloc_node() that takes alloc_flags and thus can
+ * determine on its own whether to handle the allocation via kmalloc_nolock() or
+ * normally
+ */
+void *__kmalloc_flags_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t flags,
+ unsigned int alloc_flags, int node)
+{
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = size,
+ .alloc_flags = alloc_flags,
+ };
+
+ if (alloc_flags_allow_spinning(alloc_flags)) {
+ return __do_kmalloc_node(size, NULL, flags, node,
+ PASS_TOKEN_PARAM(token), &ac);
+ } else {
+ return __kmalloc_nolock_noprof(PASS_TOKEN_PARAMS(size, token),
+ flags, node, &ac);
+ }
+}
+
+
static noinline void free_to_partial_list(
struct kmem_cache *s, struct slab *slab,
void *head, void *tail, int bulk_cnt,

--
2.54.0
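The dispatch described in the kmalloc_flags() patch above can be sketched in userspace as follows. One slab_alloc_context is built from the caller's alloc_flags, and those flags alone choose between the normal path and the kmalloc_nolock() path. All `*_model` functions are hypothetical stand-ins for __do_kmalloc_node() and __kmalloc_nolock_noprof(). The body of alloc_flags_allow_spinning() is not shown in this thread, so having it test only SLAB_ALLOC_TRYLOCK is an assumption. SLAB_ALLOC_NO_RECURSE is introduced by the next patch in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as defined in mm/slab.h in this series. */
#define SLAB_ALLOC_DEFAULT    0x00u
#define SLAB_ALLOC_TRYLOCK    0x01u
#define SLAB_ALLOC_NO_RECURSE 0x04u

struct slab_alloc_context {
	unsigned long caller_addr;
	size_t orig_size;
	unsigned int alloc_flags;
};

/* Assumption: modeled as a plain SLAB_ALLOC_TRYLOCK test. */
static bool alloc_flags_allow_spinning(unsigned int alloc_flags)
{
	return !(alloc_flags & SLAB_ALLOC_TRYLOCK);
}

enum alloc_path { PATH_NORMAL, PATH_NOLOCK };

/* Flags seen by the last path taken, to check that nothing was dropped. */
static unsigned int last_seen_flags;

/* Hypothetical stand-in for __do_kmalloc_node(). */
static enum alloc_path do_kmalloc_model(struct slab_alloc_context *ac)
{
	last_seen_flags = ac->alloc_flags;
	return PATH_NORMAL;
}

/* Hypothetical stand-in for __kmalloc_nolock_noprof(). The assert plays
 * the role of VM_WARN_ON_ONCE(alloc_flags_allow_spinning(ac->alloc_flags)). */
static enum alloc_path kmalloc_nolock_model(struct slab_alloc_context *ac)
{
	assert(!alloc_flags_allow_spinning(ac->alloc_flags));
	last_seen_flags = ac->alloc_flags;
	return PATH_NOLOCK;
}

/* Hypothetical model of __kmalloc_flags_noprof(). */
static enum alloc_path kmalloc_flags_model(size_t size, unsigned int alloc_flags)
{
	struct slab_alloc_context ac = {
		.caller_addr = 0,	/* _RET_IP_ in the kernel */
		.orig_size = size,
		.alloc_flags = alloc_flags,
	};

	if (alloc_flags_allow_spinning(alloc_flags))
		return do_kmalloc_model(&ac);
	return kmalloc_nolock_model(&ac);
}
```

Both paths receive the caller's alloc_flags unchanged, which is what later allows SLAB_ALLOC_NO_RECURSE and SLAB_ALLOC_TRYLOCK to travel together into nested allocations.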

Vlastimil Babka (SUSE)

Jun 9, 2026, 5:18:49 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
__GFP_NO_OBJ_EXT has limited scope within the slab allocator itself and
gfp flags are a scarce resource, unlike slab's alloc_flags.

Introduce SLAB_ALLOC_NO_RECURSE alloc flag that has the same intent as
__GFP_NO_OBJ_EXT but a more generic name, meaning that a kmalloc()
family function should not recurse into another kmalloc*() for the
purposes of allocating auxiliary structures (obj_ext arrays or sheaves).

First, replace the __GFP_NO_OBJ_EXT for allocating obj_ext arrays in
alloc_slab_obj_exts(). Make use of the newly added kmalloc_flags()
function, where we can pass alloc_flags with SLAB_ALLOC_NO_RECURSE
added. This will also pass through SLAB_ALLOC_TRYLOCK so we don't need
to special case kmalloc_nolock() anymore.

Note that until now the kmalloc_nolock() call ignored the incoming gfp flags
and hardcoded __GFP_ZERO | __GFP_NO_OBJ_EXT. But it's correct to pass on
the incoming gfp flags (only augmented with __GFP_ZERO), because if
alloc_flags contain SLAB_ALLOC_TRYLOCK, the incoming gfp flags must also
be compatible with it.

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slab.h | 1 +
mm/slub.c | 13 +++++--------
2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 13517abcad21..e5bd800d831e 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -20,6 +20,7 @@
#define SLAB_ALLOC_DEFAULT 0x00
#define SLAB_ALLOC_TRYLOCK 0x01
#define SLAB_ALLOC_NEW_SLAB 0x02 /* a flag for alloc_slab_obj_exts() */
+#define SLAB_ALLOC_NO_RECURSE 0x04 /* prevent kmalloc() recursion */

static inline bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
{
diff --git a/mm/slub.c b/mm/slub.c
index 86691eb14002..8a655636dee6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2167,15 +2167,12 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,

gfp &= ~OBJCGS_CLEAR_MASK;
/* Prevent recursive extension vector allocation */
- gfp |= __GFP_NO_OBJ_EXT;
+ alloc_flags |= SLAB_ALLOC_NO_RECURSE;

sz = obj_exts_alloc_size(s, slab, gfp);

- if (unlikely(!allow_spin))
- vec = kmalloc_nolock(sz, __GFP_ZERO | __GFP_NO_OBJ_EXT,
- slab_nid(slab));
- else
- vec = kmalloc_node(sz, gfp | __GFP_ZERO, slab_nid(slab));
+ /* This will use kmalloc_nolock() if alloc_flags say so */
+ vec = kmalloc_flags(sz, gfp | __GFP_ZERO, alloc_flags, slab_nid(slab));

if (!vec) {
/*
@@ -2251,7 +2248,7 @@ static inline void free_slab_obj_exts(struct slab *slab, bool allow_spin)
}

/*
- * obj_exts was created with __GFP_NO_OBJ_EXT flag, therefore its
+ * obj_exts was created with SLAB_ALLOC_NO_RECURSE flag, therefore its
* corresponding extension will be NULL. alloc_tag_sub() will throw a
* warning if slab has extensions but the extension of an object is
* NULL, therefore replace NULL with CODETAG_EMPTY to indicate that
@@ -2374,7 +2371,7 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags,
if (s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE))
return;

- if (flags & __GFP_NO_OBJ_EXT)
+ if (alloc_flags & SLAB_ALLOC_NO_RECURSE)
return;

slab = virt_to_slab(object);

--
2.54.0
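The recursion that SLAB_ALLOC_NO_RECURSE guards against can be sketched as a toy model in which every allocation needs an obj_ext vector, and that vector is itself allocated with kmalloc(). The helper names are hypothetical and the model only loosely mirrors alloc_slab_obj_exts() and the early return in __alloc_tagging_slab_alloc_hook(); the flag values match mm/slab.h in this series. It shows two points from the commit message: the nested allocation stops after one level, and a SLAB_ALLOC_TRYLOCK caller's flag is carried into the nested kmalloc_flags() call instead of needing a kmalloc_nolock() special case.

```c
#define SLAB_ALLOC_DEFAULT    0x00u
#define SLAB_ALLOC_TRYLOCK    0x01u
#define SLAB_ALLOC_NO_RECURSE 0x04u

static int depth, max_depth;
/* alloc_flags seen by the innermost (obj_ext vector) allocation */
static unsigned int inner_flags;

static void kmalloc_model(unsigned int alloc_flags);

/* Hypothetical model of alloc_slab_obj_exts(): keep the caller's
 * alloc_flags (so a trylock caller stays non-spinning) and add
 * SLAB_ALLOC_NO_RECURSE. */
static void alloc_obj_exts_model(unsigned int alloc_flags)
{
	kmalloc_model(alloc_flags | SLAB_ALLOC_NO_RECURSE);
}

/* Hypothetical model of a kmalloc() whose object always needs an obj_ext
 * vector (worst case). Without the SLAB_ALLOC_NO_RECURSE check this would
 * recurse forever. */
static void kmalloc_model(unsigned int alloc_flags)
{
	depth++;
	if (depth > max_depth)
		max_depth = depth;

	if (alloc_flags & SLAB_ALLOC_NO_RECURSE)
		inner_flags = alloc_flags;	/* nested allocation: stop here */
	else
		alloc_obj_exts_model(alloc_flags);

	depth--;
}
```

Calling kmalloc_model(SLAB_ALLOC_TRYLOCK) reaches a depth of two, and the inner call sees SLAB_ALLOC_TRYLOCK | SLAB_ALLOC_NO_RECURSE.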

Vlastimil Babka (SUSE)

Jun 9, 2026, 5:18:51 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
Finish the switch away from __GFP_NO_OBJ_EXT by replacing it with
SLAB_ALLOC_NO_RECURSE when allocating empty sheaves. Pass alloc_flags to
[__]alloc_empty_sheaf(). Callers that can't be part of a recursive
kmalloc() chain simply pass SLAB_ALLOC_DEFAULT. Use kmalloc_flags()
instead of kzalloc() for allocating the sheaf.

This leaves __GFP_NO_OBJ_EXT with no users, to be removed next.

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8a655636dee6..26ec015efdba 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2756,7 +2756,7 @@ static inline void *setup_object(struct kmem_cache *s, void *object)
}

static struct slab_sheaf *__alloc_empty_sheaf(struct kmem_cache *s, gfp_t gfp,
- unsigned int capacity)
+ unsigned int alloc_flags, unsigned int capacity)
{
struct slab_sheaf *sheaf;
size_t sheaf_size;
@@ -2767,10 +2767,10 @@ static struct slab_sheaf *__alloc_empty_sheaf(struct kmem_cache *s, gfp_t gfp,
* bucket)
*/
if (s->flags & SLAB_KMALLOC)
- gfp |= __GFP_NO_OBJ_EXT;
+ alloc_flags |= SLAB_ALLOC_NO_RECURSE;

sheaf_size = struct_size(sheaf, objects, capacity);
- sheaf = kzalloc(sheaf_size, gfp);
+ sheaf = kmalloc_flags(sheaf_size, gfp | __GFP_ZERO, alloc_flags, NUMA_NO_NODE);

if (unlikely(!sheaf))
return NULL;
@@ -2783,20 +2783,20 @@ static struct slab_sheaf *__alloc_empty_sheaf(struct kmem_cache *s, gfp_t gfp,
}

static inline struct slab_sheaf *alloc_empty_sheaf(struct kmem_cache *s,
- gfp_t gfp)
+ gfp_t gfp, unsigned int alloc_flags)
{
- if (gfp & __GFP_NO_OBJ_EXT)
+ if (alloc_flags & SLAB_ALLOC_NO_RECURSE)
return NULL;

gfp &= ~OBJCGS_CLEAR_MASK;

- return __alloc_empty_sheaf(s, gfp, s->sheaf_capacity);
+ return __alloc_empty_sheaf(s, gfp, alloc_flags, s->sheaf_capacity);
}

static void free_empty_sheaf(struct kmem_cache *s, struct slab_sheaf *sheaf)
{
/*
- * If the sheaf was created with __GFP_NO_OBJ_EXT flag then its
+ * If the sheaf was created with SLAB_ALLOC_NO_RECURSE flag then its
* corresponding extension is NULL and alloc_tag_sub() will throw a
* warning, therefore replace NULL with CODETAG_EMPTY to indicate
* that the extension for this sheaf is expected to be NULL.
@@ -4673,7 +4673,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
return NULL;

if (!empty) {
- empty = alloc_empty_sheaf(s, gfp);
+ empty = alloc_empty_sheaf(s, gfp, alloc_flags);
if (!empty)
return NULL;
}
@@ -5047,7 +5047,7 @@ kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size)

if (unlikely(size > s->sheaf_capacity)) {

- sheaf = __alloc_empty_sheaf(s, gfp, size);
+ sheaf = __alloc_empty_sheaf(s, gfp, SLAB_ALLOC_DEFAULT, size);
if (!sheaf)
return NULL;

@@ -5092,7 +5092,7 @@ kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size)


if (!sheaf)
- sheaf = alloc_empty_sheaf(s, gfp);
+ sheaf = alloc_empty_sheaf(s, gfp, SLAB_ALLOC_DEFAULT);

if (sheaf) {
sheaf->capacity = s->sheaf_capacity;
@@ -5376,8 +5376,7 @@ static void *__kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_f
void *ret;

VM_WARN_ON_ONCE(alloc_flags_allow_spinning(ac->alloc_flags));
- VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO |
- __GFP_NO_OBJ_EXT));
+ VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO));

if (unlikely(!size))
return ZERO_SIZE_PTR;
@@ -5890,7 +5889,7 @@ __pcs_replace_full_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
if (!allow_spin)
return NULL;

- empty = alloc_empty_sheaf(s, GFP_NOWAIT);
+ empty = alloc_empty_sheaf(s, GFP_NOWAIT, SLAB_ALLOC_DEFAULT);
if (empty)
goto got_empty;

@@ -6074,7 +6073,7 @@ bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj)

local_unlock(&s->cpu_sheaves->lock);

- empty = alloc_empty_sheaf(s, GFP_NOWAIT);
+ empty = alloc_empty_sheaf(s, GFP_NOWAIT, SLAB_ALLOC_DEFAULT);

if (!empty)
goto fail;
@@ -7619,7 +7618,7 @@ static int init_percpu_sheaves(struct kmem_cache *s)
if (!s->sheaf_capacity)
pcs->main = &bootstrap_sheaf;
else
- pcs->main = alloc_empty_sheaf(s, GFP_KERNEL);
+ pcs->main = alloc_empty_sheaf(s, GFP_KERNEL, SLAB_ALLOC_DEFAULT);

if (!pcs->main)
return -ENOMEM;
@@ -8485,7 +8484,8 @@ static void __init bootstrap_cache_sheaves(struct kmem_cache *s)

pcs = per_cpu_ptr(s->cpu_sheaves, cpu);

- pcs->main = __alloc_empty_sheaf(s, GFP_KERNEL, capacity);
+ pcs->main = __alloc_empty_sheaf(s, GFP_KERNEL,
+ SLAB_ALLOC_DEFAULT, capacity);

if (!pcs->main) {
failed = true;

--
2.54.0
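The sheaf path after this patch can be sketched as a toy model, under the simplifying assumptions that every sheaf is kmalloc()ed from a single kmalloc cache and that an allocation always finds its per-CPU sheaf empty. The structs and functions are hypothetical stand-ins for struct kmem_cache, struct slab_sheaf, alloc_empty_sheaf() and __alloc_empty_sheaf(). The model illustrates why the chain terminates: a sheaf for a kmalloc cache is allocated with SLAB_ALLOC_NO_RECURSE, alloc_empty_sheaf() refuses such requests, and the nested allocation falls back to the slab path.

```c
#include <stdbool.h>
#include <stdlib.h>

#define SLAB_ALLOC_DEFAULT    0x00u
#define SLAB_ALLOC_NO_RECURSE 0x04u

/* Hypothetical stand-in for struct kmem_cache. */
struct cache_model {
	bool is_kmalloc;	/* models s->flags & SLAB_KMALLOC */
	unsigned int capacity;	/* models s->sheaf_capacity */
};

/* Hypothetical stand-in for struct slab_sheaf. */
struct sheaf_model {
	unsigned int capacity;
	unsigned int size;
	void *objects[];
};

/* Assumption: all sheaves are kmalloc()ed from this one cache. */
static struct cache_model kmalloc_cache = { .is_kmalloc = true, .capacity = 32 };
static unsigned int sheaf_allocs, sheaf_refusals;

static struct sheaf_model *alloc_empty_sheaf_model(struct cache_model *s,
						    unsigned int alloc_flags);

/* An allocation from s that finds its per-CPU sheaf empty: try to get an
 * empty sheaf first, then fall back to the slab path. */
static void *kmalloc_model(struct cache_model *s, size_t size,
			   unsigned int alloc_flags)
{
	struct sheaf_model *empty = alloc_empty_sheaf_model(s, alloc_flags);

	free(empty);		/* the model keeps no per-CPU state */
	return calloc(1, size);	/* the object itself (slab path) */
}

/* Models __alloc_empty_sheaf(): kmalloc caches add SLAB_ALLOC_NO_RECURSE. */
static struct sheaf_model *alloc_empty_sheaf_inner(struct cache_model *s,
						    unsigned int alloc_flags,
						    unsigned int capacity)
{
	struct sheaf_model *sheaf;

	if (s->is_kmalloc)
		alloc_flags |= SLAB_ALLOC_NO_RECURSE;

	sheaf_allocs++;
	sheaf = kmalloc_model(&kmalloc_cache,
			      sizeof(*sheaf) + capacity * sizeof(void *),
			      alloc_flags);
	if (sheaf)
		sheaf->capacity = capacity;
	return sheaf;
}

/* Models alloc_empty_sheaf(): refuse nested requests. */
static struct sheaf_model *alloc_empty_sheaf_model(struct cache_model *s,
						    unsigned int alloc_flags)
{
	if (alloc_flags & SLAB_ALLOC_NO_RECURSE) {
		sheaf_refusals++;
		return NULL;
	}
	return alloc_empty_sheaf_inner(s, alloc_flags, s->capacity);
}
```

For a non-kmalloc cache the model allocates two sheaves before the refusal: its own sheaf comes from the kmalloc cache, which then wants a sheaf of its own, allocated with SLAB_ALLOC_NO_RECURSE.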

Vlastimil Babka (SUSE)

Jun 9, 2026, 5:18:55 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
All users of the flag are converted to SLAB_ALLOC_NO_RECURSE. Free up
the flag bit.

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
include/linux/gfp_types.h | 7 -------
include/linux/slab.h | 2 +-
include/trace/events/mmflags.h | 10 +---------
lib/alloc_tag.c | 2 +-
tools/include/linux/gfp_types.h | 7 -------
5 files changed, 3 insertions(+), 25 deletions(-)

diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 6c75df30a281..a93b8bd200b7 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -55,7 +55,6 @@ enum {
#ifdef CONFIG_LOCKDEP
___GFP_NOLOCKDEP_BIT,
#endif
- ___GFP_NO_OBJ_EXT_BIT,
___GFP_LAST_BIT
};

@@ -96,7 +95,6 @@ enum {
#else
#define ___GFP_NOLOCKDEP 0
#endif
-#define ___GFP_NO_OBJ_EXT BIT(___GFP_NO_OBJ_EXT_BIT)

/*
* Physical address zone modifiers (see linux/mmzone.h - low four bits)
@@ -137,17 +135,12 @@ enum {
* node with no fallbacks or placement policy enforcements.
*
* %__GFP_ACCOUNT causes the allocation to be accounted to kmemcg.
- *
- * %__GFP_NO_OBJ_EXT causes slab allocation to have no object extension.
- * mark_obj_codetag_empty() should be called upon freeing for objects allocated
- * with this flag to indicate that their NULL tags are expected and normal.
*/
#define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE)
#define __GFP_WRITE ((__force gfp_t)___GFP_WRITE)
#define __GFP_HARDWALL ((__force gfp_t)___GFP_HARDWALL)
#define __GFP_THISNODE ((__force gfp_t)___GFP_THISNODE)
#define __GFP_ACCOUNT ((__force gfp_t)___GFP_ACCOUNT)
-#define __GFP_NO_OBJ_EXT ((__force gfp_t)___GFP_NO_OBJ_EXT)

/**
* DOC: Watermark modifiers
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 11e82fdbe8d3..15d1917b81d3 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -1043,7 +1043,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
/**
* kmalloc_nolock - Allocate an object of given size from any context.
* @size: size to allocate
- * @gfp_flags: GFP flags. Only __GFP_ACCOUNT, __GFP_ZERO, __GFP_NO_OBJ_EXT
+ * @gfp_flags: GFP flags. Only __GFP_ACCOUNT, __GFP_ZERO
* allowed.
* @node: node number of the target node.
*
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index a6e5a44c9b42..c1a05ff0feab 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -54,18 +54,10 @@
# define TRACE_GFP_FLAGS_LOCKDEP
#endif

-#ifdef CONFIG_SLAB_OBJ_EXT
-# define TRACE_GFP_FLAGS_SLAB \
- TRACE_GFP_EM(NO_OBJ_EXT)
-#else
-# define TRACE_GFP_FLAGS_SLAB
-#endif
-
#define TRACE_GFP_FLAGS \
TRACE_GFP_FLAGS_GENERAL \
TRACE_GFP_FLAGS_KASAN \
- TRACE_GFP_FLAGS_LOCKDEP \
- TRACE_GFP_FLAGS_SLAB
+ TRACE_GFP_FLAGS_LOCKDEP

#undef TRACE_GFP_EM
#define TRACE_GFP_EM(a) TRACE_DEFINE_ENUM(___GFP_##a##_BIT);
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index ed1bdcf1f8ab..63686b44a23d 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -776,7 +776,7 @@ static __init bool need_page_alloc_tagging(void)
* If insufficient, a warning will be triggered to alert the user.
*
* TODO: Replace fixed-size array with dynamic allocation using
- * a GFP flag similar to ___GFP_NO_OBJ_EXT to avoid recursion.
+ * something similar to slab's SLAB_ALLOC_NO_RECURSE to avoid recursion.
*/
#define EARLY_ALLOC_PFN_MAX 8192

diff --git a/tools/include/linux/gfp_types.h b/tools/include/linux/gfp_types.h
index 6c75df30a281..a93b8bd200b7 100644
--- a/tools/include/linux/gfp_types.h
+++ b/tools/include/linux/gfp_types.h
@@ -55,7 +55,6 @@ enum {
#ifdef CONFIG_LOCKDEP
___GFP_NOLOCKDEP_BIT,
#endif
- ___GFP_NO_OBJ_EXT_BIT,
___GFP_LAST_BIT
};

@@ -96,7 +95,6 @@ enum {
#else
#define ___GFP_NOLOCKDEP 0
#endif
-#define ___GFP_NO_OBJ_EXT BIT(___GFP_NO_OBJ_EXT_BIT)

/*
* Physical address zone modifiers (see linux/mmzone.h - low four bits)
@@ -137,17 +135,12 @@ enum {
* node with no fallbacks or placement policy enforcements.
*
* %__GFP_ACCOUNT causes the allocation to be accounted to kmemcg.
- *
- * %__GFP_NO_OBJ_EXT causes slab allocation to have no object extension.
- * mark_obj_codetag_empty() should be called upon freeing for objects allocated
- * with this flag to indicate that their NULL tags are expected and normal.
*/
#define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE)
#define __GFP_WRITE ((__force gfp_t)___GFP_WRITE)
#define __GFP_HARDWALL ((__force gfp_t)___GFP_HARDWALL)
#define __GFP_THISNODE ((__force gfp_t)___GFP_THISNODE)
#define __GFP_ACCOUNT ((__force gfp_t)___GFP_ACCOUNT)
-#define __GFP_NO_OBJ_EXT ((__force gfp_t)___GFP_NO_OBJ_EXT)

/**
* DOC: Watermark modifiers

--
2.54.0
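Why deleting the enumerator frees a bit: the ___GFP_*_BIT values are assigned sequentially, config-dependent entries only consume a position when enabled, and ___GFP_LAST_BIT counts the positions in use. The sketch below is a trimmed, hypothetical model with MODEL_-prefixed names, not the real gfp_types.h; deriving a mask from the last enumerator is shown for illustration only.

```c
#include <stdint.h>

/* Set to 1 (e.g. -DMODEL_HAVE_NO_OBJ_EXT=1) to model the tree before this patch. */
#ifndef MODEL_HAVE_NO_OBJ_EXT
#define MODEL_HAVE_NO_OBJ_EXT 0
#endif
#define MODEL_CONFIG_LOCKDEP 1

enum {
	MODEL_GFP_DMA_BIT,		/* first enumerator gets bit 0 */
	MODEL_GFP_ACCOUNT_BIT,
#if MODEL_CONFIG_LOCKDEP
	MODEL_GFP_NOLOCKDEP_BIT,	/* only takes a bit when configured */
#endif
#if MODEL_HAVE_NO_OBJ_EXT
	MODEL_GFP_NO_OBJ_EXT_BIT,	/* the entry this patch deletes */
#endif
	MODEL_GFP_LAST_BIT		/* number of bit positions in use */
};

#define MODEL_BIT(nr)		(UINT32_C(1) << (nr))
#define MODEL_GFP_BITS_MASK	(MODEL_BIT(MODEL_GFP_LAST_BIT) - 1)
```

With the default settings MODEL_GFP_LAST_BIT is 3; building with -DMODEL_HAVE_NO_OBJ_EXT=1 makes it 4, one more bit consumed.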

Usama Arif

Jun 9, 2026, 9:35:48 AM
to Vlastimil Babka (SUSE), Usama Arif, Harry Yoo, hao...@linux.dev, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Tue, 09 Jun 2026 11:17:45 +0200 "Vlastimil Babka (SUSE)" <vba...@kernel.org> wrote:

> This series is based on slab/for-next. If all goes well, it would
> hopefully go to slab/for-next soon after the 7.2 merge window, so any
> other work can be based on it to avoid conflicts, as it touches a lot of
> parts of slab.
>
> Git: https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/slab_alloc_flags
>
> The slab implementation currently relies on gfp flags to convey
> some context information internally:
>
> - The absence of both __GFP_RECLAIM flags is interpreted as "cannot spin
> on locks", and intended to be used by kmalloc_nolock(). But false
> positives are possible e.g. during early boot where gfp_allowed_mask
> clears __GFP_RECLAIM from all allocations. This leads to unnecessary
> allocation failures and workarounds such as fd3634312a04 ("debugobject:
> Make it work with deferred page initialization - again").
>
> - __GFP_NO_OBJ_EXT exists and takes up a valuable bit in the gfp flags
> space, only to prevent recursive kmalloc() allocations for obj_ext
> arrays and sheaves.
>

Hello Vlastimil!

I think memory allocation profiling uses __GFP_NO_OBJ_EXT, and I don't see
it being removed in the series (hopefully I didn't miss it).

Adding Hao Ge in CC who did this in the commit:
mm/alloc_tag: replace fixed-size early PFN array with dynamic linked list

Vlastimil Babka (SUSE)

Jun 9, 2026, 10:28:39 AM
to Usama Arif, Harry Yoo, hao...@linux.dev, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On 6/9/26 15:35, Usama Arif wrote:
> On Tue, 09 Jun 2026 11:17:45 +0200 "Vlastimil Babka (SUSE)" <vba...@kernel.org> wrote:
>
>> This series is based on slab/for-next. If all goes well, it would
>> hopefully go to slab/for-next soon after the 7.2 merge window, so any
>> other work can be based on it to avoid conflicts, as it touches a lot of
>> parts of slab.
>>
>> Git: https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/slab_alloc_flags
>>
>> The slab implementation currently relies on gfp flags to convey
>> some context information internally:
>>
>> - The absence of both __GFP_RECLAIM flags is interpreted as "cannot spin
>> on locks", and intended to be used by kmalloc_nolock(). But false
>> positives are possible e.g. during early boot where gfp_allowed_mask
>> clears __GFP_RECLAIM from all allocations. This leads to unnecessary
>> allocation failures and workarounds such as fd3634312a04 ("debugobject:
>> Make it work with deferred page initialization - again").
>>
>> - __GFP_NO_OBJ_EXT exists and takes up a valuable bit in the gfp flags
>> space, only to prevent recursive kmalloc() allocations for obj_ext
>> arrays and sheaves.
>>
>
> Hello Vlastimil!
>
> I think memory allocation profiling uses __GFP_NO_OBJ_EXT, and I don't see
> it being removed in the series (hopefully I didn't miss it).
>
> Adding Hao Ge in CC who did this in the commit:
> mm/alloc_tag: replace fixed-size early PFN array with dynamic linked list

Thanks for the heads up. I missed it because my series is based on
slab/for-next and that commit is in mm-unstable. My patch 15 actually
modifies the TODO comment that is meanwhile resolved by Hao Ge's patch.

Which means my patch 15/15 can't be used as-is, and at worst I will drop it.
But I'd encourage Hao Ge with Suren to find some way to avoid the gfp flag
usage too, because it's now quite a niche use case (preventing false
positive CONFIG_MEM_ALLOC_PROFILING_DEBUG warnings, IIUC?) to take a
valuable gfp flag bit, IMHO.

Alexei Starovoitov

Jun 9, 2026, 2:40:27 PM
to Vlastimil Babka (SUSE), Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Tue Jun 9, 2026 at 2:17 AM PDT, Vlastimil Babka (SUSE) wrote:
> This series is based on slab/for-next. If all goes well, it would
> hopefully go to slab/for-next soon after the 7.2 merge window, so any
> other work can be based on it to avoid conflicts, as it touches a lot of
> parts of slab.
>
> Git: https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/slab_alloc_flags

Overall looks great to me.
I would ship all patches except the last one for this merge window,
since I don't see anything controversial or dangerous in there.
Especially since it touches slab so much. My slab-arena changes
would need to adopt it and I don't want to delay the whole thing by two merge windows.
Harry's changes would need to be rebased as well.
So the sooner the trees converge the better.

Hao Ge

Jun 10, 2026, 4:30:40 AM
to Vlastimil Babka (SUSE), Usama Arif, Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
Hi Vlastimil and Usama

On 2026/6/9 22:28, Vlastimil Babka (SUSE) wrote:
> On 6/9/26 15:35, Usama Arif wrote:
>> On Tue, 09 Jun 2026 11:17:45 +0200 "Vlastimil Babka (SUSE)"<vba...@kernel.org> wrote:
>>
>>> This series is based on slab/for-next. If all goes well, it would
>>> hopefully go to slab/for-next soon after the 7.2 merge window, so any
>>> other work can be based on it to avoid conflicts, as it touches a lot of
>>> parts of slab.
>>>
>>> Git: https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/slab_alloc_flags
>>>
>>> The slab implementation currently relies on gfp flags to convey
>>> some context information internally:
>>>
>>> - The absence of both __GFP_RECLAIM flags is interpreted as "cannot spin
>>> on locks", and intended to be used by kmalloc_nolock(). But false
>>> positives are possible e.g. during early boot where gfp_allowed_mask
>>> clears __GFP_RECLAIM from all allocations. This leads to unnecessary
>>> allocation failures and workarounds such as fd3634312a04 ("debugobject:
>>> Make it work with deferred page initialization - again").
>>>
>>> - __GFP_NO_OBJ_EXT exists and takes up a valuable bit in the gfp flags
>>> space, only to prevent recursive kmalloc() allocations for obj_ext
>>> arrays and sheaves.
>>>
>> Hello Vlastimil!
>>
>> I think memory allocation profiling uses __GFP_NO_OBJ_EXT, and I don't see
>> it being removed in the series (hopefully I didn't miss it).
>>
>> Adding Hao Ge in CC who did this in the commit:
>> mm/alloc_tag: replace fixed-size early PFN array with dynamic linked list


Thanks for the CC. I'm now aware of this.


> Thanks for the heads up. I missed it because my series is based on
> slab/for-next and that commit is in mm-unstable. My patch 15 actually
> modifies the TODO comment that is meanwhile resolved by Hao Ge's patch.
>
> Which means my patch 15/15 can't be used as-is, and at worst I will drop it.
> But I'd encourage Hao Ge with Suren to find some way to avoid the gfp flag
> usage too, because it's now quite a niche use case (preventing false
> positive CONFIG_MEM_ALLOC_PROFILING_DEBUG warnings, IIUC?) to take a
> valuable gfp flag bit, IMHO.


I previously used __GFP_NO_OBJ_EXT because it serves the same purpose as
in slab: we use it here to prevent recursion within the page allocator.

I hadn't anticipated that __GFP_NO_OBJ_EXT would be removed so soon.

I agree with you. Since slab no longer uses it, retaining this GFP flag
solely for debug is indeed costly.

I've also been thinking about possible solutions today. Since we are
working in the page allocation path, we need to take various race
conditions into consideration.

For instance, what if an interrupt is triggered inside page_alloc, which
then invokes page_alloc again? I'm not sure if such a scenario exists in
practice, but I believe we still need to account for it.

I would highly appreciate it if anyone could share their ideas.
I've made a note of this.

Would it make sense to hold off on merging patch 15/15 for now?
We can always include it in a later cycle once we have a proper
replacement for the memory allocation profiling side.

Thanks
Best Regards
Hao

Vlastimil Babka (SUSE)

Jun 10, 2026, 6:36:47 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On 6/9/26 11:17, Vlastimil Babka (SUSE) wrote:
> When zeroing on alloc is requested (by __GFP_ZERO or the init_on_alloc
> parameter), we have been trying to zero the whole kmalloc bucket size
> and not just requested size, if possible.
>
> This probably comes from the past where ksize() could be used to
> discover the bucket size and use it opportunistically beyond the
> requested size. This is now forbidden and enabling debugging such as
> KASAN or slab's red zoning would catch this misuse. Therefore, nobody
> can be relying on __GFP_ZERO zeroing beyond requested size.

Well, Sashiko says I'm wrong because krealloc() might be used later and then
the initially unused part might become used and we won't clear it because we
don't (unless slab debugging is enabled) know the original requested size
anymore. So we have to keep zeroing the full s->object_size in the cases we
currently do that.
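The krealloc() concern can be reduced to a simplified model: a recycled 64-byte kmalloc bucket holding stale data, a zeroing allocation of 40 bytes, and an in-place grow to 60 bytes that, lacking the original requested size, relies on the spare tail already being zero. The bucket size, the helper names and the in-place behaviour are assumptions for illustration, not the actual krealloc() code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BUCKET_SIZE 64	/* hypothetical bucket, like kmalloc-64 */

static unsigned char slot[MODEL_BUCKET_SIZE];	/* one recycled object slot */

/* Models kmalloc(size, __GFP_ZERO) with size <= MODEL_BUCKET_SIZE, zeroing
 * either only the requested size or the whole bucket. */
static unsigned char *alloc_zeroed_model(size_t size, bool zero_whole_bucket)
{
	memset(slot, 0xAA, sizeof(slot));	/* stale data from a previous user */
	memset(slot, 0, zero_whole_bucket ? MODEL_BUCKET_SIZE : size);
	return slot;
}

/* Models krealloc() growing within the bucket without slab debugging: the
 * original size is unknown, so the object is returned as-is and the newly
 * exposed bytes are assumed to be zero already. */
static unsigned char *grow_in_place_model(unsigned char *p, size_t new_size)
{
	return new_size <= MODEL_BUCKET_SIZE ? p : NULL;
}

static bool range_is_zero(const unsigned char *p, size_t from, size_t to)
{
	for (size_t i = from; i < to; i++)
		if (p[i] != 0)
			return false;
	return true;
}
```

In this model, zeroing only the 40 requested bytes leaves bytes 40..59 holding stale data after the grow, while zeroing the whole bucket keeps them zero, matching the conclusion above.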

Harry Yoo

Jun 10, 2026, 8:07:00 AM
to Vlastimil Babka (SUSE), Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org


On 6/9/26 6:17 PM, Vlastimil Babka (SUSE) wrote:
> With sheaves, this is no longer part of the allocation fastpath. For
> the same reason, also mark the call to it from slab_alloc_node() as
> unlikely().
>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---

Reviewed-by: Harry Yoo (Oracle) <ha...@kernel.org>

--
Cheers,
Harry / Hyeonggon

Harry Yoo

Jun 10, 2026, 9:46:41 AM
to Vlastimil Babka (SUSE), Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org


On 6/9/26 6:17 PM, Vlastimil Babka (SUSE) wrote:
> This series is based on slab/for-next. If all goes well, it would
> hopefully go to slab/for-next soon after the 7.2 merge window, so any
> other work can be based on it to avoid conflicts, as it touches a lot
> parts of slab.
>
> Git: https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/slab_alloc_flags
>
> The slab implementation currently relies on gfp flags to convey
> some context information internally:
>
> - The absence of both __GFP_RECLAIM flags is interpreted as "cannot spin
> on locks", and intended to be used by kmalloc_nolock(). But false
> positives are possible e.g. during early boot where gfp_allowed_mask
> clears __GFP_RECLAIM from all allocations. This leads to unnecessary
> allocation failures and workarounds such as fd3634312a04 ("debugobject:
> Make it work with deferred page initialization - again").
>
> - __GFP_NO_OBJ_EXT exists and takes up valuable bit in the gfp flags
> space, only to prevent recursive kmalloc() allocations for obj_ext
> arrays and sheaves.

[ Cc'ing Vishal and Matthew as it's somewhat relevant to memdescs... ]

When the page allocator starts allocating slab objects,
we still need a way to avoid recursion for obj_ext arrays and sheaves
(by passing SLAB_ALLOC_NO_RECURSE).

Looking at kmalloc_flags(), probably we'll end up introducing a separate
gfp type for slab-specific flags?

Hmm but SLAB_ALLOC_* flags are defined in mm/slab.h and kmalloc_flags()
is defined in include/linux/slab.h. Do you intend to restrict the slab
alloc flags to MM only?

> The page allocator uses its internal alloc_flags to convey various
> context information, including ALLOC_TRYLOCK (meaning "cannot spin").
> This series copies that concept for the slab allocator, with its own
> slab-specific internal flags:
>
> - SLAB_ALLOC_DEFAULT - no extra flags (the value is 0), but explicit
> - SLAB_ALLOC_TRYLOCK - do not spin on locks (used by kmalloc_nolock())
> - SLAB_ALLOC_NEW_SLAB - replacing existing 'bool new_slab' parameter
> for allocating obj_ext arrays
> - SLAB_ALLOC_NO_RECURSE - replacing usage of __GFP_NO_OBJ_EXT
>
> To reduce the amount of parameters in various internal functions, we
> additionally introduce slab_alloc_context (also inspired by page
> allocator's alloc_context) for passing a number of existing arguments
> and the new alloc_flags:
>
> /* Structure holding extra parameters for slab allocations */
> struct slab_alloc_context {
> unsigned long caller_addr;
> unsigned long orig_size;
> unsigned int alloc_flags;
> struct list_lru *lru;
> };

Perhaps beyond the scope of the patchset, but I wonder if we could have
something like struct slab_alloc_context but for kmalloc callers to
simplify {PASS,DECL}_KMALLOC_PARAMS().

Something like:

struct kmalloc_params {
#ifdef CONFIG_SLAB_BUCKETS
	kmem_buckets *b;
#endif
#ifdef CONFIG_KMALLOC_PARTITION_CACHES
	kmalloc_token_t token;
#endif
};

The idea is to move optional kmalloc parameters (depending on config)
into a single struct, instead of using the macros.

void *__kmalloc_node(size_t size, gfp_t flags, int node,
		     unsigned long caller,
		     struct kmalloc_params params);

void *kmalloc_node() {
	/* ... snip ...*/
	struct kmalloc_params params = KMALLOC_PARAMS(params.b, params.token);
	return __kmalloc_node(size, flags, node, _RET_IP_, params);
}

The compiler should optimize away unused fields based on the config.

Per System V AMD64 ABI, the compiler will use registers to pass the
struct, as long as the struct size does not exceed 16 bytes.
(Otherwise it is passed on the stack.)
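A compilable userspace sketch of the idea, under stated assumptions: the two CONFIG_* defines stand in for Kconfig, and kmem_buckets and kmalloc_token_t are stand-in types rather than the kernel definitions. model_select_key() is a hypothetical callee that only shows both optional fields arriving through one by-value struct argument.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for Kconfig options; remove one to drop its field. */
#define CONFIG_SLAB_BUCKETS 1
#define CONFIG_KMALLOC_PARTITION_CACHES 1

/* Hypothetical stand-in types, not the kernel definitions. */
typedef struct { int id; } kmem_buckets;
typedef unsigned long kmalloc_token_t;

struct kmalloc_params {
#ifdef CONFIG_SLAB_BUCKETS
	kmem_buckets *b;
#endif
#ifdef CONFIG_KMALLOC_PARTITION_CACHES
	kmalloc_token_t token;
#endif
};

/* At most two pointer-sized words, i.e. within the 16-byte limit on x86-64. */
_Static_assert(sizeof(struct kmalloc_params) <= 2 * sizeof(void *),
	       "kmalloc_params should fit in two registers");

/*
 * Hypothetical callee: the optional parameters arrive in one by-value
 * struct instead of PASS/DECL-style macro arguments. The returned key
 * is arbitrary; it only shows that both fields made it through.
 */
static unsigned long model_select_key(struct kmalloc_params params)
{
	unsigned long key = 0;

#ifdef CONFIG_SLAB_BUCKETS
	if (params.b)
		key = (unsigned long)params.b->id;
#endif
#ifdef CONFIG_KMALLOC_PARTITION_CACHES
	key = key * 16 + params.token;
#endif
	return key;
}
```

With both options enabled the struct is two words, so on x86-64 it should be passed in registers. With both disabled it becomes an empty struct, which ISO C rejects and GNU C treats as zero-sized, so that configuration would need separate handling.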

Vlastimil Babka (SUSE)

Jun 10, 2026, 10:04:23 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On 6/10/26 15:46, Harry Yoo wrote:
>
>
> On 6/9/26 6:17 PM, Vlastimil Babka (SUSE) wrote:
>> - __GFP_NO_OBJ_EXT exists and takes up valuable bit in the gfp flags
>> space, only to prevent recursive kmalloc() allocations for obj_ext
>> arrays and sheaves.
>
> [ Cc'ing Vishal and Matthew as it's somewhat relevant to memdescs... ]
>
> When the page allocator starts allocating slab objects,
> we still need a way to avoid recursion for obj_ext arrays and sheaves
> (by passing SLAB_ALLOC_NO_RECURSE).
>
> Looking at kmalloc_flags(), probably we'll end up introducing a separate
> gfp type for slab-specific flags?

What do you mean by separate gfp type?

> Hmm but SLAB_ALLOC_* flags are defined in mm/slab.h and kmalloc_flags()
> is defined in include/linux/slab.h. Do you intend to restrict the slab
> alloc flags to MM only?

Yeah I don't expect users outside MM. If a valid one appears, we can move
it. I should try moving kmalloc_flags() to mm/slab.h as well, unless there's
some header dependency issue that will prevent it.

> Per System V AMD64 ABI, the compiler will use registers to pass the
> struct, as long as the struct size does not exceed 16 bytes.
> (Otherwise it will be passed on stack).

Hm but does this work on all architectures, and are we doing this somewhere
(for structures larger than a native word) already?
Also Marco noted earlier that gcc won't optimize away the struct if it
becomes zero-sized:

https://lore.kernel.org/all/CANpmjNO1aNm3mKphDGWasK_N...@mail.gmail.com/

Vlastimil Babka (SUSE)

Jun 10, 2026, 10:54:51 AM
to Alexei Starovoitov, Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On 6/9/26 20:40, Alexei Starovoitov wrote:
> On Tue Jun 9, 2026 at 2:17 AM PDT, Vlastimil Babka (SUSE) wrote:
>> This series is based on slab/for-next. If all goes well, it would
>> hopefully go to slab/for-next soon after the 7.2 merge window, so any
>> other work can be based on it to avoid conflicts, as it touches a lot
>> parts of slab.
>>
>> Git: https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/slab_alloc_flags
>
> Overall looks great to me.
> I would ship all patches except the last one for this merge window,
> since I don't see anything controversial or dangerous in there.

Hmm that's ambitious :)

> Especially since it touches slab so much. My slab-arena changes
> would need to adopt it and I don't want to delay the whole thing by two merge windows.
> Harry's changes would need to rebased as well.

It wouldn't be a problem if they went through the slab tree as well, and
were just applied on top of this series already in the slab tree.
In case of bpf tree there could be a shared stable branch.
So no delays by two merge windows.

> So the sooner the trees converge the better.

But yeah it would be simpler.
I can try exposing this to -next ASAP and, if no issues arise, plan to
send it as a second PR, separate from the series already there, in the
second week of the merge window, and see if Linus is benevolent.

Harry Yoo

Jun 10, 2026, 11:04:32 AM
to Vlastimil Babka (SUSE), Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
> What do you mean by separate gfp type?

I meant the patchset is introducing a new type to specify the context
(specific to slab) other than gfp_t... which is `unsigned int
alloc_flags` now.

>> Hmm but SLAB_ALLOC_* flags are defined in mm/slab.h and kmalloc_flags()
>> is defined in include/linux/slab.h. Do you intend to restrict the slab
>> alloc flags to MM only?
>
> Yeah I don't expect users outside MM. If a valid one appears, we can move
> it. I should try moving kmalloc_flags() to mm/slab.h as well, unless there's
> some header dependency issue that will prevent it.

Ack.

> Hm but does this work on all architectures,

Apparently not on s390, unfortunately.
On s390 it works only when the struct size does not exceed 8 bytes.

> and are we doing this somewhere
> (for structures larger than a native word) already?

hmm perhaps struct timespec64?

> Also Marco noted earlier that gcc won't optimize away the struct if it
> becomes zero-sized:
>
> https://lore.kernel.org/all/CANpmjNO1aNm3mKphDGWasK_N...@mail.gmail.com/

Ouch, right. That means we still need at least one macro to define those
parameters :( Sounds less promising now...

Vlastimil Babka (SUSE)

Jun 10, 2026, 11:40:22 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE), sta...@vger.kernel.org
This series is based on slab/for-next. As suggested by Alexei, I will
try to put it there ASAP (hence the early respin) and see if it looks
stable enough to be sent in the second week of the 7.2 merge window.

Git: https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/slab_alloc_flags

The slab implementation currently relies on gfp flags to convey
some context information internally:

- The absence of both __GFP_RECLAIM flags is interpreted as "cannot spin
on locks", and intended to be used by kmalloc_nolock(). But false
positives are possible e.g. during early boot where gfp_allowed_mask
clears __GFP_RECLAIM from all allocations. This leads to unnecessary
allocation failures and workarounds such as fd3634312a04 ("debugobject:
Make it work with deferred page initialization - again").

- __GFP_NO_OBJ_EXT exists and takes up a valuable bit in the gfp flags
space, only to prevent recursive kmalloc() allocations for obj_ext
arrays and sheaves.
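A simplified userspace model of the false positive. The MODEL_GFP_* bit values are made up (the real ones differ, and real GFP_KERNEL also carries __GFP_IO and __GFP_FS), while SLAB_ALLOC_DEFAULT, SLAB_ALLOC_TRYLOCK and alloc_flags_allow_spinning() mirror the definitions this series adds to mm/slab.h. The gfp-based check only sees that the reclaim bits are gone; the alloc_flags check only answers "is this kmalloc_nolock()?".

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up bit values for illustration only. */
#define MODEL_GFP_DIRECT_RECLAIM 0x1u
#define MODEL_GFP_KSWAPD_RECLAIM 0x2u
#define MODEL_GFP_RECLAIM (MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_KSWAPD_RECLAIM)
#define MODEL_GFP_KERNEL  MODEL_GFP_RECLAIM	/* simplified */

/* As added to mm/slab.h by this series. */
#define SLAB_ALLOC_DEFAULT 0x00
#define SLAB_ALLOC_TRYLOCK 0x01

/* Old heuristic: no reclaim flags at all is taken to mean "cannot spin". */
static bool model_gfpflags_allow_spinning(unsigned int gfp)
{
	return !!(gfp & MODEL_GFP_RECLAIM);
}

/* Early boot: the allowed mask strips the reclaim bits from every request. */
static unsigned int model_early_boot_gfp(unsigned int gfp)
{
	return gfp & ~MODEL_GFP_RECLAIM;
}

/* New check: only an explicit trylock request forbids spinning. */
static inline bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
{
	return !(alloc_flags & SLAB_ALLOC_TRYLOCK);
}
```

A GFP_KERNEL-like request made during early boot is misclassified as "cannot spin" by the gfp heuristic, while a normal allocation carrying SLAB_ALLOC_DEFAULT is still allowed to spin.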

The page allocator uses its internal alloc_flags to convey various
context information, including ALLOC_TRYLOCK (meaning "cannot spin").
This series copies that concept for the slab allocator, with its own
slab-specific internal flags:

- SLAB_ALLOC_DEFAULT - no extra flags (the value is 0), but explicit
- SLAB_ALLOC_TRYLOCK - do not spin on locks (used by kmalloc_nolock())
- SLAB_ALLOC_NEW_SLAB - replacing existing 'bool new_slab' parameter
for allocating obj_ext arrays
- SLAB_ALLOC_NO_RECURSE - replacing usage of __GFP_NO_OBJ_EXT

To reduce the number of parameters in various internal functions, we
additionally introduce slab_alloc_context (also inspired by page
allocator's alloc_context) for passing a number of existing arguments
and the new alloc_flags:

/* Structure holding extra parameters for slab allocations */
struct slab_alloc_context {
	unsigned long caller_addr;
	unsigned long orig_size;
	unsigned int alloc_flags;
	struct list_lru *lru;
};

This also replaces the existing struct partial_context.

The last necessary piece is kmalloc_flags() which can take the
alloc_flags in addition to gfp flags and is intended for the recursive
allocations of sheaves and obj_ext arrays, so that both
SLAB_ALLOC_TRYLOCK and SLAB_ALLOC_NO_RECURSE can be communicated.
Internally it decides between kmalloc_nolock() and normal kmalloc()
depending on SLAB_ALLOC_TRYLOCK.
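A sketch of the kmalloc_flags() dispatch described above, as a userspace model. The value of SLAB_ALLOC_NO_RECURSE is a placeholder, the helper names are hypothetical, and the assumption that a nested obj_ext/sheaf allocation inherits SLAB_ALLOC_TRYLOCK from the outer allocation is our reading of "both ... can be communicated", not code from the series.

```c
#include <assert.h>
#include <stdbool.h>

#define SLAB_ALLOC_DEFAULT    0x00
#define SLAB_ALLOC_TRYLOCK    0x01
#define SLAB_ALLOC_NO_RECURSE 0x04	/* placeholder value (assumption) */

/* Which entry point the dispatcher would call. */
enum model_path { MODEL_PATH_KMALLOC, MODEL_PATH_KMALLOC_NOLOCK };

/* Hypothetical dispatcher: only SLAB_ALLOC_TRYLOCK selects the path. */
static enum model_path model_kmalloc_flags_path(unsigned int alloc_flags)
{
	return (alloc_flags & SLAB_ALLOC_TRYLOCK) ? MODEL_PATH_KMALLOC_NOLOCK
						  : MODEL_PATH_KMALLOC;
}

/*
 * Flags for a nested obj_ext array or sheaf allocation made on behalf of
 * an outer allocation: keep TRYLOCK so a _nolock() caller never spins,
 * and add NO_RECURSE so the nested allocation does not ask for its own
 * obj_ext array (assumed behavior).
 */
static unsigned int model_nested_alloc_flags(unsigned int outer)
{
	return (outer & SLAB_ALLOC_TRYLOCK) | SLAB_ALLOC_NO_RECURSE;
}

/* Should this allocation set up obj_ext for a new slab? */
static bool model_wants_obj_ext(unsigned int alloc_flags)
{
	return !(alloc_flags & SLAB_ALLOC_NO_RECURSE);
}
```

The point of carrying both bits is that the recursion guard and the "cannot spin" property travel independently, without borrowing any gfp bit.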

The rest of the series is gradually expanding the usage of both
alloc_flags and slab_alloc_context as necessary, with bits of
refactoring. Then, __GFP_NO_OBJ_EXT is removed completely.

Note that some usage of gfpflags_allow_spinning() relying on the absence of
__GFP_RECLAIM remains outside of slab (and page allocator) in memcg,
page_owner and stackdepot code. These can thus yield false-positive
decisions that spinning is not allowed, but should not result in
important allocations failing anymore.

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
Changes in v2:
- Due to Sashiko review, drop the idea of zeroing orig_size
unconditionally, as it can break krealloc(). Thanks to that, found a
pre-existing bug, fixed by the new Patch 1. The kfence zeroing related
cleanup is implemented differently in Patch 2.
- Prevent nested kmalloc_nolock warnings due to added gfp flags
(Sashiko)
- Fix a pre-existing issue with opportunistic slab allocation from the
target node only effectively dropping __GFP_NOMEMALLOC and __GFP_RECLAIM.
(Sashiko)
- Move kmalloc_flags() definitions to mm/slab.h (per Harry).
- Link to v1: https://patch.msgid.link/20260609-slab_alloc_f...@kernel.org

---
Vlastimil Babka (SUSE) (16):
mm/slab: do not limit zeroing to orig_size when only red zoning is enabled
mm/slab: do not init any kfence objects on allocation
mm/slab: stop inlining __slab_alloc_node()
mm/slab: introduce slab_alloc_context
mm/slab: introduce alloc_flags and SLAB_ALLOC_TRYLOCK
mm/slab: add alloc_flags to slab_alloc_context
mm/slab: replace struct partial_context with slab_alloc_context
mm/slab: pass alloc_flags to new slab allocation
mm/slab: pass alloc_flags through slab_post_alloc_hook() chain
mm/slab: replace slab_alloc_node() parameters with slab_alloc_context
mm/slab: allow kmem_cache_alloc_bulk() with any gfp flags
mm/slab: pass slab_alloc_context to __do_kmalloc_node()
mm/slab: allow __GFP_NOMEMALLOC and __GFP_NOWARN for kmalloc_nolock()
mm/slab: introduce kmalloc_flags()
mm/slab: remove __GFP_NO_OBJ_EXT usage from alloc_slab_obj_exts()
mm/slab: replace __GFP_NO_OBJ_EXT with SLAB_ALLOC_NO_RECURSE for sheaves

include/linux/slab.h | 5 +-
mm/kfence/core.c | 2 +-
mm/memcontrol.c | 5 +-
mm/slab.h | 29 +++-
mm/slub.c | 439 +++++++++++++++++++++++++++++++--------------------
5 files changed, 304 insertions(+), 176 deletions(-)

Vlastimil Babka (SUSE)

Jun 10, 2026, 11:40:31 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE), sta...@vger.kernel.org
When init (zeroing) on allocation is requested, for kmalloc() we
generally have to zero the full object size even if a smaller size is
requested, in order to provide krealloc()'s __GFP_ZERO guarantees.

But if we track the requested size, krealloc() uses that information to
do the right thing. With red zoning also enabled, any unused size
becomes part of the red zone, so it must not be zeroed.

However, the check is imprecise: it also triggers when only
SLAB_RED_ZONE is enabled without SLAB_STORE_USER. This means enabling
red zoning alone can compromise krealloc()'s __GFP_ZERO contract.

Fix this by using slub_debug_orig_size() instead, which is the exact
check for whether the requested size is tracked. We don't need to care
if red zoning is also enabled or not. Also update and expand the
comment accordingly.
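The difference between the old and the new condition, as a userspace model. The MODEL_SLAB_* bit values are made up, the model ignores the static key and CONFIG_SLUB_DEBUG handling in kmem_cache_debug_flags(), and it assumes slub_debug_orig_size() amounts to "SLAB_STORE_USER on a kmalloc cache".

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up bit values standing in for slab_flags_t bits. */
#define MODEL_SLAB_RED_ZONE   0x1u
#define MODEL_SLAB_STORE_USER 0x2u
#define MODEL_SLAB_KMALLOC    0x4u

/* Like kmem_cache_debug_flags(): true if ANY of the given flags is set. */
static bool model_debug_flags(unsigned int cache_flags, unsigned int flags)
{
	return (cache_flags & flags) != 0;
}

/* Old condition: red zoning alone was enough to limit zeroing. */
static bool old_limit_zeroing(unsigned int cache_flags)
{
	return model_debug_flags(cache_flags,
				 MODEL_SLAB_STORE_USER | MODEL_SLAB_RED_ZONE) &&
	       (cache_flags & MODEL_SLAB_KMALLOC);
}

/* New condition (assumed): limit zeroing only if orig_size is recorded. */
static bool new_limit_zeroing(unsigned int cache_flags)
{
	return model_debug_flags(cache_flags, MODEL_SLAB_STORE_USER) &&
	       (cache_flags & MODEL_SLAB_KMALLOC);
}
```

The only case where the two disagree is a kmalloc cache with red zoning but without user tracking, which is exactly the case where krealloc() cannot know orig_size.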

Fixes: 9ce67395f5a0 ("mm/slub: only zero requested size of buffer for kzalloc when debug enabled")
Cc: <sta...@vger.kernel.org>
Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 63c1ef998dd3..e2ee8f1aaccf 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4574,15 +4574,17 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
gfp_t init_flags = flags & gfp_allowed_mask;

/*
- * For kmalloc object, the allocated memory size(object_size) is likely
- * larger than the requested size(orig_size). If redzone check is
- * enabled for the extra space, don't zero it, as it will be redzoned
- * soon. The redzone operation for this extra space could be seen as a
- * replacement of current poisoning under certain debug option, and
- * won't break other sanity checks.
+ * For kmalloc object, the allocated size (object_size) can be larger
+ * than the requested size (orig_size). We however need to zero the
+ * whole object_size to handle possible later krealloc() with
+ * __GFP_ZERO properly.
+ *
+ * But if we keep track of the requested size, krealloc() uses that
+ * information. Additionally if red zoning is enabled, the extra space
+ * is also red zone, so we should not overwrite it. So limit zeroing to
+ * orig_size if we track it.
*/
- if (kmem_cache_debug_flags(s, SLAB_STORE_USER | SLAB_RED_ZONE) &&
- (s->flags & SLAB_KMALLOC))
+ if (slub_debug_orig_size(s))
zero_size = orig_size;

/*

--
2.54.0

Vlastimil Babka (SUSE)

Jun 10, 2026, 11:40:38 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
When init (zeroing) on allocation is requested, for kmalloc() we
generally have to zero the full object size even if a smaller size is
requested, in order to provide krealloc()'s __GFP_ZERO guarantees.

When we end up allocating a kfence object, kfence performs the zeroing on
its own because it has its own redzone beyond the requested size. Thus
slab_post_alloc_hook() has an 'init' parameter which has to be evaluated
in all callers (via slab_want_init_on_alloc()) and should be false for
kfence allocations.

For kfence allocations in slab_alloc_node() this is achieved by subtly
skipping over the slab_want_init_on_alloc() call. Other callers (i.e.
kmem_cache_alloc_bulk_noprof()) however evaluate it unconditionally even
if they do end up with a kfence allocation. This is only subtly not a
problem: those are not kmalloc allocations, so the "requested size"
equals s->object_size and cannot interfere with kfence's redzone.
There's just an unnecessary double zeroing (in both kfence and
slab_post_alloc_hook()), but it's all very fragile and contradicts the
comment in kfence_guarded_alloc().

Remove this subtlety and simplify the code by eliminating the init
parameter from slab_post_alloc_hook() and making it call
slab_want_init_on_alloc() itself. Instead, add an is_kfence_address()
check before performing the memset, which will start doing the right
thing for all callers of slab_post_alloc_hook().

This potentially adds the overhead of the is_kfence_address() check to
the allocation hotpath, but that check is designed to be as cheap as
possible, and it's only evaluated if zeroing is about to happen. This
means (aside from init_on_alloc hardening) only for __GFP_ZERO
allocations, where the zeroing itself likely costs more than the added
check.
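A userspace model of the new check in slab_post_alloc_hook(), assuming a tiny stand-in pool. model_is_kfence_address() follows the single-subtraction range check style of is_kfence_address() (as far as we know, the kernel helper additionally checks that the pool exists), and the KASAN conditions of the real loop are left out. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the KFENCE pool. */
#define MODEL_POOL_SIZE 64
static unsigned char model_kfence_pool[MODEL_POOL_SIZE];

/*
 * One subtraction and one unsigned comparison: addresses below the pool
 * wrap around to huge values and fail the comparison too.
 */
static bool model_is_kfence_address(const void *addr)
{
	return (uintptr_t)addr - (uintptr_t)model_kfence_pool < MODEL_POOL_SIZE;
}

/* Zeroing loop after the patch: kfence objects are left to kfence. */
static void model_post_alloc_zero(void **p, size_t n, size_t zero_size,
				  bool init)
{
	for (size_t i = 0; i < n; i++)
		if (p[i] && init && !model_is_kfence_address(p[i]))
			memset(p[i], 0, zero_size);
}
```

Because the check sits behind `init`, allocations that do not want zeroing never evaluate it.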

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/kfence/core.c | 2 +-
mm/slub.c | 23 ++++++++---------------
2 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 655dc5ce3240..5e0b406924e9 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -500,7 +500,7 @@ static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t g

/*
* We check slab_want_init_on_alloc() ourselves, rather than letting
- * SL*B do the initialization, as otherwise we might overwrite KFENCE's
+ * slab do the initialization, as otherwise it might overwrite KFENCE's
* redzone.
*/
if (unlikely(slab_want_init_on_alloc(gfp, cache)))
diff --git a/mm/slub.c b/mm/slub.c
index e2ee8f1aaccf..8e5264d3ddbf 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4565,9 +4565,10 @@ struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags)

static __fastpath_inline
bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
- gfp_t flags, size_t size, void **p, bool init,
+ gfp_t flags, size_t size, void **p,
unsigned int orig_size)
{
+ bool init = slab_want_init_on_alloc(flags, s);
unsigned int zero_size = s->object_size;
bool kasan_init = init;
size_t i;
@@ -4608,7 +4609,8 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
for (i = 0; i < size; i++) {
p[i] = kasan_slab_alloc(s, p[i], init_flags, kasan_init);
if (p[i] && init && (!kasan_init ||
- !kasan_has_integrated_init()))
+ !kasan_has_integrated_init())
+ && !is_kfence_address(p[i]))
memset(p[i], 0, zero_size);
if (gfpflags_allow_spinning(flags))
kmemleak_alloc_recursive(p[i], s->object_size, 1,
@@ -4910,7 +4912,6 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
{
void *object;
- bool init = false;

s = slab_pre_alloc_hook(s, gfpflags);
if (unlikely(!s))
@@ -4926,16 +4927,13 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);

maybe_wipe_obj_freeptr(s, object);
- init = slab_want_init_on_alloc(gfpflags, s);

out:
/*
- * When init equals 'true', like for kzalloc() family, only
- * @orig_size bytes might be zeroed instead of s->object_size
* In case this fails due to memcg_slab_post_alloc_hook(),
* object is set to NULL
*/
- slab_post_alloc_hook(s, lru, gfpflags, 1, &object, init, orig_size);
+ slab_post_alloc_hook(s, lru, gfpflags, 1, &object, orig_size);

return object;
}
@@ -5230,7 +5228,6 @@ kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
struct slab_sheaf *sheaf)
{
void *ret = NULL;
- bool init;

if (sheaf->size == 0)
goto out;
@@ -5240,10 +5237,8 @@ kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
if (likely(!ret))
ret = sheaf->objects[--sheaf->size];

- init = slab_want_init_on_alloc(gfp, s);
-
/* add __GFP_NOFAIL to force successful memcg charging */
- slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, init, s->object_size);
+ slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, s->object_size);
out:
trace_kmem_cache_alloc(_RET_IP_, ret, s, gfp, NUMA_NO_NODE);

@@ -5423,8 +5418,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in

success:
maybe_wipe_obj_freeptr(s, ret);
- slab_post_alloc_hook(s, NULL, alloc_gfp, 1, &ret,
- slab_want_init_on_alloc(alloc_gfp, s), orig_size);
+ slab_post_alloc_hook(s, NULL, alloc_gfp, 1, &ret, orig_size);

ret = kasan_kmalloc(s, ret, orig_size, alloc_gfp);
return ret;
@@ -7339,8 +7333,7 @@ bool kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags,

out:
/* memcg and kmem_cache debug support and memory initialization */
- return likely(slab_post_alloc_hook(s, NULL, flags, size, p,
- slab_want_init_on_alloc(flags, s), s->object_size));
+ return likely(slab_post_alloc_hook(s, NULL, flags, size, p, s->object_size));
}
EXPORT_SYMBOL(kmem_cache_alloc_bulk_noprof);


--
2.54.0

Vlastimil Babka (SUSE)

Jun 10, 2026, 11:40:44 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
With sheaves, __slab_alloc_node() is no longer part of the allocation
fastpath, so stop forcing it to be inlined. For the same reason, also
mark the call to it from slab_alloc_node() as unlikely().
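A generic userspace illustration of the pattern (not the slab code): a small fast path that marks the fallback with unlikely(), defined here the same way as the kernel's non-profiling variant on top of the GCC/Clang __builtin_expect(), and an out-of-line slow path. The one-entry cache stands in for the percpu sheaves; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the kernel's unlikely() without branch profiling. */
#define unlikely(x) __builtin_expect(!!(x), 0)

static int slowpath_calls;

/*
 * Out-of-line slow path, analogous to __slab_alloc_node() no longer
 * being __always_inline: keeping it out of the caller keeps the fast
 * path small.
 */
static __attribute__((noinline)) void *model_slowpath(void)
{
	static int backing;

	slowpath_calls++;
	return &backing;
}

/* Fast path: take the cached object, fall back only when it is empty. */
static void *model_alloc(void **cache)
{
	void *object = *cache;

	*cache = NULL;
	if (unlikely(!object))
		object = model_slowpath();
	return object;
}
```

The hint only affects code layout (the fallback is moved out of line); the behavior is identical with or without it.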

Reviewed-by: Harry Yoo (Oracle) <ha...@kernel.org>
Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8e5264d3ddbf..7b48c0d38404 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4519,8 +4519,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
return object;
}

-static __always_inline void *__slab_alloc_node(struct kmem_cache *s,
- gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
+static void *__slab_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node,
+ unsigned long addr, size_t orig_size)
{
void *object;

@@ -4923,7 +4923,7 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list

object = alloc_from_pcs(s, gfpflags, node);

- if (!object)
+ if (unlikely(!object))
object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);

maybe_wipe_obj_freeptr(s, object);

--
2.54.0

Vlastimil Babka (SUSE)

Jun 10, 2026, 11:40:51 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
Similarly to the page allocator's struct alloc_context, introduce a helper
struct to hold some of the allocation arguments. This will allow
reducing the number of parameters in many functions of the
implementation, and extending them easily if needed.

For now, make it hold the caller address and the originally requested
allocation size.

Convert alloc_single_from_new_slab(), __slab_alloc_node() and
___slab_alloc(). No functional change intended.
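A userspace sketch of why the context struct makes extension cheap. The struct below copies the final field set from the cover letter; model_slow_alloc() and model_debug_orig_size() are hypothetical call-chain functions. Callers using designated initializers, like the ones in this patch, get any later-added member (alloc_flags, lru) zero-initialized by C semantics, i.e. SLAB_ALLOC_DEFAULT and NULL, without touching intermediate signatures.

```c
#include <assert.h>
#include <stddef.h>

struct list_lru;	/* opaque here, only pointed to */

/* Final field set from the cover letter; userspace copy for illustration. */
struct slab_alloc_context {
	unsigned long caller_addr;
	unsigned long orig_size;
	unsigned int alloc_flags;
	struct list_lru *lru;
};

/* Hypothetical deep callee: reads only what it needs from the context. */
static unsigned long model_debug_orig_size(const struct slab_alloc_context *ac)
{
	return ac->orig_size;
}

/* Hypothetical middle layer: forwards the context pointer unchanged. */
static unsigned long model_slow_alloc(const struct slab_alloc_context *ac)
{
	return model_debug_orig_size(ac);
}
```

Adding a field later changes only the struct, the caller that sets it and the callee that reads it; every layer in between keeps passing the same pointer.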

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 46 +++++++++++++++++++++++++++++++++-------------
1 file changed, 33 insertions(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 7b48c0d38404..a3cac7281cc6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -213,6 +213,12 @@ DEFINE_STATIC_KEY_FALSE(slub_debug_enabled);
static DEFINE_STATIC_KEY_FALSE(strict_numa);
#endif

+/* Structure holding extra parameters for slab allocations */
+struct slab_alloc_context {
+ unsigned long caller_addr;
+ unsigned long orig_size;
+};
+
/* Structure holding parameters for get_from_partial() call chain */
struct partial_context {
gfp_t flags;
@@ -3687,7 +3693,8 @@ static inline void init_slab_obj_iter(struct kmem_cache *s, struct slab *slab,
* and put the slab to the partial (or full) list.
*/
static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
- int orig_size, bool allow_spin)
+ struct slab_alloc_context *ac,
+ bool allow_spin)
{
struct kmem_cache_node *n;
struct slab_obj_iter iter;
@@ -3705,7 +3712,7 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
/* alloc_debug_processing() always expects a valid freepointer */
set_freepointer(s, object, slab->freelist);

- if (!alloc_debug_processing(s, slab, object, orig_size)) {
+ if (!alloc_debug_processing(s, slab, object, ac->orig_size)) {
/*
* It's not really expected that this would fail on a
* freshly allocated slab, but a concurrent memory
@@ -4443,7 +4450,7 @@ static unsigned int alloc_from_new_slab(struct kmem_cache *s, struct slab *slab,
* slab.
*/
static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
- unsigned long addr, unsigned int orig_size)
+ struct slab_alloc_context *ac)
{
bool allow_spin = gfpflags_allow_spinning(gfpflags);
void *object;
@@ -4476,7 +4483,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
pc.flags = GFP_NOWAIT | __GFP_THISNODE;
}

- pc.orig_size = orig_size;
+ pc.orig_size = ac->orig_size;
object = get_from_partial(s, node, &pc);
if (object)
goto success;
@@ -4496,7 +4503,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
stat(s, ALLOC_SLAB);

if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
- object = alloc_single_from_new_slab(s, slab, orig_size, allow_spin);
+ object = alloc_single_from_new_slab(s, slab, ac, allow_spin);

if (likely(object))
goto success;
@@ -4514,13 +4521,13 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,

success:
if (kmem_cache_debug_flags(s, SLAB_STORE_USER))
- set_track(s, object, TRACK_ALLOC, addr, gfpflags);
+ set_track(s, object, TRACK_ALLOC, ac->caller_addr, gfpflags);

return object;
}

static void *__slab_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node,
- unsigned long addr, size_t orig_size)
+ struct slab_alloc_context *ac)
{
void *object;

@@ -4545,7 +4552,7 @@ static void *__slab_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node,
}
#endif

- object = ___slab_alloc(s, gfpflags, node, addr, orig_size);
+ object = ___slab_alloc(s, gfpflags, node, ac);

return object;
}
@@ -4923,8 +4930,13 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list

object = alloc_from_pcs(s, gfpflags, node);

- if (unlikely(!object))
- object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);
+ if (unlikely(!object)) {
+ struct slab_alloc_context ac = {
+ .caller_addr = addr,
+ .orig_size = orig_size,
+ };
+ object = __slab_alloc_node(s, gfpflags, node, &ac);
+ }

maybe_wipe_obj_freeptr(s, object);

@@ -5389,13 +5401,18 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
if (ret)
goto success;

+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = orig_size,
+ };
+
/*
* Do not call slab_alloc_node(), since trylock mode isn't
* compatible with slab_pre_alloc_hook/should_failslab and
* kfence_alloc. Hence call __slab_alloc_node() (at most twice)
* and slab_post_alloc_hook() directly.
*/
- ret = __slab_alloc_node(s, alloc_gfp, node, _RET_IP_, orig_size);
+ ret = __slab_alloc_node(s, alloc_gfp, node, &ac);

/*
* It's possible we failed due to trylock as we preempted someone with
@@ -7237,10 +7254,13 @@ static bool __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
int i;

if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = s->object_size,
+ };
for (i = 0; i < size; i++) {

- p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE, _RET_IP_,
- s->object_size);
+ p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE, &ac);
if (unlikely(!p[i]))
goto error;


--
2.54.0

Vlastimil Babka (SUSE)

Jun 10, 2026, 11:41:00 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
Similarly to the page allocator, introduce slab-allocator specific
alloc flags that internally control allocation behavior in addition to
gfp_flags, without occupying the limited gfp flags space.

Introduce the first flag SLAB_ALLOC_TRYLOCK that behaves similarly to
page allocator's ALLOC_TRYLOCK and will be used to reimplement
kmalloc_nolock()'s "!allow_spin" behavior. That currently relies on
gfpflags_allow_spinning() and thus the lack of both __GFP_RECLAIM flags,
importantly __GFP_KSWAPD_RECLAIM. This can give false-positive results
e.g. in early boot with a restricted gfp_allowed_mask.

Also introduce alloc_flags_allow_spinning() to replace the usage of
gfpflags_allow_spinning().

Start using alloc_flags and the new check first in alloc_from_pcs() and
__pcs_replace_empty_main(). This means some slab allocations that were
falsely treated as kmalloc_nolock() due to their gfp flags will now have
higher chances of succeeding, and this will further increase with followup
changes.

Remove a WARN_ON_ONCE() from refill_objects() as it's now legitimate to
reach it from a slab allocation that's not kmalloc_nolock() and yet lacks
__GFP_KSWAPD_RECLAIM for other reasons.

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slab.h | 9 +++++++++
mm/slub.c | 17 ++++++++---------
2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 1bf9c3021ae3..96f65b625600 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -16,6 +16,15 @@
* Internal slab definitions
*/

+/* slab's alloc_flags definitions */
+#define SLAB_ALLOC_DEFAULT 0x00 /* no flags */
+#define SLAB_ALLOC_TRYLOCK 0x01 /* a kmalloc_nolock() allocation */
+
+static inline bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
+{
+ return !(alloc_flags & SLAB_ALLOC_TRYLOCK);
+}
+
#ifdef CONFIG_64BIT
# ifdef system_has_cmpxchg128
# define system_has_freelist_aba() system_has_cmpxchg128()
diff --git a/mm/slub.c b/mm/slub.c
index a3cac7281cc6..e79fbca11bc0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4638,7 +4638,8 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
* unlocked.
*/
static struct slub_percpu_sheaves *
-__pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs, gfp_t gfp)
+__pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
+ gfp_t gfp, unsigned int alloc_flags)
{
struct slab_sheaf *empty = NULL;
struct slab_sheaf *full;
@@ -4664,7 +4665,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
return NULL;
}

- allow_spin = gfpflags_allow_spinning(gfp);
+ allow_spin = alloc_flags_allow_spinning(alloc_flags);

full = barn_replace_empty_sheaf(barn, pcs->main, allow_spin);

@@ -4750,7 +4751,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
}

static __fastpath_inline
-void *alloc_from_pcs(struct kmem_cache *s, gfp_t gfp, int node)
+void *alloc_from_pcs(struct kmem_cache *s, gfp_t gfp, unsigned int alloc_flags, int node)
{
struct slub_percpu_sheaves *pcs;
bool node_requested;
@@ -4795,7 +4796,7 @@ void *alloc_from_pcs(struct kmem_cache *s, gfp_t gfp, int node)
pcs = this_cpu_ptr(s->cpu_sheaves);

if (unlikely(pcs->main->size == 0)) {
- pcs = __pcs_replace_empty_main(s, pcs, gfp);
+ pcs = __pcs_replace_empty_main(s, pcs, gfp, alloc_flags);
if (unlikely(!pcs))
return NULL;
}
@@ -4928,7 +4929,7 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
if (unlikely(object))
goto out;

- object = alloc_from_pcs(s, gfpflags, node);
+ object = alloc_from_pcs(s, gfpflags, SLAB_ALLOC_DEFAULT, node);

if (unlikely(!object)) {
struct slab_alloc_context ac = {
@@ -5359,6 +5360,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
{
gfp_t alloc_gfp = __GFP_NOWARN | __GFP_NOMEMALLOC | gfp_flags;
size_t orig_size = size;
+ unsigned int alloc_flags = SLAB_ALLOC_TRYLOCK;
struct kmem_cache *s;
bool can_retry = true;
void *ret;
@@ -5397,7 +5399,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
*/
return NULL;

- ret = alloc_from_pcs(s, alloc_gfp, node);
+ ret = alloc_from_pcs(s, alloc_gfp, alloc_flags, node);
if (ret)
goto success;

@@ -7216,9 +7218,6 @@ refill_objects(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int min,
unsigned int refilled;
struct slab *slab;

- if (WARN_ON_ONCE(!gfpflags_allow_spinning(gfp)))
- return 0;
-
refilled = __refill_objects_node(s, p, gfp, min, max,
get_node(s, local_node),
/* allow_spin = */ true);

--
2.54.0

Vlastimil Babka (SUSE)

Jun 10, 2026, 11:41:08 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
Add alloc_flags as a new field to the slab_alloc_context helper struct,
so we can pass it to more functions in the slab implementation without
adding another function parameter.

Start checking them via alloc_flags_allow_spinning() in
alloc_single_from_new_slab() (where we can drop the allow_spin
parameter) and ___slab_alloc(). This further reduces false-positive
"spinning not allowed" results for allocations that are not
kmalloc_nolock() but lack __GFP_RECLAIM flags.

_kmalloc_nolock_noprof() initializes ac.alloc_flags from its local
alloc_flags, which is SLAB_ALLOC_TRYLOCK. slab_alloc_node() and __kmem_cache_alloc_bulk()
are not reachable from kmalloc_nolock() and all their callers expect
spinning to be allowed, so they can use SLAB_ALLOC_DEFAULT. This is
temporary as the scope of slab_alloc_context will further move to the
callers, making the alloc_flags usage more obvious.
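The benefit of carrying alloc_flags in the context struct can be shown with a small sketch. The struct mirrors the slab_alloc_context fields at this point in the series; mock_slow_path() and mock_callee() are hypothetical stand-ins for ___slab_alloc() and alloc_single_from_new_slab(), showing how a deep callee can derive allow_spin from the pointer it already receives instead of taking an extra bool parameter.

```c
#include <assert.h>
#include <stdbool.h>

#define SLAB_ALLOC_DEFAULT 0x00
#define SLAB_ALLOC_TRYLOCK 0x01

/* Simplified mirror of slab_alloc_context as of this patch. */
struct slab_alloc_context {
	unsigned long caller_addr;
	unsigned long orig_size;
	unsigned int alloc_flags;
};

static bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
{
	return !(alloc_flags & SLAB_ALLOC_TRYLOCK);
}

/*
 * Hypothetical deep callee: derives allow_spin from the context.
 * Returns 1 if it would lock unconditionally, 0 if it would only try-lock.
 */
static int mock_callee(const struct slab_alloc_context *ac)
{
	bool allow_spin = alloc_flags_allow_spinning(ac->alloc_flags);

	return allow_spin ? 1 : 0;
}

/* Hypothetical middle layer: forwards the same pointer, no extra parameters. */
static int mock_slow_path(const struct slab_alloc_context *ac)
{
	return mock_callee(ac);
}
```

Adding a field to the struct changes no function signatures along the call chain, which is what lets the series keep extending slab_alloc_context without further parameter churn.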

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index e79fbca11bc0..ef745b37d063 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -217,6 +217,7 @@ static DEFINE_STATIC_KEY_FALSE(strict_numa);
struct slab_alloc_context {
unsigned long caller_addr;
unsigned long orig_size;
+ unsigned int alloc_flags;
};

/* Structure holding parameters for get_from_partial() call chain */
@@ -3693,9 +3694,9 @@ static inline void init_slab_obj_iter(struct kmem_cache *s, struct slab *slab,
* and put the slab to the partial (or full) list.
*/
static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
- struct slab_alloc_context *ac,
- bool allow_spin)
+ struct slab_alloc_context *ac)
{
+ bool allow_spin = alloc_flags_allow_spinning(ac->alloc_flags);
struct kmem_cache_node *n;
struct slab_obj_iter iter;
bool needs_add_partial;
@@ -4452,7 +4453,7 @@ static unsigned int alloc_from_new_slab(struct kmem_cache *s, struct slab *slab,
static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
struct slab_alloc_context *ac)
{
- bool allow_spin = gfpflags_allow_spinning(gfpflags);
+ bool allow_spin = alloc_flags_allow_spinning(ac->alloc_flags);
void *object;
struct slab *slab;
struct partial_context pc;
@@ -4503,7 +4504,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
stat(s, ALLOC_SLAB);

if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
- object = alloc_single_from_new_slab(s, slab, ac, allow_spin);
+ object = alloc_single_from_new_slab(s, slab, ac);

if (likely(object))
goto success;
@@ -4919,6 +4920,7 @@ unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, gfp_t gfp, size_t size,
static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
{
+ const unsigned int alloc_flags = SLAB_ALLOC_DEFAULT;
void *object;

s = slab_pre_alloc_hook(s, gfpflags);
@@ -4929,12 +4931,13 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
if (unlikely(object))
goto out;

- object = alloc_from_pcs(s, gfpflags, SLAB_ALLOC_DEFAULT, node);
+ object = alloc_from_pcs(s, gfpflags, alloc_flags, node);

if (unlikely(!object)) {
struct slab_alloc_context ac = {
.caller_addr = addr,
.orig_size = orig_size,
+ .alloc_flags = alloc_flags,
};
object = __slab_alloc_node(s, gfpflags, node, &ac);
}
@@ -5406,6 +5409,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
struct slab_alloc_context ac = {
.caller_addr = _RET_IP_,
.orig_size = orig_size,
+ .alloc_flags = alloc_flags,
};

/*
@@ -7256,6 +7260,7 @@ static bool __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
struct slab_alloc_context ac = {
.caller_addr = _RET_IP_,
.orig_size = s->object_size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
};
for (i = 0; i < size; i++) {


--
2.54.0

Vlastimil Babka (SUSE)

Jun 10, 2026, 11:41:12 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
Refactor get_from_partial_node(), get_from_any_partial(),
get_from_partial() and ___slab_alloc().

Remove struct partial_context, which used to be more substantial but
shrank as part of the sheaves conversion. Instead pass gfp_flags and a
pointer to the new slab_alloc_context, which together form a superset of
partial_context.

This means alloc_flags are now available and we can use them to
determine if spinning is allowed, further reducing false-positive "spinning
not allowed" results in the slow path due to gfp flags lacking __GFP_RECLAIM.
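The split of responsibilities after this change (gfp flags still passed for pfmemalloc_match(), gfp_zone() and cpuset_zone_allowed(), alloc_flags used for the locking decision) can be sketched as follows. The single-threaded mock_lock and the lock_outcome enum are hypothetical; they only model the choice get_from_partial_node() makes between spin_lock_irqsave() and spin_trylock_irqsave().

```c
#include <assert.h>
#include <stdbool.h>

#define SLAB_ALLOC_DEFAULT 0x00
#define SLAB_ALLOC_TRYLOCK 0x01

struct slab_alloc_context {
	unsigned long caller_addr;
	unsigned long orig_size;
	unsigned int alloc_flags;
};

/* Hypothetical single-threaded lock model; not a real spinlock. */
struct mock_lock {
	bool held;
};

enum lock_outcome { LOCK_SPUN, LOCK_TRIED_OK, LOCK_GAVE_UP };

static bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
{
	return !(alloc_flags & SLAB_ALLOC_TRYLOCK);
}

static bool mock_trylock(struct mock_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

/* Models the locking decision in get_from_partial_node(). */
static enum lock_outcome mock_lock_partial_list(struct mock_lock *l,
						const struct slab_alloc_context *ac)
{
	if (alloc_flags_allow_spinning(ac->alloc_flags))
		return LOCK_SPUN;	/* would call spin_lock_irqsave() and wait */
	/* would call spin_trylock_irqsave() and bail out on contention */
	return mock_trylock(l) ? LOCK_TRIED_OK : LOCK_GAVE_UP;
}
```

Because the decision now depends only on alloc_flags, a normal allocation whose gfp happens to lack __GFP_RECLAIM will wait for a contended list_lock instead of giving up.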

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 52 ++++++++++++++++++++++++----------------------------
1 file changed, 24 insertions(+), 28 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index ef745b37d063..98b79e5e7679 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -220,12 +220,6 @@ struct slab_alloc_context {
unsigned int alloc_flags;
};

-/* Structure holding parameters for get_from_partial() call chain */
-struct partial_context {
- gfp_t flags;
- unsigned int orig_size;
-};
-
/* Structure holding parameters for get_partial_node_bulk() */
struct partial_bulk_context {
gfp_t flags;
@@ -3826,7 +3820,8 @@ static bool get_partial_node_bulk(struct kmem_cache *s,
*/
static void *get_from_partial_node(struct kmem_cache *s,
struct kmem_cache_node *n,
- struct partial_context *pc)
+ gfp_t gfp_flags,
+ struct slab_alloc_context *ac)
{
struct slab *slab, *slab2;
unsigned long flags;
@@ -3841,7 +3836,7 @@ static void *get_from_partial_node(struct kmem_cache *s,
if (!n || !n->nr_partial)
return NULL;

- if (gfpflags_allow_spinning(pc->flags))
+ if (alloc_flags_allow_spinning(ac->alloc_flags))
spin_lock_irqsave(&n->list_lock, flags);
else if (!spin_trylock_irqsave(&n->list_lock, flags))
return NULL;
@@ -3849,12 +3844,12 @@ static void *get_from_partial_node(struct kmem_cache *s,

struct freelist_counters old, new;

- if (!pfmemalloc_match(slab, pc->flags))
+ if (!pfmemalloc_match(slab, gfp_flags))
continue;

if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
object = alloc_single_from_partial(s, n, slab,
- pc->orig_size);
+ ac->orig_size);
if (object)
break;
continue;
@@ -3888,15 +3883,16 @@ static void *get_from_partial_node(struct kmem_cache *s,
/*
* Get an object from somewhere. Search in increasing NUMA distances.
*/
-static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *pc)
+static void *get_from_any_partial(struct kmem_cache *s, gfp_t gfp_flags,
+ struct slab_alloc_context *ac)
{
#ifdef CONFIG_NUMA
struct zonelist *zonelist;
struct zoneref *z;
struct zone *zone;
- enum zone_type highest_zoneidx = gfp_zone(pc->flags);
+ enum zone_type highest_zoneidx = gfp_zone(gfp_flags);
unsigned int cpuset_mems_cookie;
- bool allow_spin = gfpflags_allow_spinning(pc->flags);
+ bool allow_spin = alloc_flags_allow_spinning(ac->alloc_flags);

/*
* The defrag ratio allows a configuration of the tradeoffs between
@@ -3930,16 +3926,17 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
if (allow_spin)
cpuset_mems_cookie = read_mems_allowed_begin();

- zonelist = node_zonelist(mempolicy_slab_node(), pc->flags);
+ zonelist = node_zonelist(mempolicy_slab_node(), gfp_flags);
for_each_zone_zonelist(zone, z, zonelist, highest_zoneidx) {
struct kmem_cache_node *n;

n = get_node(s, zone_to_nid(zone));

- if (n && cpuset_zone_allowed(zone, pc->flags) &&
+ if (n && cpuset_zone_allowed(zone, gfp_flags) &&
n->nr_partial > s->min_partial) {

- void *object = get_from_partial_node(s, n, pc);
+ void *object = get_from_partial_node(s, n,
+ gfp_flags, ac);

if (object) {
/*
@@ -3961,8 +3958,8 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
/*
* Get an object from a partial slab
*/
-static void *get_from_partial(struct kmem_cache *s, int node,
- struct partial_context *pc)
+static void *get_from_partial(struct kmem_cache *s, int node, gfp_t flags,
+ struct slab_alloc_context *ac)
{
int searchnode = node;
void *object;
@@ -3970,11 +3967,11 @@ static void *get_from_partial(struct kmem_cache *s, int node,
if (node == NUMA_NO_NODE)
searchnode = numa_mem_id();

- object = get_from_partial_node(s, get_node(s, searchnode), pc);
- if (object || (node != NUMA_NO_NODE && (pc->flags & __GFP_THISNODE)))
+ object = get_from_partial_node(s, get_node(s, searchnode), flags, ac);
+ if (object || (node != NUMA_NO_NODE && (flags & __GFP_THISNODE)))
return object;

- return get_from_any_partial(s, pc);
+ return get_from_any_partial(s, flags, ac);
}

static bool has_pcs_used(int cpu, struct kmem_cache *s)
@@ -4454,16 +4451,16 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
struct slab_alloc_context *ac)
{
bool allow_spin = alloc_flags_allow_spinning(ac->alloc_flags);
+ gfp_t trynode_flags;
void *object;
struct slab *slab;
- struct partial_context pc;
bool try_thisnode = true;

stat(s, ALLOC_SLOWPATH);

new_objects:

- pc.flags = gfpflags;
+ trynode_flags = gfpflags;
/*
* When a preferred node is indicated but no __GFP_THISNODE
*
@@ -4479,17 +4476,16 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
&& try_thisnode)) {
if (unlikely(!allow_spin))
/* Do not upgrade gfp to NOWAIT from more restrictive mode */
- pc.flags = gfpflags | __GFP_THISNODE;
+ trynode_flags = gfpflags | __GFP_THISNODE;
else
- pc.flags = GFP_NOWAIT | __GFP_THISNODE;
+ trynode_flags = GFP_NOWAIT | __GFP_THISNODE;
}

- pc.orig_size = ac->orig_size;
- object = get_from_partial(s, node, &pc);
+ object = get_from_partial(s, node, trynode_flags, ac);
if (object)
goto success;

- slab = new_slab(s, pc.flags, node);
+ slab = new_slab(s, trynode_flags, node);

if (unlikely(!slab)) {
if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE)

--
2.54.0

Vlastimil Babka (SUSE)

Jun 10, 2026, 11:41:20 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
Add the alloc_flags parameter to allocate_slab() and new_slab()
so it can be used to determine if spinning is allowed, independently
of gfp flags.

refill_objects() passes SLAB_ALLOC_DEFAULT because it can only be
reached from contexts that allow spinning.

Also change how trynode_flags are constructed in ___slab_alloc() to
achieve the same "do not upgrade to GFP_NOWAIT" by using masking instead
of a branch. It will now also avoid upgrading when gfp is weaker
than GFP_NOWAIT (i.e. lacks __GFP_KSWAPD_RECLAIM) but does not come from
kmalloc_nolock(), which is more correct anyway.

During the masking, also keep any existing __GFP_NOMEMALLOC (pointed out by
Sashiko) and __GFP_ACCOUNT. Previously the hardcoded GFP_NOWAIT would
eliminate them, but it's not a big problem that would need a separate
fix.
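The masking that replaces the old branch can be checked with a small sketch. All MOCK_GFP_* values are hypothetical and simplified (in particular, MOCK_GFP_NOWAIT is modelled as just the kswapd reclaim bit); only the two bitwise statements in trynode_flags_from() mirror the patch.

```c
#include <assert.h>

/* Hypothetical, simplified gfp bits; real values are in include/linux/gfp_types.h. */
#define MOCK_GFP_DIRECT_RECLAIM 0x001u
#define MOCK_GFP_KSWAPD_RECLAIM 0x002u
#define MOCK_GFP_IO             0x004u
#define MOCK_GFP_FS             0x008u
#define MOCK_GFP_NOWARN         0x010u
#define MOCK_GFP_THISNODE       0x020u
#define MOCK_GFP_NOMEMALLOC     0x040u
#define MOCK_GFP_ACCOUNT        0x080u

#define MOCK_GFP_RECLAIM (MOCK_GFP_DIRECT_RECLAIM | MOCK_GFP_KSWAPD_RECLAIM)
#define MOCK_GFP_KERNEL  (MOCK_GFP_RECLAIM | MOCK_GFP_IO | MOCK_GFP_FS)
/* Simplification: GFP_NOWAIT modelled as only the kswapd reclaim bit. */
#define MOCK_GFP_NOWAIT  MOCK_GFP_KSWAPD_RECLAIM

/* Mirrors the masking in ___slab_alloc() for the "try this node" attempt. */
static unsigned int trynode_flags_from(unsigned int gfpflags)
{
	unsigned int trynode_flags = gfpflags;

	/* Downgrade to at most GFP_NOWAIT; keep NOMEMALLOC and ACCOUNT if present. */
	trynode_flags &= MOCK_GFP_NOWAIT | MOCK_GFP_NOMEMALLOC | MOCK_GFP_ACCOUNT;
	/* Opportunistic attempt restricted to the preferred node, without warnings. */
	trynode_flags |= MOCK_GFP_NOWARN | MOCK_GFP_THISNODE;
	return trynode_flags;
}
```

Since the mask can only clear bits of the original gfp, a request that lacked __GFP_KSWAPD_RECLAIM never gains it, which matches the "do not upgrade" intent without consulting allow_spin.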

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 98b79e5e7679..8f6ca3d5fdfa 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3378,9 +3378,10 @@ static __always_inline void unaccount_slab(struct slab *slab, int order,
}

/* Allocate and initialize a slab without building its freelist. */
-static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
+static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags,
+ unsigned int alloc_flags, int node)
{
- bool allow_spin = gfpflags_allow_spinning(flags);
+ bool allow_spin = alloc_flags_allow_spinning(alloc_flags);
struct slab *slab;
struct kmem_cache_order_objects oo = s->oo;
gfp_t alloc_gfp;
@@ -3438,15 +3439,17 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
return slab;
}

-static struct slab *new_slab(struct kmem_cache *s, gfp_t flags, int node)
+static struct slab *new_slab(struct kmem_cache *s, gfp_t flags,
+ unsigned int alloc_flags, int node)
{
if (unlikely(flags & GFP_SLAB_BUG_MASK))
flags = kmalloc_fix_flags(flags);

WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));

- return allocate_slab(s,
- flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
+ flags &= GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK;
+
+ return allocate_slab(s, flags, alloc_flags, node);
}

static void __free_slab(struct kmem_cache *s, struct slab *slab, bool allow_spin)
@@ -4467,25 +4470,22 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
* 1) try to get a partial slab from target node only by having
* __GFP_THISNODE in pc.flags for get_from_partial()
* 2) if 1) failed, try to allocate a new slab from target node with
- * GPF_NOWAIT | __GFP_THISNODE opportunistically
+ * (at most) GFP_NOWAIT | __GFP_THISNODE opportunistically
* 3) if 2) failed, retry with original gfpflags which will allow
* get_from_partial() try partial lists of other nodes before
* potentially allocating new page from other nodes
*/
if (unlikely(node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE)
&& try_thisnode)) {
- if (unlikely(!allow_spin))
- /* Do not upgrade gfp to NOWAIT from more restrictive mode */
- trynode_flags = gfpflags | __GFP_THISNODE;
- else
- trynode_flags = GFP_NOWAIT | __GFP_THISNODE;
+ trynode_flags &= GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_ACCOUNT;
+ trynode_flags |= __GFP_NOWARN | __GFP_THISNODE;
}

object = get_from_partial(s, node, trynode_flags, ac);
if (object)
goto success;

- slab = new_slab(s, trynode_flags, node);
+ slab = new_slab(s, trynode_flags, ac->alloc_flags, node);

if (unlikely(!slab)) {
if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE)
@@ -7231,7 +7231,7 @@ refill_objects(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int min,

new_slab:

- slab = new_slab(s, gfp, local_node);
+ slab = new_slab(s, gfp, SLAB_ALLOC_DEFAULT, local_node);
if (!slab)
goto out;

@@ -7579,7 +7579,7 @@ static void early_kmem_cache_node_alloc(int node)

BUG_ON(kmem_cache_node->size < sizeof(struct kmem_cache_node));

- slab = new_slab(kmem_cache_node, GFP_NOWAIT, node);
+ slab = new_slab(kmem_cache_node, GFP_NOWAIT, SLAB_ALLOC_DEFAULT, node);

BUG_ON(!slab);
if (slab_nid(slab) != node) {

--
2.54.0

Vlastimil Babka (SUSE)

Jun 10, 2026, 11:41:28 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
Convert the whole following call stack to pass either slab_alloc_context
(thus including alloc_flags) or just alloc_flags as necessary:

slab_post_alloc_hook()
alloc_tagging_slab_alloc_hook()
__alloc_tagging_slab_alloc_hook()
prepare_slab_obj_exts_hook()
alloc_slab_obj_exts()
memcg_slab_post_alloc_hook()
__memcg_slab_post_alloc_hook()
alloc_slab_obj_exts()

Converting all these at once avoids unnecessary churn and is mostly
mechanical.

This ultimately allows deciding whether spinning is allowed using
alloc_flags in alloc_slab_obj_exts(), as well as in slab_post_alloc_hook().
Aside from alloc_from_pcs_bulk() (to be handled next), nothing else in
slab itself relies on gfpflags_allow_spinning(), which can return false
even when not called from kmalloc_nolock().

A followup change will also use the alloc_flags availability in the call
stack above to remove the __GFP_NO_OBJ_EXT flag.

For alloc_slab_obj_exts(), also replace the suboptimal "bool new_slab"
parameter with a SLAB_ALLOC_NEW_SLAB flag with identical functionality.

To further reduce the number of parameters of slab_post_alloc_hook(),
also make 'struct list_lru *lru' (which is NULL for most callers) a new
field of slab_alloc_context.
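Replacing the "bool new_slab" parameter with a flag bit lets one unsigned int carry both pieces of information down the call stack. In the sketch below, struct obj_exts_decision and all mock_* functions are hypothetical; mock_account_slab() mirrors how account_slab() ORs SLAB_ALLOC_NEW_SLAB into the caller's flags, and mock_post_alloc_hook() mirrors the post-alloc hooks forwarding ac->alloc_flags unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLAB_ALLOC_DEFAULT  0x00
#define SLAB_ALLOC_TRYLOCK  0x01
#define SLAB_ALLOC_NEW_SLAB 0x02

struct list_lru;	/* opaque here; only a pointer is stored */

struct slab_alloc_context {
	unsigned long caller_addr;
	unsigned long orig_size;
	unsigned int alloc_flags;
	struct list_lru *lru;
};

/* Hypothetical summary of the two decisions alloc_slab_obj_exts() makes. */
struct obj_exts_decision {
	bool allow_spin;
	bool unsynchronized_install;	/* formerly 'bool new_slab' */
};

static struct obj_exts_decision mock_alloc_slab_obj_exts(unsigned int alloc_flags)
{
	struct obj_exts_decision d = {
		.allow_spin = !(alloc_flags & SLAB_ALLOC_TRYLOCK),
		.unsynchronized_install = (alloc_flags & SLAB_ALLOC_NEW_SLAB) != 0,
	};
	return d;
}

/* Like account_slab(): add NEW_SLAB on top of the caller's flags. */
static struct obj_exts_decision mock_account_slab(unsigned int alloc_flags)
{
	return mock_alloc_slab_obj_exts(alloc_flags | SLAB_ALLOC_NEW_SLAB);
}

/* Like the post-alloc hooks: forward the context's flags unchanged. */
static struct obj_exts_decision mock_post_alloc_hook(const struct slab_alloc_context *ac)
{
	return mock_alloc_slab_obj_exts(ac->alloc_flags);
}
```

A kmalloc_nolock() allocation that creates a new slab thus reaches alloc_slab_obj_exts() with both "no spinning" and "brand new slab" set, which a single bool parameter could not express.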

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/memcontrol.c | 5 +--
mm/slab.h | 6 ++--
mm/slub.c | 94 +++++++++++++++++++++++++++++++++------------------------
3 files changed, 62 insertions(+), 43 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c03d4787d466..29390ba13baa 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3424,7 +3424,8 @@ static inline size_t obj_full_size(struct kmem_cache *s)
}

bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
- gfp_t flags, size_t size, void **p)
+ gfp_t flags, unsigned int slab_alloc_flags,
+ size_t size, void **p)
{
size_t obj_size = obj_full_size(s);
struct obj_cgroup *objcg;
@@ -3472,7 +3473,7 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
slab = virt_to_slab(p[i]);

if (!slab_obj_exts(slab) &&
- alloc_slab_obj_exts(slab, s, flags, false)) {
+ alloc_slab_obj_exts(slab, s, flags, slab_alloc_flags)) {
continue;
}

diff --git a/mm/slab.h b/mm/slab.h
index 96f65b625600..4db6d8aa0ee3 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -19,6 +19,7 @@
/* slab's alloc_flags definitions */
#define SLAB_ALLOC_DEFAULT 0x00 /* no flags */
#define SLAB_ALLOC_TRYLOCK 0x01 /* a kmalloc_nolock() allocation */
+#define SLAB_ALLOC_NEW_SLAB 0x02 /* a flag for alloc_slab_obj_exts() */

static inline bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
{
@@ -612,7 +613,7 @@ static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
}

int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
- gfp_t gfp, bool new_slab);
+ gfp_t gfp, unsigned int alloc_flags);

#else /* CONFIG_SLAB_OBJ_EXT */

@@ -642,7 +643,8 @@ static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)

#ifdef CONFIG_MEMCG
bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
- gfp_t flags, size_t size, void **p);
+ gfp_t flags, unsigned int slab_alloc_flags,
+ size_t size, void **p);
void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
void **p, int objects, unsigned long obj_exts);
#endif
diff --git a/mm/slub.c b/mm/slub.c
index 8f6ca3d5fdfa..e634137b67fa 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -218,6 +218,7 @@ struct slab_alloc_context {
unsigned long caller_addr;
unsigned long orig_size;
unsigned int alloc_flags;
+ struct list_lru *lru;
};

/* Structure holding parameters for get_partial_node_bulk() */
@@ -2155,9 +2156,9 @@ static inline size_t obj_exts_alloc_size(struct kmem_cache *s,
}

int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
- gfp_t gfp, bool new_slab)
+ gfp_t gfp, unsigned int alloc_flags)
{
- bool allow_spin = gfpflags_allow_spinning(gfp);
+ const bool allow_spin = alloc_flags_allow_spinning(alloc_flags);
unsigned int objects = objs_per_slab(s, slab);
unsigned long new_exts;
unsigned long old_exts;
@@ -2206,7 +2207,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
old_exts = READ_ONCE(slab->obj_exts);
handle_failed_objexts_alloc(old_exts, vec, objects);

- if (new_slab) {
+ if (alloc_flags & SLAB_ALLOC_NEW_SLAB) {
/*
* If the slab is brand new and nobody can yet access its
* obj_exts, no synchronization is required and obj_exts can
@@ -2331,7 +2332,7 @@ static inline void init_slab_obj_exts(struct slab *slab)
}

static int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
- gfp_t gfp, bool new_slab)
+ gfp_t gfp, unsigned int alloc_flags)
{
return 0;
}
@@ -2351,10 +2352,10 @@ static inline void alloc_slab_obj_exts_early(struct kmem_cache *s,

static inline unsigned long
prepare_slab_obj_exts_hook(struct kmem_cache *s, struct slab *slab,
- gfp_t flags, void *p)
+ gfp_t flags, unsigned int alloc_flags, void *p)
{
if (!slab_obj_exts(slab) &&
- alloc_slab_obj_exts(slab, s, flags, false)) {
+ alloc_slab_obj_exts(slab, s, flags, alloc_flags)) {
pr_warn_once("%s, %s: Failed to create slab extension vector!\n",
__func__, s->name);
return 0;
@@ -2366,7 +2367,8 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, struct slab *slab,

/* Should be called only if mem_alloc_profiling_enabled() */
static noinline void
-__alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
+__alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags,
+ unsigned int alloc_flags)
{
unsigned long obj_exts;
struct slabobj_ext *obj_ext;
@@ -2382,7 +2384,7 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
return;

slab = virt_to_slab(object);
- obj_exts = prepare_slab_obj_exts_hook(s, slab, flags, object);
+ obj_exts = prepare_slab_obj_exts_hook(s, slab, flags, alloc_flags, object);
/*
* Currently obj_exts is used only for allocation profiling.
* If other users appear then mem_alloc_profiling_enabled()
@@ -2401,10 +2403,11 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
}

static inline void
-alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
+alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags,
+ unsigned int alloc_flags)
{
if (mem_alloc_profiling_enabled())
- __alloc_tagging_slab_alloc_hook(s, object, flags);
+ __alloc_tagging_slab_alloc_hook(s, object, flags, alloc_flags);
}

/* Should be called only if mem_alloc_profiling_enabled() */
@@ -2443,7 +2446,8 @@ alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
#else /* CONFIG_MEM_ALLOC_PROFILING */

static inline void
-alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
+alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags,
+ unsigned int alloc_flags)
{
}

@@ -2461,8 +2465,9 @@ alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
static void memcg_alloc_abort_single(struct kmem_cache *s, void *object);

static __fastpath_inline
-bool memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
- gfp_t flags, size_t size, void **p)
+bool memcg_slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags,
+ size_t size, void **p,
+ struct slab_alloc_context *ac)
{
if (likely(!memcg_kmem_online()))
return true;
@@ -2470,7 +2475,8 @@ bool memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
if (likely(!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT)))
return true;

- if (likely(__memcg_slab_post_alloc_hook(s, lru, flags, size, p)))
+ if (likely(__memcg_slab_post_alloc_hook(s, ac->lru, flags,
+ ac->alloc_flags, size, p)))
return true;

if (likely(size == 1)) {
@@ -2558,14 +2564,15 @@ bool memcg_slab_post_charge(void *p, gfp_t flags)
put_slab_obj_exts(obj_exts);
}

- return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
+ return __memcg_slab_post_alloc_hook(s, NULL, flags, SLAB_ALLOC_DEFAULT,
+ 1, &p);
}

#else /* CONFIG_MEMCG */
static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s,
- struct list_lru *lru,
- gfp_t flags, size_t size,
- void **p)
+ gfp_t flags,
+ size_t size, void **p,
+ struct slab_alloc_context *ac)
{
return true;
}
@@ -3352,12 +3359,14 @@ static inline void init_freelist_randomization(void) { }
#endif /* CONFIG_SLAB_FREELIST_RANDOM */

static __always_inline void account_slab(struct slab *slab, int order,
- struct kmem_cache *s, gfp_t gfp)
+ struct kmem_cache *s, gfp_t gfp,
+ unsigned int alloc_flags)
{
if (memcg_kmem_online() &&
(s->flags & SLAB_ACCOUNT) &&
!slab_obj_exts(slab))
- alloc_slab_obj_exts(slab, s, gfp, true);
+ alloc_slab_obj_exts(slab, s, gfp,
+ alloc_flags | SLAB_ALLOC_NEW_SLAB);

mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
PAGE_SIZE << order);
@@ -3434,7 +3443,7 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags,
* to prevent the array from being overwritten.
*/
alloc_slab_obj_exts_early(s, slab);
- account_slab(slab, oo_order(oo), s, flags);
+ account_slab(slab, oo_order(oo), s, flags, alloc_flags);

return slab;
}
@@ -4568,9 +4577,8 @@ struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags)
}

static __fastpath_inline
-bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
- gfp_t flags, size_t size, void **p,
- unsigned int orig_size)
+bool slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags, size_t size,
+ void **p, struct slab_alloc_context *ac)
{
bool init = slab_want_init_on_alloc(flags, s);
unsigned int zero_size = s->object_size;
@@ -4590,7 +4598,7 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
* orig_size if we track it.
*/
if (slub_debug_orig_size(s))
- zero_size = orig_size;
+ zero_size = ac->orig_size;

/*
* When slab_debug is enabled, avoid memory initialization integrated
@@ -4616,14 +4624,14 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
!kasan_has_integrated_init())
&& !is_kfence_address(p[i]))
memset(p[i], 0, zero_size);
- if (gfpflags_allow_spinning(flags))
+ if (alloc_flags_allow_spinning(ac->alloc_flags))
kmemleak_alloc_recursive(p[i], s->object_size, 1,
s->flags, init_flags);
kmsan_slab_alloc(s, p[i], init_flags);
- alloc_tagging_slab_alloc_hook(s, p[i], flags);
+ alloc_tagging_slab_alloc_hook(s, p[i], flags, ac->alloc_flags);
}

- return memcg_slab_post_alloc_hook(s, lru, flags, size, p);
+ return memcg_slab_post_alloc_hook(s, flags, size, p, ac);
}

/*
@@ -4918,6 +4926,12 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
{
const unsigned int alloc_flags = SLAB_ALLOC_DEFAULT;
void *object;
+ struct slab_alloc_context ac = {
+ .caller_addr = addr,
+ .orig_size = orig_size,
+ .alloc_flags = alloc_flags,
+ .lru = lru,
+ };

s = slab_pre_alloc_hook(s, gfpflags);
if (unlikely(!s))
@@ -4929,14 +4943,8 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list

object = alloc_from_pcs(s, gfpflags, alloc_flags, node);

- if (unlikely(!object)) {
- struct slab_alloc_context ac = {
- .caller_addr = addr,
- .orig_size = orig_size,
- .alloc_flags = alloc_flags,
- };
+ if (!object)
object = __slab_alloc_node(s, gfpflags, node, &ac);
- }

maybe_wipe_obj_freeptr(s, object);

@@ -4945,7 +4953,7 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
* In case this fails due to memcg_slab_post_alloc_hook(),
* object is set to NULL
*/
- slab_post_alloc_hook(s, lru, gfpflags, 1, &object, orig_size);
+ slab_post_alloc_hook(s, gfpflags, 1, &object, &ac);

return object;
}
@@ -5240,6 +5248,10 @@ kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
struct slab_sheaf *sheaf)
{
void *ret = NULL;
+ struct slab_alloc_context ac = {
+ .orig_size = s->object_size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };

if (sheaf->size == 0)
goto out;
@@ -5250,7 +5262,7 @@ kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
ret = sheaf->objects[--sheaf->size];

/* add __GFP_NOFAIL to force successful memcg charging */
- slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, s->object_size);
+ slab_post_alloc_hook(s, gfp | __GFP_NOFAIL, 1, &ret, &ac);
out:
trace_kmem_cache_alloc(_RET_IP_, ret, s, gfp, NUMA_NO_NODE);

@@ -5437,7 +5449,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in

success:
maybe_wipe_obj_freeptr(s, ret);
- slab_post_alloc_hook(s, NULL, alloc_gfp, 1, &ret, orig_size);
+ slab_post_alloc_hook(s, alloc_gfp, 1, &ret, &ac);

ret = kasan_kmalloc(s, ret, orig_size, alloc_gfp);
return ret;
@@ -7303,6 +7315,10 @@ bool kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags,
{
unsigned int i = 0;
void *kfence_obj;
+ struct slab_alloc_context ac = {
+ .orig_size = s->object_size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };

if (!size)
return false;
@@ -7353,7 +7369,7 @@ bool kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags,

out:
/* memcg and kmem_cache debug support and memory initialization */
- return likely(slab_post_alloc_hook(s, NULL, flags, size, p, s->object_size));
+ return likely(slab_post_alloc_hook(s, flags, size, p, &ac));
}
EXPORT_SYMBOL(kmem_cache_alloc_bulk_noprof);


--
2.54.0

Vlastimil Babka (SUSE)

Jun 10, 2026, 11:41:35 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
The function takes all the parameters that exist as fields in
slab_alloc_context, except alloc_flags. Replace them with a single
pointer.

This moves slab_alloc_context initialization to a number of callers,
which is more verbose, but arguably also clearer than a long list of
parameters, and most callers do not need to set the 'lru' field.

This will also allow kmalloc_nolock() to call slab_alloc_node() and
reduce the special open-coding it currently has.
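The trade-off above can be illustrated with a small userspace sketch. It uses a simplified copy of struct slab_alloc_context (only the four fields the patch initializes) and hypothetical helpers model_default_ctx() and model_lru_ctx(); it is not the kernel definition. The point it shows is the C rule the callers rely on: members left out of a designated initializer are zero-initialized, so callers that don't care about 'lru' get NULL without naming it.

```c
#include <assert.h>
#include <stddef.h>

#define SLAB_ALLOC_DEFAULT 0x00u /* same value as in mm/slab.h */

struct list_lru; /* opaque here, as for most users in the kernel */

/* Simplified model of struct slab_alloc_context; the real one may differ. */
struct slab_alloc_context {
	unsigned long caller_addr;
	size_t orig_size;
	unsigned int alloc_flags;
	struct list_lru *lru;
};

/*
 * Hypothetical helper mirroring what kmem_cache_alloc_noprof() now does:
 * .lru is not named, so it is zero-initialized (NULL).
 */
static struct slab_alloc_context model_default_ctx(unsigned long caller,
						    size_t size)
{
	struct slab_alloc_context ac = {
		.caller_addr = caller,
		.orig_size = size,
		.alloc_flags = SLAB_ALLOC_DEFAULT,
	};
	return ac;
}

/* Hypothetical helper mirroring kmem_cache_alloc_lru_noprof(). */
static struct slab_alloc_context model_lru_ctx(unsigned long caller,
						size_t size,
						struct list_lru *lru)
{
	struct slab_alloc_context ac = model_default_ctx(caller, size);

	assert(lru != NULL); /* this variant exists only to pass an lru */
	ac.lru = lru;
	return ac;
}
```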

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 75 ++++++++++++++++++++++++++++++++++++++++++++-------------------
1 file changed, 53 insertions(+), 22 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index e634137b67fa..0b9974bfcb24 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4921,30 +4921,23 @@ unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, gfp_t gfp, size_t size,
*
* Otherwise we can simply pick the next object from the lockless free list.
*/
-static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru,
- gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
+static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s,
+ gfp_t gfpflags, int node, struct slab_alloc_context *ac)
{
- const unsigned int alloc_flags = SLAB_ALLOC_DEFAULT;
void *object;
- struct slab_alloc_context ac = {
- .caller_addr = addr,
- .orig_size = orig_size,
- .alloc_flags = alloc_flags,
- .lru = lru,
- };

s = slab_pre_alloc_hook(s, gfpflags);
if (unlikely(!s))
return NULL;

- object = kfence_alloc(s, orig_size, gfpflags);
+ object = kfence_alloc(s, ac->orig_size, gfpflags);
if (unlikely(object))
goto out;

- object = alloc_from_pcs(s, gfpflags, alloc_flags, node);
+ object = alloc_from_pcs(s, gfpflags, ac->alloc_flags, node);

if (!object)
- object = __slab_alloc_node(s, gfpflags, node, &ac);
+ object = __slab_alloc_node(s, gfpflags, node, ac);

maybe_wipe_obj_freeptr(s, object);

@@ -4953,15 +4946,21 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
* In case this fails due to memcg_slab_post_alloc_hook(),
* object is set to NULL
*/
- slab_post_alloc_hook(s, gfpflags, 1, &object, &ac);
+ slab_post_alloc_hook(s, gfpflags, 1, &object, ac);

return object;
}

void *kmem_cache_alloc_noprof(struct kmem_cache *s, gfp_t gfpflags)
{
- void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE, _RET_IP_,
- s->object_size);
+ void *ret;
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = s->object_size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };
+
+ ret = slab_alloc_node(s, gfpflags, NUMA_NO_NODE, &ac);

trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);

@@ -4972,8 +4971,15 @@ EXPORT_SYMBOL(kmem_cache_alloc_noprof);
void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags)
{
- void *ret = slab_alloc_node(s, lru, gfpflags, NUMA_NO_NODE, _RET_IP_,
- s->object_size);
+ void *ret;
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = s->object_size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ .lru = lru,
+ };
+
+ ret = slab_alloc_node(s, gfpflags, NUMA_NO_NODE, &ac);

trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);

@@ -5005,7 +5011,14 @@ EXPORT_SYMBOL(kmem_cache_charge);
*/
void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t gfpflags, int node)
{
- void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
+ void *ret;
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = s->object_size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };
+
+ ret = slab_alloc_node(s, gfpflags, node, &ac);

trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, node);

@@ -5335,6 +5348,11 @@ void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
{
struct kmem_cache *s;
void *ret;
+ struct slab_alloc_context ac = {
+ .caller_addr = caller,
+ .orig_size = size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };

if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
ret = __kmalloc_large_node_noprof(size, flags, node);
@@ -5348,7 +5366,7 @@ void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,

s = kmalloc_slab(size, b, flags, token);

- ret = slab_alloc_node(s, NULL, flags, node, caller, size);
+ ret = slab_alloc_node(s, flags, node, &ac);
ret = kasan_kmalloc(s, ret, size, flags);
trace_kmalloc(caller, ret, size, s->size, flags, node);
return ret;
@@ -5467,8 +5485,14 @@ EXPORT_SYMBOL(__kmalloc_node_track_caller_noprof);

void *__kmalloc_cache_noprof(struct kmem_cache *s, gfp_t gfpflags, size_t size)
{
- void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE,
- _RET_IP_, size);
+ void *ret;
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };
+
+ ret = slab_alloc_node(s, gfpflags, NUMA_NO_NODE, &ac);

trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);

@@ -5480,7 +5504,14 @@ EXPORT_SYMBOL(__kmalloc_cache_noprof);
void *__kmalloc_cache_node_noprof(struct kmem_cache *s, gfp_t gfpflags,
int node, size_t size)
{
- void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
+ void *ret;
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };
+
+ ret = slab_alloc_node(s, gfpflags, node, &ac);

trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);


--
2.54.0

Vlastimil Babka (SUSE)

Jun 10, 2026, 11:41:42 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
The last user of gfpflags_allow_spinning() in slab is
alloc_from_pcs_bulk(), which is only called from
kmem_cache_alloc_bulk().

It turns out that gfpflags_allow_spinning() is not necessary, because
kmem_cache_alloc_bulk() is only expected to be called from contexts that
allow spinning, so simply replace it with 'true'.

With that, we can remove the "@flags must allow spinning" part of the
kernel doc, as there is no more connection to the gfp flags in the slab
implementation.

Also remove a comment in alloc_slab_obj_exts(), because false positives
due to gfp_allowed_mask during early boot should no longer be possible.
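To illustrate the early-boot false positive that the removed comment described, here is a userspace sketch. The bit values and model_* helpers are hypothetical; the boot mask is modeled on the GFP_BOOT_MASK named in the removed comment (stripping reclaim, IO and FS bits), and the old check follows the cover letter's description, where the absence of both __GFP_RECLAIM bits is read as "cannot spin".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the real __GFP_* bits live in gfp_types.h. */
#define M_GFP_DIRECT_RECLAIM 0x01u
#define M_GFP_KSWAPD_RECLAIM 0x02u
#define M_GFP_IO             0x04u
#define M_GFP_FS             0x08u
#define M_GFP_RECLAIM (M_GFP_DIRECT_RECLAIM | M_GFP_KSWAPD_RECLAIM)
#define M_GFP_KERNEL  (M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS)

/* Modeled on GFP_BOOT_MASK: early boot strips reclaim, IO and FS bits. */
#define M_GFP_BOOT_MASK (~(M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS))

#define SLAB_ALLOC_DEFAULT 0x00u
#define SLAB_ALLOC_TRYLOCK 0x01u

/* Old heuristic: "no reclaim bits" is interpreted as "cannot spin". */
static bool model_gfp_allows_spinning(unsigned int gfp)
{
	return (gfp & M_GFP_RECLAIM) != 0;
}

/* New approach: only an explicit SLAB_ALLOC_TRYLOCK forbids spinning. */
static bool model_alloc_flags_allow_spinning(unsigned int alloc_flags)
{
	return !(alloc_flags & SLAB_ALLOC_TRYLOCK);
}

/* Model of gfp_allowed_mask being applied to a request in early boot. */
static unsigned int model_early_boot_gfp(unsigned int gfp)
{
	unsigned int masked = gfp & M_GFP_BOOT_MASK;

	assert((masked & M_GFP_RECLAIM) == 0);
	return masked;
}
```

An ordinary GFP_KERNEL-like request thus looks like a "cannot spin" request once masked, while the alloc_flags check is unaffected by the gfp mask.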

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 0b9974bfcb24..ef457e07db83 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2171,12 +2171,6 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,

sz = obj_exts_alloc_size(s, slab, gfp);

- /*
- * Note that allow_spin may be false during early boot and its
- * restricted GFP_BOOT_MASK. Due to kmalloc_nolock() only supporting
- * architectures with cmpxchg16b, early obj_exts will be missing for
- * very early allocations on those.
- */
if (unlikely(!allow_spin))
vec = kmalloc_nolock(sz, __GFP_ZERO | __GFP_NO_OBJ_EXT,
slab_nid(slab));
@@ -4867,7 +4861,7 @@ unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, gfp_t gfp, size_t size,
}

full = barn_replace_empty_sheaf(barn, pcs->main,
- gfpflags_allow_spinning(gfp));
+ /* allow_spin = */ true);

if (full) {
stat(s, BARN_GET);
@@ -7333,8 +7327,7 @@ static bool __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
* Allocate @size objects from @s and places them into @p. @size must be larger
* than 0.
*
- * Interrupts must be enabled when calling this function and @flags must allow
- * spinning.
+ * Interrupts must be enabled when calling this function.
*
* Unlike alloc_pages_bulk(), this function does not check for already allocated
* objects in @p, and thus the caller does not need to zero it.

--
2.54.0

Vlastimil Babka (SUSE)

Jun 10, 2026, 11:41:50 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
With alloc_flags usage in slab, we can replace __GFP_NO_OBJ_EXT with an
alloc flag that prevents kmalloc recursion. For that we need a version
of kmalloc() that takes alloc_flags and use it in places that perform
these potentially recursive kmalloc allocations (of sheaves or obj_ext
arrays).

As a preparatory step, make __do_kmalloc_node() take a pointer to
slab_alloc_context. This replaces the 'caller' parameter and includes
alloc_flags which we'll make use of.
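A rough userspace model of the new calling convention: the track_caller wrapper stores its 'caller' argument in the context, and the core reads ac->caller_addr for tracing on both the large-allocation path and the bucketed path. The sizes (4 KiB pages, an 8 KiB KMALLOC_MAX_CACHE_SIZE, power-of-two buckets) and all model_* names are assumptions for illustration only; real kmalloc caches and limits depend on the configuration.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL                           /* assumed */
#define MODEL_KMALLOC_MAX_CACHE_SIZE (2 * MODEL_PAGE_SIZE) /* assumed */
#define SLAB_ALLOC_DEFAULT 0x00u

struct slab_alloc_context {
	unsigned long caller_addr;
	size_t orig_size;
	unsigned int alloc_flags;
};

/* Stand-in for the arguments passed to the trace_kmalloc() tracepoint. */
struct model_trace {
	unsigned long caller;
	size_t requested;
	size_t allocated;
};

/* Smallest order such that MODEL_PAGE_SIZE << order >= size. */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;

	while ((MODEL_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Simplified kmalloc bucket: next power of two, at least 8 bytes. */
static size_t model_bucket_size(size_t size)
{
	size_t bucket = 8;

	while (bucket < size)
		bucket <<= 1;
	return bucket;
}

/* Model of __do_kmalloc_node(): both paths take the caller from *ac. */
static struct model_trace
model_do_kmalloc_node(size_t size, const struct slab_alloc_context *ac)
{
	struct model_trace t = { .caller = ac->caller_addr, .requested = size };

	if (size > MODEL_KMALLOC_MAX_CACHE_SIZE)
		t.allocated = MODEL_PAGE_SIZE << model_get_order(size);
	else
		t.allocated = model_bucket_size(size);
	return t;
}

/* Model of __kmalloc_node_track_caller_noprof(). */
static struct model_trace model_kmalloc_track_caller(size_t size,
						     unsigned long caller)
{
	struct slab_alloc_context ac = {
		.caller_addr = caller,
		.orig_size = size,
		.alloc_flags = SLAB_ALLOC_DEFAULT,
	};

	return model_do_kmalloc_node(size, &ac);
}
```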

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slub.c | 47 ++++++++++++++++++++++++++++++++---------------
1 file changed, 32 insertions(+), 15 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index ef457e07db83..6845e15c148a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5338,19 +5338,14 @@ EXPORT_SYMBOL(__kmalloc_large_node_noprof);

static __always_inline
void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
- unsigned long caller, kmalloc_token_t token)
+ kmalloc_token_t token, struct slab_alloc_context *ac)
{
struct kmem_cache *s;
void *ret;
- struct slab_alloc_context ac = {
- .caller_addr = caller,
- .orig_size = size,
- .alloc_flags = SLAB_ALLOC_DEFAULT,
- };

if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
ret = __kmalloc_large_node_noprof(size, flags, node);
- trace_kmalloc(caller, ret, size,
+ trace_kmalloc(ac->caller_addr, ret, size,
PAGE_SIZE << get_order(size), flags, node);
return ret;
}
@@ -5360,22 +5355,34 @@ void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,

s = kmalloc_slab(size, b, flags, token);

- ret = slab_alloc_node(s, flags, node, &ac);
+ ret = slab_alloc_node(s, flags, node, ac);
ret = kasan_kmalloc(s, ret, size, flags);
- trace_kmalloc(caller, ret, size, s->size, flags, node);
+ trace_kmalloc(ac->caller_addr, ret, size, s->size, flags, node);
return ret;
}
void *__kmalloc_node_noprof(DECL_KMALLOC_PARAMS(size, b, token), gfp_t flags, int node)
{
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };
+
return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node,
- _RET_IP_, PASS_TOKEN_PARAM(token));
+ PASS_TOKEN_PARAM(token), &ac);
}
EXPORT_SYMBOL(__kmalloc_node_noprof);

void *__kmalloc_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t flags)
{
- return __do_kmalloc_node(size, NULL, flags, NUMA_NO_NODE, _RET_IP_,
- PASS_TOKEN_PARAM(token));
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };
+
+ return __do_kmalloc_node(size, NULL, flags, NUMA_NO_NODE,
+ PASS_TOKEN_PARAM(token), &ac);
}
EXPORT_SYMBOL(__kmalloc_noprof);

@@ -5471,9 +5478,14 @@ EXPORT_SYMBOL_GPL(_kmalloc_nolock_noprof);
void *__kmalloc_node_track_caller_noprof(DECL_KMALLOC_PARAMS(size, b, token), gfp_t flags,
int node, unsigned long caller)
{
- return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node,
- caller, PASS_TOKEN_PARAM(token));
+ struct slab_alloc_context ac = {
+ .caller_addr = caller,
+ .orig_size = size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };

+ return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node,
+ PASS_TOKEN_PARAM(token), &ac);
}
EXPORT_SYMBOL(__kmalloc_node_track_caller_noprof);

@@ -6874,6 +6886,11 @@ void *__kvmalloc_node_noprof(DECL_KMALLOC_PARAMS(size, b, token), unsigned long
{
bool allow_block;
void *ret;
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = size,
+ .alloc_flags = SLAB_ALLOC_DEFAULT,
+ };

/*
* It doesn't really make sense to fallback to vmalloc for sub page
@@ -6881,7 +6898,7 @@ void *__kvmalloc_node_noprof(DECL_KMALLOC_PARAMS(size, b, token), unsigned long
*/
ret = __do_kmalloc_node(size, PASS_BUCKET_PARAM(b),
kmalloc_gfp_adjust(flags, size),
- node, _RET_IP_, PASS_TOKEN_PARAM(token));
+ node, PASS_TOKEN_PARAM(token), &ac);
if (ret || size <= PAGE_SIZE)
return ret;


--
2.54.0

Vlastimil Babka (SUSE)

Jun 10, 2026, 11:41:58 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
__GFP_NOWARN and __GFP_NOMEMALLOC are added internally, so there's no
point in warning if the caller passes them as well; allow them. This
will allow simplifying obj_ext allocation under kmalloc_nolock().

Also, the extra alloc_gfp variable for adding the two flags is
unnecessary. The original gfp_flags parameter is not used anywhere
except for the warning, so remove alloc_gfp and directly modify and use
gfp_flags everywhere.
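The resulting flag handling at the top of _kmalloc_nolock_noprof() can be modeled in userspace as below. The bit values, M_GFP_IO as an example of a disallowed flag, and the model_* names are hypothetical; model_warnings simply counts what VM_WARN_ON_ONCE() would report (without the "once"). The allowed set matches this patch; a later patch in the series drops __GFP_NO_OBJ_EXT from it.

```c
#include <assert.h>

/* Hypothetical bit values; only the relationships between them matter. */
#define M_GFP_ACCOUNT    0x01u
#define M_GFP_ZERO       0x02u
#define M_GFP_NO_OBJ_EXT 0x04u
#define M_GFP_NOWARN     0x08u
#define M_GFP_NOMEMALLOC 0x10u
#define M_GFP_IO         0x20u /* example of a flag nolock callers must not pass */

#define M_NOLOCK_ALLOWED (M_GFP_ACCOUNT | M_GFP_ZERO | M_GFP_NO_OBJ_EXT | \
			  M_GFP_NOWARN | M_GFP_NOMEMALLOC)

static int model_warnings; /* counts would-be VM_WARN_ON_ONCE() hits */

/* Model of the gfp_flags handling in _kmalloc_nolock_noprof(). */
static unsigned int model_nolock_gfp(unsigned int gfp_flags)
{
	if (gfp_flags & ~M_NOLOCK_ALLOWED)
		model_warnings++;

	/* Added unconditionally, so also passing them in is harmless. */
	gfp_flags |= M_GFP_NOWARN | M_GFP_NOMEMALLOC;
	return gfp_flags;
}
```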

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
include/linux/slab.h | 3 ++-
mm/slub.c | 19 ++++++++++---------
2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index ce1c867dc0ba..b955f3cbb732 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -1040,7 +1040,8 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
* kmalloc_nolock - Allocate an object of given size from any context.
* @size: size to allocate
* @gfp_flags: GFP flags. Only __GFP_ACCOUNT, __GFP_ZERO, __GFP_NO_OBJ_EXT
- * allowed.
+ * allowed. Also __GFP_NOWARN and __GFP_NOMEMALLOC are allowed but added
+ * internally thus not necessary.
* @node: node number of the target node.
*
* Return: pointer to the new object or NULL in case of error.
diff --git a/mm/slub.c b/mm/slub.c
index 6845e15c148a..847cad5203b2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5388,7 +5388,6 @@ EXPORT_SYMBOL(__kmalloc_noprof);

void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, int node)
{
- gfp_t alloc_gfp = __GFP_NOWARN | __GFP_NOMEMALLOC | gfp_flags;
size_t orig_size = size;
unsigned int alloc_flags = SLAB_ALLOC_TRYLOCK;
struct kmem_cache *s;
@@ -5396,7 +5395,9 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
void *ret;

VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO |
- __GFP_NO_OBJ_EXT));
+ __GFP_NO_OBJ_EXT | __GFP_NOWARN | __GFP_NOMEMALLOC));
+
+ gfp_flags |= __GFP_NOWARN | __GFP_NOMEMALLOC;

if (unlikely(!size))
return ZERO_SIZE_PTR;
@@ -5415,7 +5416,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
retry:
if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
return NULL;
- s = kmalloc_slab(size, NULL, alloc_gfp, PASS_TOKEN_PARAM(token));
+ s = kmalloc_slab(size, NULL, gfp_flags, PASS_TOKEN_PARAM(token));

if (!(s->flags & __CMPXCHG_DOUBLE) && !kmem_cache_debug(s))
/*
@@ -5429,7 +5430,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
*/
return NULL;

- ret = alloc_from_pcs(s, alloc_gfp, alloc_flags, node);
+ ret = alloc_from_pcs(s, gfp_flags, alloc_flags, node);
if (ret)
goto success;

@@ -5445,7 +5446,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
* kfence_alloc. Hence call __slab_alloc_node() (at most twice)
* and slab_post_alloc_hook() directly.
*/
- ret = __slab_alloc_node(s, alloc_gfp, node, &ac);
+ ret = __slab_alloc_node(s, gfp_flags, node, &ac);

/*
* It's possible we failed due to trylock as we preempted someone with
@@ -5458,8 +5459,8 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
size = s->object_size + 1;
/*
* Another alternative is to
- * if (memcg) alloc_gfp &= ~__GFP_ACCOUNT;
- * else if (!memcg) alloc_gfp |= __GFP_ACCOUNT;
+ * if (memcg) gfp_flags &= ~__GFP_ACCOUNT;
+ * else if (!memcg) gfp_flags |= __GFP_ACCOUNT;
* to retry from bucket of the same size.
*/
can_retry = false;
@@ -5468,9 +5469,9 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in

success:
maybe_wipe_obj_freeptr(s, ret);
- slab_post_alloc_hook(s, alloc_gfp, 1, &ret, &ac);
+ slab_post_alloc_hook(s, gfp_flags, 1, &ret, &ac);

- ret = kasan_kmalloc(s, ret, orig_size, alloc_gfp);
+ ret = kasan_kmalloc(s, ret, orig_size, gfp_flags);
return ret;
}
EXPORT_SYMBOL_GPL(_kmalloc_nolock_noprof);

--
2.54.0

Vlastimil Babka (SUSE)

Jun 10, 2026, 11:42:05 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
With alloc_flags usage in slab, we can replace __GFP_NO_OBJ_EXT with an
alloc flag that prevents kmalloc recursion. For that we need a version
of kmalloc() that takes alloc_flags and use it in places that perform
these potentially recursive kmalloc allocations (of sheaves or obj_ext
arrays).

Add this function, named kmalloc_flags(). Right now it's only useful for
these nested allocations, so it doesn't need to optimize for build-time
constant sizes like kmalloc() or kmalloc_buckets.

Since it needs to support both normal and non-spinning kmalloc_nolock()
contexts through the SLAB_ALLOC_TRYLOCK flag, split out most of the
special _kmalloc_nolock_noprof() implementation into
__kmalloc_nolock_noprof(), which takes a slab_alloc_context, and make
_kmalloc_nolock_noprof() a simple wrapper that tail-calls it with the
proper context.

kmalloc_flags() can thus determine whether to call
__kmalloc_nolock_noprof() or __do_kmalloc_node(), based on the
given alloc_flags.
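The dispatch in __kmalloc_flags_noprof() can be sketched in userspace as follows. The two back ends are reduced to stubs that only report which path was taken, and enum model_path and the model_* names are hypothetical; the assert stands in for the VM_WARN_ON_ONCE() that the patch adds to __kmalloc_nolock_noprof().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLAB_ALLOC_DEFAULT 0x00u
#define SLAB_ALLOC_TRYLOCK 0x01u

struct slab_alloc_context {
	unsigned long caller_addr;
	size_t orig_size;
	unsigned int alloc_flags;
};

enum model_path { MODEL_PATH_NORMAL, MODEL_PATH_NOLOCK };

/* Same logic as alloc_flags_allow_spinning() in mm/slab.h. */
static bool alloc_flags_allow_spinning(unsigned int alloc_flags)
{
	return !(alloc_flags & SLAB_ALLOC_TRYLOCK);
}

/* Stub for __kmalloc_nolock_noprof(): must only see trylock contexts. */
static enum model_path model_kmalloc_nolock(const struct slab_alloc_context *ac)
{
	assert(!alloc_flags_allow_spinning(ac->alloc_flags));
	return MODEL_PATH_NOLOCK;
}

/* Stub for __do_kmalloc_node(). */
static enum model_path model_do_kmalloc_node(const struct slab_alloc_context *ac)
{
	(void)ac;
	return MODEL_PATH_NORMAL;
}

/* Model of __kmalloc_flags_noprof(): alloc_flags alone pick the path. */
static enum model_path model_kmalloc_flags(size_t size,
					   unsigned int alloc_flags,
					   unsigned long caller)
{
	struct slab_alloc_context ac = {
		.caller_addr = caller,
		.orig_size = size,
		.alloc_flags = alloc_flags,
	};

	if (alloc_flags_allow_spinning(alloc_flags))
		return model_do_kmalloc_node(&ac);
	return model_kmalloc_nolock(&ac);
}
```

Other bits in alloc_flags are carried along in the context without affecting the choice, which is what lets later patches add flags such as SLAB_ALLOC_NO_RECURSE on top.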

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slab.h | 13 +++++++++++++
mm/slub.c | 56 +++++++++++++++++++++++++++++++++++++++++++-------------
2 files changed, 56 insertions(+), 13 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 4db6d8aa0ee3..45bfcfb35a9c 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -11,6 +11,7 @@
#include <linux/memcontrol.h>
#include <linux/kfence.h>
#include <linux/kasan.h>
+#include <linux/slab.h>

/*
* Internal slab definitions
@@ -26,6 +27,18 @@ static inline bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
return !(alloc_flags & SLAB_ALLOC_TRYLOCK);
}

+void *__kmalloc_flags_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t flags,
+ unsigned int alloc_flags, int node)
+ __assume_kmalloc_alignment __alloc_size(1);
+
+static __always_inline __alloc_size(1) void *_kmalloc_flags_noprof(size_t size,
+ gfp_t flags, unsigned int alloc_flags, int node, kmalloc_token_t token)
+{
+ return __kmalloc_flags_noprof(PASS_TOKEN_PARAMS(size, token), flags, alloc_flags, node);
+}
+#define kmalloc_flags_noprof(...) _kmalloc_flags_noprof(__VA_ARGS__, __kmalloc_token(__VA_ARGS__))
+#define kmalloc_flags(...) alloc_hooks(kmalloc_flags_noprof(__VA_ARGS__))
+
#ifdef CONFIG_64BIT
# ifdef system_has_cmpxchg128
# define system_has_freelist_aba() system_has_cmpxchg128()
diff --git a/mm/slub.c b/mm/slub.c
index 847cad5203b2..cbb38bd01e46 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5386,14 +5386,14 @@ void *__kmalloc_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t flags)
}
EXPORT_SYMBOL(__kmalloc_noprof);

-void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, int node)
+static void *__kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags,
+ int node, struct slab_alloc_context *ac)
{
- size_t orig_size = size;
- unsigned int alloc_flags = SLAB_ALLOC_TRYLOCK;
struct kmem_cache *s;
bool can_retry = true;
void *ret;

+ VM_WARN_ON_ONCE(alloc_flags_allow_spinning(ac->alloc_flags));
VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO |
__GFP_NO_OBJ_EXT | __GFP_NOWARN | __GFP_NOMEMALLOC));

@@ -5430,23 +5430,17 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
*/
return NULL;

- ret = alloc_from_pcs(s, gfp_flags, alloc_flags, node);
+ ret = alloc_from_pcs(s, gfp_flags, ac->alloc_flags, node);
if (ret)
goto success;

- struct slab_alloc_context ac = {
- .caller_addr = _RET_IP_,
- .orig_size = orig_size,
- .alloc_flags = alloc_flags,
- };
-
/*
* Do not call slab_alloc_node(), since trylock mode isn't
* compatible with slab_pre_alloc_hook/should_failslab and
* kfence_alloc. Hence call __slab_alloc_node() (at most twice)
* and slab_post_alloc_hook() directly.
*/
- ret = __slab_alloc_node(s, gfp_flags, node, &ac);
+ ret = __slab_alloc_node(s, gfp_flags, node, ac);

/*
* It's possible we failed due to trylock as we preempted someone with
@@ -5469,11 +5463,23 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in

success:
maybe_wipe_obj_freeptr(s, ret);
- slab_post_alloc_hook(s, gfp_flags, 1, &ret, &ac);
+ slab_post_alloc_hook(s, gfp_flags, 1, &ret, ac);

- ret = kasan_kmalloc(s, ret, orig_size, gfp_flags);
+ ret = kasan_kmalloc(s, ret, ac->orig_size, gfp_flags);
return ret;
}
+
+void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, int node)
+{
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = size,
+ .alloc_flags = SLAB_ALLOC_TRYLOCK,
+ };
+
+ return __kmalloc_nolock_noprof(PASS_TOKEN_PARAMS(size, token),
+ gfp_flags, node, &ac);
+}
EXPORT_SYMBOL_GPL(_kmalloc_nolock_noprof);

void *__kmalloc_node_track_caller_noprof(DECL_KMALLOC_PARAMS(size, b, token), gfp_t flags,
@@ -5527,6 +5533,30 @@ void *__kmalloc_cache_node_noprof(struct kmem_cache *s, gfp_t gfpflags,
}
EXPORT_SYMBOL(__kmalloc_cache_node_noprof);

+/*
+ * The only version of kmalloc_node() that takes alloc_flags and thus can
+ * determine on its own whether to handle the allocation via kmalloc_nolock() or
+ * normally
+ */
+void *__kmalloc_flags_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t flags,
+ unsigned int alloc_flags, int node)
+{
+ struct slab_alloc_context ac = {
+ .caller_addr = _RET_IP_,
+ .orig_size = size,
+ .alloc_flags = alloc_flags,
+ };
+
+ if (alloc_flags_allow_spinning(alloc_flags)) {
+ return __do_kmalloc_node(size, NULL, flags, node,
+ PASS_TOKEN_PARAM(token), &ac);
+ } else {
+ return __kmalloc_nolock_noprof(PASS_TOKEN_PARAMS(size, token),
+ flags, node, &ac);
+ }
+}
+
+
static noinline void free_to_partial_list(
struct kmem_cache *s, struct slab *slab,
void *head, void *tail, int bulk_cnt,

--
2.54.0

Vlastimil Babka (SUSE)

Jun 10, 2026, 11:42:13 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
__GFP_NO_OBJ_EXT has limited scope within the slab allocator itself and
gfp flags are a scarce resource, unlike slab's alloc_flags.

Introduce SLAB_ALLOC_NO_RECURSE alloc flag that has the same intent as
__GFP_NO_OBJ_EXT but a more generic name, meaning that a kmalloc()
family function should not recurse into another kmalloc*() for the
purposes of allocating auxiliary structures (obj_ext arrays or sheaves).

First, replace the __GFP_NO_OBJ_EXT for allocating obj_ext arrays in
alloc_slab_obj_exts(). Make use of the newly added kmalloc_flags()
function, where we can pass alloc_flags with SLAB_ALLOC_NO_RECURSE
added. This will also pass through SLAB_ALLOC_TRYLOCK so we don't need
to special case kmalloc_nolock() anymore.

Note that until now, the kmalloc_nolock() call ignored the incoming gfp
flags and hardcoded __GFP_ZERO | __GFP_NO_OBJ_EXT. But it's correct to
pass on the incoming gfp flags (only augmented with __GFP_ZERO), because
if alloc_flags contain SLAB_ALLOC_TRYLOCK, the incoming gfp flags must
also be compatible with it.
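How the flag bounds the recursion, and how SLAB_ALLOC_TRYLOCK passes through, can be shown with a simplified userspace model. It assumes every allocation wants an obj_ext vector, which is itself a kmalloc allocation; model_alloc() is hypothetical and collapses the real call chain (alloc_slab_obj_exts(), kmalloc_flags() and the hooks that check the flag) into one recursive function.

```c
#include <assert.h>

#define SLAB_ALLOC_DEFAULT    0x00u
#define SLAB_ALLOC_TRYLOCK    0x01u
#define SLAB_ALLOC_NO_RECURSE 0x04u

/*
 * Returns the number of nested allocations performed and stores the
 * alloc_flags used by the innermost one in *nested_flags.
 */
static int model_alloc(unsigned int alloc_flags, unsigned int *nested_flags)
{
	unsigned int inner;

	/* Nested allocation: don't allocate auxiliary structures again. */
	if (alloc_flags & SLAB_ALLOC_NO_RECURSE)
		return 0;

	/* Like alloc_slab_obj_exts(): keep the caller's flags, add NO_RECURSE. */
	inner = alloc_flags | SLAB_ALLOC_NO_RECURSE;
	*nested_flags = inner;
	return 1 + model_alloc(inner, nested_flags);
}
```

Starting from a kmalloc_nolock() context, the nested allocation sees both SLAB_ALLOC_TRYLOCK and SLAB_ALLOC_NO_RECURSE, and the chain stops after one level.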

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/slab.h | 1 +
mm/slub.c | 13 +++++--------
2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 45bfcfb35a9c..509f330654b8 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -21,6 +21,7 @@
#define SLAB_ALLOC_DEFAULT 0x00 /* no flags */
#define SLAB_ALLOC_TRYLOCK 0x01 /* a kmalloc_nolock() allocation */
#define SLAB_ALLOC_NEW_SLAB 0x02 /* a flag for alloc_slab_obj_exts() */
+#define SLAB_ALLOC_NO_RECURSE 0x04 /* prevent kmalloc() recursion */

static inline bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
{
diff --git a/mm/slub.c b/mm/slub.c
index cbb38bd01e46..7dfbd0251aa2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2167,15 +2167,12 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,

gfp &= ~OBJCGS_CLEAR_MASK;
/* Prevent recursive extension vector allocation */
- gfp |= __GFP_NO_OBJ_EXT;
+ alloc_flags |= SLAB_ALLOC_NO_RECURSE;

sz = obj_exts_alloc_size(s, slab, gfp);

- if (unlikely(!allow_spin))
- vec = kmalloc_nolock(sz, __GFP_ZERO | __GFP_NO_OBJ_EXT,
- slab_nid(slab));
- else
- vec = kmalloc_node(sz, gfp | __GFP_ZERO, slab_nid(slab));
+ /* This will use kmalloc_nolock() if alloc_flags say so */
+ vec = kmalloc_flags(sz, gfp | __GFP_ZERO, alloc_flags, slab_nid(slab));

if (!vec) {
/*
@@ -2251,7 +2248,7 @@ static inline void free_slab_obj_exts(struct slab *slab, bool allow_spin)
}

/*
- * obj_exts was created with __GFP_NO_OBJ_EXT flag, therefore its
+ * obj_exts was created with SLAB_ALLOC_NO_RECURSE flag, therefore its
* corresponding extension will be NULL. alloc_tag_sub() will throw a
* warning if slab has extensions but the extension of an object is
* NULL, therefore replace NULL with CODETAG_EMPTY to indicate that
@@ -2374,7 +2371,7 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags,
if (s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE))
return;

- if (flags & __GFP_NO_OBJ_EXT)
+ if (alloc_flags & SLAB_ALLOC_NO_RECURSE)
return;

slab = virt_to_slab(object);

--
2.54.0

Vlastimil Babka (SUSE)

Jun 10, 2026, 11:42:18 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Vlastimil Babka (SUSE)
Finish the switch away from __GFP_NO_OBJ_EXT by replacing it with
SLAB_ALLOC_NO_RECURSE when allocating empty sheaves. Pass alloc_flags to
[__]alloc_empty_sheaf(). Callers that can't be part of a recursive
kmalloc() chain simply pass SLAB_ALLOC_DEFAULT. Use kmalloc_flags()
instead of kzalloc() for allocating the sheaf.

This leaves __GFP_NO_OBJ_EXT with no users in slab, so stop allowing the
flag in kmalloc_nolock().
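The combined effect of the two sheaf helpers can be sketched as one userspace function. struct model_cache, MODEL_SLAB_KMALLOC (standing in for SLAB_KMALLOC in s->flags) and model_alloc_empty_sheaf() are hypothetical; the function returns the alloc_flags that the sheaf's own kmalloc_flags() call would use, or -1 where the real alloc_empty_sheaf() returns NULL.

```c
#include <assert.h>

#define SLAB_ALLOC_DEFAULT    0x00u
#define SLAB_ALLOC_NO_RECURSE 0x04u
#define MODEL_SLAB_KMALLOC    0x1u /* stands in for SLAB_KMALLOC */

struct model_cache {
	unsigned int flags;
};

/* Model of alloc_empty_sheaf() calling __alloc_empty_sheaf(). */
static long model_alloc_empty_sheaf(const struct model_cache *s,
				    unsigned int alloc_flags)
{
	/* alloc_empty_sheaf(): no sheaf allocation inside a nested kmalloc */
	if (alloc_flags & SLAB_ALLOC_NO_RECURSE)
		return -1;

	/* __alloc_empty_sheaf(): sheaves of kmalloc caches must not recurse */
	if (s->flags & MODEL_SLAB_KMALLOC)
		alloc_flags |= SLAB_ALLOC_NO_RECURSE;

	return (long)alloc_flags;
}
```

Callers outside any recursive chain pass SLAB_ALLOC_DEFAULT, so only kmalloc caches end up allocating their sheaves with SLAB_ALLOC_NO_RECURSE.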

Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
include/linux/slab.h | 6 +++---
mm/slub.c | 31 ++++++++++++++++---------------
2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index b955f3cbb732..43c3d9b51107 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -1039,9 +1039,9 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
/**
* kmalloc_nolock - Allocate an object of given size from any context.
* @size: size to allocate
- * @gfp_flags: GFP flags. Only __GFP_ACCOUNT, __GFP_ZERO, __GFP_NO_OBJ_EXT
- * allowed. Also __GFP_NOWARN and __GFP_NOMEMALLOC are allowed but added
- * internally thus not necessary.
+ * @gfp_flags: GFP flags. Only __GFP_ACCOUNT and __GFP_ZERO allowed. Also
+ * __GFP_NOWARN and __GFP_NOMEMALLOC are allowed but added internally thus not
+ * necessary.
* @node: node number of the target node.
*
* Return: pointer to the new object or NULL in case of error.
diff --git a/mm/slub.c b/mm/slub.c
index 7dfbd0251aa2..5d7ea72ebebd 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2756,7 +2756,7 @@ static inline void *setup_object(struct kmem_cache *s, void *object)
}

static struct slab_sheaf *__alloc_empty_sheaf(struct kmem_cache *s, gfp_t gfp,
- unsigned int capacity)
+ unsigned int alloc_flags, unsigned int capacity)
{
struct slab_sheaf *sheaf;
size_t sheaf_size;
@@ -2767,10 +2767,10 @@ static struct slab_sheaf *__alloc_empty_sheaf(struct kmem_cache *s, gfp_t gfp,
* bucket)
*/
if (s->flags & SLAB_KMALLOC)
- gfp |= __GFP_NO_OBJ_EXT;
+ alloc_flags |= SLAB_ALLOC_NO_RECURSE;

sheaf_size = struct_size(sheaf, objects, capacity);
- sheaf = kzalloc(sheaf_size, gfp);
+ sheaf = kmalloc_flags(sheaf_size, gfp | __GFP_ZERO, alloc_flags, NUMA_NO_NODE);

if (unlikely(!sheaf))
return NULL;
@@ -2783,20 +2783,20 @@ static struct slab_sheaf *__alloc_empty_sheaf(struct kmem_cache *s, gfp_t gfp,
}

static inline struct slab_sheaf *alloc_empty_sheaf(struct kmem_cache *s,
- gfp_t gfp)
+ gfp_t gfp, unsigned int alloc_flags)
{
- if (gfp & __GFP_NO_OBJ_EXT)
+ if (alloc_flags & SLAB_ALLOC_NO_RECURSE)
return NULL;

gfp &= ~OBJCGS_CLEAR_MASK;

- return __alloc_empty_sheaf(s, gfp, s->sheaf_capacity);
+ return __alloc_empty_sheaf(s, gfp, alloc_flags, s->sheaf_capacity);
}

static void free_empty_sheaf(struct kmem_cache *s, struct slab_sheaf *sheaf)
{
/*
- * If the sheaf was created with __GFP_NO_OBJ_EXT flag then its
+ * If the sheaf was created with SLAB_ALLOC_NO_RECURSE flag then its
* corresponding extension is NULL and alloc_tag_sub() will throw a
* warning, therefore replace NULL with CODETAG_EMPTY to indicate
* that the extension for this sheaf is expected to be NULL.
@@ -4689,7 +4689,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
return NULL;

if (!empty) {
- empty = alloc_empty_sheaf(s, gfp);
+ empty = alloc_empty_sheaf(s, gfp, alloc_flags);
if (!empty)
return NULL;
}
@@ -5063,7 +5063,7 @@ kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size)

if (unlikely(size > s->sheaf_capacity)) {

- sheaf = __alloc_empty_sheaf(s, gfp, size);
+ sheaf = __alloc_empty_sheaf(s, gfp, SLAB_ALLOC_DEFAULT, size);
if (!sheaf)
return NULL;

@@ -5108,7 +5108,7 @@ kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size)


if (!sheaf)
- sheaf = alloc_empty_sheaf(s, gfp);
+ sheaf = alloc_empty_sheaf(s, gfp, SLAB_ALLOC_DEFAULT);

if (sheaf) {
sheaf->capacity = s->sheaf_capacity;
@@ -5392,7 +5392,7 @@ static void *__kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_f

VM_WARN_ON_ONCE(alloc_flags_allow_spinning(ac->alloc_flags));
VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO |
- __GFP_NO_OBJ_EXT | __GFP_NOWARN | __GFP_NOMEMALLOC));
+ __GFP_NOWARN | __GFP_NOMEMALLOC));

gfp_flags |= __GFP_NOWARN | __GFP_NOMEMALLOC;

@@ -5907,7 +5907,7 @@ __pcs_replace_full_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
if (!allow_spin)
return NULL;

- empty = alloc_empty_sheaf(s, GFP_NOWAIT);
+ empty = alloc_empty_sheaf(s, GFP_NOWAIT, SLAB_ALLOC_DEFAULT);
if (empty)
goto got_empty;

@@ -6091,7 +6091,7 @@ bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj)

local_unlock(&s->cpu_sheaves->lock);

- empty = alloc_empty_sheaf(s, GFP_NOWAIT);
+ empty = alloc_empty_sheaf(s, GFP_NOWAIT, SLAB_ALLOC_DEFAULT);

if (!empty)
goto fail;
@@ -7636,7 +7636,7 @@ static int init_percpu_sheaves(struct kmem_cache *s)
if (!s->sheaf_capacity)
pcs->main = &bootstrap_sheaf;
else
- pcs->main = alloc_empty_sheaf(s, GFP_KERNEL);
+ pcs->main = alloc_empty_sheaf(s, GFP_KERNEL, SLAB_ALLOC_DEFAULT);

if (!pcs->main)
return -ENOMEM;
@@ -8502,7 +8502,8 @@ static void __init bootstrap_cache_sheaves(struct kmem_cache *s)

pcs = per_cpu_ptr(s->cpu_sheaves, cpu);

- pcs->main = __alloc_empty_sheaf(s, GFP_KERNEL, capacity);
+ pcs->main = __alloc_empty_sheaf(s, GFP_KERNEL,
+ SLAB_ALLOC_DEFAULT, capacity);

if (!pcs->main) {
failed = true;

--
2.54.0

Harry Yoo

Jun 10, 2026, 11:20:03 PM
to Vlastimil Babka (SUSE), Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Andrey Konovalov, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org


On 6/11/26 12:40 AM, Vlastimil Babka (SUSE) wrote:
> When init (zeroing) on allocation is requested, for kmalloc() we
> generally have to zero the full object size even if a smaller size is
> requested, in order to provide krealloc()'s __GFP_ZERO guarantees.

Oh, today I learned...

> When we end up allocating a kfence object, kfence perfoms the zeroing on
> its own because has its own redzone beyond the requested size. Thus
> slab_post_alloc_hook() has an 'init' parameter which has to be evaluated
> in all callers (via slab_want_init_on_alloc()) and should be false for
> kfence allocations.

TIL again :D

> For kfence allocations in slab_alloc_node() this is achieved by subtly
> skipping over the slab_want_init_on_alloc() call.

Indeed subtle and I didn't realize this.

> Other callers (i.e.
> kmem_cache_alloc_bulk_noprof()) however evaluate it unconditionally even
> if they do end up with a kfence allocation. This is only subtly not a
> problem, as those are not kmalloc allocations and thus the "requested
> size" equals s->object_size and thus it cannot interfere with kfence's
> redzone.

Right.

> There's just a unnecessary double zeroing (in both kfence and
> slab_post_alloc_hook()), but it's all very fragile and contradicts the
> comment in kfence_guarded_alloc().

Right.

> Remove this subtlety and simplify the code by eliminating the init
> parameter from slab_post_alloc_hook() and make it call
> slab_want_init_on_alloc() itself. Instead add a is_kfence_address()
> check before performing the memset, which will start doing the right
> thing for all callers of slab_post_alloc_hook().

Great, more straightforward!

> This potentially adds overhead of the is_kfence_address() check to
> allocation hotpath, but that one is designed to be as small as possible,
> and it's only evaluated if zeroing is about to happen. This means (aside
> from init_on_alloc hardening) only for __GFP_ZERO allocations, and the
> zeroing itself comes with an overhead likely larger than the added
> check.
>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---
> mm/kfence/core.c | 2 +-
> mm/slub.c | 23 ++++++++---------------
> 2 files changed, 9 insertions(+), 16 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index e2ee8f1aaccf..8e5264d3ddbf 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4565,9 +4565,10 @@ struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags)
>
> static __fastpath_inline
> bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> - gfp_t flags, size_t size, void **p, bool init,
> + gfp_t flags, size_t size, void **p,
> unsigned int orig_size)
> {
> + bool init = slab_want_init_on_alloc(flags, s);
> unsigned int zero_size = s->object_size;
> bool kasan_init = init;
> size_t i;
> @@ -4608,7 +4609,8 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> for (i = 0; i < size; i++) {
> p[i] = kasan_slab_alloc(s, p[i], init_flags, kasan_init);
> if (p[i] && init && (!kasan_init ||
> - !kasan_has_integrated_init()))
> + !kasan_has_integrated_init())
> + && !is_kfence_address(p[i]))

I hope we could make it a bit more verbose and straightforward,
something like:

diff --git a/mm/slub.c b/mm/slub.c
index 5d7ea72ebebd..29cf4590f9d9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4573,7 +4573,6 @@ bool slab_post_alloc_hook(struct kmem_cache *s,
gfp_t flags, size_t size,
{
bool init = slab_want_init_on_alloc(flags, s);
unsigned int zero_size = s->object_size;
- bool kasan_init = init;
size_t i;
gfp_t init_flags = flags & gfp_allowed_mask;

@@ -4591,29 +4590,37 @@ bool slab_post_alloc_hook(struct kmem_cache *s,
gfp_t flags, size_t size,
if (slub_debug_orig_size(s))
zero_size = ac->orig_size;

- /*
- * When slab_debug is enabled, avoid memory initialization integrated
- * into KASAN and instead zero out the memory via the memset below with
- * the proper size. Otherwise, KASAN might overwrite SLUB redzones and
- * cause false-positive reports. This does not lead to a performance
- * penalty on production builds, as slab_debug is not intended to be
- * enabled there.
- */
- if (__slub_debug_enabled())
- kasan_init = false;
-
- /*
- * As memory initialization might be integrated into KASAN,
- * kasan_slab_alloc and initialization memset must be
- * kept together to avoid discrepancies in behavior.
- *
- * As p[i] might get tagged, memset and kmemleak hook come after KASAN.
- */
for (i = 0; i < size; i++) {
- p[i] = kasan_slab_alloc(s, p[i], init_flags, kasan_init);
- if (p[i] && init && (!kasan_init ||
- !kasan_has_integrated_init())
- && !is_kfence_address(p[i]))
+ bool skip_init = false;
+
+ if (is_kfence_address(p[i])) {
+ /*
+ * kfence zeroes the object instead of SLUB to avoid
+ * overwriting its own redzone, and zeroing of
+ * s->object_size will corrupt it.
+ */
+ skip_init = true;
+ } else if (__slub_debug_enabled()) {
+ /*
+ * KASAN never zeroes memory when slab_debug is enabled
+ * to avoid overwriting SLUB redzones. This does not
+ * lead to a performance penalty on production builds,
+ * as slab_debug is not intended to be enabled there.
+ */
+ skip_init = false;
+ } else if (kasan_has_integrated_init()) {
+ /*
+ * ARM64 can set memory tags and zero the memory using
+ * a single instruction. Since HW_TAGS KASAN uses that
+ * while tagging the object, a separate zeroing is
+ * unnecessary unless slab_debug is enabled.
+ */
+ skip_init = true;
+ }
+
+ p[i] = kasan_slab_alloc(s, p[i], init_flags, init && skip_init);
+ /* memset and hooks come after KASAN as p[i] might get tagged */
+ if (p[i] && init && !skip_init)
memset(p[i], 0, zero_size);
if (alloc_flags_allow_spinning(ac->alloc_flags))

Harry Yoo

Jun 11, 2026, 12:28:24 AM
to Vlastimil Babka (SUSE), Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, sta...@vger.kernel.org


On 6/11/26 12:40 AM, Vlastimil Babka (SUSE) wrote:
> When init (zeroing) on allocation is requested, for kmalloc() we
> generally have to zero the full object size even if a smaller size is
> requested, in order to provide krealloc()'s __GFP_ZERO guarantees.
>
> But if we track the requested size, krealloc() uses that information to
> do the right thing. With red zoning also enabled, any unused size
> became part of the red zone, so it must not be zeroed.
>
> However the check is imprecise, and will trigger also when only
> SLAB_RED_ZONE is enabled without SLAB_STORE_USER. This means enabling
> red zoning alone can compromise krealloc()'s __GFP_ZERO contract.
>
> Fix this by using slub_debug_orig_size() instead, which is the exact
> check for whether the requested size is tracked. We don't need to care
> if red zoning is also enabled or not. Also update and expand the
> comment accordingly.
>
> Fixes: 9ce67395f5a0 ("mm/slub: only zero requested size of buffer for kzalloc when debug enabled")
> Cc: <sta...@vger.kernel.org>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---

Reviewed-by: Harry Yoo (Oracle) <ha...@kernel.org>
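To make the krealloc() contract concrete, here is a small userspace model of how many bytes get zeroed for a __GFP_ZERO kmalloc(). It is a sketch, not the slub.c code: `struct model_cache` and the `model_*` helpers are hypothetical, and the "old" predicate only follows the commit message's description (it fires when SLAB_RED_ZONE is set even without SLAB_STORE_USER); its exact original form is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant parts of struct kmem_cache. */
struct model_cache {
	size_t object_size;	/* kmalloc bucket size, e.g. 64 */
	bool red_zone;		/* models SLAB_RED_ZONE */
	bool store_user;	/* models SLAB_STORE_USER: requested size is tracked */
};

/* Old, imprecise check (assumed form): either debug flag limits zeroing. */
static size_t model_zero_size_old(const struct model_cache *s, size_t orig_size)
{
	return (s->red_zone || s->store_user) ? orig_size : s->object_size;
}

/*
 * Fixed check, in the spirit of slub_debug_orig_size(): only when the
 * requested size is tracked can krealloc() later zero the tail itself,
 * so only then may the tail be left unzeroed.
 */
static size_t model_zero_size_new(const struct model_cache *s, size_t orig_size)
{
	return s->store_user ? orig_size : s->object_size;
}
```

With red zoning alone, the old model zeroes only the requested 10 bytes of a 64-byte object, yet krealloc() has no recorded size telling it the other 54 bytes were never cleared; the new model zeroes all 64.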


Harry Yoo

Jun 11, 2026, 12:49:58 AM
to Vlastimil Babka (SUSE), Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org


On 6/11/26 12:40 AM, Vlastimil Babka (SUSE) wrote:
> Similarly to page allocator's struct alloc_context, introduce a helper
> struct to hold a part of the allocation arguments. This will allow
> reducing the number of parameters in many functions of the
> implementation, and extend them easily if needed.
>
> For now, make it hold the caller address and the originally requested
> allocation size.
>
> Convert alloc_single_from_new_slab(), __slab_alloc_node() and
> ___slab_alloc(). No functional change intended.
>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---
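A minimal sketch of the parameter-bundling idea, under stated assumptions: the struct name and its `orig_size` field follow the series (`ac->orig_size` appears in later diffs in this thread), but the `addr` field name, `struct model_track` and all `model_*` functions are hypothetical. The entry point fills the context once and inner layers only forward a pointer, so a later field can be added without touching every signature.

```c
/* Hypothetical shape of the context; "addr" is an assumed field name. */
struct slab_alloc_context {
	unsigned long addr;	/* caller address, e.g. _RET_IP_ */
	unsigned int orig_size;	/* size the caller originally asked for */
};

/* What a debug tracking record might capture (illustrative only). */
struct model_track {
	unsigned long addr;
	unsigned int orig_size;
};

/* Innermost layer: the only one that actually reads the fields. */
static struct model_track model_new_slab_alloc(const struct slab_alloc_context *ac)
{
	struct model_track t = { .addr = ac->addr, .orig_size = ac->orig_size };

	return t;
}

/* Middle layer: forwards one pointer instead of two scalars. */
static struct model_track model_slow_alloc(const struct slab_alloc_context *ac)
{
	return model_new_slab_alloc(ac);
}

/* Entry point: builds the context once from its own arguments. */
static struct model_track model_alloc_node(unsigned long caller, unsigned int size)
{
	struct slab_alloc_context ac = { .addr = caller, .orig_size = size };

	return model_slow_alloc(&ac);
}
```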


Harry Yoo

Jun 11, 2026, 12:57:43 AM
to Vlastimil Babka (SUSE), Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org


On 6/11/26 12:40 AM, Vlastimil Babka (SUSE) wrote:
> Similarly to the page allocators, introduce slab-allocator specific
> alloc flags that internally control allocation behavior in addition to
> gfp_flags, without occupying the limited gfp flags space.
>
> Introduce the first flag SLAB_ALLOC_TRYLOCK that behaves similarly to
> page allocator's ALLOC_TRYLOCK and will be used to reimplement
> kmalloc_nolock()'s "!allow_spin" behavior. That currently relies on
> gfpflags_allow_spinning() and thus the lack of both __GFP_RECLAIM flags,
> importantly __GFP_KSWAPD_RECLAIM. This can give false-positive results
> e.g. in early boot with a restricted gfp_allowed_mask.
>
> Also introduce alloc_flags_allow_spinning() to replace the usage of
> gfpflags_allow_spinning().
>
> Start using alloc_flags and the new check first in alloc_from_pcs() and
> __pcs_replace_empty_main(). This means some slab allocations that were
> falsely treated as kmalloc_nolock() due to their gfp flags will now have
> higher chances of success, and this will further increase with followup
> changes.
>
> Remove a WARN_ON_ONCE() from refill_objects() as it's now legitimate to
> reach it from a slab allocation that's not _nolock() and yet lacks
> __GFP_KSWAPD_RECLAIM for other reasons.
>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---
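The early-boot false positive described above can be reproduced with a toy model. Everything here is illustrative: the M_GFP_* bit values are invented (only their relationships follow the kernel: GFP_KERNEL contains both reclaim bits and the early-boot allowed mask clears them), SLAB_ALLOC_TRYLOCK's bit value is assumed (only SLAB_ALLOC_DEFAULT == 0 comes from the cover letter), and the two predicates model the described semantics of gfpflags_allow_spinning() and alloc_flags_allow_spinning(), not their source.

```c
#include <stdbool.h>

/* Invented gfp bits; only the relationships matter. */
#define M_GFP_DIRECT_RECLAIM	0x1u
#define M_GFP_KSWAPD_RECLAIM	0x2u
#define M_GFP_IO_FS		0x4u	/* stands in for __GFP_IO | __GFP_FS */
#define M_GFP_RECLAIM		(M_GFP_DIRECT_RECLAIM | M_GFP_KSWAPD_RECLAIM)
#define M_GFP_KERNEL		(M_GFP_RECLAIM | M_GFP_IO_FS)
/* Early boot: gfp_allowed_mask strips reclaim (and IO/FS) from everything. */
#define M_BOOT_ALLOWED_MASK	(~(M_GFP_RECLAIM | M_GFP_IO_FS))

/* Slab-internal flags: DEFAULT is 0 per the cover letter, TRYLOCK's bit assumed. */
#define SLAB_ALLOC_DEFAULT	0x0u
#define SLAB_ALLOC_TRYLOCK	0x1u

/* Old inference: "no reclaim bits" is read as "must not spin". */
static bool model_gfpflags_allow_spinning(unsigned int gfp)
{
	return (gfp & M_GFP_RECLAIM) != 0;
}

/* New inference: only an explicit TRYLOCK request forbids spinning. */
static bool model_alloc_flags_allow_spinning(unsigned int alloc_flags)
{
	return !(alloc_flags & SLAB_ALLOC_TRYLOCK);
}
```

An ordinary GFP_KERNEL allocation made during early boot arrives with no reclaim bits, so the gfp-based predicate treats it like kmalloc_nolock(); carrying SLAB_ALLOC_DEFAULT alongside keeps it on the normal, spinning path.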


Harry Yoo

Jun 11, 2026, 1:07:05 AM
to Vlastimil Babka (SUSE), Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org


On 6/11/26 12:40 AM, Vlastimil Babka (SUSE) wrote:
> Add alloc_flags as a new field to the slab_alloc_context helper struct,
> so we can pass it to more functions in the slab implementation without
> adding another function parameter.
>
> Start checking them via alloc_flags_allow_spinning() in
> alloc_single_from_new_slab() (where we can drop the allow_spin
> parameter) and ___slab_alloc(). This further reduces false-positive
> spinning-not-allowed from allocations that are not kmalloc_nolock() but
> lack __GFP_RECLAIM flags.
>
> _kmalloc_nolock_noprof() initializes ac.alloc_flags using its flags that
> are SLAB_ALLOC_TRYLOCK. slab_alloc_node() and __kmem_cache_alloc_bulk()
> are not reachable from kmalloc_nolock() and all their callers expect
> spinning to be allowed, so they can use SLAB_ALLOC_DEFAULT. This is
> temporary as the scope of slab_alloc_context will further move to the
> callers, making the alloc_flags usage more obvious.
>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---
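A sketch of how carrying alloc_flags in the context lets a callee drop its separate allow_spin parameter. Assumptions: the struct layout, the TRYLOCK bit value, `enum model_lock` and the `model_*` helpers are hypothetical; only the flag names and the rule that the kmalloc_nolock() path uses SLAB_ALLOC_TRYLOCK while slab_alloc_node() uses SLAB_ALLOC_DEFAULT come from the patch description.

```c
#define SLAB_ALLOC_DEFAULT	0x0u
#define SLAB_ALLOC_TRYLOCK	0x1u	/* assumed bit value */

struct slab_alloc_context {		/* layout assumed */
	unsigned long addr;
	unsigned int orig_size;
	unsigned int alloc_flags;
};

enum model_lock { MODEL_SPIN_LOCK, MODEL_TRYLOCK };

/* The callee no longer takes allow_spin: it reads the context instead. */
static enum model_lock model_lock_mode(const struct slab_alloc_context *ac)
{
	return (ac->alloc_flags & SLAB_ALLOC_TRYLOCK) ? MODEL_TRYLOCK
						      : MODEL_SPIN_LOCK;
}

/* kmalloc_nolock()-style entry point. */
static struct slab_alloc_context model_nolock_ctx(unsigned long ip, unsigned int size)
{
	return (struct slab_alloc_context){ .addr = ip, .orig_size = size,
					    .alloc_flags = SLAB_ALLOC_TRYLOCK };
}

/* slab_alloc_node()-style entry point, never reached from kmalloc_nolock(). */
static struct slab_alloc_context model_default_ctx(unsigned long ip, unsigned int size)
{
	return (struct slab_alloc_context){ .addr = ip, .orig_size = size,
					    .alloc_flags = SLAB_ALLOC_DEFAULT };
}
```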


Harry Yoo

Jun 11, 2026, 2:05:39 AM
to Vlastimil Babka (SUSE), Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org


On 6/11/26 12:40 AM, Vlastimil Babka (SUSE) wrote:
> Refactor get_from_partial_node(), get_from_any_partial(),
> get_from_partial() and ___slab_alloc().
>
> Remove struct partial_context, which used to be more substantial but
> shrank as part of the sheaves conversion. Instead pass gfp_flags and
> pointer to the new slab_alloc_context, which together is a superset of
> partial_context.
>
> This means alloc_flags are now available and we can use them to
> determine if spinning is allowed, further reducing false positive "not
> allowed" in the slow path due to gfp flags lacking __GFP_RECLAIM.
>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---

Looks good to me,

Harry Yoo

Jun 11, 2026, 2:40:55 AM
to Vlastimil Babka (SUSE), Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org


On 6/11/26 12:40 AM, Vlastimil Babka (SUSE) wrote:
> @@ -4664,7 +4665,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
> return NULL;
> }
>
> - allow_spin = gfpflags_allow_spinning(gfp);
> + allow_spin = alloc_flags_allow_spinning(alloc_flags);

Sashiko wrote [1]:
> Does this bypass the caller's gfp constraints for standard allocations?
> Looking at slab_alloc_node(), standard allocations now pass
> SLAB_ALLOC_DEFAULT into alloc_from_pcs():
> - object = alloc_from_pcs(s, gfpflags, node);
> + object = alloc_from_pcs(s, gfpflags, SLAB_ALLOC_DEFAULT, node);
> This default flag means alloc_flags_allow_spinning() will unconditionally
> return true regardless of the gfp flags provided.

Yes, but that's not used in the _nolock path
as mentioned in the patch 6 description :)

> If a caller allocating under a raw spinlock intentionally strips
> __GFP_KSWAPD_RECLAIM (for example, by using __GFP_NOWARN) to prevent
> sleeping,

That's a horrible hack (and hypothetical. Nobody should be stripping
__GFP_KSWAPD_RECLAIM instead of using kmalloc_nolock(). That's purely
broken).

> won't this allow the allocator to execute spin_lock_irqsave()
> on barn->lock or n->list_lock?
>
> On systems with preempt-rt enabled, a standard spinlock maps to a sleeping
> lock, so taking these locks in an atomic context could cause a scheduling
> while atomic panic.
>
> Since there is no nolock variant available for custom caches, do callers
> currently have any alternative mitigation?

Well, RT kernels are not supposed to allocate memory under a raw
spinlock (at least w/ allow_spin = true)

[1]
https://sashiko.dev/#/patchset/20260610-slab_alloc_flags-v2-0-7190909db118%40kernel.org

Harry Yoo

Jun 11, 2026, 3:52:45 AM
to Vlastimil Babka (SUSE), Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org


On 6/11/26 12:40 AM, Vlastimil Babka (SUSE) wrote:
> Add the alloc_flags parameter to allocate_slab() and new_slab()
> so it can be used to determine if spinning is allowed, independently
> from gfp flags.
>
> refill_objects() passes SLAB_ALLOC_DEFAULT because it can only be
> reached from contexts that allow spinning.
>
> Also change how trynode_flags are constructed in ___slab_alloc() to
> achieve the same "do not upgrade to GFP_NOWAIT" by using masking instead
> of a branch. It will now also not upgrade in cases where gfp is weaker
> than GFP_NOWAIT (i.e. lacks __GFP_KSWAPD_RECLAIM) but doesn't come from
> kmalloc_nolock() - which is more correct anyway.

Wait, debugobjects intentionally avoids __GFP_KSWAPD_RECLAIM,
but we have been upgrading it to GFP_NOWAIT?

> During the masking keep also existing __GFP_NOMEMALLOC (pointed out by
> Sashiko) and __GFP_ACCOUNT. Previously the hardcoded GFP_NOWAIT would
> eliminate them, but it's not a big problem that would need a separate
> fix.

Ack.

> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---
> mm/slub.c | 28 ++++++++++++++--------------
> 1 file changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 98b79e5e7679..8f6ca3d5fdfa 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4467,25 +4470,22 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> * 1) try to get a partial slab from target node only by having
> * __GFP_THISNODE in pc.flags for get_from_partial()
> * 2) if 1) failed, try to allocate a new slab from target node with
> - * GPF_NOWAIT | __GFP_THISNODE opportunistically
> + * (at most) GFP_NOWAIT | __GFP_THISNODE opportunistically
> * 3) if 2) failed, retry with original gfpflags which will allow
> * get_from_partial() try partial lists of other nodes before
> * potentially allocating new page from other nodes
> */
> if (unlikely(node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE)
> && try_thisnode)) {
> - if (unlikely(!allow_spin))
> - /* Do not upgrade gfp to NOWAIT from more restrictive mode */
> - trynode_flags = gfpflags | __GFP_THISNODE;
> - else
> - trynode_flags = GFP_NOWAIT | __GFP_THISNODE;
> + trynode_flags &= GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_ACCOUNT;
> + trynode_flags |= __GFP_NOWARN | __GFP_THISNODE;
> }
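Harry's question can be checked with a toy version of the old branch and the new masking. All M_GFP_* values are invented for illustration, and GFP_NOWAIT is modeled as just the kswapd-reclaim bit, which is a simplifying assumption. `trynode_flags` is assumed to start as a copy of the caller's gfpflags, as the `&=` in the hunk implies.

```c
#include <stdbool.h>

/* Invented bit values; only their relationships matter here. */
#define M_GFP_DIRECT_RECLAIM	0x001u
#define M_GFP_KSWAPD_RECLAIM	0x002u
#define M_GFP_IO		0x004u
#define M_GFP_FS		0x008u
#define M_GFP_NOWARN		0x010u
#define M_GFP_NOMEMALLOC	0x020u
#define M_GFP_ACCOUNT		0x040u
#define M_GFP_THISNODE		0x080u

#define M_GFP_KERNEL	(M_GFP_DIRECT_RECLAIM | M_GFP_KSWAPD_RECLAIM | M_GFP_IO | M_GFP_FS)
#define M_GFP_NOWAIT	M_GFP_KSWAPD_RECLAIM	/* simplified */

/* Old code: any caller allowed to spin got a hardcoded GFP_NOWAIT. */
static unsigned int model_trynode_old(unsigned int gfpflags, bool allow_spin)
{
	if (!allow_spin)
		return gfpflags | M_GFP_THISNODE;
	return M_GFP_NOWAIT | M_GFP_THISNODE;
}

/* New code: intersect with GFP_NOWAIT, so reclaim bits can only be removed. */
static unsigned int model_trynode_new(unsigned int gfpflags)
{
	unsigned int trynode_flags = gfpflags;

	trynode_flags &= M_GFP_NOWAIT | M_GFP_NOMEMALLOC | M_GFP_ACCOUNT;
	trynode_flags |= M_GFP_NOWARN | M_GFP_THISNODE;
	return trynode_flags;
}
```

For a caller that omits __GFP_KSWAPD_RECLAIM on purpose (modeled as plain M_GFP_NOWARN) but may spin, the old branch adds the kswapd bit, upgrading the request; the masking cannot. A GFP_KERNEL | __GFP_ACCOUNT caller is reduced to kswapd-only reclaim and now keeps its accounting bit.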


Vlastimil Babka (SUSE)

Jun 11, 2026, 4:35:04 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Andrey Konovalov, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
But now we perform this check even if init is false, so it runs for every allocated object, not only those being zeroed.

> + } else if (__slub_debug_enabled()) {
> + /*
> + * KASAN never zeroes memory when slab_debug is enabled
> + * to avoid overwriting SLUB redzones. This does not
> + * lead to a performance penalty on production builds,
> + * as slab_debug is not intended to be enabled there.
> + */
> + skip_init = false;
> + } else if (kasan_has_integrated_init()) {
> + /*
> + * ARM64 can set memory tags and zero the memory using
> + * a single instruction. Since HW_TAGS KASAN uses that
> + * while tagging the object, a separate zeroing is
> + * unnecessary unless slab_debug is enabled.
> + */

(I like the new/updated comments)

> + skip_init = true;
> + }

And these two are now done in every loop iteration even though they don't
depend on the object. Yeah it's a static key and build-time constant but still.

But maybe there's some middle ground?

Above the loop do (with your comments).

bool init;

/* ARM64 can ...
* ...
* But KASAN never zeroes ...
*/
if (kasan_has_integrated_init() && !__slub_debug_enabled())
init = false;
else
init = slab_want_init_on_alloc(flags, s);

In the loop:

if (p[i] && init && !is_kfence_address(p[i]))

Vlastimil Babka (SUSE)

Jun 11, 2026, 4:51:28 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On 6/11/26 08:40, Harry Yoo wrote:
> Sashiko wrote [1]:
>> Does this bypass the caller's gfp constraints for standard allocations?
>> Looking at slab_alloc_node(), standard allocations now pass
>> SLAB_ALLOC_DEFAULT into alloc_from_pcs():
>> - object = alloc_from_pcs(s, gfpflags, node);
>> + object = alloc_from_pcs(s, gfpflags, SLAB_ALLOC_DEFAULT, node);
>> This default flag means alloc_flags_allow_spinning() will unconditionally
>> return true regardless of the gfp flags provided.
>
> Yes, but that's not used in _nolock path
> as mentioned in patch 6 description :)
>
>> If a caller allocating under a raw spinlock intentionally strips
>> __GFP_KSWAPD_RECLAIM (for example, by using __GFP_NOWARN) to prevent
>> sleeping,
>
> That's a horrible hack (and hypothetical. Nobody should be stripping
> __GFP_KSWAPD_RECLAIM instead of using kmalloc_nolock(). That's purely
> broken).

Indeed this was never intended to work, and was just an unfortunate
side effect of reusing gfp flags to implement kmalloc_nolock().

>> won't this allow the allocator to execute spin_lock_irqsave()
>> on barn->lock or n->list_lock?
>>
>> On systems with preempt-rt enabled, a standard spinlock maps to a sleeping
>> lock, so taking these locks in an atomic context could cause a scheduling
>> while atomic panic.
>>
>> Since there is no nolock variant available for custom caches, do callers
>> currently have any alternative mitigation?
>
> Well, RT kernels are not supposed to allocate memory under a raw
> spinlock (at least w/ allow_spin = true)

Yep.

> [1]
> https://sashiko.dev/#/patchset/20260610-slab_alloc_flags-v2-0-7190909db118%40kernel.org
>

Vlastimil Babka (SUSE)

Jun 11, 2026, 10:47:28 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Andrey Konovalov, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
OK, not so simple, we still need the kasan_init variable too.
I've ended up with this, thoughts?

From 3a1c4398ce9f361a4e6f4d9946eab6237eea89c2 Mon Sep 17 00:00:00 2001
From: "Vlastimil Babka (SUSE)" <vba...@kernel.org>
Date: Wed, 10 Jun 2026 17:40:04 +0200
Subject: [PATCH] mm/slab: do not init any kfence objects on allocation

When init (zeroing) on allocation is requested, for kmalloc() we
generally have to zero the full object size even if a smaller size is
requested, in order to provide krealloc()'s __GFP_ZERO guarantees.

When we end up allocating a kfence object, kfence perfoms the zeroing on
its own because it has its own redzone beyond the requested size. Thus
slab_post_alloc_hook() has an 'init' parameter which has to be evaluated
in all callers (via slab_want_init_on_alloc()) and should be false for
kfence allocations.

For kfence allocations in slab_alloc_node() this is achieved by subtly
skipping over the slab_want_init_on_alloc() call. Other callers (i.e.
kmem_cache_alloc_bulk_noprof()) however evaluate it unconditionally even
if they do end up with a kfence allocation. This happens not to be a
problem only because those are not kmalloc allocations, so the "requested
size" equals s->object_size and cannot interfere with kfence's
redzone. There's just an unnecessary double zeroing (in both kfence and
slab_post_alloc_hook()), but it's all very fragile and contradicts the
comment in kfence_guarded_alloc().

Remove this subtlety and simplify the code by eliminating the init
parameter from slab_post_alloc_hook() and making it call
slab_want_init_on_alloc() itself. Instead, add an is_kfence_address()
check before performing the memset, which does the right thing for all
callers of slab_post_alloc_hook().

This potentially adds the overhead of an is_kfence_address() check to
the allocation hotpath, but that check is designed to be as cheap as possible,
and it's only evaluated if zeroing is about to happen. This means (aside
from init_on_alloc hardening) only for __GFP_ZERO allocations, and the
zeroing itself comes with an overhead likely larger than the added
check.

While at it, refactor the logic deciding when KASAN performs the init
instead of SLUB, with no intended functional changes. The only
difference is that we no longer pass kasan_init as true to
kasan_slab_alloc() when KASAN has no integrated init; the value is
ignored in that case anyway, so this is only more correct in theory.

Thanks to Harry Yoo for the initial refactoring attempt, and for updated
comments that are used here.

Link: https://patch.msgid.link/20260610-slab_alloc_f...@kernel.org
Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
---
mm/kfence/core.c | 2 +-
mm/slub.c | 60 ++++++++++++++++++++++--------------------------
2 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 655dc5ce3240..5e0b406924e9 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -500,7 +500,7 @@ static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t g

/*
* We check slab_want_init_on_alloc() ourselves, rather than letting
- * SL*B do the initialization, as otherwise we might overwrite KFENCE's
+ * slab do the initialization, as otherwise it might overwrite KFENCE's
* redzone.
*/
if (unlikely(slab_want_init_on_alloc(gfp, cache)))
diff --git a/mm/slub.c b/mm/slub.c
index e2ee8f1aaccf..d762cbe5d040 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4565,13 +4565,13 @@ struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags)

static __fastpath_inline
bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
- gfp_t flags, size_t size, void **p, bool init,
+ gfp_t flags, size_t size, void **p,
unsigned int orig_size)
{
+ bool init = slab_want_init_on_alloc(flags, s);
unsigned int zero_size = s->object_size;
- bool kasan_init = init;
- size_t i;
gfp_t init_flags = flags & gfp_allowed_mask;
+ bool kasan_init = false;

/*
* For kmalloc object, the allocated size (object_size) can be larger
@@ -4588,28 +4588,33 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
zero_size = orig_size;

/*
- * When slab_debug is enabled, avoid memory initialization integrated
- * into KASAN and instead zero out the memory via the memset below with
- * the proper size. Otherwise, KASAN might overwrite SLUB redzones and
- * cause false-positive reports. This does not lead to a performance
+ * ARM64 can set memory tags and zero the memory using a single
+ * instruction. Since HW_TAGS KASAN uses that while tagging the object,
+ * separate zeroing is unnecessary.
+ *
+ * However, KASAN never zeroes memory when slab_debug is enabled to
+ * avoid overwriting SLUB redzones. This does not lead to a performance
* penalty on production builds, as slab_debug is not intended to be
* enabled there.
*/
- if (__slub_debug_enabled())
- kasan_init = false;
+ if (kasan_has_integrated_init() && !__slub_debug_enabled()) {
+ kasan_init = init;
+ init = false;
+ }

- /*
- * As memory initialization might be integrated into KASAN,
- * kasan_slab_alloc and initialization memset must be
- * kept together to avoid discrepancies in behavior.
- *
- * As p[i] might get tagged, memset and kmemleak hook come after KASAN.
- */
- for (i = 0; i < size; i++) {
+ for (size_t i = 0; i < size; i++) {
p[i] = kasan_slab_alloc(s, p[i], init_flags, kasan_init);
- if (p[i] && init && (!kasan_init ||
- !kasan_has_integrated_init()))
+
+ /*
+ * memset and hooks come after KASAN as p[i] might get tagged
+ *
+ * kfence zeroes the object instead of SLUB to avoid overwriting
+ * its own redzone starting at orig_size, which could happen
+ * with SLUB zeroing full s->object_size
+ */
+ if (init && p[i] && !is_kfence_address(p[i]))
memset(p[i], 0, zero_size);
+
if (gfpflags_allow_spinning(flags))
kmemleak_alloc_recursive(p[i], s->object_size, 1,
s->flags, init_flags);
@@ -4910,7 +4915,6 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
{
void *object;
- bool init = false;

s = slab_pre_alloc_hook(s, gfpflags);
if (unlikely(!s))
@@ -4926,16 +4930,13 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);

maybe_wipe_obj_freeptr(s, object);
- init = slab_want_init_on_alloc(gfpflags, s);

out:
/*
- * When init equals 'true', like for kzalloc() family, only
- * @orig_size bytes might be zeroed instead of s->object_size
* In case this fails due to memcg_slab_post_alloc_hook(),
* object is set to NULL
*/
- slab_post_alloc_hook(s, lru, gfpflags, 1, &object, init, orig_size);
+ slab_post_alloc_hook(s, lru, gfpflags, 1, &object, orig_size);

return object;
}
@@ -5230,7 +5231,6 @@ kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
struct slab_sheaf *sheaf)
{
void *ret = NULL;
- bool init;

if (sheaf->size == 0)
goto out;
@@ -5240,10 +5240,8 @@ kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
if (likely(!ret))
ret = sheaf->objects[--sheaf->size];

- init = slab_want_init_on_alloc(gfp, s);
-
/* add __GFP_NOFAIL to force successful memcg charging */
- slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, init, s->object_size);
+ slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, s->object_size);
out:
trace_kmem_cache_alloc(_RET_IP_, ret, s, gfp, NUMA_NO_NODE);

@@ -5423,8 +5421,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in

success:
maybe_wipe_obj_freeptr(s, ret);
- slab_post_alloc_hook(s, NULL, alloc_gfp, 1, &ret,
- slab_want_init_on_alloc(alloc_gfp, s), orig_size);
+ slab_post_alloc_hook(s, NULL, alloc_gfp, 1, &ret, orig_size);

ret = kasan_kmalloc(s, ret, orig_size, alloc_gfp);
return ret;
@@ -7339,8 +7336,7 @@ bool kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags,
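The decision logic of the reworked slab_post_alloc_hook() can be summarized as a pure function. This is a userspace sketch, not the kernel code: the booleans stand in for slab_want_init_on_alloc(), kasan_has_integrated_init(), __slub_debug_enabled() and is_kfence_address(), and the NULL-object check and the other hooks are omitted. It shows the split settled on above: the KASAN-versus-SLUB choice is made once before the loop, and only the kfence test stays per object.

```c
#include <stdbool.h>

struct model_env {
	bool want_init;			/* slab_want_init_on_alloc(flags, s) */
	bool kasan_integrated_init;	/* kasan_has_integrated_init() */
	bool slub_debug;		/* __slub_debug_enabled() */
};

struct model_init {
	bool kasan_init;	/* value passed to kasan_slab_alloc() */
	bool do_memset;		/* whether SLUB itself zeroes the object */
};

static struct model_init model_post_alloc_init(const struct model_env *e,
						bool is_kfence)
{
	bool init = e->want_init;
	bool kasan_init = false;
	struct model_init r;

	/* Loop-invariant: let HW_TAGS KASAN zero while tagging, unless slab_debug. */
	if (e->kasan_integrated_init && !e->slub_debug) {
		kasan_init = init;
		init = false;
	}

	/* Per object: kfence zeroes its own objects to protect its redzone. */
	r.kasan_init = kasan_init;
	r.do_memset = init && !is_kfence;
	return r;
}
```

Because `init` is tested first, the kfence check is short-circuited away whenever no zeroing is requested, which addresses the hot-path concern raised earlier.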

Harry Yoo

Jun 11, 2026, 11:11:17 AM
to Vlastimil Babka (SUSE), Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Andrey Konovalov, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
Ouch, right.

> I've ended up with this, thoughts?

Much better!

> From 3a1c4398ce9f361a4e6f4d9946eab6237eea89c2 Mon Sep 17 00:00:00 2001
> From: "Vlastimil Babka (SUSE)" <vba...@kernel.org>
> Date: Wed, 10 Jun 2026 17:40:04 +0200
> Subject: [PATCH] mm/slab: do not init any kfence objects on allocation
>
> When init (zeroing) on allocation is requested, for kmalloc() we
> generally have to zero the full object size even if a smaller size is
> requested, in order to provide krealloc()'s __GFP_ZERO guarantees.
>
> When we end up allocating a kfence object, kfence perfoms the zeroing on

nit: perfoms -> performs
Right.

> Thanks to Harry Yoo for the initial refactoring attempt, and for updated
> comments that are used here.

No problem ;)

> Link: https://patch.msgid.link/20260610-slab_alloc_f...@kernel.org
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---

Looks good to me,
Reviewed-by: Harry Yoo (Oracle) <ha...@kernel.org>

Thanks!

Vlastimil Babka (SUSE)

Jun 11, 2026, 12:29:03 PM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On 6/10/26 17:40, Vlastimil Babka (SUSE) wrote:
> __GFP_NO_OBJ_EXT has limited scope within the slab allocator itself and
> gfp flags are a scarce resource, unlike slab's alloc_flags.
>
> Introduce SLAB_ALLOC_NO_RECURSE alloc flag that has the same intent as
> __GFP_NO_OBJ_EXT but a more generic name, meaning that a kmalloc()
> family function should not recurse into another kmalloc*() for the
> purposes of allocating auxiliary structures (obj_ext arrays or sheaves).
>
> First, replace the __GFP_NO_OBJ_EXT for allocating obj_ext arrays in
> alloc_slab_obj_exts(). Make use of the newly added kmalloc_flags()
> function, where we can pass alloc_flags with SLAB_ALLOC_NO_RECURSE
> added. This will also pass through SLAB_ALLOC_TRYLOCK so we don't need
> to special case kmalloc_nolock() anymore.
>
> Note that until now the kmalloc_nolock() ignored the incoming gfp flags
> and hardcoded __GFP_ZERO | __GFP_NO_OBJ_EXT. But it's correct to pass on
> the incoming gfp flags (only augmented with __GFP_ZERO), because if
> alloc_flags contain SLAB_ALLOC_TRYLOCK, the incoming gfp flags have to
> be also compatible with it.
>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>

As pointed out by Sashiko, this piecemeal approach creates a bisection
hazard where sheaves -> obj_ext -> sheaves -> ... recursion can happen.
So I'll change this as follows, making obj_ext accept and pass both the
gfp flag and the alloc flag that prevent recursion, and change the next
patch to revert that temporary change again.

diff --git a/mm/slub.c b/mm/slub.c
index a81f1f6bad67..c60f3a252ae5 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2167,6 +2167,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,

gfp &= ~OBJCGS_CLEAR_MASK;
/* Prevent recursive extension vector allocation */
+ gfp |= __GFP_NO_OBJ_EXT;
alloc_flags |= SLAB_ALLOC_NO_RECURSE;

sz = obj_exts_alloc_size(s, slab, gfp);
@@ -2371,7 +2372,7 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags,
if (s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE))
return;

- if (alloc_flags & SLAB_ALLOC_NO_RECURSE)
+ if (alloc_flags & SLAB_ALLOC_NO_RECURSE || flags & __GFP_NO_OBJ_EXT)
return;

slab = virt_to_slab(object);
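A userspace sketch of the recursion guard in the fixup above. The allocation of the obj_ext vector is itself marked with both the gfp bit and the alloc flag, and the hook bails out if either one is present, so nesting stops after one level. TOY_GFP_NO_OBJ_EXT, toy_alloc() and alloc_obj_ext_vec() are hypothetical stand-ins, not slab code. Only the SLAB_ALLOC_NO_RECURSE value (0x04) comes from the mm/slab.h hunk in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in bit for __GFP_NO_OBJ_EXT; real gfp values differ. */
#define TOY_GFP_NO_OBJ_EXT    0x1u
/* Value as in the mm/slab.h hunk of this series. */
#define SLAB_ALLOC_NO_RECURSE 0x04u

static int max_depth; /* deepest nesting observed, for the demo only */

static void *toy_alloc(unsigned int gfp, unsigned int alloc_flags, int depth);

/* Mirrors the transitional check: bail out if either marker is present. */
static bool wants_obj_ext(unsigned int gfp, unsigned int alloc_flags)
{
	return !((alloc_flags & SLAB_ALLOC_NO_RECURSE) ||
		 (gfp & TOY_GFP_NO_OBJ_EXT));
}

/* Allocating the auxiliary vector is itself an allocation... */
static void alloc_obj_ext_vec(unsigned int gfp, unsigned int alloc_flags,
			      int depth)
{
	/* ...so tag it with both markers, as the fixup hunk does. */
	gfp |= TOY_GFP_NO_OBJ_EXT;
	alloc_flags |= SLAB_ALLOC_NO_RECURSE;
	toy_alloc(gfp, alloc_flags, depth + 1);
}

static void *toy_alloc(unsigned int gfp, unsigned int alloc_flags, int depth)
{
	static char obj; /* dummy object, contents irrelevant */

	if (depth > max_depth)
		max_depth = depth;
	if (depth > 8)	/* safety net so a broken guard cannot hang the demo */
		return NULL;
	if (wants_obj_ext(gfp, alloc_flags))
		alloc_obj_ext_vec(gfp, alloc_flags, depth);
	return &obj;
}
```

Checking both markers during the transition means an allocation tagged under either the old (gfp) or the new (alloc_flags) convention is treated as non-recursive, which is what closes the bisection window.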

Vlastimil Babka (SUSE)

Jun 11, 2026, 12:37:24 PM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Andrey Konovalov, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On 6/11/26 17:11, Harry Yoo wrote:
>
>> From 3a1c4398ce9f361a4e6f4d9946eab6237eea89c2 Mon Sep 17 00:00:00 2001
>> From: "Vlastimil Babka (SUSE)" <vba...@kernel.org>
>> Date: Wed, 10 Jun 2026 17:40:04 +0200
>> Subject: [PATCH] mm/slab: do not init any kfence objects on allocation
>>
>> When init (zeroing) on allocation is requested, for kmalloc() we
>> generally have to zero the full object size even if a smaller size is
>> requested, in order to provide krealloc()'s __GFP_ZERO guarantees.
>>
>> When we end up allocating a kfence object, kfence perfoms the zeroing on
>
> nit: perfoms -> performs

Fixed.
> Thanks!
>

Hao Li

Jun 11, 2026, 11:10:13 PM
to Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Wed, Jun 10, 2026 at 05:40:06PM +0200, Vlastimil Babka (SUSE) wrote:
> Similarly to page allocator's struct alloc_context, introduce a helper
> struct to hold a part of the allocation arguments. This will allow
> reducing the number of parameters in many functions of the
> implementation, and extend them easily if needed.
>
> For now, make it hold the caller address and the originally requested
> allocation size.
>
> Convert alloc_single_from_new_slab(), __slab_alloc_node() and
> ___slab_alloc(). No functional change intended.
>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---
> mm/slub.c | 46 +++++++++++++++++++++++++++++++++-------------
> 1 file changed, 33 insertions(+), 13 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 7b48c0d38404..a3cac7281cc6 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -213,6 +213,12 @@ DEFINE_STATIC_KEY_FALSE(slub_debug_enabled);
> static DEFINE_STATIC_KEY_FALSE(strict_numa);
> #endif
>
> +/* Structure holding extra parameters for slab allocations */
> +struct slab_alloc_context {
> + unsigned long caller_addr;
> + unsigned long orig_size;
> +};
> +
> /* Structure holding parameters for get_from_partial() call chain */
> struct partial_context {
> gfp_t flags;
> @@ -3687,7 +3693,8 @@ static inline void init_slab_obj_iter(struct kmem_cache *s, struct slab *slab,
> * and put the slab to the partial (or full) list.
> */
> static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
> - int orig_size, bool allow_spin)
> + struct slab_alloc_context *ac,
> + bool allow_spin)
> {
> struct kmem_cache_node *n;
> struct slab_obj_iter iter;
> @@ -3705,7 +3712,7 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
> /* alloc_debug_processing() always expects a valid freepointer */
> set_freepointer(s, object, slab->freelist);
>
> - if (!alloc_debug_processing(s, slab, object, orig_size)) {
> + if (!alloc_debug_processing(s, slab, object, ac->orig_size)) {
> /*
> * It's not really expected that this would fail on a
> * freshly allocated slab, but a concurrent memory
> @@ -4443,7 +4450,7 @@ static unsigned int alloc_from_new_slab(struct kmem_cache *s, struct slab *slab,
> * slab.
> */
> static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> - unsigned long addr, unsigned int orig_size)
> + struct slab_alloc_context *ac)
> {
> bool allow_spin = gfpflags_allow_spinning(gfpflags);
> void *object;
> @@ -4476,7 +4483,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> pc.flags = GFP_NOWAIT | __GFP_THISNODE;
> }
>
> - pc.orig_size = orig_size;
> + pc.orig_size = ac->orig_size;
> object = get_from_partial(s, node, &pc);
> if (object)
> goto success;
> @@ -4496,7 +4503,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> stat(s, ALLOC_SLAB);
>
> if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
> - object = alloc_single_from_new_slab(s, slab, orig_size, allow_spin);
> + object = alloc_single_from_new_slab(s, slab, ac, allow_spin);
>
> if (likely(object))
> goto success;
> @@ -4514,13 +4521,13 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>
> success:
> if (kmem_cache_debug_flags(s, SLAB_STORE_USER))
> - set_track(s, object, TRACK_ALLOC, addr, gfpflags);
> + set_track(s, object, TRACK_ALLOC, ac->caller_addr, gfpflags);
>
> return object;
> }
>
> static void *__slab_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node,
> - unsigned long addr, size_t orig_size)
> + struct slab_alloc_context *ac)
> {
> void *object;
>
> @@ -4545,7 +4552,7 @@ static void *__slab_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node,
> }
> #endif
>
> - object = ___slab_alloc(s, gfpflags, node, addr, orig_size);
> + object = ___slab_alloc(s, gfpflags, node, ac);
>
> return object;
> }
> @@ -4923,8 +4930,13 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
>
> object = alloc_from_pcs(s, gfpflags, node);
>
> - if (unlikely(!object))
> - object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);
> + if (unlikely(!object)) {
> + struct slab_alloc_context ac = {
> + .caller_addr = addr,
> + .orig_size = orig_size,
> + };
> + object = __slab_alloc_node(s, gfpflags, node, &ac);
> + }
>
> maybe_wipe_obj_freeptr(s, object);
>
> @@ -5389,13 +5401,18 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
> if (ret)
> goto success;
>
> + struct slab_alloc_context ac = {
> + .caller_addr = _RET_IP_,
> + .orig_size = orig_size,
> + };

It might be better to move this to the beginning of the function, so that
patch 09 cannot jump to `success` before ac is initialized.

> +
> /*
> * Do not call slab_alloc_node(), since trylock mode isn't
> * compatible with slab_pre_alloc_hook/should_failslab and
> * kfence_alloc. Hence call __slab_alloc_node() (at most twice)
> * and slab_post_alloc_hook() directly.
> */
> - ret = __slab_alloc_node(s, alloc_gfp, node, _RET_IP_, orig_size);
> + ret = __slab_alloc_node(s, alloc_gfp, node, &ac);
>
> /*
> * It's possible we failed due to trylock as we preempted someone with
> @@ -7237,10 +7254,13 @@ static bool __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
> int i;
>
> if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
> + struct slab_alloc_context ac = {
> + .caller_addr = _RET_IP_,
> + .orig_size = s->object_size,
> + };
> for (i = 0; i < size; i++) {
>
> - p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE, _RET_IP_,
> - s->object_size);
> + p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE, &ac);
> if (unlikely(!p[i]))
> goto error;
>
>
> --
> 2.54.0
>
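A minimal userspace sketch of the parameter-bundling pattern, assuming only the two fields shown in the patch. toy_slab_alloc(), toy_alloc_bulk() and struct toy_track are hypothetical; the toy simply records caller_addr and orig_size roughly the way set_track() and alloc_debug_processing() consume them in the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Field names follow the patch; everything else here is a toy. */
struct slab_alloc_context {
	unsigned long caller_addr;
	unsigned long orig_size;
};

/* Hypothetical per-object debug record, standing in for set_track(). */
struct toy_track {
	unsigned long addr;
	unsigned long size;
};

static struct toy_track last_track;

/* Sanity check standing in for alloc_debug_processing(). */
static int toy_alloc_debug_processing(unsigned long orig_size,
				      unsigned long object_size)
{
	return orig_size <= object_size;
}

/* Deep helper: reads what it needs from ac instead of extra parameters. */
static int toy_slab_alloc(unsigned long object_size,
			  const struct slab_alloc_context *ac)
{
	if (!toy_alloc_debug_processing(ac->orig_size, object_size))
		return 0;
	last_track.addr = ac->caller_addr;
	last_track.size = ac->orig_size;
	return 1;
}

/* Bulk path: one context built outside the loop, reused for each object. */
static int toy_alloc_bulk(unsigned long object_size, int n,
			  unsigned long caller)
{
	struct slab_alloc_context ac = {
		.caller_addr = caller,
		.orig_size = object_size,
	};
	int done = 0;

	for (int i = 0; i < n; i++)
		done += toy_slab_alloc(object_size, &ac);
	return done;
}
```

As in the __kmem_cache_alloc_bulk() hunk, the context is built once outside the loop, since the caller address and size do not change per object.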

Hao Li

Jun 11, 2026, 11:21:19 PM
to Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
We can remove the `gfp` arg, as this function no longer uses it.

Hao Li

Jun 11, 2026, 11:47:27 PM
to Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, sta...@vger.kernel.org
On Wed, Jun 10, 2026 at 05:40:03PM +0200, Vlastimil Babka (SUSE) wrote:
> When init (zeroing) on allocation is requested, for kmalloc() we
> generally have to zero the full object size even if a smaller size is
> requested, in order to provide krealloc()'s __GFP_ZERO guarantees.
>
> But if we track the requested size, krealloc() uses that information to
> do the right thing. With red zoning also enabled, any unused size
> became part of the red zone, so it must not be zeroed.
>
> However the check is imprecise, and will trigger also when only
> SLAB_RED_ZONE is enabled without SLAB_STORE_USER. This means enabling
> red zoning alone can compromise krealloc()'s __GFP_ZERO contract.
>
> Fix this by using slub_debug_orig_size() instead, which is the exact
> check for whether the requested size is tracked. We don't need to care
> if red zoning is also enabled or not. Also update and expand the
> comment accordingly.
>
> Fixes: 9ce67395f5a0 ("mm/slub: only zero requested size of buffer for kzalloc when debug enabled")
> Cc: <sta...@vger.kernel.org>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---

Reviewed-by: Hao Li <hao...@linux.dev>

--
Thanks,
Hao
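A toy model of the zeroing rule the fix restores, assuming a single bool stands in for the slub_debug_orig_size() check. When the requested size is tracked, only that many bytes are zeroed (the rest may be red zone, and krealloc() can use the tracked size); otherwise the whole object is zeroed so krealloc() can rely on __GFP_ZERO. toy_init_on_alloc() is hypothetical, and 0xcc merely marks untouched bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Zero the object on allocation. With the original size tracked, bytes
 * past orig_size are left alone (they may belong to the red zone);
 * without tracking, the full object_size must be zeroed.
 */
static void toy_init_on_alloc(unsigned char *obj, size_t object_size,
			      size_t orig_size, bool orig_size_tracked)
{
	size_t len = orig_size_tracked ? orig_size : object_size;

	memset(obj, 0, len);
}
```

The bug being fixed was that the decision keyed on red zoning being enabled rather than on whether the original size is tracked, so red zoning alone took the "tracked" branch.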

Hao Li

Jun 11, 2026, 11:48:18 PM
to Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Wed, Jun 10, 2026 at 05:40:05PM +0200, Vlastimil Babka (SUSE) wrote:
> With sheaves, this is no longer part of the allocation fastpath. For
> the same reason, also mark the call to it from slab_alloc_node() as
> unlikely().
>
> Reviewed-by: Harry Yoo (Oracle) <ha...@kernel.org>

Hao Li

Jun 11, 2026, 11:49:52 PM
to Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Wed, Jun 10, 2026 at 05:40:07PM +0200, Vlastimil Babka (SUSE) wrote:
> Similarly to the page allocator, introduce slab-allocator specific
> alloc flags that internally control allocation behavior in addition to
> gfp_flags, without occupying the limited gfp flags space.
>
> Introduce the first flag SLAB_ALLOC_TRYLOCK that behaves similarly to
> page allocator's ALLOC_TRYLOCK and will be used to reimplement
> kmalloc_nolock()'s "!allow_spin" behavior. That currently relies on
> gfpflags_allow_spinning() and thus the lack of both __GFP_RECLAIM flags,
> importantly __GFP_KSWAPD_RECLAIM. This can give false-positive results
> e.g. in early boot with a restricted gfp_allowed_mask.
>
> Also introduce alloc_flags_allow_spinning() to replace the usage of
> gfpflags_allow_spinning().
>
> Start using alloc_flags and the new check first in alloc_from_pcs() and
> __pcs_replace_empty_main(). This means some slab allocations that were
> falsely treated as kmalloc_nolock() due to their gfp flags will now have
> higher chances of succeeding, and this will further increase with followup
> changes.
>
> Remove a WARN_ON_ONCE() from refill_objects() as it's now legitimate to
> reach it from a slab allocation that's not _nolock() and yet lacks
> __GFP_KSWAPD_RECLAIM for other reasons.
>
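A sketch of the false positive described above, using hypothetical gfp bit values (the real __GFP_* constants differ). The SLAB_ALLOC_* values come from the mm/slab.h hunk in this series. The body of toy_alloc_flags_allow_spinning() is an assumption about what alloc_flags_allow_spinning() checks, based on the description that only kmalloc_nolock() sets SLAB_ALLOC_TRYLOCK.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the real __GFP_* constants differ. */
#define TOY_GFP_DIRECT_RECLAIM 0x1u
#define TOY_GFP_KSWAPD_RECLAIM 0x2u
#define TOY_GFP_RECLAIM (TOY_GFP_DIRECT_RECLAIM | TOY_GFP_KSWAPD_RECLAIM)
#define TOY_GFP_KERNEL  (TOY_GFP_RECLAIM | 0x4u) /* plus an IO/FS-like bit */

/* Values as in the mm/slab.h hunk of this series. */
#define SLAB_ALLOC_DEFAULT 0x00u
#define SLAB_ALLOC_TRYLOCK 0x01u

/* Old heuristic: no reclaim bits at all means "must not spin". */
static bool toy_gfpflags_allow_spinning(unsigned int gfp)
{
	return (gfp & TOY_GFP_RECLAIM) != 0;
}

/* Assumed shape of the new helper: only an explicit TRYLOCK forbids it. */
static bool toy_alloc_flags_allow_spinning(unsigned int alloc_flags)
{
	return !(alloc_flags & SLAB_ALLOC_TRYLOCK);
}

/* Early boot: a restrictive allowed mask strips the reclaim bits. */
static unsigned int toy_apply_allowed_mask(unsigned int gfp, unsigned int mask)
{
	return gfp & mask;
}
```

With the boot-time mask applied, a GFP_KERNEL-like request looks exactly like a kmalloc_nolock() one to the gfp heuristic, while the alloc_flags check still allows spinning.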

Hao Li

Jun 11, 2026, 11:51:04 PM
to Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Wed, Jun 10, 2026 at 05:40:08PM +0200, Vlastimil Babka (SUSE) wrote:
> Add alloc_flags as a new field to the slab_alloc_context helper struct,
> so we can pass it to more functions in the slab implementation without
> adding another function parameter.
>
> Start checking them via alloc_flags_allow_spinning() in
> alloc_single_from_new_slab() (where we can drop the allow_spin
> parameter) and ___slab_alloc(). This further reduces false-positive
> spinning-not-allowed from allocations that are not kmalloc_nolock() but
> lack __GFP_RECLAIM flags.
>
> _kmalloc_nolock_noprof() initializes ac.alloc_flags using its flags that
> are SLAB_ALLOC_TRYLOCK. slab_alloc_node() and __kmem_cache_alloc_bulk()
> are not reachable from kmalloc_nolock() and all their callers expect
> spinning to be allowed, so they can use SLAB_ALLOC_DEFAULT. This is
> temporary as the scope of slab_alloc_context will further move to the
> callers, making the alloc_flags usage more obvious.
>

Hao Li

Jun 12, 2026, 12:05:08 AM
to Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Wed, Jun 10, 2026 at 05:40:09PM +0200, Vlastimil Babka (SUSE) wrote:
> Refactor get_from_partial_node(), get_from_any_partial(),
> get_from_partial() and ___slab_alloc().
>
> Remove struct partial_context, which used to be more substantial but
> shrank as part of the sheaves conversion. Instead pass gfp_flags and
> pointer to the new slab_alloc_context, which together is a superset of
> partial_context.
>
> This means alloc_flags are now available and we can use them to
> determine if spinning is allowed, further reducing false positive "not
> allowed" in the slow path due to gfp flags lacking __GFP_RECLAIM.
>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---
> mm/slub.c | 52 ++++++++++++++++++++++++----------------------------
> 1 file changed, 24 insertions(+), 28 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index ef745b37d063..98b79e5e7679 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -220,12 +220,6 @@ struct slab_alloc_context {
> unsigned int alloc_flags;
> };
>
> -/* Structure holding parameters for get_from_partial() call chain */
> -struct partial_context {
> - gfp_t flags;
> - unsigned int orig_size;
> -};
> -
> /* Structure holding parameters for get_partial_node_bulk() */
> struct partial_bulk_context {
> gfp_t flags;
> @@ -3826,7 +3820,8 @@ static bool get_partial_node_bulk(struct kmem_cache *s,
> */
> static void *get_from_partial_node(struct kmem_cache *s,
> struct kmem_cache_node *n,
> - struct partial_context *pc)
> + gfp_t gfp_flags,
> + struct slab_alloc_context *ac)
> {
> struct slab *slab, *slab2;
> unsigned long flags;
> @@ -3841,7 +3836,7 @@ static void *get_from_partial_node(struct kmem_cache *s,
> if (!n || !n->nr_partial)
> return NULL;
>
> - if (gfpflags_allow_spinning(pc->flags))
> + if (alloc_flags_allow_spinning(ac->alloc_flags))
> spin_lock_irqsave(&n->list_lock, flags);
> else if (!spin_trylock_irqsave(&n->list_lock, flags))
> return NULL;
> @@ -3849,12 +3844,12 @@ static void *get_from_partial_node(struct kmem_cache *s,
>
> struct freelist_counters old, new;
>
> - if (!pfmemalloc_match(slab, pc->flags))
> + if (!pfmemalloc_match(slab, gfp_flags))
> continue;
>
> if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
> object = alloc_single_from_partial(s, n, slab,
> - pc->orig_size);
> + ac->orig_size);
> if (object)
> break;
> continue;
> @@ -3888,15 +3883,16 @@ static void *get_from_partial_node(struct kmem_cache *s,
> /*
> * Get an object from somewhere. Search in increasing NUMA distances.
> */
> -static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *pc)
> +static void *get_from_any_partial(struct kmem_cache *s, gfp_t gfp_flags,
> + struct slab_alloc_context *ac)
> {
> #ifdef CONFIG_NUMA
> struct zonelist *zonelist;
> struct zoneref *z;
> struct zone *zone;
> - enum zone_type highest_zoneidx = gfp_zone(pc->flags);
> + enum zone_type highest_zoneidx = gfp_zone(gfp_flags);
> unsigned int cpuset_mems_cookie;
> - bool allow_spin = gfpflags_allow_spinning(pc->flags);
> + bool allow_spin = alloc_flags_allow_spinning(ac->alloc_flags);
>
> /*
> * The defrag ratio allows a configuration of the tradeoffs between
> @@ -3930,16 +3926,17 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
> if (allow_spin)
> cpuset_mems_cookie = read_mems_allowed_begin();
>
> - zonelist = node_zonelist(mempolicy_slab_node(), pc->flags);
> + zonelist = node_zonelist(mempolicy_slab_node(), gfp_flags);
> for_each_zone_zonelist(zone, z, zonelist, highest_zoneidx) {
> struct kmem_cache_node *n;
>
> n = get_node(s, zone_to_nid(zone));
>
> - if (n && cpuset_zone_allowed(zone, pc->flags) &&
> + if (n && cpuset_zone_allowed(zone, gfp_flags) &&
> n->nr_partial > s->min_partial) {
>
> - void *object = get_from_partial_node(s, n, pc);
> + void *object = get_from_partial_node(s, n,
> + gfp_flags, ac);
>
> if (object) {
> /*
> @@ -3961,8 +3958,8 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
> /*
> * Get an object from a partial slab
> */
> -static void *get_from_partial(struct kmem_cache *s, int node,
> - struct partial_context *pc)
> +static void *get_from_partial(struct kmem_cache *s, int node, gfp_t flags,
> + struct slab_alloc_context *ac)
> {
> int searchnode = node;
> void *object;
> @@ -3970,11 +3967,11 @@ static void *get_from_partial(struct kmem_cache *s, int node,
> if (node == NUMA_NO_NODE)
> searchnode = numa_mem_id();
>
> - object = get_from_partial_node(s, get_node(s, searchnode), pc);
> - if (object || (node != NUMA_NO_NODE && (pc->flags & __GFP_THISNODE)))
> + object = get_from_partial_node(s, get_node(s, searchnode), flags, ac);
> + if (object || (node != NUMA_NO_NODE && (flags & __GFP_THISNODE)))
> return object;
>
> - return get_from_any_partial(s, pc);
> + return get_from_any_partial(s, flags, ac);
> }
>
> static bool has_pcs_used(int cpu, struct kmem_cache *s)
> @@ -4454,16 +4451,16 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> struct slab_alloc_context *ac)
> {
> bool allow_spin = alloc_flags_allow_spinning(ac->alloc_flags);
> + gfp_t trynode_flags;
> void *object;
> struct slab *slab;
> - struct partial_context pc;
> bool try_thisnode = true;
>
> stat(s, ALLOC_SLOWPATH);
>
> new_objects:
>
> - pc.flags = gfpflags;
> + trynode_flags = gfpflags;
> /*
> * When a preferred node is indicated but no __GFP_THISNODE
> *
> @@ -4479,17 +4476,16 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> && try_thisnode)) {
> if (unlikely(!allow_spin))
> /* Do not upgrade gfp to NOWAIT from more restrictive mode */
> - pc.flags = gfpflags | __GFP_THISNODE;
> + trynode_flags = gfpflags | __GFP_THISNODE;
> else
> - pc.flags = GFP_NOWAIT | __GFP_THISNODE;
> + trynode_flags = GFP_NOWAIT | __GFP_THISNODE;

nit: the comment "__GFP_THISNODE in pc.flags" also needs to be updated to "trynode_flags"

otherwise, looks good to me.
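The allow_spin check in get_from_partial_node() follows a lock-or-trylock pattern. Here it is sketched in userspace with a pthread mutex standing in for n->list_lock (the kernel uses spin_lock_irqsave() and spin_trylock_irqsave()). nr_partial, partial_obj and toy_get_from_partial() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define SLAB_ALLOC_TRYLOCK 0x01u

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static int nr_partial = 1;   /* pretend one partial slab is available */
static int partial_obj = 42; /* the "object" handed out */

static bool allow_spinning(unsigned int alloc_flags)
{
	return !(alloc_flags & SLAB_ALLOC_TRYLOCK);
}

/* Lock-or-trylock pattern from get_from_partial_node(), in userspace. */
static int *toy_get_from_partial(unsigned int alloc_flags)
{
	int *object = NULL;

	if (allow_spinning(alloc_flags))
		pthread_mutex_lock(&list_lock);
	else if (pthread_mutex_trylock(&list_lock) != 0)
		return NULL;	/* contended: give up rather than wait */

	if (nr_partial > 0)
		object = &partial_obj;
	pthread_mutex_unlock(&list_lock);
	return object;
}
```

A non-spinning caller that finds the lock held (for example because it interrupted the holder) fails fast and lets the caller fall back, instead of deadlocking.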

Hao Li

Jun 12, 2026, 1:27:09 AM
to Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Wed, Jun 10, 2026 at 05:40:10PM +0200, Vlastimil Babka (SUSE) wrote:
> Add the alloc_flags parameter to allocate_slab() and new_slab()
> so it can be used to determine if spinning is allowed, independently
> from gfp flags.
>
> refill_objects() passes SLAB_ALLOC_DEFAULT because it can only be
> reached from contexts that allow spinning.
>
> Also change how trynode_flags are constructed in ___slab_alloc() to
> achieve the same "do not upgrade to GFP_NOWAIT" by using masking instead
> of a branch. It will now also not upgrade in cases where gfp is weaker
> than GFP_NOWAIT (i.e. lacks __GFP_KSWAPD_RECLAIM) but doesn't come from
> kmalloc_nolock() - which is more correct anyway.
>
> During the masking, also keep the existing __GFP_NOMEMALLOC (pointed out by
> Sashiko) and __GFP_ACCOUNT. Previously the hardcoded GFP_NOWAIT would
> eliminate them, but it's not a big problem that would need a separate
> fix.
>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---
> mm/slub.c | 28 ++++++++++++++--------------
> 1 file changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 98b79e5e7679..8f6ca3d5fdfa 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3378,9 +3378,10 @@ static __always_inline void unaccount_slab(struct slab *slab, int order,
> }
>
> /* Allocate and initialize a slab without building its freelist. */
> -static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> +static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags,
> + unsigned int alloc_flags, int node)
> {
> - bool allow_spin = gfpflags_allow_spinning(flags);
> + bool allow_spin = alloc_flags_allow_spinning(alloc_flags);

nit: allow_spin no longer depends on `flags`, so it seems we can delete the
comment:

/*
* __GFP_RECLAIM could be cleared on the first allocation attempt,
* so pass allow_spin flag directly.
*/

Otherwise, looks good to me.
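A sketch of the masking approach for trynode_flags, with hypothetical, simplified bit values (TOY_GFP_NOWAIT here is only the kswapd bit; the real GFP_NOWAIT may contain more). The exact mask used in the patch is not shown in this thread, so toy_trynode_flags() is an assumption that follows the commit message: keep what GFP_NOWAIT allows plus __GFP_NOMEMALLOC and __GFP_ACCOUNT, then add __GFP_THISNODE.

```c
#include <assert.h>

/* Hypothetical, simplified bit values; not the kernel's. */
#define TOY_GFP_DIRECT_RECLAIM 0x01u
#define TOY_GFP_KSWAPD_RECLAIM 0x02u
#define TOY_GFP_IO             0x04u
#define TOY_GFP_FS             0x08u
#define TOY_GFP_NOMEMALLOC     0x10u
#define TOY_GFP_ACCOUNT        0x20u
#define TOY_GFP_THISNODE       0x40u
#define TOY_GFP_KERNEL (TOY_GFP_DIRECT_RECLAIM | TOY_GFP_KSWAPD_RECLAIM | \
			TOY_GFP_IO | TOY_GFP_FS)
#define TOY_GFP_NOWAIT TOY_GFP_KSWAPD_RECLAIM /* simplified */

/*
 * Keep only bits that GFP_NOWAIT could contain, plus NOMEMALLOC and
 * ACCOUNT, then force THISNODE. A caller without the kswapd bit never
 * gains it, and no caller keeps direct reclaim for this attempt.
 */
static unsigned int toy_trynode_flags(unsigned int gfp)
{
	unsigned int keep = TOY_GFP_NOWAIT | TOY_GFP_NOMEMALLOC |
			    TOY_GFP_ACCOUNT;

	return (gfp & keep) | TOY_GFP_THISNODE;
}
```

Unlike the old branch, masking cannot upgrade a caller that lacked the kswapd bit, and a GFP_KERNEL-like caller still ends up with the same NOWAIT | THISNODE combination as before.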

Hao Li

Jun 12, 2026, 1:29:00 AM
to Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Wed, Jun 10, 2026 at 05:40:12PM +0200, Vlastimil Babka (SUSE) wrote:
> The function takes all the parameters that exist as fields in
> slab_alloc_context, except alloc_flags. Replace them with a single
> pointer.
>
> This moves slab_alloc_context initialization to a number of callers,
> which is more verbose, but arguably also more clear than a long list of
> parameters, and most do not use the 'lru' field.
>
> This will also allow kmalloc_nolock() to call slab_alloc_node() and
> reduce the special open-coding it currently has.
>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---

Hao Li

Jun 12, 2026, 1:35:15 AM
to Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Wed, Jun 10, 2026 at 05:40:14PM +0200, Vlastimil Babka (SUSE) wrote:
> With alloc_flags usage in slab, we can replace __GFP_NO_OBJ_EXT with an
> alloc flag that prevents kmalloc recursion. For that we need a version
> of kmalloc() that takes alloc_flags and use it in places that perform
> these potentially recursive kmalloc allocations (of sheaves or obj_ext
> arrays).
>
> As a preparatory step, make __do_kmalloc_node() take a pointer to
> slab_alloc_context. This replaces the 'caller' parameter and includes
> alloc_flags which we'll make use of.

Hao Li

Jun 12, 2026, 2:55:16 AM
to Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Wed, Jun 10, 2026 at 05:40:17PM +0200, Vlastimil Babka (SUSE) wrote:
> __GFP_NO_OBJ_EXT has limited scope within the slab allocator itself and
> gfp flags are a scarce resource, unlike slab's alloc_flags.
>
> Introduce SLAB_ALLOC_NO_RECURSE alloc flag that has the same intent as
> __GFP_NO_OBJ_EXT but a more generic name, meaning that a kmalloc()
> family function should not recurse into another kmalloc*() for the
> purposes of allocating auxiliary structures (obj_ext arrays or sheaves).
>
> First, replace the __GFP_NO_OBJ_EXT for allocating obj_ext arrays in
> alloc_slab_obj_exts(). Make use of the newly added kmalloc_flags()
> function, where we can pass alloc_flags with SLAB_ALLOC_NO_RECURSE
> added. This will also pass through SLAB_ALLOC_TRYLOCK so we don't need
> to special case kmalloc_nolock() anymore.
>
> Note that until now the kmalloc_nolock() ignored the incoming gfp flags
> and hardcoded __GFP_ZERO | __GFP_NO_OBJ_EXT. But it's correct to pass on
> the incoming gfp flags (only augmented with __GFP_ZERO), because if
> alloc_flags contain SLAB_ALLOC_TRYLOCK, the incoming gfp flags have to
> be also compatible with it.
>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---
> mm/slab.h | 1 +
> mm/slub.c | 13 +++++--------
> 2 files changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/mm/slab.h b/mm/slab.h
> index 45bfcfb35a9c..509f330654b8 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -21,6 +21,7 @@
> #define SLAB_ALLOC_DEFAULT 0x00 /* no flags */
> #define SLAB_ALLOC_TRYLOCK 0x01 /* a kmalloc_nolock() allocation */
> #define SLAB_ALLOC_NEW_SLAB 0x02 /* a flag for alloc_slab_obj_exts() */
> +#define SLAB_ALLOC_NO_RECURSE 0x04 /* prevent kmalloc() recursion */
>
> static inline bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
> {
> diff --git a/mm/slub.c b/mm/slub.c
> index cbb38bd01e46..7dfbd0251aa2 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2167,15 +2167,12 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
>
> gfp &= ~OBJCGS_CLEAR_MASK;
> /* Prevent recursive extension vector allocation */
> - gfp |= __GFP_NO_OBJ_EXT;
> + alloc_flags |= SLAB_ALLOC_NO_RECURSE;
>
> sz = obj_exts_alloc_size(s, slab, gfp);
>

For the original calls to kmalloc_nolock and kmalloc_node, I notice a difference:

> - if (unlikely(!allow_spin))
> - vec = kmalloc_nolock(sz, __GFP_ZERO | __GFP_NO_OBJ_EXT,
> - slab_nid(slab));

kmalloc_nolock completely discarded `gfp` flags.

> - else
> - vec = kmalloc_node(sz, gfp | __GFP_ZERO, slab_nid(slab));

while kmalloc_node preserved them and passed them along.

> + /* This will use kmalloc_nolock() if alloc_flags say so */
> + vec = kmalloc_flags(sz, gfp | __GFP_ZERO, alloc_flags, slab_nid(slab));

Now that both paths are merged into kmalloc_flags, the gfp flags are
unconditionally carried through, which might carry some unwanted flags.

I traced the call path and found that ___slab_alloc sets __GFP_THISNODE
in trynode_flags. If this flag propagates all the way into
kmalloc_flags->...->__kmalloc_nolock_noprof, it will trigger the
VM_WARN_ON_ONCE warning. Maybe we need to strip the original gfp flags if
`!allow_spin`.
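One possible shape of the stripping suggested above, sketched with hypothetical bit values. TOY_NOLOCK_ALLOWED stands for whatever set of flags kmalloc_nolock() accepts without warning; its contents here are an assumption, as are toy_nolock_would_warn() and toy_nested_gfp(). The actual fix in the series may look different.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values. */
#define TOY_GFP_ZERO       0x01u
#define TOY_GFP_ACCOUNT    0x02u
#define TOY_GFP_THISNODE   0x04u
#define TOY_GFP_NOWAIT     0x08u
/* Assumed set of flags the non-spinning path tolerates without warning. */
#define TOY_NOLOCK_ALLOWED (TOY_GFP_ZERO | TOY_GFP_ACCOUNT)

#define SLAB_ALLOC_TRYLOCK 0x01u

/* Would the nolock entry point warn about these flags? */
static bool toy_nolock_would_warn(unsigned int gfp)
{
	return (gfp & ~TOY_NOLOCK_ALLOWED) != 0;
}

/* Strip flags the nolock path rejects before a nested vector allocation. */
static unsigned int toy_nested_gfp(unsigned int gfp, unsigned int alloc_flags)
{
	gfp |= TOY_GFP_ZERO;
	if (alloc_flags & SLAB_ALLOC_TRYLOCK)
		gfp &= TOY_NOLOCK_ALLOWED;
	return gfp;
}
```

The spinning path keeps the caller's flags untouched, matching the old kmalloc_node() call; only the trylock path is narrowed.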

--
Thanks,
Hao

Hao Li

Jun 12, 2026, 2:57:35 AM
to Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Wed, Jun 10, 2026 at 05:40:15PM +0200, Vlastimil Babka (SUSE) wrote:
> The two flags are added internally so there's no point in warning if
> they are passed by the caller as well, so allow them. This will allow
> simplifying obj_ext allocation under kmalloc_nolock().
>
> Also it's not necessary to have the extra alloc_gfp variable for adding
> the two flags. The original gfp_flags parameter is not used anywhere
> except for the warning. So remove alloc_gfp and directly modify and use
> gfp_flags everywhere.
>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---

LGTM

Hao Li

Jun 12, 2026, 4:03:22 AM
to Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Wed, Jun 10, 2026 at 05:40:16PM +0200, Vlastimil Babka (SUSE) wrote:
> With alloc_flags usage in slab, we can replace __GFP_NO_OBJ_EXT with an
> alloc flag that prevents kmalloc recursion. For that we need a version
> of kmalloc() that takes alloc_flags and use it in places that perform
> these potentially recursive kmalloc allocations (of sheaves or obj_ext
> arrays).
>
> Add this function, named kmalloc_flags(). Right now it's only useful for
> these nested allocations, so it doesn't need to optimize build-time
> constant sizes like kmalloc() or kmalloc_buckets.
>
> Since we need it to support both normal and non-spinning
> kmalloc_nolock() context through the SLAB_ALLOC_TRYLOCK flag, split out
> most of the special _kmalloc_nolock_noprof() implementation to
> __kmalloc_nolock_noprof() that takes a slab_alloc_context, and make
> _kmalloc_nolock_noprof() a simple tail calling wrapper with the proper
> context.
>
> kmalloc_flags() can thus determine whether to call
> __kmalloc_nolock_noprof() or __do_kmalloc_node(), based on the
> given alloc_flags.
>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---
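A userspace sketch of the dispatch kmalloc_flags() is described as doing: pick the non-spinning or the normal backend from alloc_flags alone. toy_kmalloc_nolock() and toy_do_kmalloc() are hypothetical stand-ins for __kmalloc_nolock_noprof() and __do_kmalloc_node(); the gfp and node arguments are omitted to keep the toy small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SLAB_ALLOC_DEFAULT 0x00u
#define SLAB_ALLOC_TRYLOCK 0x01u

/* Which backend handled the last call; for the demo only. */
enum toy_path { TOY_NONE, TOY_NOLOCK, TOY_NORMAL };
static enum toy_path last_path = TOY_NONE;

/* Stand-ins for __kmalloc_nolock_noprof() and __do_kmalloc_node(). */
static void *toy_kmalloc_nolock(size_t size)
{
	last_path = TOY_NOLOCK;
	return calloc(1, size);
}

static void *toy_do_kmalloc(size_t size)
{
	last_path = TOY_NORMAL;
	return calloc(1, size);
}

/* Dispatch on alloc_flags, not on gfp bits. */
static void *toy_kmalloc_flags(size_t size, unsigned int alloc_flags)
{
	if (alloc_flags & SLAB_ALLOC_TRYLOCK)
		return toy_kmalloc_nolock(size);
	return toy_do_kmalloc(size);
}
```

Because the nested allocation inherits the outer alloc_flags, a kmalloc_nolock() caller automatically gets a non-spinning nested allocation without any special-casing at the call site.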

Hao Li

Jun 12, 2026, 4:17:18 AM
to Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Wed, Jun 10, 2026 at 05:40:18PM +0200, Vlastimil Babka (SUSE) wrote:
> Finish the switch away from __GFP_NO_OBJ_EXT by replacing it with
> SLAB_ALLOC_NO_RECURSE when allocating empty sheaves. Pass alloc_flags to
> [__]alloc_empty_sheaf(). Callers that can't be part of a recursive
> kmalloc() chain simply pass SLAB_ALLOC_DEFAULT. Use kmalloc_flags()
> instead of kzalloc() for allocating the sheaf.
>
> This leaves __GFP_NO_OBJ_EXT with no users in slab, so stop allowing the
> flag in kmalloc_nolock().

Vlastimil Babka (SUSE)

Jun 12, 2026, 5:51:36 AM
to Hao Li, Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On 6/12/26 05:10, Hao Li wrote:
> On Wed, Jun 10, 2026 at 05:40:06PM +0200, Vlastimil Babka (SUSE) wrote:
>> @@ -5389,13 +5401,18 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
>> if (ret)
>> goto success;
>>
>> + struct slab_alloc_context ac = {
>> + .caller_addr = _RET_IP_,
>> + .orig_size = orig_size,
>> + };
>
> It might be better to move this to the beginning of the function, to avoid
> patch09 jump to `success` before ac is initialized.

Hm right, didn't compilers actually complain about goto skipping over
declarations? But neither gcc nor clang do for me, hm. Will move, thanks.
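This explains the compiler question. In C (unlike C++), a forward goto past a declaration with an initializer is a constraint violation only for variably modified types, so GCC and Clang accept it. The object is in scope at the label, but its value is indeterminate because the initializer never ran. GCC can warn about this with -Wjump-misses-init, which is enabled by -Wc++-compat rather than -Wall. The sketch below shows the pattern after the fix, using hypothetical names: the context is initialized before any goto.

```c
#include <assert.h>
#include <stddef.h>

struct ctx {
	unsigned long caller_addr;
	size_t orig_size;
};

/* Hypothetical fast path that can succeed before any slow-path setup. */
static int fast_path_hit(size_t size)
{
	return size <= 8;
}

/* The context is fully initialized before the first goto, so the code
 * at 'success' never reads an indeterminate value. */
static size_t alloc_model(size_t size)
{
	struct ctx ac = {
		.caller_addr = 0,
		.orig_size = size,
	};

	if (fast_path_hit(size))
		goto success;

	/* the slow path would use &ac here */

success:
	return ac.orig_size;
}
```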

Vlastimil Babka (SUSE)

Jun 12, 2026, 5:56:16 AM
to Hao Li, Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On 6/12/26 06:04, Hao Li wrote:
> On Wed, Jun 10, 2026 at 05:40:09PM +0200, Vlastimil Babka (SUSE) wrote:
> nit: the comment "__GFP_THISNODE in pc.flags" also needs to be updated to "trynode_flags"

Done.

> otherwise, looks good to me.
> Reviewed-by: Hao Li <hao...@linux.dev>

Thanks!

Vlastimil Babka (SUSE)

Jun 12, 2026, 5:59:28 AM
to Hao Li, Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
Right, deleted.

> Otherwise, looks good to me.
> Reviewed-by: Hao Li <hao...@linux.dev>

Thanks!

Vlastimil Babka (SUSE)

Jun 12, 2026, 6:05:34 AM
to Hao Li, Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
True, done!

Vlastimil Babka (SUSE)

Jun 12, 2026, 6:17:52 AM
to Hao Li, Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
Thanks. This should do the job in a more generic way I hope?

diff --git a/mm/slub.c b/mm/slub.c
index f9b8dc56bb57..0bf53f70c9be 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2047,12 +2047,15 @@ static inline void dec_slabs_node(struct kmem_cache *s, int node,
#endif /* CONFIG_SLUB_DEBUG */

/*
- * The allocated objcg pointers array is not accounted directly.
+ * The allocated objcg pointers array or sheaf is not accounted directly.
* Moreover, it should not come from DMA buffer and is not readily
- * reclaimable. So those GFP bits should be masked off.
+ * reclaimable. Node restriction for the parent allocation also should
+ * not apply to the slab's internal objects.
+ * So those GFP bits should be masked off.
*/
#define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | \
- __GFP_ACCOUNT | __GFP_NOFAIL)
+ __GFP_ACCOUNT | __GFP_NOFAIL | \
+ __GFP_THISNODE)

#ifdef CONFIG_SLAB_OBJ_EXT


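This small model of the OBJCGS_CLEAR_MASK change assumes slab applies the mask as `gfp & ~OBJCGS_CLEAR_MASK` when it derives flags for its internal allocations. The M_GFP_* bit values are made up; only the masking logic matters. In a multi-line #define, every continued line needs a trailing backslash.

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Made-up bit values for this model; the kernel's ___GFP_* values differ. */
#define M_GFP_DMA		0x01u
#define M_GFP_RECLAIMABLE	0x02u
#define M_GFP_ACCOUNT		0x04u
#define M_GFP_NOFAIL		0x08u
#define M_GFP_THISNODE		0x10u
#define M_GFP_KSWAPD_RECLAIM	0x20u

/* Every continued line of the macro ends in a backslash. */
#define M_OBJCGS_CLEAR_MASK (M_GFP_DMA | M_GFP_RECLAIMABLE | \
			     M_GFP_ACCOUNT | M_GFP_NOFAIL | \
			     M_GFP_THISNODE)

/* Flags for a slab-internal allocation (obj_ext array or sheaf) derived
 * from the parent allocation's flags. */
static gfp_t internal_gfp(gfp_t parent)
{
	return parent & ~M_OBJCGS_CLEAR_MASK;
}
```

A parent request pinned to one node with __GFP_THISNODE no longer forces the same restriction onto the slab's own bookkeeping. Unrelated bits such as the reclaim flags pass through unchanged.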
Hao Li

Jun 12, 2026, 7:30:14 AM
to Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
Yeah, this is more elegant.

> diff --git a/mm/slub.c b/mm/slub.c
> index f9b8dc56bb57..0bf53f70c9be 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2047,12 +2047,15 @@ static inline void dec_slabs_node(struct kmem_cache *s, int node,
> #endif /* CONFIG_SLUB_DEBUG */
>
> /*
> - * The allocated objcg pointers array is not accounted directly.
> + * The allocated objcg pointers array or sheaf is not accounted directly.
> * Moreover, it should not come from DMA buffer and is not readily
> - * reclaimable. So those GFP bits should be masked off.
> + * reclaimable. Node restriction for the parent allocation also should
> + * not apply to the slab's internal objects.
> + * So those GFP bits should be masked off.
> */
> #define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | \
> - __GFP_ACCOUNT | __GFP_NOFAIL)
> + __GFP_ACCOUNT | __GFP_NOFAIL | \
> + __GFP_THISNODE)

Good idea! Both code and comments make sense to me.

>
> #ifdef CONFIG_SLAB_OBJ_EXT
>
>

--
Thanks,
Hao

Suren Baghdasaryan

Jun 14, 2026, 9:28:42 PM
to Vlastimil Babka (SUSE), Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Andrey Konovalov, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Thu, Jun 11, 2026 at 9:37 AM Vlastimil Babka (SUSE)
<vba...@kernel.org> wrote:
>
> On 6/11/26 17:11, Harry Yoo wrote:
> >
> >> From 3a1c4398ce9f361a4e6f4d9946eab6237eea89c2 Mon Sep 17 00:00:00 2001
> >> From: "Vlastimil Babka (SUSE)" <vba...@kernel.org>
> >> Date: Wed, 10 Jun 2026 17:40:04 +0200
> >> Subject: [PATCH] mm/slab: do not init any kfence objects on allocation
> >>
> >> When init (zeroing) on allocation is requested, for kmalloc() we
> >> generally have to zero the full object size even if a smaller size is
> >> requested, in order to provide krealloc()'s __GFP_ZERO guarantees.
> >>
> >> When we end up allocating a kfence object, kfence perfoms the zeroing on
> >
> > nit: perfoms -> performs
>
> Fixed.
>
> >> its own because has its own redzone beyond the requested size. Thus

nit: s/because has/because it has
Reviewed-by: Suren Baghdasaryan <sur...@google.com>

>
> Thanks!
>
> > Thanks!
> >
>
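This toy model shows why kmalloc() zeroes the whole object rather than only the requested size. It assumes a 32-byte size class and a simplified krealloc() that grows in place within the same object without zeroing the new tail itself. The names and sizes are hypothetical, and kfence objects are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_OBJ_SIZE 32	/* hypothetical kmalloc size class */

/* One simulated slab object that still holds a previous user's data. */
static unsigned char model_obj[MODEL_OBJ_SIZE];

/* kzalloc(req) served from the 32-byte object: zero either the whole
 * object or only the requested bytes. */
static unsigned char *model_kzalloc(size_t req, bool zero_full_object)
{
	memset(model_obj, 0xAA, sizeof(model_obj));	/* stale contents */
	memset(model_obj, 0, zero_full_object ? MODEL_OBJ_SIZE : req);
	return model_obj;
}

/* A krealloc(p, new_size, __GFP_ZERO) that still fits in the object
 * returns p unchanged, so bytes [old_size, new_size) must already be 0. */
static bool grown_tail_is_zero(const unsigned char *p, size_t old_size,
			       size_t new_size)
{
	for (size_t i = old_size; i < new_size; i++)
		if (p[i] != 0)
			return false;
	return true;
}
```

If only the requested 20 bytes were zeroed, growing to 28 bytes in place would expose stale bytes. Zeroing the full object avoids that.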

Suren Baghdasaryan

Jun 14, 2026, 9:33:16 PM
to Hao Li, Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
Reviewed-by: Suren Baghdasaryan <sur...@google.com>


>
> --
> Thanks,
> Hao

Suren Baghdasaryan

Jun 14, 2026, 9:41:21 PM
to Vlastimil Babka (SUSE), Hao Li, Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Fri, Jun 12, 2026 at 2:51 AM Vlastimil Babka (SUSE)
<vba...@kernel.org> wrote:
>
> On 6/12/26 05:10, Hao Li wrote:
> > On Wed, Jun 10, 2026 at 05:40:06PM +0200, Vlastimil Babka (SUSE) wrote:
> >> @@ -5389,13 +5401,18 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
> >> if (ret)
> >> goto success;
> >>
> >> + struct slab_alloc_context ac = {
> >> + .caller_addr = _RET_IP_,
> >> + .orig_size = orig_size,
> >> + };
> >
> > It might be better to move this to the beginning of the function, to avoid
> > patch09 jump to `success` before ac is initialized.
>
> Hm right, didn't compilers actually complain about goto skipping over
> declarations? But neither gcc nor clang do for me, hm. Will move, thanks.

I see it's moved in
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-next,
so

Reviewed-by: Suren Baghdasaryan <sur...@google.com>

Suren Baghdasaryan

Jun 14, 2026, 10:00:59 PM
to Hao Li, Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Thu, Jun 11, 2026 at 8:50 PM Hao Li <hao...@linux.dev> wrote:
>
> On Wed, Jun 10, 2026 at 05:40:07PM +0200, Vlastimil Babka (SUSE) wrote:
> > Similarly to the page allocators, introduce slab-allocator specific
> > alloc flags that internally control allocation behavior in addition to
> > gfp_flags, without occupying the limited gfp flags space.
> >
> > Introduce the first flag SLAB_ALLOC_TRYLOCK that behaves similarly to
> > page allocator's ALLOC_TRYLOCK and will be used to reimplement
> > kmalloc_nolock()'s "!allow_spin" behavior. That currently relies on
> > gfpflags_allow_spinning() and thus the lack of both __GFP_RECLAIM flags,
> > importantly __GFP_KSWAPD_RECLAIM. This can give false-positive results
> > e.g. in early boot with a restricted gfp_allowed_mask.
> >
> > Also introduce alloc_flags_allow_spinning() to replace the usage of
> > gfpflags_allow_spinning().
> >
> > Start using alloc_flags and the new check first in alloc_from_pcs() and
> > __pcs_replace_empty_main(). This means some slab allocations that were
> > falsely treated as kmalloc_nolock() due to their gfp flags will now have
> > higher chances of succeed, and this will further increase with followup

nit: I think it should be either "higher chances of succeess" or
"higher chances to succeed".

> > changes.
> >
> > Remove a WARN_ON_ONCE() from refill_objects() as it's now legitimate to
> > reach it from a slab allocation that's not _nolock() and yet lacks
> > __GFP_KSWAPD_RECLAIM for other reasons.
> >
> > Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> > ---
>
> Reviewed-by: Hao Li <hao...@linux.dev>

I would call SLAB_ALLOC_TRYLOCK something like SLAB_ALLOC_NOSPIN or
SLAB_ALLOC_NOLOCK, but naming is hard and I don't claim to be
good at it. So, feel free to adopt my suggestion if you like it or
ignore it otherwise.
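This userspace model illustrates the false positive described in the patch. It assumes the old check is `gfp & __GFP_RECLAIM`, and that the new check tests only SLAB_ALLOC_TRYLOCK (the body of alloc_flags_allow_spinning() is not quoted in this thread). The bit values and the *_model names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Hypothetical bit values for this model only. */
#define M_GFP_DIRECT_RECLAIM	0x1u
#define M_GFP_KSWAPD_RECLAIM	0x2u
#define M_GFP_RECLAIM		(M_GFP_DIRECT_RECLAIM | M_GFP_KSWAPD_RECLAIM)
#define M_GFP_KERNEL		M_GFP_RECLAIM	/* simplified: reclaim bits only */

#define SLAB_ALLOC_DEFAULT 0x00u
#define SLAB_ALLOC_TRYLOCK 0x01u

/* Old heuristic: no reclaim bits at all => assume kmalloc_nolock(). */
static bool gfpflags_allow_spinning_model(gfp_t gfp)
{
	return (gfp & M_GFP_RECLAIM) != 0;
}

/* New check: only an explicit alloc flag forbids spinning. */
static bool alloc_flags_allow_spinning_model(unsigned int alloc_flags)
{
	return !(alloc_flags & SLAB_ALLOC_TRYLOCK);
}

/* Models gfp_allowed_mask in early boot stripping the reclaim bits. */
static gfp_t apply_boot_mask(gfp_t gfp)
{
	const gfp_t boot_allowed_mask = ~M_GFP_RECLAIM;

	return gfp & boot_allowed_mask;
}
```

A regular allocation made during early boot looks like kmalloc_nolock() to the gfp-based check. The alloc_flags check still allows spinning for it.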

Suren Baghdasaryan

Jun 14, 2026, 10:01:42 PM
to Hao Li, Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Sun, Jun 14, 2026 at 7:00 PM Suren Baghdasaryan <sur...@google.com> wrote:
>
> On Thu, Jun 11, 2026 at 8:50 PM Hao Li <hao...@linux.dev> wrote:
> >
> > On Wed, Jun 10, 2026 at 05:40:07PM +0200, Vlastimil Babka (SUSE) wrote:
> > > Similarly to the page allocators, introduce slab-allocator specific
> > > alloc flags that internally control allocation behavior in addition to
> > > gfp_flags, without occupying the limited gfp flags space.
> > >
> > > Introduce the first flag SLAB_ALLOC_TRYLOCK that behaves similarly to
> > > page allocator's ALLOC_TRYLOCK and will be used to reimplement
> > > kmalloc_nolock()'s "!allow_spin" behavior. That currently relies on
> > > gfpflags_allow_spinning() and thus the lack of both __GFP_RECLAIM flags,
> > > importantly __GFP_KSWAPD_RECLAIM. This can give false-positive results
> > > e.g. in early boot with a restricted gfp_allowed_mask.
> > >
> > > Also introduce alloc_flags_allow_spinning() to replace the usage of
> > > gfpflags_allow_spinning().
> > >
> > > Start using alloc_flags and the new check first in alloc_from_pcs() and
> > > __pcs_replace_empty_main(). This means some slab allocations that were
> > > falsely treated as kmalloc_nolock() due to their gfp flags will now have
> > > higher chances of succeed, and this will further increase with followup
>
> nit: I think it should be either "higher chances of succeess" or
> "higher chances to succeed".

And of course I misspelled "success" :)

Alexei Starovoitov

Jun 14, 2026, 10:16:15 PM
to Suren Baghdasaryan, Hao Li, Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, LKML, open list:CONTROL GROUP (CGROUP)
Just noticed "trylock" in the #define SLAB_ALLOC_TRYLOCK

Please call it SLAB_ALLOC_NOLOCK.

The initial API used the 'trylock' name and that was a mistake,
since people assumed normal spin_trylock()-like semantics.
"trylock" implies that it fails under contention
and retry is a normal next step. It's not the case.
No one should be retrying. That's why the final api was kmalloc_nolock().
So please keep this important distinction in the name.
SLAB_ALLOC_NOLOCK should mean that spinning locks
should not be taken. It should not mean "just go to trylock everywhere".
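This sketch shows the caller-side contract Alexei describes, in userspace C, with a C11 atomic flag standing in for a lock. The nolock allocation never waits, and the caller treats its NULL result as an ordinary allocation failure, not as a cue to retry. All names are hypothetical, and the sketch does not model how kmalloc_nolock() works internally.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Stand-in for some resource the allocator might need internally. */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;
static int model_attempts;

/* Model of a "nolock" allocation: it never waits for a busy resource;
 * if what it needs is unavailable, it fails immediately. */
static void *model_alloc_nolock(size_t size)
{
	void *p;

	model_attempts++;
	if (atomic_flag_test_and_set(&model_lock))
		return NULL;	/* busy: fail, do not spin */
	p = malloc(size);
	atomic_flag_clear(&model_lock);
	return p;
}

/* Caller side: NULL is a final answer, handled like any other
 * allocation failure, never as a signal to loop and try again. */
static int model_record_event(size_t size)
{
	void *buf = model_alloc_nolock(size);

	if (!buf)
		return -ENOMEM;
	/* ... use buf ... */
	free(buf);
	return 0;
}
```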

Suren Baghdasaryan

Jun 14, 2026, 10:20:24 PM
to Hao Li, Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Thu, Jun 11, 2026 at 8:51 PM Hao Li <hao...@linux.dev> wrote:
>
> On Wed, Jun 10, 2026 at 05:40:08PM +0200, Vlastimil Babka (SUSE) wrote:
> > Add alloc_flags as a new field to the slab_alloc_context helper struct,
> > so we can pass it to more functions in the slab implementation without
> > adding another function parameter.
> >
> > Start checking them via alloc_flags_allow_spinning() in
> > alloc_single_from_new_slab() (where we can drop the allow_spin
> > parameter) and ___slab_alloc(). This further reduces false-positive
> > spinning-not-allowed from allocations that are not kmalloc_nolock() but
> > lack __GFP_RECLAIM flags.

___slab_alloc() now uses alloc_flags_allow_spinning(alloc_flags),
while the function it calls (get_from_partial()->get_from_any_partial())
still uses gfpflags_allow_spinning(gfpflags). I'm guessing
get_from_any_partial() will be converted later on, but I wonder if that
conversion would be better done in the same patch to avoid
inconsistent behavior during a possible bisection.

Suren Baghdasaryan

Jun 14, 2026, 10:36:42 PM
to Harry Yoo, Vlastimil Babka (SUSE), Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Wed, Jun 10, 2026 at 11:05 PM Harry Yoo <ha...@kernel.org> wrote:
>
>
>
> On 6/11/26 12:40 AM, Vlastimil Babka (SUSE) wrote:
> > Refactor get_from_partial_node(), get_from_any_partial(),
> > get_from_partial() and ___slab_alloc().
> >
> > Remove struct partial_context, which used to be more substantial but
> > shrank as part of the sheaves conversion. Instead pass gfp_flags and
> > pointer to the new slab_alloc_context, which together is a superset of
> > partial_context.
> >
> > This means alloc_flags are now available and we can use them to
> > determine if spinning is allowed, further reducing false positive "not
> > allowed" in the slow path due to gfp flags lacking __GFP_RECLAIM.
> >
> > Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> > ---
>
> Looks good to me,
> Reviewed-by: Harry Yoo (Oracle) <ha...@kernel.org>

Ah, nice! This is the conversion I was anticipating in the previous patch.
I would do this removal of partial_context as patch 6 and then convert
___slab_alloc() and get_from_any_partial*() all together in patch 7. I
think that would keep the behavior of ___slab_alloc() more consistent
throughout the patchset. But I would say it's nice to have, not a
must-have.

Reviewed-by: Suren Baghdasaryan <sur...@google.com>

>
> --
> Cheers,
> Harry / Hyeonggon

Suren Baghdasaryan

Jun 15, 2026, 12:10:16 AM
to Vlastimil Babka (SUSE), Hao Li, Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org

Suren Baghdasaryan

Jun 15, 2026, 12:35:41 AM
to Vlastimil Babka (SUSE), Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Wed, Jun 10, 2026 at 8:41 AM Vlastimil Babka (SUSE)
<vba...@kernel.org> wrote:
>
> Convert the whole following call stack to pass either slab_alloc_context
> (thus including alloc_flags) or just alloc_flags as necessary:
>
> slab_post_alloc_hook()
> alloc_tagging_slab_alloc_hook()
> __alloc_tagging_slab_alloc_hook()
> prepare_slab_obj_exts_hook()
> alloc_slab_obj_exts()
> memcg_slab_post_alloc_hook()
> __memcg_slab_post_alloc_hook()
> alloc_slab_obj_exts()
>
> Converting all these at once avoids unnecessary churn and is mostly
> mechanical.
>
> This ultimately allows to decide if spinning is allowed using
> alloc_flags in alloc_slab_obj_exts(), as well as slab_post_alloc_hook().
> Aside from alloc_from_pcs_bulk() (to be handled next) there is nothing
> else in slab itself relying on gfpflags_allow_spinning() which can
> be false even if not called from kmalloc_nolock().
>
> A followup change will also use the alloc_flags availability in the call
> stack above to remove the __GFP_NO_OBJ_EXT flag.
>
> For alloc_slab_obj_exts(), also replace the suboptimal "bool new_slab"
> parameter with a SLAB_ALLOC_NEW_SLAB flag with identical functionality.
>
> To further reduce the number of parameters of slab_post_alloc_hook(),
> also make 'struct list_lru *lru' (which is NULL for most callers) a new
> field of slab_alloc_context.
>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---
> mm/memcontrol.c | 5 +--
> mm/slab.h | 6 ++--
> mm/slub.c | 94 +++++++++++++++++++++++++++++++++------------------------
> 3 files changed, 62 insertions(+), 43 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index c03d4787d466..29390ba13baa 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3424,7 +3424,8 @@ static inline size_t obj_full_size(struct kmem_cache *s)
> }
>
> bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> - gfp_t flags, size_t size, void **p)
> + gfp_t flags, unsigned int slab_alloc_flags,
> + size_t size, void **p)
> {
> size_t obj_size = obj_full_size(s);
> struct obj_cgroup *objcg;
> @@ -3472,7 +3473,7 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> slab = virt_to_slab(p[i]);
>
> if (!slab_obj_exts(slab) &&
> - alloc_slab_obj_exts(slab, s, flags, false)) {
> + alloc_slab_obj_exts(slab, s, flags, slab_alloc_flags)) {
> continue;
> }
>
> diff --git a/mm/slab.h b/mm/slab.h
> index 96f65b625600..4db6d8aa0ee3 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -19,6 +19,7 @@
> /* slab's alloc_flags definitions */
> #define SLAB_ALLOC_DEFAULT 0x00 /* no flags */
> #define SLAB_ALLOC_TRYLOCK 0x01 /* a kmalloc_nolock() allocation */
> +#define SLAB_ALLOC_NEW_SLAB 0x02 /* a flag for alloc_slab_obj_exts() */
>
> static inline bool alloc_flags_allow_spinning(const unsigned int alloc_flags)
> {
> @@ -612,7 +613,7 @@ static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
> }
>
> int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> - gfp_t gfp, bool new_slab);
> + gfp_t gfp, unsigned int alloc_flags);
>
> #else /* CONFIG_SLAB_OBJ_EXT */
>
> @@ -642,7 +643,8 @@ static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
>
> #ifdef CONFIG_MEMCG
> bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> - gfp_t flags, size_t size, void **p);
> + gfp_t flags, unsigned int slab_alloc_flags,
> + size_t size, void **p);
> void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> void **p, int objects, unsigned long obj_exts);
> #endif
> diff --git a/mm/slub.c b/mm/slub.c
> index 8f6ca3d5fdfa..e634137b67fa 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -218,6 +218,7 @@ struct slab_alloc_context {
> unsigned long caller_addr;
> unsigned long orig_size;
> unsigned int alloc_flags;
> + struct list_lru *lru;
> };
>
> /* Structure holding parameters for get_partial_node_bulk() */
> @@ -2155,9 +2156,9 @@ static inline size_t obj_exts_alloc_size(struct kmem_cache *s,
> }
>
> int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> - gfp_t gfp, bool new_slab)
> + gfp_t gfp, unsigned int alloc_flags)
> {
> - bool allow_spin = gfpflags_allow_spinning(gfp);
> + const bool allow_spin = alloc_flags_allow_spinning(alloc_flags);
> unsigned int objects = objs_per_slab(s, slab);
> unsigned long new_exts;
> unsigned long old_exts;
> @@ -2206,7 +2207,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> old_exts = READ_ONCE(slab->obj_exts);
> handle_failed_objexts_alloc(old_exts, vec, objects);
>
> - if (new_slab) {
> + if (alloc_flags & SLAB_ALLOC_NEW_SLAB) {
> /*
> * If the slab is brand new and nobody can yet access its
> * obj_exts, no synchronization is required and obj_exts can
> @@ -2331,7 +2332,7 @@ static inline void init_slab_obj_exts(struct slab *slab)
> }
>
> static int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> - gfp_t gfp, bool new_slab)
> + gfp_t gfp, unsigned int alloc_flags)
> {
> return 0;
> }
> @@ -2351,10 +2352,10 @@ static inline void alloc_slab_obj_exts_early(struct kmem_cache *s,
>
> static inline unsigned long
> prepare_slab_obj_exts_hook(struct kmem_cache *s, struct slab *slab,
> - gfp_t flags, void *p)
> + gfp_t flags, unsigned int alloc_flags, void *p)
> {
> if (!slab_obj_exts(slab) &&
> - alloc_slab_obj_exts(slab, s, flags, false)) {
> + alloc_slab_obj_exts(slab, s, flags, alloc_flags)) {
> pr_warn_once("%s, %s: Failed to create slab extension vector!\n",
> __func__, s->name);
> return 0;
> @@ -2366,7 +2367,8 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, struct slab *slab,
>
> /* Should be called only if mem_alloc_profiling_enabled() */
> static noinline void
> -__alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
> +__alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags,
> + unsigned int alloc_flags)
> {
> unsigned long obj_exts;
> struct slabobj_ext *obj_ext;
> @@ -2382,7 +2384,7 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
> return;
>
> slab = virt_to_slab(object);
> - obj_exts = prepare_slab_obj_exts_hook(s, slab, flags, object);
> + obj_exts = prepare_slab_obj_exts_hook(s, slab, flags, alloc_flags, object);
> /*
> * Currently obj_exts is used only for allocation profiling.
> * If other users appear then mem_alloc_profiling_enabled()
> @@ -2401,10 +2403,11 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
> }
>
> static inline void
> -alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
> +alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags,
> + unsigned int alloc_flags)
> {
> if (mem_alloc_profiling_enabled())
> - __alloc_tagging_slab_alloc_hook(s, object, flags);
> + __alloc_tagging_slab_alloc_hook(s, object, flags, alloc_flags);
> }
>
> /* Should be called only if mem_alloc_profiling_enabled() */
> @@ -2443,7 +2446,8 @@ alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
> #else /* CONFIG_MEM_ALLOC_PROFILING */
>
> static inline void
> -alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
> +alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags,
> + unsigned int alloc_flags)
> {
> }
>
> @@ -2461,8 +2465,9 @@ alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
> static void memcg_alloc_abort_single(struct kmem_cache *s, void *object);
>
> static __fastpath_inline
> -bool memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> - gfp_t flags, size_t size, void **p)
> +bool memcg_slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags,
> + size_t size, void **p,
> + struct slab_alloc_context *ac)
> {
> if (likely(!memcg_kmem_online()))
> return true;
> @@ -2470,7 +2475,8 @@ bool memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> if (likely(!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT)))
> return true;
>
> - if (likely(__memcg_slab_post_alloc_hook(s, lru, flags, size, p)))
> + if (likely(__memcg_slab_post_alloc_hook(s, ac->lru, flags,
> + ac->alloc_flags, size, p)))
> return true;
>
> if (likely(size == 1)) {
> @@ -2558,14 +2564,15 @@ bool memcg_slab_post_charge(void *p, gfp_t flags)
> put_slab_obj_exts(obj_exts);
> }
>
> - return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
> + return __memcg_slab_post_alloc_hook(s, NULL, flags, SLAB_ALLOC_DEFAULT,
> + 1, &p);
> }
>
> #else /* CONFIG_MEMCG */
> static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s,
> - struct list_lru *lru,
> - gfp_t flags, size_t size,
> - void **p)
> + gfp_t flags,
> + size_t size, void **p,
> + struct slab_alloc_context *ac)
> {
> return true;
> }
> @@ -3352,12 +3359,14 @@ static inline void init_freelist_randomization(void) { }
> #endif /* CONFIG_SLAB_FREELIST_RANDOM */
>
> static __always_inline void account_slab(struct slab *slab, int order,
> - struct kmem_cache *s, gfp_t gfp)
> + struct kmem_cache *s, gfp_t gfp,
> + unsigned int alloc_flags)
> {
> if (memcg_kmem_online() &&
> (s->flags & SLAB_ACCOUNT) &&
> !slab_obj_exts(slab))
> - alloc_slab_obj_exts(slab, s, gfp, true);
> + alloc_slab_obj_exts(slab, s, gfp,
> + alloc_flags | SLAB_ALLOC_NEW_SLAB);
>
> mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
> PAGE_SIZE << order);
> @@ -3434,7 +3443,7 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags,
> * to prevent the array from being overwritten.
> */
> alloc_slab_obj_exts_early(s, slab);
> - account_slab(slab, oo_order(oo), s, flags);
> + account_slab(slab, oo_order(oo), s, flags, alloc_flags);
>
> return slab;
> }
> @@ -4568,9 +4577,8 @@ struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags)
> }
>
> static __fastpath_inline
> -bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> - gfp_t flags, size_t size, void **p,
> - unsigned int orig_size)
> +bool slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags, size_t size,
> + void **p, struct slab_alloc_context *ac)

Would it be possible to make this last parameter a "const struct
slab_alloc_context *" (here and in other functions accepting it)? I
think these functions accept it as an input parameter only and are not
supposed to change it, right? That makes it easy to verify that
slab_alloc_context is not changed between consecutive calls reusing
it, for example inside slab_alloc_node().
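This sketch shows what the suggested const qualifier provides. It uses a simplified copy of the context struct; the field names come from the diff, but the struct name is hypothetical. Whether every callee in the chain can take a const pointer is an assumption to verify against the real code.

```c
#include <assert.h>
#include <stddef.h>

struct slab_alloc_context_model {
	unsigned long caller_addr;
	size_t orig_size;
	unsigned int alloc_flags;
};

/* Read-only consumer: with const, a write such as "ac->orig_size = 0;"
 * is a compile-time error, so a caller reusing the same context for
 * consecutive calls knows it is unchanged. */
static size_t zero_size_for(const struct slab_alloc_context_model *ac,
			    size_t object_size, int track_orig_size)
{
	return track_orig_size ? ac->orig_size : object_size;
}
```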

> {
> bool init = slab_want_init_on_alloc(flags, s);
> unsigned int zero_size = s->object_size;
> @@ -4590,7 +4598,7 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> * orig_size if we track it.
> */
> if (slub_debug_orig_size(s))
> - zero_size = orig_size;
> + zero_size = ac->orig_size;
>
> /*
> * When slab_debug is enabled, avoid memory initialization integrated
> @@ -4616,14 +4624,14 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> !kasan_has_integrated_init())
> && !is_kfence_address(p[i]))
> memset(p[i], 0, zero_size);
> - if (gfpflags_allow_spinning(flags))
> + if (alloc_flags_allow_spinning(ac->alloc_flags))
> kmemleak_alloc_recursive(p[i], s->object_size, 1,
> s->flags, init_flags);
> kmsan_slab_alloc(s, p[i], init_flags);
> - alloc_tagging_slab_alloc_hook(s, p[i], flags);
> + alloc_tagging_slab_alloc_hook(s, p[i], flags, ac->alloc_flags);
> }
>
> - return memcg_slab_post_alloc_hook(s, lru, flags, size, p);
> + return memcg_slab_post_alloc_hook(s, flags, size, p, ac);
> }
>
> /*
> @@ -4918,6 +4926,12 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
> {
> const unsigned int alloc_flags = SLAB_ALLOC_DEFAULT;
> void *object;
> + struct slab_alloc_context ac = {
> + .caller_addr = addr,
> + .orig_size = orig_size,
> + .alloc_flags = alloc_flags,
> + .lru = lru,
> + };
>
> s = slab_pre_alloc_hook(s, gfpflags);
> if (unlikely(!s))
> @@ -4929,14 +4943,8 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
>
> object = alloc_from_pcs(s, gfpflags, alloc_flags, node);
>
> - if (unlikely(!object)) {
> - struct slab_alloc_context ac = {
> - .caller_addr = addr,
> - .orig_size = orig_size,
> - .alloc_flags = alloc_flags,
> - };
> + if (!object)

Any reason "unlikely" is removed?
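For reference, the kernel's likely()/unlikely() hints reduce to __builtin_expect(). The macros below mirror their usual definition (ignoring branch-profiling builds), and the surrounding functions are hypothetical. The hint only influences code layout, so dropping it can affect performance but not behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the usual kernel definitions (non-branch-profiling builds). */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

static int slow_path_calls;
static int slow_path_object;

/* Hypothetical fast path: returns the cached object, or NULL on a miss. */
static void *fast_path(void *cached)
{
	return cached;
}

/* Hypothetical slow path. */
static void *slow_path(void)
{
	slow_path_calls++;
	return &slow_path_object;
}

static void *alloc_model(void *cached)
{
	void *object = fast_path(cached);

	/* Hint that the miss is rare; same result with or without it. */
	if (unlikely(!object))
		object = slow_path();
	return object;
}
```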

> object = __slab_alloc_node(s, gfpflags, node, &ac);
> - }
>
> maybe_wipe_obj_freeptr(s, object);
>
> @@ -4945,7 +4953,7 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
> * In case this fails due to memcg_slab_post_alloc_hook(),
> * object is set to NULL
> */
> - slab_post_alloc_hook(s, lru, gfpflags, 1, &object, orig_size);
> + slab_post_alloc_hook(s, gfpflags, 1, &object, &ac);
>
> return object;
> }
> @@ -5240,6 +5248,10 @@ kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
> struct slab_sheaf *sheaf)
> {
> void *ret = NULL;
> + struct slab_alloc_context ac = {
> + .orig_size = s->object_size,
> + .alloc_flags = SLAB_ALLOC_DEFAULT,
> + };
>
> if (sheaf->size == 0)
> goto out;
> @@ -5250,7 +5262,7 @@ kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
> ret = sheaf->objects[--sheaf->size];
>
> /* add __GFP_NOFAIL to force successful memcg charging */
> - slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, s->object_size);
> + slab_post_alloc_hook(s, gfp | __GFP_NOFAIL, 1, &ret, &ac);
> out:
> trace_kmem_cache_alloc(_RET_IP_, ret, s, gfp, NUMA_NO_NODE);
>
> @@ -5437,7 +5449,7 @@ void *_kmalloc_nolock_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t gfp_flags, in
>
> success:
> maybe_wipe_obj_freeptr(s, ret);
> - slab_post_alloc_hook(s, NULL, alloc_gfp, 1, &ret, orig_size);
> + slab_post_alloc_hook(s, alloc_gfp, 1, &ret, &ac);
>
> ret = kasan_kmalloc(s, ret, orig_size, alloc_gfp);
> return ret;
> @@ -7303,6 +7315,10 @@ bool kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags,
> {
> unsigned int i = 0;
> void *kfence_obj;
> + struct slab_alloc_context ac = {
> + .orig_size = s->object_size,
> + .alloc_flags = SLAB_ALLOC_DEFAULT,
> + };
>
> if (!size)
> return false;
> @@ -7353,7 +7369,7 @@ bool kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags,
>
> out:
> /* memcg and kmem_cache debug support and memory initialization */
> - return likely(slab_post_alloc_hook(s, NULL, flags, size, p, s->object_size));
> + return likely(slab_post_alloc_hook(s, flags, size, p, &ac));
> }
> EXPORT_SYMBOL(kmem_cache_alloc_bulk_noprof);
>
>
> --
> 2.54.0
>
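This model shows how the converted alloc_slab_obj_exts() reads both facts from one alloc_flags word, using the flag values from the mm/slab.h hunk above. It assumes alloc_flags_allow_spinning() simply tests SLAB_ALLOC_TRYLOCK. The model_* names and the result struct are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SLAB_ALLOC_DEFAULT  0x00u
#define SLAB_ALLOC_TRYLOCK  0x01u
#define SLAB_ALLOC_NEW_SLAB 0x02u

struct model_result {
	bool allow_spin;
	bool needs_sync;	/* obj_exts must be published with synchronization */
};

/* Both former inputs (gfp-derived allow_spin and "bool new_slab")
 * now come from the same alloc_flags word. */
static struct model_result model_alloc_slab_obj_exts(unsigned int alloc_flags)
{
	struct model_result r = {
		.allow_spin = !(alloc_flags & SLAB_ALLOC_TRYLOCK),
		/* a brand-new slab is not yet visible to anyone else */
		.needs_sync = !(alloc_flags & SLAB_ALLOC_NEW_SLAB),
	};
	return r;
}

/* Model of account_slab(): keep the caller's flags, OR in NEW_SLAB. */
static struct model_result model_account_slab(unsigned int alloc_flags)
{
	return model_alloc_slab_obj_exts(alloc_flags | SLAB_ALLOC_NEW_SLAB);
}
```

Because account_slab() ORs SLAB_ALLOC_NEW_SLAB into the caller's flags instead of replacing them, a kmalloc_nolock() allocation that creates a new slab still refuses to spin.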

Suren Baghdasaryan

Jun 15, 2026, 12:40:13 AM
to Hao Li, Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
Reviewed-by: Suren Baghdasaryan <sur...@google.com>

>
> --
> Thanks,
> Hao

Suren Baghdasaryan

Jun 15, 2026, 12:49:05 AM
to Vlastimil Babka (SUSE), Hao Li, Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
I see it fixed in

Suren Baghdasaryan

Jun 15, 2026, 12:58:26 AM
to Vlastimil Babka (SUSE), Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On Wed, Jun 10, 2026 at 8:41 AM Vlastimil Babka (SUSE)
<vba...@kernel.org> wrote:
>
> With alloc_flags usage in slab, we can replace __GFP_NO_OBJ_EXT with an
> alloc flag that prevents kmalloc recursion. For that we need a version
> of kmalloc() that takes alloc_flags and use it in places that perform
> these potentially recursive kmalloc allocations (of sheaves or obj_ext
> arrays).
>
> As a preparatory step, make __do_kmalloc_node() take a pointer to
> slab_alloc_context. This replaces the 'caller' parameter and includes
> alloc_flags which we'll make use of.

I think you could also eliminate __do_kmalloc_node() function's "size"
parameter as it's always the same as ac->orig_size.

>
> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
> ---
> mm/slub.c | 47 ++++++++++++++++++++++++++++++++---------------
> 1 file changed, 32 insertions(+), 15 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index ef457e07db83..6845e15c148a 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5338,19 +5338,14 @@ EXPORT_SYMBOL(__kmalloc_large_node_noprof);
>
> static __always_inline
> void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
> - unsigned long caller, kmalloc_token_t token)
> + kmalloc_token_t token, struct slab_alloc_context *ac)
> {
> struct kmem_cache *s;
> void *ret;
> - struct slab_alloc_context ac = {
> - .caller_addr = caller,
> - .orig_size = size,
> - .alloc_flags = SLAB_ALLOC_DEFAULT,
> - };
>
> if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> ret = __kmalloc_large_node_noprof(size, flags, node);
> - trace_kmalloc(caller, ret, size,
> + trace_kmalloc(ac->caller_addr, ret, size,
> PAGE_SIZE << get_order(size), flags, node);
> return ret;
> }
> @@ -5360,22 +5355,34 @@ void *__do_kmalloc_node(size_t size, kmem_buckets *b, gfp_t flags, int node,
>
> s = kmalloc_slab(size, b, flags, token);
>
> - ret = slab_alloc_node(s, flags, node, &ac);
> + ret = slab_alloc_node(s, flags, node, ac);
> ret = kasan_kmalloc(s, ret, size, flags);
> - trace_kmalloc(caller, ret, size, s->size, flags, node);
> + trace_kmalloc(ac->caller_addr, ret, size, s->size, flags, node);
> return ret;
> }
> void *__kmalloc_node_noprof(DECL_KMALLOC_PARAMS(size, b, token), gfp_t flags, int node)
> {
> + struct slab_alloc_context ac = {
> + .caller_addr = _RET_IP_,
> + .orig_size = size,
> + .alloc_flags = SLAB_ALLOC_DEFAULT,
> + };
> +
> return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node,
> - _RET_IP_, PASS_TOKEN_PARAM(token));
> + PASS_TOKEN_PARAM(token), &ac);
> }
> EXPORT_SYMBOL(__kmalloc_node_noprof);
>
> void *__kmalloc_noprof(DECL_TOKEN_PARAMS(size, token), gfp_t flags)
> {
> - return __do_kmalloc_node(size, NULL, flags, NUMA_NO_NODE, _RET_IP_,
> - PASS_TOKEN_PARAM(token));
> + struct slab_alloc_context ac = {
> + .caller_addr = _RET_IP_,
> + .orig_size = size,
> + .alloc_flags = SLAB_ALLOC_DEFAULT,
> + };
> +
> + return __do_kmalloc_node(size, NULL, flags, NUMA_NO_NODE,
> + PASS_TOKEN_PARAM(token), &ac);
> }
> EXPORT_SYMBOL(__kmalloc_noprof);
>
> @@ -5471,9 +5478,14 @@ EXPORT_SYMBOL_GPL(_kmalloc_nolock_noprof);
> void *__kmalloc_node_track_caller_noprof(DECL_KMALLOC_PARAMS(size, b, token), gfp_t flags,
> int node, unsigned long caller)
> {
> - return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node,
> - caller, PASS_TOKEN_PARAM(token));
> + struct slab_alloc_context ac = {
> + .caller_addr = caller,
> + .orig_size = size,
> + .alloc_flags = SLAB_ALLOC_DEFAULT,
> + };
>
> + return __do_kmalloc_node(size, PASS_BUCKET_PARAM(b), flags, node,
> + PASS_TOKEN_PARAM(token), &ac);
> }
> EXPORT_SYMBOL(__kmalloc_node_track_caller_noprof);
>
> @@ -6874,6 +6886,11 @@ void *__kvmalloc_node_noprof(DECL_KMALLOC_PARAMS(size, b, token), unsigned long
> {
> bool allow_block;
> void *ret;
> + struct slab_alloc_context ac = {
> + .caller_addr = _RET_IP_,
> + .orig_size = size,
> + .alloc_flags = SLAB_ALLOC_DEFAULT,
> + };
>
> /*
> * It doesn't really make sense to fallback to vmalloc for sub page
> @@ -6881,7 +6898,7 @@ void *__kvmalloc_node_noprof(DECL_KMALLOC_PARAMS(size, b, token), unsigned long
> */
> ret = __do_kmalloc_node(size, PASS_BUCKET_PARAM(b),
> kmalloc_gfp_adjust(flags, size),
> - node, _RET_IP_, PASS_TOKEN_PARAM(token));
> + node, PASS_TOKEN_PARAM(token), &ac);
> if (ret || size <= PAGE_SIZE)
> return ret;
>
>
> --
> 2.54.0
>

Suren Baghdasaryan

Jun 15, 2026, 1:06:33 AM
to Hao Li, Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org

Suren Baghdasaryan

Jun 15, 2026, 1:15:05 AM
to Hao Li, Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org

Suren Baghdasaryan

Jun 15, 2026, 1:38:56 AM
to Hao Li, Vlastimil Babka (SUSE), Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Hao Ge
Makes sense. I see
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-next
already implementing this and also keeping __GFP_NO_OBJ_EXT and
SLAB_ALLOC_NO_RECURSE both used. That version looks good to me, so
I'll wait for v3.

At the end of this series, we end up with no users of __GFP_NO_OBJ_EXT
but we still keep it defined. I'm guessing you leave it because of the
new patch [1] which aliases __GFP_NO_OBJ_EXT? I will have to make that
mechanism work without a GFP flag, possibly using a similar approach.
CC'ing Hao Ge to be in the loop of these changes. I'll work with him
on eliminating that __GFP_NO_OBJ_EXT alias.

[1] https://lore.kernel.org/all/20260604024008...@linux.dev/

Vlastimil Babka (SUSE)

Jun 15, 2026, 4:52:57 AM
to Suren Baghdasaryan, Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Andrey Konovalov, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On 6/15/26 03:28, Suren Baghdasaryan wrote:
> On Thu, Jun 11, 2026 at 9:37 AM Vlastimil Babka (SUSE)
> <vba...@kernel.org> wrote:
>>
>> On 6/11/26 17:11, Harry Yoo wrote:
>> >
>> >> From 3a1c4398ce9f361a4e6f4d9946eab6237eea89c2 Mon Sep 17 00:00:00 2001
>> >> From: "Vlastimil Babka (SUSE)" <vba...@kernel.org>
>> >> Date: Wed, 10 Jun 2026 17:40:04 +0200
>> >> Subject: [PATCH] mm/slab: do not init any kfence objects on allocation
>> >>
>> >> When init (zeroing) on allocation is requested, for kmalloc() we
>> >> generally have to zero the full object size even if a smaller size is
>> >> requested, in order to provide krealloc()'s __GFP_ZERO guarantees.
>> >>
>> >> When we end up allocating a kfence object, kfence perfoms the zeroing on
>> >
>> > nit: perfoms -> performs
>>
>> Fixed.
>>
>> >> its own because has its own redzone beyond the requested size. Thus
>
> nit: s/because has/because it has

Fixed.
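The krealloc() argument in the quoted changelog can be illustrated with a userspace model of a single 16-byte object. All demo_* names are hypothetical, and the sketch models only the non-kfence path. A krealloc() with __GFP_ZERO that stays within the same object does not clear anything, so the bytes between the old and the new size are zero only if the original allocation zeroed the full object rather than just the requested size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_OBJECT_SIZE 16	/* size of every object in the demo cache */

/* Hypothetical single object slot; may hold stale bytes from a previous user. */
struct demo_slot {
	unsigned char mem[DEMO_OBJECT_SIZE];
};

/*
 * Hypothetical kmalloc() with init-on-alloc: zero either the whole
 * object (what kmalloc() has to do) or only the requested size.
 */
static void *demo_kmalloc_zeroed(struct demo_slot *slot, size_t size,
				 bool zero_full_object)
{
	memset(slot->mem, 0, zero_full_object ? DEMO_OBJECT_SIZE : size);
	return slot->mem;
}

/*
 * Hypothetical krealloc(..., __GFP_ZERO) growing within the same object:
 * nothing is copied or cleared, the same pointer is simply returned.
 */
static void *demo_krealloc_in_place(void *p, size_t new_size)
{
	return new_size <= DEMO_OBJECT_SIZE ? p : NULL;
}

/* Returns true if bytes [from, to) are all zero. */
static bool demo_range_is_zero(const unsigned char *p, size_t from, size_t to)
{
	for (size_t i = from; i < to; i++)
		if (p[i])
			return false;
	return true;
}
```

Growing a 10-byte allocation to 14 bytes leaves bytes 10..13 zeroed only in the full-object case, which is why kmalloc() zeroes s->object_size rather than the requested size.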

Vlastimil Babka (SUSE)

Jun 15, 2026, 5:02:08 AM
to Alexei Starovoitov, Suren Baghdasaryan, Hao Li, Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, LKML, open list:CONTROL GROUP (CGROUP)
On 6/15/26 04:16, Alexei Starovoitov wrote:
> On Sun, Jun 14, 2026 at 7:01 PM Suren Baghdasaryan <sur...@google.com> wrote:
>>
>> On Thu, Jun 11, 2026 at 8:50 PM Hao Li <hao...@linux.dev> wrote:
>> >
>> > On Wed, Jun 10, 2026 at 05:40:07PM +0200, Vlastimil Babka (SUSE) wrote:
>> > > Similarly to the page allocators, introduce slab-allocator specific
>> > > alloc flags that internally control allocation behavior in addition to
>> > > gfp_flags, without occupying the limited gfp flags space.
>> > >
>> > > Introduce the first flag SLAB_ALLOC_TRYLOCK that behaves similarly to
>> > > page allocator's ALLOC_TRYLOCK and will be used to reimplement
>> > > kmalloc_nolock()'s "!allow_spin" behavior. That currently relies on
>> > > gfpflags_allow_spinning() and thus the lack of both __GFP_RECLAIM flags,
>> > > importantly __GFP_KSWAPD_RECLAIM. This can give false-positive results
>> > > e.g. in early boot with a restricted gfp_allowed_mask.
>> > >
>> > > Also introduce alloc_flags_allow_spinning() to replace the usage of
>> > > gfpflags_allow_spinning().
>> > >
>> > > Start using alloc_flags and the new check first in alloc_from_pcs() and
>> > > __pcs_replace_empty_main(). This means some slab allocations that were
>> > > falsely treated as kmalloc_nolock() due to their gfp flags will now have
>> > > higher chances of succeed, and this will further increase with followup
>>
>> nit: I think it should be either "higher chances of success" or
>> "higher chances to succeed".

success it is

>>
>> > > changes.
>> > >
>> > > Remove a WARN_ON_ONCE() from refill_objects() as it's now legitimate to
>> > > reach it from a slab allocation that's not _nolock() and yet lacks
>> > > __GFP_KSWAPD_RECLAIM for other reasons.
>> > >
>> > > Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
>> > > ---
>> >
>> > Reviewed-by: Hao Li <hao...@linux.dev>
>>
>> I would call SLAB_ALLOC_TRYLOCK something like SLAB_ALLOC_NOSPIN or
>> SLAB_ALLOC_NOLOCK but naming is hard and I don't claim myself to be
>> good at it. So, feel free to adopt my suggestion if you like it or
>> ignore it otherwise.
>>
>> Reviewed-by: Suren Baghdasaryan <sur...@google.com>
>
> Just noticed "trylock" in the #define SLAB_ALLOC_TRYLOCK
>
> Please call it SLAB_ALLOC_NOLOCK.
>
> The initial API used the 'trylock' name and that was a mistake,
> since people assumed normal spin_trylock()-like semantics.
> "trylock" implies that it fails under contention
> and retry is a normal next step. It's not the case.
> No one should be retrying. That's why the final api was kmalloc_nolock().
> So please keep this important distinction in the name.
> SLAB_ALLOC_NOLOCK should mean that spinning locks
> should not be taken. It should not mean "just go to trylock everywhere".

Eh, ok then, will change to SLAB_ALLOC_NOLOCK. Even though it's mostly internal.

So next thing we change page allocator's ALLOC_TRYLOCK to ALLOC_NOLOCK too?
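The false positive that motivates the internal flag can be shown with a small userspace sketch. All names, bit values and the early-boot mask below are illustrative assumptions, not the kernel's definitions. It contrasts the old heuristic, which infers "cannot spin" from the absence of both __GFP_RECLAIM bits, with an explicit internal flag named SLAB_ALLOC_NOLOCK as agreed above: a plain GFP_KERNEL request masked in early boot looks like a kmalloc_nolock() caller to the first check, but not to the second.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative stand-ins, NOT the kernel's real bit values or names.
 */
typedef unsigned int demo_gfp_t;
#define DEMO_GFP_DIRECT_RECLAIM  0x1u
#define DEMO_GFP_KSWAPD_RECLAIM  0x2u
#define DEMO_GFP_IO              0x4u
#define DEMO_GFP_FS              0x8u
#define DEMO_GFP_RECLAIM  (DEMO_GFP_DIRECT_RECLAIM | DEMO_GFP_KSWAPD_RECLAIM)
#define DEMO_GFP_KERNEL   (DEMO_GFP_RECLAIM | DEMO_GFP_IO | DEMO_GFP_FS)

/* Hypothetical values for the internal slab alloc_flags. */
#define DEMO_SLAB_ALLOC_DEFAULT  0x0u
#define DEMO_SLAB_ALLOC_NOLOCK   0x1u

/* Old heuristic: "must not spin" is inferred from missing reclaim bits. */
static bool demo_gfpflags_allow_spinning(demo_gfp_t gfp)
{
	return (gfp & DEMO_GFP_RECLAIM) != 0;
}

/* New check: only an explicit internal flag forbids spinning. */
static bool demo_alloc_flags_allow_spinning(unsigned int alloc_flags)
{
	return !(alloc_flags & DEMO_SLAB_ALLOC_NOLOCK);
}

/*
 * Approximation of early boot, where gfp_allowed_mask clears the
 * reclaim (and I/O, FS) bits from every allocation request.
 */
static demo_gfp_t demo_early_boot_mask(demo_gfp_t gfp)
{
	return gfp & ~(DEMO_GFP_RECLAIM | DEMO_GFP_IO | DEMO_GFP_FS);
}
```

In line with Alexei's point, the flag only describes the calling context (spinning locks must not be taken); a failed allocation made under it is not a hint to retry.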

Vlastimil Babka (SUSE)

Jun 15, 2026, 6:01:45 AM
to Suren Baghdasaryan, Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
OK, so I switched the order of patches 6 and 7, and all the changes from
gfpflags_allow_spinning() to alloc_flags_allow_spinning() are now in the
newly-later patch; the "replace struct partial_context with
slab_alloc_context" part has no functional changes. Verified that the end
result is exactly the same, and only updated changelogs a bit.

> Reviewed-by: Suren Baghdasaryan <sur...@google.com>

Thanks!

>>

Vlastimil Babka (SUSE)

Jun 15, 2026, 6:14:30 AM
to Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Suren Baghdasaryan, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On 6/11/26 09:52, Harry Yoo wrote:
>
>
> On 6/11/26 12:40 AM, Vlastimil Babka (SUSE) wrote:
>> Add the alloc_flags parameter to allocate_slab() and new_slab()
>> so it can be used to determine if spinning is allowed, independently
>> from gfp flags.
>>
>> refill_objects() passes SLAB_ALLOC_DEFAULT because it can only be
>> reached from contexts that allow spinning.
>>
>> Also change how trynode_flags are constructed in ___slab_alloc() to
>> achieve the same "do not upgrade to GFP_NOWAIT" by using masking instead
>> of a branch. It will now also not upgrade in cases where gfp is weaker
>> than GFP_NOWAIT (i.e. lacks __GFP_KSWAPD_RECLAIM) but doesn't come from
>> kmalloc_nolock() - which is more correct anyway.
>
> Wait, debugobjects intentionally avoids __GFP_KSWAPD_RECLAIM,
> but we have been upgrading it to GFP_NOWAIT?

Actually, I think we have not been upgrading it until patch 6/16, which
started relying on alloc_flags and thus made the upgrade trigger.
Previously it would be !allow_spin due to the lack of __GFP_KSWAPD_RECLAIM.

So I will move that flags adjustment to 6/16 (now 7/16).

>> During the masking keep also existing __GFP_NOMEMALLOC (pointed out by
>> Sashiko) and __GFP_ACCOUNT. Previously the hardcoded GFP_NOWAIT would
>> eliminate them, but it's not a big problem that would need a separate
>> fix.
>
> Ack.
>
>> Signed-off-by: Vlastimil Babka (SUSE) <vba...@kernel.org>
>> ---
>> mm/slub.c | 28 ++++++++++++++--------------
>> 1 file changed, 14 insertions(+), 14 deletions(-)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 98b79e5e7679..8f6ca3d5fdfa 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -4467,25 +4470,22 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>> * 1) try to get a partial slab from target node only by having
>> * __GFP_THISNODE in pc.flags for get_from_partial()
>> * 2) if 1) failed, try to allocate a new slab from target node with
>> - * GPF_NOWAIT | __GFP_THISNODE opportunistically
>> + * (at most) GFP_NOWAIT | __GFP_THISNODE opportunistically
>> * 3) if 2) failed, retry with original gfpflags which will allow
>> * get_from_partial() try partial lists of other nodes before
>> * potentially allocating new page from other nodes
>> */
>> if (unlikely(node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE)
>> && try_thisnode)) {
>> - if (unlikely(!allow_spin))
>> - /* Do not upgrade gfp to NOWAIT from more restrictive mode */
>> - trynode_flags = gfpflags | __GFP_THISNODE;
>> - else
>> - trynode_flags = GFP_NOWAIT | __GFP_THISNODE;
>> + trynode_flags &= GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_ACCOUNT;
>> + trynode_flags |= __GFP_NOWARN | __GFP_THISNODE;
>> }
>
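The difference between the old branch and the new masking can be checked with a userspace sketch. Bit values are illustrative assumptions (GFP_NOWAIT is modeled as __GFP_KSWAPD_RECLAIM | __GFP_NOWARN), and the sketch assumes trynode_flags starts out equal to gfpflags before the hunk shown, which the quoted diff does not include. It reproduces Harry's concern: once allow_spin comes from alloc_flags rather than gfp, the old branch would add __GFP_KSWAPD_RECLAIM to a caller that deliberately omitted it, while the mask can never add it.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only, not the kernel's. */
typedef unsigned int demo_gfp_t;
#define DEMO_GFP_DIRECT_RECLAIM  0x01u
#define DEMO_GFP_KSWAPD_RECLAIM  0x02u
#define DEMO_GFP_IO              0x04u
#define DEMO_GFP_FS              0x08u
#define DEMO_GFP_NOWARN          0x10u
#define DEMO_GFP_THISNODE        0x20u
#define DEMO_GFP_NOMEMALLOC      0x40u
#define DEMO_GFP_ACCOUNT         0x80u
#define DEMO_GFP_RECLAIM  (DEMO_GFP_DIRECT_RECLAIM | DEMO_GFP_KSWAPD_RECLAIM)
#define DEMO_GFP_NOWAIT   (DEMO_GFP_KSWAPD_RECLAIM | DEMO_GFP_NOWARN)
#define DEMO_GFP_KERNEL   (DEMO_GFP_RECLAIM | DEMO_GFP_IO | DEMO_GFP_FS)

/* Old code: keep gfp if !allow_spin, otherwise hardcode GFP_NOWAIT. */
static demo_gfp_t demo_trynode_branch(demo_gfp_t gfpflags, bool allow_spin)
{
	if (!allow_spin)
		return gfpflags | DEMO_GFP_THISNODE;
	return DEMO_GFP_NOWAIT | DEMO_GFP_THISNODE;
}

/* New code: mask down to at most GFP_NOWAIT, keeping a few modifiers. */
static demo_gfp_t demo_trynode_mask(demo_gfp_t gfpflags)
{
	demo_gfp_t trynode_flags = gfpflags;

	trynode_flags &= DEMO_GFP_NOWAIT | DEMO_GFP_NOMEMALLOC | DEMO_GFP_ACCOUNT;
	trynode_flags |= DEMO_GFP_NOWARN | DEMO_GFP_THISNODE;
	return trynode_flags;
}
```

Because the new code only clears bits from the caller's mask (apart from __GFP_NOWARN and __GFP_THISNODE), a request weaker than GFP_NOWAIT stays weaker, and __GFP_ACCOUNT survives where the hardcoded GFP_NOWAIT used to drop it.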

Vlastimil Babka (SUSE)

Jun 15, 2026, 7:08:13 AM
to Suren Baghdasaryan, Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On 6/15/26 06:58, Suren Baghdasaryan wrote:
> On Wed, Jun 10, 2026 at 8:41 AM Vlastimil Babka (SUSE)
> <vba...@kernel.org> wrote:
>>
>> With alloc_flags usage in slab, we can replace __GFP_NO_OBJ_EXT with an
>> alloc flag that prevents kmalloc recursion. For that we need a version
>> of kmalloc() that takes alloc_flags and use it in places that perform
>> these potentially recursive kmalloc allocations (of sheaves or obj_ext
>> arrays).
>>
>> As a preparatory step, make __do_kmalloc_node() take a pointer to
>> slab_alloc_context. This replaces the 'caller' parameter and includes
>> alloc_flags which we'll make use of.
>
> I think you could also eliminate __do_kmalloc_node() function's "size"
> parameter as it's always the same as ac->orig_size.

OK, done.
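Here is a reduced sketch of the calling convention after Suren's suggestion. Each public entry point builds the context on its own stack, and the internal helper takes only a pointer, reading both the caller address and the requested size from it. The demo_* names are hypothetical, malloc() stands in for the slab allocation, and the caller address is passed explicitly instead of coming from _RET_IP_.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct slab_alloc_context. */
struct demo_alloc_context {
	unsigned long caller_addr;
	size_t orig_size;
	unsigned int alloc_flags;
};

#define DEMO_SLAB_ALLOC_DEFAULT 0u

/* Records what the trace point would have reported. */
static unsigned long demo_last_trace_caller;
static size_t demo_last_trace_size;

/*
 * Internal helper: no separate size or caller parameter, both are
 * read from the context.
 */
static void *demo_do_kmalloc_node(const struct demo_alloc_context *ac)
{
	void *ret = malloc(ac->orig_size);	/* stand-in for slab_alloc_node() */

	demo_last_trace_caller = ac->caller_addr;
	demo_last_trace_size = ac->orig_size;
	return ret;
}

/* Public entry point builds the context once, on its own stack frame. */
static void *demo_kmalloc(size_t size, unsigned long caller)
{
	struct demo_alloc_context ac = {
		.caller_addr = caller,
		.orig_size = size,
		.alloc_flags = DEMO_SLAB_ALLOC_DEFAULT,
	};

	return demo_do_kmalloc_node(&ac);
}
```

With size carried only in the context, the two values can no longer disagree between the entry point and the helper.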

Vlastimil Babka (SUSE)

Jun 15, 2026, 7:11:46 AM
to Suren Baghdasaryan, Hao Li, Harry Yoo, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org, Hao Ge
OK.

> At the end of this series, we end up with no users of __GFP_NO_OBJ_EXT
> but we still keep it defined. I'm guessing you leave it because of the
> new patch [1] which aliases __GFP_NO_OBJ_EXT? I will have to make that

Yeah.

> mechanism work without a GFP flag, possibly using a similar approach.
> CC'ing Hao Ge to be in the loop of these changes. I'll work with him
> on eliminating that __GFP_NO_OBJ_EXT alias.

Good, then we can remove the flag completely.
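The recursion-breaking role that SLAB_ALLOC_NO_RECURSE takes over from __GFP_NO_OBJ_EXT can be sketched in userspace. This is a hypothetical model (demo_* names and the flag value are assumptions, and real obj_ext arrays are not objects of the same type). Each object wants a metadata object allocated through the same allocator, and the nested call carries the flag so that it skips allocating its own metadata, bounding the depth at one. Because the flag travels in alloc_flags, no gfp bit is needed for it.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_SLAB_ALLOC_DEFAULT     0x0u
#define DEMO_SLAB_ALLOC_NO_RECURSE  0x4u	/* hypothetical value */

/* Deepest nesting level reached by demo_alloc(). */
static int demo_max_depth_seen;

struct demo_obj {
	struct demo_obj *ext;	/* per-object metadata, NULL under NO_RECURSE */
};

/*
 * Every object wants a metadata object that is itself allocated via
 * demo_alloc(). Without the flag this would recurse without bound.
 */
static struct demo_obj *demo_alloc(unsigned int alloc_flags, int depth)
{
	struct demo_obj *obj = calloc(1, sizeof(*obj));

	if (!obj)
		return NULL;
	if (depth > demo_max_depth_seen)
		demo_max_depth_seen = depth;
	/* The nested allocation must not allocate metadata of its own. */
	if (!(alloc_flags & DEMO_SLAB_ALLOC_NO_RECURSE))
		obj->ext = demo_alloc(alloc_flags | DEMO_SLAB_ALLOC_NO_RECURSE,
				      depth + 1);
	return obj;
}

static void demo_free(struct demo_obj *obj)
{
	if (!obj)
		return;
	demo_free(obj->ext);
	free(obj);
}
```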

Vlastimil Babka (SUSE)

Jun 15, 2026, 7:33:10 AM
to Suren Baghdasaryan, Harry Yoo, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Alexei Starovoitov, Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, cgr...@vger.kernel.org
On 6/15/26 06:35, Suren Baghdasaryan wrote:
> On Wed, Jun 10, 2026 at 8:41 AM Vlastimil Babka (SUSE)
> <vba...@kernel.org> wrote:
>> @@ -4568,9 +4577,8 @@ struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags)
>> }
>>
>> static __fastpath_inline
>> -bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
>> - gfp_t flags, size_t size, void **p,
>> - unsigned int orig_size)
>> +bool slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags, size_t size,
>> + void **p, struct slab_alloc_context *ac)
>
> Would it be possible to make this last parameter a "const struct
> slab_alloc_context *" (here and in other functions accepting it)? I
> think these functions accept it as an input parameter only and are not
> supposed to change it, right? That makes it easy to verify that
> slab_alloc_context is not changed between consecutive calls reusing
> it, for example inside slab_alloc_node().

Uh, ok, did that. Also changed orig_size to size_t.
No, fixed, thanks!
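Here is a minimal sketch of the const-qualified context Suren asked for, with orig_size as size_t. One context built for a bulk allocation is handed read-only to a hypothetical per-object hook, so the compiler rejects accidental writes and the same context can be reused across calls. The demo_* types and the record the hook fills in are illustrative assumptions, not what slab_post_alloc_hook() actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct slab_alloc_context. */
struct demo_alloc_context {
	unsigned long caller_addr;
	size_t orig_size;		/* size_t, as changed in review */
	unsigned int alloc_flags;
};

/* Hypothetical per-object tracking record. */
struct demo_obj {
	unsigned long alloc_caller;
	size_t requested;
};

/*
 * Read-only consumer of the context. Because ac points to const, a
 * statement such as "ac->orig_size = 0;" here would fail to compile.
 */
static void demo_post_alloc_hook(struct demo_obj **p, size_t nr,
				 const struct demo_alloc_context *ac)
{
	for (size_t i = 0; i < nr; i++) {
		p[i]->alloc_caller = ac->caller_addr;
		p[i]->requested = ac->orig_size;
	}
}
```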