[PATCH 0/5] slab: preparatory cleanups before adding sheaves to all caches


Vlastimil Babka

Nov 5, 2025, 4:05:34 AM
to Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Liam R. Howlett, Suren Baghdasaryan, Alexei Starovoitov, linu...@kvack.org, linux-...@vger.kernel.org, b...@vger.kernel.org, kasa...@googlegroups.com, Vlastimil Babka, Alexander Potapenko, Marco Elver, Dmitry Vyukov
These patches are separated from the RFC [1] since that needs more work
and 6.19 would be unrealistic for the whole series at this point. This
subset should be safe to land, improve the codebase on its own, and make
the followup smaller.

Patch "slab: make __slab_free() more clear" is a new one based on review
of one of the RFC patches where __slab_free() was found rather tricky.

Git branch: https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/sheaves-cleanups

[1] https://patch.msgid.link/20251023-sheaves-for-...@suse.cz

Signed-off-by: Vlastimil Babka <vba...@suse.cz>
---
Vlastimil Babka (5):
slab: make __slab_free() more clear
slab: move kfence_alloc() out of internal bulk alloc
slab: handle pfmemalloc slabs properly with sheaves
slub: remove CONFIG_SLUB_TINY specific code paths
slab: prevent recursive kmalloc() in alloc_empty_sheaf()

include/linux/gfp_types.h | 6 -
mm/slab.h | 2 -
mm/slub.c | 318 ++++++++++++++++++++++++----------------------
3 files changed, 166 insertions(+), 160 deletions(-)
---
base-commit: 136fe0cba6aca506f116f7cbd41ce1891d17fa85
change-id: 20251105-sheaves-cleanups-548ff67d099d

Best regards,
--
Vlastimil Babka <vba...@suse.cz>

Vlastimil Babka

Nov 5, 2025, 4:05:38 AM
to Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Liam R. Howlett, Suren Baghdasaryan, Alexei Starovoitov, linu...@kvack.org, linux-...@vger.kernel.org, b...@vger.kernel.org, kasa...@googlegroups.com, Vlastimil Babka
When a pfmemalloc allocation actually dips into reserves, the slab is
marked accordingly and non-pfmemalloc allocations should not be allowed
to allocate from it. The sheaves percpu caching currently doesn't follow
this rule, so implement it before we expand sheaves usage to all caches.

Make sure objects from pfmemalloc slabs don't end up in percpu sheaves.
When freeing an object from a pfmemalloc slab, bypass the sheaves.
When refilling sheaves, use __GFP_NOMEMALLOC to override any pfmemalloc
context - the allocation will fall back to regular slab allocations when
sheaves are depleted and can't be refilled because of the override.

For kfree_rcu(), detect pfmemalloc slabs while processing the rcu_sheaf
after the grace period in __rcu_free_sheaf_prepare() and simply flush
the sheaf if any object comes from a pfmemalloc slab.

For prefilled sheaves, try to refill them with __GFP_NOMEMALLOC first,
and if that fails, retry without __GFP_NOMEMALLOC but mark the sheaf
pfmemalloc, which causes it to be flushed back to the slabs when
returned.
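
As an illustration of the free-path rule above, here is a standalone toy
model (not kernel code - the struct and function names are made up for
this sketch):

#include <stdbool.h>
#include <stdio.h>

/* toy stand-in for the real slab metadata */
struct toy_slab {
	bool pfmemalloc;	/* slab was allocated from reserves */
	int nid;		/* NUMA node the slab came from */
};

/*
 * An object from a pfmemalloc slab must go straight back to the slab
 * layer and never into the percpu sheaf, so that later non-pfmemalloc
 * allocations cannot be served from memory that dipped into reserves.
 */
static bool may_cache_in_sheaf(const struct toy_slab *slab, int local_nid)
{
	if (slab->pfmemalloc)
		return false;			/* bypass sheaves */
	return slab->nid == local_nid;		/* usual locality check */
}

int main(void)
{
	struct toy_slab reserve_slab = { .pfmemalloc = true,  .nid = 0 };
	struct toy_slab normal_slab  = { .pfmemalloc = false, .nid = 0 };

	printf("reserve slab cached in sheaf? %d\n",
	       may_cache_in_sheaf(&reserve_slab, 0));
	printf("normal slab cached in sheaf?  %d\n",
	       may_cache_in_sheaf(&normal_slab, 0));
	return 0;
}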

Signed-off-by: Vlastimil Babka <vba...@suse.cz>
---
mm/slub.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 55 insertions(+), 14 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 0237a329d4e5..bb744e8044f0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -469,7 +469,10 @@ struct slab_sheaf {
struct rcu_head rcu_head;
struct list_head barn_list;
/* only used for prefilled sheafs */
- unsigned int capacity;
+ struct {
+ unsigned int capacity;
+ bool pfmemalloc;
+ };
};
struct kmem_cache *cache;
unsigned int size;
@@ -2651,7 +2654,7 @@ static struct slab_sheaf *alloc_full_sheaf(struct kmem_cache *s, gfp_t gfp)
if (!sheaf)
return NULL;

- if (refill_sheaf(s, sheaf, gfp)) {
+ if (refill_sheaf(s, sheaf, gfp | __GFP_NOMEMALLOC)) {
free_empty_sheaf(s, sheaf);
return NULL;
}
@@ -2729,12 +2732,13 @@ static void sheaf_flush_unused(struct kmem_cache *s, struct slab_sheaf *sheaf)
sheaf->size = 0;
}

-static void __rcu_free_sheaf_prepare(struct kmem_cache *s,
+static bool __rcu_free_sheaf_prepare(struct kmem_cache *s,
struct slab_sheaf *sheaf)
{
bool init = slab_want_init_on_free(s);
void **p = &sheaf->objects[0];
unsigned int i = 0;
+ bool pfmemalloc = false;

while (i < sheaf->size) {
struct slab *slab = virt_to_slab(p[i]);
@@ -2747,8 +2751,13 @@ static void __rcu_free_sheaf_prepare(struct kmem_cache *s,
continue;
}

+ if (slab_test_pfmemalloc(slab))
+ pfmemalloc = true;
+
i++;
}
+
+ return pfmemalloc;
}

static void rcu_free_sheaf_nobarn(struct rcu_head *head)
@@ -5041,7 +5050,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
return NULL;

if (empty) {
- if (!refill_sheaf(s, empty, gfp)) {
+ if (!refill_sheaf(s, empty, gfp | __GFP_NOMEMALLOC)) {
full = empty;
} else {
/*
@@ -5341,6 +5350,26 @@ void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t gfpflags, int nod
}
EXPORT_SYMBOL(kmem_cache_alloc_node_noprof);

+static int __prefill_sheaf_pfmemalloc(struct kmem_cache *s,
+ struct slab_sheaf *sheaf, gfp_t gfp)
+{
+ int ret = 0;
+
+ ret = refill_sheaf(s, sheaf, gfp | __GFP_NOMEMALLOC);
+
+ if (likely(!ret || !gfp_pfmemalloc_allowed(gfp)))
+ return ret;
+
+ /*
+ * if we are allowed to, refill sheaf with pfmemalloc but then remember
+ * it for when it's returned
+ */
+ ret = refill_sheaf(s, sheaf, gfp);
+ sheaf->pfmemalloc = true;
+
+ return ret;
+}
+
/*
* returns a sheaf that has at least the requested size
* when prefilling is needed, do so with given gfp flags
@@ -5375,6 +5404,10 @@ kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size)
sheaf->cache = s;
sheaf->capacity = size;

+ /*
+ * we do not need to care about pfmemalloc here because oversize
+ * sheaves are always flushed and freed when returned
+ */
if (!__kmem_cache_alloc_bulk(s, gfp, size,
&sheaf->objects[0])) {
kfree(sheaf);
@@ -5411,17 +5444,18 @@ kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size)
if (!sheaf)
sheaf = alloc_empty_sheaf(s, gfp);

- if (sheaf && sheaf->size < size) {
- if (refill_sheaf(s, sheaf, gfp)) {
+ if (sheaf) {
+ sheaf->capacity = s->sheaf_capacity;
+ sheaf->pfmemalloc = false;
+
+ if (sheaf->size < size &&
+ __prefill_sheaf_pfmemalloc(s, sheaf, gfp)) {
sheaf_flush_unused(s, sheaf);
free_empty_sheaf(s, sheaf);
sheaf = NULL;
}
}

- if (sheaf)
- sheaf->capacity = s->sheaf_capacity;
-
return sheaf;
}

@@ -5441,7 +5475,8 @@ void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
struct slub_percpu_sheaves *pcs;
struct node_barn *barn;

- if (unlikely(sheaf->capacity != s->sheaf_capacity)) {
+ if (unlikely((sheaf->capacity != s->sheaf_capacity)
+ || sheaf->pfmemalloc)) {
sheaf_flush_unused(s, sheaf);
kfree(sheaf);
return;
@@ -5507,7 +5542,7 @@ int kmem_cache_refill_sheaf(struct kmem_cache *s, gfp_t gfp,

if (likely(sheaf->capacity >= size)) {
if (likely(sheaf->capacity == s->sheaf_capacity))
- return refill_sheaf(s, sheaf, gfp);
+ return __prefill_sheaf_pfmemalloc(s, sheaf, gfp);

if (!__kmem_cache_alloc_bulk(s, gfp, sheaf->capacity - sheaf->size,
&sheaf->objects[sheaf->size])) {
@@ -6215,8 +6250,12 @@ static void rcu_free_sheaf(struct rcu_head *head)
* handles it fine. The only downside is that sheaf will serve fewer
* allocations when reused. It only happens due to debugging, which is a
* performance hit anyway.
+ *
+ * If it returns true, there was at least one object from a pfmemalloc
+ * slab, so simply flush everything.
*/
- __rcu_free_sheaf_prepare(s, sheaf);
+ if (__rcu_free_sheaf_prepare(s, sheaf))
+ goto flush;

n = get_node(s, sheaf->node);
if (!n)
@@ -6371,7 +6410,8 @@ static void free_to_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
continue;
}

- if (unlikely(IS_ENABLED(CONFIG_NUMA) && slab_nid(slab) != node)) {
+ if (unlikely((IS_ENABLED(CONFIG_NUMA) && slab_nid(slab) != node)
+ || slab_test_pfmemalloc(slab))) {
remote_objects[remote_nr] = p[i];
p[i] = p[--size];
if (++remote_nr >= PCS_BATCH_MAX)
@@ -6669,7 +6709,8 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
return;

if (s->cpu_sheaves && likely(!IS_ENABLED(CONFIG_NUMA) ||
- slab_nid(slab) == numa_mem_id())) {
+ slab_nid(slab) == numa_mem_id())
+ && likely(!slab_test_pfmemalloc(slab))) {
if (likely(free_to_pcs(s, object)))
return;
}

--
2.51.1

Vlastimil Babka

Nov 5, 2025, 4:05:38 AM
to Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Liam R. Howlett, Suren Baghdasaryan, Alexei Starovoitov, linu...@kvack.org, linux-...@vger.kernel.org, b...@vger.kernel.org, kasa...@googlegroups.com, Vlastimil Babka
The function is tricky and many of its tests are hard to understand. Try
to improve that by using more descriptively named variables and adding
comments.

- rename 'prior' to 'old_head' to match the head and tail parameters
- introduce a 'bool was_full' to make it more obvious what we are
testing instead of the !prior and prior tests
- add or improve comments in various places to explain what we're doing

Also replace the kmem_cache_has_cpu_partial() tests with
IS_ENABLED(CONFIG_SLUB_CPU_PARTIAL), which is a compile-time constant.
We can do that because the kmem_cache_debug(s) case is handled upfront
via free_to_partial_list().
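
To double-check the trickiest hunk (the condition deciding whether to take
the list_lock), here is a small standalone program - not kernel code - that
exhaustively verifies the rewritten test is equivalent to the old one, under
the assumption stated above that debug caches were already diverted to
free_to_partial_list():

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	for (int cpu_partial = 0; cpu_partial <= 1; cpu_partial++) {
		for (int was_full = 0; was_full <= 1; was_full++) {
			/* old_head ("prior") is non-NULL iff the slab was not full */
			bool prior = !was_full;

			/* old: !kmem_cache_has_cpu_partial(s) || prior */
			bool old_test = !cpu_partial || prior;
			/* new: !(IS_ENABLED(CONFIG_SLUB_CPU_PARTIAL) && was_full) */
			bool new_test = !(cpu_partial && was_full);

			assert(old_test == new_test);
			printf("cpu_partial=%d was_full=%d -> take list_lock=%d\n",
			       cpu_partial, was_full, new_test);
		}
	}
	return 0;
}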

Signed-off-by: Vlastimil Babka <vba...@suse.cz>
---
mm/slub.c | 62 +++++++++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 45 insertions(+), 17 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index f1a5373eee7b..074abe8e79f8 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5859,8 +5859,8 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
unsigned long addr)

{
- void *prior;
- int was_frozen;
+ void *old_head;
+ bool was_frozen, was_full;
struct slab new;
unsigned long counters;
struct kmem_cache_node *n = NULL;
@@ -5874,20 +5874,37 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
return;
}

+ /*
+ * It is enough to test IS_ENABLED(CONFIG_SLUB_CPU_PARTIAL) below
+ * instead of kmem_cache_has_cpu_partial(s), because kmem_cache_debug(s)
+ * is the only other reason it can be false, and it is already handled
+ * above.
+ */
+
do {
if (unlikely(n)) {
spin_unlock_irqrestore(&n->list_lock, flags);
n = NULL;
}
- prior = slab->freelist;
+ old_head = slab->freelist;
counters = slab->counters;
- set_freepointer(s, tail, prior);
+ set_freepointer(s, tail, old_head);
new.counters = counters;
- was_frozen = new.frozen;
+ was_frozen = !!new.frozen;
+ was_full = (old_head == NULL);
new.inuse -= cnt;
- if ((!new.inuse || !prior) && !was_frozen) {
- /* Needs to be taken off a list */
- if (!kmem_cache_has_cpu_partial(s) || prior) {
+ /*
+ * Might need to be taken off (due to becoming empty) or added
+ * to (due to not being full anymore) the partial list.
+ * Unless it's frozen.
+ */
+ if ((!new.inuse || was_full) && !was_frozen) {
+ /*
+ * If slab becomes non-full and we have cpu partial
+ * lists, we put it there unconditionally to avoid
+ * taking the list_lock. Otherwise we need it.
+ */
+ if (!(IS_ENABLED(CONFIG_SLUB_CPU_PARTIAL) && was_full)) {

n = get_node(s, slab_nid(slab));
/*
@@ -5905,7 +5922,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
}

} while (!slab_update_freelist(s, slab,
- prior, counters,
+ old_head, counters,
head, new.counters,
"__slab_free"));

@@ -5917,7 +5934,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
* activity can be necessary.
*/
stat(s, FREE_FROZEN);
- } else if (kmem_cache_has_cpu_partial(s) && !prior) {
+ } else if (IS_ENABLED(CONFIG_SLUB_CPU_PARTIAL) && was_full) {
/*
* If we started with a full slab then put it onto the
* per cpu partial list.
@@ -5926,6 +5943,11 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
stat(s, CPU_PARTIAL_FREE);
}

+ /*
+ * In other cases we didn't take the list_lock because the slab
+ * was already on the partial list and will remain there.
+ */
+
return;
}

@@ -5933,19 +5955,24 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
* This slab was partially empty but not on the per-node partial list,
* in which case we shouldn't manipulate its list, just return.
*/
- if (prior && !on_node_partial) {
+ if (!was_full && !on_node_partial) {
spin_unlock_irqrestore(&n->list_lock, flags);
return;
}

+ /*
+ * If the slab became empty, should we add/keep it on the partial list,
+ * or do we already have enough partial slabs?
+ */
if (unlikely(!new.inuse && n->nr_partial >= s->min_partial))
goto slab_empty;

/*
* Objects left in the slab. If it was not on the partial list before
- * then add it.
+ * then add it. This can only happen when cache has no per cpu partial
+ * list otherwise we would have put it there.
*/
- if (!kmem_cache_has_cpu_partial(s) && unlikely(!prior)) {
+ if (!IS_ENABLED(CONFIG_SLUB_CPU_PARTIAL) && unlikely(was_full)) {
add_partial(n, slab, DEACTIVATE_TO_TAIL);
stat(s, FREE_ADD_PARTIAL);
}
@@ -5953,10 +5980,11 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
return;

slab_empty:
- if (prior) {
- /*
- * Slab on the partial list.
- */
+ /*
+ * The slab could have a single object and thus go from full to empty in
+ * a single free, but more likely it was on the partial list. Remove it.
+ */
+ if (likely(!was_full)) {
remove_partial(n, slab);
stat(s, FREE_REMOVE_PARTIAL);
}

--
2.51.1

Vlastimil Babka

Nov 5, 2025, 4:05:41 AM
to Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Liam R. Howlett, Suren Baghdasaryan, Alexei Starovoitov, linu...@kvack.org, linux-...@vger.kernel.org, b...@vger.kernel.org, kasa...@googlegroups.com, Vlastimil Babka
We want to expand usage of sheaves to all non-boot caches, including
kmalloc caches. Since sheaves themselves are also allocated by
kmalloc(), we need to prevent excessive or infinite recursion -
depending on the sheaf size, the sheaf can be allocated from a smaller,
the same, or a larger kmalloc size bucket; there's no particular
constraint.

This is similar to allocating the objext arrays, so let's just reuse the
existing mechanisms for those. __GFP_NO_OBJ_EXT in alloc_empty_sheaf()
will prevent a nested kmalloc() from allocating a sheaf itself - it will
either have sheaves already, or fall back to a non-sheaf-cached
allocation (so bootstrap of sheaves in a kmalloc cache that allocates
sheaves from its own size bucket is possible). Additionally, reuse
OBJCGS_CLEAR_MASK to clear unwanted gfp flags from the nested
allocation.
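
The shape of the guard can be sketched with a standalone toy model (the
names below are made up; the real code reuses __GFP_NO_OBJ_EXT and
OBJCGS_CLEAR_MASK as described above). The point is that the metadata
allocation tags itself with a "no nested metadata" flag, so recursion is
bounded to one level:

#include <stdio.h>
#include <stdlib.h>

/* toy flag playing the role of __GFP_NO_OBJ_EXT */
#define TOY_NO_META	0x1u

static int depth;

static void *toy_alloc(size_t size, unsigned int flags);

/* allocate per-cache metadata (the "sheaf") through the allocator itself */
static void *toy_alloc_metadata(unsigned int flags)
{
	/* a nested metadata allocation refuses to allocate more metadata */
	if (flags & TOY_NO_META)
		return NULL;

	return toy_alloc(64, flags | TOY_NO_META);
}

static void *toy_alloc(size_t size, unsigned int flags)
{
	depth++;
	printf("alloc at depth %d (flags %#x)\n", depth, flags);

	/* the first allocation may recurse once for its metadata, never deeper */
	void *meta = toy_alloc_metadata(flags);
	free(meta);		/* the toy does not keep the metadata around */

	depth--;
	return malloc(size);
}

int main(void)
{
	free(toy_alloc(32, 0));
	return 0;
}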

Signed-off-by: Vlastimil Babka <vba...@suse.cz>
---
include/linux/gfp_types.h | 6 ------
mm/slub.c | 36 ++++++++++++++++++++++++++----------
2 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 65db9349f905..3de43b12209e 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -55,9 +55,7 @@ enum {
#ifdef CONFIG_LOCKDEP
___GFP_NOLOCKDEP_BIT,
#endif
-#ifdef CONFIG_SLAB_OBJ_EXT
___GFP_NO_OBJ_EXT_BIT,
-#endif
___GFP_LAST_BIT
};

@@ -98,11 +96,7 @@ enum {
#else
#define ___GFP_NOLOCKDEP 0
#endif
-#ifdef CONFIG_SLAB_OBJ_EXT
#define ___GFP_NO_OBJ_EXT BIT(___GFP_NO_OBJ_EXT_BIT)
-#else
-#define ___GFP_NO_OBJ_EXT 0
-#endif

/*
* Physical address zone modifiers (see linux/mmzone.h - low four bits)
diff --git a/mm/slub.c b/mm/slub.c
index a7c6d79154f8..f729c208965b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2031,6 +2031,14 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
}
#endif /* CONFIG_SLUB_DEBUG */

+/*
+ * The allocated objcg pointers array is not accounted directly.
+ * Moreover, it should not come from DMA buffer and is not readily
+ * reclaimable. So those GFP bits should be masked off.
+ */
+#define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | \
+ __GFP_ACCOUNT | __GFP_NOFAIL)
+
#ifdef CONFIG_SLAB_OBJ_EXT

#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
@@ -2081,14 +2089,6 @@ static inline void handle_failed_objexts_alloc(unsigned long obj_exts,

#endif /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */

-/*
- * The allocated objcg pointers array is not accounted directly.
- * Moreover, it should not come from DMA buffer and is not readily
- * reclaimable. So those GFP bits should be masked off.
- */
-#define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | \
- __GFP_ACCOUNT | __GFP_NOFAIL)
-
static inline void init_slab_obj_exts(struct slab *slab)
{
slab->obj_exts = 0;
@@ -2596,8 +2596,24 @@ static void *setup_object(struct kmem_cache *s, void *object)

static struct slab_sheaf *alloc_empty_sheaf(struct kmem_cache *s, gfp_t gfp)
{
- struct slab_sheaf *sheaf = kzalloc(struct_size(sheaf, objects,
- s->sheaf_capacity), gfp);
+ struct slab_sheaf *sheaf;
+ size_t sheaf_size;
+
+ if (gfp & __GFP_NO_OBJ_EXT)
+ return NULL;
+
+ gfp &= ~OBJCGS_CLEAR_MASK;
+
+ /*
+ * Prevent recursion to the same cache, or a deep stack of kmallocs of
+ * varying sizes (sheaf capacity might differ for each kmalloc size
+ * bucket)
+ */
+ if (s->flags & SLAB_KMALLOC)
+ gfp |= __GFP_NO_OBJ_EXT;
+
+ sheaf_size = struct_size(sheaf, objects, s->sheaf_capacity);
+ sheaf = kzalloc(sheaf_size, gfp);

if (unlikely(!sheaf))
return NULL;

--
2.51.1

Vlastimil Babka

Nov 5, 2025, 4:05:41 AM
to Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo, Liam R. Howlett, Suren Baghdasaryan, Alexei Starovoitov, linu...@kvack.org, linux-...@vger.kernel.org, b...@vger.kernel.org, kasa...@googlegroups.com, Vlastimil Babka
CONFIG_SLUB_TINY minimizes SLUB's memory overhead in multiple ways,
mainly by avoiding percpu caching of slabs and objects. It also reduces
code size by replacing some code paths with simplified ones through
ifdefs, but the benefits of that are smaller and the extra code paths
would complicate the upcoming changes.

Thus remove these code paths and associated ifdefs and simplify the code
base.

Signed-off-by: Vlastimil Babka <vba...@suse.cz>
---
mm/slab.h | 2 --
mm/slub.c | 107 +++-----------------------------------------------------------
2 files changed, 4 insertions(+), 105 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 078daecc7cf5..f7b8df56727d 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -236,10 +236,8 @@ struct kmem_cache_order_objects {
* Slab cache management.
*/
struct kmem_cache {
-#ifndef CONFIG_SLUB_TINY
struct kmem_cache_cpu __percpu *cpu_slab;
struct lock_class_key lock_key;
-#endif
struct slub_percpu_sheaves __percpu *cpu_sheaves;
/* Used for retrieving partial slabs, etc. */
slab_flags_t flags;
diff --git a/mm/slub.c b/mm/slub.c
index bb744e8044f0..a7c6d79154f8 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -410,7 +410,6 @@ enum stat_item {
NR_SLUB_STAT_ITEMS
};

-#ifndef CONFIG_SLUB_TINY
/*
* When changing the layout, make sure freelist and tid are still compatible
* with this_cpu_cmpxchg_double() alignment requirements.
@@ -432,7 +431,6 @@ struct kmem_cache_cpu {
unsigned int stat[NR_SLUB_STAT_ITEMS];
#endif
};
-#endif /* CONFIG_SLUB_TINY */

static inline void stat(const struct kmem_cache *s, enum stat_item si)
{
@@ -597,12 +595,10 @@ static inline void *get_freepointer(struct kmem_cache *s, void *object)
return freelist_ptr_decode(s, p, ptr_addr);
}

-#ifndef CONFIG_SLUB_TINY
static void prefetch_freepointer(const struct kmem_cache *s, void *object)
{
prefetchw(object + s->offset);
}
-#endif

/*
* When running under KMSAN, get_freepointer_safe() may return an uninitialized
@@ -714,10 +710,12 @@ static inline unsigned int slub_get_cpu_partial(struct kmem_cache *s)
return s->cpu_partial_slabs;
}
#else
+#ifdef SLAB_SUPPORTS_SYSFS
static inline void
slub_set_cpu_partial(struct kmem_cache *s, unsigned int nr_objects)
{
}
+#endif

static inline unsigned int slub_get_cpu_partial(struct kmem_cache *s)
{
@@ -2026,13 +2024,11 @@ static inline void inc_slabs_node(struct kmem_cache *s, int node,
int objects) {}
static inline void dec_slabs_node(struct kmem_cache *s, int node,
int objects) {}
-#ifndef CONFIG_SLUB_TINY
static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
void **freelist, void *nextfree)
{
return false;
}
-#endif
#endif /* CONFIG_SLUB_DEBUG */

#ifdef CONFIG_SLAB_OBJ_EXT
@@ -3623,8 +3619,6 @@ static struct slab *get_partial(struct kmem_cache *s, int node,
return get_any_partial(s, pc);
}

-#ifndef CONFIG_SLUB_TINY
-
#ifdef CONFIG_PREEMPTION
/*
* Calculate the next globally unique transaction for disambiguation
@@ -4024,12 +4018,6 @@ static bool has_cpu_slab(int cpu, struct kmem_cache *s)
return c->slab || slub_percpu_partial(c);
}

-#else /* CONFIG_SLUB_TINY */
-static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu) { }
-static inline bool has_cpu_slab(int cpu, struct kmem_cache *s) { return false; }
-static inline void flush_this_cpu_slab(struct kmem_cache *s) { }
-#endif /* CONFIG_SLUB_TINY */
-
static bool has_pcs_used(int cpu, struct kmem_cache *s)
{
struct slub_percpu_sheaves *pcs;
@@ -4370,7 +4358,6 @@ static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags)
return true;
}

-#ifndef CONFIG_SLUB_TINY
static inline bool
__update_cpu_freelist_fast(struct kmem_cache *s,
void *freelist_old, void *freelist_new,
@@ -4634,7 +4621,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
pc.orig_size = orig_size;
slab = get_partial(s, node, &pc);
if (slab) {
- if (kmem_cache_debug(s)) {
+ if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
freelist = pc.object;
/*
* For debug caches here we had to go through
@@ -4672,7 +4659,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,

stat(s, ALLOC_SLAB);

- if (kmem_cache_debug(s)) {
+ if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
freelist = alloc_single_from_new_slab(s, slab, orig_size, gfpflags);

if (unlikely(!freelist)) {
@@ -4884,32 +4871,6 @@ static __always_inline void *__slab_alloc_node(struct kmem_cache *s,

return object;
}
-#else /* CONFIG_SLUB_TINY */
-static void *__slab_alloc_node(struct kmem_cache *s,
- gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
-{
- struct partial_context pc;
- struct slab *slab;
- void *object;
-
- pc.flags = gfpflags;
- pc.orig_size = orig_size;
- slab = get_partial(s, node, &pc);
-
- if (slab)
- return pc.object;
-
- slab = new_slab(s, gfpflags, node);
- if (unlikely(!slab)) {
- slab_out_of_memory(s, gfpflags, node);
- return NULL;
- }
-
- object = alloc_single_from_new_slab(s, slab, orig_size, gfpflags);
-
- return object;
-}
-#endif /* CONFIG_SLUB_TINY */

/*
* If the object has been wiped upon free, make sure it's fully initialized by
@@ -5760,9 +5721,7 @@ void *kmalloc_nolock_noprof(size_t size, gfp_t gfp_flags, int node)
* it did local_lock_irqsave(&s->cpu_slab->lock, flags).
* In this case fast path with __update_cpu_freelist_fast() is not safe.
*/
-#ifndef CONFIG_SLUB_TINY
if (!in_nmi() || !local_lock_is_locked(&s->cpu_slab->lock))
-#endif
ret = __slab_alloc_node(s, alloc_gfp, node, _RET_IP_, size);

if (PTR_ERR(ret) == -EBUSY) {
@@ -6553,14 +6512,10 @@ static void free_deferred_objects(struct irq_work *work)
llist_for_each_safe(pos, t, llnode) {
struct slab *slab = container_of(pos, struct slab, llnode);

-#ifdef CONFIG_SLUB_TINY
- free_slab(slab->slab_cache, slab);
-#else
if (slab->frozen)
deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
else
free_slab(slab->slab_cache, slab);
-#endif
}
}

@@ -6596,7 +6551,6 @@ void defer_free_barrier(void)
irq_work_sync(&per_cpu_ptr(&defer_free_objects, cpu)->work);
}

-#ifndef CONFIG_SLUB_TINY
/*
* Fastpath with forced inlining to produce a kfree and kmem_cache_free that
* can perform fastpath freeing without additional function calls.
@@ -6689,14 +6643,6 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
}
stat_add(s, FREE_FASTPATH, cnt);
}
-#else /* CONFIG_SLUB_TINY */
-static void do_slab_free(struct kmem_cache *s,
- struct slab *slab, void *head, void *tail,
- int cnt, unsigned long addr)
-{
- __slab_free(s, slab, head, tail, cnt, addr);
-}
-#endif /* CONFIG_SLUB_TINY */

static __fastpath_inline
void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
@@ -6974,11 +6920,7 @@ void kfree_nolock(const void *object)
* since kasan quarantine takes locks and not supported from NMI.
*/
kasan_slab_free(s, x, false, false, /* skip quarantine */true);
-#ifndef CONFIG_SLUB_TINY
do_slab_free(s, slab, x, x, 0, _RET_IP_);
-#else
- defer_free(s, x);
-#endif
}
EXPORT_SYMBOL_GPL(kfree_nolock);

@@ -7428,7 +7370,6 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
}
EXPORT_SYMBOL(kmem_cache_free_bulk);

-#ifndef CONFIG_SLUB_TINY
static inline
int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
void **p)
@@ -7493,35 +7434,6 @@ int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
return 0;

}
-#else /* CONFIG_SLUB_TINY */
-static int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
- size_t size, void **p)
-{
- int i;
-
- for (i = 0; i < size; i++) {
- void *object = kfence_alloc(s, s->object_size, flags);
-
- if (unlikely(object)) {
- p[i] = object;
- continue;
- }
-
- p[i] = __slab_alloc_node(s, flags, NUMA_NO_NODE,
- _RET_IP_, s->object_size);
- if (unlikely(!p[i]))
- goto error;
-
- maybe_wipe_obj_freeptr(s, p[i]);
- }
-
- return i;
-
-error:
- __kmem_cache_free_bulk(s, i, p);
- return 0;
-}
-#endif /* CONFIG_SLUB_TINY */

/* Note that interrupts must be enabled when calling this function. */
int kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags, size_t size,
@@ -7740,7 +7652,6 @@ init_kmem_cache_node(struct kmem_cache_node *n, struct node_barn *barn)
barn_init(barn);
}

-#ifndef CONFIG_SLUB_TINY
static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
{
BUILD_BUG_ON(PERCPU_DYNAMIC_EARLY_SIZE <
@@ -7761,12 +7672,6 @@ static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)

return 1;
}
-#else
-static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
-{
- return 1;
-}
-#endif /* CONFIG_SLUB_TINY */

static int init_percpu_sheaves(struct kmem_cache *s)
{
@@ -7856,13 +7761,11 @@ void __kmem_cache_release(struct kmem_cache *s)
cache_random_seq_destroy(s);
if (s->cpu_sheaves)
pcs_destroy(s);
-#ifndef CONFIG_SLUB_TINY
#ifdef CONFIG_PREEMPT_RT
if (s->cpu_slab)
lockdep_unregister_key(&s->lock_key);
#endif
free_percpu(s->cpu_slab);
-#endif
free_kmem_cache_nodes(s);
}

@@ -8605,10 +8508,8 @@ void __init kmem_cache_init(void)

void __init kmem_cache_init_late(void)
{
-#ifndef CONFIG_SLUB_TINY
flushwq = alloc_workqueue("slub_flushwq", WQ_MEM_RECLAIM, 0);
WARN_ON(!flushwq);
-#endif
}

struct kmem_cache *

--
2.51.1

Harry Yoo

Nov 6, 2025, 3:26:37 AM
to Vlastimil Babka, Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin, Liam R. Howlett, Suren Baghdasaryan, Alexei Starovoitov, linu...@kvack.org, linux-...@vger.kernel.org, b...@vger.kernel.org, kasa...@googlegroups.com
On Wed, Nov 05, 2025 at 10:05:29AM +0100, Vlastimil Babka wrote:
> The function is tricky and many of its tests are hard to understand. Try
> to improve that by using more descriptively named variables and added
> comments.
>
> - rename 'prior' to 'old_head' to match the head and tail parameters
> - introduce a 'bool was_full' to make it more obvious what we are
> testing instead of the !prior and prior tests

Yeah, I recall these were cryptic when I was analyzing slab a few years
ago :)

> - add or improve comments in various places to explain what we're doing
>
> Also replace kmem_cache_has_cpu_partial() tests with
> IS_ENABLED(CONFIG_SLUB_CPU_PARTIAL) which are compile-time constants.
>
> We can do that because the kmem_cache_debug(s) case is handled upfront
> via free_to_partial_list().

This makes sense. By the way, should we also check IS_ENABLED(CONFIG_SLUB_TINY)
in kmem_cache_has_cpu_partial()?

> Signed-off-by: Vlastimil Babka <vba...@suse.cz>
> ---

The code is much cleaner!

Reviewed-by: Harry Yoo <harr...@oracle.com>

--
Cheers,
Harry / Hyeonggon

Vlastimil Babka

Nov 6, 2025, 3:43:28 AM
to Harry Yoo, Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin, Liam R. Howlett, Suren Baghdasaryan, Alexei Starovoitov, linu...@kvack.org, linux-...@vger.kernel.org, b...@vger.kernel.org, kasa...@googlegroups.com
On 11/6/25 09:26, Harry Yoo wrote:
> On Wed, Nov 05, 2025 at 10:05:29AM +0100, Vlastimil Babka wrote:
>> The function is tricky and many of its tests are hard to understand. Try
>> to improve that by using more descriptively named variables and added
>> comments.
>>
>> - rename 'prior' to 'old_head' to match the head and tail parameters
>> - introduce a 'bool was_full' to make it more obvious what we are
>> testing instead of the !prior and prior tests
>
> Yeah I recall these were cryptic when I was analyzing slab few years
> ago :)
>
>> - add or improve comments in various places to explain what we're doing
>>
>> Also replace kmem_cache_has_cpu_partial() tests with
>> IS_ENABLED(CONFIG_SLUB_CPU_PARTIAL) which are compile-time constants.
>>
>> We can do that because the kmem_cache_debug(s) case is handled upfront
>> via free_to_partial_list().
>
> This makes sense. By the way, should we also check IS_ENABLED(CONFIG_SLUB_TINY)
> in kmem_cache_has_cpu_partial()?

If you really mean testing CONFIG_SLUB_TINY then it's not necessary because
CONFIG_SLUB_CPU_PARTIAL depends on !TINY.
If you mean using IS_ENABLED(CONFIG_SLUB_CPU_PARTIAL) instead of the #ifdef,
that could be possible, just out of scope here. And hopefully it will be gone
entirely, so there's no point in polishing it now. Unlike __slab_free(), which
stays.

Harry Yoo

Nov 6, 2025, 8:48:50 PM
to Vlastimil Babka, Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin, Liam R. Howlett, Suren Baghdasaryan, Alexei Starovoitov, linu...@kvack.org, linux-...@vger.kernel.org, b...@vger.kernel.org, kasa...@googlegroups.com
On Thu, Nov 06, 2025 at 09:43:24AM +0100, Vlastimil Babka wrote:
> On 11/6/25 09:26, Harry Yoo wrote:
> > On Wed, Nov 05, 2025 at 10:05:29AM +0100, Vlastimil Babka wrote:
> >> The function is tricky and many of its tests are hard to understand. Try
> >> to improve that by using more descriptively named variables and added
> >> comments.
> >>
> >> - rename 'prior' to 'old_head' to match the head and tail parameters
> >> - introduce a 'bool was_full' to make it more obvious what we are
> >> testing instead of the !prior and prior tests
> >
> > Yeah I recall these were cryptic when I was analyzing slab few years
> > ago :)
> >
> >> - add or improve comments in various places to explain what we're doing
> >>
> >> Also replace kmem_cache_has_cpu_partial() tests with
> >> IS_ENABLED(CONFIG_SLUB_CPU_PARTIAL) which are compile-time constants.
> >>
> >> We can do that because the kmem_cache_debug(s) case is handled upfront
> >> via free_to_partial_list().
> >
> > This makes sense. By the way, should we also check IS_ENABLED(CONFIG_SLUB_TINY)
> > in kmem_cache_has_cpu_partial()?
>
> If you really mean testing CONFIG_SLUB_TINY then it's not necessary because
> CONFIG_SLUB_CPU_PARTIAL depends on !TINY.

I really meant this and yeah I missed that!

> If you mean using IS_ENABLED(CONFIG_SLUB_CPU_PARTIAL) instead of the #ifdef,
> that could be possible, just out of scope here. And hopefully will be gone
> fully, so no point in polishing at this point. Unlike __slab_free() which stays.

Agreed.

> >> Signed-off-by: Vlastimil Babka <vba...@suse.cz>
> >> ---
> >
> > The code is much cleaner!
> >
> > Reviewed-by: Harry Yoo <harr...@oracle.com>
