[PATCH] mm/slab_common: fix possible double free of kmem_cache


Feng Tang

Sep 18, 2022, 11:13:03 PM
to Vlastimil Babka, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Andrew Morton, Waiman Long, linu...@kvack.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Feng Tang
When running a slub_debug test, kfence's 'test_memcache_typesafe_by_rcu'
kunit test case causes a use-after-free error:

BUG: KASAN: use-after-free in kobject_del+0x14/0x30
Read of size 8 at addr ffff888007679090 by task kunit_try_catch/261

CPU: 1 PID: 261 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc5-next-20220916 #17
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
print_address_description.constprop.0+0x87/0x2a5
print_report+0x103/0x1ed
kasan_report+0xb7/0x140
kobject_del+0x14/0x30
kmem_cache_destroy+0x130/0x170
test_exit+0x1a/0x30
kunit_try_run_case+0xad/0xc0
kunit_generic_run_threadfn_adapter+0x26/0x50
kthread+0x17b/0x1b0
</TASK>

The cause is inside kmem_cache_destroy():

kmem_cache_destroy
    acquire lock/mutex
    shutdown_cache
        schedule_work(kmem_cache_release) (if RCU flag set)
    release lock/mutex
    kmem_cache_release (if RCU flag set)

With certain timing, the scheduled work can run before the second RCU
flag check, which then reads a wrong state from the already released cache.

Fix it by caching the RCU flag inside the protected area, just like 'refcnt'.

Signed-off-by: Feng Tang <feng...@intel.com>
---

note:

The error only happens on the linux-next tree, which already has Waiman's
commit below, and not in Linus' tree:
0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy()
without holding slab_mutex/cpu_hotplug_lock")

mm/slab_common.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 07b948288f84..ccc02573588f 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -475,6 +475,7 @@ void slab_kmem_cache_release(struct kmem_cache *s)
 void kmem_cache_destroy(struct kmem_cache *s)
 {
 	int refcnt;
+	bool rcu_set;
 
 	if (unlikely(!s) || !kasan_check_byte(s))
 		return;
@@ -482,6 +483,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
 	cpus_read_lock();
 	mutex_lock(&slab_mutex);
 
+	rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;
+
 	refcnt = --s->refcount;
 	if (refcnt)
 		goto out_unlock;
@@ -492,7 +495,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
 out_unlock:
 	mutex_unlock(&slab_mutex);
 	cpus_read_unlock();
-	if (!refcnt && !(s->flags & SLAB_TYPESAFE_BY_RCU))
+	if (!refcnt && !rcu_set)
 		kmem_cache_release(s);
 }
 EXPORT_SYMBOL(kmem_cache_destroy);
--
2.34.1
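
For readers without a kernel tree at hand, the pattern used by the patch can
be sketched in userspace C. This is only an illustration under assumptions:
toy_cache, toy_cache_create(), toy_shutdown(), the counters and
TOY_TYPESAFE_BY_RCU are made-up names, a pthread mutex stands in for
slab_mutex, and the deferred release is run synchronously to model the
worst-case timing. It is not the kernel implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for SLAB_TYPESAFE_BY_RCU; the value is arbitrary. */
#define TOY_TYPESAFE_BY_RCU 0x1u

struct toy_cache {
	unsigned int flags;
	int refcount;
};

static pthread_mutex_t toy_mutex = PTHREAD_MUTEX_INITIALIZER;
static int toy_deferred_releases;	/* releases done by the "scheduled work" */
static int toy_direct_releases;		/* releases done by toy_cache_destroy() */

static struct toy_cache *toy_cache_create(unsigned int flags)
{
	struct toy_cache *s = malloc(sizeof(*s));

	assert(s);
	s->flags = flags;
	s->refcount = 1;
	return s;
}

/*
 * Models shutdown_cache(). The real code only schedules the release;
 * running it right here is an assumption that mimics the scheduled
 * work winning the race.
 */
static void toy_shutdown(struct toy_cache *s, bool rcu_set)
{
	if (rcu_set) {
		toy_deferred_releases++;
		free(s);	/* s must not be dereferenced after this */
	}
}

static void toy_cache_destroy(struct toy_cache *s)
{
	int refcnt;
	bool rcu_set;

	if (!s)
		return;

	pthread_mutex_lock(&toy_mutex);
	/* Snapshot what the post-unlock decision needs while s is valid. */
	rcu_set = (s->flags & TOY_TYPESAFE_BY_RCU) != 0;
	refcnt = --s->refcount;
	if (!refcnt)
		toy_shutdown(s, rcu_set);
	pthread_mutex_unlock(&toy_mutex);

	/* Only locals are read here; s may already have been freed. */
	if (!refcnt && !rcu_set) {
		toy_direct_releases++;
		free(s);
	}
}
```

Destroying a cache created with TOY_TYPESAFE_BY_RCU goes through the deferred
path only, and one created with flags 0 is released directly; in neither case
is s dereferenced after the mutex is dropped.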

Vlastimil Babka

Sep 19, 2022, 5:12:40 AM
to Feng Tang, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Andrew Morton, Waiman Long, linu...@kvack.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
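On 9/19/22 05:12, Feng Tang wrote:
> The cause is inside kmem_cache_destroy():
>
> kmem_cache_destroy
>     acquire lock/mutex
>     shutdown_cache
>         schedule_work(kmem_cache_release) (if RCU flag set)
>     release lock/mutex
>     kmem_cache_release (if RCU flag set)

                               ^ not set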

I've fixed that up.

>
> in some certain timing, the scheduled work could be run before
> the next RCU flag checking which will get a wrong state.
>
> Fix it by caching the RCU flag inside protected area, just like 'refcnt'
>
> Signed-off-by: Feng Tang <feng...@intel.com>

Thanks!

> ---
>
> note:
>
> The error only happens on linux-next tree, and not in Linus' tree,
> which already has Waiman's commit:
> 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy()
> without holding slab_mutex/cpu_hotplug_lock")

Actually that commit is already in Linus' rc5 as well, so I will send your
fix this week. I've also added a Fixes: 0495e337b703 (...) tag.

Hyeonggon Yoo

Sep 19, 2022, 7:56:37 AM
to Vlastimil Babka, Feng Tang, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin, Andrew Morton, Waiman Long, linu...@kvack.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
On Mon, Sep 19, 2022 at 11:12:38AM +0200, Vlastimil Babka wrote:
> On 9/19/22 05:12, Feng Tang wrote:
> > When doing slub_debug test, kfence's 'test_memcache_typesafe_by_rcu'
> > kunit test case cause a use-after-free error:
> >

If I'm not mistaken, I think the subject should be:
s/double free/use after free/g
Very nice catch, thanks!

Otherwise (and with Vlastimil's fix):

Looks good to me.
Reviewed-by: Hyeonggon Yoo <42.h...@gmail.com>
--
Thanks,
Hyeonggon

Vlastimil Babka

Sep 19, 2022, 8:03:18 AM
to Hyeonggon Yoo, Feng Tang, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin, Andrew Morton, Waiman Long, linu...@kvack.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
On 9/19/22 13:56, Hyeonggon Yoo wrote:
> On Mon, Sep 19, 2022 at 11:12:38AM +0200, Vlastimil Babka wrote:
>> On 9/19/22 05:12, Feng Tang wrote:
>> > When doing slub_debug test, kfence's 'test_memcache_typesafe_by_rcu'
>> > kunit test case cause a use-after-free error:
>> >
>
> If I'm not mistaken, I think the subject should be:
> s/double free/use after free/g

Well, it's both AFAICS. By the initial use-after-free we can read a wrong
s->flags that was modified since we freed for the first time, and it can
lead to another kmem_cache_release() which is basically a double free.
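
A rough userspace model of that sequence, in case it helps. Everything here
is an assumption made for illustration: demo_cache, demo_release(), the
counters and DEMO_RCU_FLAG are hypothetical names, a static slot replaces
real allocation so that nothing is actually freed, and the memset() stands in
for the memory being modified after the first release.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for SLAB_TYPESAFE_BY_RCU; the value is arbitrary. */
#define DEMO_RCU_FLAG 0x1u

struct demo_cache {
	unsigned int flags;
	int refcount;
};

static struct demo_cache slot;	/* static storage, so no real use-after-free */
static bool slot_live;		/* bookkeeping kept outside the object */
static int release_calls;
static int double_releases;

static void demo_setup(void)
{
	slot.flags = DEMO_RCU_FLAG;
	slot.refcount = 1;
	slot_live = true;
	release_calls = 0;
	double_releases = 0;
}

static void demo_release(struct demo_cache *s)
{
	release_calls++;
	if (!slot_live) {
		double_releases++;	/* second release of the same object */
		return;
	}
	slot_live = false;
	memset(s, 0, sizeof(*s));	/* models the memory changing after release */
}

/* Pre-patch ordering: the flag is read again after the deferred release. */
static void demo_destroy_buggy(struct demo_cache *s)
{
	int refcnt = --s->refcount;

	if (!refcnt && (s->flags & DEMO_RCU_FLAG))
		demo_release(s);	/* deferred work, modeled as running at once */

	if (!refcnt && !(s->flags & DEMO_RCU_FLAG))	/* stale read */
		demo_release(s);
}

/* Patched ordering: the flag is cached before anything can release s. */
static void demo_destroy_fixed(struct demo_cache *s)
{
	bool rcu_set = (s->flags & DEMO_RCU_FLAG) != 0;
	int refcnt = --s->refcount;

	if (!refcnt && rcu_set)
		demo_release(s);

	if (!refcnt && !rcu_set)
		demo_release(s);
}
```

In this model, demo_destroy_buggy() on a freshly set up slot ends up calling
demo_release() twice because the second check sees the flag cleared, while
demo_destroy_fixed() calls it once.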

Hyeonggon Yoo

Sep 19, 2022, 8:07:48 AM
to Vlastimil Babka, Feng Tang, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin, Andrew Morton, Waiman Long, linu...@kvack.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
On Mon, Sep 19, 2022 at 02:03:15PM +0200, Vlastimil Babka wrote:
> On 9/19/22 13:56, Hyeonggon Yoo wrote:
> > On Mon, Sep 19, 2022 at 11:12:38AM +0200, Vlastimil Babka wrote:
> >> On 9/19/22 05:12, Feng Tang wrote:
> >> > When doing slub_debug test, kfence's 'test_memcache_typesafe_by_rcu'
> >> > kunit test case cause a use-after-free error:
> >> >
> >
> > If I'm not mistaken, I think the subject should be:
> > s/double free/use after free/g
>
> Well, it's both AFAICS. By the initial use-after-free we can read a wrong
> s->flags that was modified since we freed for the first time, and it can
> lead to another kmem_cache_release() which is basically a double free.
>

Yeah, I realized that just after sending the mail ;)
It is a use-after-free bug that can potentially lead to a double free.

Thank you for the correction!
--
Thanks,
Hyeonggon

Feng Tang

Sep 19, 2022, 8:51:08 AM
to Vlastimil Babka, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Andrew Morton, Waiman Long, linu...@kvack.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
On Mon, Sep 19, 2022 at 05:12:38PM +0800, Vlastimil Babka wrote:
> On 9/19/22 05:12, Feng Tang wrote:
[...]
> > The cause is inside kmem_cache_destroy():
> >
> > kmem_cache_destroy
> >     acquire lock/mutex
> >     shutdown_cache
> >         schedule_work(kmem_cache_release) (if RCU flag set)
> >     release lock/mutex
> >     kmem_cache_release (if RCU flag set)
>
>                                ^ not set
>
> I've fixed that up.

Oops.. Thanks for catching it!

> >
> > in some certain timing, the scheduled work could be run before
> > the next RCU flag checking which will get a wrong state.
> >
> > Fix it by caching the RCU flag inside protected area, just like 'refcnt'
> >
> > Signed-off-by: Feng Tang <feng...@intel.com>
>
> Thanks!
>
> > ---
> >
> > note:
> >
> > The error only happens on linux-next tree, and not in Linus' tree,
> > which already has Waiman's commit:
> > 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy()
> > without holding slab_mutex/cpu_hotplug_lock")
>
> Actually that commit is already in Linus' rc5 too, so I will send your fix
> this week too. Added a Fixes: 0495e337b703 (...) too.

Got it, thanks

- Feng

Waiman Long

Sep 19, 2022, 9:34:16 AM
to Feng Tang, Vlastimil Babka, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Andrew Morton, linu...@kvack.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
Thanks for catching this bug.

Reviewed-by: Waiman Long <lon...@redhat.com>

Cheers,
Longman

