[Bug Report] Overflow Allocation Test failure

11 views
Skip to first unread message

Nico Pache

unread,
Jul 23, 2024, 1:52:46 AMJul 23
to KUnit Development, Kees Cook, David Gow
Hi Kees and David,

The overflow kunit test started failing in Fedora recently. The change
that seems to be causing the issue is 837018388e18 ("overflow: Replace
fake root_device with kunit_device"). Before this the test was
passing. I've managed to capture a KASAN splat relating to this issue.

It seems like before the $commit, the device would do its own
unregistering. Now that the kunit framework is handling the device
cleanup it borks somewhere along the way. I tried tracking down the
issue, but couldn't spot the issue.

My guess is that we need to properly hold a reference to the driver
resource, until the device is cleaned up, or vice versa. But i'm sure
David has a better idea of what might be happening here.

The splat below is from a RHEL kernel, but we're seeing this issue on
the latest upstream kernel as well.

--Nico

...

[ 1142.531794] ok 16 shift_nonsense_test
[ 1142.542786] # overflow_allocation_test: 11 allocation overflow
tests finished
[ 1142.556905] ==================================================================
[ 1142.556922] BUG: KASAN: stack-out-of-bounds in string+0x2a0/0x320
[ 1142.557016] Read of size 1 at addr ffffc90002f0fdb8 by task
kunit_try_catch/43761

[ 1142.557042] CPU: 0 PID: 43761 Comm: kunit_try_catch Kdump: loaded
Tainted: G B W N------- ---
5.14.0-482.kunit.el9.x86_64+debug #1
[ 1142.557051] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 1142.557062] Call Trace:
[ 1142.557090] <TASK>
[ 1142.557093] ? string+0x2a0/0x320
[ 1142.557101] dump_stack_lvl+0x57/0x81
[ 1142.557164] print_address_description.constprop.0+0x8b/0x2ed
[ 1142.557183] ? string+0x2a0/0x320
[ 1142.557186] print_report+0x132/0x21c
[ 1142.557197] ? kasan_addr_to_slab+0x9/0xa0
[ 1142.557230] ? string+0x2a0/0x320
[ 1142.557233] kasan_report+0x91/0xc0
[ 1142.557239] ? string+0x2a0/0x320
[ 1142.557247] string+0x2a0/0x320
[ 1142.557252] ? __pfx_string+0x10/0x10
[ 1142.557256] ? lock_acquire+0x508/0x660
[ 1142.557300] vsnprintf+0x7e9/0x15e0
[ 1142.557306] ? __pfx_lock_acquire+0x10/0x10
[ 1142.557311] ? __pfx_vsnprintf+0x10/0x10
[ 1142.557316] ? trace_irq_enable.constprop.0+0x14f/0x1e0
[ 1142.557349] kvasprintf+0x92/0x130
[ 1142.557362] ? __pfx_kvasprintf+0x10/0x10
[ 1142.557369] ? slab_free_freelist_hook+0x11d/0x1d0
[ 1142.557399] kasprintf+0xa6/0xe0
[ 1142.557405] ? __pfx_kasprintf+0x10/0x10
[ 1142.557411] ? __up_write+0x1a2/0x510
[ 1142.557414] ? kernfs_remove_by_name_ns+0xb3/0x100
[ 1142.557438] ? lock_release+0x285/0x380
[ 1142.557448] module_remove_driver+0x123/0x250
[ 1142.557490] bus_remove_driver+0x122/0x2a0
[ 1142.557512] kunit_remove_resource+0x195/0x290 [kunit]
[ 1142.557559] ? trace_irq_enable.constprop.0+0x14f/0x1e0
[ 1142.557566] kunit_cleanup+0x78/0x120 [kunit]
[ 1142.557581] ? __kthread_parkme+0xc7/0x200
[ 1142.557597] ? __pfx_kunit_try_run_case_cleanup+0x10/0x10 [kunit]
[ 1142.557613] kunit_generic_run_threadfn_adapter+0x79/0xe0 [kunit]
[ 1142.557630] ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10 [kunit]
[ 1142.557646] kthread+0x2ab/0x360
[ 1142.557649] ? trace_irq_enable.constprop.0+0x14f/0x1e0
[ 1142.557653] ? __pfx_kthread+0x10/0x10
[ 1142.557658] ret_from_fork+0x29/0x50
[ 1142.557694] </TASK>

[ 1142.557708] The buggy address belongs to the virtual mapping at
[ffffc90002f08000, ffffc90002f11000) created by:
dup_task_struct+0x5e/0x570

[ 1142.557726] The buggy address belongs to the physical page:
[ 1142.557733] page:000000002e8bfeb1 refcount:1 mapcount:0
mapping:0000000000000000 index:0x0 pfn:0x12023a
[ 1142.557755] memcg:ffff8881008c2382
[ 1142.557757] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[ 1142.557765] page_type: 0xffffffff()
[ 1142.557773] raw: 0017ffffc0000000 0000000000000000 dead000000000122
0000000000000000
[ 1142.557781] raw: 0000000000000000 0000000000000000 00000001ffffffff
ffff8881008c2382
[ 1142.557783] page dumped because: kasan: bad access detected

[ 1142.557786] Memory state around the buggy address:
[ 1142.557788] ffffc90002f0fc80: f3 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
[ 1142.557794] ffffc90002f0fd00: 00 00 00 00 00 00 f1 f1 f1 f1 f1 f1
00 f2 f2 f2
[ 1142.557796] >ffffc90002f0fd80: 00 00 00 00 00 f3 f3 f3 f3 f3 00 00
00 00 00 00
[ 1142.557798] ^
[ 1142.557800] ffffc90002f0fe00: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 f1 f1
[ 1142.557803] ffffc90002f0fe80: f1 f1 00 00 f2 f2 00 00 f3 f3 00 00
00 00 00 00
[ 1142.557805] ==================================================================
[ 1142.565156] not ok 17 overflow_allocation_test
[ 1142.566180] # overflow_size_helpers_test: 43 overflow size
helper tests finished
[ 1142.566625] ok 18 overflow_size_helpers_test
[ 1142.566646] # overflow: pass:17 fail:1 skip:0 total:18
[ 1142.566673] # Totals: pass:17 fail:1 skip:0 total:18
[ 1142.566692] not ok 1 overflow

Kees Cook

unread,
Jul 23, 2024, 4:37:54 PMJul 23
to Nico Pache, David Gow, KUnit Development
On Mon, Jul 22, 2024 at 11:52:16PM -0600, Nico Pache wrote:
> Hi Kees and David,
>
> The overflow kunit test started failing in Fedora recently. The change
> that seems to be causing the issue is 837018388e18 ("overflow: Replace
> fake root_device with kunit_device"). Before this the test was
> passing. I've managed to capture a KASAN splat relating to this issue.
>
> It seems like before the $commit, the device would do its own
> unregistering. Now that the kunit framework is handling the device
> cleanup it borks somewhere along the way. I tried tracking down the
> issue, but couldn't spot the issue.
>
> My guess is that we need to properly hold a reference to the driver
> resource, until the device is cleaned up, or vice versa. But i'm sure
> David has a better idea of what might be happening here.

So far, I can't find it either. Is KUnit trying to remove the driver
before the device maybe?

> [ 1142.557300] vsnprintf+0x7e9/0x15e0
> [ 1142.557349] kvasprintf+0x92/0x130
> [ 1142.557399] kasprintf+0xa6/0xe0
> [ 1142.557448] module_remove_driver+0x123/0x250
> [ 1142.557490] bus_remove_driver+0x122/0x2a0
> [ 1142.557512] kunit_remove_resource+0x195/0x290 [kunit]
> [ 1142.557566] kunit_cleanup+0x78/0x120 [kunit]

The only kasprintf() I see in module_remove_driver() is:

driver_name = make_driver_name(drv);

static char *make_driver_name(struct device_driver *drv)
{
char *driver_name;

driver_name = kasprintf(GFP_KERNEL, "%s:%s", drv->bus->name, drv->name);
if (!driver_name)
return NULL;

return driver_name;
}

But I still can't find a stored stack variable.

--
Kees Cook

David Gow

unread,
Jul 27, 2024, 3:07:07 AMJul 27
to Nico Pache, KUnit Development, Kees Cook, Rae Moar
On Tue, 23 Jul 2024 at 13:52, Nico Pache <npa...@redhat.com> wrote:
>
> Hi Kees and David,
>
> The overflow kunit test started failing in Fedora recently. The change
> that seems to be causing the issue is 837018388e18 ("overflow: Replace
> fake root_device with kunit_device"). Before this the test was
> passing. I've managed to capture a KASAN splat relating to this issue.
>
> It seems like before the $commit, the device would do its own
> unregistering. Now that the kunit framework is handling the device
> cleanup it borks somewhere along the way. I tried tracking down the
> issue, but couldn't spot the issue.
>
> My guess is that we need to properly hold a reference to the driver
> resource, until the device is cleaned up, or vice versa. But i'm sure
> David has a better idea of what might be happening here.
>
> The splat below is from a RHEL kernel, but we're seeing this issue on
> the latest upstream kernel as well.
>
> --Nico
>

Thanks, Nico. It took a while to reproduce this (it only seems to
happen here when the test is built as a module), but I think I've
worked it out.

The problem is the driver name (which for KUnit-managed drivers is the
same as the device name). For the overflow test, this seems to be on
the stack, so it's no-longer valid by the time the test has finished
and cleanup is occurring.

I think the correct solution here is to update the KUnit device
wrappers to strdup() the name and manage it themselves, rather than
forcing it to be statically allocated (or leaked).

The following patch seems to fix it here:
---
From 0e1d041ee95f4503927091d70c613cc3ad032034 Mon Sep 17 00:00:00 2001
From: David Gow <davi...@google.com>
Date: Sat, 27 Jul 2024 14:40:46 +0800
Subject: [RFC PATCH] kunit: Device wrappers should also manage driver name

kunit_driver_create() accepts a name for the driver, but does not copy
it, so if that name is either on the stack, or otherwise freed, we end
up with a use-after-free when the driver is cleaned up.

Instead, strdup() the name, and manage it as another KUnit allocation.
As there was no existing kunit_kstrdup(), we add one.

This fixes a KASAN splat with overflow.overflow_allocation_test, when
built as a module.

Fixes: d03c720e03bd ("kunit: Add APIs for managing devices")
Reported-by: Nico Pache <npa...@redhat.com>
Closes: https://groups.google.com/g/kunit-dev/c/81V9b9QYON0
Signed-off-by: David Gow <davi...@google.com>
---
include/kunit/test.h | 24 ++++++++++++++++++++++++
lib/kunit/device.c | 7 +++++--
2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/include/kunit/test.h b/include/kunit/test.h
index e2a1f0928e8b..ddbff1f4ed1d 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -480,6 +480,30 @@ static inline void *kunit_kcalloc(struct kunit
*test, size_t n, size_t size, gfp
return kunit_kmalloc_array(test, n, size, gfp | __GFP_ZERO);
}

+/**
+ * kunit_kstrdup() - Duplicates a string into a test managed allocation.
+ *
+ * @test: The test context object.
+ * @str: The NULL-terminated string to duplicate.
+ * @gfp: flags passed to underlying kmalloc().
+ *
+ * See kstrdup() and kunit_kmalloc_array() for mor information.
+ */
+static inline void *kunit_kstrdup(struct kunit *test, const char
*str, gfp_t gfp)
+{
+ size_t len;
+ char *buf;
+
+ if (!str)
+ return NULL;
+
+ len = strlen(str) + 1;
+ buf = kunit_kmalloc(test, len, gfp);
+ if (buf)
+ memcpy(buf, str, len);
+ return buf;
+}
+
/**
* kunit_vm_mmap() - Allocate KUnit-tracked vm_mmap() area
* @test: The test context object.
diff --git a/lib/kunit/device.c b/lib/kunit/device.c
index 25c81ed465fb..54fac5873672 100644
--- a/lib/kunit/device.c
+++ b/lib/kunit/device.c
@@ -89,7 +89,7 @@ struct device_driver *kunit_driver_create(struct
kunit *test, const char *name)
if (!driver)
return ERR_PTR(err);

- driver->name = name;
+ driver->name = kunit_kstrdup(test, name, GFP_KERNEL);
driver->bus = &kunit_bus_type;
driver->owner = THIS_MODULE;

@@ -192,8 +192,11 @@ void kunit_device_unregister(struct kunit *test,
struct device *dev)
const struct device_driver *driver = to_kunit_device(dev)->driver;

kunit_release_action(test, device_unregister_wrapper, dev);
- if (driver)
+ if (driver) {
+ const char *driver_name = driver->name;
kunit_release_action(test, driver_unregister_wrapper, (void *)driver);
+ kunit_kfree(test, driver_name);
+ }
}
EXPORT_SYMBOL_GPL(kunit_device_unregister);

--
2.46.0.rc1.232.g9752f9e123-goog

---

(Let me know if gmail has broken the formatting there...)

I may try to do something akin to kstrdup_const() to optimize this for
the common name-is-constant case.

Cheers,
-- David

Kees Cook

unread,
Jul 29, 2024, 8:45:50 PMJul 29
to David Gow, Nico Pache, KUnit Development, Rae Moar
On Sat, Jul 27, 2024 at 03:06:51PM +0800, David Gow wrote:
> On Tue, 23 Jul 2024 at 13:52, Nico Pache <npa...@redhat.com> wrote:
> >
> > Hi Kees and David,
> >
> > The overflow kunit test started failing in Fedora recently. The change
> > that seems to be causing the issue is 837018388e18 ("overflow: Replace
> > fake root_device with kunit_device"). Before this the test was
> > passing. I've managed to capture a KASAN splat relating to this issue.
> >
> > It seems like before the $commit, the device would do its own
> > unregistering. Now that the kunit framework is handling the device
> > cleanup it borks somewhere along the way. I tried tracking down the
> > issue, but couldn't spot the issue.
> >
> > My guess is that we need to properly hold a reference to the driver
> > resource, until the device is cleaned up, or vice versa. But i'm sure
> > David has a better idea of what might be happening here.
> >
> > The splat below is from a RHEL kernel, but we're seeing this issue on
> > the latest upstream kernel as well.
> >
> > --Nico
> >
>
> Thanks, Nico. It took a while to reproduce this (it only seems to
> happen here when the test is built as a module), but I think I've
> worked it out.

Thank you for tracking this one down! I was suspecting it was the name
variable, but I could not find where it was declared on the stack. Whew.

> From 0e1d041ee95f4503927091d70c613cc3ad032034 Mon Sep 17 00:00:00 2001
> From: David Gow <davi...@google.com>
> Date: Sat, 27 Jul 2024 14:40:46 +0800
> Subject: [RFC PATCH] kunit: Device wrappers should also manage driver name
>
> kunit_driver_create() accepts a name for the driver, but does not copy
> it, so if that name is either on the stack, or otherwise freed, we end
> up with a use-after-free when the driver is cleaned up.
>
> Instead, strdup() the name, and manage it as another KUnit allocation.
> As there was no existing kunit_kstrdup(), we add one.
>
> This fixes a KASAN splat with overflow.overflow_allocation_test, when
> built as a module.
>
> Fixes: d03c720e03bd ("kunit: Add APIs for managing devices")
> Reported-by: Nico Pache <npa...@redhat.com>
> Closes: https://groups.google.com/g/kunit-dev/c/81V9b9QYON0
> Signed-off-by: David Gow <davi...@google.com>

Reviewed-by: Kees Cook <ke...@kernel.org>

-Kees

--
Kees Cook
Reply all
Reply to author
Forward
0 new messages