KASAN: null-ptr-deref Write in event_handler

42 views
Skip to first unread message

syzbot

unread,
Oct 5, 2020, 9:59:20 AM10/5/20
to gre...@linuxfoundation.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, sh...@kernel.org, syzkall...@googlegroups.com, valentin...@gmail.com
Hello,

syzbot found the following issue on:

HEAD commit: d3d45f82 Merge tag 'pinctrl-v5.9-2' of git://git.kernel.or..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15781d8f900000
kernel config: https://syzkaller.appspot.com/x/.config?x=89ab6a0c48f30b49
dashboard link: https://syzkaller.appspot.com/bug?extid=bf1a360e305ee719e364
compiler: gcc (GCC) 10.1.0-syz 20200507
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16cbaa7d900000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1364f367900000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+bf1a36...@syzkaller.appspotmail.com

vhci_hcd: stop threads
vhci_hcd: release socket
vhci_hcd: disconnect device
==================================================================
BUG: KASAN: null-ptr-deref in instrument_atomic_write include/linux/instrumented.h:71 [inline]
BUG: KASAN: null-ptr-deref in atomic_fetch_add_relaxed include/asm-generic/atomic-instrumented.h:142 [inline]
BUG: KASAN: null-ptr-deref in refcount_add include/linux/refcount.h:201 [inline]
BUG: KASAN: null-ptr-deref in refcount_inc include/linux/refcount.h:241 [inline]
BUG: KASAN: null-ptr-deref in get_task_struct include/linux/sched/task.h:104 [inline]
BUG: KASAN: null-ptr-deref in kthread_stop+0x90/0x7e0 kernel/kthread.c:591
Write of size 4 at addr 000000000000001c by task kworker/u4:5/2519

CPU: 1 PID: 2519 Comm: kworker/u4:5 Not tainted 5.9.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: usbip_event event_handler
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x198/0x1fd lib/dump_stack.c:118
__kasan_report mm/kasan/report.c:517 [inline]
kasan_report.cold+0x5/0x37 mm/kasan/report.c:530
check_memory_region_inline mm/kasan/generic.c:186 [inline]
check_memory_region+0x13d/0x180 mm/kasan/generic.c:192
instrument_atomic_write include/linux/instrumented.h:71 [inline]
atomic_fetch_add_relaxed include/asm-generic/atomic-instrumented.h:142 [inline]
refcount_add include/linux/refcount.h:201 [inline]
refcount_inc include/linux/refcount.h:241 [inline]
get_task_struct include/linux/sched/task.h:104 [inline]
kthread_stop+0x90/0x7e0 kernel/kthread.c:591
vhci_shutdown_connection+0x170/0x2a0 drivers/usb/usbip/vhci_hcd.c:1015
event_handler+0x1a5/0x450 drivers/usb/usbip/usbip_event.c:78
process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
kthread+0x3b5/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
==================================================================
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 2519 Comm: kworker/u4:5 Tainted: G B 5.9.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: usbip_event event_handler
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x198/0x1fd lib/dump_stack.c:118
panic+0x382/0x7fb kernel/panic.c:231
end_report+0x4d/0x53 mm/kasan/report.c:104
__kasan_report mm/kasan/report.c:520 [inline]
kasan_report.cold+0xd/0x37 mm/kasan/report.c:530
check_memory_region_inline mm/kasan/generic.c:186 [inline]
check_memory_region+0x13d/0x180 mm/kasan/generic.c:192
instrument_atomic_write include/linux/instrumented.h:71 [inline]
atomic_fetch_add_relaxed include/asm-generic/atomic-instrumented.h:142 [inline]
refcount_add include/linux/refcount.h:201 [inline]
refcount_inc include/linux/refcount.h:241 [inline]
get_task_struct include/linux/sched/task.h:104 [inline]
kthread_stop+0x90/0x7e0 kernel/kthread.c:591
vhci_shutdown_connection+0x170/0x2a0 drivers/usb/usbip/vhci_hcd.c:1015
event_handler+0x1a5/0x450 drivers/usb/usbip/usbip_event.c:78
process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
kthread+0x3b5/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Andrey Konovalov

unread,
Oct 5, 2020, 10:04:21 AM10/5/20
to Shuah Khan, Valentina Manea, Greg Kroah-Hartman, LKML, USB list, syzkaller-bugs, syzbot, Nazime Hande Harputluoglu
Hi Valentina and Shuah,

There appears to be a race condition in the USB/IP vhci_hcd shutdown
procedure. It happens quite often during fuzzing with syzkaller, and
prevents us from going deeper into the USB/IP code.

Could you advise us what would be the best fix for this?

Thanks in advance!

Shuah Khan

unread,
Oct 5, 2020, 4:44:24 PM10/5/20
to Andrey Konovalov, Shuah Khan, Valentina Manea, Greg Kroah-Hartman, LKML, USB list, syzkaller-bugs, syzbot, Nazime Hande Harputluoglu, Shuah Khan
Hi Andrey,

Reading the comments for this routine, looks like there is an assumption
that context begins cleanup and race conditions aren't considered.

The right fix is holding vhci->lock and vdev->priv_lock to protect
critical sections in this routine. I will send a patch for this.

thanks,
-- Shuah


Hillf Danton

unread,
Oct 7, 2020, 6:30:43 AM10/7/20
to syzbot, gre...@linuxfoundation.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, sh...@kernel.org, Hillf Danton, syzkall...@googlegroups.com, valentin...@gmail.com

Mon, 05 Oct 2020 06:59:19
Add a state for vhci device to avoid stop more than once.

--- a/drivers/usb/usbip/vhci.h
+++ b/drivers/usb/usbip/vhci.h
@@ -16,6 +16,10 @@
#include <linux/usb/hcd.h>
#include <linux/wait.h>

+enum {
+ VDEV_STATE_STOP,
+};
+
struct vhci_device {
struct usb_device *udev;

@@ -27,6 +31,7 @@ struct vhci_device {

/* speed of a remote device */
enum usb_device_speed speed;
+ int state;

/* vhci root-hub port to which this device is attached */
__u32 rhport;
--- a/drivers/usb/usbip/vhci_hcd.c
+++ b/drivers/usb/usbip/vhci_hcd.c
@@ -1001,6 +1001,9 @@ static void vhci_shutdown_connection(str
{
struct vhci_device *vdev = container_of(ud, struct vhci_device, ud);

+ if (test_and_set_bit(VDEV_STATE_STOP, &vdev->state))
+ return;
+
/* need this? see stub_dev.c */
if (ud->tcp_socket) {
pr_debug("shutdown tcp_socket %d\n", ud->sockfd);

Shuah Khan

unread,
Oct 7, 2020, 9:56:58 AM10/7/20
to Andrey Konovalov, Shuah Khan, Valentina Manea, Greg Kroah-Hartman, LKML, USB list, syzkaller-bugs, syzbot, Nazime Hande Harputluoglu, Shuah Khan
Hi Andrey,

I have been unable to reproduce the problem with the reproducer
so far. You mentioned it happens quite often.

- matched config with yours
- load vhci_hcd module and run the reproducer

I do see the messages during shutdown - stop threads etc.

What am I missing?

thanks,
-- Shuah



Andrey Konovalov

unread,
Oct 7, 2020, 10:29:11 AM10/7/20
to Shuah Khan, Shuah Khan, Valentina Manea, Greg Kroah-Hartman, LKML, USB list, syzkaller-bugs, syzbot, Nazime Hande Harputluoglu
Hm, if you matched the config, then the module should be built-in?

>
> I do see the messages during shutdown - stop threads etc.
>
> What am I missing?

This appears to be a race that requires precise timings. I failed to
reproduce it with the C reproducer, but I managed to reproduce it with
the syzkaller repro program:

https://syzkaller.appspot.com/x/repro.syz?x=16cbaa7d900000

To do that you need to build syzkaller, and copy ./bin/syz-execprog
and ./bin/syz-executor into your testing environment, and then do:

./syz-execprog -sandbox=none -repeat=0 -procs=6 ./repro.prog

Thanks!

Shuah Khan

unread,
Oct 7, 2020, 11:48:07 AM10/7/20
to Andrey Konovalov, Shuah Khan, Valentina Manea, Greg Kroah-Hartman, LKML, USB list, syzkaller-bugs, syzbot, Nazime Hande Harputluoglu, Shuah Khan
Right. I did notice that your config has built-in. This shouldn't
matter, I have a kernel built with it static. I will try it to
see if it makes a difference.

>>
>> I do see the messages during shutdown - stop threads etc.
>>
>> What am I missing?
>
> This appears to be a race that requires precise timings. I failed to
> reproduce it with the C reproducer, but I managed to reproduce it with
> the syzkaller repro program:
>
> https://syzkaller.appspot.com/x/repro.syz?x=16cbaa7d900000
>
> To do that you need to build syzkaller, and copy ./bin/syz-execprog
> and ./bin/syz-executor into your testing environment, and then do:
>
> ./syz-execprog -sandbox=none -repeat=0 -procs=6 ./repro.prog
>

Thanks for the tips on your environment.

thanks,
-- Shuah

Shuah Khan

unread,
Oct 9, 2020, 10:38:02 AM10/9/20
to Hillf Danton, syzbot, gre...@linuxfoundation.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, sh...@kernel.org, syzkall...@googlegroups.com, valentin...@gmail.com, Shuah Khan
It is better for it be called more than once and protect the
paths.

I am working on a fix for this.

thanks,
-- Shuah


Shuah Khan

unread,
Oct 16, 2020, 3:48:03 PM10/16/20
to Andrey Konovalov, Shuah Khan, Valentina Manea, Greg Kroah-Hartman, LKML, USB list, syzkaller-bugs, syzbot, Nazime Hande Harputluoglu, Shuah Khan
On 10/7/20 9:48 AM, Shuah Khan wrote:
> On 10/7/20 8:28 AM, Andrey Konovalov wrote:
>> On Wed, Oct 7, 2020 at 3:56 PM Shuah Khan <sk...@linuxfoundation.org>
>> wrote:
>>>

[snip]

>>>
>>> Hi Andrey,
>>>
>>> I have been unable to reproduce the problem with the reproducer
>>> so far. You mentioned it happens quite often.
>>>
>>> - matched config with yours
>>> - load vhci_hcd module and run the reproducer
>>
>> Hm, if you matched the config, then the module should be built-in?
>>
>
> Right. I did notice that your config has built-in. This shouldn't
> matter, I have a kernel built with it static. I will try it to
> see if it makes a difference.
>
>>>
>>> I do see the messages during shutdown - stop threads etc.
>>>
>>> What am I missing?
>>
>> This appears to be a race that requires precise timings. I failed to
>> reproduce it with the C reproducer, but I managed to reproduce it with
>> the syzkaller repro program:
>>
>> https://syzkaller.appspot.com/x/repro.syz?x=16cbaa7d900000
>>
>> To do that you need to build syzkaller, and copy ./bin/syz-execprog
>> and ./bin/syz-executor into your testing environment, and then do:
>>
>> ./syz-execprog -sandbox=none -repeat=0 -procs=6 ./repro.prog
>>
>

Andrey,

I am unable to reproduce the problem. I even tweaked the reproducer
to launch 10 procs and 100 threads.

Can you test the following patch with your setup

Here it is - also attached.

=======================================================================
diff --git a/drivers/usb/usbip/vhci_hcd.c b/drivers/usb/usbip/vhci_hcd.c
index 66cde5e5f796..3fdcb93b667d 100644
--- a/drivers/usb/usbip/vhci_hcd.c
+++ b/drivers/usb/usbip/vhci_hcd.c
@@ -997,14 +997,12 @@ static void vhci_device_unlink_cleanup(struct
vhci_device *vdev)
spin_unlock_irqrestore(&vhci->lock, flags);
}

-/*
- * The important thing is that only one context begins cleanup.
- * This is why error handling and cleanup become simple.
- * We do not want to consider race condition as possible.
- */
static void vhci_shutdown_connection(struct usbip_device *ud)
{
struct vhci_device *vdev = container_of(ud, struct vhci_device, ud);
+ struct vhci_hcd *vhci_hcd = vdev_to_vhci_hcd(vdev);
+ struct vhci *vhci = vhci_hcd->vhci;
+ unsigned long flags;

/* need this? see stub_dev.c */
if (ud->tcp_socket) {
@@ -1012,6 +1010,10 @@ static void vhci_shutdown_connection(struct
usbip_device *ud)
kernel_sock_shutdown(ud->tcp_socket, SHUT_RDWR);
}

+ /* avoid races during shutdown */
+ spin_lock_irqsave(&vhci->lock, flags);
+ spin_lock(&vdev->priv_lock);
+
/* kill threads related to this sdev */
if (vdev->ud.tcp_rx) {
kthread_stop_put(vdev->ud.tcp_rx);
@@ -1031,6 +1033,10 @@ static void vhci_shutdown_connection(struct
usbip_device *ud)
}
pr_info("release socket\n");

+ /* unlock - vhci_device_unlink_cleanup() holds the locks */
+ spin_unlock(&vdev->priv_lock);
+ spin_unlock_irqrestore(&vhci->lock, flags);
+
vhci_device_unlink_cleanup(vdev);

/*
=======================================================================

thanks,
-- Shuah





vhci_hcd_shutdown.diff

Andrey Konovalov

unread,
Oct 19, 2020, 11:11:05 AM10/19/20
to Shuah Khan, Shuah Khan, Valentina Manea, Greg Kroah-Hartman, LKML, USB list, syzkaller-bugs, syzbot, Nazime Hande Harputluoglu
On Fri, Oct 16, 2020 at 9:48 PM Shuah Khan <sk...@linuxfoundation.org> wrote:
>
> Andrey,
>
> I am unable to reproduce the problem. I even tweaked the reproducer
> to launch 10 procs and 100 threads.
>
> Can you test the following patch with your setup
>
> Here it is - also attached.

Hi Shuah,

I get the crash below with this change. It complains about
kthread_stop_put() being called in atomic context.

BUG: sleeping function called from invalid context at
kernel/sched/completion.c:101
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 112, name: kworker/u4:2
4 locks held by kworker/u4:2/112:
#0: ffff888012a7b138 ((wq_completion)usbip_event){+.+.}-{0:0}, at:
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888012a7b138 ((wq_completion)usbip_event){+.+.}-{0:0}, at:
atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
#0: ffff888012a7b138 ((wq_completion)usbip_event){+.+.}-{0:0}, at:
atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
#0: ffff888012a7b138 ((wq_completion)usbip_event){+.+.}-{0:0}, at:
set_work_data kernel/workqueue.c:616 [inline]
#0: ffff888012a7b138 ((wq_completion)usbip_event){+.+.}-{0:0}, at:
set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
#0: ffff888012a7b138 ((wq_completion)usbip_event){+.+.}-{0:0}, at:
process_one_work+0x841/0x15c0 kernel/workqueue.c:2243
#1: ffffc900001dfdc0 (usbip_work){+.+.}-{0:0}, at:
process_one_work+0x874/0x15c0 kernel/workqueue.c:2247
#2: ffff88801371d280 (&vhci_hcd->vhci->lock){....}-{2:2}, at:
vhci_shutdown_connection+0xd2/0x360 drivers/usb/usbip/vhci_hcd.c:1019
#3: ffff88801a904890 (&vdev->priv_lock){....}-{2:2}, at: spin_lock
include/linux/spinlock.h:354 [inline]
#3: ffff88801a904890 (&vdev->priv_lock){....}-{2:2}, at:
vhci_shutdown_connection+0xeb/0x360 drivers/usb/usbip/vhci_hcd.c:1020
irq event stamp: 1298372
hardirqs last enabled at (1298371): [<ffffffff81b6ebd6>]
kfree+0x236/0x290 mm/slab.c:3759
hardirqs last disabled at (1298372): [<ffffffff8844b5f9>]
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (1298372): [<ffffffff8844b5f9>]
_raw_spin_lock_irqsave+0xa9/0xd0 kernel/locking/spinlock.c:159
softirqs last enabled at (1298362): [<ffffffff87fdcae1>]
spin_unlock_bh include/linux/spinlock.h:399 [inline]
softirqs last enabled at (1298362): [<ffffffff87fdcae1>]
batadv_nc_purge_paths+0x2d1/0x400 net/batman-adv/network-coding.c:470
softirqs last disabled at (1298360): [<ffffffff87fdc8ef>] spin_lock_bh
include/linux/spinlock.h:359 [inline]
softirqs last disabled at (1298360): [<ffffffff87fdc8ef>]
batadv_nc_purge_paths+0xdf/0x400 net/batman-adv/network-coding.c:441
Preemption disabled at:
[<0000000000000000>] 0x0
CPU: 0 PID: 112 Comm: kworker/u4:2 Not tainted 5.9.0+ #29
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
Workqueue: usbip_event event_handler
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x198/0x1fb lib/dump_stack.c:118
___might_sleep.cold+0x1ef/0x235 kernel/sched/core.c:7298
__wait_for_common kernel/sched/completion.c:101 [inline]
wait_for_common kernel/sched/completion.c:117 [inline]
wait_for_completion+0x85/0x270 kernel/sched/completion.c:138
kthread_stop+0x17d/0x720 kernel/kthread.c:596
vhci_shutdown_connection+0x121/0x360 drivers/usb/usbip/vhci_hcd.c:1024
event_handler+0x1a4/0x3b0 drivers/usb/usbip/usbip_event.c:78
process_one_work+0x953/0x15c0 kernel/workqueue.c:2272
worker_thread+0x96/0xe20 kernel/workqueue.c:2418
kthread+0x3b5/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
BUG: scheduling while atomic: kworker/u4:2/112/0x00000003
4 locks held by kworker/u4:2/112:
#0: ffff888012a7b138 ((wq_completion)usbip_event){+.+.}-{0:0}, at:
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888012a7b138 ((wq_completion)usbip_event){+.+.}-{0:0}, at:
atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
#0: ffff888012a7b138 ((wq_completion)usbip_event){+.+.}-{0:0}, at:
atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
#0: ffff888012a7b138 ((wq_completion)usbip_event){+.+.}-{0:0}, at:
set_work_data kernel/workqueue.c:616 [inline]
#0: ffff888012a7b138 ((wq_completion)usbip_event){+.+.}-{0:0}, at:
set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
#0: ffff888012a7b138 ((wq_completion)usbip_event){+.+.}-{0:0}, at:
process_one_work+0x841/0x15c0 kernel/workqueue.c:2243
#1: ffffc900001dfdc0 (usbip_work){+.+.}-{0:0}, at:
process_one_work+0x874/0x15c0 kernel/workqueue.c:2247
#2: ffff88801371d280 (&vhci_hcd->vhci->lock){+.+.}-{2:2}, at:
vhci_shutdown_connection+0xd2/0x360 drivers/usb/usbip/vhci_hcd.c:1019
#3: ffff88801a904890 (&vdev->priv_lock){+.+.}-{2:2}, at: spin_lock
include/linux/spinlock.h:354 [inline]
#3: ffff88801a904890 (&vdev->priv_lock){+.+.}-{2:2}, at:
vhci_shutdown_connection+0xeb/0x360 drivers/usb/usbip/vhci_hcd.c:1020
Modules linked in:
Preemption disabled at:
[<0000000000000000>] 0x0

Shuah Khan

unread,
Oct 19, 2020, 11:22:25 AM10/19/20
to Andrey Konovalov, Shuah Khan, Valentina Manea, Greg Kroah-Hartman, LKML, USB list, syzkaller-bugs, syzbot, Nazime Hande Harputluoglu, Shuah Khan
On 10/19/20 9:10 AM, Andrey Konovalov wrote:
> On Fri, Oct 16, 2020 at 9:48 PM Shuah Khan <sk...@linuxfoundation.org> wrote:
>>
>> Andrey,
>>
>> I am unable to reproduce the problem. I even tweaked the reproducer
>> to launch 10 procs and 100 threads.
>>
>> Can you test the following patch with your setup
>>
>> Here it is - also attached.
>
> Hi Shuah,
>
> I get the crash below with this change. It complains about
> kthread_stop_put() being called in atomic context.
>

Thanks for testing. I was concerned about that. I will try to
reproduce the problem again.

I am running into a amd_gpu issue during boot on my test system
with KASAN enabled which results in disabling lock debugging. This
makes it harder to debug and fix this null-ptr-deref Write in
event_handler as it needs KASAN enabled. Fun stuff.

I will switch to a different test system and see if I can reproduce
the problem first.

Thanks again for testing.

thanks,
-- Shuah
Reply all
Reply to author
Forward
0 new messages