[syzbot] possible deadlock in vmci_qp_broker_detach


syzbot

Mar 25, 2021, 9:19:16 PM
to alex.d...@gmail.com, ar...@arndb.de, gre...@linuxfoundation.org, jha...@vmware.com, linux-...@vger.kernel.org, snov...@gmail.com, syzkall...@googlegroups.com, vd...@vmware.com
Hello,

syzbot found the following issue on:

HEAD commit: 5ee96fa9 Merge tag 'irq-urgent-2021-03-21' of git://git.ke..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15466edcd00000
kernel config: https://syzkaller.appspot.com/x/.config?x=6abda3336c698a07
dashboard link: https://syzkaller.appspot.com/bug?extid=44e40ac2cfe68e8ce207

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+44e40a...@syzkaller.appspotmail.com

============================================
WARNING: possible recursive locking detected
5.12.0-rc3-syzkaller #0 Not tainted
--------------------------------------------
syz-executor.2/24589 is trying to acquire lock:
ffffffff8ca63f38 (qp_broker_list.mutex){+.+.}-{3:3}, at: vmci_qp_broker_detach+0x147/0x11b0 drivers/misc/vmw_vmci/vmci_queue_pair.c:2093

but task is already holding lock:
ffffffff8ca63f38 (qp_broker_list.mutex){+.+.}-{3:3}, at: vmci_qp_broker_detach+0x147/0x11b0 drivers/misc/vmw_vmci/vmci_queue_pair.c:2093

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(qp_broker_list.mutex);
lock(qp_broker_list.mutex);

*** DEADLOCK ***

May be due to missing lock nesting notation

1 lock held by syz-executor.2/24589:
#0: ffffffff8ca63f38 (qp_broker_list.mutex){+.+.}-{3:3}, at: vmci_qp_broker_detach+0x147/0x11b0 drivers/misc/vmw_vmci/vmci_queue_pair.c:2093

stack backtrace:
CPU: 0 PID: 24589 Comm: syz-executor.2 Not tainted 5.12.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
print_deadlock_bug kernel/locking/lockdep.c:2829 [inline]
check_deadlock kernel/locking/lockdep.c:2872 [inline]
validate_chain kernel/locking/lockdep.c:3661 [inline]
__lock_acquire.cold+0x14c/0x3b4 kernel/locking/lockdep.c:4900
lock_acquire kernel/locking/lockdep.c:5510 [inline]
lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
__mutex_lock_common kernel/locking/mutex.c:949 [inline]
__mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1096
vmci_qp_broker_detach+0x147/0x11b0 drivers/misc/vmw_vmci/vmci_queue_pair.c:2093
ctx_free_ctx+0x4e5/0xd30 drivers/misc/vmw_vmci/vmci_context.c:444
kref_put include/linux/kref.h:65 [inline]
vmci_ctx_put drivers/misc/vmw_vmci/vmci_context.c:497 [inline]
vmci_ctx_enqueue_datagram+0x4dc/0x620 drivers/misc/vmw_vmci/vmci_context.c:360
dg_dispatch_as_host drivers/misc/vmw_vmci/vmci_datagram.c:275 [inline]
vmci_datagram_dispatch+0x39b/0xb50 drivers/misc/vmw_vmci/vmci_datagram.c:339
qp_notify_peer+0x182/0x260 drivers/misc/vmw_vmci/vmci_queue_pair.c:1479
vmci_qp_broker_detach+0xa09/0x11b0 drivers/misc/vmw_vmci/vmci_queue_pair.c:2186
ctx_free_ctx+0x4e5/0xd30 drivers/misc/vmw_vmci/vmci_context.c:444
kref_put include/linux/kref.h:65 [inline]
vmci_ctx_put drivers/misc/vmw_vmci/vmci_context.c:497 [inline]
vmci_ctx_destroy+0x169/0x1d0 drivers/misc/vmw_vmci/vmci_context.c:195
vmci_host_close+0x116/0x1a0 drivers/misc/vmw_vmci/vmci_host.c:143
__fput+0x288/0x920 fs/file_table.c:280
task_work_run+0xdd/0x1a0 kernel/task_work.c:140
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:208
__syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x41926b
Code: 0f 05 48 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8 63 fc ff ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 a1 fc ff ff 8b 44
RSP: 002b:0000000000a9fb80 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 000000000041926b
RDX: 0000001b32f25054 RSI: 0000001b32f24cec RDI: 0000000000000004
RBP: 0000000000000001 R08: 0000000000000000 R09: 000000000000165e
R10: 000000008afb7662 R11: 0000000000000293 R12: 000000000056c9e0
R13: 000000000056c9e0 R14: 000000000056bf60 R15: 0000000000212216


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Hillf Danton

Mar 26, 2021, 4:07:14 AM
to syzbot, gre...@linuxfoundation.org, jha...@vmware.com, linux-...@vger.kernel.org, snov...@gmail.com, syzkall...@googlegroups.com, Hillf Danton, vd...@vmware.com
Thu, 25 Mar 2021 18:19:15
If a reproducer turns up, try asking a kworker to free the contexts. The
kworker then handles them one after another, which cuts off the chance of the
nested free that this report illustrates.
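
For reference, the nested free in question, condensed from the lockdep report
above:

	vmci_host_close()
	  vmci_ctx_destroy()
	    vmci_ctx_put()  ->  ctx_free_ctx()        /* last reference dropped */
	      vmci_qp_broker_detach()                 /* takes qp_broker_list.mutex */
	        qp_notify_peer()
	          vmci_datagram_dispatch()
	            dg_dispatch_as_host()
	              vmci_ctx_enqueue_datagram()
	                vmci_ctx_put()  ->  ctx_free_ctx()   /* final put again */
	                  vmci_qp_broker_detach()     /* qp_broker_list.mutex
	                                                 acquired recursively */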

--- x/drivers/misc/vmw_vmci/vmci_context.h
+++ y/drivers/misc/vmw_vmci/vmci_context.h
@@ -44,6 +44,7 @@ struct vmci_ctx {
 	struct list_head datagram_queue;	/* Head of per VM queue. */
 	u32 pending_datagrams;
 	size_t datagram_queue_size;	/* Size of datagram queue in bytes. */
+	struct list_head exit_item;
 
 	/*
 	 * Version of the code that created
--- x/drivers/misc/vmw_vmci/vmci_context.c
+++ y/drivers/misc/vmw_vmci/vmci_context.c
@@ -114,6 +114,7 @@ struct vmci_ctx *vmci_ctx_create(u32 cid
 	kref_init(&context->kref);
 	spin_lock_init(&context->lock);
 	INIT_LIST_HEAD(&context->list_item);
+	INIT_LIST_HEAD(&context->exit_item);
 	INIT_LIST_HEAD(&context->datagram_queue);
 	INIT_LIST_HEAD(&context->notifier_list);
 
@@ -421,9 +422,8 @@ struct vmci_ctx *vmci_ctx_get(u32 cid)
  * function doesn't lock the context, because it assumes that
  * the caller was holding the last reference to context.
  */
-static void ctx_free_ctx(struct kref *kref)
+static void __ctx_free_ctx(struct vmci_ctx *context)
 {
-	struct vmci_ctx *context = container_of(kref, struct vmci_ctx, kref);
 	struct vmci_datagram_queue_entry *dq_entry, *dq_entry_tmp;
 	struct vmci_handle temp_handle;
 	struct vmci_handle_list *notifier, *tmp;
@@ -483,6 +483,49 @@ static void ctx_free_ctx(struct kref *kr
 	kfree(context);
 }
 
+static LIST_HEAD(vmci_exit_list);
+static DEFINE_SPINLOCK(vmci_exit_lock);
+static bool vmci_exit_worker_busy;
+
+static void vmci_exit_work_fn(struct work_struct *w)
+{
+	struct vmci_ctx *ctx;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vmci_exit_lock, flags);
+	vmci_exit_worker_busy = true;
+
+	while (!list_empty(&vmci_exit_list)) {
+		ctx = list_last_entry(&vmci_exit_list, struct vmci_ctx,
+				      exit_item);
+		list_del(&ctx->exit_item);
+		spin_unlock_irqrestore(&vmci_exit_lock, flags);
+
+		__ctx_free_ctx(ctx);
+		cond_resched();
+
+		spin_lock_irqsave(&vmci_exit_lock, flags);
+	}
+	vmci_exit_worker_busy = false;
+	spin_unlock_irqrestore(&vmci_exit_lock, flags);
+}
+static DECLARE_WORK(vmci_exit_work, vmci_exit_work_fn);
+
+static void ctx_free_ctx(struct kref *kref)
+{
+	struct vmci_ctx *ctx = container_of(kref, struct vmci_ctx, kref);
+	unsigned long flags;
+	bool busy;
+
+	spin_lock_irqsave(&vmci_exit_lock, flags);
+	busy = vmci_exit_worker_busy;
+	list_add(&ctx->exit_item, &vmci_exit_list);
+	spin_unlock_irqrestore(&vmci_exit_lock, flags);
+
+	if (!busy)
+		queue_work(system_unbound_wq, &vmci_exit_work);
+}
+
 /*
  * Drops reference to VMCI context. If this is the last reference to
  * the context it will be deallocated. A context is created with

syzbot

Apr 12, 2021, 1:29:15 PM
to alex.d...@gmail.com, ar...@arndb.de, gre...@linuxfoundation.org, hda...@sina.com, jha...@vmware.com, linux-...@vger.kernel.org, snov...@gmail.com, syzkall...@googlegroups.com, vd...@vmware.com
syzbot has found a reproducer for the following issue on:

HEAD commit: d434405a Linux 5.12-rc7
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1661482ed00000
kernel config: https://syzkaller.appspot.com/x/.config?x=9c3d8981d2bdb103
dashboard link: https://syzkaller.appspot.com/bug?extid=44e40ac2cfe68e8ce207
compiler: Debian clang version 11.0.1-2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=102336a6d00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+44e40a...@syzkaller.appspotmail.com

============================================
WARNING: possible recursive locking detected
5.12.0-rc7-syzkaller #0 Not tainted
--------------------------------------------
syz-executor.0/10571 is trying to acquire lock:
ffffffff8ce6c1f8 (qp_broker_list.mutex){+.+.}-{3:3}, at: vmci_qp_broker_detach+0xd3/0x10c0 drivers/misc/vmw_vmci/vmci_queue_pair.c:2093

but task is already holding lock:
ffffffff8ce6c1f8 (qp_broker_list.mutex){+.+.}-{3:3}, at: vmci_qp_broker_detach+0xd3/0x10c0 drivers/misc/vmw_vmci/vmci_queue_pair.c:2093

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(qp_broker_list.mutex);
lock(qp_broker_list.mutex);

*** DEADLOCK ***

May be due to missing lock nesting notation

1 lock held by syz-executor.0/10571:
#0: ffffffff8ce6c1f8 (qp_broker_list.mutex){+.+.}-{3:3}, at: vmci_qp_broker_detach+0xd3/0x10c0 drivers/misc/vmw_vmci/vmci_queue_pair.c:2093

stack backtrace:
CPU: 1 PID: 10571 Comm: syz-executor.0 Not tainted 5.12.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x176/0x24e lib/dump_stack.c:120
__lock_acquire+0x2303/0x5e60 kernel/locking/lockdep.c:4739
lock_acquire+0x126/0x650 kernel/locking/lockdep.c:5511
__mutex_lock_common+0x167/0x2eb0 kernel/locking/mutex.c:949
__mutex_lock kernel/locking/mutex.c:1096 [inline]
mutex_lock_nested+0x1a/0x20 kernel/locking/mutex.c:1111
vmci_qp_broker_detach+0xd3/0x10c0 drivers/misc/vmw_vmci/vmci_queue_pair.c:2093
ctx_free_ctx drivers/misc/vmw_vmci/vmci_context.c:444 [inline]
kref_put include/linux/kref.h:65 [inline]
vmci_ctx_put+0x722/0xe00 drivers/misc/vmw_vmci/vmci_context.c:497
vmci_ctx_enqueue_datagram+0x3a7/0x440 drivers/misc/vmw_vmci/vmci_context.c:360
dg_dispatch_as_host drivers/misc/vmw_vmci/vmci_datagram.c:275 [inline]
vmci_datagram_dispatch+0x3ec/0xb40 drivers/misc/vmw_vmci/vmci_datagram.c:339
qp_notify_peer drivers/misc/vmw_vmci/vmci_queue_pair.c:1479 [inline]
vmci_qp_broker_detach+0x9fa/0x10c0 drivers/misc/vmw_vmci/vmci_queue_pair.c:2186
ctx_free_ctx drivers/misc/vmw_vmci/vmci_context.c:444 [inline]
kref_put include/linux/kref.h:65 [inline]
vmci_ctx_put+0x722/0xe00 drivers/misc/vmw_vmci/vmci_context.c:497
vmci_host_close+0x96/0x160 drivers/misc/vmw_vmci/vmci_host.c:143
__fput+0x352/0x7b0 fs/file_table.c:280
task_work_run+0x146/0x1c0 kernel/task_work.c:140
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
exit_to_user_mode_prepare+0x10b/0x1e0 kernel/entry/common.c:208
__syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
syscall_exit_to_user_mode+0x26/0x70 kernel/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x41926b
Code: 0f 05 48 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8 63 fc ff ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 a1 fc ff ff 8b 44
RSP: 002b:00007ffee76536f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 000000000041926b
RDX: 0000000000570698 RSI: 0000000000000001 RDI: 0000000000000003
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000001b30e200a8
R10: 00007ffee76537e0 R11: 0000000000000293 R12: 00000000000688ea
R13: 00000000000003e8 R14: 000000000056bf60 R15: 00000000000688cf

syzbot

Jun 30, 2021, 1:21:27 PM
to alex.d...@gmail.com, ar...@arndb.de, gre...@linuxfoundation.org, hda...@sina.com, jha...@vmware.com, linux-...@vger.kernel.org, snov...@gmail.com, syzkall...@googlegroups.com, vd...@vmware.com
syzbot has found a reproducer for the following issue on:

HEAD commit: a1f92694 Add linux-next specific files for 20210518
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=14cf5118300000
kernel config: https://syzkaller.appspot.com/x/.config?x=d612e75ffd53a6d3
dashboard link: https://syzkaller.appspot.com/bug?extid=44e40ac2cfe68e8ce207
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15dae18c300000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=14c680e2300000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+44e40a...@syzkaller.appspotmail.com

============================================
WARNING: possible recursive locking detected
5.13.0-rc2-next-20210518-syzkaller #0 Not tainted
--------------------------------------------
syz-executor723/9333 is trying to acquire lock:
ffffffff8cc8b5f8 (qp_broker_list.mutex){+.+.}-{3:3}, at: vmci_qp_broker_detach+0x147/0x11b0 drivers/misc/vmw_vmci/vmci_queue_pair.c:2093

but task is already holding lock:
ffffffff8cc8b5f8 (qp_broker_list.mutex){+.+.}-{3:3}, at: vmci_qp_broker_detach+0x147/0x11b0 drivers/misc/vmw_vmci/vmci_queue_pair.c:2093

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(qp_broker_list.mutex);
lock(qp_broker_list.mutex);

*** DEADLOCK ***

May be due to missing lock nesting notation

1 lock held by syz-executor723/9333:
#0: ffffffff8cc8b5f8 (qp_broker_list.mutex){+.+.}-{3:3}, at: vmci_qp_broker_detach+0x147/0x11b0 drivers/misc/vmw_vmci/vmci_queue_pair.c:2093

stack backtrace:
CPU: 0 PID: 9333 Comm: syz-executor723 Not tainted 5.13.0-rc2-next-20210518-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x13e/0x1d6 lib/dump_stack.c:129
print_deadlock_bug kernel/locking/lockdep.c:2831 [inline]
check_deadlock kernel/locking/lockdep.c:2874 [inline]
validate_chain kernel/locking/lockdep.c:3663 [inline]
__lock_acquire.cold+0x22f/0x3b4 kernel/locking/lockdep.c:4902
lock_acquire kernel/locking/lockdep.c:5512 [inline]
lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5477
__mutex_lock_common kernel/locking/mutex.c:949 [inline]
__mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1096
vmci_qp_broker_detach+0x147/0x11b0 drivers/misc/vmw_vmci/vmci_queue_pair.c:2093
ctx_free_ctx+0x4cc/0xd30 drivers/misc/vmw_vmci/vmci_context.c:444
kref_put include/linux/kref.h:65 [inline]
vmci_ctx_put drivers/misc/vmw_vmci/vmci_context.c:497 [inline]
vmci_ctx_enqueue_datagram+0x4dc/0x620 drivers/misc/vmw_vmci/vmci_context.c:360
dg_dispatch_as_host drivers/misc/vmw_vmci/vmci_datagram.c:275 [inline]
vmci_datagram_dispatch+0x39b/0xb50 drivers/misc/vmw_vmci/vmci_datagram.c:339
qp_notify_peer+0x182/0x260 drivers/misc/vmw_vmci/vmci_queue_pair.c:1479
vmci_qp_broker_detach+0xa09/0x11b0 drivers/misc/vmw_vmci/vmci_queue_pair.c:2186
ctx_free_ctx+0x4cc/0xd30 drivers/misc/vmw_vmci/vmci_context.c:444
kref_put include/linux/kref.h:65 [inline]
vmci_ctx_put drivers/misc/vmw_vmci/vmci_context.c:497 [inline]
vmci_ctx_destroy+0x169/0x1d0 drivers/misc/vmw_vmci/vmci_context.c:195
vmci_host_close+0x116/0x1a0 drivers/misc/vmw_vmci/vmci_host.c:143
__fput+0x288/0x920 fs/file_table.c:280
task_work_run+0xdd/0x1a0 kernel/task_work.c:164
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
exit_to_user_mode_prepare+0x26f/0x280 kernel/entry/common.c:208
__syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:301
do_syscall_64+0x3e/0xb0 arch/x86/entry/common.c:57
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x445ac9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fe38cec92f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000124
RAX: 0000000000000005 RBX: 00000000004ca420 RCX: 0000000000445ac9
RDX: 0000000000000000 RSI: 0000000000000005 RDI: 0000000000000004
RBP: 00000000004ca42c R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000049a074
R13: 65732f636f72702f R14: 636d762f7665642f R15: 00000000004ca428

Pavel Skripkin

Jun 30, 2021, 5:36:21 PM
to syzbot, alex.d...@gmail.com, ar...@arndb.de, gre...@linuxfoundation.org, hda...@sina.com, jha...@vmware.com, linux-...@vger.kernel.org, snov...@gmail.com, syzkall...@googlegroups.com, vd...@vmware.com
Very ugly patch just to test the idea:

The vmci_ctx_put() in vmci_ctx_enqueue_datagram() should never be the last
vmci_ctx_put() in the context's lifetime, so we need to block vmci_ctx_destroy()
until vmci_ctx_enqueue_datagram() is done.
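
Roughly, the idea is something like the sketch below (just to illustrate; this
is not the attached patch, and enqueues_in_flight / destroy_wq are invented
names):

/*
 * Sketch only.  It assumes an in-flight enqueue has already marked itself
 * before vmci_ctx_destroy() checks; closing that window is the subtle part
 * a real patch has to get right.
 */

/* assumed new members in struct vmci_ctx, protected by context->lock */
	unsigned int enqueues_in_flight;
	wait_queue_head_t destroy_wq;

/* in vmci_ctx_enqueue_datagram(): mark the enqueue while our reference is held */
	spin_lock(&context->lock);
	context->enqueues_in_flight++;
	/* ... existing code appends the datagram to context->datagram_queue ... */
	spin_unlock(&context->lock);

	vmci_ctx_put(context);		/* not the final put: vmci_ctx_destroy()
					   still holds its reference until we
					   signal it below */

	spin_lock(&context->lock);
	context->enqueues_in_flight--;
	spin_unlock(&context->lock);
	wake_up(&context->destroy_wq);

/* in vmci_ctx_destroy(): wait before dropping the creation reference */
	wait_event(context->destroy_wq,
		   !READ_ONCE(context->enqueues_in_flight));
	vmci_ctx_put(context);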

#syz test
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master



With regards,
Pavel Skripkin
0001-misc-vmv_vmci-fix-deadlock.patch

syzbot

Jun 30, 2021, 5:56:07 PM
to alex.d...@gmail.com, ar...@arndb.de, gre...@linuxfoundation.org, hda...@sina.com, jha...@vmware.com, linux-...@vger.kernel.org, paskr...@gmail.com, snov...@gmail.com, syzkall...@googlegroups.com, vd...@vmware.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: task hung in vmci_ctx_destroy

INFO: task syz-executor.4:4967 blocked for more than 143 seconds.
Tainted: G W 5.13.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.4 state:D stack:29136 pid: 4967 ppid: 8823 flags:0x00004004
Call Trace:
context_switch kernel/sched/core.c:4683 [inline]
__schedule+0xb39/0x5980 kernel/sched/core.c:5940
schedule+0xd3/0x270 kernel/sched/core.c:6019
vmci_ctx_destroy+0x2db/0x3b0 drivers/misc/vmw_vmci/vmci_context.c:197
vmci_host_close+0xef/0x170 drivers/misc/vmw_vmci/vmci_host.c:144
__fput+0x288/0x920 fs/file_table.c:280
task_work_run+0xdd/0x1a0 kernel/task_work.c:164
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:209
__syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:302
do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665d9
RSP: 002b:00007f4a80452188 EFLAGS: 00000246 ORIG_RAX: 0000000000000124
RAX: 0000000000000005 RBX: 000000000056bf80 RCX: 00000000004665d9
RDX: 0000000000000000 RSI: 0000000000000005 RDI: 0000000000000004
RBP: 00000000004bfcb9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf80
R13: 00007ffc7a5d942f R14: 00007f4a80452300 R15: 0000000000022000
INFO: lockdep is turned off.
NMI backtrace for cpu 0
CPU: 0 PID: 1650 Comm: khungtaskd Tainted: G W 5.13.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:96
nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:209 [inline]
watchdog+0xd4b/0xfb0 kernel/hung_task.c:294
kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 25 Comm: kworker/u4:1 Tainted: G W 5.13.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: phy16 ieee80211_iface_work
RIP: 0010:__sanitizer_cov_trace_switch+0xb7/0xf0 kernel/kcov.c:321
Code: 00 48 39 f7 72 1b 48 83 c2 01 48 89 5c 30 e0 48 89 6c 30 e8 4c 89 5c 30 f0 4e 89 4c e8 20 48 89 10 48 83 c1 01 49 39 ca 75 95 <5b> 5d 41 5c 41 5d c3 48 83 f8 40 bb 07 00 00 00 0f 84 6c ff ff ff
RSP: 0018:ffffc90000dff340 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000020
RDX: 0000000000000000 RSI: ffff888010a2d580 RDI: 0000000000000003
RBP: 00000000000000f4 R08: ffffffff8a896c00 R09: ffffffff887bdca3
R10: 0000000000000020 R11: 00000000000000dd R12: ffff888010a2d580
R13: dffffc0000000000 R14: ffff88802abe2094 R15: ffff88802abe2093
FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1f91962000 CR3: 0000000035ae5000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
_ieee802_11_parse_elems_crc+0x1e3/0x1f90 net/mac80211/util.c:1018
ieee802_11_parse_elems_crc+0x89e/0xfe0 net/mac80211/util.c:1478
ieee802_11_parse_elems net/mac80211/ieee80211_i.h:2030 [inline]
ieee80211_rx_mgmt_probe_beacon+0x17f/0x17b0 net/mac80211/ibss.c:1612
ieee80211_ibss_rx_queued_mgmt+0xd82/0x15f0 net/mac80211/ibss.c:1642
ieee80211_iface_work+0x761/0x9e0 net/mac80211/iface.c:1439
process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295


Tested on:

commit: 44046219 Merge tag 'for-5.14/drivers-2021-06-29' of git://..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=17d99918300000
kernel config: https://syzkaller.appspot.com/x/.config?x=efdd5c8b8b556274
patch: https://syzkaller.appspot.com/x/patch.diff?x=149e7918300000

Pavel Skripkin

Jun 30, 2021, 6:00:36 PM
to syzbot, alex.d...@gmail.com, ar...@arndb.de, gre...@linuxfoundation.org, hda...@sina.com, jha...@vmware.com, linux-...@vger.kernel.org, snov...@gmail.com, syzkall...@googlegroups.com, vd...@vmware.com
On Wed, 30 Jun 2021 14:56:06 -0700
syzbot <syzbot+44e40a...@syzkaller.appspotmail.com> wrote:

> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still
> triggering an issue: INFO: task hung in vmci_ctx_destroy
>
> INFO: task syz-executor.4:4967 blocked for more than 143 seconds.
> Tainted: G W 5.13.0-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message. task:syz-executor.4 state:D stack:29136 pid: 4967 ppid:
> 8823 flags:0x00004004 Call Trace:

Hmm, I forgot to change the old vmci_ctx_put() in
vmci_ctx_enqueue_datagram()...
0001-misc-vmv_vmci-fix-deadlock.patch

syzbot

Jun 30, 2021, 6:20:11 PM
to alex.d...@gmail.com, ar...@arndb.de, gre...@linuxfoundation.org, hda...@sina.com, jha...@vmware.com, linux-...@vger.kernel.org, paskr...@gmail.com, snov...@gmail.com, syzkall...@googlegroups.com, vd...@vmware.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: task hung in vmci_ctx_destroy

INFO: task syz-executor.1:10566 blocked for more than 143 seconds.
Not tainted 5.13.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.1 state:D stack:28336 pid:10566 ppid: 8853 flags:0x00004004
Call Trace:
context_switch kernel/sched/core.c:4683 [inline]
__schedule+0xb39/0x5980 kernel/sched/core.c:5940
schedule+0xd3/0x270 kernel/sched/core.c:6019
vmci_ctx_destroy+0x2db/0x3b0 drivers/misc/vmw_vmci/vmci_context.c:197
vmci_host_close+0xef/0x170 drivers/misc/vmw_vmci/vmci_host.c:144
__fput+0x288/0x920 fs/file_table.c:280
task_work_run+0xdd/0x1a0 kernel/task_work.c:164
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:209
__syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:302
do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665d9
RSP: 002b:00007f09abe44188 EFLAGS: 00000246 ORIG_RAX: 0000000000000124
RAX: 0000000000000005 RBX: 000000000056bf80 RCX: 00000000004665d9
RDX: 0000000000000000 RSI: 0000000000000005 RDI: 0000000000000004
RBP: 00000000004bfcb9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf80
R13: 00007ffcea410a5f R14: 00007f09abe44300 R15: 0000000000022000

Showing all locks held in the system:
1 lock held by khungtaskd/1641:
#0: ffffffff8b77d7c0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
1 lock held by in:imklog/8286:

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 1641 Comm: khungtaskd Not tainted 5.13.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:96
nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:209 [inline]
watchdog+0xd4b/0xfb0 kernel/hung_task.c:294
kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.13.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_dev_trap_report_work
RIP: 0010:write_comp_data kernel/kcov.c:218 [inline]
RIP: 0010:__sanitizer_cov_trace_const_cmp4+0x17/0x70 kernel/kcov.c:284
Code: 30 f0 4c 89 54 d8 20 48 89 10 5b c3 0f 1f 80 00 00 00 00 41 89 f8 bf 03 00 00 00 4c 8b 14 24 89 f1 65 48 8b 34 25 00 f0 01 00 <e8> 54 f0 ff ff 84 c0 74 4b 48 8b 86 40 15 00 00 8b b6 3c 15 00 00
RSP: 0018:ffffc90000ca7b28 EFLAGS: 00000046
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
RDX: ffff88813fec8000 RSI: ffff88813fec8000 RDI: 0000000000000003
RBP: ffff8880109b1e48 R08: 0000000000000000 R09: ffff8880109b1e4b
R10: ffffffff817c6c69 R11: 0000000077db5095 R12: 0000000000000002
R13: ffffc90000ca7bb0 R14: 1ffff92000194f72 R15: 0000000000000022
FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdc1cf12000 CR3: 0000000036471000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
trace_hardirqs_on+0x19/0x1c0 kernel/trace/trace_preemptirq.c:42
__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline]
_raw_spin_unlock_irqrestore+0x50/0x70 kernel/locking/spinlock.c:191
crng_backtrack_protect drivers/char/random.c:1053 [inline]
_get_random_bytes+0x295/0x670 drivers/char/random.c:1540
nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:540 [inline]
nsim_dev_trap_report drivers/net/netdevsim/dev.c:570 [inline]
nsim_dev_trap_report_work+0x740/0xbd0 drivers/net/netdevsim/dev.c:611
process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295


Tested on:

commit: 44046219 Merge tag 'for-5.14/drivers-2021-06-29' of git://..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=17e7bbdc300000
patch: https://syzkaller.appspot.com/x/patch.diff?x=134eb1e8300000

Hillf Danton

Jul 1, 2021, 2:44:16 AM
to syzbot, alex.d...@gmail.com, ar...@arndb.de, gre...@linuxfoundation.org, hda...@sina.com, jha...@vmware.com, linux-...@vger.kernel.org, snov...@gmail.com, syzkall...@googlegroups.com, Pavel Skripkin, vd...@vmware.com
On Wed, 30 Jun 2021 10:21:26 -0700
One of the quick fixes is to free the vmci_ctx from a workqueue, in a bid to
cut the chance of re-entering vmci_qp_broker_detach().

+++ x/drivers/misc/vmw_vmci/vmci_context.c
@@ -416,14 +416,9 @@ struct vmci_ctx *vmci_ctx_get(u32 cid)
 	return context;
 }
 
-/*
- * Deallocates all parts of a context data structure. This
- * function doesn't lock the context, because it assumes that
- * the caller was holding the last reference to context.
- */
-static void ctx_free_ctx(struct kref *kref)
+static void ctx_free_ctx_workfn(struct work_struct *w)
 {
-	struct vmci_ctx *context = container_of(kref, struct vmci_ctx, kref);
+	struct vmci_ctx *context = container_of(w, struct vmci_ctx, free_work);
 	struct vmci_datagram_queue_entry *dq_entry, *dq_entry_tmp;
 	struct vmci_handle temp_handle;
 	struct vmci_handle_list *notifier, *tmp;
@@ -484,6 +479,18 @@ static void ctx_free_ctx(struct kref *kr
 }
 
 /*
+ * Deallocates all parts of a context data structure. This
+ * function doesn't lock the context, because it assumes that
+ * the caller was holding the last reference to context.
+ */
+static void ctx_free_ctx(struct kref *k)
+{
+	struct vmci_ctx *ctx = container_of(k, struct vmci_ctx, kref);
+
+	queue_work(system_unbound_wq, &ctx->free_work);
+}
+
+/*
  * Drops reference to VMCI context. If this is the last reference to
  * the context it will be deallocated. A context is created with
  * a reference count of one, and on destroy, it is removed from
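
Note that the hunks above also rely on struct vmci_ctx gaining the work item,
initialised at context creation; roughly something like this (sketched here,
not an actual posted hunk):

/* drivers/misc/vmw_vmci/vmci_context.h -- assumed companion change */
struct vmci_ctx {
	/* ... existing members ... */
	struct work_struct free_work;	/* queued by ctx_free_ctx() above */
};

/* drivers/misc/vmw_vmci/vmci_context.c, in vmci_ctx_create(), together with a
 * forward declaration of ctx_free_ctx_workfn() since vmci_ctx_create() sits
 * earlier in the file -- assumed companion change */
	INIT_WORK(&context->free_work, ctx_free_ctx_workfn);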