[syzbot] INFO: task can't die in vmci_qp_broker_detach

syzbot

Mar 23, 2022, 1:52:29 PM
to ar...@arndb.de, gre...@linuxfoundation.org, jha...@vmware.com, linux-...@vger.kernel.org, pv-dr...@vmware.com, syzkall...@googlegroups.com, vd...@vmware.com
Hello,

syzbot found the following issue on:

HEAD commit: 6b1f86f8e9c7 Merge tag 'folio-5.18b' of git://git.infradea..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=141eed99700000
kernel config: https://syzkaller.appspot.com/x/.config?x=bc982714c733be2b
dashboard link: https://syzkaller.appspot.com/bug?extid=6e07eb10996f8ea7a825
compiler: Debian clang version 11.0.1-2, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10f0ca51700000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=152b7871700000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+6e07eb...@syzkaller.appspotmail.com

INFO: task syz-executor172:4407 blocked for more than 143 seconds.
Tainted: G W 5.17.0-syzkaller-02172-g6b1f86f8e9c7 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor172 state:D stack:25016 pid: 4407 ppid: 3638 flags:0x00004004
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5073 [inline]
__schedule+0x937/0x1090 kernel/sched/core.c:6382
schedule+0xeb/0x1b0 kernel/sched/core.c:6454
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6513
__mutex_lock_common+0xd1f/0x2590 kernel/locking/mutex.c:673
__mutex_lock kernel/locking/mutex.c:733 [inline]
mutex_lock_nested+0x1a/0x20 kernel/locking/mutex.c:785
vmci_qp_broker_detach+0x129/0x12b0 drivers/misc/vmw_vmci/vmci_queue_pair.c:2093
ctx_free_ctx drivers/misc/vmw_vmci/vmci_context.c:444 [inline]
kref_put include/linux/kref.h:65 [inline]
vmci_ctx_put+0x7e2/0xf00 drivers/misc/vmw_vmci/vmci_context.c:497
vmci_ctx_enqueue_datagram+0x3a7/0x440 drivers/misc/vmw_vmci/vmci_context.c:360
dg_dispatch_as_host drivers/misc/vmw_vmci/vmci_datagram.c:275 [inline]
vmci_datagram_dispatch+0x479/0xc40 drivers/misc/vmw_vmci/vmci_datagram.c:339
qp_notify_peer drivers/misc/vmw_vmci/vmci_queue_pair.c:1479 [inline]
vmci_qp_broker_detach+0xb35/0x12b0 drivers/misc/vmw_vmci/vmci_queue_pair.c:2186
ctx_free_ctx drivers/misc/vmw_vmci/vmci_context.c:444 [inline]
kref_put include/linux/kref.h:65 [inline]
vmci_ctx_put+0x7e2/0xf00 drivers/misc/vmw_vmci/vmci_context.c:497
vmci_host_close+0x96/0x160 drivers/misc/vmw_vmci/vmci_host.c:143
__fput+0x3fc/0x870 fs/file_table.c:317
task_work_run+0x146/0x1c0 kernel/task_work.c:164
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_user_mode_loop kernel/entry/common.c:190 [inline]
exit_to_user_mode_prepare+0x1dd/0x200 kernel/entry/common.c:222
__syscall_exit_to_user_mode_work kernel/entry/common.c:304 [inline]
syscall_exit_to_user_mode+0x2e/0x70 kernel/entry/common.c:315
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fa95cd5cc5b
RSP: 002b:00007fffd32b0640 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 00007fa95cd5cc5b
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000009 R08: 0000000000000000 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000293 R12: 00000000000c284f
R13: 00007fa95ce2540c R14: 00007fffd32b06a0 R15: 00007fa95ce25400
</TASK>
INFO: lockdep is turned off.
NMI backtrace for cpu 1
CPU: 1 PID: 27 Comm: khungtaskd Tainted: G W 5.17.0-syzkaller-02172-g6b1f86f8e9c7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1dc/0x2d8 lib/dump_stack.c:106
nmi_cpu_backtrace+0x45f/0x490 lib/nmi_backtrace.c:111
nmi_trigger_cpumask_backtrace+0x16a/0x280 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:212 [inline]
watchdog+0xc82/0xcd0 kernel/hung_task.c:369
kthread+0x2a3/0x2d0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30
</TASK>
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 55 Comm: kworker/u4:3 Tainted: G W 5.17.0-syzkaller-02172-g6b1f86f8e9c7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_unbound toggle_allocation_gate
RIP: 0010:rcu_read_lock_held_common kernel/rcu/update.c:104 [inline]
RIP: 0010:rcu_read_lock_sched_held+0x5a/0x130 kernel/rcu/update.c:123
Code: 8a b5 41 48 c7 44 24 08 36 db 5f 8c 48 c7 44 24 10 00 94 6f 81 48 89 e3 48 c1 eb 03 48 b8 f1 f1 f1 f1 00 f3 f3 f3 4a 89 04 33 <e8> f1 59 b3 08 85 c0 74 2a 45 31 ff e8 35 e8 00 00 84 c0 74 24 e8
RSP: 0018:ffffc90001a3f5e0 EFLAGS: 00000802
RAX: f3f3f300f1f1f1f1 RBX: 1ffff92000347ebc RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff8e2054e8
RBP: ffffc90001a3f668 R08: dffffc0000000000 R09: fffffbfff1c40a9e
R10: fffffbfff1c40a9e R11: 0000000000000000 R12: ffffffff8cde7c60
R13: dffffc0000000000 R14: dffffc0000000000 R15: ffff888012408000
FS: 0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005620a28f3680 CR3: 00000001406f6000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
trace_tlb_flush+0x7b/0x190 include/trace/events/tlb.h:38
switch_mm_irqs_off+0x5c7/0x910
use_temporary_mm arch/x86/kernel/alternative.c:924 [inline]
__text_poke+0x5bd/0x9f0 arch/x86/kernel/alternative.c:1021
text_poke arch/x86/kernel/alternative.c:1083 [inline]
text_poke_bp_batch+0x1b5/0x920 arch/x86/kernel/alternative.c:1297
text_poke_flush arch/x86/kernel/alternative.c:1470 [inline]
text_poke_finish+0x16/0x30 arch/x86/kernel/alternative.c:1477
arch_jump_label_transform_apply+0x13/0x20 arch/x86/kernel/jump_label.c:146
static_key_disable_cpuslocked+0xcc/0x1b0 kernel/jump_label.c:207
static_key_disable+0x16/0x20 kernel/jump_label.c:215
toggle_allocation_gate+0x3c8/0x460 mm/kfence/core.c:793
process_one_work+0x86c/0x1190 kernel/workqueue.c:2307
worker_thread+0xab1/0x1300 kernel/workqueue.c:2454
kthread+0x2a3/0x2d0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30
</TASK>
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 1.059 msecs
----------------
Code disassembly (best guess):
0: 8a b5 41 48 c7 44 mov 0x44c74841(%rbp),%dh
6: 24 08 and $0x8,%al
8: 36 db 5f 8c fistpl %ss:-0x74(%rdi)
c: 48 c7 44 24 10 00 94 movq $0xffffffff816f9400,0x10(%rsp)
13: 6f 81
15: 48 89 e3 mov %rsp,%rbx
18: 48 c1 eb 03 shr $0x3,%rbx
1c: 48 b8 f1 f1 f1 f1 00 movabs $0xf3f3f300f1f1f1f1,%rax
23: f3 f3 f3
26: 4a 89 04 33 mov %rax,(%rbx,%r14,1)
* 2a: e8 f1 59 b3 08 callq 0x8b35a20 <-- trapping instruction
2f: 85 c0 test %eax,%eax
31: 74 2a je 0x5d
33: 45 31 ff xor %r15d,%r15d
36: e8 35 e8 00 00 callq 0xe870
3b: 84 c0 test %al,%al
3d: 74 24 je 0x63
3f: e8 .byte 0xe8


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Hillf Danton

Mar 24, 2022, 9:34:41 AM
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Wed, 23 Mar 2022 10:52:28 -0700
See what happens if vmci_ctx_put() is made non-reentrant by freeing the
context from a workqueue.
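
As an illustration of why re-entrancy matters here: the hung-task trace shows
vmci_qp_broker_detach() twice on the same stack, with the inner call sleeping
in mutex_lock_nested(). One reading, not confirmed in this thread, is that the
inner call waits for a mutex that the outer call still holds. The userspace
sketch below models that shape with a toy lock that reports the second
acquisition instead of sleeping. All names in it (toy_mutex, toy_ctx, detach,
ctx_put, count_self_deadlocks) are hypothetical and are not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy non-recursive lock: it records ownership instead of sleeping. */
struct toy_mutex {
	bool held;
};

/* Returns false at the point where a real non-recursive mutex would block. */
static bool toy_lock(struct toy_mutex *m)
{
	if (m->held)
		return false;
	m->held = true;
	return true;
}

static void toy_unlock(struct toy_mutex *m)
{
	assert(m->held);
	m->held = false;
}

/* Hypothetical context: a reference count plus an optional peer. */
struct toy_ctx {
	int refs;
	struct toy_ctx *peer;
};

static struct toy_mutex broker_lock;
static int self_deadlocks;

static void ctx_put(struct toy_ctx *ctx);

/* Models the detach step: takes the lock, then notifies the peer. */
static void detach(struct toy_ctx *ctx)
{
	if (!toy_lock(&broker_lock)) {
		self_deadlocks++;	/* the same call chain already holds the lock */
		return;
	}
	if (ctx->peer) {
		struct toy_ctx *peer = ctx->peer;

		ctx->peer = NULL;
		ctx_put(peer);	/* the notify path drops a reference under the lock */
	}
	toy_unlock(&broker_lock);
}

/* Models a put whose release path runs synchronously in the caller. */
static void ctx_put(struct toy_ctx *ctx)
{
	if (--ctx->refs == 0)
		detach(ctx);
}

/* Drops the last reference of a context whose peer is also on its last one. */
int count_self_deadlocks(void)
{
	struct toy_ctx peer = { .refs = 1, .peer = NULL };
	struct toy_ctx ctx = { .refs = 1, .peer = &peer };

	broker_lock.held = false;
	self_deadlocks = 0;
	ctx_put(&ctx);
	return self_deadlocks;
}
```

In this model count_self_deadlocks() returns 1: the nested detach() reaches the
lock while the outer detach() still owns it.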

Hillf

#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ 6b1f86f8e9c7

diff -pur x/drivers/misc/vmw_vmci/vmci_context.c y/drivers/misc/vmw_vmci/vmci_context.c
--- x/drivers/misc/vmw_vmci/vmci_context.c 2022-03-24 20:48:40.613297600 +0800
+++ y/drivers/misc/vmw_vmci/vmci_context.c 2022-03-24 21:12:11.482451200 +0800
@@ -416,14 +416,9 @@ struct vmci_ctx *vmci_ctx_get(u32 cid)
 	return context;
 }
 
-/*
- * Deallocates all parts of a context data structure. This
- * function doesn't lock the context, because it assumes that
- * the caller was holding the last reference to context.
- */
-static void ctx_free_ctx(struct kref *kref)
+static void ctx_free_wfunc(struct work_struct *work)
 {
-	struct vmci_ctx *context = container_of(kref, struct vmci_ctx, kref);
+	struct vmci_ctx *context = container_of(work, struct vmci_ctx, free_work);
 	struct vmci_datagram_queue_entry *dq_entry, *dq_entry_tmp;
 	struct vmci_handle temp_handle;
 	struct vmci_handle_list *notifier, *tmp;
@@ -484,6 +479,19 @@ static void ctx_free_ctx(struct kref *kr
 }
 
 /*
+ * Deallocates all parts of a context data structure. This
+ * function doesn't lock the context, because it assumes that
+ * the caller was holding the last reference to context.
+ */
+static void ctx_free_ctx(struct kref *kref)
+{
+	struct vmci_ctx *context = container_of(kref, struct vmci_ctx, kref);
+
+	INIT_WORK(&context->free_work, ctx_free_wfunc);
+	schedule_work(&context->free_work);
+}
+
+/*
  * Drops reference to VMCI context. If this is the last reference to
  * the context it will be deallocated. A context is created with
  * a reference count of one, and on destroy, it is removed from
diff -pur x/drivers/misc/vmw_vmci/vmci_context.h y/drivers/misc/vmw_vmci/vmci_context.h
--- x/drivers/misc/vmw_vmci/vmci_context.h 2022-03-24 20:47:25.859163300 +0800
+++ y/drivers/misc/vmw_vmci/vmci_context.h 2022-03-24 20:57:52.644149600 +0800
@@ -13,6 +13,7 @@
 #include <linux/kref.h>
 #include <linux/types.h>
 #include <linux/wait.h>
+#include <linux/workqueue.h>
 
 #include "vmci_handle_array.h"
 #include "vmci_datagram.h"
@@ -80,6 +81,7 @@ struct vmci_ctx {
 	const struct cred *cred;
 	bool *notify;		/* Notify flag pointer - hosted only. */
 	struct page *notify_page;	/* Page backing the notify UVA. */
+	struct work_struct free_work;
 };
 
 /* VMCINotifyAddRemoveInfo: Used to add/remove remote context notifications. */
--
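
The idea in the patch above, moving the teardown out of the kref release
callback and into a work item, can be illustrated with a small userspace model.
It is a sketch under assumptions, not the driver's code: a singly linked
pending list stands in for the workqueue, a toy lock stands in for the mutex,
and every name in it (toy_mutex, toy_ctx, pending, release_work, run_pending,
run_deferred_demo, released_count) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy non-recursive lock: it records ownership instead of sleeping. */
struct toy_mutex {
	bool held;
};

static bool toy_lock(struct toy_mutex *m)
{
	if (m->held)
		return false;	/* a real mutex would block here */
	m->held = true;
	return true;
}

static void toy_unlock(struct toy_mutex *m)
{
	assert(m->held);
	m->held = false;
}

/* Hypothetical context with a link used while it waits for release. */
struct toy_ctx {
	int refs;
	struct toy_ctx *peer;
	struct toy_ctx *next_work;
};

static struct toy_mutex broker_lock;
static struct toy_ctx *pending;	/* stand-in for a workqueue */
static int self_deadlocks;
int released_count;

/* Dropping the last reference only queues the release; nothing runs here. */
static void ctx_put(struct toy_ctx *ctx)
{
	if (--ctx->refs == 0) {
		ctx->next_work = pending;
		pending = ctx;
	}
}

/* Plays the role of the work function: the teardown that takes the lock. */
static void release_work(struct toy_ctx *ctx)
{
	if (!toy_lock(&broker_lock)) {
		self_deadlocks++;
		return;
	}
	if (ctx->peer) {
		struct toy_ctx *peer = ctx->peer;

		ctx->peer = NULL;
		ctx_put(peer);	/* may queue more work, but never re-enters */
	}
	toy_unlock(&broker_lock);
	released_count++;
}

/* Drains the pending list, one release at a time, with the lock dropped. */
static void run_pending(void)
{
	while (pending) {
		struct toy_ctx *ctx = pending;

		pending = ctx->next_work;
		release_work(ctx);
	}
}

/* Same scenario as a synchronous put, but the releases are deferred. */
int run_deferred_demo(void)
{
	struct toy_ctx peer = { .refs = 1, .peer = NULL, .next_work = NULL };
	struct toy_ctx ctx = { .refs = 1, .peer = &peer, .next_work = NULL };

	broker_lock.held = false;
	pending = NULL;
	self_deadlocks = 0;
	released_count = 0;
	ctx_put(&ctx);
	run_pending();
	return self_deadlocks;
}
```

In this model run_deferred_demo() returns 0 and both contexts are released,
because each release starts only after the previous one has dropped the lock.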

syzbot

Mar 24, 2022, 12:06:10 PM
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+6e07eb...@syzkaller.appspotmail.com

Tested on:

commit: 6b1f86f8 Merge tag 'folio-5.18b' of git://git.infradea..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
kernel config: https://syzkaller.appspot.com/x/.config?x=bc982714c733be2b
dashboard link: https://syzkaller.appspot.com/bug?extid=6e07eb10996f8ea7a825
compiler: Debian clang version 11.0.1-2, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=15b451db700000

Note: testing is done by a robot and is best-effort only.