[syzbot] [bluetooth?] possible deadlock in __flush_workqueue

syzbot

Jan 17, 2024, 5:03:27 AM
to johan....@gmail.com, linux-b...@vger.kernel.org, linux-...@vger.kernel.org, luiz....@gmail.com, mar...@holtmann.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 943b9f0ab2cf Add linux-next specific files for 20240117
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=121de2fbe80000
kernel config: https://syzkaller.appspot.com/x/.config?x=12af1d067b6a6d19
dashboard link: https://syzkaller.appspot.com/bug?extid=da0a9c9721e36db712e8
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/9c032ce79e0f/disk-943b9f0a.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/93163e287878/vmlinux-943b9f0a.xz
kernel image: https://storage.googleapis.com/syzbot-assets/512cc2e14a4b/bzImage-943b9f0a.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+da0a9c...@syzkaller.appspotmail.com

Bluetooth: hci2: Opcode 0x0c03 failed: -110
============================================
WARNING: possible recursive locking detected
6.7.0-next-20240117-syzkaller #0 Not tainted
--------------------------------------------
kworker/u5:1/21244 is trying to acquire lock:
ffff88802e0a2538 ((wq_completion)hci2){+.+.}-{0:0}, at: __flush_workqueue+0x141/0x1340 kernel/workqueue.c:3147

but task is already holding lock:
ffff88802e0a2538 ((wq_completion)hci2){+.+.}-{0:0}, at: process_one_work+0x7ba/0x16e0 kernel/workqueue.c:2608

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock((wq_completion)hci2);
lock((wq_completion)hci2);

*** DEADLOCK ***

May be due to missing lock nesting notation

2 locks held by kworker/u5:1/21244:
#0: ffff88802e0a2538 ((wq_completion)hci2){+.+.}-{0:0}, at: process_one_work+0x7ba/0x16e0 kernel/workqueue.c:2608
#1: ffffc9000dc27d80 ((work_completion)(&hdev->error_reset)){+.+.}-{0:0}, at: process_one_work+0x824/0x16e0 kernel/workqueue.c:2609

stack backtrace:
CPU: 1 PID: 21244 Comm: kworker/u5:1 Not tainted 6.7.0-next-20240117-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Workqueue: hci2 hci_error_reset
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
check_deadlock kernel/locking/lockdep.c:3062 [inline]
validate_chain kernel/locking/lockdep.c:3856 [inline]
__lock_acquire+0x20e6/0x3b30 kernel/locking/lockdep.c:5137
lock_acquire kernel/locking/lockdep.c:5754 [inline]
lock_acquire+0x1b1/0x540 kernel/locking/lockdep.c:5719
__flush_workqueue+0x14b/0x1340 kernel/workqueue.c:3147
drain_workqueue+0x18f/0x3d0 kernel/workqueue.c:3312
destroy_workqueue+0xc3/0xb10 kernel/workqueue.c:4801
hci_release_dev+0x14e/0x620 net/bluetooth/hci_core.c:2808
bt_host_release+0x6a/0xb0 net/bluetooth/hci_sysfs.c:94
device_release+0xa1/0x240 drivers/base/core.c:2485
kobject_cleanup lib/kobject.c:682 [inline]
kobject_release lib/kobject.c:716 [inline]
kref_put include/linux/kref.h:65 [inline]
kobject_put+0x1d0/0x440 lib/kobject.c:733
put_device+0x1f/0x30 drivers/base/core.c:3733
process_one_work+0x8d5/0x16e0 kernel/workqueue.c:2633
process_scheduled_works kernel/workqueue.c:2707 [inline]
worker_thread+0x8b6/0x1290 kernel/workqueue.c:2788
kthread+0x2c1/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:242
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Tetsuo Handa

Jun 10, 2024, 7:00:42 AM
to Marcel Holtmann, Johan Hedberg, Luiz Augusto von Dentz, linux-b...@vger.kernel.org, syzbot+da0a9c...@syzkaller.appspotmail.com, syzkaller-bugs
syzbot is reporting that calling hci_release_dev() from hci_error_reset()
due to hci_dev_put() from hci_error_reset() can cause deadlock at
destroy_workqueue(), for hci_error_reset() is called from
hdev->req_workqueue which destroy_workqueue() needs to flush.

We need to make sure that hdev->{rx_work,cmd_work,tx_work} which are
queued into hdev->workqueue and hdev->{power_on,error_reset} which are
queued into hdev->req_workqueue are no longer running by the moment

destroy_workqueue(hdev->workqueue);
destroy_workqueue(hdev->req_workqueue);

are called from hci_release_dev().

Call cancel_work_sync() on these work items from hci_unregister_dev()
as soon as hdev->list is removed from hci_dev_list.

Reported-by: syzbot <syzbot+da0a9c...@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=da0a9c9721e36db712e8
Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
---
Completely untested. Please do tests with lockdep enabled before committing.
Maybe it is too early to cancel hdev->{rx_work,cmd_work,tx_work}.
Maybe there are more work items which should be canceled before
hci_unregister_dev() completes. I don't know...

net/bluetooth/hci_core.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index dd3b0f501018..dbbe5e2da210 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -2751,7 +2751,11 @@ void hci_unregister_dev(struct hci_dev *hdev)
 	list_del(&hdev->list);
 	write_unlock(&hci_dev_list_lock);

+	cancel_work_sync(&hdev->rx_work);
+	cancel_work_sync(&hdev->cmd_work);
+	cancel_work_sync(&hdev->tx_work);
 	cancel_work_sync(&hdev->power_on);
+	cancel_work_sync(&hdev->error_reset);

 	hci_cmd_sync_clear(hdev);

--
2.18.4


patchwork-b...@kernel.org

Jun 14, 2024, 11:20:47 AM
to Tetsuo Handa, mar...@holtmann.org, johan....@gmail.com, luiz....@gmail.com, linux-b...@vger.kernel.org, syzbot+da0a9c...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Hello:

This patch was applied to bluetooth/bluetooth-next.git (master)
by Luiz Augusto von Dentz <luiz.vo...@intel.com>:

On Mon, 10 Jun 2024 20:00:32 +0900 you wrote:
> syzbot is reporting that calling hci_release_dev() from hci_error_reset()
> due to hci_dev_put() from hci_error_reset() can cause deadlock at
> destroy_workqueue(), for hci_error_reset() is called from
> hdev->req_workqueue which destroy_workqueue() needs to flush.
>
> We need to make sure that hdev->{rx_work,cmd_work,tx_work} which are
> queued into hdev->workqueue and hdev->{power_on,error_reset} which are
> queued into hdev->req_workqueue are no longer running by the moment
>
> [...]

Here is the summary with links:
- Bluetooth: hci_core: cancel rx_work,cmd_work,tx_work,power_on,error_reset works upon hci_unregister_dev()
https://git.kernel.org/bluetooth/bluetooth-next/c/5b41aa213455

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

