[syzbot] [net?] [wireless?] possible deadlock in rfkill_send_events

7 views
Skip to first unread message

syzbot

unread,
Oct 9, 2023, 4:09:46 PM10/9/23
to da...@davemloft.net, edum...@google.com, johann...@intel.com, joha...@sipsolutions.net, ku...@kernel.org, linux-...@vger.kernel.org, linux-w...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: f291209eca5e Merge tag 'net-6.6-rc5' of git://git.kernel.o..
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=1139f1be680000
kernel config: https://syzkaller.appspot.com/x/.config?x=b89b61abf7449972
dashboard link: https://syzkaller.appspot.com/bug?extid=509238e523e032442b80
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1103ec86680000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=118b6bda680000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/7b2add3a29c7/disk-f291209e.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/7a15a9498899/vmlinux-f291209e.xz
kernel image: https://storage.googleapis.com/syzbot-assets/f39062509f32/bzImage-f291209e.xz

The issue was bisected to:

commit 2c3dfba4cf84ac4f306cc6653b37b6dd6859ae9d
Author: Johannes Berg <johann...@intel.com>
Date: Thu Sep 14 13:45:17 2023 +0000

rfkill: sync before userspace visibility/changes

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13967b3a680000
final oops: https://syzkaller.appspot.com/x/report.txt?x=10567b3a680000
console output: https://syzkaller.appspot.com/x/log.txt?x=17967b3a680000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+509238...@syzkaller.appspotmail.com
Fixes: 2c3dfba4cf84 ("rfkill: sync before userspace visibility/changes")

============================================
WARNING: possible recursive locking detected
6.6.0-rc4-syzkaller-00158-gf291209eca5e #0 Not tainted
--------------------------------------------
syz-executor675/5132 is trying to acquire lock:
ffff8880297ee088 (&data->mtx){+.+.}-{3:3}, at: rfkill_send_events+0x226/0x3f0 net/rfkill/core.c:286

but task is already holding lock:
ffff88801bfc0088 (&data->mtx){+.+.}-{3:3}, at: rfkill_fop_open+0x146/0x750 net/rfkill/core.c:1183

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&data->mtx);
lock(&data->mtx);

*** DEADLOCK ***

May be due to missing lock nesting notation

3 locks held by syz-executor675/5132:
#0: ffffffff8d6f39e8 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x59/0x4c0 drivers/char/misc.c:129
#1: ffffffff8ea7fa68 (rfkill_global_mutex){+.+.}-{3:3}, at: rfkill_fop_open+0x13c/0x750 net/rfkill/core.c:1182
#2: ffff88801bfc0088 (&data->mtx){+.+.}-{3:3}, at: rfkill_fop_open+0x146/0x750 net/rfkill/core.c:1183

stack backtrace:
CPU: 1 PID: 5132 Comm: syz-executor675 Not tainted 6.6.0-rc4-syzkaller-00158-gf291209eca5e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
check_deadlock kernel/locking/lockdep.c:3062 [inline]
validate_chain kernel/locking/lockdep.c:3855 [inline]
__lock_acquire+0x2971/0x5de0 kernel/locking/lockdep.c:5136
lock_acquire kernel/locking/lockdep.c:5753 [inline]
lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5718
__mutex_lock_common kernel/locking/mutex.c:603 [inline]
__mutex_lock+0x181/0x1340 kernel/locking/mutex.c:747
rfkill_send_events+0x226/0x3f0 net/rfkill/core.c:286
rfkill_event net/rfkill/core.c:301 [inline]
rfkill_event net/rfkill/core.c:293 [inline]
rfkill_set_block+0x3d0/0x550 net/rfkill/core.c:369
rfkill_sync net/rfkill/core.c:379 [inline]
rfkill_sync+0x10a/0x1c0 net/rfkill/core.c:372
rfkill_fop_open+0x1d6/0x750 net/rfkill/core.c:1193
misc_open+0x3da/0x4c0 drivers/char/misc.c:165
chrdev_open+0x277/0x700 fs/char_dev.c:414
do_dentry_open+0x88b/0x1730 fs/open.c:929
do_open fs/namei.c:3639 [inline]
path_openat+0x19af/0x29c0 fs/namei.c:3796
do_filp_open+0x1de/0x430 fs/namei.c:3823
do_sys_openat2+0x176/0x1e0 fs/open.c:1422
do_sys_open fs/open.c:1437 [inline]
__do_sys_openat fs/open.c:1453 [inline]
__se_sys_openat fs/open.c:1448 [inline]
__x64_sys_openat+0x175/0x210 fs/open.c:1448
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f8b31dbc989
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 1e 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f8b31f5ccc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 00007f8b31e3c4c8 RCX: 00007f8b31dbc989
RDX: 0000000000000801 RSI: 0000000020000040 RDI: ffffffffffffff9c
RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f8b31f5c7f0
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
R13: 00007f8b31f5cd00 R14: 00007f8b31f5cd40 R15: 0000000000000000
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Edward AD

unread,
Oct 9, 2023, 8:18:02 PM10/9/23
to syzbot+509238...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
please test deadlock in rfkill_send_events

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f291209eca5e

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index 08630896b6c8..a14e0d4a0b00 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -1180,7 +1180,6 @@ static int rfkill_fop_open(struct inode *inode, struct file *file)
init_waitqueue_head(&data->read_wait);

mutex_lock(&rfkill_global_mutex);
- mutex_lock(&data->mtx);
/*
* start getting events from elsewhere but hold mtx to get
* startup events added first
@@ -1191,9 +1190,12 @@ static int rfkill_fop_open(struct inode *inode, struct file *file)
if (!ev)
goto free;
rfkill_sync(rfkill);
+ mutex_lock(&data->mtx);
rfkill_fill_event(&ev->ev, rfkill, RFKILL_OP_ADD);
list_add_tail(&ev->list, &data->events);
+ mutex_unlock(&data->mtx);
}
+ mutex_lock(&data->mtx);
list_add(&data->list, &rfkill_fds);
mutex_unlock(&data->mtx);
mutex_unlock(&rfkill_global_mutex);

syzbot

unread,
Oct 9, 2023, 8:53:36 PM10/9/23
to syzkall...@googlegroups.com, twuu...@gmail.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+509238...@syzkaller.appspotmail.com

Tested on:

commit: f291209e Merge tag 'net-6.6-rc5' of git://git.kernel.o..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=1267ade9680000
kernel config: https://syzkaller.appspot.com/x/.config?x=b89b61abf7449972
dashboard link: https://syzkaller.appspot.com/bug?extid=509238e523e032442b80
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=1708d02e680000

Note: testing is done by a robot and is best-effort only.

Edward AD

unread,
Oct 9, 2023, 9:08:24 PM10/9/23
to syzbot+509238...@syzkaller.appspotmail.com, da...@davemloft.net, edum...@google.com, johann...@intel.com, joha...@sipsolutions.net, ku...@kernel.org, linux-...@vger.kernel.org, linux-w...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
syzbot report:
syz-executor675/5132 is trying to acquire lock:
ffff8880297ee088 (&data->mtx){+.+.}-{3:3}, at: rfkill_send_events+0x226/0x3f0 net/rfkill/core.c:286

but task is already holding lock:
ffff88801bfc0088 (&data->mtx){+.+.}-{3:3}, at: rfkill_fop_open+0x146/0x750 net/rfkill/core.c:1183

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&data->mtx);
lock(&data->mtx);

*** DEADLOCK ***

In 2c3dfba4cf84 insert rfkill_sync() to rfkill_fop_open(), it will call
rfkill_send_events() and then triger this issue.

Fixes: 2c3dfba4cf84 ("rfkill: sync before userspace visibility/changes")
Reported-and-tested-by: syzbot+509238...@syzkaller.appspotmail.com
Signed-off-by: Edward AD <twuu...@gmail.com>
---
net/rfkill/core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index 08630896b6c8..a14e0d4a0b00 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -1180,7 +1180,6 @@ static int rfkill_fop_open(struct inode *inode, struct file *file)
init_waitqueue_head(&data->read_wait);

mutex_lock(&rfkill_global_mutex);
- mutex_lock(&data->mtx);
/*
* start getting events from elsewhere but hold mtx to get
* startup events added first
@@ -1191,9 +1190,12 @@ static int rfkill_fop_open(struct inode *inode, struct file *file)
if (!ev)
goto free;
rfkill_sync(rfkill);
+ mutex_lock(&data->mtx);
rfkill_fill_event(&ev->ev, rfkill, RFKILL_OP_ADD);
list_add_tail(&ev->list, &data->events);
+ mutex_unlock(&data->mtx);
}
+ mutex_lock(&data->mtx);
list_add(&data->list, &rfkill_fds);
mutex_unlock(&data->mtx);
mutex_unlock(&rfkill_global_mutex);
--
2.25.1

Simon Horman

unread,
Oct 13, 2023, 7:06:48 AM10/13/23
to Edward AD, syzbot+509238...@syzkaller.appspotmail.com, da...@davemloft.net, edum...@google.com, johann...@intel.com, joha...@sipsolutions.net, ku...@kernel.org, linux-...@vger.kernel.org, linux-w...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
On Tue, Oct 10, 2023 at 09:08:15AM +0800, Edward AD wrote:
> syzbot report:
> syz-executor675/5132 is trying to acquire lock:
> ffff8880297ee088 (&data->mtx){+.+.}-{3:3}, at: rfkill_send_events+0x226/0x3f0 net/rfkill/core.c:286
>
> but task is already holding lock:
> ffff88801bfc0088 (&data->mtx){+.+.}-{3:3}, at: rfkill_fop_open+0x146/0x750 net/rfkill/core.c:1183
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&data->mtx);
> lock(&data->mtx);
>
> *** DEADLOCK ***
>
> In 2c3dfba4cf84 insert rfkill_sync() to rfkill_fop_open(), it will call
> rfkill_send_events() and then triger this issue.
>
> Fixes: 2c3dfba4cf84 ("rfkill: sync before userspace visibility/changes")
> Reported-and-tested-by: syzbot+509238...@syzkaller.appspotmail.com
> Signed-off-by: Edward AD <twuu...@gmail.com>

Hi Edward,

I am wondering if you considered moving the rfkill_sync() calls
to before &data->mtx is taken, to avoid the need to drop and
retake it?

Perhaps it doesn't work for some reason (compile tested only!).
But this does seem somehow cleaner for me.

Edward AD

unread,
Oct 13, 2023, 10:43:31 PM10/13/23
to ho...@kernel.org, da...@davemloft.net, edum...@google.com, johann...@intel.com, joha...@sipsolutions.net, ku...@kernel.org, linux-...@vger.kernel.org, linux-w...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzbot+509238...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, twuu...@gmail.com
Hi Simon Horman,
On Fri, 13 Oct 2023 13:06:38 +0200, Simon Horman wrote:
> I am wondering if you considered moving the rfkill_sync() calls
> to before &data->mtx is taken, to avoid the need to drop and
> retake it?
If you move rfkill_sync() before calling &data->mtx, more code will be added
because rfkill_sync() is in the loop body.
>
> Perhaps it doesn't work for some reason (compile tested only!).
> But this does seem somehow cleaner for me.
BR,
edward

Simon Horman

unread,
Oct 14, 2023, 3:29:50 AM10/14/23
to Edward AD, da...@davemloft.net, edum...@google.com, johann...@intel.com, joha...@sipsolutions.net, ku...@kernel.org, linux-...@vger.kernel.org, linux-w...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzbot+509238...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
On Sat, Oct 14, 2023 at 10:43:22AM +0800, Edward AD wrote:
> Hi Simon Horman,
> On Fri, 13 Oct 2023 13:06:38 +0200, Simon Horman wrote:
> > I am wondering if you considered moving the rfkill_sync() calls
> > to before &data->mtx is taken, to avoid the need to drop and
> > retake it?
> If you move rfkill_sync() before calling &data->mtx, more code will be added
> because rfkill_sync() is in the loop body.

Maybe that is true. And maybe that is a good argument for
not taking the approach that I suggested. But I do think it
is simpler from a locking perspective, and that has some merit.

Johannes Berg

unread,
Oct 14, 2023, 4:01:53 PM10/14/23
to Simon Horman, Edward AD, da...@davemloft.net, edum...@google.com, ku...@kernel.org, linux-...@vger.kernel.org, linux-w...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzbot+509238...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
On Sat, 2023-10-14 at 09:29 +0200, Simon Horman wrote:
> On Sat, Oct 14, 2023 at 10:43:22AM +0800, Edward AD wrote:
> > Hi Simon Horman,
> > On Fri, 13 Oct 2023 13:06:38 +0200, Simon Horman wrote:
> > > I am wondering if you considered moving the rfkill_sync() calls
> > > to before &data->mtx is taken, to avoid the need to drop and
> > > retake it?
> > If you move rfkill_sync() before calling &data->mtx, more code will be added
> > because rfkill_sync() is in the loop body.
>
> Maybe that is true. And maybe that is a good argument for
> not taking the approach that I suggested. But I do think it
> is simpler from a locking perspective, and that has some merit.
>

FWIW, I missed this patch and discussion until now, but I already fixed
the issue differently:

https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless.git/commit/?id=f2ac54ebf85615a6d78f5eb213a8bbeeb17ebe5d

There was never any need to hold the data->mtx for anything but the list
manipulation, and even that isn't _really_ needed since the 'data' is
completely fresh and not seen anywhere else yet.

(I'll also note that the subject of this thread is wrong since this was
never an *actual* deadlock, just a *possible* one reported by lockdep.)

johannes
Reply all
Reply to author
Forward
0 new messages