[syzbot] BUG: corrupted list in p9_fd_cancel (2)

syzbot

Oct 23, 2022, 6:41:35 AM
to asma...@codewreck.org, da...@davemloft.net, edum...@google.com, eri...@gmail.com, ku...@kernel.org, linux-...@vger.kernel.org, linu...@crudebyte.com, lu...@ionkov.net, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, v9fs-de...@lists.sourceforge.net
Hello,

syzbot found the following issue on:

HEAD commit: d47136c28015 Merge tag 'hwmon-for-v6.1-rc2' of git://git.k..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12f36de2880000
kernel config: https://syzkaller.appspot.com/x/.config?x=4789759e8a6d5f57
dashboard link: https://syzkaller.appspot.com/bug?extid=9b69b8d10ab4a7d88056
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1076cb7c880000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=102eabd2880000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/5664e231e97f/disk-d47136c2.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/9bbe0daa4a04/vmlinux-d47136c2.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+9b69b8...@syzkaller.appspotmail.com

list_del corruption, ffff88802295c4b0->next is LIST_POISON1 (dead000000000100)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:55!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 4018 Comm: syz-executor365 Not tainted 6.1.0-rc1-syzkaller-00427-gd47136c28015 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022
RIP: 0010:__list_del_entry_valid+0xef/0x130 lib/list_debug.c:53
Code: 29 40 03 06 0f 0b 48 c7 c7 e0 bf 0a 8b 4c 89 fe 31 c0 e8 16 40 03 06 0f 0b 48 c7 c7 40 c0 0a 8b 4c 89 fe 31 c0 e8 03 40 03 06 <0f> 0b 48 c7 c7 a0 c0 0a 8b 4c 89 fe 31 c0 e8 f0 3f 03 06 0f 0b 48
RSP: 0018:ffffc900044c7630 EFLAGS: 00010246
RAX: 000000000000004e RBX: dead000000000122 RCX: 5051969350135b00
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: 0000000000000001 R08: ffffffff816cec5d R09: fffff52000898e7d
R10: fffff52000898e7d R11: 1ffff92000898e7c R12: dffffc0000000000
R13: 1ffff1100452b880 R14: dead000000000100 R15: ffff88802295c4b0
FS: 00007f0d52859700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020001000 CR3: 000000007ed87000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__list_del_entry include/linux/list.h:134 [inline]
list_del include/linux/list.h:148 [inline]
p9_fd_cancel+0x9c/0x230 net/9p/trans_fd.c:703
p9_client_rpc+0x92c/0xad0 net/9p/client.c:723
p9_client_create+0x997/0x1030 net/9p/client.c:1015
v9fs_session_init+0x1e3/0x1990 fs/9p/v9fs.c:408
v9fs_mount+0xd2/0xcb0 fs/9p/vfs_super.c:126
legacy_get_tree+0xea/0x180 fs/fs_context.c:610
vfs_get_tree+0x88/0x270 fs/super.c:1530
do_new_mount+0x289/0xad0 fs/namespace.c:3040
do_mount fs/namespace.c:3383 [inline]
__do_sys_mount fs/namespace.c:3591 [inline]
__se_sys_mount+0x2e3/0x3d0 fs/namespace.c:3568
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f0d528a89f9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 a1 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f0d528592f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007f0d529344e0 RCX: 00007f0d528a89f9
RDX: 0000000020000040 RSI: 0000000020000000 RDI: 0000000000000000
RBP: 00007f0d52901174 R08: 0000000020000080 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0030656c69662f2e
R13: 0000000000000004 R14: 64663d736e617274 R15: 00007f0d529344e8
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__list_del_entry_valid+0xef/0x130 lib/list_debug.c:53
Code: 29 40 03 06 0f 0b 48 c7 c7 e0 bf 0a 8b 4c 89 fe 31 c0 e8 16 40 03 06 0f 0b 48 c7 c7 40 c0 0a 8b 4c 89 fe 31 c0 e8 03 40 03 06 <0f> 0b 48 c7 c7 a0 c0 0a 8b 4c 89 fe 31 c0 e8 f0 3f 03 06 0f 0b 48
RSP: 0018:ffffc900044c7630 EFLAGS: 00010246
RAX: 000000000000004e RBX: dead000000000122 RCX: 5051969350135b00
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: 0000000000000001 R08: ffffffff816cec5d R09: fffff52000898e7d
R10: fffff52000898e7d R11: 1ffff92000898e7c R12: dffffc0000000000
R13: 1ffff1100452b880 R14: dead000000000100 R15: ffff88802295c4b0
FS: 00007f0d52859700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020001000 CR3: 000000007ed87000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Hillf Danton

Oct 23, 2022, 9:58:33 AM
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 23 Oct 2022 03:41:34 -0700
> syzbot found the following issue on:
>
> HEAD commit: d47136c28015 Merge tag 'hwmon-for-v6.1-rc2' of git://git.k..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12f36de2880000
> kernel config: https://syzkaller.appspot.com/x/.config?x=4789759e8a6d5f57
> dashboard link: https://syzkaller.appspot.com/bug?extid=9b69b8d10ab4a7d88056
> compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1076cb7c880000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=102eabd2880000

Set status under req_lock in a bid to avoid double list_del.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git d47136c28015

--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -204,6 +204,7 @@ static void p9_conn_cancel(struct p9_con
 		list_move(&req->req_list, &cancel_list);
 	}
 	list_for_each_entry_safe(req, rtmp, &m->unsent_req_list, req_list) {
+		req->status = REQ_STATUS_ERROR;
 		list_move(&req->req_list, &cancel_list);
 	}

@@ -705,6 +706,8 @@ static int p9_fd_cancel(struct p9_client
 		p9_req_put(client, req);
 		ret = 0;
 	}
+	else if (req->status == REQ_STATUS_ERROR)
+		ret = 0;
 	spin_unlock(&m->req_lock);

 	return ret;
--

syzbot

Oct 23, 2022, 12:08:22 PM
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in corrupted

rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P4079 } 2646 jiffies s: 2653 root: 0x0/T
rcu: blocking rcu_node structures (internal RCU debug):


Tested on:

commit: d47136c2 Merge tag 'hwmon-for-v6.1-rc2' of git://git.k..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=1613fd6e880000
kernel config: https://syzkaller.appspot.com/x/.config?x=4789759e8a6d5f57
dashboard link: https://syzkaller.appspot.com/bug?extid=9b69b8d10ab4a7d88056
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=15434bc2880000

Christian Schoenebeck

Oct 23, 2022, 12:10:08 PM
to asma...@codewreck.org, da...@davemloft.net, edum...@google.com, eri...@gmail.com, ku...@kernel.org, linux-...@vger.kernel.org, lu...@ionkov.net, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, v9fs-de...@lists.sourceforge.net, syzbot
On Sunday, October 23, 2022 12:41:34 PM CEST syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: d47136c28015 Merge tag 'hwmon-for-v6.1-rc2' of git://git.k..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12f36de2880000
> kernel config: https://syzkaller.appspot.com/x/.config?x=4789759e8a6d5f57
> dashboard link: https://syzkaller.appspot.com/bug?extid=9b69b8d10ab4a7d88056
> compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1076cb7c880000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=102eabd2880000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/5664e231e97f/disk-d47136c2.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/9bbe0daa4a04/vmlinux-d47136c2.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+9b69b8...@syzkaller.appspotmail.com
>
> list_del corruption, ffff88802295c4b0->next is LIST_POISON1 (dead000000000100)
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:55!
[...]
> Call Trace:
> <TASK>
> __list_del_entry include/linux/list.h:134 [inline]
> list_del include/linux/list.h:148 [inline]
> p9_fd_cancel+0x9c/0x230 net/9p/trans_fd.c:703

I have only had a brief look at this so far. The problem is that the request's
req_list entry is removed from its list twice, which triggers this BUG in
lib/list_debug.c.
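
For illustration, the double removal described above can be modelled in user
space. The sketch below is a simplified model, not the kernel's implementation:
list_del_checked() is a hypothetical stand-in for the debug check in
lib/list_debug.c and returns false where the kernel would BUG. The poison
values are the ones visible in the register dump of the report (R14 and RBX).

```c
#include <stdbool.h>
#include <stdint.h>

/* Poison values as they appear in the report (x86-64). */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0xdead000000000100ULL)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0xdead000000000122ULL)

struct list_head {
	struct list_head *next, *prev;
};

/* Make an empty list: the head points at itself. */
static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry n right after head. */
static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/*
 * Hypothetical stand-in for the kernel's debug check: refuse to unlink an
 * entry whose pointers are already poisoned. On success the entry is
 * unlinked and its pointers are poisoned, as list_del() does.
 */
static bool list_del_checked(struct list_head *entry)
{
	if (entry->next == LIST_POISON1 || entry->prev == LIST_POISON2)
		return false; /* the kernel would BUG here */

	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
	return true;
}
```

Calling list_del_checked() twice on the same entry succeeds the first time and
fails the second time, because the first call left the entry poisoned.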

Moving the spin_unlock() call back down to the end of p9_conn_cancel() might
fix the double removal:

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 56a186768750..409f0da70c52 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -207,8 +207,6 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
 		list_move(&req->req_list, &cancel_list);
 	}

-	spin_unlock(&m->req_lock);
-
 	list_for_each_entry_safe(req, rtmp, &cancel_list, req_list) {
 		p9_debug(P9_DEBUG_ERROR, "call back req %p\n", req);
 		list_del(&req->req_list);
@@ -216,6 +214,8 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
 			req->t_err = err;
 		p9_client_cb(m->client, req, REQ_STATUS_ERROR);
 	}
+
+	spin_unlock(&m->req_lock);
 }

 static __poll_t

spin_unlock() was recently moved up a bit to fix a deadlock. However, that
deadlock happened with a lock at client level, which has since been converted
into a lock at connection level.

The question is whether that would fix this for good rather than just move it
elsewhere, because there are a number of other list removal calls that do not
check the request state, or anything else, to prevent a double removal.

Best regards,
Christian Schoenebeck
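
The concern about unguarded removals raised above can be sketched in user
space. This is a simplified model under stated assumptions, not the 9p code and
not a proposed fix: all names (struct conn, conn_queue(), req_cancel(),
conn_fail_all(), the status values) are hypothetical, and an atomic flag stands
in for the spinlock. Each path unlinks a request only after checking its state
under the same lock, so a request is never removed twice.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical request states for this sketch. */
enum req_status { REQ_UNSENT, REQ_CANCELLED, REQ_ERROR };

struct request {
	struct list_head req_list;
	enum req_status status;
};

struct conn {
	atomic_bool locked;      /* stand-in for the connection's spinlock */
	struct list_head unsent; /* queue of requests not yet sent */
};

static void conn_lock(struct conn *c)
{
	while (atomic_exchange_explicit(&c->locked, true, memory_order_acquire))
		; /* spin until the previous holder releases the lock */
}

static void conn_unlock(struct conn *c)
{
	atomic_store_explicit(&c->locked, false, memory_order_release);
}

static void conn_init(struct conn *c)
{
	atomic_init(&c->locked, false);
	c->unsent.next = c->unsent.prev = &c->unsent;
}

/* Unlink an entry; the caller must hold the lock. */
static void list_unlink(struct list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
	e->next = e->prev = e;
}

/* Append a request to the tail of the unsent queue. */
static void conn_queue(struct conn *c, struct request *r)
{
	conn_lock(c);
	r->status = REQ_UNSENT;
	r->req_list.prev = c->unsent.prev;
	r->req_list.next = &c->unsent;
	c->unsent.prev->next = &r->req_list;
	c->unsent.prev = &r->req_list;
	conn_unlock(c);
}

/* Cancel path: unlink only if the request is still queued. */
static bool req_cancel(struct conn *c, struct request *r)
{
	bool removed = false;

	conn_lock(c);
	if (r->status == REQ_UNSENT) {
		list_unlink(&r->req_list);
		r->status = REQ_CANCELLED;
		removed = true;
	}
	conn_unlock(c);
	return removed;
}

/* Connection-error path: fail every queued request under the same lock. */
static int conn_fail_all(struct conn *c)
{
	int failed = 0;

	conn_lock(c);
	while (c->unsent.next != &c->unsent) {
		struct list_head *e = c->unsent.next;
		struct request *r = (struct request *)
			((char *)e - offsetof(struct request, req_list));

		list_unlink(e);
		r->status = REQ_ERROR; /* set while the lock is still held */
		failed++;
	}
	conn_unlock(c);
	return failed;
}
```

In this model, a second req_cancel() on the same request, or a req_cancel()
after conn_fail_all(), finds a status other than REQ_UNSENT and leaves the list
untouched.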



Hillf Danton

Oct 23, 2022, 9:23:22 PM
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 23 Oct 2022 03:41:34 -0700
> syzbot found the following issue on:
>
> HEAD commit: d47136c28015 Merge tag 'hwmon-for-v6.1-rc2' of git://git.k..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12f36de2880000
> kernel config: https://syzkaller.appspot.com/x/.config?x=4789759e8a6d5f57
> dashboard link: https://syzkaller.appspot.com/bug?extid=9b69b8d10ab4a7d88056
> compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1076cb7c880000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=102eabd2880000

Set status under req_lock in a bid to avoid double list_del.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git aae703b02f92

--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3663,8 +3663,9 @@ static inline bool netif_attr_test_online(unsigned long j,
 static inline unsigned int netif_attrmask_next(int n, const unsigned long *srcp,
 					       unsigned int nr_bits)
 {
-	/* n is a prior cpu */
-	cpu_max_bits_warn(n + 1, nr_bits);
+	/* -1 is a legal arg here. */
+	if (n != -1)
+		cpu_max_bits_warn(n, nr_bits);

 	if (srcp)
 		return find_next_bit(srcp, nr_bits, n + 1);
@@ -3685,8 +3686,9 @@ static inline int netif_attrmask_next_and(int n, const unsigned long *src1p,
 					  const unsigned long *src2p,
 					  unsigned int nr_bits)
 {
-	/* n is a prior cpu */
-	cpu_max_bits_warn(n + 1, nr_bits);
+	/* -1 is a legal arg here. */
+	if (n != -1)
+		cpu_max_bits_warn(n, nr_bits);

 	if (src1p && src2p)
 		return find_next_and_bit(src1p, src2p, nr_bits, n + 1);
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c

syzbot

Oct 24, 2022, 12:22:26 AM
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in corrupted

rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P4081 } 2640 jiffies s: 2857 root: 0x0/T
rcu: blocking rcu_node structures (internal RCU debug):


Tested on:

commit: aae703b0 Merge tag 'for-6.1-rc1-tag' of git://git.kern..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=161be616880000
kernel config: https://syzkaller.appspot.com/x/.config?x=4789759e8a6d5f57
dashboard link: https://syzkaller.appspot.com/bug?extid=9b69b8d10ab4a7d88056
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=153d373a880000

Hillf Danton

Oct 24, 2022, 3:04:37 AM
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 23 Oct 2022 03:41:34 -0700
> syzbot found the following issue on:
>
> HEAD commit: d47136c28015 Merge tag 'hwmon-for-v6.1-rc2' of git://git.k..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12f36de2880000
> kernel config: https://syzkaller.appspot.com/x/.config?x=4789759e8a6d5f57
> dashboard link: https://syzkaller.appspot.com/bug?extid=9b69b8d10ab4a7d88056
> compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1076cb7c880000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=102eabd2880000

Invoke the client callback under req_lock to serialize it with request cancellation.
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -207,8 +207,6 @@ static void p9_conn_cancel(struct p9_con
 		list_move(&req->req_list, &cancel_list);
 	}

-	spin_unlock(&m->req_lock);
-
 	list_for_each_entry_safe(req, rtmp, &cancel_list, req_list) {
 		p9_debug(P9_DEBUG_ERROR, "call back req %p\n", req);
 		list_del(&req->req_list);
@@ -216,6 +214,7 @@ static void p9_conn_cancel(struct p9_con
 			req->t_err = err;
 		p9_client_cb(m->client, req, REQ_STATUS_ERROR);
 	}
+	spin_unlock(&m->req_lock);
 }

 static __poll_t
--

syzbot

Oct 24, 2022, 1:31:27 PM
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in corrupted

rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P4081 } 2634 jiffies s: 2749 root: 0x0/T
rcu: blocking rcu_node structures (internal RCU debug):


Tested on:

commit: d47136c2 Merge tag 'hwmon-for-v6.1-rc2' of git://git.k..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=15754f4a880000
kernel config: https://syzkaller.appspot.com/x/.config?x=4789759e8a6d5f57
dashboard link: https://syzkaller.appspot.com/bug?extid=9b69b8d10ab4a7d88056
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=1338856a880000
