WARNING: locking bug in inet_autobind

syzbot

May 16, 2019, 1:46:06 AM
to a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, da...@davemloft.net, ka...@fb.com, kuz...@ms2.inr.ac.ru, linux-...@vger.kernel.org, net...@vger.kernel.org, songliu...@fb.com, syzkall...@googlegroups.com, y...@fb.com, yosh...@linux-ipv6.org
Hello,

syzbot found the following crash on:

HEAD commit: 35c99ffa Merge tag 'for_linus' of git://git.kernel.org/pub..
git tree: net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=10e970f4a00000
kernel config: https://syzkaller.appspot.com/x/.config?x=82f0809e8f0a8c87
dashboard link: https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360
compiler: gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+94cc2a...@syzkaller.appspotmail.com

WARNING: CPU: 1 PID: 32543 at kernel/locking/lockdep.c:734
arch_local_save_flags arch/x86/include/asm/paravirt.h:762 [inline]
WARNING: CPU: 1 PID: 32543 at kernel/locking/lockdep.c:734
arch_local_save_flags arch/x86/include/asm/paravirt.h:760 [inline]
WARNING: CPU: 1 PID: 32543 at kernel/locking/lockdep.c:734
look_up_lock_class kernel/locking/lockdep.c:725 [inline]
WARNING: CPU: 1 PID: 32543 at kernel/locking/lockdep.c:734
register_lock_class+0xe10/0x1860 kernel/locking/lockdep.c:1078
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 32543 Comm: syz-executor.4 Not tainted 5.1.0+ #9
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
panic+0x2cb/0x65c kernel/panic.c:214
__warn.cold+0x20/0x45 kernel/panic.c:566
report_bug+0x263/0x2b0 lib/bug.c:186
fixup_bug arch/x86/kernel/traps.c:180 [inline]
fixup_bug arch/x86/kernel/traps.c:175 [inline]
do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:273
do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:292
invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:972
RIP: 0010:look_up_lock_class kernel/locking/lockdep.c:734 [inline]
RIP: 0010:register_lock_class+0xe10/0x1860 kernel/locking/lockdep.c:1078
Code: 00 48 89 da 4d 8b 76 c0 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80
3c 02 00 0f 85 23 07 00 00 4c 89 33 e9 e3 f4 ff ff 0f 0b <0f> 0b e9 ea f3
ff ff 44 89 e0 4c 8b 95 50 ff ff ff 83 c0 01 4c 8b
RSP: 0018:ffff88806395f9e8 EFLAGS: 00010083
RAX: dffffc0000000000 RBX: ffff8880a947f1e0 RCX: 0000000000000000
RDX: 1ffff1101528fe3f RSI: 0000000000000000 RDI: ffff8880a947f1f8
RBP: ffff88806395fab0 R08: 1ffff1100c72bf45 R09: ffffffff8a459c80
R10: ffffffff8a0e47e0 R11: 0000000000000000 R12: ffffffff8a1235a0
R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff87fe4c60
__lock_acquire+0x116/0x5490 kernel/locking/lockdep.c:3673
lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4302
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
_raw_spin_lock_bh+0x33/0x50 kernel/locking/spinlock.c:175
spin_lock_bh include/linux/spinlock.h:343 [inline]
lock_sock_nested+0x41/0x120 net/core/sock.c:2917
lock_sock include/net/sock.h:1525 [inline]
inet_autobind+0x20/0x1a0 net/ipv4/af_inet.c:183
inet_dgram_connect+0x252/0x2e0 net/ipv4/af_inet.c:573
__sys_connect+0x266/0x330 net/socket.c:1840
__do_sys_connect net/socket.c:1851 [inline]
__se_sys_connect net/socket.c:1848 [inline]
__x64_sys_connect+0x73/0xb0 net/socket.c:1848
do_syscall_64+0x103/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458da9
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f695f8b6c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458da9
RDX: 000000000000001c RSI: 0000000020000000 RDI: 0000000000000003
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f695f8b76d4
R13: 00000000004bf1fe R14: 00000000004d04f8 R15: 00000000ffffffff
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

syzbot

May 21, 2019, 4:31:06 AM
to a...@kernel.org, b...@vger.kernel.org, dan...@iogearbox.net, da...@davemloft.net, ka...@fb.com, kuz...@ms2.inr.ac.ru, linux-...@vger.kernel.org, net...@vger.kernel.org, songliu...@fb.com, syzkall...@googlegroups.com, y...@fb.com, yosh...@linux-ipv6.org
syzbot has found a reproducer for the following crash on:

HEAD commit: f49aa1de Merge tag 'for-5.2-rc1-tag' of git://git.kernel.o..
git tree: net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=14e5b130a00000
kernel config: https://syzkaller.appspot.com/x/.config?x=fc045131472947d7
dashboard link: https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=163731f8a00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+94cc2a...@syzkaller.appspotmail.com

WARNING: CPU: 1 PID: 28592 at kernel/locking/lockdep.c:734
arch_local_save_flags arch/x86/include/asm/paravirt.h:762 [inline]
WARNING: CPU: 1 PID: 28592 at kernel/locking/lockdep.c:734
arch_local_save_flags arch/x86/include/asm/paravirt.h:760 [inline]
WARNING: CPU: 1 PID: 28592 at kernel/locking/lockdep.c:734
look_up_lock_class kernel/locking/lockdep.c:725 [inline]
WARNING: CPU: 1 PID: 28592 at kernel/locking/lockdep.c:734
register_lock_class+0xe10/0x1860 kernel/locking/lockdep.c:1078
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 28592 Comm: syz-executor.5 Not tainted 5.2.0-rc1+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
panic+0x2cb/0x744 kernel/panic.c:218
__warn.cold+0x20/0x4d kernel/panic.c:575
report_bug+0x263/0x2b0 lib/bug.c:186
fixup_bug arch/x86/kernel/traps.c:179 [inline]
fixup_bug arch/x86/kernel/traps.c:174 [inline]
do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272
do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291
invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:986
RIP: 0010:look_up_lock_class kernel/locking/lockdep.c:734 [inline]
RIP: 0010:register_lock_class+0xe10/0x1860 kernel/locking/lockdep.c:1078
Code: 00 48 89 da 4d 8b 76 c0 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80
3c 02 00 0f 85 23 07 00 00 4c 89 33 e9 e3 f4 ff ff 0f 0b <0f> 0b e9 ea f3
ff ff 44 89 e0 4c 8b 95 50 ff ff ff 83 c0 01 4c 8b
RSP: 0018:ffff888093d179e8 EFLAGS: 00010083
RAX: dffffc0000000000 RBX: ffff8880967cd160 RCX: 0000000000000000
RDX: 1ffff11012cf9a2f RSI: 0000000000000000 RDI: ffff8880967cd178
RBP: ffff888093d17ab0 R08: 1ffff110127a2f45 R09: ffffffff8a659d40
R10: ffffffff8a2e8440 R11: 0000000000000000 R12: ffffffff8a323030
R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff88022ba0
__lock_acquire+0x116/0x5490 kernel/locking/lockdep.c:3673
lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4302
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
_raw_spin_lock_bh+0x33/0x50 kernel/locking/spinlock.c:175
spin_lock_bh include/linux/spinlock.h:343 [inline]
lock_sock_nested+0x41/0x120 net/core/sock.c:2917
lock_sock include/net/sock.h:1525 [inline]
inet_autobind+0x20/0x1a0 net/ipv4/af_inet.c:183
inet_dgram_connect+0x243/0x2d0 net/ipv4/af_inet.c:573
__sys_connect+0x264/0x330 net/socket.c:1840
__do_sys_connect net/socket.c:1851 [inline]
__se_sys_connect net/socket.c:1848 [inline]
__x64_sys_connect+0x73/0xb0 net/socket.c:1848
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459279
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f2321b1ac78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000459279
RDX: 000000000000001c RSI: 0000000020000000 RDI: 0000000000000003
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f2321b1b6d4
R13: 00000000004bf74d R14: 00000000004d0c18 R15: 00000000ffffffff

syzbot

May 21, 2019, 11:16:01 PM
to Yong...@amd.com, air...@linux.ie, alexande...@amd.com, amd...@lists.freedesktop.org, a...@kernel.org, b...@vger.kernel.org, christia...@amd.com, dan...@ffwll.ch, dan...@iogearbox.net, da...@davemloft.net, david...@amd.com, dri-...@lists.freedesktop.org, evan...@amd.com, felix.k...@amd.com, harry.w...@amd.com, ka...@fb.com, kuz...@ms2.inr.ac.ru, linux-...@vger.kernel.org, net...@vger.kernel.org, oz...@amd.com, ray....@amd.com, rex...@amd.com, songliu...@fb.com, syzkall...@googlegroups.com, y...@fb.com, yong...@amd.com, yosh...@linux-ipv6.org
syzbot has bisected this bug to:

commit c0d9271ecbd891cdeb0fad1edcdd99ee717a655f
Author: Yong Zhao <Yong...@amd.com>
Date: Fri Feb 1 23:36:21 2019 +0000

drm/amdgpu: Delete user queue doorbell variables

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1433ece4a00000
start commit: f49aa1de Merge tag 'for-5.2-rc1-tag' of git://git.kernel.o..
git tree: net-next
final crash: https://syzkaller.appspot.com/x/report.txt?x=1633ece4a00000
console output: https://syzkaller.appspot.com/x/log.txt?x=1233ece4a00000
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=163731f8a00000

Reported-by: syzbot+94cc2a...@syzkaller.appspotmail.com
Fixes: c0d9271ecbd8 ("drm/amdgpu: Delete user queue doorbell variables")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Zhao, Yong

May 21, 2019, 11:21:22 PM
to syzbot, air...@linux.ie, Deucher, Alexander, amd...@lists.freedesktop.org, a...@kernel.org, b...@vger.kernel.org, Koenig, Christian, dan...@ffwll.ch, dan...@iogearbox.net, da...@davemloft.net, Zhou, David(ChunMing), dri-...@lists.freedesktop.org, Quan, Evan, Kuehling, Felix, Wentland, Harry, ka...@fb.com, kuz...@ms2.inr.ac.ru, linux-...@vger.kernel.org, net...@vger.kernel.org, Zeng, Oak, Huang, Ray, rex...@amd.com, songliu...@fb.com, syzkall...@googlegroups.com, y...@fb.com, yosh...@linux-ipv6.org
This commit was reverted later. I guess the revert was probably not picked up properly.

Regards,
Yong

From: syzbot <syzbot+94cc2a...@syzkaller.appspotmail.com>
Sent: Tuesday, May 21, 2019 11:16 PM
To: Zhao, Yong; air...@linux.ie; Deucher, Alexander; amd...@lists.freedesktop.org; a...@kernel.org; b...@vger.kernel.org; Koenig, Christian; dan...@ffwll.ch; dan...@iogearbox.net; da...@davemloft.net; Zhou, David(ChunMing); dri-...@lists.freedesktop.org; Quan, Evan; Kuehling, Felix; Wentland, Harry; ka...@fb.com; kuz...@ms2.inr.ac.ru; linux-...@vger.kernel.org; net...@vger.kernel.org; Zeng, Oak; Huang, Ray; rex...@amd.com; songliu...@fb.com; syzkall...@googlegroups.com; y...@fb.com; Zhao, Yong; yosh...@linux-ipv6.org
Subject: Re: WARNING: locking bug in inet_autobind
 

Tetsuo Handa

Sep 18, 2022, 11:53:08 AM
to Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, net...@vger.kernel.org, syzbot, syzkall...@googlegroups.com
syzbot is reporting locking bug in inet_autobind(), for
commit 37159ef2c1ae1e69 ("l2tp: fix a lockdep splat") started
calling

lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class, "l2tp_sock")

in l2tp_tunnel_create() (which is currently in l2tp_tunnel_register()).
How can we fix this problem?

------------[ cut here ]------------
class->name=slock-AF_INET6 lock->name=l2tp_sock lock->key=l2tp_socket_class
WARNING: CPU: 2 PID: 9237 at kernel/locking/lockdep.c:940 look_up_lock_class+0xcc/0x140
Modules linked in:
CPU: 2 PID: 9237 Comm: a.out Not tainted 6.0.0-rc5-00094-ga335366bad13-dirty #860
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
RIP: 0010:look_up_lock_class+0xcc/0x140

On 2019/05/16 14:46, syzbot wrote:
> HEAD commit:    35c99ffa Merge tag 'for_linus' of git://git.kernel.org/pub..
> git tree:       net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=10e970f4a00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=82f0809e8f0a8c87
> dashboard link: https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

C reproducer is available at
https://syzkaller.appspot.com/text?tag=ReproC&x=15062310080000 .

Boqun Feng

Sep 18, 2022, 2:25:46 PM
to Tetsuo Handa, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, net...@vger.kernel.org, syzbot, syzkall...@googlegroups.com
On Mon, Sep 19, 2022 at 12:52:45AM +0900, Tetsuo Handa wrote:
> syzbot is reporting locking bug in inet_autobind(), for
> commit 37159ef2c1ae1e69 ("l2tp: fix a lockdep splat") started
> calling
>
> lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class, "l2tp_sock")
>
> in l2tp_tunnel_create() (which is currently in l2tp_tunnel_register()).
> How can we fix this problem?
>

Just a theory, it seems that we have a memory corruption happened for
lockdep_set_class_and_name(), in l2tp_tunnel_register(), the "sk" gets
published before lockdep_set_class_and_name():

tunnel->sock = sk;
...
lockdep_set_class_and_name(&sk->sk_lock.slock,...);

And what could happen is that sock_lock_init() races with the
l2tp_tunnel_register(), which results into two
lockdep_set_class_and_name()s race with each other.

Anyway, "sk" should not be published until its lock gets properly
initialized, could you try the following (untested)? Looks to me all
other code around the lockdep_set_class_and_name() should be moved
upwards, but I don't want to pretend I'm an expert ;-)

Regards,
Boqun

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 7499c51b1850..1a01d23abc53 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1480,7 +1480,9 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,

 	sk = sock->sk;
 	sock_hold(sk);
-	tunnel->sock = sk;
+	lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class,
+				   "l2tp_sock");
+	smp_store_release(&tunnel->sock, sk);

 	spin_lock_bh(&pn->l2tp_tunnel_list_lock);
 	list_for_each_entry(tunnel_walk, &pn->l2tp_tunnel_list, list) {
@@ -1509,8 +1511,6 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,

 	tunnel->old_sk_destruct = sk->sk_destruct;
 	sk->sk_destruct = &l2tp_tunnel_destruct;
-	lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class,
-				   "l2tp_sock");
 	sk->sk_allocation = GFP_ATOMIC;

 	trace_register_tunnel(tunnel);
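
The ordering this patch aims for (finish initializing the object, then publish the pointer with release semantics) can be illustrated in userspace with C11 atomics. This is only a sketch of the general pattern, not kernel code: `fake_sock`, `published_sock`, `publisher()` and `publish_and_observe()` are hypothetical names, and `atomic_store_explicit(..., memory_order_release)` merely stands in for `smp_store_release()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a socket whose lock class is set before publication. */
struct fake_sock {
	const char *lock_class_name;
};

/* Hypothetical stand-in for tunnel->sock: the pointer other threads read. */
static _Atomic(struct fake_sock *) published_sock;
static struct fake_sock the_sock;

static void *publisher(void *arg)
{
	(void)arg;
	/* Step 1: finish initialization (models lockdep_set_class_and_name()). */
	the_sock.lock_class_name = "l2tp_sock";
	/* Step 2: publish with release semantics (models smp_store_release()). */
	atomic_store_explicit(&published_sock, &the_sock, memory_order_release);
	return NULL;
}

/* Returns the class name a reader observes once the pointer becomes visible. */
static const char *publish_and_observe(void)
{
	pthread_t tid;
	struct fake_sock *sk;
	const char *name;

	atomic_store_explicit(&published_sock, NULL, memory_order_relaxed);
	the_sock.lock_class_name = NULL;
	if (pthread_create(&tid, NULL, publisher, NULL) != 0)
		return NULL;

	/* The acquire load pairs with the release store in publisher(). */
	while ((sk = atomic_load_explicit(&published_sock,
					  memory_order_acquire)) == NULL)
		;

	name = sk->lock_class_name;
	pthread_join(tid, NULL);
	return name;
}
```

A reader that finds the object only through `published_sock` is guaranteed to see the initialized name. The guarantee says nothing about readers that reach the object by some other path.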

Tetsuo Handa

Sep 19, 2022, 1:02:49 AM
to Boqun Feng, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, net...@vger.kernel.org, syzbot, syzkall...@googlegroups.com
On 2022/09/19 3:25, Boqun Feng wrote:
> On Mon, Sep 19, 2022 at 12:52:45AM +0900, Tetsuo Handa wrote:
>> syzbot is reporting locking bug in inet_autobind(), for
>> commit 37159ef2c1ae1e69 ("l2tp: fix a lockdep splat") started
>> calling
>>
>> lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class, "l2tp_sock")
>>
>> in l2tp_tunnel_create() (which is currently in l2tp_tunnel_register()).
>> How can we fix this problem?
>>
>
> Just a theory, it seems that we have a memory corruption happened for
> lockdep_set_class_and_name(), in l2tp_tunnel_register(), the "sk" gets
> published before lockdep_set_class_and_name():
>
> tunnel->sock = sk;
> ...
> lockdep_set_class_and_name(&sk->sk_lock.slock,...);
>
> And what could happen is that sock_lock_init() races with the
> l2tp_tunnel_register(), which results into two
> lockdep_set_class_and_name()s race with each other.
>
> Anyway, "sk" should not be published until its lock gets properly
> initialized, could you try the following (untested)? Looks to me all
> other code around the lockdep_set_class_and_name() should be moved
> upwards, but I don't want to pretend I'm an expert ;-)

This diff did not help.

------------[ cut here ]------------
Looking for class "l2tp_sock" with key l2tp_socket_class, but found a different class "slock-AF_INET6" with the same key
WARNING: CPU: 1 PID: 14195 at kernel/locking/lockdep.c:940 look_up_lock_class+0xcc/0x140
Modules linked in:
CPU: 1 PID: 14195 Comm: a.out Not tainted 6.0.0-rc6-dirty #863
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
RIP: 0010:look_up_lock_class+0xcc/0x140

A roughly simplified reproducer (unlikely to reproduce the problem reliably) is shown below.

----------------------------------------
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/if_pppox.h>

int main(int argc, char *argv[])
{
	const int fd0 = socket(AF_PPPOX, SOCK_STREAM, 1);
	const int fd1 = socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP);
	struct sockaddr_pppol2tp addr0 = {
		.sa_family = AF_PPPOX, .sa_protocol = 1, .pppol2tp.fd = fd1, /* AF_INET6 UDP socket. */
		.pppol2tp.addr.sin_port = htons(1),
		.pppol2tp.addr.sin_addr = htonl(INADDR_LOOPBACK),
		.pppol2tp.s_tunnel = 2
	};
	struct sockaddr_in6 addr1 = { .sin6_family = AF_INET6, .sin6_port = htons(0), .sin6_addr = in6addr_loopback };
	if (fork() == 0) {
		connect(fd1, (struct sockaddr *) &addr1, sizeof(addr1)); /* Invoke inet_autobind() due to .sin6_port = htons(0). */
		_exit(0);
	}
	connect(fd0, (struct sockaddr *) &addr0, sizeof(addr0)); /* Call lockdep_set_class_and_name(sk) of already published fd1. */
	return 0;
}
----------------------------------------

The reproducer creates two file descriptors via socket(AF_PPPOX, SOCK_STREAM, 1)
and socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP).

The connect() on the AF_PPPOX socket calls l2tp_tunnel_register() via pppol2tp_connect().
l2tp_tunnel_register() modifies the "sk" of an already published socket, which can be reached
from a file descriptor using sockfd_lookup(). In this reproducer, the "sk" created via
socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) is modified by the connect() on the AF_PPPOX socket.

But since this file descriptor is visible to userspace, userspace can concurrently
call connect() on the AF_INET6 socket (which invokes inet_autobind() because port == 0)
using this file descriptor. As a result, spin_lock_bh(&sk->sk_lock.slock), called from
lock_sock_nested(sk) from lock_sock(sk) from inet_autobind() from inet_dgram_connect(),
finds that a class "slock-AF_INET6" already exists. That would have been the normal
result if l2tp_tunnel_register() had not called
lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class, "l2tp_sock")
on this AF_INET6 socket.

This looks like a race condition: the debug printk() patch shown below suggests that
it happens when lock_sock(sk) and lockdep_set_class_and_name(&sk->sk_lock.slock) run
in parallel.

----------------------------------------
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 3ca0cc467886..57b31d06b0e1 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -174,6 +174,8 @@ static int inet_autobind(struct sock *sk)
 {
 	struct inet_sock *inet;
 	/* We may need to bind the socket. */
+	if (!strcmp(current->comm, "a.out"))
+		pr_info("inet_autobind(sk=%px)\n", sk);
 	lock_sock(sk);
 	inet = inet_sk(sk);
 	if (!inet->inet_num) {
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 7499c51b1850..1bb14b19bca0 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1509,8 +1509,12 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,

 	tunnel->old_sk_destruct = sk->sk_destruct;
 	sk->sk_destruct = &l2tp_tunnel_destruct;
+	if (!strcmp(current->comm, "a.out"))
+		pr_info("l2tp_tunnel_register(sk=%px) before\n", sk);
 	lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class,
 				   "l2tp_sock");
+	if (!strcmp(current->comm, "a.out"))
+		pr_info("l2tp_tunnel_register(sk=%px) after\n", sk);
 	sk->sk_allocation = GFP_ATOMIC;

 	trace_register_tunnel(tunnel);
----------------------------------------

----------------------------------------
[ 229.873612][T41464] l2tp_core: l2tp_tunnel_register(sk=ffff8880148a7800) before
[ 229.873619][T41464] l2tp_core: l2tp_tunnel_register(sk=ffff8880148a7800) after
[ 229.873654][T41465] IPv4: inet_autobind(sk=ffff8880148a7800)
[ 229.879263][T41468] IPv4: inet_autobind(sk=ffff8880d63a1e00)
[ 229.879264][T41467] l2tp_core: l2tp_tunnel_register(sk=ffff8880d63a1e00) before
[ 229.879272][T41468] ------------[ cut here ]------------
[ 229.879272][T41467] l2tp_core: l2tp_tunnel_register(sk=ffff8880d63a1e00) after
[ 229.879275][T41468] Looking for class "l2tp_sock" with key l2tp_socket_class, but found a different class "slock-AF_INET6" with the same key
[ 229.879932][T41450] l2tp_core: l2tp_tunnel_register(sk=ffff88807c416180) after
[ 229.882029][T41468] WARNING: CPU: 0 PID: 41468 at kernel/locking/lockdep.c:940 look_up_lock_class+0xcc/0x140
[ 229.888126][T41471] IPv4: inet_autobind(sk=ffff88807c410000)
[ 229.888126][T41470] l2tp_core: l2tp_tunnel_register(sk=ffff88807c410000) before
[ 229.888134][T41470] l2tp_core: l2tp_tunnel_register(sk=ffff88807c410000) after
[ 229.889140][T41468] Modules linked in:
[ 230.006548][T41468] CPU: 0 PID: 41468 Comm: a.out Not tainted 6.0.0-rc6-00001-g7def00e9a851-dirty #871
[ 230.009327][T41468] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 230.012117][T41468] RIP: 0010:look_up_lock_class+0xcc/0x140
[ 230.014633][T41468] Code: 8b 17 48 c7 c0 90 42 4b 88 48 39 c2 74 c4 f6 05 dd 31 dc 01 01 75 bb c6 05 d4 31 dc 01 01 48 c7 c7 26 5e f3 85 e8 f4 17 4c fc <0f> 0b eb a4 e8 5b c1 93 fd 48 c7 c7 fd 4c 19 86 89 de e8 c5 06 ff
[ 230.020534][T41468] RSP: 0018:ffffc90013bc3ba0 EFLAGS: 00010046
[ 230.023183][T41468] RAX: 4ca7765a49bbb600 RBX: ffffffff8837db90 RCX: ffff8880d5ddd580
[ 230.025998][T41468] RDX: 0000000000000000 RSI: 0000000080000201 RDI: 0000000000000000
[ 230.028984][T41468] RBP: 0000000000000001 R08: ffffffff8136457a R09: 0000000000000000
[ 230.031785][T41468] R10: ffffffff81366013 R11: ffff8880d5ddd580 R12: 0000000000000000
[ 230.034512][T41468] R13: ffff8880d63a1eb0 R14: 0000000000000000 R15: 0000000000000000
[ 230.037347][T41468] FS: 00007efccdb44640(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[ 230.040207][T41468] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 230.042940][T41468] CR2: 00007efccdb43ef8 CR3: 0000000011a99000 CR4: 00000000000506f0
[ 230.045741][T41468] Call Trace:
[ 230.048282][T41468] <TASK>
[ 230.050869][T41468] register_lock_class+0x48/0x300
[ 230.053474][T41468] __lock_acquire+0x87/0x3340
[ 230.056057][T41468] ? __lock_acquire+0x65f/0x3340
[ 230.058852][T41468] ? console_trylock_spinning+0x187/0x2c0
[ 230.061637][T41468] lock_acquire+0xc6/0x1d0
[ 230.064189][T41468] ? lock_sock_nested+0x56/0xa0
[ 230.066753][T41468] ? lock_sock_nested+0x56/0xa0
[ 230.069337][T41468] _raw_spin_lock_bh+0x31/0x40
[ 230.071879][T41468] ? lock_sock_nested+0x56/0xa0
[ 230.074527][T41468] lock_sock_nested+0x56/0xa0
[ 230.077195][T41468] inet_dgram_connect+0xd7/0x1c0
[ 230.079829][T41468] __sys_connect+0x137/0x150
[ 230.082440][T41468] ? syscall_enter_from_user_mode+0x2e/0x1d0
[ 230.085198][T41468] ? lockdep_hardirqs_on+0x8d/0x130
[ 230.087957][T41468] __x64_sys_connect+0x18/0x20
[ 230.090690][T41468] do_syscall_64+0x3d/0x90
[ 230.093232][T41468] entry_SYSCALL_64_after_hwframe+0x63/0xcd
----------------------------------------
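
In the log above, inet_autobind() and l2tp_tunnel_register() touch the same sk at nearly the same timestamp. One plausible interleaving, which is an assumption and not something the thread confirms, is that the lock lookup reads the old key, the re-classification then lands, and the lookup reads the new name. The sketch below models that single-threaded in userspace. Every name in it (`model_lock`, `model_class`, `model_lookup_warns()`, and so on) is hypothetical and heavily simplified. It is not lockdep's real data structure or algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for lockdep structures. */
struct model_class { const void *key; const char *name; };
struct model_lock  { const void *key; const char *name; };

static const char inet6_key = 0; /* stands in for the AF_INET6 slock key */
static const char l2tp_key = 0;  /* stands in for l2tp_socket_class */

/* One registered class, as if the socket's slock had been classified already. */
static const struct model_class registered = { &inet6_key, "slock-AF_INET6" };

/* Models lockdep_set_class_and_name(): two separate, non-atomic stores. */
static void model_set_class_and_name(struct model_lock *lock)
{
	lock->name = "l2tp_sock";
	lock->key = &l2tp_key;
}

/*
 * Models a class lookup that reads lock->key first and lock->name later.
 * If race_in_between is true, the re-classification lands between the two
 * reads. Returns true when the lookup would report a name mismatch.
 */
static bool model_lookup_warns(struct model_lock *lock, bool race_in_between)
{
	const void *key = lock->key;		/* first read: old key */

	if (race_in_between)
		model_set_class_and_name(lock);	/* simulated concurrent writer */

	if (registered.key != key)
		return false;			/* no class under this key */
	return strcmp(registered.name, lock->name) != 0; /* second read */
}
```

Without the simulated writer the lookup finds a matching name. With it, the lookup pairs the old key with the new name, which resembles the class->name / lock->name mismatch printed in the warning.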

But unfortunately reordering

tunnel->sock = sk;
...
lockdep_set_class_and_name(&sk->sk_lock.slock,...);

by

lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class, "l2tp_sock");
smp_store_release(&tunnel->sock, sk);

does not help, for connect() on AF_INET6 socket is not finding this "sk" by
accessing tunnel->sock.

syzbot

Sep 19, 2022, 3:49:19 AM
to penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+94cc2a...@syzkaller.appspotmail.com

Tested on:

commit: 521a547c Linux 6.0-rc6
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11a242f8880000
kernel config: https://syzkaller.appspot.com/x/.config?x=16251d3dc40f0261
dashboard link: https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=1560b580880000

Note: testing is done by a robot and is best-effort only.

Tetsuo Handa

Sep 27, 2022, 9:00:39 AM
to Boqun Feng, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, net...@vger.kernel.org, syzbot, syzkall...@googlegroups.com
On 2022/09/19 14:02, Tetsuo Handa wrote:
> But unfortunately reordering
>
> tunnel->sock = sk;
> ...
> lockdep_set_class_and_name(&sk->sk_lock.slock,...);
>
> by
>
> lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class, "l2tp_sock");
> smp_store_release(&tunnel->sock, sk);
>
> does not help, for connect() on AF_INET6 socket is not finding this "sk" by
> accessing tunnel->sock.
>

I considered something like the diff below, but I came to think that this problem
cannot be solved unless l2tp_tunnel_register() stops using the userspace-supplied
file descriptor and always calls l2tp_tunnel_sock_create(), because
userspace can keep using the userspace-supplied file descriptor as if it were a normal
socket even after lockdep_set_class_and_name() has marked it as a tunneling
socket.

Since the userspace-supplied file descriptor has to be a datagram socket,
can we somehow copy the source/destination addresses from the
userspace-supplied socket to a kernel-created socket?


diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 7499c51b1850..07429bed7c4c 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1382,8 +1382,6 @@ static int l2tp_tunnel_sock_create(struct net *net,
 	return err;
 }

-static struct lock_class_key l2tp_socket_class;
-
 int l2tp_tunnel_create(int fd, int version, u32 tunnel_id, u32 peer_tunnel_id,
 		       struct l2tp_tunnel_cfg *cfg, struct l2tp_tunnel **tunnelp)
 {
@@ -1509,8 +1507,20 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,

 	tunnel->old_sk_destruct = sk->sk_destruct;
 	sk->sk_destruct = &l2tp_tunnel_destruct;
-	lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class,
-				   "l2tp_sock");
+	if (IS_ENABLED(CONFIG_LOCKDEP)) {
+		static struct lock_class_key l2tp_socket_class;
+
+		/* Changing class/name of an already visible sock might race
+		 * with first lock_sock() call on that sock. In order to make
+		 * sure that register_lock_class() has completed before
+		 * lockdep_set_class_and_name() changes class/name, explicitly
+		 * lock/release that sock.
+		 */
+		lock_sock(sk);
+		release_sock(sk);
+		lockdep_set_class_and_name(&sk->sk_lock.slock,
+					   &l2tp_socket_class, "l2tp_sock");
+	}

Jakub Sitnicki

Nov 22, 2022, 1:31:32 PM
to Eric Dumazet, Tetsuo Handa, Boqun Feng, David S. Miller, Jakub Kicinski, Paolo Abeni, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, net...@vger.kernel.org, syzbot, syzkall...@googlegroups.com
What if we revisit Eric's lockdep splat fix in 37159ef2c1ae ("l2tp: fix
a lockdep splat") and:

1. remove the lockdep_set_class_and_name(...) call in l2tp; it looks
like an odd case within the network stack, and

2. switch to bh_lock_sock_nested in l2tp_xmit_core so that we don't
break what has been fixed in 37159ef2c1ae.

Eric, WDYT?
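
The idea behind point 2 can be sketched with a toy userspace checker. It rests on an assumption the thread does not spell out: that the splat fixed by 37159ef2c1ae came from acquiring two socket spinlocks of the same class in a nested fashion. In this simplified model a second acquisition of the same (class, subclass) pair is flagged as possible recursion, while a nested subclass is accepted without giving the lock a separate class. `model_task`, `model_held` and `model_acquire()` are hypothetical names, not lockdep APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_HELD 8

/* Hypothetical model of a held-lock entry: a class key plus a subclass. */
struct model_held {
	const void *class_key;
	unsigned int subclass;
};

struct model_task {
	struct model_held held[MODEL_MAX_HELD];
	size_t depth;
};

/*
 * Returns false if the acquisition would look like recursion on the same
 * (class, subclass) pair, or if the held stack is full; true otherwise.
 */
static bool model_acquire(struct model_task *t, const void *class_key,
			  unsigned int subclass)
{
	for (size_t i = 0; i < t->depth; i++) {
		if (t->held[i].class_key == class_key &&
		    t->held[i].subclass == subclass)
			return false;
	}
	if (t->depth == MODEL_MAX_HELD)
		return false;

	t->held[t->depth].class_key = class_key;
	t->held[t->depth].subclass = subclass;
	t->depth++;
	return true;
}
```

Subclass 0 models a plain acquisition and subclass 1 models a nested one. Both share one class key, so nothing on the socket needs to be re-classified after it is visible to userspace.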

syzbot

Dec 29, 2022, 1:26:38 AM
to Alexande...@amd.com, Christia...@amd.com, David...@amd.com, Evan...@amd.com, Felix.K...@amd.com, Harry.W...@amd.com, Oak....@amd.com, Ray....@amd.com, Yong...@amd.com, air...@linux.ie, alexande...@amd.com, amd...@lists.freedesktop.org, a...@kernel.org, boqun...@gmail.com, b...@vger.kernel.org, christia...@amd.com, dan...@ffwll.ch, dan...@iogearbox.net, da...@davemloft.net, david...@amd.com, dri-...@lists.freedesktop.org, dsa...@kernel.org, edum...@google.com, evan...@amd.com, felix.k...@amd.com, gautamme...@gmail.com, harry.w...@amd.com, ja...@cloudflare.com, ka...@fb.com, ku...@kernel.org, kuz...@ms2.inr.ac.ru, linux-...@vger.kernel.org, lon...@redhat.com, mi...@redhat.com, net...@vger.kernel.org, oz...@amd.com, pab...@redhat.com, penguin...@i-love.sakura.ne.jp, penguin...@i-love.sakura.ne.jp, pet...@infradead.org, ray....@amd.com, rex...@amd.com, songliu...@fb.com, syzkall...@googlegroups.com, wi...@kernel.org, y...@fb.com, yong...@amd.com, yosh...@linux-ipv6.org
syzbot has found a reproducer for the following issue on:

HEAD commit: 1b929c02afd3 Linux 6.2-rc1
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=145c6a68480000
kernel config: https://syzkaller.appspot.com/x/.config?x=2651619a26b4d687
dashboard link: https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13e13e32480000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13790f08480000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/d1849f1ca322/disk-1b929c02.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/924cb8aa4ada/vmlinux-1b929c02.xz
kernel image: https://storage.googleapis.com/syzbot-assets/8c7330dae0a0/bzImage-1b929c02.xz

The issue was bisected to:

commit c0d9271ecbd891cdeb0fad1edcdd99ee717a655f
Author: Yong Zhao <Yong...@amd.com>
Date: Fri Feb 1 23:36:21 2019 +0000

drm/amdgpu: Delete user queue doorbell variables

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1433ece4a00000
final oops: https://syzkaller.appspot.com/x/report.txt?x=1633ece4a00000
console output: https://syzkaller.appspot.com/x/log.txt?x=1233ece4a00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+94cc2a...@syzkaller.appspotmail.com
Fixes: c0d9271ecbd8 ("drm/amdgpu: Delete user queue doorbell variables")

------------[ cut here ]------------
Looking for class "l2tp_sock" with key l2tp_socket_class, but found a different class "slock-AF_INET6" with the same key
WARNING: CPU: 0 PID: 7280 at kernel/locking/lockdep.c:937 look_up_lock_class+0x97/0x110 kernel/locking/lockdep.c:937
Modules linked in:
CPU: 0 PID: 7280 Comm: syz-executor835 Not tainted 6.2.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0010:look_up_lock_class+0x97/0x110 kernel/locking/lockdep.c:937
Code: 17 48 81 fa e0 e5 f6 8f 74 59 80 3d 5d bc 57 04 00 75 50 48 c7 c7 00 4d 4c 8a 48 89 04 24 c6 05 49 bc 57 04 01 e8 a9 42 b9 ff <0f> 0b 48 8b 04 24 eb 31 9c 5a 80 e6 02 74 95 e8 45 38 02 fa 85 c0
RSP: 0018:ffffc9000b5378b8 EFLAGS: 00010082
RAX: 0000000000000000 RBX: ffffffff91c06a00 RCX: 0000000000000000
RDX: ffff8880292d0000 RSI: ffffffff8166721c RDI: fffff520016a6f09
RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000201 R11: 20676e696b6f6f4c R12: 0000000000000000
R13: ffff88802a5820b0 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f1fd7a97700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000100 CR3: 0000000078ab4000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
register_lock_class+0xbe/0x1120 kernel/locking/lockdep.c:1289
__lock_acquire+0x109/0x56d0 kernel/locking/lockdep.c:4934
lock_acquire kernel/locking/lockdep.c:5668 [inline]
lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x33/0x40 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:355 [inline]
lock_sock_nested+0x5f/0xf0 net/core/sock.c:3473
lock_sock include/net/sock.h:1725 [inline]
inet_autobind+0x1a/0x190 net/ipv4/af_inet.c:177
inet_send_prepare net/ipv4/af_inet.c:813 [inline]
inet_send_prepare+0x325/0x4e0 net/ipv4/af_inet.c:807
inet6_sendmsg+0x43/0xe0 net/ipv6/af_inet6.c:655
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xd3/0x120 net/socket.c:734
__sys_sendto+0x23a/0x340 net/socket.c:2117
__do_sys_sendto net/socket.c:2129 [inline]
__se_sys_sendto net/socket.c:2125 [inline]
__x64_sys_sendto+0xe1/0x1b0 net/socket.c:2125
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f1fd78538b9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f1fd7a971f8 EFLAGS: 00000212 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f1fd78f0038 RCX: 00007f1fd78538b9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00007f1fd78f0030 R08: 0000000020000100 R09: 000000000000001c
R10: 0000000004008000 R11: 0000000000000212 R12: 00007f1fd78f003c
R13: 00007f1fd79ffc8f R14: 00007f1fd7a97300 R15: 0000000000022000
</TASK>
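
The warning text above comes from a consistency check in look_up_lock_class(): a lock class is identified by its key, and lockdep complains when a lock presents a key that is already registered under a different name. What follows is a minimal userspace sketch of that check, not the kernel code. The names model_class, model_lookup_class and MODEL_MAX_CLASSES are hypothetical; the sketch compares names with strcmp() and keeps classes in a flat array, whereas the kernel compares name pointers and uses a hash table. It says nothing about how the two names came to share one key in this report.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a lockdep lock class. */
struct model_class {
	const void *key;	/* identity of the class */
	const char *name;	/* name the key was first registered with */
};

#define MODEL_MAX_CLASSES 16

static struct model_class model_classes[MODEL_MAX_CLASSES];
static size_t model_nr_classes;

/*
 * Look up the class for a lock described by (key, name), registering it
 * if the key has not been seen before.
 *
 * Returns 0 if the key is new or already registered under the same name,
 * 1 if the key is registered under a different name (the case reported
 * in the warning), and -1 if the table is full.
 */
static int model_lookup_class(const void *key, const char *name)
{
	for (size_t i = 0; i < model_nr_classes; i++) {
		if (model_classes[i].key != key)
			continue;
		if (strcmp(model_classes[i].name, name) != 0) {
			fprintf(stderr,
				"Looking for class \"%s\" with key %p, but found a different class \"%s\" with the same key\n",
				name, (void *)key, model_classes[i].name);
			return 1;
		}
		return 0;
	}

	if (model_nr_classes == MODEL_MAX_CLASSES)
		return -1;

	model_classes[model_nr_classes].key = key;
	model_classes[model_nr_classes].name = name;
	model_nr_classes++;
	return 0;
}
```

Registering a key as "slock-AF_INET6" and then looking the same key up as "l2tp_sock" makes model_lookup_class() return 1 and print a message of the same shape as the one in the report.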

Hillf Danton

Dec 29, 2022, 5:16:18 AM
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 28 Dec 2022 22:26:36 -0800
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit: 1b929c02afd3 Linux 6.2-rc1
> git tree: upstream
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13790f08480000

Update lock key only for newly created sock.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 1b929c02afd3

--- x/net/l2tp/l2tp_core.c
+++ y/net/l2tp/l2tp_core.c
@@ -1460,6 +1460,7 @@ int l2tp_tunnel_register(struct l2tp_tun
 	struct socket *sock;
 	struct sock *sk;
 	int ret;
+	int lookup = 0;

 	if (tunnel->fd < 0) {
 		ret = l2tp_tunnel_sock_create(net, tunnel->tunnel_id,
@@ -1471,6 +1472,7 @@ int l2tp_tunnel_register(struct l2tp_tun
 		sock = sockfd_lookup(tunnel->fd, &ret);
 		if (!sock)
 			goto err;
+		lookup = 1;
 	}

 	sk = sock->sk;
@@ -1512,8 +1514,8 @@ int l2tp_tunnel_register(struct l2tp_tun

 	tunnel->old_sk_destruct = sk->sk_destruct;
 	sk->sk_destruct = &l2tp_tunnel_destruct;
-	lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class,
-				   "l2tp_sock");
+	if (!lookup)
+		lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class, "l2tp_sock");
 	sk->sk_allocation = GFP_ATOMIC;

 	trace_register_tunnel(tunnel);
--

syzbot

Dec 29, 2022, 5:43:23 AM
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in corrupted

rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P5564 } 2687 jiffies s: 2885 root: 0x0/T
rcu: blocking rcu_node structures (internal RCU debug):


Tested on:

commit: 1b929c02 Linux 6.2-rc1
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=124c2632480000
kernel config: https://syzkaller.appspot.com/x/.config?x=2651619a26b4d687
dashboard link: https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=16485ff2480000

syzbot

Jan 2, 2023, 11:44:20 PM
to syzkall...@googlegroups.com, xiyou.w...@gmail.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in corrupted

rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P5565 } 2667 jiffies s: 2837 root: 0x0/T
rcu: blocking rcu_node structures (internal RCU debug):


Tested on:

commit: 8ebb40bc l2tp: fix a lockdep warning
git tree: https://github.com/congwang/linux.git net
console output: https://syzkaller.appspot.com/x/log.txt?x=1612a062480000
kernel config: https://syzkaller.appspot.com/x/.config?x=8ca07260bb631fb4
dashboard link: https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Note: no patches were applied.

Felix Kuehling

Jan 3, 2023, 10:45:11 AM
to syzbot, Alexande...@amd.com, Christia...@amd.com, David...@amd.com, Evan...@amd.com, Harry.W...@amd.com, Oak....@amd.com, Ray....@amd.com, Yong...@amd.com, air...@linux.ie, amd...@lists.freedesktop.org, a...@kernel.org, boqun...@gmail.com, b...@vger.kernel.org, dan...@ffwll.ch, dan...@iogearbox.net, da...@davemloft.net, dri-...@lists.freedesktop.org, dsa...@kernel.org, edum...@google.com, gautamme...@gmail.com, ja...@cloudflare.com, ka...@fb.com, ku...@kernel.org, kuz...@ms2.inr.ac.ru, linux-...@vger.kernel.org, lon...@redhat.com, mi...@redhat.com, net...@vger.kernel.org, oz...@amd.com, pab...@redhat.com, penguin...@i-love.sakura.ne.jp, pet...@infradead.org, rex...@amd.com, songliu...@fb.com, syzkall...@googlegroups.com, wi...@kernel.org, y...@fb.com, yosh...@linux-ipv6.org
The regression point doesn't make sense. The kernel config doesn't
enable CONFIG_DRM_AMDGPU, so there is no way that a change in AMDGPU
could have caused this regression.

Regards,
  Felix
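
The claim above can be checked mechanically against the kernel config linked in the report. A .config file records an enabled option as CONFIG_FOO=y or CONFIG_FOO=m and a disabled one as "# CONFIG_FOO is not set". A minimal sketch of such a check, where config_option_enabled is a hypothetical helper name and only y/m (boolean or tristate) values are considered:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `option` (for example "CONFIG_DRM_AMDGPU") is set to
 * y or m in the .config text `config`.
 *
 * Lines of the form "# CONFIG_FOO is not set" never match, because they
 * do not start with the option name, so a disabled or absent option
 * yields false.
 */
static bool config_option_enabled(const char *config, const char *option)
{
	size_t optlen = strlen(option);
	const char *line = config;

	while (line && *line) {
		const char *eol = strchr(line, '\n');

		/* Match "OPTION=" exactly, so CONFIG_FOO does not match CONFIG_FOOBAR. */
		if (strncmp(line, option, optlen) == 0 && line[optlen] == '=') {
			char value = line[optlen + 1];

			return value == 'y' || value == 'm';
		}
		line = eol ? eol + 1 : NULL;
	}
	return false;
}
```

Given a config containing the line "# CONFIG_DRM_AMDGPU is not set", config_option_enabled(config, "CONFIG_DRM_AMDGPU") returns false, which matches the observation that the amdgpu driver is not built at all.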

Felix Kuehling

Jan 3, 2023, 11:20:14 AM
to Waiman Long, syzbot, Alexande...@amd.com, Christia...@amd.com, David...@amd.com, Evan...@amd.com, Harry.W...@amd.com, Oak....@amd.com, Ray....@amd.com, Yong...@amd.com, air...@linux.ie, amd...@lists.freedesktop.org, a...@kernel.org, boqun...@gmail.com, b...@vger.kernel.org, dan...@ffwll.ch, dan...@iogearbox.net, da...@davemloft.net, dri-...@lists.freedesktop.org, dsa...@kernel.org, edum...@google.com, gautamme...@gmail.com, ja...@cloudflare.com, ka...@fb.com, ku...@kernel.org, kuz...@ms2.inr.ac.ru, linux-...@vger.kernel.org, mi...@redhat.com, net...@vger.kernel.org, oz...@amd.com, pab...@redhat.com, penguin...@i-love.sakura.ne.jp, pet...@infradead.org, rex...@amd.com, songliu...@fb.com, syzkall...@googlegroups.com, wi...@kernel.org, y...@fb.com, yosh...@linux-ipv6.org

On 2023-01-03 at 11:05, Waiman Long wrote:
> On 1/3/23 10:39, Felix Kuehling wrote:
>> The regression point doesn't make sense. The kernel config doesn't
>> enable CONFIG_DRM_AMDGPU, so there is no way that a change in AMDGPU
>> could have caused this regression.
>>
> I agree. It is likely a pre-existing problem or caused by another
> commit that got triggered because of the change in cacheline alignment
> caused by commit c0d9271ecbd ("drm/amdgpu: Delete user queue doorbell
> variable").
I don't think the change can affect cache line alignment. The entire
amdgpu driver doesn't even get compiled in the kernel config that was
used, and the change doesn't touch any files outside
drivers/gpu/drm/amd/amdgpu:

# CONFIG_DRM_AMDGPU is not set

My guess would be that it's an intermittent bug that is confusing bisect.

Regards,
  Felix


>
> Cheers,
> Longman

Waiman Long

Jan 3, 2023, 11:44:12 AM
to Felix Kuehling, syzbot, Alexande...@amd.com, Christia...@amd.com, David...@amd.com, Evan...@amd.com, Harry.W...@amd.com, Oak....@amd.com, Ray....@amd.com, Yong...@amd.com, air...@linux.ie, amd...@lists.freedesktop.org, a...@kernel.org, boqun...@gmail.com, b...@vger.kernel.org, dan...@ffwll.ch, dan...@iogearbox.net, da...@davemloft.net, dri-...@lists.freedesktop.org, dsa...@kernel.org, edum...@google.com, gautamme...@gmail.com, ja...@cloudflare.com, ka...@fb.com, ku...@kernel.org, kuz...@ms2.inr.ac.ru, linux-...@vger.kernel.org, mi...@redhat.com, net...@vger.kernel.org, oz...@amd.com, pab...@redhat.com, penguin...@i-love.sakura.ne.jp, pet...@infradead.org, rex...@amd.com, songliu...@fb.com, syzkall...@googlegroups.com, wi...@kernel.org, y...@fb.com, yosh...@linux-ipv6.org
On 1/3/23 10:39, Felix Kuehling wrote:
> The regression point doesn't make sense. The kernel config doesn't
> enable CONFIG_DRM_AMDGPU, so there is no way that a change in AMDGPU
> could have caused this regression.
>
I agree. It is likely a pre-existing problem or caused by another commit
that got triggered because of the change in cacheline alignment caused
by commit c0d9271ecbd ("drm/amdgpu: Delete user queue doorbell variable").

Cheers,
Longman

syzbot

Jan 3, 2023, 12:20:16 PM
to syzkall...@googlegroups.com, xiyou.w...@gmail.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in corrupted

rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P5563 } 2636 jiffies s: 2801 root: 0x0/T
rcu: blocking rcu_node structures (internal RCU debug):


Tested on:

commit: 8ebb40bc l2tp: fix a lockdep warning
git tree: https://github.com/congwang/linux.git net
console output: https://syzkaller.appspot.com/x/log.txt?x=15f5351a480000
kernel config: https://syzkaller.appspot.com/x/.config?x=8ca07260bb631fb4
dashboard link: https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Tetsuo Handa

Jan 3, 2023, 5:08:13 PM
to Felix Kuehling, Waiman Long, edum...@google.com, ja...@cloudflare.com, syzkall...@googlegroups.com, net...@vger.kernel.org, syzbot, Alexande...@amd.com, Christia...@amd.com, David...@amd.com, Evan...@amd.com, Harry.W...@amd.com, Oak....@amd.com, Ray....@amd.com, Yong...@amd.com, air...@linux.ie, a...@kernel.org, boqun...@gmail.com, dan...@ffwll.ch, dan...@iogearbox.net, da...@davemloft.net, dsa...@kernel.org, gautamme...@gmail.com, ka...@fb.com, ku...@kernel.org, kuz...@ms2.inr.ac.ru, mi...@redhat.com, oz...@amd.com, pab...@redhat.com, pet...@infradead.org, rex...@amd.com, songliu...@fb.com, wi...@kernel.org, y...@fb.com, yosh...@linux-ipv6.org
On 2023/01/04 1:20, Felix Kuehling wrote:
>
> On 2023-01-03 at 11:05, Waiman Long wrote:
>> On 1/3/23 10:39, Felix Kuehling wrote:
>>> The regression point doesn't make sense. The kernel config doesn't enable CONFIG_DRM_AMDGPU, so there is no way that a change in AMDGPU could have caused this regression.
>>>
>> I agree. It is likely a pre-existing problem or caused by another commit that got triggered because of the change in cacheline alignment caused by commit c0d9271ecbd ("drm/amdgpu: Delete user queue doorbell variable").
> I don't think the change can affect cache line alignment. The entire amdgpu driver doesn't even get compiled in the kernel config that was used, and the change doesn't touch any files outside drivers/gpu/drm/amd/amdgpu:
>
> # CONFIG_DRM_AMDGPU is not set
>
> My guess would be that it's an intermittent bug that is confusing bisect.
>
> Regards,
> Felix

This was already explained in https://groups.google.com/g/syzkaller-bugs/c/1rmGDmbXWIw/m/nIQm0EmxBAAJ .

Jakub Sitnicki suggested:

What if we revisit Eric's lockdep splat fix in 37159ef2c1ae ("l2tp: fix
a lockdep splat") and:

1. remove the lockdep_set_class_and_name(...) call in l2tp; it looks
like an odd case within the network stack, and

2. switch to bh_lock_sock_nested in l2tp_xmit_core so that we don't
break what has been fixed in 37159ef2c1ae.

and we are waiting for a response from Eric Dumazet.

Eric Dumazet

Jan 3, 2023, 6:39:08 PM
to Tetsuo Handa, Felix Kuehling, Waiman Long, ja...@cloudflare.com, syzkall...@googlegroups.com, net...@vger.kernel.org, syzbot, Alexande...@amd.com, Christia...@amd.com, David...@amd.com, Evan...@amd.com, Harry.W...@amd.com, Oak....@amd.com, Ray....@amd.com, Yong...@amd.com, air...@linux.ie, a...@kernel.org, boqun...@gmail.com, dan...@ffwll.ch, dan...@iogearbox.net, da...@davemloft.net, dsa...@kernel.org, gautamme...@gmail.com, ka...@fb.com, ku...@kernel.org, kuz...@ms2.inr.ac.ru, mi...@redhat.com, oz...@amd.com, pab...@redhat.com, pet...@infradead.org, rex...@amd.com, songliu...@fb.com, wi...@kernel.org, y...@fb.com, yosh...@linux-ipv6.org
Eric Dumazet has been very busy.

Send a patch, instead of an idea/description.

Thanks.