[syzbot] [net?] general protection fault in add_wait_queue

8 views
Skip to first unread message

syzbot

unread,
Feb 3, 2025, 4:57:22 AM2/3/25
to da...@davemloft.net, edum...@google.com, eper...@redhat.com, ho...@kernel.org, jaso...@redhat.com, ku...@kernel.org, k...@vger.kernel.org, linux-...@vger.kernel.org, mh...@rbox.co, m...@redhat.com, net...@vger.kernel.org, pab...@redhat.com, sgar...@redhat.com, stef...@redhat.com, syzkall...@googlegroups.com, virtual...@lists.linux.dev, xuan...@linux.alibaba.com
Hello,

syzbot found the following issue on:

HEAD commit: c2933b2befe2 Merge tag 'net-6.14-rc1' of git://git.kernel...
git tree: net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=16f676b0580000
kernel config: https://syzkaller.appspot.com/x/.config?x=d033b14aeef39158
dashboard link: https://syzkaller.appspot.com/bug?extid=9d55b199192a4be7d02c
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13300b24580000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12418518580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/c7667ae12603/disk-c2933b2b.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/944ca63002c1/vmlinux-c2933b2b.xz
kernel image: https://storage.googleapis.com/syzbot-assets/30748115bf0b/bzImage-c2933b2b.xz

The issue was bisected to:

commit fcdd2242c0231032fc84e1404315c245ae56322a
Author: Michal Luczaj <mh...@rbox.co>
Date: Tue Jan 28 13:15:27 2025 +0000

vsock: Keep the binding until socket destruction

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=148f5ddf980000
final oops: https://syzkaller.appspot.com/x/report.txt?x=168f5ddf980000
console output: https://syzkaller.appspot.com/x/log.txt?x=128f5ddf980000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+9d55b1...@syzkaller.appspotmail.com
Fixes: fcdd2242c023 ("vsock: Keep the binding until socket destruction")

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
CPU: 1 UID: 0 PID: 5845 Comm: syz-executor865 Not tainted 6.13.0-syzkaller-09685-gc2933b2befe2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024
RIP: 0010:__lock_acquire+0x6a/0x2100 kernel/locking/lockdep.c:5091
Code: b6 04 30 84 c0 0f 85 f8 16 00 00 45 31 f6 83 3d 2b 98 80 0e 00 0f 84 c8 13 00 00 89 54 24 60 89 5c 24 38 4c 89 f8 48 c1 e8 03 <80> 3c 30 00 74 12 4c 89 ff e8 88 26 8b 00 48 be 00 00 00 00 00 fc
RSP: 0018:ffffc9000407f870 EFLAGS: 00010006
RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000018
RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000001
R10: dffffc0000000000 R11: fffffbfff203680f R12: ffff888035760000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000018
FS: 000055555c9b3380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002000044c CR3: 00000000352c0000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
add_wait_queue+0x46/0x180 kernel/sched/wait.c:22
virtio_transport_wait_close net/vmw_vsock/virtio_transport_common.c:1200 [inline]
virtio_transport_close net/vmw_vsock/virtio_transport_common.c:1282 [inline]
virtio_transport_release+0x4c4/0xce0 net/vmw_vsock/virtio_transport_common.c:1302
__vsock_release+0xf1/0x4f0 net/vmw_vsock/af_vsock.c:830
vsock_release+0x97/0x100 net/vmw_vsock/af_vsock.c:941
__sock_release net/socket.c:642 [inline]
sock_close+0xbc/0x240 net/socket.c:1393
__fput+0x3e9/0x9f0 fs/file_table.c:450
__do_sys_close fs/open.c:1579 [inline]
__se_sys_close fs/open.c:1564 [inline]
__x64_sys_close+0x7f/0x110 fs/open.c:1564
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f2406c95400
Code: ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 80 3d 81 8c 07 00 00 74 17 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
RSP: 002b:00007ffe044a2b28 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f2406c95400
RDX: 000000000000000d RSI: 0000000000000001 RDI: 0000000000000004
RBP: 00000000000f4240 R08: 0000000000000008 R09: 000000005c9b4610
R10: 0000000020000180 R11: 0000000000000202 R12: 000000000000e3ae
R13: 00007ffe044a2b34 R14: 00007ffe044a2b50 R15: 00007ffe044a2b40
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__lock_acquire+0x6a/0x2100 kernel/locking/lockdep.c:5091
Code: b6 04 30 84 c0 0f 85 f8 16 00 00 45 31 f6 83 3d 2b 98 80 0e 00 0f 84 c8 13 00 00 89 54 24 60 89 5c 24 38 4c 89 f8 48 c1 e8 03 <80> 3c 30 00 74 12 4c 89 ff e8 88 26 8b 00 48 be 00 00 00 00 00 fc
RSP: 0018:ffffc9000407f870 EFLAGS: 00010006
RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000018
RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000001
R10: dffffc0000000000 R11: fffffbfff203680f R12: ffff888035760000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000018
FS: 000055555c9b3380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002000044c CR3: 00000000352c0000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
0: b6 04 mov $0x4,%dh
2: 30 84 c0 0f 85 f8 16 xor %al,0x16f8850f(%rax,%rax,8)
9: 00 00 add %al,(%rax)
b: 45 31 f6 xor %r14d,%r14d
e: 83 3d 2b 98 80 0e 00 cmpl $0x0,0xe80982b(%rip) # 0xe809840
15: 0f 84 c8 13 00 00 je 0x13e3
1b: 89 54 24 60 mov %edx,0x60(%rsp)
1f: 89 5c 24 38 mov %ebx,0x38(%rsp)
23: 4c 89 f8 mov %r15,%rax
26: 48 c1 e8 03 shr $0x3,%rax
* 2a: 80 3c 30 00 cmpb $0x0,(%rax,%rsi,1) <-- trapping instruction
2e: 74 12 je 0x42
30: 4c 89 ff mov %r15,%rdi
33: e8 88 26 8b 00 call 0x8b26c0
38: 48 rex.W
39: be 00 00 00 00 mov $0x0,%esi
3e: 00 fc add %bh,%ah


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Michal Luczaj

unread,
Feb 3, 2025, 7:36:52 AM2/3/25
to syzbot, da...@davemloft.net, edum...@google.com, eper...@redhat.com, ho...@kernel.org, jaso...@redhat.com, ku...@kernel.org, k...@vger.kernel.org, linux-...@vger.kernel.org, m...@redhat.com, net...@vger.kernel.org, pab...@redhat.com, sgar...@redhat.com, stef...@redhat.com, syzkall...@googlegroups.com, virtual...@lists.linux.dev, xuan...@linux.alibaba.com
> ...

Let me take a look.

Michal

Jeongjun Park

unread,
Feb 3, 2025, 7:50:07 AM2/3/25
to syzbot+9d55b1...@syzkaller.appspotmail.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
#syz test git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

---
net/vmw_vsock/virtio_transport_common.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 7f7de6d88096..655d5f911c84 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1298,7 +1298,8 @@ void virtio_transport_release(struct vsock_sock *vsk)
struct sock *sk = &vsk->sk;
bool remove_sock = true;

- if (sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET)
+ if ((sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET)
+ && !sock_flag(sk, SOCK_DEAD))
remove_sock = virtio_transport_close(vsk);

if (remove_sock) {
--

syzbot

unread,
Feb 3, 2025, 10:22:04 AM2/3/25
to aha3...@gmail.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-by: syzbot+9d55b1...@syzkaller.appspotmail.com
Tested-by: syzbot+9d55b1...@syzkaller.appspotmail.com

Tested on:

commit: 2014c95a Linux 6.14-rc1
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=165c9764580000
kernel config: https://syzkaller.appspot.com/x/.config?x=b93160eda29d3321
dashboard link: https://syzkaller.appspot.com/bug?extid=9d55b199192a4be7d02c
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=170b43df980000

Note: testing is done by a robot and is best-effort only.

Hillf Danton

unread,
Feb 3, 2025, 7:32:56 PM2/3/25
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Mon, 03 Feb 2025 01:57:20 -0800
> syzbot found the following issue on:
>
> HEAD commit: c2933b2befe2 Merge tag 'net-6.14-rc1' of git://git.kernel...
> git tree: net-next
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12418518580000

#syz test

--- x/net/vmw_vsock/af_vsock.c
+++ y/net/vmw_vsock/af_vsock.c
@@ -824,13 +824,12 @@ static void __vsock_release(struct sock
*/
lock_sock_nested(sk, level);

- sock_orphan(sk);
-
if (vsk->transport)
vsk->transport->release(vsk);
else if (sock_type_connectible(sk->sk_type))
vsock_remove_sock(vsk);

+ sock_orphan(sk);
sk->sk_shutdown = SHUTDOWN_MASK;

skb_queue_purge(&sk->sk_receive_queue);
--

Michal Luczaj

unread,
Feb 3, 2025, 7:39:04 PM2/3/25
to syzbot, da...@davemloft.net, edum...@google.com, eper...@redhat.com, ho...@kernel.org, jaso...@redhat.com, ku...@kernel.org, k...@vger.kernel.org, linux-...@vger.kernel.org, m...@redhat.com, net...@vger.kernel.org, pab...@redhat.com, sgar...@redhat.com, stef...@redhat.com, syzkall...@googlegroups.com, virtual...@lists.linux.dev, xuan...@linux.alibaba.com
On 2/3/25 10:57, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: c2933b2befe2 Merge tag 'net-6.14-rc1' of git://git.kernel...
> git tree: net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=16f676b0580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=d033b14aeef39158
> dashboard link: https://syzkaller.appspot.com/bug?extid=9d55b199192a4be7d02c
> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13300b24580000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12418518580000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/c7667ae12603/disk-c2933b2b.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/944ca63002c1/vmlinux-c2933b2b.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/30748115bf0b/bzImage-c2933b2b.xz
>
> The issue was bisected to:
>
> commit fcdd2242c0231032fc84e1404315c245ae56322a
> Author: Michal Luczaj <mh...@rbox.co>
> Date: Tue Jan 28 13:15:27 2025 +0000
>
> vsock: Keep the binding until socket destruction

syzbot is correct (thanks), bisected commit introduced a regression.

sock_orphan(sk) is being called without taking into consideration that it
does `sk->sk_wq = NULL`. Later, if SO_LINGER is set, sk->sk_wq gets
dereferenced in virtio_transport_wait_close().

Repro, as shown by syzbot, is simply
from socket import *
lis = socket(AF_VSOCK, SOCK_STREAM)
lis.bind((1, 1234)) # VMADDR_CID_LOCAL
lis.listen()
s = socket(AF_VSOCK, SOCK_STREAM)
s.setsockopt(SOL_SOCKET, SO_LINGER, (1<<32) | 1)
s.connect(lis.getsockname())
s.close()

A way of fixing this is to put sock_orphan(sk) back where it was before the
breaking patch and instead explicitly flip just the SOCK_DEAD bit, i.e.

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 075695173648..06250bb9afe2 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -824,13 +824,14 @@ static void __vsock_release(struct sock *sk, int level)
*/
lock_sock_nested(sk, level);

- sock_orphan(sk);
+ sock_set_flag(sk, SOCK_DEAD);

if (vsk->transport)
vsk->transport->release(vsk);
else if (sock_type_connectible(sk->sk_type))
vsock_remove_sock(vsk);

+ sock_orphan(sk);
sk->sk_shutdown = SHUTDOWN_MASK;

skb_queue_purge(&sk->sk_receive_queue);

I'm not sure this is the most elegant code (sock_orphan(sk) sets SOCK_DEAD
on a socket that is already SOCK_DEAD), but here it goes:
https://lore.kernel.org/netdev/20250204-vsock-linger-nu...@rbox.co/

One more note: man socket(7) says lingering also happens on shutdown().
Should vsock follow that?

Thanks,
Michal

syzbot

unread,
Feb 4, 2025, 12:47:04 AM2/4/25
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-by: syzbot+9d55b1...@syzkaller.appspotmail.com
Tested-by: syzbot+9d55b1...@syzkaller.appspotmail.com

Tested on:

commit: c2933b2b Merge tag 'net-6.14-rc1' of git://git.kernel...
git tree: net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=155b9d18580000
kernel config: https://syzkaller.appspot.com/x/.config?x=d033b14aeef39158
dashboard link: https://syzkaller.appspot.com/bug?extid=9d55b199192a4be7d02c
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=15a96eb0580000

Stefano Garzarella

unread,
Feb 4, 2025, 5:00:05 AM2/4/25
to Michal Luczaj, syzbot, da...@davemloft.net, edum...@google.com, eper...@redhat.com, ho...@kernel.org, jaso...@redhat.com, ku...@kernel.org, k...@vger.kernel.org, linux-...@vger.kernel.org, m...@redhat.com, net...@vger.kernel.org, pab...@redhat.com, stef...@redhat.com, syzkall...@googlegroups.com, virtual...@lists.linux.dev, xuan...@linux.alibaba.com
What about the fix proposed here:
https://lore.kernel.org/lkml/20250203124959.1...@gmail.com/

>
>One more note: man socket(7) says lingering also happens on shutdown().
>Should vsock follow that?

Good point, I think so.
IMHO we should handle both of them in af_vsock.c if it's possible, but
maybe we need a bit of refactoring.

Anyway, net-next material, right?

Stefano

Stefano Garzarella

unread,
Feb 4, 2025, 5:04:21 AM2/4/25
to Michal Luczaj, syzbot, da...@davemloft.net, edum...@google.com, eper...@redhat.com, ho...@kernel.org, jaso...@redhat.com, ku...@kernel.org, k...@vger.kernel.org, linux-...@vger.kernel.org, m...@redhat.com, net...@vger.kernel.org, pab...@redhat.com, stef...@redhat.com, syzkall...@googlegroups.com, virtual...@lists.linux.dev, xuan...@linux.alibaba.com
mmm, nope, that one will completely bypass the lingering, right?

Stefano

Michal Luczaj

unread,
Feb 4, 2025, 6:58:52 PM2/4/25
to Stefano Garzarella, syzbot, da...@davemloft.net, edum...@google.com, eper...@redhat.com, ho...@kernel.org, jaso...@redhat.com, ku...@kernel.org, k...@vger.kernel.org, linux-...@vger.kernel.org, m...@redhat.com, net...@vger.kernel.org, pab...@redhat.com, stef...@redhat.com, syzkall...@googlegroups.com, virtual...@lists.linux.dev, xuan...@linux.alibaba.com
On 2/4/25 11:04, Stefano Garzarella wrote:
> On Tue, 4 Feb 2025 at 10:59, Stefano Garzarella <sgar...@redhat.com> wrote:
>> On Tue, Feb 04, 2025 at 01:38:50AM +0100, Michal Luczaj wrote:
>>> ...
>>> I'm not sure this is the most elegant code (sock_orphan(sk) sets SOCK_DEAD
>>> on a socket that is already SOCK_DEAD), but here it goes:
>>> https://lore.kernel.org/netdev/20250204-vsock-linger-nu...@rbox.co/
>>
>> What about the fix proposed here:
>> https://lore.kernel.org/lkml/20250203124959.1...@gmail.com/
>
> mmm, nope, that one will completely bypass the lingering, right?

Right. Besides that, it's a transport-specific approach, so all the other
transports would need their .release() tweaked.

>>> One more note: man socket(7) says lingering also happens on shutdown().
>>> Should vsock follow that?
>>
>> Good point, I think so.
>> IMHO we should handle both of them in af_vsock.c if it's possible, but
>> maybe we need a bit of refactoring.
>>
>> Anyway, net-next material, right?

Yeah, I guess.

Michal

Reply all
Reply to author
Forward
0 new messages