[syzbot] [kernel?] general protection fault in try_to_wake_up (3)


syzbot

Sep 2, 2025, 4:54:35 PM
to andre...@igalia.com, da...@stgolabs.net, dvh...@infradead.org, linux-...@vger.kernel.org, mi...@redhat.com, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de
Hello,

syzbot found the following issue on:

HEAD commit: 5c3b3264e585 Merge tag 'x86_urgent_for_v6.17_rc4' of git:/..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12e1ae34580000
kernel config: https://syzkaller.appspot.com/x/.config?x=bd9738e00c1bbfb4
dashboard link: https://syzkaller.appspot.com/bug?extid=034246a838a10d181e78
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10f6a1f0580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/37953b384dff/disk-5c3b3264.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/df5cc1c4e51d/vmlinux-5c3b3264.xz
kernel image: https://storage.googleapis.com/syzbot-assets/2ed6195eae9f/bzImage-5c3b3264.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+034246...@syzkaller.appspotmail.com

Oops: general protection fault, probably for non-canonical address 0xdffffc000000014b: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000a58-0x0000000000000a5f]
CPU: 1 UID: 0 PID: 6293 Comm: syz.0.60 Not tainted syzkaller #0 PREEMPT_{RT,(full)}
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
RIP: 0010:kasan_byte_accessible+0x12/0x30 mm/kasan/generic.c:199
Code: 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 48 c1 ef 03 48 b8 00 00 00 00 00 fc ff df <0f> b6 04 07 3c 08 0f 92 c0 e9 d0 9f dc 08 cc 66 66 66 66 66 66 2e
RSP: 0018:ffffc9000157f7e0 EFLAGS: 00010006
RAX: dffffc0000000000 RBX: ffffffff8af9dfe7 RCX: e1dbfc1ee2ae4a00
RDX: 0000000000000000 RSI: ffffffff8af9dfe7 RDI: 000000000000014b
RBP: ffffffff81908477 R08: 0000000000000001 R09: 0000000000000000
R10: dffffc0000000000 R11: fffffbfff1e3a947 R12: 0000000000000000
R13: 0000000000000a58 R14: 0000000000000a58 R15: 0000000000000001
FS: 00007ff6ed61d6c0(0000) GS:ffff8881269c2000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff6ed61cf40 CR3: 0000000027554000 CR4: 00000000003526f0
Call Trace:
<TASK>
__kasan_check_byte+0x12/0x40 mm/kasan/common.c:567
kasan_check_byte include/linux/kasan.h:399 [inline]
lock_acquire+0x8d/0x360 kernel/locking/lockdep.c:5842
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xa7/0xf0 kernel/locking/spinlock.c:162
class_raw_spinlock_irqsave_constructor include/linux/spinlock.h:557 [inline]
try_to_wake_up+0x67/0x12b0 kernel/sched/core.c:4216
requeue_pi_wake_futex+0x24b/0x2f0 kernel/futex/requeue.c:249
futex_proxy_trylock_atomic kernel/futex/requeue.c:340 [inline]
futex_requeue+0x135f/0x1870 kernel/futex/requeue.c:498
do_futex+0x362/0x420 kernel/futex/syscalls.c:-1
__do_sys_futex kernel/futex/syscalls.c:179 [inline]
__se_sys_futex+0x36f/0x400 kernel/futex/syscalls.c:160
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff6edfcebe9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ff6ed61d038 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: ffffffffffffffda RBX: 00007ff6ee206090 RCX: 00007ff6edfcebe9
RDX: 0000000000000001 RSI: 000000000000000c RDI: 000020000000cffc
RBP: 00007ff6ee051e19 R08: 0000200000048000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ff6ee206128 R14: 00007ff6ee206090 R15: 00007ffd53c7a368
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:kasan_byte_accessible+0x12/0x30 mm/kasan/generic.c:199
Code: 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 48 c1 ef 03 48 b8 00 00 00 00 00 fc ff df <0f> b6 04 07 3c 08 0f 92 c0 e9 d0 9f dc 08 cc 66 66 66 66 66 66 2e
RSP: 0018:ffffc9000157f7e0 EFLAGS: 00010006
RAX: dffffc0000000000 RBX: ffffffff8af9dfe7 RCX: e1dbfc1ee2ae4a00
RDX: 0000000000000000 RSI: ffffffff8af9dfe7 RDI: 000000000000014b
RBP: ffffffff81908477 R08: 0000000000000001 R09: 0000000000000000
R10: dffffc0000000000 R11: fffffbfff1e3a947 R12: 0000000000000000
R13: 0000000000000a58 R14: 0000000000000a58 R15: 0000000000000001
FS: 00007ff6ed61d6c0(0000) GS:ffff8881269c2000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff6ed61cf40 CR3: 0000000027554000 CR4: 00000000003526f0
----------------
Code disassembly (best guess):
0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7: 00
8: 90 nop
9: 90 nop
a: 90 nop
b: 90 nop
c: 90 nop
d: 90 nop
e: 90 nop
f: 90 nop
10: 90 nop
11: 90 nop
12: 90 nop
13: 90 nop
14: 90 nop
15: 90 nop
16: 90 nop
17: 90 nop
18: 66 0f 1f 00 nopw (%rax)
1c: 48 c1 ef 03 shr $0x3,%rdi
20: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
27: fc ff df
* 2a: 0f b6 04 07 movzbl (%rdi,%rax,1),%eax <-- trapping instruction
2e: 3c 08 cmp $0x8,%al
30: 0f 92 c0 setb %al
33: e9 d0 9f dc 08 jmp 0x8dca008
38: cc int3
39: 66 data16
3a: 66 data16
3b: 66 data16
3c: 66 data16
3d: 66 data16
3e: 66 data16
3f: 2e cs


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Peter Zijlstra

Sep 2, 2025, 5:46:38 PM
to syzbot, andre...@igalia.com, da...@stgolabs.net, dvh...@infradead.org, linux-...@vger.kernel.org, mi...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de, Sebastian Andrzej Siewior
On Tue, Sep 02, 2025 at 01:54:33PM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 5c3b3264e585 Merge tag 'x86_urgent_for_v6.17_rc4' of git:/..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12e1ae34580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=bd9738e00c1bbfb4
> dashboard link: https://syzkaller.appspot.com/bug?extid=034246a838a10d181e78
> compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10f6a1f0580000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/37953b384dff/disk-5c3b3264.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/df5cc1c4e51d/vmlinux-5c3b3264.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/2ed6195eae9f/bzImage-5c3b3264.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+034246...@syzkaller.appspotmail.com
>
> Oops: general protection fault, probably for non-canonical address 0xdffffc000000014b: 0000 [#1] SMP KASAN PTI
> KASAN: null-ptr-deref in range [0x0000000000000a58-0x0000000000000a5f]

When I build the provided .config with clang-20, that a58 offset is
exactly task_struct::pi_lock::lockdep_map, which nicely corresponds with
the below stacktrace, and seems to suggest someone did:
try_to_wake_up(NULL).
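
The numbers in the report can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming generic KASAN on x86-64 with one shadow byte per 8 bytes of memory and the shadow offset seen in the movabs of the disassembly; the helper name kasan_shadow_addr() is made up for illustration and is not a kernel function.

```c
#include <stdint.h>

/* Generic KASAN: one shadow byte describes 8 bytes of memory. */
#define KASAN_SHADOW_SCALE_SHIFT 3
/* Constant loaded by the movabs in the disassembly (RAX in the dump). */
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* Hypothetical helper: address of the shadow byte that covers addr. */
uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/*
 * Hypothetical helper: address touched when a member at member_off is
 * accessed through a NULL base pointer, as in try_to_wake_up(NULL).
 */
uint64_t null_member_addr(uint64_t member_off)
{
	return 0 + member_off;
}
```

With member_off = 0xa58 (the start of the KASAN range in the report), the shifted value is 0x14b, which matches RDI, and the shadow address is 0xdffffc000000014b, which matches the non-canonical address in the Oops line.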

> CPU: 1 UID: 0 PID: 6293 Comm: syz.0.60 Not tainted syzkaller #0 PREEMPT_{RT,(full)}
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
> RIP: 0010:kasan_byte_accessible+0x12/0x30 mm/kasan/generic.c:199
> Code: 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 48 c1 ef 03 48 b8 00 00 00 00 00 fc ff df <0f> b6 04 07 3c 08 0f 92 c0 e9 d0 9f dc 08 cc 66 66 66 66 66 66 2e
> RSP: 0018:ffffc9000157f7e0 EFLAGS: 00010006
> RAX: dffffc0000000000 RBX: ffffffff8af9dfe7 RCX: e1dbfc1ee2ae4a00
> RDX: 0000000000000000 RSI: ffffffff8af9dfe7 RDI: 000000000000014b
> RBP: ffffffff81908477 R08: 0000000000000001 R09: 0000000000000000
> R10: dffffc0000000000 R11: fffffbfff1e3a947 R12: 0000000000000000
> R13: 0000000000000a58 R14: 0000000000000a58 R15: 0000000000000001
> FS: 00007ff6ed61d6c0(0000) GS:ffff8881269c2000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ff6ed61cf40 CR3: 0000000027554000 CR4: 00000000003526f0
> Call Trace:
> <TASK>
> __kasan_check_byte+0x12/0x40 mm/kasan/common.c:567
> kasan_check_byte include/linux/kasan.h:399 [inline]
> lock_acquire+0x8d/0x360 kernel/locking/lockdep.c:5842
> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> _raw_spin_lock_irqsave+0xa7/0xf0 kernel/locking/spinlock.c:162
> class_raw_spinlock_irqsave_constructor include/linux/spinlock.h:557 [inline]
> try_to_wake_up+0x67/0x12b0 kernel/sched/core.c:4216
> requeue_pi_wake_futex+0x24b/0x2f0 kernel/futex/requeue.c:249

Trouble is, we've not changed the requeue bits in a fair while... So I'm
somewhat confused on how this happens now ?!

Sebastian Andrzej Siewior

Sep 3, 2025, 9:07:19 AM
to Peter Zijlstra, syzbot, andre...@igalia.com, da...@stgolabs.net, dvh...@infradead.org, linux-...@vger.kernel.org, mi...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de, Jens Axboe
+Jens

On 2025-09-02 23:46:28 [+0200], Peter Zijlstra wrote:
> When I build the provided .config with clang-20, that a58 offset is
> exactly task_struct::pi_lock::lockdep_map, which nicely corresponds with
> the below stacktrace, and seems to suggest someone did:
> try_to_wake_up(NULL).

correct.

> > try_to_wake_up+0x67/0x12b0 kernel/sched/core.c:4216
> > requeue_pi_wake_futex+0x24b/0x2f0 kernel/futex/requeue.c:249
>
> Trouble is, we've not changed the requeue bits in a fair while... So I'm
> somewhat confused on how this happens now ?!

This means syzkaller managed to invoke futex_wait_setup(…, NULL) in
order to get futex_q::task assigned to NULL. All users use current
except for io_futex_wait().

The syz-reproducer lists only:
| timer_create(0x0, &(0x7f0000000080)={0x0, 0x11, 0x0, @thr={0x0, 0x0}}, &(0x7f0000000000))
| timer_settime(0x0, 0x0, &(0x7f0000000240)={{0x0, 0x8}, {0x0, 0x9}}, 0x0)
| futex(&(0x7f000000cffc), 0x80000000000b, 0x0, 0x0, &(0x7f0000048000), 0x0)
| futex(&(0x7f000000cffc), 0xc, 0x1, 0x0, &(0x7f0000048000), 0x0)

and that is probably why it can't come up with a C reproducer.
The whole log has (filtered) the following lines:

| io_uring_setup(0x85a, &(0x7f0000000180)={0x0, 0x58b9, 0x1, 0x2, 0x383})
| syz_io_uring_setup(0x88f, &(0x7f0000000300)={0x0, 0xaedf, 0x0, 0x0, 0x25d}, &(0x7f0000000140)=<r0=>0x0, &(0x7f0000000280)=<r1=>0x0)
| syz_memcpy_off$IO_URING_METADATA_GENERIC(r0, 0x4, &(0x7f0000000080)=0xfffffffc, 0x0, 0x4)
| syz_io_uring_submit(r0, r1, &(0x7f00000001c0)=@IORING_OP_RECVMSG={0xa, 0x8, 0x1, r2, 0x0, &(0x7f0000000440)={0x0, 0x0, 0x0}, 0x0, 0x40000020, 0x1, {0x2}})

This should explain how the waiter got NULL. There is no private
flag so that is how they interact with each other.
Do we want this:

diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c
index c716a66f86929..0c98256ebdcb7 100644
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -312,6 +312,8 @@ futex_proxy_trylock_atomic(u32 __user *pifutex, struct futex_hash_bucket *hb1,
 	if (!top_waiter->rt_waiter || top_waiter->pi_state)
 		return -EINVAL;

+	if (!top_waiter->task)
+		return -EINVAL;
 	/* Ensure we requeue to the expected futex. */
 	if (!futex_match(top_waiter->requeue_pi_key, key2))
 		return -EINVAL;

?
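
For reference, the two futex() op values in the reproducer can be decoded as follows. This is a sketch, not kernel code: the constants are copied from include/uapi/linux/futex.h (with parentheses added around the mask), while struct futex_op_decoded and decode_futex_op() are hypothetical names used only here. It relies on the op parameter of the futex syscall being an int, so only the low 32 bits of the value in the syzkaller log matter.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/futex.h. */
#define FUTEX_WAIT_REQUEUE_PI	11
#define FUTEX_CMP_REQUEUE_PI	12
#define FUTEX_PRIVATE_FLAG	128
#define FUTEX_CLOCK_REALTIME	256
#define FUTEX_CMD_MASK		(~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME))

/* Hypothetical helper type for this example. */
struct futex_op_decoded {
	int cmd;	/* command with the flag bits masked out */
	int is_private;	/* non-zero if FUTEX_PRIVATE_FLAG is set */
};

/* Hypothetical helper: split a raw op value from the log into cmd and flag. */
struct futex_op_decoded decode_futex_op(uint64_t raw)
{
	/* The syscall takes op as an int; the upper 32 bits are dropped. */
	int op = (int)(uint32_t)raw;
	struct futex_op_decoded d = {
		.cmd = op & FUTEX_CMD_MASK,
		.is_private = !!(op & FUTEX_PRIVATE_FLAG),
	};
	return d;
}
```

Under these assumptions 0x80000000000b decodes to FUTEX_WAIT_REQUEUE_PI and 0xc to FUTEX_CMP_REQUEUE_PI, both without the private flag, which is consistent with RSI = 0xc in the user-space register dump and with the futex_requeue() frame in the trace.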

Sebastian

Jens Axboe

Sep 3, 2025, 2:51:13 PM
to Sebastian Andrzej Siewior, Peter Zijlstra, syzbot, andre...@igalia.com, da...@stgolabs.net, dvh...@infradead.org, linux-...@vger.kernel.org, mi...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de
On 9/3/25 7:07 AM, Sebastian Andrzej Siewior wrote:
> +Jens
>
> On 2025-09-02 23:46:28 [+0200], Peter Zijlstra wrote:
>> When I build the provided .config with clang-20, that a58 offset is
>> exactly task_struct::pi_lock::lockdep_map, which nicely corresponds with
>> the below stacktrace, and seems to suggest someone did:
>> try_to_wake_up(NULL).
>
> correct.
>
>>> try_to_wake_up+0x67/0x12b0 kernel/sched/core.c:4216
>>> requeue_pi_wake_futex+0x24b/0x2f0 kernel/futex/requeue.c:249
>>
>> Trouble is, we've not changed the requeue bits in a fair while... So I'm
>> somewhat confused on how this happens now ?!
>
> This means syzkaller managed to invoke futex_wait_setup(…, NULL) in

Yep, that looks reasonable to me. And I agree that this futex must've been
set up on the io_uring side, which is why you end up with ->task == NULL.

--
Jens Axboe

Sebastian Andrzej Siewior

12:28 PM
to Jens Axboe, Peter Zijlstra, syzbot, andre...@igalia.com, da...@stgolabs.net, dvh...@infradead.org, linux-...@vger.kernel.org, mi...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de
On 2025-09-03 12:51:09 [-0600], Jens Axboe wrote:
> > The syz-reproducer lists only:
> > | timer_create(0x0, &(0x7f0000000080)={0x0, 0x11, 0x0, @thr={0x0, 0x0}}, &(0x7f0000000000))
> > | timer_settime(0x0, 0x0, &(0x7f0000000240)={{0x0, 0x8}, {0x0, 0x9}}, 0x0)
> > | futex(&(0x7f000000cffc), 0x80000000000b, 0x0, 0x0, &(0x7f0000048000), 0x0)
> > | futex(&(0x7f000000cffc), 0xc, 0x1, 0x0, &(0x7f0000048000), 0x0)
> >
> > and that is probably why it can't come up with a C reproducer.
> > The whole log has (filtered) the following lines:
> >
> > | io_uring_setup(0x85a, &(0x7f0000000180)={0x0, 0x58b9, 0x1, 0x2, 0x383})
> > | syz_io_uring_setup(0x88f, &(0x7f0000000300)={0x0, 0xaedf, 0x0, 0x0, 0x25d}, &(0x7f0000000140)=<r0=>0x0, &(0x7f0000000280)=<r1=>0x0)
> > | syz_memcpy_off$IO_URING_METADATA_GENERIC(r0, 0x4, &(0x7f0000000080)=0xfffffffc, 0x0, 0x4)
> > | syz_io_uring_submit(r0, r1, &(0x7f00000001c0)=@IORING_OP_RECVMSG={0xa, 0x8, 0x1, r2, 0x0, &(0x7f0000000440)={0x0, 0x0, 0x0}, 0x0, 0x40000020, 0x1, {0x2}})
> >
> > This should explain how the waiter got NULL. There is no private
> > flag so that is how they interact with each other.
> > Do we want this:
> >
> > diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c
> > index c716a66f86929..0c98256ebdcb7 100644
> > --- a/kernel/futex/requeue.c
> > +++ b/kernel/futex/requeue.c
> > @@ -312,6 +312,8 @@ futex_proxy_trylock_atomic(u32 __user *pifutex, struct futex_hash_bucket *hb1,
> > if (!top_waiter->rt_waiter || top_waiter->pi_state)
> > return -EINVAL;

I've been poking at this today and I have one problem with my
explanation:
The io_uring code initializes its futex_q with futex_q_init. At this
point futex_q::rt_waiter is set to NULL and never set to something else.
We should bail out here instead of going further.
Only the PI bits set rt_waiter. Only io_uring sets task to NULL.
I'm hopeless, this makes no sense.

> > + if (!top_waiter->task)
> > + return -EINVAL;
> > /* Ensure we requeue to the expected futex. */
> > if (!futex_match(top_waiter->requeue_pi_key, key2))
> > return -EINVAL;
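
The puzzle can be restated with a small user-space model of the two checks. This is only a sketch under simplifying assumptions: struct waiter_model, enum model_result and proxy_trylock_model() are hypothetical stand-ins, not the kernel's futex_q or futex_proxy_trylock_atomic(), and the model says nothing about how a waiter with rt_waiter set and task == NULL could come to exist.

```c
#include <stddef.h>

/* Hypothetical stand-in for the few futex_q fields discussed above. */
struct waiter_model {
	void *task;		/* NULL for io_uring waiters, per the thread */
	void *rt_waiter;	/* only set by the PI wait path */
	void *pi_state;
};

enum model_result {
	MODEL_REJECTED,		/* the real code would return -EINVAL */
	MODEL_WOKEN,		/* wakeup with a valid task */
	MODEL_NULL_WAKE,	/* would end in try_to_wake_up(NULL) */
};

/*
 * Models the order of the checks. check_task selects whether the
 * proposed !task test from the patch is applied.
 */
enum model_result proxy_trylock_model(const struct waiter_model *w,
				      int check_task)
{
	/* Existing check: must be a requeue-PI waiter without pi_state. */
	if (!w->rt_waiter || w->pi_state)
		return MODEL_REJECTED;
	/* Proposed check from the patch. */
	if (check_task && !w->task)
		return MODEL_REJECTED;
	return w->task ? MODEL_WOKEN : MODEL_NULL_WAKE;
}
```

In this model a waiter initialized the io_uring way (task and rt_waiter both NULL) is rejected by the existing check, with or without the proposed one. Only a waiter that has rt_waiter set while task is NULL reaches the wakeup, which is the combination that the reasoning above says should not occur.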
> >

Sebastian