[syzbot] [ext4?] general protection fault in hrtimer_nanosleep

syzbot

Nov 1, 2023, 1:36:24 AM
to adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, tg...@linutronix.de, ty...@mit.edu
Hello,

syzbot found the following issue on:

HEAD commit: 888cf78c29e2 Merge tag 'iommu-fix-v6.6-rc7' of git://git.k..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10339673680000
kernel config: https://syzkaller.appspot.com/x/.config?x=7d1f30869bb78ec6
dashboard link: https://syzkaller.appspot.com/bug?extid=b408cd9b40ec25380ee1
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=165bbce3680000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/2e776d64243c/disk-888cf78c.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/9ce776a2bcfc/vmlinux-888cf78c.xz
kernel image: https://storage.googleapis.com/syzbot-assets/86a6c193c013/bzImage-888cf78c.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/8021bba287f0/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b408cd...@syzkaller.appspotmail.com

general protection fault, probably for non-canonical address 0xdffffc003ffff113: 0000 [#1] PREEMPT SMP KASAN
KASAN: probably user-memory-access in range [0x00000001ffff8898-0x00000001ffff889f]
CPU: 1 PID: 5308 Comm: syz-executor.4 Not tainted 6.6.0-rc7-syzkaller-00142-g888cf78c29e2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
RIP: 0010:lookup_object lib/debugobjects.c:195 [inline]
RIP: 0010:lookup_object_or_alloc lib/debugobjects.c:564 [inline]
RIP: 0010:__debug_object_init+0xf3/0x2b0 lib/debugobjects.c:634
Code: d8 48 c1 e8 03 42 80 3c 20 00 0f 85 85 01 00 00 48 8b 1b 48 85 db 0f 84 9f 00 00 00 48 8d 7b 18 83 c5 01 48 89 f8 48 c1 e8 03 <42> 80 3c 20 00 0f 85 4c 01 00 00 4c 3b 73 18 75 c3 48 8d 7b 10 48
RSP: 0018:ffffc900050e7d08 EFLAGS: 00010012
RAX: 000000003ffff113 RBX: 00000001ffff8880 RCX: ffffffff8169123e
RDX: 1ffffffff249b149 RSI: 0000000000000004 RDI: 00000001ffff8898
RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000216
R10: 0000000000000003 R11: 0000000000000000 R12: dffffc0000000000
R13: ffffffff924d8a48 R14: ffffc900050e7d90 R15: ffffffff924d8a50
FS: 0000555556eec480(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa23ab065ee CR3: 000000007e5c1000 CR4: 0000000000350ee0
Call Trace:
<TASK>
hrtimer_init_sleeper_on_stack kernel/time/hrtimer.c:447 [inline]
hrtimer_nanosleep+0x122/0x440 kernel/time/hrtimer.c:2098
common_nsleep+0xa1/0xc0 kernel/time/posix-timers.c:1350
__do_sys_clock_nanosleep kernel/time/posix-timers.c:1396 [inline]
__se_sys_clock_nanosleep kernel/time/posix-timers.c:1373 [inline]
__x64_sys_clock_nanosleep+0x344/0x490 kernel/time/posix-timers.c:1373
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7ff1a56a7ef5
Code: 24 0c 89 3c 24 48 89 4c 24 18 e8 f6 b9 ff ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 8b 74 24 0c 8b 3c 24 b8 e6 00 00 00 0f 05 <44> 89 c7 48 89 04 24 e8 4f ba ff ff 48 8b 04 24 48 83 c4 28 f7 d8
RSP: 002b:00007ffe80c6ee30 EFLAGS: 00000293 ORIG_RAX: 00000000000000e6
RAX: ffffffffffffffda RBX: 00007ff1a579bf80 RCX: 00007ff1a56a7ef5
RDX: 00007ffe80c6ee70 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 00007ff1a579d980 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000fef3
R13: ffffffffffffffff R14: 00007ff1a5200000 R15: 000000000000fbb2
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:lookup_object lib/debugobjects.c:195 [inline]
RIP: 0010:lookup_object_or_alloc lib/debugobjects.c:564 [inline]
RIP: 0010:__debug_object_init+0xf3/0x2b0 lib/debugobjects.c:634
Code: d8 48 c1 e8 03 42 80 3c 20 00 0f 85 85 01 00 00 48 8b 1b 48 85 db 0f 84 9f 00 00 00 48 8d 7b 18 83 c5 01 48 89 f8 48 c1 e8 03 <42> 80 3c 20 00 0f 85 4c 01 00 00 4c 3b 73 18 75 c3 48 8d 7b 10 48
RSP: 0018:ffffc900050e7d08 EFLAGS: 00010012

RAX: 000000003ffff113 RBX: 00000001ffff8880 RCX: ffffffff8169123e
RDX: 1ffffffff249b149 RSI: 0000000000000004 RDI: 00000001ffff8898
RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000216
R10: 0000000000000003 R11: 0000000000000000 R12: dffffc0000000000
R13: ffffffff924d8a48 R14: ffffc900050e7d90 R15: ffffffff924d8a50
FS: 0000555556eec480(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa23ab065ee CR3: 000000007e5c1000 CR4: 0000000000350ee0
----------------
Code disassembly (best guess):
0: d8 48 c1 fmuls -0x3f(%rax)
3: e8 03 42 80 3c call 0x3c80420b
8: 20 00 and %al,(%rax)
a: 0f 85 85 01 00 00 jne 0x195
10: 48 8b 1b mov (%rbx),%rbx
13: 48 85 db test %rbx,%rbx
16: 0f 84 9f 00 00 00 je 0xbb
1c: 48 8d 7b 18 lea 0x18(%rbx),%rdi
20: 83 c5 01 add $0x1,%ebp
23: 48 89 f8 mov %rdi,%rax
26: 48 c1 e8 03 shr $0x3,%rax
* 2a: 42 80 3c 20 00 cmpb $0x0,(%rax,%r12,1) <-- trapping instruction
2f: 0f 85 4c 01 00 00 jne 0x181
35: 4c 3b 73 18 cmp 0x18(%rbx),%r14
39: 75 c3 jne 0xfffffffe
3b: 48 8d 7b 10 lea 0x10(%rbx),%rdi
3f: 48 rex.W


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Thomas Gleixner

Nov 1, 2023, 8:58:55 AM
to syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
On Tue, Oct 31 2023 at 22:36, syzbot wrote:
> general protection fault, probably for non-canonical address 0xdffffc003ffff113: 0000 [#1] PREEMPT SMP KASAN
> KASAN: probably user-memory-access in range [0x00000001ffff8898-0x00000001ffff889f]
> CPU: 1 PID: 5308 Comm: syz-executor.4 Not tainted 6.6.0-rc7-syzkaller-00142-g888cf78c29e2 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
> RIP: 0010:lookup_object lib/debugobjects.c:195 [inline]
> RIP: 0010:lookup_object_or_alloc lib/debugobjects.c:564 [inline]
> RIP: 0010:__debug_object_init+0xf3/0x2b0 lib/debugobjects.c:634
> Code: d8 48 c1 e8 03 42 80 3c 20 00 0f 85 85 01 00 00 48 8b 1b 48 85 db 0f 84 9f 00 00 00 48 8d 7b 18 83 c5 01 48 89 f8 48 c1 e8 03 <42> 80 3c 20 00 0f 85 4c 01 00 00 4c 3b 73 18 75 c3 48 8d 7b 10 48
> RSP: 0018:ffffc900050e7d08 EFLAGS: 00010012
> RAX: 000000003ffff113 RBX: 00000001ffff8880 RCX: ffffffff8169123e
> RDX: 1ffffffff249b149 RSI: 0000000000000004 RDI: 00000001ffff8898
> RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000216
> R10: 0000000000000003 R11: 0000000000000000 R12: dffffc0000000000
> R13: ffffffff924d8a48 R14: ffffc900050e7d90 R15: ffffffff924d8a50
> FS: 0000555556eec480(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fa23ab065ee CR3: 000000007e5c1000 CR4: 0000000000350ee0

So this dies in debugobjects::lookup_object()

hlist_for_each_entry()

> 10: 48 8b 1b mov (%rbx),%rbx

Gets the next entry

> 13: 48 85 db test %rbx,%rbx
> 16: 0f 84 9f 00 00 00 je 0xbb

Checks for the termination condition (NULL pointer)

> 1c: 48 8d 7b 18 lea 0x18(%rbx),%rdi

Calculates the address of obj->object

> 20: 83 c5 01 add $0x1,%ebp

cnt++;

> 23: 48 89 f8 mov %rdi,%rax
> 26: 48 c1 e8 03 shr $0x3,%rax

KASAN shadow address calculation

> * 2a: 42 80 3c 20 00 cmpb $0x0,(%rax,%r12,1) <-- trapping instruction

KASAN accesses the shadow byte at 0xdffffc003ffff113 and dies.

RBX contains the pointer to the next object: 0x00000001ffff8880 which is
clearly a user space address, but I have no idea where that might come
from. It's obviously data corruption of unknown provenience.
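
For reference, the numbers in the register dump are consistent with this reading. Below is a minimal Python sketch, assuming the usual generic-KASAN mapping (shadow = (addr >> 3) + offset). The offset is taken from R12 in the dump rather than from kernel headers, the 48-bit canonical check assumes 4-level paging, and the function names are illustrative only:

```python
# All constants are read from the register dump and disassembly quoted above;
# they are assumptions about this particular build, not kernel header values.
KASAN_SHADOW_OFFSET = 0xDFFFFC0000000000  # R12 in the dump
KASAN_SHADOW_SCALE_SHIFT = 3              # the `shr $0x3,%rax` step
OBJECT_FIELD_OFFSET = 0x18                # the `lea 0x18(%rbx),%rdi` step
MASK64 = (1 << 64) - 1


def shadow_address(addr):
    """Shadow byte that the instrumented code checks before touching addr."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


def covered_range(shadow):
    """Inverse mapping: the 8-byte range described by one shadow byte."""
    start = ((shadow - KASAN_SHADOW_OFFSET) & MASK64) << KASAN_SHADOW_SCALE_SHIFT
    start &= MASK64
    return start, start + (1 << KASAN_SHADOW_SCALE_SHIFT) - 1


def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits - 1) are all equal (4-level paging assumed)."""
    top = addr >> (va_bits - 1)
    return top == 0 or top == (1 << (64 - va_bits + 1)) - 1


rbx = 0x00000001FFFF8880         # corrupted "next" pointer (RBX)
rdi = rbx + OBJECT_FIELD_OFFSET  # address of obj->object (RDI)
shadow = shadow_address(rdi)     # address used by the trapping cmpb

print(hex(rdi))
print(hex(shadow))
print([hex(x) for x in covered_range(shadow)])
print(is_canonical(shadow))
```

The computed values match RDI (0x1ffff8898), the faulting address 0xdffffc003ffff113, and the range [0x1ffff8898, 0x1ffff889f] printed in the KASAN line. The shadow address is non-canonical, which is why the access raises a general protection fault instead of a page fault.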

Unfortunately repro.syz does not hold up to its name and refuses to
reproduce.

Thanks,

tglx

Aleksandr Nogikh

Nov 2, 2023, 8:09:11 AM
to Thomas Gleixner, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
For me, on a locally built kernel (gcc 13.2.0), the reproducer didn't trigger the crash either.

But, interestingly, it does reproduce using the syzbot-built kernel
shared via the "Downloadable assets" [1] in the original report. The
repro crashed the kernel in ~1 minute.

[1] https://github.com/google/syzkaller/blob/master/docs/syzbot_assets.md

[ 125.919060][ C0] BUG: KASAN: stack-out-of-bounds in rb_next+0x10a/0x130
[ 125.921169][ C0] Read of size 8 at addr ffffc900048e7c60 by task kworker/0:1/9
[ 125.923235][ C0]
[ 125.923243][ C0] CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.0-rc7-syzkaller-00142-g888cf78c29e2 #0
[ 125.924546][ C0] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 125.926915][ C0] Workqueue: events nsim_dev_trap_report_work
[ 125.929333][ C0]
[ 125.929341][ C0] Call Trace:
[ 125.929350][ C0] <IRQ>
[ 125.929356][ C0] dump_stack_lvl+0xd9/0x1b0
[ 125.931302][ C0] print_report+0xc4/0x620
[ 125.932115][ C0] ? __virt_addr_valid+0x5e/0x2d0
[ 125.933194][ C0] kasan_report+0xda/0x110
[ 125.934814][ C0] ? rb_next+0x10a/0x130
[ 125.936521][ C0] ? rb_next+0x10a/0x130
[ 125.936544][ C0] rb_next+0x10a/0x130
[ 125.936565][ C0] timerqueue_del+0xd4/0x140
[ 125.936590][ C0] __remove_hrtimer+0x99/0x290
[ 125.936613][ C0] __hrtimer_run_queues+0x55b/0xc10
[ 125.936638][ C0] ? enqueue_hrtimer+0x310/0x310
[ 125.936659][ C0] ? ktime_get_update_offsets_now+0x3bc/0x610
[ 125.936688][ C0] hrtimer_interrupt+0x31b/0x800
[ 125.936715][ C0] __sysvec_apic_timer_interrupt+0x105/0x3f0
[ 125.936737][ C0] sysvec_apic_timer_interrupt+0x8e/0xc0
[ 125.936755][ C0] </IRQ>
[ 125.936759][ C0] <TASK>



>
> Thanks,
>
> tglx
>

Thomas Gleixner

Nov 2, 2023, 11:57:06 AM
to Aleksandr Nogikh, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
Which is a completely different failure mode.

It explodes in the hrtimer interrupt when dequeuing an hrtimer for
expiry. That means the corresponding embedded rb_node is corrupted,
which points to random data corruption.

Since you can reproduce it (it still fails to reproduce here, even with
the provided assets), does the failure mode change when you run it
several times?
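
One way to answer that across many runs is to bucket each console log by its first crash headline. Below is a minimal Python sketch of a hypothetical helper (not a syzkaller tool). The pattern covers only the two headline shapes quoted in this thread, and the function names are made up for illustration:

```python
import re
from collections import Counter

# Assumption: only the headline shapes seen in this thread are recognised,
# i.e. "BUG: KASAN: <kind> in <function>" and "general protection fault".
HEADLINE = re.compile(r"(BUG: KASAN: \S+ in [\w.]+|general protection fault)")


def crash_headline(log_text):
    """Return the first recognised crash headline in a console log, or None."""
    for line in log_text.splitlines():
        match = HEADLINE.search(line)
        if match:
            return match.group(1)
    return None


def bucket_runs(logs):
    """Count how many runs ended in each failure mode."""
    return Counter(crash_headline(text) or "no crash" for text in logs)


# Headlines copied from the two reports in this thread, plus one placeholder
# log that contains no crash.
runs = [
    "[ 125.919060][ C0] BUG: KASAN: stack-out-of-bounds in rb_next+0x10a/0x130",
    "general protection fault, probably for non-canonical address "
    "0xdffffc003ffff113: 0000 [#1] PREEMPT SMP KASAN",
    "placeholder log without a crash",
]
for headline, count in bucket_runs(runs).items():
    print(count, headline)
```

If repeated runs land in several unrelated buckets, that supports the random-corruption reading. A single dominant bucket would point at one specific code path.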

Thanks,

tglx

carsten...@siemens.com

Nov 3, 2023, 7:26:50 AM
to Aleksandr Nogikh, Thomas Gleixner, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
Hi,

I have had sporadic, similar issues with 4.14 kernels (several stable releases: .147, .212, .247, .300) over the past 5 years, where the stack looked quite similar:

[ 432.041880] general protection fault: 0000 [#1] PREEMPT SMP NOPTI
[ 432.048697] Modules linked in: intel_tfm_governor ecryptfs coretemp i2c_i801 sbi_apl snd_soc_skl sdw_cnl snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_compress snd_soc_skl_ipc xhci_pci xhci_hcd sdw_bus crc8 ahci snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core libahci snd_hda_core libata snd_pcm usbcore mei_me snd_timer scsi_mod usb_common snd mei soundcore fuse 8021q inap560t(O) i915 video backlight intel_gtt i2c_algo_bit drm_kms_helper drm firmware_class igb_avb(O) ptp hwmon spi_pxa2xx_platform pps_core
[ 432.099672] CPU: 3 PID: 5729 Comm: dlt_segmented Tainted: G U O 4.14.244-apl #1
[ 432.108909] task: 00000000504d2561 task.stack: 000000007d0046fd
[ 432.115530] RIP: 0010:rb_erase_cached+0x31/0x3b0
[ 432.120683] RSP: 0018:ffffa31d84f77d40 EFLAGS: 00010006
[ 432.126517] RAX: 0000000000000001 RBX: ffffa31d84f77e30 RCX: 0000000000000000
[ 432.134485] RDX: 0000000000000000 RSI: ffff9ed077c1bb10 RDI: ffffa31d84f77e30
[ 432.142456] RBP: ffffa31d84f77d40 R08: ffffa31d84f77e30 R09: 0000a31d80a77c90
[ 432.150426] R10: ffff9ed077c1bee0 R11: 0000000000000400 R12: ffff9ed077c1bb10
[ 432.158394] R13: 0000000000000000 R14: ffff9ed077c1bac0 R15: 0000000000000000
[ 432.166366] FS: 00007ff718cce700(0000) GS:ffff9ed077d80000(0000) knlGS:0000000000000000
[ 432.175403] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 432.181819] CR2: 00007ff7182ca3e4 CR3: 000000026175c000 CR4: 00000000003406a0
[ 432.189790] Call Trace:
[ 432.192526] timerqueue_del+0x1d/0x40
[ 432.196617] __remove_hrtimer+0x37/0x70
[ 432.200898] hrtimer_try_to_cancel+0xa0/0x120
[ 432.205769] do_nanosleep+0xa9/0x180
[ 432.209765] ? kfree+0x169/0x180
[ 432.213370] hrtimer_nanosleep+0xbb/0x150
[ 432.217849] ? hrtimer_init+0x110/0x110
[ 432.222134] SyS_nanosleep+0x6d/0xa0
[ 432.226126] do_syscall_64+0x79/0x350
[ 432.230218] entry_SYSCALL_64_after_hwframe+0x41/0xa6
[ 432.235861] RIP: 0033:0x7ff7199b7240
[ 432.239850] RSP: 002b:00007ff718ccddf0 EFLAGS: 00000293 ORIG_RAX: 0000000000000023
[ 432.248309] RAX: ffffffffffffffda RBX: 00007ff718ccde20 RCX: 00007ff7199b7240
[ 432.256282] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ff718ccde20
[ 432.264252] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 432.272222] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffe333ec72e
[ 432.280190] R13: 00007ffe333ec72f R14: 0000000000802000 R15: 00007ffe333ec730
[ 432.288161] Code: 89 f8 4c 8b 4f 08 48 89 e5 4c 8b 57 10 74 0a 48 3b 7e 08 0f 84 a6 02 00 00 4d 85 d2 0f 84 28 02 00 00 4d 85 c9 0f 84 03 02 00 00 <49> 8b 51 10 4c 89 cf 4c 89 c8 48 85 d2 75 0b e9 65 02 00 00 48
[ 432.309346] RIP: rb_erase_cached+0x31/0x3b0 RSP: ffffa31d84f77d40

It looks like this is worth digging into.
Unfortunately I was never able to reproduce it, and I still can't. So I can't help with the digging, but I wanted to point out that this does not seem to be tied to a specific kernel version.

Thanks
Carsten
>>
>> Thanks,
>>
>> tglx
>>

Aleksandr Nogikh

Nov 10, 2023, 12:00:34 AM
to Thomas Gleixner, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
Hmm, it's weird. Maybe I was very lucky that time.

The reproducer does work on the attached disk image, but definitely
not very often. I've just run it 10 times or so and got interleaved
BUG/KFENCE bug reports like this (twice):
https://pastebin.com/W0TkRsnw

These seem to be related to ext4 rather than hrtimers though.

--
Aleksandr

>
> Thanks,
>
> tglx

Theodore Ts'o

Nov 10, 2023, 7:55:10 PM
to Aleksandr Nogikh, Thomas Gleixner, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Thu, Nov 09, 2023 at 09:00:18PM -0800, Aleksandr Nogikh wrote:
>
> The reproducer does work on the attached disk image, but definitely
> not very often. I've just run it 10 times or so and got interleaved
> BUG/KFENCE bug reports like this (twice):
> https://pastebin.com/W0TkRsnw
>
> These seem to be related to ext4 rather than hrtimers though.

So what would be nice is if there was a way to ask the syzkaller
tester to use a different config or to change the reproducer somehow
--- for example, is it *really* necessary to twiddle the bluetooth
subsystem, as demonstrated by the spew in the console?

I've certainly spent hours cutting down the reproducer to a simple C
program which is readable by humans, which makes it *clear* the syzbot
minimizer doesn't do a good job. Why should a time-limited maintainer
spend hours trying to cut down the reproducer, when a robot should be
able to do that for us? And since a bug often reproduces via syzbot
test, but not when run using KVM, we need a simple way of triggering
a test where things are as close as possible to whatever syzbot is
using.

Cheers,

- Ted