[syzbot] [udf?] BUG: unable to handle kernel NULL pointer dereference in __writepage

syzbot

Dec 17, 2022, 10:04:48 AM
to ak...@linux-foundation.org, ja...@suse.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com, wi...@infradead.org
Hello,

syzbot found the following issue on:

HEAD commit: 77856d911a8c Merge tag 'arm64-fixes' of git://git.kernel.o..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=117055e7880000
kernel config: https://syzkaller.appspot.com/x/.config?x=55043d38f21f0e0f
dashboard link: https://syzkaller.appspot.com/bug?extid=c27475eb921c46bbdc62
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=141da6e7880000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=17d81b8b880000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/0b78ce281e8c/disk-77856d91.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/8af6f6a5481b/vmlinux-77856d91.xz
kernel image: https://storage.googleapis.com/syzbot-assets/8c902de7af92/bzImage-77856d91.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/280fb5acc0d8/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+c27475...@syzkaller.appspotmail.com

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 2bff1067 P4D 2bff1067 PUD 1f5dc067 PMD 0
Oops: 0010 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9019 Comm: syz-executor202 Not tainted 6.1.0-syzkaller-13031-g77856d911a8c #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffffc9000be3f6a8 EFLAGS: 00010246
RAX: 1ffffffff1659874 RBX: ffffea0001bf0e00 RCX: ffff8880183c57c0
RDX: 0000000000000000 RSI: ffffc9000be3fb00 RDI: ffffea0001bf0e00
RBP: ffffffff8b2cc3a0 R08: ffffffff81bf03f6 R09: fffffbfff1d200ae
R10: fffffbfff1d200ae R11: 1ffffffff1d200ad R12: dffffc0000000000
R13: ffffea0001bf0e00 R14: ffff8880738dbd28 R15: ffffc9000be3fb00
FS: 00007ff98e385700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 000000001ca8e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__writepage+0x60/0x120 mm/page-writeback.c:2537
write_cache_pages+0x7dd/0x1350 mm/page-writeback.c:2472
generic_writepages mm/page-writeback.c:2563 [inline]
do_writepages+0x438/0x690 mm/page-writeback.c:2583
filemap_fdatawrite_wbc+0x11e/0x170 mm/filemap.c:388
__filemap_fdatawrite_range mm/filemap.c:421 [inline]
file_write_and_wait_range+0x228/0x330 mm/filemap.c:777
__generic_file_fsync+0x6e/0x190 fs/libfs.c:1132
generic_file_fsync+0x6f/0xe0 fs/libfs.c:1173
generic_write_sync include/linux/fs.h:2882 [inline]
udf_file_write_iter+0x4d6/0x5f0 fs/udf/file.c:176
call_write_iter include/linux/fs.h:2186 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x7b5/0xbb0 fs/read_write.c:584
ksys_write+0x19b/0x2c0 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7ff9967027f9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 91 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ff98e3852f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007ff996780790 RCX: 00007ff9967027f9
RDX: 0000000000000008 RSI: 0000000020000040 RDI: 0000000000000004
RBP: 00007ff99678079c R08: 00007ff98e385700 R09: 0000000000000000
R10: 00007ff98e385700 R11: 0000000000000246 R12: 00007ff99674cd70
R13: 00007ff99674c180 R14: 0000000020000c80 R15: 00007ff996780798
</TASK>
Modules linked in:
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffffc9000be3f6a8 EFLAGS: 00010246
RAX: 1ffffffff1659874 RBX: ffffea0001bf0e00 RCX: ffff8880183c57c0
RDX: 0000000000000000 RSI: ffffc9000be3fb00 RDI: ffffea0001bf0e00
RBP: ffffffff8b2cc3a0 R08: ffffffff81bf03f6 R09: fffffbfff1d200ae
R10: fffffbfff1d200ae R11: 1ffffffff1d200ad R12: dffffc0000000000
R13: ffffea0001bf0e00 R14: ffff8880738dbd28 R15: ffffc9000be3fb00
FS: 00007ff98e385700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 000000001ca8e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

syzbot

Jan 19, 2023, 11:31:24 AM
to ak...@linux-foundation.org, h...@lst.de, ja...@suse.com, ja...@suse.cz, linki...@gmail.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com, wi...@infradead.org
syzbot has bisected this issue to:

commit 36273e5b4e3a934c6d346c8f0b16b97e018094af
Author: Christoph Hellwig <h...@lst.de>
Date: Sun Nov 13 16:29:02 2022 +0000

udf: remove ->writepage

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15176e66480000
start commit: 77856d911a8c Merge tag 'arm64-fixes' of git://git.kernel.o..
git tree: upstream
final oops: https://syzkaller.appspot.com/x/report.txt?x=17176e66480000
console output: https://syzkaller.appspot.com/x/log.txt?x=13176e66480000
Reported-by: syzbot+c27475...@syzkaller.appspotmail.com
Fixes: 36273e5b4e3a ("udf: remove ->writepage")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Christoph Hellwig

Jan 23, 2023, 4:35:45 AM
to syzbot, ak...@linux-foundation.org, h...@lst.de, ja...@suse.com, ja...@suse.cz, linki...@gmail.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com, wi...@infradead.org
I looked into this and got really confused. We should never end
up in generic_writepages if ->writepages is set, which this patch
obviously does.
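
To make the confusion concrete, here is a small userspace model of how a per-page call could still reach a NULL hook when the ops table is read more than once. It is only a sketch: the struct, table, and function names (`race_ops`, `race_inline_ops`, `race_block_ops`, `race_would_call_null`) are hypothetical, and the assumption that one table carries only ->writepage while the other carries only ->writepages is an illustration, not a statement about the actual udf tables.

```c
#include <stddef.h>

/* Hypothetical stand-in for an address_space_operations table. */
struct race_ops {
    int (*writepages)(void); /* batch writeback hook, may be NULL */
    int (*writepage)(void);  /* per-page writeback hook, may be NULL */
};

/* Hypothetical stand-in for a mapping that points at one ops table. */
struct race_mapping {
    const struct race_ops *a_ops;
};

static int race_stub(void) { return 0; }

/* Assumed for illustration: one table per on-disk layout. */
const struct race_ops race_inline_ops = { .writepages = NULL, .writepage = race_stub };
const struct race_ops race_block_ops  = { .writepages = race_stub, .writepage = NULL };

/*
 * Returns 1 if the per-page path would call through a NULL pointer.
 * switch_midway models another code path replacing a_ops between the
 * dispatch decision and the later per-page call.
 */
int race_would_call_null(struct race_mapping *m, int switch_midway)
{
    if (m->a_ops->writepages)      /* first read of a_ops: pick the path */
        return 0;                  /* batch path, no per-page call at all */
    if (switch_midway)
        m->a_ops = &race_block_ops; /* table swapped underneath the caller */
    return m->a_ops->writepage == NULL; /* second read of a_ops */
}
```

In this model the NULL call only appears when the table changes between the two reads, which is consistent with a crash at RIP 0x0 on a path that should not be taken when ->writepages is set.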

Then I took a closer look at udf, and it seems to switch a_ops around
at run time, and it seems like we're hitting just that case, and the
patch just seems to narrow down that window.

I suspect the right fix is to remove this runtime switching of aops,
and just do conditionals inside the methods.
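
As a rough illustration of what "conditionals inside the methods" could look like, the userspace sketch below installs a single ops table once and branches on per-inode state inside the method. All names here (`single_inode`, `in_icb`, `single_writepages`, `single_aops`) are hypothetical and chosen for the example; this is not the udf code and not a proposed patch.

```c
#include <stdbool.h>

/* Hypothetical per-inode state: is the data stored inline or in blocks? */
struct single_inode {
    bool in_icb;        /* assumed flag: true while data lives inline */
    int inline_writes;  /* counters so the example can be checked */
    int block_writes;
};

static int single_write_inline(struct single_inode *inode)
{
    inode->inline_writes++;
    return 0;
}

static int single_write_blocks(struct single_inode *inode)
{
    inode->block_writes++;
    return 0;
}

/* One method for both layouts: the branch lives inside the method. */
int single_writepages(struct single_inode *inode)
{
    if (inode->in_icb)
        return single_write_inline(inode);
    return single_write_blocks(inode);
}

/* Hypothetical ops table, set once and never swapped at run time. */
struct single_ops {
    int (*writepages)(struct single_inode *inode);
};

const struct single_ops single_aops = { .writepages = single_writepages };
```

Because the table pointer never changes, a caller that reads it twice always sees the same non-NULL hooks; only the flag changes, and in real code that flag would still need whatever locking the filesystem uses for layout changes.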

Jan Kara

unread,
Jan 23, 2023, 12:41:24 PM1/23/23
to Christoph Hellwig, syzbot, ak...@linux-foundation.org, ja...@suse.com, ja...@suse.cz, linki...@gmail.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com, wi...@infradead.org
Interestingly for me it crashes like:

[ 338.085616] general protection fault, probably for non-canonical address 0x4000000000002068: 0000 [#1] PREEMPT SMP PTI
[ 338.086959] CPU: 4 PID: 31292 Comm: syz-repro11 Not tainted 6.1.0-xen+ #705
[ 338.087941] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
[ 338.089470] RIP: 0010:bio_associate_blkg_from_css+0x31d/0x860
[ 338.092626] RSP: 0018:ffffc90003bb7958 EFLAGS: 00010202
[ 338.093274] RAX: 0000000000000001 RBX: 4000000000002030 RCX: 000000005d6692ad
[ 338.094149] RDX: 0000000092c5763f RSI: ffffffff81eb2e65 RDI: ffffffff81ec3d71
[ 338.095023] RBP: ffff888100c98cc0 R08: 0000000000000001 R09: 0000000000020022
[ 338.095953] R10: 0000000000000000 R11: ffff888108da2fe8 R12: ffffffff831db0e0
[ 338.096884] R13: ffff888100c98cc0 R14: ffffea0004692380 R15: ffffffff831da338
[ 338.097760] FS: 00007f9c59cc0700(0000) GS:ffff888fffd00000(0000) knlGS:0000000000000000
[ 338.098755] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 338.102194] Call Trace:
[ 338.102496] <TASK>
[ 338.102757] ? bio_associate_blkg_from_css+0x2d2/0x860
[ 338.103390] bio_associate_blkg+0x68/0x130
[ 338.103955] ? bio_associate_blkg+0x9/0x130
[ 338.104538] bio_init+0x7f/0xd0
[ 338.104926] bio_alloc_bioset+0x1f5/0x320
[ 338.106364] __mpage_writepage+0x4dc/0x780
[ 338.110045] write_cache_pages+0x113/0x470
[ 338.111635] mpage_writepages+0x5b/0xb0
[ 338.112854] do_writepages+0xd3/0x1a0
[ 338.113782] filemap_fdatawrite_wbc+0x84/0xb0
[ 338.114793] __filemap_fdatawrite_range+0x58/0x80
[ 338.115374] udf_expand_file_adinicb+0xfa/0x420 [udf]
[ 338.116109] udf_file_write_iter+0x1a9/0x1d0 [udf]

which is actually inside:
bio_associate_blkg_from_css+0x31d/0x860:
__ref_is_percpu at include/linux/percpu-refcount.h:174
(inlined by) percpu_ref_get_many at include/linux/percpu-refcount.h:204
(inlined by) percpu_ref_get at include/linux/percpu-refcount.h:222
(inlined by) blkg_get at block/blk-cgroup.h:322
(inlined by) bio_associate_blkg_from_css at block/blk-cgroup.c:1938

so bdev_get_queue(bio->bi_bdev)->root_blkg is bogus (0x4000000000002030).
Likely the request_queue is already dead. Not sure how this could be caused
by any problem in UDF.
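
For reference, a minimal sketch of why a pointer like 0x4000000000002030 triggers a general protection fault rather than an ordinary page fault. It assumes 48-bit virtual addresses (4-level paging), which the log does not confirm, and the helper name `is_canonical_48` is made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * With 48-bit virtual addresses, bits 63..47 must be all zeros or all
 * ones for an address to be canonical. Assumes 4-level paging.
 */
bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;  /* the 17 bits that must match */

    return top == 0 || top == 0x1ffff;
}
```

Under that assumption the RBX value above fails the check, while ordinary kernel pointers from the same dump such as 0xffff888100c98cc0 pass it. The faulting address in the log is RBX plus 0x38, so the access looks like a field read through the bogus pointer.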

Anyway, I tend to agree with you that switching aops is hairy and we should
probably get rid of it in UDF. But this particular crash seems to be
related to something else...

Honza
--
Jan Kara <ja...@suse.com>
SUSE Labs, CR