[syzbot] [ext4?] general protection fault in jbd2__journal_start

14 views
Skip to first unread message

syzbot

unread,
Jan 26, 2024, 4:05:30 AMJan 26
to adilger...@dilger.ca, ja...@suse.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
Hello,

syzbot found the following issue on:

HEAD commit: 7a396820222d Merge tag 'v6.8-rc-part2-smb-client' of git:/..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15fca78fe80000
kernel config: https://syzkaller.appspot.com/x/.config?x=7059b09d0488022
dashboard link: https://syzkaller.appspot.com/bug?extid=cdee56dbcdf0096ef605
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/da73c2c8f5fe/disk-7a396820.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/10d2d2be8831/vmlinux-7a396820.xz
kernel image: https://storage.googleapis.com/syzbot-assets/939406fd4919/bzImage-7a396820.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+cdee56...@syzkaller.appspotmail.com

general protection fault, probably for non-canonical address 0xdffffc000a8a4829: 0000 [#1] PREEMPT SMP KASAN
KASAN: probably user-memory-access in range [0x0000000054524148-0x000000005452414f]
CPU: 0 PID: 3394 Comm: syz-executor.5 Not tainted 6.7.0-syzkaller-12991-g7a396820222d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
RIP: 0010:jbd2__journal_start+0x87/0x5d0 fs/jbd2/transaction.c:496
Code: 74 63 48 8b 1b 48 85 db 74 79 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 23 46 8f ff 48 8b 2b 48 89 e8 48 c1 e8 03 <42> 80 3c 28 00 74 08 48 89 ef e8 0a 46 8f ff 4c 39 65 00 0f 85 1a
RSP: 0018:ffffc900154d65c8 EFLAGS: 00010203
RAX: 000000000a8a4829 RBX: ffff8880234e7618 RCX: 0000000000040000
RDX: ffffc9000a3a1000 RSI: 000000000000195c RDI: 000000000000195d
RBP: 000000005452414e R08: 0000000000000c40 R09: 0000000000000001
R10: dffffc0000000000 R11: ffffed1005541071 R12: ffff88802aa0a000
R13: dffffc0000000000 R14: 0000000000000c40 R15: 0000000000000002
FS: 00007fbf47a2a6c0(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020020000 CR3: 0000000030c1a000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__ext4_journal_start_sb+0x215/0x5b0 fs/ext4/ext4_jbd2.c:112
__ext4_journal_start fs/ext4/ext4_jbd2.h:326 [inline]
ext4_dirty_inode+0x92/0x110 fs/ext4/inode.c:5969
__mark_inode_dirty+0x305/0xda0 fs/fs-writeback.c:2452
generic_update_time fs/inode.c:1905 [inline]
inode_update_time fs/inode.c:1918 [inline]
__file_update_time fs/inode.c:2106 [inline]
file_update_time+0x39b/0x3e0 fs/inode.c:2136
ext4_page_mkwrite+0x207/0xdf0 fs/ext4/inode.c:6090
do_page_mkwrite+0x197/0x470 mm/memory.c:2966
wp_page_shared mm/memory.c:3353 [inline]
do_wp_page+0x20e3/0x4c80 mm/memory.c:3493
handle_pte_fault mm/memory.c:5160 [inline]
__handle_mm_fault+0x26a3/0x72b0 mm/memory.c:5285
handle_mm_fault+0x27e/0x770 mm/memory.c:5450
do_user_addr_fault arch/x86/mm/fault.c:1415 [inline]
handle_page_fault arch/x86/mm/fault.c:1507 [inline]
exc_page_fault+0x2ad/0x870 arch/x86/mm/fault.c:1563
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:570
RIP: 0010:rep_movs_alternative+0x4a/0x70 arch/x86/lib/copy_user_64.S:71
Code: 75 f1 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 8b 06 48 89 07 48 83 c6 08 48 83 c7 08 83 e9 08 74 df 83 f9 08 73 e8 eb c9 <f3> a4 c3 48 89 c8 48 c1 e9 03 83 e0 07 f3 48 a5 89 c1 85 c9 75 b3
RSP: 0018:ffffc900154d70f8 EFLAGS: 00050202
RAX: ffffffff848bfd01 RBX: 0000000020020040 RCX: 0000000000000040
RDX: 0000000000000000 RSI: ffff88802cded190 RDI: 0000000020020000
RBP: 1ffff92002a9af26 R08: ffff88802cded1cf R09: 1ffff110059bda39
R10: dffffc0000000000 R11: ffffed10059bda3a R12: 00000000000000c0
R13: dffffc0000000000 R14: 000000002001ff80 R15: ffff88802cded110
copy_user_generic arch/x86/include/asm/uaccess_64.h:112 [inline]
raw_copy_to_user arch/x86/include/asm/uaccess_64.h:133 [inline]
_copy_to_user+0x86/0xa0 lib/usercopy.c:41
copy_to_user include/linux/uaccess.h:191 [inline]
xfs_bulkstat_fmt+0x4f/0x120 fs/xfs/xfs_ioctl.c:744
xfs_bulkstat_one_int+0xd8b/0x12e0 fs/xfs/xfs_itable.c:161
xfs_bulkstat_iwalk+0x72/0xb0 fs/xfs/xfs_itable.c:239
xfs_iwalk_ag_recs+0x4c3/0x820 fs/xfs/xfs_iwalk.c:220
xfs_iwalk_run_callbacks+0x25b/0x490 fs/xfs/xfs_iwalk.c:376
xfs_iwalk_ag+0xad6/0xbd0 fs/xfs/xfs_iwalk.c:482
xfs_iwalk+0x360/0x6f0 fs/xfs/xfs_iwalk.c:584
xfs_bulkstat+0x4f8/0x6c0 fs/xfs/xfs_itable.c:308
xfs_ioc_bulkstat+0x3d0/0x450 fs/xfs/xfs_ioctl.c:867
xfs_file_ioctl+0x6a5/0x1980 fs/xfs/xfs_ioctl.c:1994
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:871 [inline]
__se_sys_ioctl+0xf8/0x170 fs/ioctl.c:857
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7fbf46c7cda9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fbf47a2a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fbf46dabf80 RCX: 00007fbf46c7cda9
RDX: 000000002001fc40 RSI: 000000008040587f RDI: 0000000000000006
RBP: 00007fbf46cc947a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007fbf46dabf80 R15: 00007ffee39fcd08
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:jbd2__journal_start+0x87/0x5d0 fs/jbd2/transaction.c:496
Code: 74 63 48 8b 1b 48 85 db 74 79 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 23 46 8f ff 48 8b 2b 48 89 e8 48 c1 e8 03 <42> 80 3c 28 00 74 08 48 89 ef e8 0a 46 8f ff 4c 39 65 00 0f 85 1a
RSP: 0018:ffffc900154d65c8 EFLAGS: 00010203
RAX: 000000000a8a4829 RBX: ffff8880234e7618 RCX: 0000000000040000
RDX: ffffc9000a3a1000 RSI: 000000000000195c RDI: 000000000000195d
RBP: 000000005452414e R08: 0000000000000c40 R09: 0000000000000001
R10: dffffc0000000000 R11: ffffed1005541071 R12: ffff88802aa0a000
R13: dffffc0000000000 R14: 0000000000000c40 R15: 0000000000000002
FS: 00007fbf47a2a6c0(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020020000 CR3: 0000000030c1a000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
0: 74 63 je 0x65
2: 48 8b 1b mov (%rbx),%rbx
5: 48 85 db test %rbx,%rbx
8: 74 79 je 0x83
a: 48 89 d8 mov %rbx,%rax
d: 48 c1 e8 03 shr $0x3,%rax
11: 42 80 3c 28 00 cmpb $0x0,(%rax,%r13,1)
16: 74 08 je 0x20
18: 48 89 df mov %rbx,%rdi
1b: e8 23 46 8f ff call 0xff8f4643
20: 48 8b 2b mov (%rbx),%rbp
23: 48 89 e8 mov %rbp,%rax
26: 48 c1 e8 03 shr $0x3,%rax
* 2a: 42 80 3c 28 00 cmpb $0x0,(%rax,%r13,1) <-- trapping instruction
2f: 74 08 je 0x39
31: 48 89 ef mov %rbp,%rdi
34: e8 0a 46 8f ff call 0xff8f4643
39: 4c 39 65 00 cmp %r12,0x0(%rbp)
3d: 0f .byte 0xf
3e: 85 1a test %ebx,(%rdx)


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

syzbot

unread,
Jan 30, 2024, 9:52:23 AMJan 30
to adilger...@dilger.ca, chanda...@oracle.com, ja...@suse.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
syzbot has found a reproducer for the following issue on:

HEAD commit: 861c0981648f Merge tag 'jfs-6.8-rc3' of github.com:kleikam..
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=13ca8d97e80000
kernel config: https://syzkaller.appspot.com/x/.config?x=b0b9993d7d6d1990
dashboard link: https://syzkaller.appspot.com/bug?extid=cdee56dbcdf0096ef605
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=104393efe80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1393b90fe80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/7c6cc521298d/disk-861c0981.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/6203c94955db/vmlinux-861c0981.xz
kernel image: https://storage.googleapis.com/syzbot-assets/17e76e12b58c/bzImage-861c0981.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/d31d4eed2912/mount_3.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+cdee56...@syzkaller.appspotmail.com

general protection fault, probably for non-canonical address 0xdffffc000a8a4829: 0000 [#1] PREEMPT SMP KASAN
KASAN: probably user-memory-access in range [0x0000000054524148-0x000000005452414f]
CPU: 1 PID: 5065 Comm: syz-executor260 Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
RIP: 0010:jbd2__journal_start+0x87/0x5d0 fs/jbd2/transaction.c:496
Code: 74 63 48 8b 1b 48 85 db 74 79 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 63 4d 8f ff 48 8b 2b 48 89 e8 48 c1 e8 03 <42> 80 3c 28 00 74 08 48 89 ef e8 4a 4d 8f ff 4c 39 65 00 0f 85 1a
RSP: 0018:ffffc900043265c8 EFLAGS: 00010203
RAX: 000000000a8a4829 RBX: ffff8880205fa3a8 RCX: ffff8880235dbb80
RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff88801c1a6000
RBP: 000000005452414e R08: 0000000000000c40 R09: 0000000000000001
R10: dffffc0000000000 R11: ffffed1003834871 R12: ffff88801c1a6000
R13: dffffc0000000000 R14: 0000000000000c40 R15: 0000000000000002
FS: 0000555556f90380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020020000 CR3: 0000000021fed000 CR4: 00000000003506f0
RSP: 0018:ffffc900043270f8 EFLAGS: 00050202
RAX: ffffffff848cda01 RBX: 0000000020020040 RCX: 0000000000000040
RDX: 0000000000000000 RSI: ffff8880131873b0 RDI: 0000000020020000
RBP: 1ffff92000864f26 R08: ffff8880131873ef R09: 1ffff11002630e7d
R10: dffffc0000000000 R11: ffffed1002630e7e R12: 00000000000000c0
R13: dffffc0000000000 R14: 000000002001ff80 R15: ffff888013187330
copy_user_generic arch/x86/include/asm/uaccess_64.h:112 [inline]
raw_copy_to_user arch/x86/include/asm/uaccess_64.h:133 [inline]
_copy_to_user+0x86/0xa0 lib/usercopy.c:41
copy_to_user include/linux/uaccess.h:191 [inline]
xfs_bulkstat_fmt+0x4f/0x120 fs/xfs/xfs_ioctl.c:744
xfs_bulkstat_one_int+0xd8b/0x12e0 fs/xfs/xfs_itable.c:161
xfs_bulkstat_iwalk+0x72/0xb0 fs/xfs/xfs_itable.c:239
xfs_iwalk_ag_recs+0x4c3/0x820 fs/xfs/xfs_iwalk.c:220
xfs_iwalk_run_callbacks+0x25b/0x490 fs/xfs/xfs_iwalk.c:376
xfs_iwalk_ag+0xad6/0xbd0 fs/xfs/xfs_iwalk.c:482
xfs_iwalk+0x360/0x6f0 fs/xfs/xfs_iwalk.c:584
xfs_bulkstat+0x4f8/0x6c0 fs/xfs/xfs_itable.c:308
xfs_ioc_bulkstat+0x3d0/0x450 fs/xfs/xfs_ioctl.c:867
xfs_file_ioctl+0x6a5/0x1980 fs/xfs/xfs_ioctl.c:1994
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:871 [inline]
__se_sys_ioctl+0xf8/0x170 fs/ioctl.c:857
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f02d4018b59
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffdbe0deb98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0030656c69662f2e RCX: 00007f02d4018b59
RDX: 000000002001fc40 RSI: 000000008040587f RDI: 0000000000000004
RBP: 00000000000116e3 R08: 0000000000000000 R09: 0000555556f914c0
R10: 0000000020000300 R11: 0000000000000246 R12: 00007ffdbe0debc0
R13: 00007ffdbe0debac R14: 431bde82d7b634db R15: 00007f02d406103b
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:jbd2__journal_start+0x87/0x5d0 fs/jbd2/transaction.c:496
Code: 74 63 48 8b 1b 48 85 db 74 79 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 63 4d 8f ff 48 8b 2b 48 89 e8 48 c1 e8 03 <42> 80 3c 28 00 74 08 48 89 ef e8 4a 4d 8f ff 4c 39 65 00 0f 85 1a
RSP: 0018:ffffc900043265c8 EFLAGS: 00010203
RAX: 000000000a8a4829 RBX: ffff8880205fa3a8 RCX: ffff8880235dbb80
RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff88801c1a6000
RBP: 000000005452414e R08: 0000000000000c40 R09: 0000000000000001
R10: dffffc0000000000 R11: ffffed1003834871 R12: ffff88801c1a6000
R13: dffffc0000000000 R14: 0000000000000c40 R15: 0000000000000002
FS: 0000555556f90380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020020000 CR3: 0000000021fed000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
0: 74 63 je 0x65
2: 48 8b 1b mov (%rbx),%rbx
5: 48 85 db test %rbx,%rbx
8: 74 79 je 0x83
a: 48 89 d8 mov %rbx,%rax
d: 48 c1 e8 03 shr $0x3,%rax
11: 42 80 3c 28 00 cmpb $0x0,(%rax,%r13,1)
16: 74 08 je 0x20
18: 48 89 df mov %rbx,%rdi
1b: e8 63 4d 8f ff call 0xff8f4d83
20: 48 8b 2b mov (%rbx),%rbp
23: 48 89 e8 mov %rbp,%rax
26: 48 c1 e8 03 shr $0x3,%rax
* 2a: 42 80 3c 28 00 cmpb $0x0,(%rax,%r13,1) <-- trapping instruction
2f: 74 08 je 0x39
31: 48 89 ef mov %rbp,%rdi
34: e8 4a 4d 8f ff call 0xff8f4d83
39: 4c 39 65 00 cmp %r12,0x0(%rbp)
3d: 0f .byte 0xf
3e: 85 1a test %ebx,(%rdx)


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

Dave Chinner

unread,
Jan 30, 2024, 6:37:25 PMJan 30
to syzbot, adilger...@dilger.ca, chanda...@oracle.com, ja...@suse.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
^^^^^^^^
Hmmmm - TRAN. That's looks suspicious, I'll come back to that.
EXt4 is triggering a BUG_ON:

handle_t *handle = journal_current_handle();
int err;

if (!journal)
return ERR_PTR(-EROFS);

if (handle) {
>>>>>>>>> J_ASSERT(handle->h_transaction->t_journal == journal);
handle->h_ref++;
return handle;
}

via a journal assert failure. It appears that current->journal_info
isn't what it is supposed to be. It's finding something with TRAN in
it, I think. I'll come back to this.

What syzbot is doing is creating a file on it's root filesystem and
write()ing 0x208e24b bytes (zeroes from an anonymous mmap() region,
I think) to it to initialise it's contents.

Then it mmap()s the ext4 file for 0xb36000 bytes and copies the raw
test filesystem image in the source code into it. It then creates a
memfd that it decompresses the data in the mapped ext4 file into and
creates a loop device that points to that memfd. It then mounts the
loop device and we get an XFS filesystem which doesn't appear to
contain any corruptions in it at all. It then runs a bulkstat pass
on the the XFS filesystem.

This is where it gets interesting. The user buffer that XFS
copies the inode data into points to a memory address inside the
range of the ext4 file that the filesystem image was copied to.
It does not overlap with the filesystem image.

Hence when XFS goes to copy the inodes into the user buffer, it
triggers write page faults on the ext4 backing file.

That's this part of the trace:
What is interesting here is this is running in an empty XFS
transaction context so that the bulkstat operation garbage collects
all the buffers it accesses without us having to explicit cleanup -
they all get released when we cancel the transaction context at the
end of the process.

But that means copy-out is running inside a transaction context, and
that means current->journal_info contains a pointer to the current
struct xfs_trans the bulkstat operation is using.

Guess what we have as the first item in a struct xfs_trans? That's
right, it's a magic number, and that magic number is:

#define XFS_TRANS_HEADER_MAGIC 0x5452414e /* TRAN */

It should be obvious what has happened now -
current->journal_info is not null, so ext4 thinks it owns the
structure attached there and panics when it finds that it isn't an
ext4 journal handle being held there.

I don't think there are any clear rules as to how filesystems can
and can't use current->journal_info. In general, a task can't jump
from one filesystem to another inside a transaction context like
this, so there's never been a serious concern about nested
current->journal_info assignments like this in the past.

XFS is doing nothing wrong - we're allowed to define transaction
contexts however we want and use current->journal_info in this way.
However, we have to acknowledge that ext4 has also done nothing
wrong by assuming current->journal_info should below to it if it is
not null. Indeed, XFS does the same thing.

The question here is what to do about this? The obvious solution is
to have save/restore semantics in the filesystem code that
sets/clears current->journal_info, and then filesystems can also do
the necessary "recursion into same filesystem" checks they need to
ensure that they aren't nesting transactions in a way that can
deadlock.

Maybe there are other options - should filesystems even be allowed to
trigger page faults when they have set current->journal_info?

What other potential avenues are there that could cause this sort of
transaction context nesting that we haven't realised exist? Does
ext4 data=jounral have problems like this in the data copy-in path?
What other filesystems allow page faults in transaction contexts?

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com

Darrick J. Wong

unread,
Jan 30, 2024, 10:46:36 PMJan 30
to Dave Chinner, syzbot, adilger...@dilger.ca, chanda...@oracle.com, ja...@suse.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
Getting late here, so this will be pretty terse--

Thinking narrowly about just xfs, I think this means that the
bulkstat/inumbers implementations need to allocate a bounce buffer to
format records into so that it can copy_to_user without any locks held.
We have no idea if the destination page is a file page or anonymous
memory or whatever. Or: Do we really need to set current->journal_info
for empty transactions?

> The question here is what to do about this? The obvious solution is
> to have save/restore semantics in the filesystem code that
> sets/clears current->journal_info, and then filesystems can also do
> the necessary "recursion into same filesystem" checks they need to
> ensure that they aren't nesting transactions in a way that can
> deadlock.

We don't know what locks might be held by whatever code set
journal_info. I don't see how we could push it aside sanely?

> Maybe there are other options - should filesystems even be allowed to
> trigger page faults when they have set current->journal_info?

I wonder if we ought to be checking current->journal_info in our own
page_mkwrite handler and throwing back an errno? I don't think we want
to go down the rabbithele of "someone else was running a transaction,
maybe we can proceed with an update anyway???".

> What other potential avenues are there that could cause this sort of
> transaction context nesting that we haven't realised exist? Does
> ext4 data=jounral have problems like this in the data copy-in path?
> What other filesystems allow page faults in transaction contexts?

Ugh, whackamole. :/

--D

Theodore Ts'o

unread,
Jan 30, 2024, 11:58:29 PMJan 30
to Dave Chinner, syzbot, adilger...@dilger.ca, chanda...@oracle.com, ja...@suse.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, syzkall...@googlegroups.com
On Wed, Jan 31, 2024 at 10:37:18AM +1100, Dave Chinner wrote:
> It should be obvious what has happened now -
> current->journal_info is not null, so ext4 thinks it owns the
> structure attached there and panics when it finds that it isn't an
> ext4 journal handle being held there.
>
> I don't think there are any clear rules as to how filesystems can
> and can't use current->journal_info. In general, a task can't jump
> from one filesystem to another inside a transaction context like
> this, so there's never been a serious concern about nested
> current->journal_info assignments like this in the past.
>
> XFS is doing nothing wrong - we're allowed to define transaction
> contexts however we want and use current->journal_info in this way.
> However, we have to acknowledge that ext4 has also done nothing
> wrong by assuming current->journal_info should below to it if it is
> not null. Indeed, XFS does the same thing.

Nice analysis. Fundamentally the current usage of
current->journal_info assumes that a process would only be calling
into one file system at a time. But obviously that's not going to be
true in the case of one file system writing to memory which then
triggers a page fault.

As far as other potential avenues that could cause this kind of
nesting, the other one which comes to mind might be sendfile(2) --
although in general the reader side won't trigger a transaction since
the atime update tends to be done lazily.

> The question here is what to do about this? The obvious solution is
> to have save/restore semantics in the filesystem code that
> sets/clears current->journal_info, and then filesystems can also do
> the necessary "recursion into same filesystem" checks they need to
> ensure that they aren't nesting transactions in a way that can
> deadlock.
>
> Maybe there are other options - should filesystems even be allowed to
> trigger page faults when they have set current->journal_info?

Hmm, could XFS pre-fault target memory buffer for the bulkstat output
before starting its transaction? Alternatively, ext4 could do a save
of current->journal_info before starting to process the page fault,
and restore it when it is done. Both of these seem a bit hacky, and
the question is indeed, are there other avenues that might cause the
transaction context nesting, such that a more general solution is
called for?

- Ted

Matthew Wilcox

unread,
Jan 31, 2024, 12:20:16 AMJan 31
to Theodore Ts'o, Dave Chinner, syzbot, adilger...@dilger.ca, chanda...@oracle.com, ja...@suse.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, syzkall...@googlegroups.com
On Tue, Jan 30, 2024 at 11:58:22PM -0500, Theodore Ts'o wrote:
> Hmm, could XFS pre-fault target memory buffer for the bulkstat output
> before starting its transaction? Alternatively, ext4 could do a save
> of current->journal_info before starting to process the page fault,
> and restore it when it is done. Both of these seem a bit hacky, and
> the question is indeed, are there other avenues that might cause the
> transaction context nesting, such that a more general solution is
> called for?

I'd suggest that saving off current->journal_info is risky because
it might cover a real problem where you've taken a pagefault inside
a transaction (eg ext4 faulting while in the middle of a transaction on
the same filesystem that contains the faulting file).

Seems to me that we shouldn't be writing to userspace while in the
middle of a transaction. We could even assert that in copy_to_user()?

Christoph Hellwig

unread,
Jan 31, 2024, 12:47:46 AMJan 31
to Matthew Wilcox, Theodore Ts'o, Dave Chinner, syzbot, adilger...@dilger.ca, chanda...@oracle.com, ja...@suse.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, syzkall...@googlegroups.com
On Wed, Jan 31, 2024 at 05:20:10AM +0000, Matthew Wilcox wrote:
> I'd suggest that saving off current->journal_info is risky because
> it might cover a real problem where you've taken a pagefault inside
> a transaction (eg ext4 faulting while in the middle of a transaction on
> the same filesystem that contains the faulting file).

Agreed.

> Seems to me that we shouldn't be writing to userspace while in the
> middle of a transaction. We could even assert that in copy_to_user()?

That sounds useful, but also rather expensive.

Dave Chinner

unread,
Jan 31, 2024, 1:02:30 AMJan 31
to Matthew Wilcox, Theodore Ts'o, syzbot, adilger...@dilger.ca, chanda...@oracle.com, ja...@suse.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, syzkall...@googlegroups.com
On Wed, Jan 31, 2024 at 05:20:10AM +0000, Matthew Wilcox wrote:
> On Tue, Jan 30, 2024 at 11:58:22PM -0500, Theodore Ts'o wrote:
> > Hmm, could XFS pre-fault target memory buffer for the bulkstat output
> > before starting its transaction? Alternatively, ext4 could do a save
> > of current->journal_info before starting to process the page fault,
> > and restore it when it is done. Both of these seem a bit hacky, and
> > the question is indeed, are there other avenues that might cause the
> > transaction context nesting, such that a more general solution is
> > called for?
>
> I'd suggest that saving off current->journal_info is risky because
> it might cover a real problem where you've taken a pagefault inside
> a transaction (eg ext4 faulting while in the middle of a transaction on
> the same filesystem that contains the faulting file).

Depends. Look at it from the POV of XFS:

1. we can identify current->journal_info as belonging to XFS because
of the TRAN magic number in the first 32 bits of the structure.

2. the struct xfs_trans has a pointer to the xfs_mount which points
to the VFS superblock, which means we can identify exactly which
filesystem instance the transaction belongs to.

3. We can determine if the transaction has a journal reservation
from the log ticket the transaction holds.

From these three things, we can identify if we are about to recurse
into the same filesystem, and if so, determine if it is safe to run
new transactions from within that context.

> Seems to me that we shouldn't be writing to userspace while in the
> middle of a transaction. We could even assert that in copy_to_user()?

On further thinking I don't think the problem is taking page faults
within a filesystem transaction context is the problem here. We're
using the transaction context for automated garbage collection so we
never have to care about leaking object references in the code, not
because we are running a modification and holding a journal
reservation. It's the journals reservations that cannot be allowed
to nest in XFS, not the transactions themselves.

IOWs, the real problem is that multiple filesystems use the same
task field for their own individual purposes, and that then leads
these sorts of recursion problems when a task jumps from one
filesystem context to another and the task field is then
mis-interpretted.

I've had a look at the XFS usage of current->journal_info, and I
think we can just remove it.

It's used for warning that we are running a writepages operations
from within transaction context (which should never happen from XFS
nor memory reclaim). it's also used in the ->destroy_inode path to
determine if we are running in a context where we cannot block on
filesystem locks or transaction reservations.

The former we can remove with no loss, the latter we can simply
check PF_MEMALLOC_NOFS as that is set by transaction contexts.

IOWs, I think that, in general, page faults in user buffers are fine
in empty XFS transaction contexts as long as we aren't using some
structure that some other filesystem might then to process the
page fault....

This may not be true for other filesystems, but I don't think we
can really say "page faults in any filesystem transaction are unsafe
and should be banned"....

-Dave.
--
Dave Chinner
da...@fromorbit.com

Christoph Hellwig

unread,
Jan 31, 2024, 1:18:07 AMJan 31
to Dave Chinner, Matthew Wilcox, Theodore Ts'o, syzbot, adilger...@dilger.ca, chanda...@oracle.com, ja...@suse.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, syzkall...@googlegroups.com
On Wed, Jan 31, 2024 at 05:02:25PM +1100, Dave Chinner wrote:
> This may not be true for other filesystems, but I don't think we
> can really say "page faults in any filesystem transaction are unsafe
> and should be banned"....

I think the point is page faults with current->journal_info set is
unsafe, as the can recurse into another file system using it. If we
don't set current->journal_info (and your ideas for that sound sensible
to me), the question of page faults in transactions is decoupled from
that. We just need to ensure we never recurse into a transaction in
the same or a dependent fs, which ot me suggest we'd better avoid it
if we easily can.

Edward Adam Davis

unread,
Jan 31, 2024, 2:40:51 AMJan 31
to syzbot+cdee56...@syzkaller.appspotmail.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
please test general protection fault in jbd2__journal_start

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index cb0b8d6fc0c6..702312cd5392 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -493,6 +493,9 @@ handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int rsv_blocks,
return ERR_PTR(-EROFS);

if (handle) {
+ if (handle->saved_alloc_context & ~PF_MEMALLOC_NOFS)
+ return ERR_PTR(-EBUSY);
+

syzbot

unread,
Jan 31, 2024, 6:17:06 AMJan 31
to ead...@qq.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+cdee56...@syzkaller.appspotmail.com

Tested on:

commit: 1bbb19b6 Merge tag 'erofs-for-6.8-rc3-fixes' of git://..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=16f6172fe80000
kernel config: https://syzkaller.appspot.com/x/.config?x=b0b9993d7d6d1990
dashboard link: https://syzkaller.appspot.com/bug?extid=cdee56dbcdf0096ef605
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=17477cb0180000

Note: testing is done by a robot and is best-effort only.

Jan Kara

unread,
Jan 31, 2024, 7:02:36 AMJan 31
to Dave Chinner, syzbot, adilger...@dilger.ca, chanda...@oracle.com, ja...@suse.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
Indeed, thanks for the great analysis.

<snip analysis>

> The question here is what to do about this? The obvious solution is
> to have save/restore semantics in the filesystem code that
> sets/clears current->journal_info, and then filesystems can also do
> the necessary "recursion into same filesystem" checks they need to
> ensure that they aren't nesting transactions in a way that can
> deadlock.

As others have mentioned, this seems potentially dangerous because that
just hides potential deadlocks. E.g. for ext4 taking a page fault while
having a transaction started violates lock ordering requirements
(mapping->invalidate_lock > transaction start). OTOH we have lockdep
tracking for this anyway so I guess we don't care too much for ext4.

> Maybe there are other options - should filesystems even be allowed to
> trigger page faults when they have set current->journal_info?

For ext4 it would definitely be a bug if this happens and it is not only
about usage of current->journal_info as I wrote above.

> What other potential avenues are there that could cause this sort of
> transaction context nesting that we haven't realised exist? Does
> ext4 data=jounral have problems like this in the data copy-in path?
> What other filesystems allow page faults in transaction contexts?

So I'm reasonably confident we aren't hitting any such path in ext4 as
lockdep would tell us about it (we treat transaction start as lock
acquisition in jbd2 and tell lockdep about it). For the write path, we are
relying on VFS prefaulting pages before calling ->write_begin (where we
start a transaction) and then doing atomic copy. For the read path we don't
start any transaction (except for possible atime update but that's just a
tiny transaction on the side after the read completes). So ext4 on its own
should be fine. But we have also btrfs, ceph, gfs2, nilfs2, ocfs2, and
reiserfs using current->journal_info so overall chances for interaction are
... non-trivial.

Honza
--
Jan Kara <ja...@suse.com>
SUSE Labs, CR

Edward Adam Davis

unread,
Jan 31, 2024, 7:10:49 AMJan 31
to syzbot+cdee56...@syzkaller.appspotmail.com, adilger...@dilger.ca, chanda...@oracle.com, ja...@suse.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
Before reusing the handle, it is necessary to confirm that the transaction is
ready.

Reported-and-tested-by: syzbot+cdee56...@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <ead...@qq.com>
---
fs/jbd2/transaction.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index cb0b8d6fc0c6..702312cd5392 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -493,6 +493,9 @@ handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int rsv_blocks,
return ERR_PTR(-EROFS);

if (handle) {
+ if (handle->saved_alloc_context & ~PF_MEMALLOC_NOFS)
+ return ERR_PTR(-EBUSY);
+
J_ASSERT(handle->h_transaction->t_journal == journal);
handle->h_ref++;
return handle;
--
2.43.0

Jan Kara

unread,
Jan 31, 2024, 10:41:57 AMJan 31
to Edward Adam Davis, syzbot+cdee56...@syzkaller.appspotmail.com, adilger...@dilger.ca, chanda...@oracle.com, ja...@suse.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
On Wed 31-01-24 20:04:27, Edward Adam Davis wrote:
> Before reusing the handle, it is necessary to confirm that the transaction is
> ready.
>
> Reported-and-tested-by: syzbot+cdee56...@syzkaller.appspotmail.com
> Signed-off-by: Edward Adam Davis <ead...@qq.com>

Sorry but no. Dave found a way to fix this particular problem in XFS and
your patch would not really improve anything because we'd just crash
when dereferencing handle->saved_alloc_context.

Honza


> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index cb0b8d6fc0c6..702312cd5392 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -493,6 +493,9 @@ handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int rsv_blocks,
> return ERR_PTR(-EROFS);
>
> if (handle) {
> + if (handle->saved_alloc_context & ~PF_MEMALLOC_NOFS)
> + return ERR_PTR(-EBUSY);
> +
> J_ASSERT(handle->h_transaction->t_journal == journal);
> handle->h_ref++;
> return handle;
> --
> 2.43.0
>
Reply all
Reply to author
Forward
0 new messages