[syzbot] [mm?] BUG: stack guard page was hit in v9fs_file_read_iter

syzbot

Nov 6, 2024, 9:08:27 AM
to asma...@codewreck.org, eri...@kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linu...@crudebyte.com, lu...@ionkov.net, syzkall...@googlegroups.com, v9...@lists.linux.dev
Hello,

syzbot found the following issue on:

HEAD commit: 2e1b3cc9d7f7 Merge tag 'arm-fixes-6.12-2' of git://git.ker..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=164f56a7980000
kernel config: https://syzkaller.appspot.com/x/.config?x=c0b2fb415081f288
dashboard link: https://syzkaller.appspot.com/bug?extid=1fc6f64c40a9d143cfb6
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15cf8e30580000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=17a27587980000

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-2e1b3cc9.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/6887ff647109/vmlinux-2e1b3cc9.xz
kernel image: https://storage.googleapis.com/syzbot-assets/958ab0c29314/bzImage-2e1b3cc9.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+1fc6f6...@syzkaller.appspotmail.com

BUG: TASK stack guard page was hit at ffffc9000482ff48 (stack is ffffc90004830000..ffffc90004838000)
Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 3 UID: 0 PID: 6237 Comm: syz-executor663 Not tainted 6.12.0-rc6-syzkaller-00077-g2e1b3cc9d7f7 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:mark_lock+0x25/0xc60 kernel/locking/lockdep.c:4686
Code: 90 90 90 90 90 55 48 89 e5 41 57 41 56 41 89 d6 48 ba 00 00 00 00 00 fc ff df 41 55 41 54 53 48 83 e4 f0 48 81 ec 10 01 00 00 <48> c7 44 24 30 b3 8a b5 41 48 8d 44 24 30 48 c7 44 24 38 c0 4d 7a
RSP: 0018:ffffc9000482ff50 EFLAGS: 00010086
RAX: 000000000000000c RBX: ffff8880306c2fba RCX: 0000000000000002
RDX: dffffc0000000000 RSI: ffff8880306c2f98 RDI: ffff8880306c2440
RBP: ffffc90004830088 R08: 0000000000000000 R09: 0000000000000006
R10: ffffffff96e2dd27 R11: 0000000000000000 R12: dffffc0000000000
R13: ffff8880306c2f98 R14: 0000000000000008 R15: ffff8880306c2440
FS: 00007fedf3b6e6c0(0000) GS:ffff88806a900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000482ff48 CR3: 000000002c910000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<#DF>
</#DF>
<TASK>
mark_usage kernel/locking/lockdep.c:4646 [inline]
__lock_acquire+0x906/0x3ce0 kernel/locking/lockdep.c:5156
lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5825
local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
___slab_alloc+0x123/0x1880 mm/slub.c:3695
__slab_alloc.constprop.0+0x56/0xb0 mm/slub.c:3908
__slab_alloc_node mm/slub.c:3961 [inline]
slab_alloc_node mm/slub.c:4122 [inline]
kmem_cache_alloc_noprof+0x2a7/0x2f0 mm/slub.c:4141
radix_tree_node_alloc.constprop.0+0x1e8/0x350 lib/radix-tree.c:253
idr_get_free+0x528/0xa40 lib/radix-tree.c:1506
idr_alloc_u32+0x191/0x2f0 lib/idr.c:46
idr_alloc+0xc1/0x130 lib/idr.c:87
p9_tag_alloc+0x394/0x870 net/9p/client.c:321
p9_client_prepare_req+0x19f/0x4d0 net/9p/client.c:644
p9_client_zc_rpc.constprop.0+0x105/0x880 net/9p/client.c:793
p9_client_read_once+0x443/0x820 net/9p/client.c:1570
p9_client_read+0x13f/0x1b0 net/9p/client.c:1534
v9fs_issue_read+0x115/0x310 fs/9p/vfs_addr.c:74
netfs_retry_read_subrequests fs/netfs/read_retry.c:60 [inline]
netfs_retry_reads+0x153a/0x1d00 fs/netfs/read_retry.c:232
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_dispatch_unbuffered_reads fs/netfs/direct_read.c:103 [inline]
netfs_unbuffered_read fs/netfs/direct_read.c:127 [inline]
netfs_unbuffered_read_iter_locked+0x12f6/0x19b0 fs/netfs/direct_read.c:221
netfs_unbuffered_read_iter+0xc5/0x100 fs/netfs/direct_read.c:256
v9fs_file_read_iter+0xbf/0x100 fs/9p/vfs_file.c:361
do_iter_readv_writev+0x614/0x7f0 fs/read_write.c:832
vfs_readv+0x4cf/0x890 fs/read_write.c:1025
do_preadv fs/read_write.c:1142 [inline]
__do_sys_preadv fs/read_write.c:1192 [inline]
__se_sys_preadv fs/read_write.c:1187 [inline]
__x64_sys_preadv+0x22d/0x310 fs/read_write.c:1187
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fedf3bd4dd9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fedf3b6e168 EFLAGS: 00000246 ORIG_RAX: 0000000000000127
RAX: ffffffffffffffda RBX: 00007fedf3c5e318 RCX: 00007fedf3bd4dd9
RDX: 0000000000000001 RSI: 00000000200015c0 RDI: 0000000000000003
RBP: 00007fedf3c5e310 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fedf3c5e31c
R13: 000000000000000b R14: 00007fffe9d355b0 R15: 00007fffe9d35698
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:mark_lock+0x25/0xc60 kernel/locking/lockdep.c:4686
Code: 90 90 90 90 90 55 48 89 e5 41 57 41 56 41 89 d6 48 ba 00 00 00 00 00 fc ff df 41 55 41 54 53 48 83 e4 f0 48 81 ec 10 01 00 00 <48> c7 44 24 30 b3 8a b5 41 48 8d 44 24 30 48 c7 44 24 38 c0 4d 7a
RSP: 0018:ffffc9000482ff50 EFLAGS: 00010086
RAX: 000000000000000c RBX: ffff8880306c2fba RCX: 0000000000000002
RDX: dffffc0000000000 RSI: ffff8880306c2f98 RDI: ffff8880306c2440
RBP: ffffc90004830088 R08: 0000000000000000 R09: 0000000000000006
R10: ffffffff96e2dd27 R11: 0000000000000000 R12: dffffc0000000000
R13: ffff8880306c2f98 R14: 0000000000000008 R15: ffff8880306c2440
FS: 00007fedf3b6e6c0(0000) GS:ffff88806a900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000482ff48 CR3: 000000002c910000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
0: 90 nop
1: 90 nop
2: 90 nop
3: 90 nop
4: 90 nop
5: 55 push %rbp
6: 48 89 e5 mov %rsp,%rbp
9: 41 57 push %r15
b: 41 56 push %r14
d: 41 89 d6 mov %edx,%r14d
10: 48 ba 00 00 00 00 00 movabs $0xdffffc0000000000,%rdx
17: fc ff df
1a: 41 55 push %r13
1c: 41 54 push %r12
1e: 53 push %rbx
1f: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
23: 48 81 ec 10 01 00 00 sub $0x110,%rsp
* 2a: 48 c7 44 24 30 b3 8a movq $0x41b58ab3,0x30(%rsp) <-- trapping instruction
31: b5 41
33: 48 8d 44 24 30 lea 0x30(%rsp),%rax
38: 48 rex.W
39: c7 .byte 0xc7
3a: 44 24 38 rex.R and $0x38,%al
3d: c0 .byte 0xc0
3e: 4d rex.WRB
3f: 7a .byte 0x7a


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite the report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Lizhi Xu

Nov 6, 2024, 10:12:43 PM
to syzbot+1fc6f6...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Add cond_resched() to avoid retrying too frequently when the rreq needs to retry.

#syz test

diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index b18c65ba5580..079ba61e24d1 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -512,9 +512,13 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq,
 				__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
 				__clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
 				set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
+				if (!was_async)
+					cond_resched();
 			} else if (!__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
 				__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
 				set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
+				if (!was_async)
+					cond_resched();
 			} else {
 				__set_bit(NETFS_SREQ_FAILED, &subreq->flags);
 				error = -ENODATA;

syzbot

Nov 6, 2024, 10:25:05 PM
to linux-...@vger.kernel.org, lizh...@windriver.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
BUG: stack guard page was hit in v9fs_file_read_iter

BUG: TASK stack guard page was hit at ffffc90003ed7ef8 (stack is ffffc90003ed8000..ffffc90003ee0000)
Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 3 UID: 0 PID: 6479 Comm: syz.1.23 Not tainted 6.12.0-rc6-syzkaller-gff7afaeca1a1-dirty #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:mark_lock+0x25/0xc60 kernel/locking/lockdep.c:4686
Code: 90 90 90 90 90 55 48 89 e5 41 57 41 56 41 89 d6 48 ba 00 00 00 00 00 fc ff df 41 55 41 54 53 48 83 e4 f0 48 81 ec 10 01 00 00 <48> c7 44 24 30 b3 8a b5 41 48 8d 44 24 30 48 c7 44 24 38 b0 4d 7a
RSP: 0018:ffffc90003ed7f00 EFLAGS: 00010086
RAX: 0000000000000004 RBX: ffff888031338b2a RCX: 0000000000000080
RDX: dffffc0000000000 RSI: ffff888031338b08 RDI: ffff888031338000
RBP: ffffc90003ed8040 R08: 0000000000000000 R09: fffffbfff2dc5b88
R10: ffffffff96e2dc47 R11: 0000000000000000 R12: 0000000000000002
R13: ffff888031338b08 R14: 0000000000000002 R15: ffff888031338000
FS: 00007f29143fe6c0(0000) GS:ffff88806a900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc90003ed7ef8 CR3: 00000000320be000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<#DF>
</#DF>
<TASK>
mark_usage kernel/locking/lockdep.c:4634 [inline]
__lock_acquire+0x8a3/0x3ce0 kernel/locking/lockdep.c:5156
lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5825
__raw_spin_trylock include/linux/spinlock_api_smp.h:90 [inline]
_raw_spin_trylock+0x63/0x80 kernel/locking/spinlock.c:138
spin_trylock include/linux/spinlock.h:361 [inline]
rmqueue_pcplist mm/page_alloc.c:3012 [inline]
rmqueue mm/page_alloc.c:3056 [inline]
get_page_from_freelist+0x34e/0x2d10 mm/page_alloc.c:3454
__alloc_pages_noprof+0x223/0x25a0 mm/page_alloc.c:4733
alloc_pages_mpol_noprof+0x2c9/0x610 mm/mempolicy.c:2265
alloc_slab_page mm/slub.c:2412 [inline]
allocate_slab mm/slub.c:2578 [inline]
new_slab+0x2c9/0x410 mm/slub.c:2631
___slab_alloc+0xdac/0x1880 mm/slub.c:3818
__slab_alloc.constprop.0+0x56/0xb0 mm/slub.c:3908
__slab_alloc_node mm/slub.c:3961 [inline]
slab_alloc_node mm/slub.c:4122 [inline]
__do_kmalloc_node mm/slub.c:4263 [inline]
__kmalloc_noprof+0x367/0x400 mm/slub.c:4276
kmalloc_noprof include/linux/slab.h:882 [inline]
p9_fcall_init+0x97/0x260 net/9p/client.c:233
p9_tag_alloc+0x21c/0x870 net/9p/client.c:300
netfs_dispatch_unbuffered_reads fs/netfs/direct_read.c:103 [inline]
netfs_unbuffered_read fs/netfs/direct_read.c:127 [inline]
netfs_unbuffered_read_iter_locked+0x12f6/0x19b0 fs/netfs/direct_read.c:221
netfs_unbuffered_read_iter+0xc5/0x100 fs/netfs/direct_read.c:256
v9fs_file_read_iter+0xbf/0x100 fs/9p/vfs_file.c:361
do_iter_readv_writev+0x614/0x7f0 fs/read_write.c:832
vfs_readv+0x4cf/0x890 fs/read_write.c:1025
do_preadv fs/read_write.c:1142 [inline]
__do_sys_preadv fs/read_write.c:1192 [inline]
__se_sys_preadv fs/read_write.c:1187 [inline]
__x64_sys_preadv+0x22d/0x310 fs/read_write.c:1187
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f291517e719
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f29143fe038 EFLAGS: 00000246 ORIG_RAX: 0000000000000127
RAX: ffffffffffffffda RBX: 00007f2915335f80 RCX: 00007f291517e719
RDX: 0000000000000001 RSI: 00000000200015c0 RDI: 0000000000000003
RBP: 00007f29151f139e R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f2915335f80 R15: 00007ffdecdb72b8
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:mark_lock+0x25/0xc60 kernel/locking/lockdep.c:4686
Code: 90 90 90 90 90 55 48 89 e5 41 57 41 56 41 89 d6 48 ba 00 00 00 00 00 fc ff df 41 55 41 54 53 48 83 e4 f0 48 81 ec 10 01 00 00 <48> c7 44 24 30 b3 8a b5 41 48 8d 44 24 30 48 c7 44 24 38 b0 4d 7a
RSP: 0018:ffffc90003ed7f00 EFLAGS: 00010086
RAX: 0000000000000004 RBX: ffff888031338b2a RCX: 0000000000000080
RDX: dffffc0000000000 RSI: ffff888031338b08 RDI: ffff888031338000
RBP: ffffc90003ed8040 R08: 0000000000000000 R09: fffffbfff2dc5b88
R10: ffffffff96e2dc47 R11: 0000000000000000 R12: 0000000000000002
R13: ffff888031338b08 R14: 0000000000000002 R15: ffff888031338000
FS: 00007f29143fe6c0(0000) GS:ffff88806a900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc90003ed7ef8 CR3: 00000000320be000 CR4: 0000000000352ef0
3d: b0 4d mov $0x4d,%al
3f: 7a .byte 0x7a


Tested on:

commit: ff7afaec Merge tag 'nfs-for-6.12-3' of git://git.linux..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=169d5f40580000
kernel config: https://syzkaller.appspot.com/x/.config?x=c0b2fb415081f288
dashboard link: https://syzkaller.appspot.com/bug?extid=1fc6f64c40a9d143cfb6
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=169cf6a7980000

Lizhi Xu

Nov 6, 2024, 11:10:26 PM
to syzbot+1fc6f6...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Add cond_resched() to avoid retrying too frequently when the rreq needs to retry.

#syz test

diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index b18c65ba5580..eb4fc1f62000 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -366,6 +366,7 @@ static void netfs_rreq_assess(struct netfs_io_request *rreq)
 	//netfs_rreq_is_still_valid(rreq);

 	if (test_and_clear_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags)) {
+		cond_resched();
 		netfs_retry_reads(rreq);
 		return;
 	}
@@ -512,9 +513,15 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq,
 				__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
 				__clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
 				set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
+				printk("async: %d, r: %p, %s\n", was_async, rreq, __func__);
+				if (!was_async)
+					cond_resched();
 			} else if (!__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
 				__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
 				set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
+				printk("async: %d, r: %p, %s\n", was_async, rreq, __func__);

syzbot

Nov 6, 2024, 11:23:04 PM
to linux-...@vger.kernel.org, lizh...@windriver.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
BUG: stack guard page was hit in v9fs_file_read_iter

BUG: TASK stack guard page was hit at ffffc9000e0e7ff8 (stack is ffffc9000e0e8000..ffffc9000e0f0000)
Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 3 UID: 0 PID: 9051 Comm: syz.2.783 Not tainted 6.12.0-rc6-syzkaller-gff7afaeca1a1-dirty #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:kasan_check_range+0x1a/0x1a0 mm/kasan/generic.c:188
Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 48 85 f6 0f 84 50 01 00 00 48 89 f8 41 54 44 0f b6 c2 48 01 f0 55 <53> 0f 82 c6 00 00 00 48 b8 ff ff ff ff ff 7f ff ff 48 39 f8 0f 83
RSP: 0018:ffffc9000e0e8000 EFLAGS: 00010086
RAX: ffffc9000e0e80e8 RBX: ffffc9000e0e8088 RCX: ffffffff813d6ebe
RDX: 0000000000000001 RSI: 0000000000000060 RDI: ffffc9000e0e8088
RBP: 0000000000000060 R08: 0000000000000001 R09: 0000000000000000
R10: ffff888050fd0d48 R11: 0000000000000000 R12: 0000000000000000
R13: ffffc9000e0e8148 R14: ffffc9000e0e8088 R15: ffffc9000e0e80b0
FS: 00007fee299f36c0(0000) GS:ffff88806a900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000e0e7ff8 CR3: 000000004eb82000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<#DF>
</#DF>
<TASK>
__asan_memset+0x23/0x50 mm/kasan/shadow.c:84
__unwind_start+0x2e/0x7f0 arch/x86/kernel/unwind_orc.c:688
unwind_start arch/x86/include/asm/unwind.h:64 [inline]
arch_stack_walk+0x74/0x100 arch/x86/kernel/stacktrace.c:24
stack_trace_save+0x95/0xd0 kernel/stacktrace.c:122
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
unpoison_slab_object mm/kasan/common.c:319 [inline]
__kasan_slab_alloc+0x89/0x90 mm/kasan/common.c:345
kasan_slab_alloc include/linux/kasan.h:247 [inline]
slab_post_alloc_hook mm/slub.c:4085 [inline]
slab_alloc_node mm/slub.c:4134 [inline]
kmem_cache_alloc_noprof+0x121/0x2f0 mm/slub.c:4141
radix_tree_node_alloc.constprop.0+0x1e8/0x350 lib/radix-tree.c:253
idr_get_free+0x528/0xa40 lib/radix-tree.c:1506
idr_alloc_u32+0x191/0x2f0 lib/idr.c:46
idr_alloc+0xc1/0x130 lib/idr.c:87
p9_tag_alloc+0x394/0x870 net/9p/client.c:321
p9_client_prepare_req+0x19f/0x4d0 net/9p/client.c:644
p9_client_zc_rpc.constprop.0+0x105/0x880 net/9p/client.c:793
p9_client_read_once+0x443/0x820 net/9p/client.c:1570
p9_client_read+0x13f/0x1b0 net/9p/client.c:1534
v9fs_issue_read+0x115/0x310 fs/9p/vfs_addr.c:74
netfs_retry_read_subrequests fs/netfs/read_retry.c:60 [inline]
netfs_retry_reads+0x153a/0x1d00 fs/netfs/read_retry.c:232
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5eb/0x890 fs/netfs/read_collect.c:372
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:408
netfs_dispatch_unbuffered_reads fs/netfs/direct_read.c:103 [inline]
netfs_unbuffered_read fs/netfs/direct_read.c:127 [inline]
netfs_unbuffered_read_iter_locked+0x12f6/0x19b0 fs/netfs/direct_read.c:221
netfs_unbuffered_read_iter+0xc5/0x100 fs/netfs/direct_read.c:256
v9fs_file_read_iter+0xbf/0x100 fs/9p/vfs_file.c:361
do_iter_readv_writev+0x614/0x7f0 fs/read_write.c:832
vfs_readv+0x4cf/0x890 fs/read_write.c:1025
do_preadv fs/read_write.c:1142 [inline]
__do_sys_preadv fs/read_write.c:1192 [inline]
__se_sys_preadv fs/read_write.c:1187 [inline]
__x64_sys_preadv+0x22d/0x310 fs/read_write.c:1187
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fee28b7e719
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fee299f3038 EFLAGS: 00000246 ORIG_RAX: 0000000000000127
RAX: ffffffffffffffda RBX: 00007fee28d36058 RCX: 00007fee28b7e719
RDX: 0000000000000001 RSI: 00000000200015c0 RDI: 0000000000000003
RBP: 00007fee28bf139e R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007fee28d36058 R15: 00007fffcb1ce608
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:kasan_check_range+0x1a/0x1a0 mm/kasan/generic.c:188
Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 48 85 f6 0f 84 50 01 00 00 48 89 f8 41 54 44 0f b6 c2 48 01 f0 55 <53> 0f 82 c6 00 00 00 48 b8 ff ff ff ff ff 7f ff ff 48 39 f8 0f 83
RSP: 0018:ffffc9000e0e8000 EFLAGS: 00010086
RAX: ffffc9000e0e80e8 RBX: ffffc9000e0e8088 RCX: ffffffff813d6ebe
RDX: 0000000000000001 RSI: 0000000000000060 RDI: ffffc9000e0e8088
RBP: 0000000000000060 R08: 0000000000000001 R09: 0000000000000000
R10: ffff888050fd0d48 R11: 0000000000000000 R12: 0000000000000000
R13: ffffc9000e0e8148 R14: ffffc9000e0e8088 R15: ffffc9000e0e80b0
FS: 00007fee299f36c0(0000) GS:ffff88806a900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000e0e7ff8 CR3: 000000004eb82000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
0: 90 nop
1: 90 nop
2: 90 nop
3: 90 nop
4: 90 nop
5: 90 nop
6: 90 nop
7: 90 nop
8: 90 nop
9: 90 nop
a: 90 nop
b: 90 nop
c: 90 nop
d: 90 nop
e: 90 nop
f: 90 nop
10: 66 0f 1f 00 nopw (%rax)
14: 48 85 f6 test %rsi,%rsi
17: 0f 84 50 01 00 00 je 0x16d
1d: 48 89 f8 mov %rdi,%rax
20: 41 54 push %r12
22: 44 0f b6 c2 movzbl %dl,%r8d
26: 48 01 f0 add %rsi,%rax
29: 55 push %rbp
* 2a: 53 push %rbx <-- trapping instruction
2b: 0f 82 c6 00 00 00 jb 0xf7
31: 48 b8 ff ff ff ff ff movabs $0xffff7fffffffffff,%rax
38: 7f ff ff
3b: 48 39 f8 cmp %rdi,%rax
3e: 0f .byte 0xf
3f: 83 .byte 0x83


Tested on:

commit: ff7afaec Merge tag 'nfs-for-6.12-3' of git://git.linux..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15134d87980000
kernel config: https://syzkaller.appspot.com/x/.config?x=c0b2fb415081f288
dashboard link: https://syzkaller.appspot.com/bug?extid=1fc6f64c40a9d143cfb6
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=14f73d5f980000

Lizhi Xu

Nov 7, 2024, 12:58:23 AM11/7/24
to syzbot+1fc6f6...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Add a limit to avoid retrying too frequently when the rreq needs to retry

#syz test

diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index c40e226053cc..b09a22442de6 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -182,6 +182,7 @@ static void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error
{
struct netfs_io_subrequest *subreq = priv;

+ printk("subreq: %p, transfed: %ld, %s\n", priv, transferred_or_error, __func__);
if (transferred_or_error < 0) {
netfs_read_subreq_terminated(subreq, transferred_or_error, was_async);
return;
@@ -295,6 +296,7 @@ static void netfs_read_to_pagecache(struct netfs_io_request *rreq)
netfs_stat(&netfs_n_rh_zero);
slice = netfs_prepare_read_iterator(subreq);
__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+ printk("1subreq: %p, transfed: %ld, %s\n", subreq, __func__);
netfs_read_subreq_terminated(subreq, 0, false);
goto done;
}
@@ -302,6 +304,7 @@ static void netfs_read_to_pagecache(struct netfs_io_request *rreq)
if (source == NETFS_READ_FROM_CACHE) {
trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
slice = netfs_prepare_read_iterator(subreq);
+ printk("subreq: %p, transfed: %ld, %s\n", subreq, __func__);
netfs_read_cache_to_pagecache(rreq, subreq);
goto done;
}
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index b18c65ba5580..f75429a4e743 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -465,6 +465,7 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq,
int error, bool was_async)
{
struct netfs_io_request *rreq = subreq->rreq;
+ static int rtt = 0;

switch (subreq->source) {
case NETFS_READ_FROM_CACHE:
@@ -506,15 +507,24 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq,
if (!error && subreq->transferred < subreq->len) {
if (test_bit(NETFS_SREQ_HIT_EOF, &subreq->flags)) {
trace_netfs_sreq(subreq, netfs_sreq_trace_hit_eof);
+ rtt = 0;
} else {
trace_netfs_sreq(subreq, netfs_sreq_trace_short);
if (subreq->transferred > subreq->consumed) {
+ rtt++;
+ if (rtt < 16) {
__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
__clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
+ }
+ printk("1async: %d, r: %p, transed: %lu, sub req length: %lu, retry times: %d, %s\n", was_async, rreq, subreq->transferred, subreq->len, rtt, __func__);
} else if (!__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
+ rtt++;
+ if (rtt < 16) {
__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
+ }
+ printk("async: %d, r: %p, transed: %lu, sub req length: %lu, retry times: %d, %s\n", was_async, rreq, subreq->transferred, subreq->len, rtt, __func__);
} else {
__set_bit(NETFS_SREQ_FAILED, &subreq->flags);
error = -ENODATA;
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 0b8086f58ad5..d80af1aa74e4 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -714,7 +714,7 @@ p9_virtio_create(struct p9_client *client, const char *devname, char *args)
mutex_unlock(&virtio_9p_lock);

if (!found) {
- pr_err("no channels available for device %s\n", devname);
+ pr_err_ratelimited("no channels available for device %s\n", devname);
return ret;
}

syzbot

Nov 7, 2024, 1:16:05 AM11/7/24
to linux-...@vger.kernel.org, lizh...@windriver.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-by: syzbot+1fc6f6...@syzkaller.appspotmail.com
Tested-by: syzbot+1fc6f6...@syzkaller.appspotmail.com

Tested on:

commit: ff7afaec Merge tag 'nfs-for-6.12-3' of git://git.linux..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=105cbd5f980000
kernel config: https://syzkaller.appspot.com/x/.config?x=c0b2fb415081f288
dashboard link: https://syzkaller.appspot.com/bug?extid=1fc6f64c40a9d143cfb6
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=1500cd87980000

Note: testing is done by a robot and is best-effort only.

Lizhi Xu

Nov 7, 2024, 4:35:59 AM11/7/24
to syzbot+1fc6f6...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Add a limit to avoid retrying too frequently when the rreq needs to retry

#syz test

diff --git a/fs/netfs/direct_read.c b/fs/netfs/direct_read.c
index b1a66a6e6bc2..1863258cd9db 100644
--- a/fs/netfs/direct_read.c
+++ b/fs/netfs/direct_read.c
@@ -87,6 +87,7 @@ static int netfs_dispatch_unbuffered_reads(struct netfs_io_request *rreq)

netfs_prepare_dio_read_iterator(subreq);
slice = subreq->len;
+ printk("subrq: %p, %s\n", subreq, __func__);
rreq->netfs_ops->issue_read(subreq);

size -= slice;
diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index 72a435e5fc6d..ac9ca11b091f 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -63,6 +63,7 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
pg_size = array_size(max_pages, sizeof(*pages));
pages = (void *)bv + bv_size - pg_size;

+ printk("bvsize: %lu, pg_size: %lu, cnt: %lu, np: %u, max_p: %u, %s\n", bv_size, pg_size, count, npages, max_pages, __func__);
while (count && npages < max_pages) {
ret = iov_iter_extract_pages(orig, &pages, count,
max_pages - npages, extraction_flags,
@@ -98,6 +99,7 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
}

iov_iter_bvec(new, orig->data_source, bv, npages, orig_len - count);
+ printk("ret: %d, npages: %u, orig len: %lu, count: %lu, %s\n", ret, npages, orig_len, count, __func__);
return npages;
}
EXPORT_SYMBOL_GPL(netfs_extract_user_iter);
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index b18c65ba5580..4e244dfb23bf 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -465,6 +465,7 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq,
int error, bool was_async)
{
struct netfs_io_request *rreq = subreq->rreq;
+ static int rtt = 0;

switch (subreq->source) {
case NETFS_READ_FROM_CACHE:
@@ -506,12 +507,18 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq,
if (!error && subreq->transferred < subreq->len) {
if (test_bit(NETFS_SREQ_HIT_EOF, &subreq->flags)) {
trace_netfs_sreq(subreq, netfs_sreq_trace_hit_eof);
+ rtt = 0;
} else {
trace_netfs_sreq(subreq, netfs_sreq_trace_short);
if (subreq->transferred > subreq->consumed) {
+ rtt++;
+ if (rtt < 50) {
__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
__clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
+ }
+ printk("subreq: %p, 1async: %d, r: %p, transed: %lu, sub req length: %lu, retry times: %d, subreq consume: %d, subreq list empty: %d, %s\n",
+ subreq, was_async, rreq, subreq->transferred, subreq->len, rtt, subreq->consumed, list_empty(&rreq->subrequests), __func__);
} else if (!__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 819c75233235..b7d22f04593c 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -83,6 +83,7 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
if (!err)
subreq->transferred += total;

+ printk("subreq: %p, err: %d, total: %d, transfed: %d, %s\n", subreq, err, total, subreq->transferred, __func__);
netfs_read_subreq_terminated(subreq, err, false);

syzbot

Nov 7, 2024, 4:53:05 AM11/7/24
to linux-...@vger.kernel.org, lizh...@windriver.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
kernel panic: corrupted stack end in corrupted

subreq: ffff88802e249680, 1async: 0, r: ffff888023be2d80, transed: 4096, sub req length: 16777088, retry times: 47, subreq consume: 0, subreq list empty: 0, netfs_read_subreq_terminated
subreq: ffff88802e249680, err: 0, total: 0, transfed: 4096, v9fs_issue_read
Kernel panic - not syncing: corrupted stack end detected inside scheduler
CPU: 1 UID: 0 PID: 19770 Comm: syz.0.14673 Not tainted 6.12.0-rc6-syzkaller-gff7afaeca1a1-dirty #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>


Tested on:

commit: ff7afaec Merge tag 'nfs-for-6.12-3' of git://git.linux..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1195bd5f980000
kernel config: https://syzkaller.appspot.com/x/.config?x=c0b2fb415081f288
dashboard link: https://syzkaller.appspot.com/bug?extid=1fc6f64c40a9d143cfb6
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=17b00ea7980000

Lizhi Xu

Nov 7, 2024, 8:47:53 PM11/7/24
to syzbot+1fc6f6...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
If we didn't read any new data, abandon the retry

#syz test

diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index c40e226053cc..a233412ba08f 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -233,6 +233,7 @@ static void netfs_read_to_pagecache(struct netfs_io_request *rreq)

subreq->start = start;
subreq->len = size;
+ subreq->rretry_times = 0;

atomic_inc(&rreq->nr_outstanding);
spin_lock_bh(&rreq->lock);
diff --git a/fs/netfs/direct_read.c b/fs/netfs/direct_read.c
index b1a66a6e6bc2..beb81e06d13b 100644
--- a/fs/netfs/direct_read.c
+++ b/fs/netfs/direct_read.c
@@ -66,6 +66,7 @@ static int netfs_dispatch_unbuffered_reads(struct netfs_io_request *rreq)
subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
subreq->start = start;
subreq->len = size;
+ subreq->rretry_times = 0;

atomic_inc(&rreq->nr_outstanding);
spin_lock_bh(&rreq->lock);
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index b18c65ba5580..b2c8d5df73f9 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -509,9 +509,15 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq,
} else {
trace_netfs_sreq(subreq, netfs_sreq_trace_short);
if (subreq->transferred > subreq->consumed) {
- __set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
- __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
- set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
+ /* if we didn't read new data, abandon retry*/
+ if (subreq->rretry_times && subreq->fresh_len) {
+ __set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
+ __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
+ set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
+ }
+ printk("subreq: %p, 1async: %d, rreq: %p, rreq transferred: %lu, sub req transed: %lu, "
+ "sub req length: %lu, retry times: %d, subreq consume: %d, subreq list empty: %d, %s\n",
+ subreq, was_async, rreq, rreq->transferred, subreq->transferred, subreq->len, rtt, subreq->consumed, list_empty(&rreq->subrequests), __func__);
} else if (!__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c
index 0350592ea804..d549b54de6ec 100644
--- a/fs/netfs/read_retry.c
+++ b/fs/netfs/read_retry.c
@@ -23,6 +23,8 @@ static void netfs_reissue_read(struct netfs_io_request *rreq,
atomic_inc(&rreq->nr_outstanding);
__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
+ printk("rq: %p, subrq: %p, len: %lu, consumed: %d, transfed: %lu, %s\n",
+ rreq, subreq, subreq->len, subreq->consumed, subreq->transferred, __func__);
subreq->rreq->netfs_ops->issue_read(subreq);
}

@@ -52,10 +54,12 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
!test_bit(NETFS_RREQ_COPY_TO_CACHE, &rreq->flags)) {
struct netfs_io_subrequest *subreq;

+ printk("rrq: %p, %s\n", rreq, __func__);
list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
if (test_bit(NETFS_SREQ_FAILED, &subreq->flags))
break;
if (__test_and_clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) {
+ subreq->rretry_times++;
netfs_reset_iter(subreq);
netfs_reissue_read(rreq, subreq);
}
@@ -183,6 +187,7 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
goto abandon;
subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
subreq->start = start;
+ subreq->rretry_times = 0;

/* We get two refs, but need just one. */
netfs_put_subrequest(subreq, false, netfs_sreq_trace_new);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 5eaceef41e6c..c0b1f058f09a 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -191,6 +191,8 @@ struct netfs_io_subrequest {
unsigned char curr_folio_order; /* Order of folio */
struct folio_queue *curr_folioq; /* Queue segment in which current folio resides */
unsigned long flags;
+ size_t fresh_len; /* The length of the data just read */
+ u8 rretry_times; /* The times of retry read */
#define NETFS_SREQ_COPY_TO_CACHE 0 /* Set if should copy the data to the cache */
#define NETFS_SREQ_CLEAR_TAIL 1 /* Set if the rest of the read should be cleared */
#define NETFS_SREQ_SEEK_DATA_READ 3 /* Set if ->read() should SEEK_DATA first */
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 819c75233235..6e33a3dfec40 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -80,8 +80,13 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
if (pos + total >= i_size_read(rreq->inode))
__set_bit(NETFS_SREQ_HIT_EOF, &subreq->flags);

- if (!err)
+ if (!err) {
subreq->transferred += total;
+ subreq->fresh_len = total;
+ } else
+ subreq->fresh_len = 0;
+
+ printk("subreq: %p, sub rq len: %lu, err: %d, total: %d, transfed: %d, %s\n", subreq, subreq->len, err, total, subreq->transferred, __func__);

netfs_read_subreq_terminated(subreq, err, false);
}

syzbot

Nov 7, 2024, 8:54:05 PM11/7/24
to linux-...@vger.kernel.org, lizh...@windriver.com, syzkall...@googlegroups.com
Hello,

syzbot tried to test the proposed patch but the build/boot failed:

fs/netfs/read_collect.c:522:119: error: 'rtt' undeclared (first use in this function)


Tested on:

commit: 906bd684 Merge tag 'spi-fix-v6.12-rc6' of git://git.ke..
git tree: upstream
kernel config: https://syzkaller.appspot.com/x/.config?x=c0b2fb415081f288
dashboard link: https://syzkaller.appspot.com/bug?extid=1fc6f64c40a9d143cfb6
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=13f96e30580000

Lizhi Xu

Nov 7, 2024, 9:18:56 PM11/7/24
to syzbot+1fc6f6...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
+ subreq, was_async, rreq, rreq->transferred, subreq->transferred, subreq->len, subreq->rretry_times, subreq->consumed, list_empty(&rreq->subrequests), __func__);

syzbot

Nov 7, 2024, 9:36:04 PM11/7/24
to linux-...@vger.kernel.org, lizh...@windriver.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-by: syzbot+1fc6f6...@syzkaller.appspotmail.com
Tested-by: syzbot+1fc6f6...@syzkaller.appspotmail.com

Tested on:

commit: 906bd684 Merge tag 'spi-fix-v6.12-rc6' of git://git.ke..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=162d6e30580000
kernel config: https://syzkaller.appspot.com/x/.config?x=20d60fe605153ebe
dashboard link: https://syzkaller.appspot.com/bug?extid=1fc6f64c40a9d143cfb6
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=108bfd5f980000

Lizhi Xu

Nov 7, 2024, 10:40:34 PM11/7/24
to syzbot+1fc6f6...@syzkaller.appspotmail.com, asma...@codewreck.org, eri...@kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linu...@crudebyte.com, lu...@ionkov.net, syzkall...@googlegroups.com, v9...@lists.linux.dev
syzkaller reported a recursive loop of three calls (netfs_rreq_assess,
netfs_retry_reads, netfs_rreq_terminated) during an unbuffered or direct
I/O read. [1]

netfs_read_subreq_terminated() only checks that the subrequest's
transferred count is greater than its consumed count before setting the
retry flag. There is no limit on the number of retries, and no check of
whether a retry actually read any new data. As a result, the recursion
eventually hits the stack guard page.

To avoid the issue, add a retry counter and the length of the data just
read to struct netfs_io_subrequest, and use them to assess the state of a
read request and decide whether to retry.

[1]
...
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Reported-and-tested-by: syzbot+1fc6f6...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1fc6f64c40a9d143cfb6
Signed-off-by: Lizhi Xu <lizh...@windriver.com>
---
fs/9p/vfs_addr.c | 5 ++++-
fs/netfs/buffered_read.c | 1 +
fs/netfs/direct_read.c | 1 +
fs/netfs/read_collect.c | 9 ++++++---
fs/netfs/read_retry.c | 2 ++
include/linux/netfs.h | 2 ++
6 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 819c75233235..9fcc77bc77bd 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -80,8 +80,11 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
if (pos + total >= i_size_read(rreq->inode))
__set_bit(NETFS_SREQ_HIT_EOF, &subreq->flags);

- if (!err)
+ if (!err) {
subreq->transferred += total;
+ subreq->fresh_len = total;
+ } else
+ subreq->fresh_len = 0;

netfs_read_subreq_terminated(subreq, err, false);
}
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index c40e226053cc..a233412ba08f 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -233,6 +233,7 @@ static void netfs_read_to_pagecache(struct netfs_io_request *rreq)

subreq->start = start;
subreq->len = size;
+ subreq->rretry_times = 0;

atomic_inc(&rreq->nr_outstanding);
spin_lock_bh(&rreq->lock);
diff --git a/fs/netfs/direct_read.c b/fs/netfs/direct_read.c
index b1a66a6e6bc2..beb81e06d13b 100644
--- a/fs/netfs/direct_read.c
+++ b/fs/netfs/direct_read.c
@@ -66,6 +66,7 @@ static int netfs_dispatch_unbuffered_reads(struct netfs_io_request *rreq)
subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
subreq->start = start;
subreq->len = size;
+ subreq->rretry_times = 0;

atomic_inc(&rreq->nr_outstanding);
spin_lock_bh(&rreq->lock);
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index b18c65ba5580..805e8f400797 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -509,9 +509,12 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq,
} else {
trace_netfs_sreq(subreq, netfs_sreq_trace_short);
if (subreq->transferred > subreq->consumed) {
- __set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
- __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
- set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
+ /* if we didn't read new data, abandon retry*/
+ if (subreq->rretry_times && subreq->fresh_len) {
+ __set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
+ __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
+ set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
+ }
} else if (!__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c
index 0350592ea804..7aa0420cb4c4 100644
--- a/fs/netfs/read_retry.c
+++ b/fs/netfs/read_retry.c
@@ -56,6 +56,7 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
if (test_bit(NETFS_SREQ_FAILED, &subreq->flags))
break;
if (__test_and_clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) {
+ subreq->rretry_times++;
netfs_reset_iter(subreq);
netfs_reissue_read(rreq, subreq);
}
@@ -183,6 +184,7 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
goto abandon;
subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
subreq->start = start;
+ subreq->rretry_times = 0;

/* We get two refs, but need just one. */
netfs_put_subrequest(subreq, false, netfs_sreq_trace_new);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 5eaceef41e6c..c0b1f058f09a 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -191,6 +191,8 @@ struct netfs_io_subrequest {
unsigned char curr_folio_order; /* Order of folio */
struct folio_queue *curr_folioq; /* Queue segment in which current folio resides */
unsigned long flags;
+ size_t fresh_len; /* The length of the data just read */
+ u8 rretry_times; /* The times of retry read */
#define NETFS_SREQ_COPY_TO_CACHE 0 /* Set if should copy the data to the cache */
#define NETFS_SREQ_CLEAR_TAIL 1 /* Set if the rest of the read should be cleared */
#define NETFS_SREQ_SEEK_DATA_READ 3 /* Set if ->read() should SEEK_DATA first */
--
2.43.0

asma...@codewreck.org

Nov 16, 2024, 1:34:47 AM11/16/24
to David Howells, Lizhi Xu, syzbot+1fc6f6...@syzkaller.appspotmail.com, eri...@kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linu...@crudebyte.com, lu...@ionkov.net, Christian Brauner, syzkall...@googlegroups.com, v9...@lists.linux.dev
David,

I see now you weren't in Cc, does that patch make sense to you?
https://lkml.kernel.org/r/20241108034020.3...@windriver.com


Lizhi Xu wrote on Fri, Nov 08, 2024 at 11:40:20AM +0800:
Subject: [PATCH] netfs: If didn't read new data then abandon retry

Since this ended up being a netfs patch, please run get_maintainer.pl on it
and add the appropriate maintainers next time.

(You might want to reword that subject line as well to make it more
direct ("fix infinite recursion" or something), but if English isn't
your forte maintainers probably can fix it up anyway...)
Dominique Martinet | Asmadeus

David Howells

Dec 9, 2024, 10:53:17 AM12/9/24
to Lizhi Xu, dhow...@redhat.com, syzbot+1fc6f6...@syzkaller.appspotmail.com, asma...@codewreck.org, eri...@kernel.org, Christian Brauner, linux-...@vger.kernel.org, linu...@kvack.org, linu...@crudebyte.com, lu...@ionkov.net, syzkall...@googlegroups.com, v9...@lists.linux.dev
Hi Lizhi,

I looked at your patch, but I think we can do a bit better as the retry
counter makes one of the subrequest flags redundant. Also, we should use
the retry counter on the write side - so see my attached take.

Note that this patch (and your patch) does not fix the actual root cause,
which is this bit:

if (atomic_dec_and_test(&rreq->nr_outstanding))
netfs_rreq_terminated(rreq, ...);

When we do a retry (or initial transmission, for that matter), we bump the
rreq->nr_outstanding counter to prevent the final cleanup phase running
before we've finished issuing the subrequests. The problem is if we hit 0,
we have to do the cleanup phase - but at the point we're hitting, we're
*in* the cleanup phase and end up repeating the retry cycle, hence the
recursion.

So I think it is still possible to trigger the issue if each retry reads,
say, a byte of data.
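
The nesting described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: every name in it
(model_req, model_issue, model_retry, model_terminated, model_run) is
hypothetical and is not kernel code, netfs_rreq_assess() is folded into
model_terminated(), and each read is assumed to complete synchronously
and deliver a fixed number of bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_req {
	int nr_outstanding;	/* stands in for rreq->nr_outstanding */
	size_t len;		/* bytes wanted */
	size_t transferred;	/* bytes obtained so far */
	size_t per_read;	/* bytes each (re)issue delivers */
	int depth;		/* current nesting of the cleanup phase */
	int max_depth;		/* deepest nesting observed */
};

static void model_terminated(struct model_req *r);

/* A synchronous read that may be short. */
static void model_issue(struct model_req *r)
{
	size_t left = r->len - r->transferred;

	r->transferred += left < r->per_read ? left : r->per_read;
}

static void model_retry(struct model_req *r)
{
	r->nr_outstanding++;		/* hold off cleanup while issuing */
	model_issue(r);
	if (--r->nr_outstanding == 0)	/* hit 0: cleanup must run ... */
		model_terminated(r);	/* ... but we are already inside it */
}

/* Cleanup phase: a short read leads straight back into a retry. */
static void model_terminated(struct model_req *r)
{
	r->depth++;
	if (r->depth > r->max_depth)
		r->max_depth = r->depth;
	if (r->transferred < r->len)
		model_retry(r);
	r->depth--;
}

/* Returns how deeply the cleanup phase nested for one request. */
int model_run(size_t len, size_t per_read)
{
	struct model_req r = { .len = len, .per_read = per_read };

	assert(per_read > 0);	/* zero progress would recurse forever here */
	r.nr_outstanding = 1;
	model_issue(&r);
	if (--r.nr_outstanding == 0)
		model_terminated(&r);
	return r.max_depth;
}
```

In this model the nesting depth is the number of reads needed to satisfy
the request, so model_run(64, 64) nests once while model_run(64, 1) nests
64 times. A check that only abandons retries which read nothing does not
bound that depth.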

So, should I just set a hard limit on retry_count in both read and
write? Say it hits 50, we always abandon it. However, this needs to be
tuned both for the stack size and the config options (e.g. KASAN) that are
employed.
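
A hard cap combined with the no-progress check could be sketched roughly
as follows. This is a standalone illustration, not the attached patch:
model_subreq, model_assess_short_read and MODEL_RETRY_LIMIT are
hypothetical names, the fields only mirror the ->retry_count and
made-progress ideas discussed here, and the value 50 is just the figure
floated above, not a tuned one.

```c
#include <stdbool.h>

/* Illustrative cap only; a real value would depend on stack size and
 * config options such as KASAN. */
#define MODEL_RETRY_LIMIT 50

/* Hypothetical stand-in for the per-subrequest retry state. */
struct model_subreq {
	unsigned int retry_count;	/* retries issued so far */
	bool made_progress;		/* last attempt moved at least one byte */
};

enum model_verdict { MODEL_RETRY, MODEL_ABANDON };

/* Decide what to do with a subrequest that came back short. */
enum model_verdict model_assess_short_read(struct model_subreq *s)
{
	/* Hard cap: bounds the recursion depth regardless of progress. */
	if (s->retry_count >= MODEL_RETRY_LIMIT)
		return MODEL_ABANDON;

	/* Already retried and the last attempt read nothing: give up. */
	if (s->retry_count > 0 && !s->made_progress)
		return MODEL_ABANDON;

	s->retry_count++;
	s->made_progress = false;	/* cleared before the reissue */
	return MODEL_RETRY;
}
```

With both checks, a transport that keeps returning one byte per attempt
is still cut off after MODEL_RETRY_LIMIT retries, which is the case the
no-progress check alone does not cover.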

I don't think I can just dump the reiteration of the cleanup phase off to a
work item because multiple reiterations may stack up (at least, I think
they do), and I have to correctly balance the cleanup phases with
nr_outstanding counter.

The correct solution is the one I've implemented on my netfs-writeback
branch which gets rid of nr_outstanding entirely and has a single work item
to do all the collection for a request rather than trying to coordinate
between multiple concurrent work items (one for each subreq).

David
---
commit d0906b4a4611709c02de610d3c34d6172aa28aaf
Author: David Howells <dhow...@redhat.com>
Date: Fri Nov 8 11:40:20 2024 +0800

netfs: Work around recursion by abandoning retry if nothing read

syzkaller reported recursion with a loop of three calls (netfs_rreq_assess,
netfs_retry_reads and netfs_rreq_terminated) hitting the limit of the stack
during an unbuffered or direct I/O read.

There are a number of issues:

(1) There is no limit on the number of retries.

(2) A subrequest is supposed to be abandoned if it does not transfer
anything (NETFS_SREQ_NO_PROGRESS), but that isn't checked under all
circumstances.

(3) The actual root cause, which is this:

if (atomic_dec_and_test(&rreq->nr_outstanding))
netfs_rreq_terminated(rreq, ...);

When we do a retry, we bump the rreq->nr_outstanding counter to
prevent the final cleanup phase running before we've finished
dispatching the retries. The problem is if we hit 0, we have to do
the cleanup phase - but we're in the cleanup phase and end up
repeating the retry cycle, hence the recursion.

Work around the problem by limiting the number of retries. This is based
on Lizhi Xu's patch[1], and makes the following changes:

(1) Replace NETFS_SREQ_NO_PROGRESS with NETFS_SREQ_MADE_PROGRESS and make
the filesystem set it if it managed to read or write at least one byte
of data. Clear this bit before issuing a subrequest.

(2) Add a ->retry_count member to the subrequest and increment it any time
we do a retry.

(3) Remove the NETFS_SREQ_RETRYING flag as it is superfluous with
->retry_count. If the latter is non-zero, we're doing a retry.

(4) Abandon a subrequest if retry_count is non-zero and we made no
progress.

(5) Use ->retry_count in both the write-side and the read-side.

The oops generated by KASAN looks something like:

BUG: TASK stack guard page was hit at ffffc9000482ff48 (stack is ffffc90004830000..ffffc90004838000)
Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN NOPTI
...
RIP: 0010:mark_lock+0x25/0xc60 kernel/locking/lockdep.c:4686
...
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Closes: https://syzkaller.appspot.com/bug?extid=1fc6f64c40a9d143cfb6
Signed-off-by: David Howells <dhow...@redhat.com>
Suggested-by: Lizhi Xu <lizh...@windriver.com>
cc: Dominique Martinet <asma...@codewreck.org>
cc: Jeff Layton <jla...@kernel.org>
cc: v9...@lists.linux.dev
cc: ne...@lists.linux.dev
cc: linux-...@vger.kernel.org
Link: https://lore.kernel.org/r/20241108034020.3...@windriver.com/ [1]

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 819c75233235..3bc9ce6c575e 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -57,6 +57,8 @@ static void v9fs_issue_write(struct netfs_io_subrequest *subreq)
int err, len;

len = p9_client_write(fid, subreq->start, &subreq->io_iter, &err);
+ if (len > 0)
+ __set_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
netfs_write_subrequest_terminated(subreq, len ?: err, false);
}

@@ -80,8 +82,10 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
if (pos + total >= i_size_read(rreq->inode))
__set_bit(NETFS_SREQ_HIT_EOF, &subreq->flags);

- if (!err)
+ if (!err) {
subreq->transferred += total;
+ __set_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
+ }

netfs_read_subreq_terminated(subreq, err, false);
}
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 34107b55f834..ccb6aa8027c5 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -122,7 +122,7 @@ static void afs_issue_write_worker(struct work_struct *work)
if (subreq->debug_index == 3)
return netfs_write_subrequest_terminated(subreq, -ENOANO, false);

- if (!test_bit(NETFS_SREQ_RETRYING, &subreq->flags)) {
+ if (!subreq->retry_count) {
set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
return netfs_write_subrequest_terminated(subreq, -EAGAIN, false);
}
@@ -149,6 +149,9 @@ static void afs_issue_write_worker(struct work_struct *work)
afs_wait_for_operation(op);
ret = afs_put_operation(op);
switch (ret) {
+ case 0:
+ __set_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
+ break;
case -EACCES:
case -EPERM:
case -ENOKEY:
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index 46ce3b7adf07..47ed3a5044e2 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -438,7 +438,7 @@ void netfs_read_subreq_progress(struct netfs_io_subrequest *subreq,
rreq->origin == NETFS_READPAGE ||
rreq->origin == NETFS_READ_FOR_WRITE)) {
netfs_consume_read_data(subreq, was_async);
- __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
+ __set_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
}
}
EXPORT_SYMBOL(netfs_read_subreq_progress);
@@ -497,7 +497,7 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq,
rreq->origin == NETFS_READPAGE ||
rreq->origin == NETFS_READ_FOR_WRITE)) {
netfs_consume_read_data(subreq, was_async);
- __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
+ __set_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
}
rreq->transferred += subreq->transferred;
}
@@ -511,10 +511,13 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq,
} else {
trace_netfs_sreq(subreq, netfs_sreq_trace_short);
if (subreq->transferred > subreq->consumed) {
- __set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
- __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
- set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
- } else if (!__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
+ /* If we didn't read new data, abandon retry. */
+ if (subreq->retry_count &&
+ test_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags)) {
+ __set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
+ set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
+ }
+ } else if (test_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags)) {
__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
set_bit(NETFS_RREQ_NEED_RETRY, &rreq->flags);
} else {
diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c
index 0350592ea804..0e72e9226fc8 100644
--- a/fs/netfs/read_retry.c
+++ b/fs/netfs/read_retry.c
@@ -56,6 +56,8 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
if (test_bit(NETFS_SREQ_FAILED, &subreq->flags))
break;
if (__test_and_clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) {
+ __clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
+ subreq->retry_count++;
netfs_reset_iter(subreq);
netfs_reissue_read(rreq, subreq);
}
@@ -137,7 +139,8 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
stream0->sreq_max_len = subreq->len;

__clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
- __set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
+ __clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
+ subreq->retry_count++;

spin_lock_bh(&rreq->lock);
list_add_tail(&subreq->rreq_link, &rreq->subrequests);
@@ -213,7 +216,6 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
subreq->error = -ENOMEM;
__clear_bit(NETFS_SREQ_FAILED, &subreq->flags);
__clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
- __clear_bit(NETFS_SREQ_RETRYING, &subreq->flags);
}
spin_lock_bh(&rreq->lock);
list_splice_tail_init(&queue, &rreq->subrequests);
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index 82290c92ba7a..ca3a11ed9b54 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -179,7 +179,6 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
struct iov_iter source = subreq->io_iter;

iov_iter_revert(&source, subreq->len - source.count);
- __set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
netfs_reissue_write(stream, subreq, &source);
}
@@ -234,7 +233,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
/* Renegotiate max_len (wsize) */
trace_netfs_sreq(subreq, netfs_sreq_trace_retry);
__clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
- __set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
+ subreq->retry_count++;
stream->prepare_write(subreq);

part = min(len, stream->sreq_max_len);
@@ -279,7 +278,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
subreq->start = start;
subreq->debug_index = atomic_inc_return(&wreq->subreq_counter);
subreq->stream_nr = to->stream_nr;
- __set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
+ subreq->retry_count = 1;

trace_netfs_sreq_ref(wreq->debug_id, subreq->debug_index,
refcount_read(&subreq->ref),
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index bf6d507578e5..ff0e82505a0b 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -244,6 +244,8 @@ void netfs_reissue_write(struct netfs_io_stream *stream,
iov_iter_advance(source, size);
iov_iter_truncate(&subreq->io_iter, size);

+ subreq->retry_count++;
+ __clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
netfs_do_issue_write(stream, subreq);
}
diff --git a/fs/smb/client/cifssmb.c b/fs/smb/client/cifssmb.c
index bd42a419458e..6cb1e81993f8 100644
--- a/fs/smb/client/cifssmb.c
+++ b/fs/smb/client/cifssmb.c
@@ -1319,14 +1319,16 @@ cifs_readv_callback(struct mid_q_entry *mid)
}

if (rdata->result == -ENODATA) {
- __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
rdata->result = 0;
+ __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
} else {
size_t trans = rdata->subreq.transferred + rdata->got_bytes;
if (trans < rdata->subreq.len &&
rdata->subreq.start + trans == ictx->remote_i_size) {
- __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
rdata->result = 0;
+ __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
+ } else if (rdata->got_bytes > 0) {
+ __set_bit(NETFS_SREQ_MADE_PROGRESS, &rdata->subreq.flags);
}
}

@@ -1670,10 +1672,13 @@ cifs_writev_callback(struct mid_q_entry *mid)
if (written > wdata->subreq.len)
written &= 0xFFFF;

- if (written < wdata->subreq.len)
+ if (written < wdata->subreq.len) {
result = -ENOSPC;
- else
+ } else {
result = written;
+ if (written > 0)
+ __set_bit(NETFS_SREQ_MADE_PROGRESS, &wdata->subreq.flags);
+ }
break;
case MID_REQUEST_SUBMITTED:
case MID_RETRY_NEEDED:
diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c
index 010eae9d6c47..458b53d1f9cb 100644
--- a/fs/smb/client/smb2pdu.c
+++ b/fs/smb/client/smb2pdu.c
@@ -4615,6 +4615,7 @@ smb2_readv_callback(struct mid_q_entry *mid)
__set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
rdata->result = 0;
}
+ __set_bit(NETFS_SREQ_MADE_PROGRESS, &rdata->subreq.flags);
}
trace_smb3_rw_credits(rreq_debug_id, subreq_debug_index, rdata->credits.value,
server->credits, server->in_flight,
@@ -4840,10 +4841,12 @@ smb2_writev_callback(struct mid_q_entry *mid)
if (written > wdata->subreq.len)
written &= 0xFFFF;

- if (written < wdata->subreq.len)
+ if (written < wdata->subreq.len) {
wdata->result = -ENOSPC;
- else
+ } else if (written > 0) {
wdata->subreq.len = written;
+ __set_bit(NETFS_SREQ_MADE_PROGRESS, &wdata->subreq.flags);
+ }
break;
case MID_REQUEST_SUBMITTED:
case MID_RETRY_NEEDED:
@@ -5012,7 +5015,7 @@ smb2_async_writev(struct cifs_io_subrequest *wdata)
}
#endif

- if (test_bit(NETFS_SREQ_RETRYING, &wdata->subreq.flags))
+ if (wdata->subreq.retry_count > 0)
smb2_set_replay(server, &rqst);

cifs_dbg(FYI, "async write at %llu %u bytes iter=%zx\n",
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 5eaceef41e6c..4083d77e3f39 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -185,6 +185,7 @@ struct netfs_io_subrequest {
short error; /* 0 or error that occurred */
unsigned short debug_index; /* Index in list (for debugging output) */
unsigned int nr_segs; /* Number of segs in io_iter */
+ u8 retry_count; /* The number of retries (0 on initial pass) */
enum netfs_io_source source; /* Where to read from/write to */
unsigned char stream_nr; /* I/O stream this belongs to */
unsigned char curr_folioq_slot; /* Folio currently being read */
@@ -194,14 +195,13 @@ struct netfs_io_subrequest {
#define NETFS_SREQ_COPY_TO_CACHE 0 /* Set if should copy the data to the cache */
#define NETFS_SREQ_CLEAR_TAIL 1 /* Set if the rest of the read should be cleared */
#define NETFS_SREQ_SEEK_DATA_READ 3 /* Set if ->read() should SEEK_DATA first */
-#define NETFS_SREQ_NO_PROGRESS 4 /* Set if we didn't manage to read any data */
+#define NETFS_SREQ_MADE_PROGRESS 4 /* Set if we transferred at least some data */
#define NETFS_SREQ_ONDEMAND 5 /* Set if it's from on-demand read mode */
#define NETFS_SREQ_BOUNDARY 6 /* Set if ends on hard boundary (eg. ceph object) */
#define NETFS_SREQ_HIT_EOF 7 /* Set if short due to EOF */
#define NETFS_SREQ_IN_PROGRESS 8 /* Unlocked when the subrequest completes */
#define NETFS_SREQ_NEED_RETRY 9 /* Set if the filesystem requests a retry */
-#define NETFS_SREQ_RETRYING 10 /* Set if we're retrying */
-#define NETFS_SREQ_FAILED 11 /* Set if the subreq failed unretryably */
+#define NETFS_SREQ_FAILED 10 /* Set if the subreq failed unretryably */
};

enum netfs_io_origin {

Lizhi Xu

Dec 13, 2024, 2:27:09 AM12/13/24
to dhow...@redhat.com, asma...@codewreck.org, bra...@kernel.org, eri...@kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linu...@crudebyte.com, lizh...@windriver.com, lu...@ionkov.net, syzbot+1fc6f6...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, v9...@lists.linux.dev
Will there be conflicts when reading and writing use the same flag to mark?
>
> (2) Add a ->retry_count member to the subrequest and increment it any time
> we do a retry.
>
> (3) Remove the NETFS_SREQ_RETRYING flag as it is superfluous with
> ->retry_count. If the latter is non-zero, we're doing a retry.
>
> (4) Abandon a subrequest if retry_count is non-zero and we made no
> progress.
>
> (5) Use ->retry_count in both the write-side and the read-size.

BR,
Lizhi

David Howells

Dec 13, 2024, 3:41:56 AM12/13/24
to Lizhi Xu, dhow...@redhat.com, asma...@codewreck.org, bra...@kernel.org, eri...@kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linu...@crudebyte.com, lu...@ionkov.net, syzbot+1fc6f6...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, v9...@lists.linux.dev
Lizhi Xu <lizh...@windriver.com> wrote:

> > (1) Replace NETFS_SREQ_NO_PROGRESS with NETFS_SREQ_MADE_PROGRESS and make
> > the filesystem set it if it managed to read or write at least one byte
> > of data. Clear this bit before issuing a subrequest.
> Will there be conflicts when reading and writing use the same flag to mark?

No, because, at the moment, a read done by a write (e.g. RMW with crypto) or a
write done by a read (e.g. writing just-read data to the cache) is handled
with an additional request structure, since the sets of regions involved may
differ (RMW only needs to read the unmodified ends, for example).

David

syzbot

Mar 31, 2025, 3:48:14 AM3/31/25
to syzkall...@googlegroups.com
Auto-closing this bug as obsolete.
No recent activity, existing reproducers are no longer triggering the issue.