Re: [PATCH v5 5/7] xfs: fill dirty folios on zero range of unwritten mappings

Lai, Yi

Dec 4, 2025, 10:01:30 PM
to Brian Foster, linux-...@vger.kernel.org, linu...@vger.kernel.org, linu...@kvack.org, h...@infradead.org, djw...@kernel.org, wi...@infradead.org, bra...@kernel.org, yi1...@intel.com, syzkall...@googlegroups.com
On Fri, Oct 03, 2025 at 09:46:39AM -0400, Brian Foster wrote:
> Use the iomap folio batch mechanism to select folios to zero on zero
> range of unwritten mappings. Trim the resulting mapping if the batch
> is filled (unlikely for current use cases) to distinguish between a
> range to skip and one that requires another iteration due to a full
> batch.
>
> Signed-off-by: Brian Foster <bfo...@redhat.com>
> Reviewed-by: Christoph Hellwig <h...@lst.de>
> Reviewed-by: "Darrick J. Wong" <djw...@kernel.org>
> ---
> fs/xfs/xfs_iomap.c | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 6a05e04ad5ba..535bf3b8705d 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1702,6 +1702,8 @@ xfs_buffered_write_iomap_begin(
> struct iomap *iomap,
> struct iomap *srcmap)
> {
> + struct iomap_iter *iter = container_of(iomap, struct iomap_iter,
> + iomap);
> struct xfs_inode *ip = XFS_I(inode);
> struct xfs_mount *mp = ip->i_mount;
> xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset);
> @@ -1773,6 +1775,7 @@ xfs_buffered_write_iomap_begin(
> */
> if (flags & IOMAP_ZERO) {
> xfs_fileoff_t eof_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
> + u64 end;
>
> if (isnullstartblock(imap.br_startblock) &&
> offset_fsb >= eof_fsb)
> @@ -1780,6 +1783,26 @@ xfs_buffered_write_iomap_begin(
> if (offset_fsb < eof_fsb && end_fsb > eof_fsb)
> end_fsb = eof_fsb;
>
> + /*
> + * Look up dirty folios for unwritten mappings within EOF.
> + * Providing this bypasses the flush iomap uses to trigger
> + * extent conversion when unwritten mappings have dirty
> + * pagecache in need of zeroing.
> + *
> + * Trim the mapping to the end pos of the lookup, which in turn
> + * was trimmed to the end of the batch if it became full before
> + * the end of the mapping.
> + */
> + if (imap.br_state == XFS_EXT_UNWRITTEN &&
> + offset_fsb < eof_fsb) {
> + loff_t len = min(count,
> + XFS_FSB_TO_B(mp, imap.br_blockcount));
> +
> + end = iomap_fill_dirty_folios(iter, offset, len);
> + end_fsb = min_t(xfs_fileoff_t, end_fsb,
> + XFS_B_TO_FSB(mp, end));
> + }
> +
> xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
> }
>
> --
> 2.51.0
>

Hi Brian Foster,

Greetings!

I used Syzkaller and found a possible deadlock in xfs_ilock in linux-next next-20251203.

After bisection, the first bad commit is:
"
77c475692c5e xfs: fill dirty folios on zero range of unwritten mappings
"

All detailed info can be found at:
https://github.com/laifryiee/syzkaller_logs/tree/main/251204_221645_xfs_ilock
Syzkaller repro code:
https://github.com/laifryiee/syzkaller_logs/tree/main/251204_221645_xfs_ilock/repro.c
Syzkaller repro syscall steps:
https://github.com/laifryiee/syzkaller_logs/tree/main/251204_221645_xfs_ilock/repro.prog
Syzkaller report:
https://github.com/laifryiee/syzkaller_logs/tree/main/251204_221645_xfs_ilock/repro.report
Kconfig(make olddefconfig):
https://github.com/laifryiee/syzkaller_logs/tree/main/251204_221645_xfs_ilock/kconfig_origin
Bisect info:
https://github.com/laifryiee/syzkaller_logs/tree/main/251204_221645_xfs_ilock/bisect_info.log
bzImage:
https://github.com/laifryiee/syzkaller_logs/raw/refs/heads/main/251204_221645_xfs_ilock/bzImage_b2c27842ba853508b0da00187a7508eb3a96c8f7
Issue dmesg:
https://github.com/laifryiee/syzkaller_logs/blob/main/251204_221645_xfs_ilock/b2c27842ba853508b0da00187a7508eb3a96c8f7_dmesg.log

"
[ 21.088994] ======================================================
[ 21.089362] WARNING: possible circular locking dependency detected
[ 21.089726] 6.18.0-next-20251203-b2c27842ba85 #1 Not tainted
[ 21.090060] ------------------------------------------------------
[ 21.090417] kswapd0/58 is trying to acquire lock:
[ 21.090697] ffff888028ff1f18 (&xfs_nondir_ilock_class){++++}-{4:4}, at: xfs_ilock+0x30f/0x390
[ 21.091235]
[ 21.091235] but task is already holding lock:
[ 21.091575] ffffffff8784b580 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xb7e/0x15c0
[ 21.092058]
[ 21.092058] which lock already depends on the new lock.
[ 21.092058]
[ 21.092524]
[ 21.092524] the existing dependency chain (in reverse order) is:
[ 21.092949]
[ 21.092949] -> #1 (fs_reclaim){+.+.}-{0:0}:
[ 21.093290] fs_reclaim_acquire+0x116/0x160
[ 21.093579] __kmalloc_cache_noprof+0x53/0x7e0
[ 21.093886] iomap_fill_dirty_folios+0x118/0x2c0
[ 21.094204] xfs_buffered_write_iomap_begin+0xf18/0x2150
[ 21.094552] iomap_iter+0x551/0xf40
[ 21.094798] iomap_zero_range+0x20b/0xa90
[ 21.095075] xfs_zero_range+0xb5/0x100
[ 21.095335] xfs_reflink_remap_prep+0x3d3/0xa90
[ 21.095643] xfs_file_remap_range+0x23c/0xdc0
[ 21.095944] vfs_clone_file_range+0x2b1/0xda0
[ 21.096243] ioctl_file_clone+0x6e/0x110
[ 21.096521] do_vfs_ioctl+0xcab/0x14d0
[ 21.096786] __x64_sys_ioctl+0x127/0x220
[ 21.097057] x64_sys_call+0x1280/0x21b0
[ 21.097331] do_syscall_64+0x6d/0x1180
[ 21.097607] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 21.097936]
[ 21.097936] -> #0 (&xfs_nondir_ilock_class){++++}-{4:4}:
[ 21.098334] __lock_acquire+0x14d1/0x2210
[ 21.098615] lock_acquire+0x170/0x2f0
[ 21.098869] down_write_nested+0x9a/0x210
[ 21.099145] xfs_ilock+0x30f/0x390
[ 21.099385] xfs_icwalk_ag+0xaec/0x1b60
[ 21.099652] xfs_icwalk+0x56/0xc0
[ 21.099892] xfs_reclaim_inodes_nr+0x1d3/0x2d0
[ 21.100192] xfs_fs_free_cached_objects+0x6a/0x90
[ 21.100506] super_cache_scan+0x415/0x570
[ 21.100794] do_shrink_slab+0x408/0x1030
[ 21.101069] shrink_slab+0x348/0x12f0
[ 21.101329] shrink_node+0xacc/0x2670
[ 21.101587] balance_pgdat+0xa2d/0x15c0
[ 21.101860] kswapd+0x5b9/0xab0
[ 21.102093] kthread+0x464/0x980
[ 21.102329] ret_from_fork+0x780/0x8f0
[ 21.102596] ret_from_fork_asm+0x1a/0x30
[ 21.102873]
[ 21.102873] other info that might help us debug this:
[ 21.102873]
[ 21.103335] Possible unsafe locking scenario:
[ 21.103335]
[ 21.103683]        CPU0                    CPU1
[ 21.103955]        ----                    ----
[ 21.104225]   lock(fs_reclaim);
[ 21.104428]                                lock(&xfs_nondir_ilock_class);
[ 21.104823]                                lock(fs_reclaim);
[ 21.105158]   lock(&xfs_nondir_ilock_class);
[ 21.105416]
[ 21.105416] *** DEADLOCK ***
[ 21.105416]
[ 21.105762] 2 locks held by kswapd0/58:
[ 21.105993] #0: ffffffff8784b580 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xb7e/0x15c0
[ 21.106487] #1: ffff88800fd580e0 (&type->s_umount_key#53){.+.+}-{4:4}, at: super_cache_scan+0x9f/0x570
[ 21.107047]
[ 21.107047] stack backtrace:
[ 21.107307] CPU: 1 UID: 0 PID: 58 Comm: kswapd0 Not tainted 6.18.0-next-20251203-b2c27842ba85 #1 PREEMPT(volu
[ 21.107319] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.q4
[ 21.107326] Call Trace:
[ 21.107335] <TASK>
[ 21.107338] dump_stack_lvl+0xea/0x150
[ 21.107352] dump_stack+0x19/0x20
[ 21.107359] print_circular_bug+0x283/0x350
[ 21.107370] check_noncircular+0x12d/0x150
[ 21.107383] __lock_acquire+0x14d1/0x2210
[ 21.107398] lock_acquire+0x170/0x2f0
[ 21.107407] ? xfs_ilock+0x30f/0x390
[ 21.107420] ? __cond_resched+0x37/0x50
[ 21.107434] down_write_nested+0x9a/0x210
[ 21.107445] ? xfs_ilock+0x30f/0x390
[ 21.107456] ? __pfx_down_write_nested+0x10/0x10
[ 21.107468] ? xfs_icwalk_ag+0xadf/0x1b60
[ 21.107482] ? xfs_icwalk_ag+0xaec/0x1b60
[ 21.107497] ? xfs_icwalk_ag+0xaec/0x1b60
[ 21.107510] xfs_ilock+0x30f/0x390
[ 21.107523] xfs_icwalk_ag+0xaec/0x1b60
[ 21.107542] ? __pfx_xfs_icwalk_ag+0x10/0x10
[ 21.107561] ? __pfx_xa_find+0x10/0x10
[ 21.107581] ? xfs_group_grab_next_mark+0x26a/0x520
[ 21.107605] ? __this_cpu_preempt_check+0x21/0x30
[ 21.107616] ? lock_release+0x14f/0x2a0
[ 21.107628] ? xfs_group_grab_next_mark+0x274/0x520
[ 21.107643] ? __pfx_xfs_group_grab_next_mark+0x10/0x10
[ 21.107662] ? __pfx_try_to_wake_up+0x10/0x10
[ 21.107678] ? lock_release+0x14f/0x2a0
[ 21.107689] xfs_icwalk+0x56/0xc0
[ 21.107704] xfs_reclaim_inodes_nr+0x1d3/0x2d0
[ 21.107718] ? __pfx_xfs_reclaim_inodes_nr+0x10/0x10
[ 21.107734] ? __this_cpu_preempt_check+0x21/0x30
[ 21.107744] ? __pfx_prune_icache_sb+0x10/0x10
[ 21.107762] xfs_fs_free_cached_objects+0x6a/0x90
[ 21.107777] super_cache_scan+0x415/0x570
[ 21.107794] do_shrink_slab+0x408/0x1030
[ 21.107813] shrink_slab+0x348/0x12f0
[ 21.107831] ? shrink_slab+0x160/0x12f0
[ 21.107845] ? __pfx_shrink_slab+0x10/0x10
[ 21.107866] shrink_node+0xacc/0x2670
[ 21.107888] ? __pfx_shrink_node+0x10/0x10
[ 21.107900] ? preempt_schedule_common+0x49/0xd0
[ 21.107913] balance_pgdat+0xa2d/0x15c0
[ 21.107929] ? __pfx_balance_pgdat+0x10/0x10
[ 21.107941] ? rcu_watching_snap_stopped_since+0x20/0xf0
[ 21.107975] kswapd+0x5b9/0xab0
[ 21.107990] ? __pfx_kswapd+0x10/0x10
[ 21.108002] ? _raw_spin_unlock_irqrestore+0x35/0x70
[ 21.108017] ? trace_hardirqs_on+0x26/0x130
[ 21.108040] ? __pfx_autoremove_wake_function+0x10/0x10
[ 21.108060] ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30
[ 21.108080] ? __kthread_parkme+0x1bc/0x260
[ 21.108094] ? __pfx_kswapd+0x10/0x10
[ 21.108107] ? __pfx_kswapd+0x10/0x10
[ 21.108120] kthread+0x464/0x980
[ 21.108128] ? __pfx_kthread+0x10/0x10
[ 21.108135] ? trace_hardirqs_on+0x26/0x130
[ 21.108149] ? _raw_spin_unlock_irq+0x3c/0x60
[ 21.108158] ? __pfx_kthread+0x10/0x10
[ 21.108167] ret_from_fork+0x780/0x8f0
[ 21.108177] ? __pfx_ret_from_fork+0x10/0x10
[ 21.108186] ? native_load_tls+0x16/0x50
[ 21.108199] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[ 21.108213] ? __switch_to+0x823/0x10b0
[ 21.108232] ? __pfx_kthread+0x10/0x10
[ 21.108240] ret_from_fork_asm+0x1a/0x30
[ 21.108257] </TASK>
[ 21.592826] repro: page allocation failure: order:0, mode:0x10cc0(GFP_KERNEL|__GFP_NORETRY), nodemask=(null),0
[ 21.593533] CPU: 1 UID: 0 PID: 727 Comm: repro Not tainted 6.18.0-next-20251203-b2c27842ba85 #1 PREEMPT(volun
[ 21.593545] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.q4
[ 21.593551] Call Trace:
[ 21.593554] <TASK>
[ 21.593557] dump_stack_lvl+0x121/0x150
[ 21.593572] dump_stack+0x19/0x20
[ 21.593582] warn_alloc+0x216/0x360
[ 21.593595] ? __pfx_warn_alloc+0x10/0x10
[ 21.593607] ? __pfx___alloc_pages_direct_compact+0x10/0x10
[ 21.593618] ? __drain_all_pages+0x27d/0x480
[ 21.593628] __alloc_pages_slowpath.constprop.0+0x1340/0x2230
[ 21.593644] ? __pfx___alloc_pages_slowpath.constprop.0+0x10/0x10
[ 21.593657] ? __might_sleep+0x108/0x160
[ 21.593680] __alloc_frozen_pages_noprof+0x47f/0x550
[ 21.593690] ? asm_sysvec_apic_timer_interrupt+0x1f/0x30
[ 21.593702] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10
[ 21.593716] ? policy_nodemask+0xf9/0x450
[ 21.593734] alloc_pages_mpol+0x236/0x4c0
[ 21.593746] ? __pfx_alloc_pages_mpol+0x10/0x10
[ 21.593758] ? alloc_frozen_pages_noprof+0x48/0x180
[ 21.593766] ? alloc_frozen_pages_noprof+0x51/0x180
[ 21.593775] alloc_frozen_pages_noprof+0xa9/0x180
[ 21.593783] alloc_pages_noprof+0x27/0xa0
[ 21.593791] kimage_alloc_pages+0x78/0x240
[ 21.593809] kimage_alloc_control_pages+0x1ca/0xa60
[ 21.593819] ? __pfx_kimage_alloc_control_pages+0x10/0x10
[ 21.593827] ? __sanitizer_cov_trace_cmp8+0x1c/0x30
[ 21.593844] do_kexec_load+0x39b/0x8c0
[ 21.593851] ? __might_fault+0xf1/0x1b0
[ 21.593868] ? __pfx_do_kexec_load+0x10/0x10
[ 21.593876] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[ 21.593887] ? _copy_from_user+0x75/0xa0
[ 21.593904] __x64_sys_kexec_load+0x1cc/0x240
[ 21.593913] x64_sys_call+0x1c90/0x21b0
[ 21.593922] do_syscall_64+0x6d/0x1180
[ 21.593930] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 21.593938] RIP: 0033:0x7f347b83ee5d
[ 21.593952] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 88
[ 21.593959] RSP: 002b:00007ffc6cb1d938 EFLAGS: 00000207 ORIG_RAX: 00000000000000f6
[ 21.593972] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f347b83ee5d
[ 21.593977] RDX: 0000200000000180 RSI: 0000000000000003 RDI: 0000000000000000
[ 21.593982] RBP: 00007ffc6cb1d950 R08: 00007ffc6cb1d3c0 R09: 00007ffc6cb1d950
[ 21.593986] R10: 0000000000000000 R11: 0000000000000207 R12: 00007ffc6cb1daa8
[ 21.593991] R13: 00000000004030f5 R14: 000000000040ee08 R15: 00007f347bb26000
[ 21.594000] </TASK>
"

Hope this could be insightful to you.

Regards,
Yi Lai

---

If you don't need the following environment to reproduce the problem, or if you
already have a reproduction environment, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
// start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
// You could change the bzImage_xxx as you want
// Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
You can use the command below to log in; there is no password for root.
ssh -p 10023 root@localhost

After logging in to the VM (virtual machine) successfully, you can build the
reproducer and transfer the binary to the VM as follows, then reproduce the problem in the VM:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/

Build the bzImage for the target kernel:
Please use the target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage // x should be equal to or less than the number of CPUs your PC has

Set the bzImage file in start3.sh above to load the target kernel in the VM.


Tips:
If you already have qemu-system-x86_64, please ignore the info below.
If you want to install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install
