[syzbot] [gfs2?] memory leak in gfs2_trans_begin (2)


syzbot

Nov 7, 2025, 2:30:43 AM
to agru...@redhat.com, gf...@lists.linux.dev, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: c2c2ccfd4ba7 Merge tag 'net-6.18-rc5' of git://git.kernel...
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11a39084580000
kernel config: https://syzkaller.appspot.com/x/.config?x=cb128cd5cb439809
dashboard link: https://syzkaller.appspot.com/bug?extid=63ba84f14f62e61a5fd0
compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=171a7812580000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1375dbcd980000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/b0451ba3fe41/disk-c2c2ccfd.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/d3e8c67119ab/vmlinux-c2c2ccfd.xz
kernel image: https://storage.googleapis.com/syzbot-assets/1d8e176e5054/bzImage-c2c2ccfd.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/1af9667b349a/mount_0.gz
fsck result: failed (log: https://syzkaller.appspot.com/x/fsck.log?x=131a7812580000)

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+63ba84...@syzkaller.appspotmail.com

BUG: memory leak
unreferenced object 0xffff888126cf1000 (size 144):
comm "syz.2.26", pid 6030, jiffies 4294942626
hex dump (first 32 bytes):
c0 ef 59 82 ff ff ff ff 05 00 00 00 db 1a 00 00 ..Y.............
0b 00 00 00 00 00 00 00 06 00 00 00 00 00 00 00 ................
backtrace (crc f56b339f):
kmemleak_alloc_recursive include/linux/kmemleak.h:44 [inline]
slab_post_alloc_hook mm/slub.c:4975 [inline]
slab_alloc_node mm/slub.c:5280 [inline]
kmem_cache_alloc_noprof+0x397/0x5a0 mm/slub.c:5287
gfs2_trans_begin+0x29/0xa0 fs/gfs2/trans.c:115
alloc_dinode fs/gfs2/inode.c:418 [inline]
gfs2_create_inode+0xca0/0x1890 fs/gfs2/inode.c:807
gfs2_atomic_open+0x98/0x190 fs/gfs2/inode.c:1387
atomic_open fs/namei.c:3656 [inline]
lookup_open fs/namei.c:3767 [inline]
open_last_lookups fs/namei.c:3895 [inline]
path_openat+0x13ef/0x1eb0 fs/namei.c:4131
do_filp_open+0x102/0x1f0 fs/namei.c:4161
do_sys_openat2+0xc1/0x140 fs/open.c:1437
do_sys_open fs/open.c:1452 [inline]
__do_sys_openat fs/open.c:1468 [inline]
__se_sys_openat fs/open.c:1463 [inline]
__x64_sys_openat+0xb2/0x100 fs/open.c:1463
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xa4/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f

connection error: failed to recv *flatrpc.ExecutorMessageRawT: EOF


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Edward Adam Davis

Nov 8, 2025, 2:05:19 AM
to syzbot+63ba84...@syzkaller.appspotmail.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
#syz test

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 115c4ac457e9..7bba7951dbdb 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -1169,11 +1169,13 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags)
 		 * never queued onto any of the ail lists. Here we add it to
 		 * ail1 just so that ail_drain() will find and free it.
 		 */
-		spin_lock(&sdp->sd_ail_lock);
-		if (tr && list_empty(&tr->tr_list))
-			list_add(&tr->tr_list, &sdp->sd_ail1_list);
-		spin_unlock(&sdp->sd_ail_lock);
-		tr = NULL;
+		if (gfs2_withdrawing(sdp)) {
+			spin_lock(&sdp->sd_ail_lock);
+			if (tr && list_empty(&tr->tr_list))
+				list_add(&tr->tr_list, &sdp->sd_ail1_list);
+			spin_unlock(&sdp->sd_ail_lock);
+			tr = NULL;
+		}
 		goto out_end;
 	}


syzbot

Nov 8, 2025, 2:34:04 AM
to ead...@qq.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-by: syzbot+63ba84...@syzkaller.appspotmail.com
Tested-by: syzbot+63ba84...@syzkaller.appspotmail.com

Tested on:

commit: e811c33b Merge tag 'drm-fixes-2025-11-08' of https://g..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14e60b42580000
kernel config: https://syzkaller.appspot.com/x/.config?x=cb128cd5cb439809
dashboard link: https://syzkaller.appspot.com/bug?extid=63ba84f14f62e61a5fd0
compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=1549117c580000

Note: testing is done by a robot and is best-effort only.

Edward Adam Davis

Nov 8, 2025, 4:13:44 AM
to syzbot+63ba84...@syzkaller.appspotmail.com, agru...@redhat.com, gf...@lists.linux.dev, linux-...@vger.kernel.org, syzkall...@googlegroups.com
According to log [1], a "bad magic number" was found when checking the
metatype, which caused gfs2 to withdraw.

The root cause is that the log flush treats a non-delayed withdraw as a
withdraw, so that nobody reclaims the memory of the transaction.
See the call stack below for details.

CPU1                              CPU2
====                              ====
gfs2_meta_buffer()
gfs2_metatype_check()
gfs2_metatype_check_i()
gfs2_metatype_check_ii()          gfs2_log_flush()
gfs2_withdraw()                   tr = sdp->sd_log_tr
signal_our_withdraw()             sdp->sd_log_tr = NULL
gfs2_ail_drain()                  goto out_withdraw
spin_unlock(&sdp->sd_ail_lock)    trans_drain()
                                  spin_lock(&sdp->sd_ail_lock)
                                  list_add(&tr->tr_list, &sdp->sd_ail1_list)
                                  tr = NULL
                                  goto out_end

Add a delayed withdraw check around the transaction handling in the
log flush to avoid similar memory leaks.

syzbot reported:
[1]
gfs2: fsid=syz:syz.0: fatal: invalid metadata block - bh = 9381 (bad magic number), function = gfs2_meta_buffer, file = fs/gfs2/meta_io.c, line = 499

[2]
BUG: memory leak
unreferenced object 0xffff888126cf1000 (size 144):
backtrace (crc f56b339f):
gfs2_trans_begin+0x29/0xa0 fs/gfs2/trans.c:115
alloc_dinode fs/gfs2/inode.c:418 [inline]
gfs2_create_inode+0xca0/0x1890 fs/gfs2/inode.c:807


Fixes: f5456b5d67cf ("gfs2: Clean up revokes on normal withdraws")
Reported-by: syzbot+63ba84...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=63ba84f14f62e61a5fd0
Tested-by: syzbot+63ba84...@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <ead...@qq.com>
---
fs/gfs2/log.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
--
2.43.0
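
The interleaving in the patch description can be reproduced in a small userspace model. This is only a sketch under loud assumptions: every `toy_*` name is hypothetical, the list is a plain singly linked list rather than the kernel's `list_head`, and no locking is modeled because the two orderings are executed sequentially. It shows only that a transaction parked on the list after the drain has already run is never freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct gfs2_trans. */
struct toy_trans {
	struct toy_trans *next;
};

/* Hypothetical stand-in for the relevant parts of struct gfs2_sbd. */
struct toy_sb {
	struct toy_trans *ail1;   /* models sd_ail1_list */
	struct toy_trans *log_tr; /* models sd_log_tr */
	int live;                 /* allocated and not yet freed */
};

/* Models gfs2_trans_begin(): allocate a transaction and attach it. */
static void toy_trans_begin(struct toy_sb *sb)
{
	struct toy_trans *tr = calloc(1, sizeof(*tr));

	assert(tr);
	sb->log_tr = tr;
	sb->live++;
}

/* Models gfs2_ail_drain(): free whatever is on the list right now. */
static void toy_ail_drain(struct toy_sb *sb)
{
	while (sb->ail1) {
		struct toy_trans *tr = sb->ail1;

		sb->ail1 = tr->next;
		free(tr);
		sb->live--;
	}
}

/* Models the withdraw branch of gfs2_log_flush(): detach, park on ail1. */
static void toy_flush_withdraw_path(struct toy_sb *sb)
{
	struct toy_trans *tr = sb->log_tr;

	sb->log_tr = NULL;
	if (tr) {
		tr->next = sb->ail1;
		sb->ail1 = tr;
	}
}

/* Returns how many transactions are still allocated after both steps. */
int toy_run(bool drain_first)
{
	struct toy_sb sb = {0};
	int leaked;

	toy_trans_begin(&sb);
	if (drain_first) {
		toy_ail_drain(&sb);           /* CPU1: list is still empty */
		toy_flush_withdraw_path(&sb); /* CPU2: parks tr too late */
	} else {
		toy_flush_withdraw_path(&sb);
		toy_ail_drain(&sb);
	}
	leaked = sb.live;
	toy_ail_drain(&sb); /* tidy up so the model itself does not leak */
	return leaked;
}
```

Calling `toy_run(true)` models the reported ordering and leaves one transaction allocated; `toy_run(false)` leaves none.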

Andreas Gruenbacher

Nov 8, 2025, 3:00:51 PM
to Edward Adam Davis, syzbot+63ba84...@syzkaller.appspotmail.com, gf...@lists.linux.dev, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

On Sat, Nov 8, 2025 at 10:13 AM Edward Adam Davis <ead...@qq.com> wrote:
> According to log [1], a "bad magic number" was found when checking the
> metatype, which caused gfs2 withdraw.
>
> The root cause of the problem is: log flush treats non-delayed withdraw
> as withdraw, resulting in no one reclaiming the memory of transaction.
> See the call stack below for details.
>
> CPU1                              CPU2
> ====                              ====
> gfs2_meta_buffer()
> gfs2_metatype_check()
> gfs2_metatype_check_i()
> gfs2_metatype_check_ii()          gfs2_log_flush()
> gfs2_withdraw()                   tr = sdp->sd_log_tr
> signal_our_withdraw()             sdp->sd_log_tr = NULL
> gfs2_ail_drain()                  goto out_withdraw
> spin_unlock(&sdp->sd_ail_lock)    trans_drain()
>                                   spin_lock(&sdp->sd_ail_lock)
>                                   list_add(&tr->tr_list, &sdp->sd_ail1_list)
>                                   tr = NULL
>                                   goto out_end
>

this bug report is against upstream commit c2c2ccfd4ba7, which
precedes the withdraw rework on gfs2's for-next branch. With those
patches, the race you are describing is no longer possible because
do_withdraw() now uses sdp->sd_log_flush_lock and the SDF_JOURNAL_LIVE
flag to synchronize with gfs2_log_flush().
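
The lock-plus-flag pattern described above can be sketched in userspace as follows. This is not the code on for-next: the `model_*` names are hypothetical, a plain pthread mutex stands in for sd_log_flush_lock, and a boolean stands in for SDF_JOURNAL_LIVE. The point is only that a flush which checks the flag under the same lock the withdraw path takes can never start after the withdraw has cleared it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the superblock state involved. */
struct model_sb {
	pthread_mutex_t flush_lock; /* stands in for sd_log_flush_lock */
	bool journal_live;          /* stands in for SDF_JOURNAL_LIVE */
	int flushes_done;
};

void model_init(struct model_sb *sb)
{
	pthread_mutex_init(&sb->flush_lock, NULL);
	sb->journal_live = true;
	sb->flushes_done = 0;
}

/*
 * Withdraw side: taking the lock waits for any flush in progress,
 * and clearing the flag under the lock stops later flushes.
 */
void model_withdraw(struct model_sb *sb)
{
	pthread_mutex_lock(&sb->flush_lock);
	sb->journal_live = false;
	pthread_mutex_unlock(&sb->flush_lock);
}

/* Flush side: returns true if the flush ran, false if it was skipped. */
bool model_log_flush(struct model_sb *sb)
{
	bool ran = false;

	pthread_mutex_lock(&sb->flush_lock);
	if (sb->journal_live) {
		sb->flushes_done++; /* the real flush work would go here */
		ran = true;
	}
	pthread_mutex_unlock(&sb->flush_lock);
	return ran;
}
```

After `model_init()`, a `model_log_flush()` call runs; once `model_withdraw()` has returned, further `model_log_flush()` calls are skipped.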

I don't know why Bob chose to push the transaction onto the ail1 list
instead of freeing it in gfs2_log_flush(); that's something to clean
up. I've pushed an untested patch doing that to for-later.
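
The cleanup idea (free the transaction in the flush path instead of parking it on ail1) could look roughly like this in a userspace model. This is a guess at the shape, not the patch on for-later: the `sketch_*` names and the `queued` field are hypothetical, and the model assumes a transaction that was never queued onto any ail list has no other owner.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct gfs2_trans. */
struct sketch_trans {
	int queued; /* nonzero if already on an ail list (never set here) */
};

/* Hypothetical stand-in for the relevant parts of struct gfs2_sbd. */
struct sketch_sb {
	struct sketch_trans *log_tr; /* models sd_log_tr */
	int live;                    /* allocated and not yet freed */
};

/* Models gfs2_trans_begin(): allocate a transaction and attach it. */
static void sketch_trans_begin(struct sketch_sb *sb)
{
	struct sketch_trans *tr = calloc(1, sizeof(*tr));

	assert(tr);
	sb->log_tr = tr;
	sb->live++;
}

/*
 * Withdraw branch of the flush: detach the transaction and, if nothing
 * else has queued it, free it directly. No later drain is needed, so the
 * ordering against the drain no longer matters.
 */
static void sketch_flush_on_withdraw(struct sketch_sb *sb)
{
	struct sketch_trans *tr = sb->log_tr;

	sb->log_tr = NULL;
	if (tr && !tr->queued) {
		free(tr);
		sb->live--;
	}
}

/* Returns how many transactions remain allocated after the flush. */
int sketch_live_after_flush(void)
{
	struct sketch_sb sb = {0};

	sketch_trans_begin(&sb);
	sketch_flush_on_withdraw(&sb);
	return sb.live;
}
```

In this model `sketch_live_after_flush()` returns 0 regardless of whether a drain ever runs.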

Related commits:
58e08e8d83ab ("gfs2: fix trans slab error when withdraw occurs inside log_flush")
f5456b5d67cf ("gfs2: Clean up revokes on normal withdraws")

Thanks,
Andreas

syzbot

Nov 8, 2025, 5:43:12 PM
to linux-...@vger.kernel.org, syzkall...@googlegroups.com
For archival purposes, forwarding an incoming command email to
linux-...@vger.kernel.org, syzkall...@googlegroups.com.

***

Subject: Re: [syzbot] [gfs2?] memory leak in gfs2_trans_begin (2)
Author: agru...@redhat.com

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git withdraw

syzbot

Nov 8, 2025, 5:59:05 PM
to agru...@redhat.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
kernel BUG in do_xmote

gfs2: fsid=syz:syz.0: H: s:EX f:nW e:0 p:7821 [syz.6.189] gfs2_iomap_begin_write fs/gfs2/bmap.c:1040 [inline]
gfs2: fsid=syz:syz.0: H: s:EX f:nW e:0 p:7821 [syz.6.189] gfs2_iomap_begin+0x3e6/0x8a0 fs/gfs2/bmap.c:1133
gfs2: fsid=syz:syz.0: R: n:8336 f:80000000 b:70/70 i:7 q:0 r:0 e:7055
------------[ cut here ]------------
kernel BUG at fs/gfs2/glock.c:674!
Oops: invalid opcode: 0000 [#1] SMP PTI
CPU: 0 UID: 0 PID: 7389 Comm: kworker/0:2H Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025
Workqueue: gfs2-glock/syz:syz glock_work_func
RIP: 0010:do_xmote+0x33d/0x360 fs/gfs2/glock.c:674
Code: 03 00 e9 cf fd ff ff e8 c1 85 09 ff 83 43 24 01 e9 53 ff ff ff e8 b3 85 09 ff ba 01 00 00 00 48 89 de 31 ff e8 f4 c9 ff ff 90 <0f> 0b e8 9c 85 09 ff ba 01 00 00 00 48 89 de 31 ff e8 dd c9 ff ff
RSP: 0018:ffffc9000a073d88 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88812e8a7728 RCX: ffffffff825ac696
RDX: ffff888102c61180 RSI: ffffffff8257e401 RDI: ffff88812d388afc
RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 205d393833375420 R12: ffff8881087d0000
R13: 0000000000000001 R14: ffffffff857d0580 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff8881b25c4000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000200000c1b000 CR3: 0000000119f7c000 CR4: 00000000003526f0
Call Trace:
<TASK>
run_queue+0x21a/0x310 fs/gfs2/glock.c:793
glock_work_func+0xac/0x280 fs/gfs2/glock.c:1002
process_one_work+0x26b/0x620 kernel/workqueue.c:3263
process_scheduled_works kernel/workqueue.c:3346 [inline]
worker_thread+0x2c4/0x4f0 kernel/workqueue.c:3427
kthread+0x15b/0x310 kernel/kthread.c:463
ret_from_fork+0x210/0x240 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:do_xmote+0x33d/0x360 fs/gfs2/glock.c:674
Code: 03 00 e9 cf fd ff ff e8 c1 85 09 ff 83 43 24 01 e9 53 ff ff ff e8 b3 85 09 ff ba 01 00 00 00 48 89 de 31 ff e8 f4 c9 ff ff 90 <0f> 0b e8 9c 85 09 ff ba 01 00 00 00 48 89 de 31 ff e8 dd c9 ff ff
RSP: 0018:ffffc9000a073d88 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88812e8a7728 RCX: ffffffff825ac696
RDX: ffff888102c61180 RSI: ffffffff8257e401 RDI: ffff88812d388afc
RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 205d393833375420 R12: ffff8881087d0000
R13: 0000000000000001 R14: ffffffff857d0580 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff8881b25c4000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000200000c1b000 CR3: 0000000119f7c000 CR4: 00000000003526f0


Tested on:

commit: 17448d78 gfs2: Clean up SDF_JOURNAL_LIVE flag handling
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git withdraw
console output: https://syzkaller.appspot.com/x/log.txt?x=12da8b42580000
kernel config: https://syzkaller.appspot.com/x/.config?x=cb128cd5cb439809
dashboard link: https://syzkaller.appspot.com/bug?extid=63ba84f14f62e61a5fd0
compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40

Note: no patches were applied.