[syzbot] [mm?] WARNING: bad unlock balance in do_wp_page


syzbot

4:17 AM (10 hours ago)
to Liam.H...@oracle.com, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, l...@kernel.org, shakee...@linux.dev, sur...@google.com, syzkall...@googlegroups.com, vba...@kernel.org
Hello,

syzbot found the following issue on:

HEAD commit: 6596a02b2078 Merge tag 'drm-next-2026-04-22' of https://gi..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12483702580000
kernel config: https://syzkaller.appspot.com/x/.config?x=24c8da4692f901cb
dashboard link: https://syzkaller.appspot.com/bug?extid=7d60b33a8a546263da7c
compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
userspace arch: i386

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-6596a02b.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/c5a50c04af50/vmlinux-6596a02b.xz
kernel image: https://storage.googleapis.com/syzbot-assets/70da0dbf8561/bzImage-6596a02b.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+7d60b3...@syzkaller.appspotmail.com

=====================================
WARNING: bad unlock balance detected!
syzkaller #0 Not tainted
-------------------------------------
dhcpcd-run-hook/5941 is trying to release lock (rcu_read_lock) at:
[<ffffffff8258a32d>] rcu_read_unlock+0x2d/0xb0 include/linux/rcupdate.h:867
but there are no more locks to release!

other info that might help us debug this:
1 lock held by dhcpcd-run-hook/5941:
#0: ffff8880440f3d48 (vm_lock){++++}-{0:0}, at: lock_vma_under_rcu+0x11d/0x590 mm/mmap_lock.c:310

stack backtrace:
CPU: 2 UID: 0 PID: 5941 Comm: dhcpcd-run-hook Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
print_unlock_imbalance_bug.part.0+0xfb/0x106 kernel/locking/lockdep.c:5298
print_unlock_imbalance_bug kernel/locking/lockdep.c:5278 [inline]
__lock_release kernel/locking/lockdep.c:5537 [inline]
lock_release kernel/locking/lockdep.c:5889 [inline]
lock_release+0x28d/0x310 kernel/locking/lockdep.c:5875
rcu_read_unlock+0x32/0xb0 include/linux/rcupdate.h:867
pte_unmap include/linux/pgtable.h:117 [inline]
wp_page_copy mm/memory.c:3960 [inline]
do_wp_page+0x13d7/0x4350 mm/memory.c:4320
handle_pte_fault mm/memory.c:6427 [inline]
__handle_mm_fault+0x1ab6/0x2a00 mm/memory.c:6549
handle_mm_fault+0x36d/0xa20 mm/memory.c:6718
do_user_addr_fault+0x5a3/0x12f0 arch/x86/mm/fault.c:1334
handle_page_fault arch/x86/mm/fault.c:1474 [inline]
exc_page_fault+0x6f/0xd0 arch/x86/mm/fault.c:1527
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618
RIP: 0033:0x7f7501c33f87
Code: 5c 25 28 49 8b 57 10 48 85 db 74 27 8b 74 24 0c 23 73 08 44 39 f6 75 16 48 39 c2 75 05 e8 62 ff ff ff 48 8b 53 10 48 83 c0 08 <48> 89 50 f8 48 8b 1b eb d0 49 83 c4 08 49 81 fc 38 01 00 00 75 be
RSP: 002b:00007ffd29921270 EFLAGS: 00010212
RAX: 00005645dcc81140 RBX: 00005645dcc716d0 RCX: 0000000000000002
RDX: 00007ffd29925f83 RSI: 0000000000000001 RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000008
R13: 00005645dcc70e60 R14: 0000000000000001 R15: 00005645dcc70c30
</TASK>
------------[ cut here ]------------
rrln < 0 || rrln > RCU_NEST_PMAX
WARNING: kernel/rcu/tree_plugin.h:443 at __rcu_read_unlock kernel/rcu/tree_plugin.h:443 [inline], CPU#3: dhcpcd-run-hook/5941
WARNING: kernel/rcu/tree_plugin.h:443 at __rcu_read_unlock+0x235/0x5e0 kernel/rcu/tree_plugin.h:430, CPU#3: dhcpcd-run-hook/5941
Modules linked in:
CPU: 3 UID: 0 PID: 5941 Comm: dhcpcd-run-hook Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:__rcu_read_unlock kernel/rcu/tree_plugin.h:443 [inline]
RIP: 0010:__rcu_read_unlock+0x235/0x5e0 kernel/rcu/tree_plugin.h:430
Code: 74 11 c7 45 58 01 00 00 00 bf 09 00 00 00 e8 12 a5 da ff e8 9d e2 22 00 9c 58 f6 c4 02 0f 85 dd 02 00 00 fb e9 57 fe ff ff 90 <0f> 0b 90 5b 5d 41 5c 41 5d 41 5e 41 5f e9 d9 14 ad 09 e8 44 64 87
RSP: 0000:ffffc900049c7af0 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff888029ec4a00 RCX: ffffffff81e80bfe
RDX: 0000000000000000 RSI: ffffffff8df2fec2 RDI: ffff888029ec4ec4
RBP: 0000000000000001 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000000 R11: 0000000000000012 R12: ffff88802ad93408
R13: ffffea0000ad7800 R14: 0000000000000000 R15: ffffea0000ad7800
FS: 00007f7501945c80(0000) GS:ffff8880973e2000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005645dcc70950 CR3: 00000000297d6000 CR4: 0000000000352ef0
Call Trace:
<TASK>
pte_unmap include/linux/pgtable.h:117 [inline]
wp_page_copy mm/memory.c:3960 [inline]
do_wp_page+0x13d7/0x4350 mm/memory.c:4320
handle_pte_fault mm/memory.c:6427 [inline]
__handle_mm_fault+0x1ab6/0x2a00 mm/memory.c:6549
handle_mm_fault+0x36d/0xa20 mm/memory.c:6718
do_user_addr_fault+0x5a3/0x12f0 arch/x86/mm/fault.c:1334
handle_page_fault arch/x86/mm/fault.c:1474 [inline]
exc_page_fault+0x6f/0xd0 arch/x86/mm/fault.c:1527
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618
RIP: 0033:0x7f7501c33f87
Code: 5c 25 28 49 8b 57 10 48 85 db 74 27 8b 74 24 0c 23 73 08 44 39 f6 75 16 48 39 c2 75 05 e8 62 ff ff ff 48 8b 53 10 48 83 c0 08 <48> 89 50 f8 48 8b 1b eb d0 49 83 c4 08 49 81 fc 38 01 00 00 75 be
RSP: 002b:00007ffd29921270 EFLAGS: 00010212
RAX: 00005645dcc81140 RBX: 00005645dcc716d0 RCX: 0000000000000002
RDX: 00007ffd29925f83 RSI: 0000000000000001 RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000008
R13: 00005645dcc70e60 R14: 0000000000000001 R15: 00005645dcc70c30
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Andrew Morton

6:49 AM (8 hours ago)
to syzbot, Liam.H...@oracle.com, linux-...@vger.kernel.org, linu...@kvack.org, l...@kernel.org, shakee...@linux.dev, sur...@google.com, syzkall...@googlegroups.com, vba...@kernel.org, Muchun Song, Qi Zheng
On Sun, 26 Apr 2026 01:17:25 -0700 syzbot <syzbot+7d60b3...@syzkaller.appspotmail.com> wrote:

> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 6596a02b2078 Merge tag 'drm-next-2026-04-22' of https://gi..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12483702580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=24c8da4692f901cb
> dashboard link: https://syzkaller.appspot.com/bug?extid=7d60b33a8a546263da7c
> compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> userspace arch: i386
>
> Unfortunately, I don't have any reproducer for this issue yet.

argh, that dreaded sentence.

Thanks.

Something's definitely amiss. This is at least the fifth report of
rcu_read_lock() imbalance post-7.0. Others:

https://lore.kernel.org/69eab803.a00a022...@google.com
https://lore.kernel.org/69eab803.a00a022...@google.com
https://lore.kernel.org/69eafb0e.a00a02...@google.com
https://lore.kernel.org/69ebcbe2.a00a02...@google.com

In some cases we released it too often, in other cases we failed to
release it.

The first one is slightly more useful in that it tells us that the
not-released rcu_read_lock() was taken in folio_lruvec_lock_irqsave().

Muchun & Qi: you played with that rcu locking in 31b54a5e8916. Can you
please double-check that we didn't miss something?

Qi Zheng

11:58 AM (2 hours ago)
to Andrew Morton, shakee...@linux.dev, syzbot, Liam.H...@oracle.com, linux-...@vger.kernel.org, linu...@kvack.org, l...@kernel.org, sur...@google.com, syzkall...@googlegroups.com, vba...@kernel.org, Muchun Song
Hi Andrew,

On 4/26/26 6:49 PM, Andrew Morton wrote:
> On Sun, 26 Apr 2026 01:17:25 -0700 syzbot <syzbot+7d60b3...@syzkaller.appspotmail.com> wrote:
>
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit: 6596a02b2078 Merge tag 'drm-next-2026-04-22' of https://gi..
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=12483702580000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=24c8da4692f901cb
>> dashboard link: https://syzkaller.appspot.com/bug?extid=7d60b33a8a546263da7c
>> compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
>> userspace arch: i386
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>
> argh, that dreaded sentence.
>
> Thanks.
>
> Something's definitely amiss. This is at least the fifth report of
> rcu_read_lock() imbalance post-7.0. Others:
>
> https://lore.kernel.org/69eab803.a00a022...@google.com
> https://lore.kernel.org/69eab803.a00a022...@google.com
> https://lore.kernel.org/69eafb0e.a00a02...@google.com
> https://lore.kernel.org/69ebcbe2.a00a02...@google.com

All the kernel configs mentioned above include 'CONFIG_MEMCG_V1=y'.

Theoretically, a rebind_subsystems() call can lead to an RCU imbalance; see
my previous discussion with Shakeel for details:

https://lore.kernel.org/all/358c60e1-fa91-40a1...@linux.dev/

However, in a production environment, this is practically impossible.
So Shakeel and I chose to wait for a reproducer at the time. :(

>
> In some cases we released it too often, in other cases we failed to
> release it.
>
> The first one is slightly more useful in that it tells us that the
> not-released rcu_read_lock() was taken in folio_lruvec_lock_irqsave().

I double-checked some callers of folio_lruvec_lock_irqsave() (such as
folios_put_refs()), but didn't find anything suspicious. :(

Andrew Morton

1:55 PM (1 hour ago)
to Qi Zheng, shakee...@linux.dev, syzbot, Liam.H...@oracle.com, linux-...@vger.kernel.org, linu...@kvack.org, l...@kernel.org, sur...@google.com, syzkall...@googlegroups.com, vba...@kernel.org, Muchun Song
Right, that looks similar.

The rcu locking under lruvec_stat_mod_folio() is very simple, and that
return in get_non_dying_memcg_end() does look super suspicious. Why
does it omit the unlock?

otoh, in
https://lore.kernel.org/all/69eafb0e.a00a02...@google.com/
we're trying to release an rcu_read_lock() which isn't presently held.
But if cgroup_subsys_on_dfl() were to become false between the
get_non_dying_memcg_start/end pair, that's what would happen.
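
To make the two failure modes concrete, here is a minimal userspace sketch of the pattern described above. It is a toy model, not the kernel code: on_dfl stands in for cgroup_subsys_on_dfl(), rcu_depth stands in for the RCU read-side nesting count, and the model_* names are hypothetical. The polarity of the early return is inferred from the description above and is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. None of these names exist in the kernel. */
static bool on_dfl = true;  /* stands in for cgroup_subsys_on_dfl() */
static int rcu_depth;       /* stands in for the RCU read-side nesting count */

/* Assumed shape: each helper re-evaluates the condition on its own. */
static void model_start(void)
{
	if (on_dfl)
		return;         /* nothing taken on the default hierarchy */
	rcu_depth++;            /* models rcu_read_lock() */
}

static void model_end(void)
{
	if (on_dfl)
		return;         /* the early return that omits the unlock */
	rcu_depth--;            /* models rcu_read_unlock() */
}

/*
 * Run one start/end pair while the flag changes in between, which
 * models a concurrent rebind_subsystems(). Returns the leftover
 * nesting depth: 0 is balanced, 1 is a leaked lock, -1 is an
 * unlock without a matching lock.
 */
static int model_pair_with_flip(bool before, bool after)
{
	rcu_depth = 0;
	on_dfl = before;
	model_start();
	on_dfl = after;
	model_end();
	return rcu_depth;
}
```

In this model, model_pair_with_flip(true, false) leaves the depth at -1 (released too often) and model_pair_with_flip(false, true) leaves it at 1 (never released), which matches the two kinds of report.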

So yup, I agree, concurrent rebind_subsystems() activity could cause
all of this. The reports are pretty common - is there some debugging
patch we can temporarily add to confirm this theory? And/or is it
possible to cook up a selftest which will trigger this?
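
One possible shape for such a debugging check, again as a userspace toy model and not a real patch: the start helper records whether it took the lock, and the end helper compares that record with the current state of the flag. All names here (on_dfl, rcu_depth, mismatches, dbg_*) are hypothetical stand-ins. In a real kernel patch the counter would presumably be a WARN_ON_ONCE(), and how the recorded value gets carried from start to end is left open.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. None of these names exist in the kernel. */
static bool on_dfl = true;  /* stands in for cgroup_subsys_on_dfl() */
static int rcu_depth;       /* stands in for the RCU read-side nesting count */
static int mismatches;      /* a kernel patch might use WARN_ON_ONCE() instead */

/* Decide once at start and hand the decision back to the caller. */
static bool dbg_start(void)
{
	bool locked = !on_dfl;

	if (locked)
		rcu_depth++;    /* models rcu_read_lock() */
	return locked;
}

/* Use the recorded decision and flag any change of the condition. */
static void dbg_end(bool locked)
{
	if (locked != !on_dfl)
		mismatches++;   /* the condition changed inside the pair */
	if (locked)
		rcu_depth--;    /* models rcu_read_unlock() */
}

/* Run one pair with the flag changing in between; returns leftover depth. */
static int dbg_pair_with_flip(bool before, bool after)
{
	bool locked;

	rcu_depth = 0;
	on_dfl = before;
	locked = dbg_start();
	on_dfl = after;         /* models a concurrent rebind_subsystems() */
	dbg_end(locked);
	return rcu_depth;
}
```

If the theory holds, a check of this kind would fire shortly before each imbalance report, and because the unlock follows the recorded decision, the depth stays balanced in the model even when the flag flips.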

> However, in a production environment, this is practically impossible.

Can you expand on this?

syzbot isn't a production environment ;)

> So Shakeel and I chose to wait for a reproducer at the time. :(
>
> >
> > In some cases we released it too often, in other cases we failed to
> > release it.
> >
> > The first one is slightly more useful in that it tells us that the
> > not-released rcu_read_lock() was taken in folio_lruvec_lock_irqsave().
>
> I double-checked some callers of folio_lruvec_lock_irqsave() (such as
> folios_put_refs()), but didn't find anything suspicious. :(

Right - it's rare and smells of a race condition.
