[syzbot] [kernel?] WARNING: bad unlock balance in copy_process

syzbot

May 31, 2025, 6:34:29 AM
to linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 785cdec46e92 Merge tag 'x86-core-2025-05-25' of git://git...
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=132bbdf4580000
kernel config: https://syzkaller.appspot.com/x/.config?x=628e87e3a98ec1c4
dashboard link: https://syzkaller.appspot.com/bug?extid=80cb3cc5c14fad191a10
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-785cdec4.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/259338148f62/vmlinux-785cdec4.xz
kernel image: https://storage.googleapis.com/syzbot-assets/436abe9bf6f7/bzImage-785cdec4.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+80cb3c...@syzkaller.appspotmail.com

=====================================
WARNING: bad unlock balance detected!
6.15.0-syzkaller-01958-g785cdec46e92 #0 Not tainted
-------------------------------------
syz.1.441/7809 is trying to release lock (&sighand->siglock) at:
[<ffffffff817a389e>] spin_unlock include/linux/spinlock.h:391 [inline]
[<ffffffff817a389e>] copy_process+0x5d6e/0x9170 kernel/fork.c:2686
but there are no more locks to release!

other info that might help us debug this:
1 lock held by syz.1.441/7809:
#0: ffffffff8e41c350 (cgroup_threadgroup_rwsem){++++}-{0:0}, at: copy_process+0x3de8/0x9170 kernel/fork.c:2528

stack backtrace:
CPU: 0 UID: 0 PID: 7809 Comm: syz.1.441 Not tainted 6.15.0-syzkaller-01958-g785cdec46e92 #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_unlock_imbalance_bug kernel/locking/lockdep.c:5301 [inline]
print_unlock_imbalance_bug+0x11b/0x130 kernel/locking/lockdep.c:5275
__lock_release kernel/locking/lockdep.c:5540 [inline]
lock_release+0x242/0x2f0 kernel/locking/lockdep.c:5892
__raw_spin_unlock include/linux/spinlock_api_smp.h:141 [inline]
_raw_spin_unlock+0x16/0x50 kernel/locking/spinlock.c:186
spin_unlock include/linux/spinlock.h:391 [inline]
copy_process+0x5d6e/0x9170 kernel/fork.c:2686
kernel_clone+0xfc/0x960 kernel/fork.c:2859
__do_sys_clone3+0x212/0x290 kernel/fork.c:3163
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0x260 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6443fc31c9
Code: bf 08 00 48 8d 3d dc bf 08 00 e8 e2 28 f6 ff 66 90 b8 ea ff ff ff 48 85 ff 74 2c 48 85 d2 74 27 49 89 c8 b8 b3 01 00 00 0f 05 <48> 85 c0 7c 18 74 01 c3 31 ed 48 83 e4 f0 4c 89 c7 ff d2 48 89 c7
RSP: 002b:00007ffeb834dbb8 EFLAGS: 00000202 ORIG_RAX: 00000000000001b3
RAX: ffffffffffffffda RBX: 00007f6443f455b0 RCX: 00007f6443fc31c9
RDX: 00007f6443f455b0 RSI: 0000000000000058 RDI: 00007ffeb834dc00
RBP: 00007f6444dee6c0 R08: 00007f6444dee6c0 R09: 00007ffeb834dce7
R10: 0000000000000008 R11: 0000000000000202 R12: ffffffffffffffa8
R13: 000000000000000b R14: 00007ffeb834dc00 R15: 00007ffeb834dce8
</TASK>
------------[ cut here ]------------
pvqspinlock: lock 0xffff8880245e8940 has corrupted value 0x0!
WARNING: CPU: 0 PID: 7809 at kernel/locking/qspinlock_paravirt.h:504 __pv_queued_spin_unlock_slowpath+0x237/0x330 kernel/locking/qspinlock_paravirt.h:504
Modules linked in:
CPU: 0 UID: 0 PID: 7809 Comm: syz.1.441 Not tainted 6.15.0-syzkaller-01958-g785cdec46e92 #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:__pv_queued_spin_unlock_slowpath+0x237/0x330 kernel/locking/qspinlock_paravirt.h:504
Code: 03 0f b6 14 02 4c 89 e8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 67 41 8b 55 00 4c 89 ee 48 c7 c7 a0 79 8d 8b e8 ba e4 02 f6 90 <0f> 0b 90 90 e9 64 ff ff ff 90 0f 0b 48 89 df 4c 89 04 24 e8 31 e6
RSP: 0018:ffffc900034478d8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8880245e8940 RCX: ffffffff817a98c8
RDX: ffff8880247b0000 RSI: ffffffff817a98d5 RDI: 0000000000000001
RBP: ffff8880245e8948 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 6c6e697073717670 R12: ffff8880245e8950
R13: ffff8880245e8940 R14: 00000000fffffff4 R15: ffffc90003447d60
FS: 000055557a112500(0000) GS:ffff8880d69a6000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000110c2dcf78 CR3: 000000003467f000 CR4: 0000000000352ef0
Call Trace:
<TASK>
__raw_callee_save___pv_queued_spin_unlock_slowpath+0x15/0x30
.slowpath+0x9/0x18
pv_queued_spin_unlock arch/x86/include/asm/paravirt.h:562 [inline]
queued_spin_unlock arch/x86/include/asm/qspinlock.h:57 [inline]
do_raw_spin_unlock+0x172/0x230 kernel/locking/spinlock_debug.c:142
__raw_spin_unlock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_unlock+0x1e/0x50 kernel/locking/spinlock.c:186
spin_unlock include/linux/spinlock.h:391 [inline]
copy_process+0x5d6e/0x9170 kernel/fork.c:2686
kernel_clone+0xfc/0x960 kernel/fork.c:2859
__do_sys_clone3+0x212/0x290 kernel/fork.c:3163
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0x260 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6443fc31c9
Code: bf 08 00 48 8d 3d dc bf 08 00 e8 e2 28 f6 ff 66 90 b8 ea ff ff ff 48 85 ff 74 2c 48 85 d2 74 27 49 89 c8 b8 b3 01 00 00 0f 05 <48> 85 c0 7c 18 74 01 c3 31 ed 48 83 e4 f0 4c 89 c7 ff d2 48 89 c7
RSP: 002b:00007ffeb834dbb8 EFLAGS: 00000202 ORIG_RAX: 00000000000001b3
RAX: ffffffffffffffda RBX: 00007f6443f455b0 RCX: 00007f6443fc31c9
RDX: 00007f6443f455b0 RSI: 0000000000000058 RDI: 00007ffeb834dc00
RBP: 00007f6444dee6c0 R08: 00007f6444dee6c0 R09: 00007ffeb834dce7
R10: 0000000000000008 R11: 0000000000000202 R12: ffffffffffffffa8
R13: 000000000000000b R14: 00007ffeb834dc00 R15: 00007ffeb834dce8
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

syzbot

Sep 17, 2025, 4:40:32 PM
to Liam.H...@oracle.com, ak...@linux-foundation.org, bse...@google.com, da...@redhat.com, dietmar....@arm.com, juri....@redhat.com, ke...@kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, lorenzo...@oracle.com, mgo...@suse.de, mho...@suse.com, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, rp...@kernel.org, sur...@google.com, syzkall...@googlegroups.com, vba...@suse.cz, vincent...@linaro.org, vsch...@redhat.com
syzbot has found a reproducer for the following issue on:

HEAD commit: 6edf2885ebeb Merge branch 'for-next/core' into for-kernelci
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=16d14c7c580000
kernel config: https://syzkaller.appspot.com/x/.config?x=b8b6789b42526d72
dashboard link: https://syzkaller.appspot.com/bug?extid=80cb3cc5c14fad191a10
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
userspace arch: arm64
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=179d9f62580000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11d14c7c580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/c72239eb6d76/disk-6edf2885.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/b67e9820b2be/vmlinux-6edf2885.xz
kernel image: https://storage.googleapis.com/syzbot-assets/0c4ab7e562f6/Image-6edf2885.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+80cb3c...@syzkaller.appspotmail.com

=====================================
WARNING: bad unlock balance detected!
syzkaller #0 Not tainted
-------------------------------------
syz.1.48/6865 is trying to release lock (&sighand->siglock) at:
[<ffff8000803b8634>] spin_unlock include/linux/spinlock.h:391 [inline]
[<ffff8000803b8634>] copy_process+0x22d4/0x31ec kernel/fork.c:2432
but there are no more locks to release!

other info that might help us debug this:
1 lock held by syz.1.48/6865:
#0: ffff80008fa00450 (cgroup_threadgroup_rwsem){++++}-{0:0}, at: copy_process+0x2228/0x31ec kernel/fork.c:2274

stack backtrace:
CPU: 0 UID: 0 PID: 6865 Comm: syz.1.48 Not tainted syzkaller #0 PREEMPT
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/30/2025
Call trace:
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:499 (C)
__dump_stack+0x30/0x40 lib/dump_stack.c:94
dump_stack_lvl+0xd8/0x12c lib/dump_stack.c:120
dump_stack+0x1c/0x28 lib/dump_stack.c:129
print_unlock_imbalance_bug+0xf4/0xfc kernel/locking/lockdep.c:5298
__lock_release kernel/locking/lockdep.c:-1 [inline]
lock_release+0x244/0x39c kernel/locking/lockdep.c:5889
__raw_spin_unlock include/linux/spinlock_api_smp.h:141 [inline]
_raw_spin_unlock+0x24/0x78 kernel/locking/spinlock.c:186
spin_unlock include/linux/spinlock.h:391 [inline]
copy_process+0x22d4/0x31ec kernel/fork.c:2432
kernel_clone+0x1d8/0x84c kernel/fork.c:2605
__do_sys_clone kernel/fork.c:2748 [inline]
__se_sys_clone kernel/fork.c:2716 [inline]
__arm64_sys_clone+0x144/0x1a0 kernel/fork.c:2716
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x5c/0x254 arch/arm64/kernel/entry-common.c:744
el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:763
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:596


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

Vlastimil Babka

Sep 18, 2025, 4:35:28 AM
to syzbot, Liam.H...@oracle.com, ak...@linux-foundation.org, bse...@google.com, da...@redhat.com, dietmar....@arm.com, juri....@redhat.com, ke...@kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, lorenzo...@oracle.com, mgo...@suse.de, mho...@suse.com, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, rp...@kernel.org, sur...@google.com, syzkall...@googlegroups.com, vincent...@linaro.org, vsch...@redhat.com, Sebastian Andrzej Siewior
On 9/17/25 22:40, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit: 6edf2885ebeb Merge branch 'for-next/core' into for-kernelci
> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> console output: https://syzkaller.appspot.com/x/log.txt?x=16d14c7c580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=b8b6789b42526d72
> dashboard link: https://syzkaller.appspot.com/bug?extid=80cb3cc5c14fad191a10
> compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
> userspace arch: arm64
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=179d9f62580000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11d14c7c580000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/c72239eb6d76/disk-6edf2885.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/b67e9820b2be/vmlinux-6edf2885.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/0c4ab7e562f6/Image-6edf2885.gz.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+80cb3c...@syzkaller.appspotmail.com
>
> =====================================
> WARNING: bad unlock balance detected!
> syzkaller #0 Not tainted
> -------------------------------------
> syz.1.48/6865 is trying to release lock (&sighand->siglock) at:
> [<ffff8000803b8634>] spin_unlock include/linux/spinlock.h:391 [inline]
> [<ffff8000803b8634>] copy_process+0x22d4/0x31ec kernel/fork.c:2432

bad_fork_core_free:
	sched_core_free(p);
	spin_unlock(&current->sighand->siglock); <- here

Sebastian, I think it's your 7c4f75a21f63 ("futex: Allow automatic
allocation of process wide futex hash") adding a "goto bad_fork_core_free;"
from a place that doesn't yet have current->sighand->siglock locked?

Sebastian Andrzej Siewior

Sep 18, 2025, 4:48:33 AM
to Vlastimil Babka, syzbot, Liam.H...@oracle.com, ak...@linux-foundation.org, bse...@google.com, da...@redhat.com, dietmar....@arm.com, juri....@redhat.com, ke...@kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, lorenzo...@oracle.com, mgo...@suse.de, mho...@suse.com, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, rp...@kernel.org, sur...@google.com, syzkall...@googlegroups.com, vincent...@linaro.org, vsch...@redhat.com
On 2025-09-18 10:35:24 [+0200], Vlastimil Babka wrote:
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+80cb3c...@syzkaller.appspotmail.com
> >
> > =====================================
> > WARNING: bad unlock balance detected!
> > syzkaller #0 Not tainted
> > -------------------------------------
> > syz.1.48/6865 is trying to release lock (&sighand->siglock) at:
> > [<ffff8000803b8634>] spin_unlock include/linux/spinlock.h:391 [inline]
> > [<ffff8000803b8634>] copy_process+0x22d4/0x31ec kernel/fork.c:2432
>
> bad_fork_core_free:
> 	sched_core_free(p);
> 	spin_unlock(&current->sighand->siglock); <- here
>
> Sebastian, I think it's your 7c4f75a21f63 ("futex: Allow automatic
> allocation of process wide futex hash") adding a "goto bad_fork_core_free;"
> from a place that doesn't yet have current->sighand->siglock locked?

Yes. Judging from -rc6, if futex_hash_allocate_default() fails we hold
neither siglock nor tasklist_lock. Calling sched_core_free() there also
looks wrong, as the cookie is only allocated later in sched_core_fork().
sched_cgroup_fork() does nothing special. So it should be:

diff --git a/kernel/fork.c b/kernel/fork.c
index c4ada32598bd5..6ca8689a83b5b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2295,7 +2295,7 @@ __latent_entropy struct task_struct *copy_process(
 	if (need_futex_hash_allocate_default(clone_flags)) {
 		retval = futex_hash_allocate_default();
 		if (retval)
-			goto bad_fork_core_free;
+			goto bad_fork_cancel_cgroup;
 		/*
 		 * If we fail beyond this point we don't free the allocated
 		 * futex hash map. We assume that another thread will be created

Sebastian

Sebastian Andrzej Siewior

Sep 18, 2025, 9:09:52 AM
to Vlastimil Babka, Thomas Gleixner, Peter Zijlstra, syzbot, Liam.H...@oracle.com, ak...@linux-foundation.org, bse...@google.com, da...@redhat.com, dietmar....@arm.com, juri....@redhat.com, ke...@kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, lorenzo...@oracle.com, mgo...@suse.de, mho...@suse.com, mi...@redhat.com, pet...@infradead.org, ros...@goodmis.org, rp...@kernel.org, sur...@google.com, syzkall...@googlegroups.com, vincent...@linaro.org, vsch...@redhat.com
copy_process() uses the wrong error exit path from
futex_hash_allocate_default().
After exiting from futex_hash_allocate_default(), neither tasklist_lock
nor siglock has been acquired. The exit label bad_fork_core_free unlocks
both of these locks which is wrong.

The previous label, bad_fork_cancel_cgroup, is the correct exit.
sched_cgroup_fork() did not allocate any resources that need to be freed.

Use bad_fork_cancel_cgroup on error exit from
futex_hash_allocate_default().

Fixes: 7c4f75a21f636 ("futex: Allow automatic allocation of process wide futex hash")
Reported-by: syzbot+80cb3c...@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68cb1cbd.050a022...@google.com
Signed-off-by: Sebastian Andrzej Siewior <big...@linutronix.de>
---

That private-futex code was marked BROKEN in v6.16 and re-enabled in
v6.17. It could use
56180dd20c19e ("futex: Use RCU-based per-CPU reference counting instead of rcuref_t")

as Fixes: instead to avoid backporting to v6.16.

kernel/fork.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index c4ada32598bd5..6ca8689a83b5b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2295,7 +2295,7 @@ __latent_entropy struct task_struct *copy_process(
 	if (need_futex_hash_allocate_default(clone_flags)) {
 		retval = futex_hash_allocate_default();
 		if (retval)
-			goto bad_fork_core_free;
+			goto bad_fork_cancel_cgroup;
 		/*
 		 * If we fail beyond this point we don't free the allocated
 		 * futex hash map. We assume that another thread will be created
--
2.51.0

Steven Rostedt

Sep 18, 2025, 11:29:22 AM
to Sebastian Andrzej Siewior, Vlastimil Babka, Thomas Gleixner, Peter Zijlstra, syzbot, Liam.H...@oracle.com, ak...@linux-foundation.org, bse...@google.com, da...@redhat.com, dietmar....@arm.com, juri....@redhat.com, ke...@kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, lorenzo...@oracle.com, mgo...@suse.de, mho...@suse.com, mi...@redhat.com, rp...@kernel.org, sur...@google.com, syzkall...@googlegroups.com, vincent...@linaro.org, vsch...@redhat.com
On Thu, 18 Sep 2025 15:09:45 +0200
Sebastian Andrzej Siewior <big...@linutronix.de> wrote:

> copy_process() uses the wrong error exit path from
> futex_hash_allocate_default().
> After exiting from futex_hash_allocate_default(), neither tasklist_lock
> nor siglock has been acquired. The exit label bad_fork_core_free unlocks
> both of these locks which is wrong.
>
> The previous label, bad_fork_cancel_cgroup, is the correct exit.
> sched_cgroup_fork() did not allocate any resources that need to be freed.
>
> Use bad_fork_cancel_cgroup on error exit from
> futex_hash_allocate_default().

	if (need_futex_hash_allocate_default(clone_flags)) {
		retval = futex_hash_allocate_default();
		if (retval)
			goto bad_fork_core_free;
[..]
	}
[..]
	write_lock_irq(&tasklist_lock);
[..]
	klp_copy_process(p);

	sched_core_fork(p);

	spin_lock(&current->sighand->siglock);

[..]

bad_fork_core_free:
	sched_core_free(p);
	spin_unlock(&current->sighand->siglock);
	write_unlock_irq(&tasklist_lock);
bad_fork_cancel_cgroup:
	cgroup_cancel_fork(p, args);
Yep, looks bad to me!

Reviewed-by: Steven Rostedt (Google) <ros...@goodmis.org>

-- Steve
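
The label ordering discussed above can be modelled in userspace. What follows is only a rough sketch, not kernel code: struct fake_lock, fake_lock_acquire(), fake_lock_release(), enum fail_point and bad_unlocks_for() are all hypothetical names made up for illustration, and the two "locks" are plain flags rather than real spinlocks or rwlocks. It only shows why an error that happens before the locks are taken must jump to the label below the unlock calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a kernel lock; only tracks whether it is held. */
struct fake_lock {
	const char *name;
	bool held;
};

/* Where the modelled fork fails (illustrative names, not kernel API). */
enum fail_point { FAIL_NONE, FAIL_FUTEX, FAIL_LATE };

static void fake_lock_acquire(struct fake_lock *l)
{
	assert(!l->held);	/* the model never takes a lock twice */
	l->held = true;
}

/* Returns 1 if the lock was not held, i.e. an unbalanced unlock. */
static int fake_lock_release(struct fake_lock *l)
{
	if (!l->held) {
		fprintf(stderr, "bad unlock balance: %s\n", l->name);
		return 1;
	}
	l->held = false;
	return 0;
}

/*
 * Simplified model of the copy_process() unwind order.
 * Returns the number of unbalanced unlocks seen on the chosen path.
 */
static int bad_unlocks_for(enum fail_point where, bool use_fixed_label)
{
	struct fake_lock tasklist_lock = { "tasklist_lock", false };
	struct fake_lock siglock = { "siglock", false };
	int bad = 0;

	if (where == FAIL_FUTEX) {
		/* Fails before either lock is taken. */
		if (use_fixed_label)
			goto bad_fork_cancel_cgroup;
		goto bad_fork_core_free;	/* models the reported bug */
	}

	fake_lock_acquire(&tasklist_lock);
	fake_lock_acquire(&siglock);

	if (where == FAIL_LATE) {
		/* Both locks are held here, so this label is appropriate. */
		goto bad_fork_core_free;
	}

	/* Success path: drop the locks in reverse order. */
	bad += fake_lock_release(&siglock);
	bad += fake_lock_release(&tasklist_lock);
	return bad;

bad_fork_core_free:
	bad += fake_lock_release(&siglock);
	bad += fake_lock_release(&tasklist_lock);
bad_fork_cancel_cgroup:
	return bad;
}
```

In this model, bad_unlocks_for(FAIL_FUTEX, false) takes the old jump target and counts two unbalanced unlocks, while bad_unlocks_for(FAIL_FUTEX, true) takes the label from the patch and counts none. A late failure (FAIL_LATE) still unwinds cleanly through bad_fork_core_free because both locks are held by then.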