[syzbot] [kernel?] WARNING in signal_wake_up_state


syzbot

Jan 9, 2024, 1:18:35 PM
to ebie...@xmission.com, linux-...@vger.kernel.org, lu...@kernel.org, michael....@oracle.com, m...@redhat.com, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de, torv...@linux-foundation.org
Hello,

syzbot found the following issue on:

HEAD commit: 610a9b8f49fb Linux 6.7-rc8
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=134dee09e80000
kernel config: https://syzkaller.appspot.com/x/.config?x=56c2c781bb4ee18
dashboard link: https://syzkaller.appspot.com/bug?extid=c6d438f2d77f96cae7c2
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10223829e80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1121aeb5e80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/1e10270bc146/disk-610a9b8f.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/c6066a38235d/vmlinux-610a9b8f.xz
kernel image: https://storage.googleapis.com/syzbot-assets/e7df7096082d/bzImage-610a9b8f.xz

The issue was bisected to:

commit f9010dbdce911ee1f1af1398a24b1f9f992e0080
Author: Mike Christie <michael....@oracle.com>
Date: Thu Jun 1 18:32:32 2023 +0000

fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15ff657ee80000
final oops: https://syzkaller.appspot.com/x/report.txt?x=17ff657ee80000
console output: https://syzkaller.appspot.com/x/log.txt?x=13ff657ee80000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+c6d438...@syzkaller.appspotmail.com
Fixes: f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")

process 'syz-executor189' launched '/dev/fd/4' with NULL argv: empty string added
process 'memfd:��n�dR i5 ��ም[@8�� 9I �=��\'L�Ҏ�)JtTDq�ρ��1� � >�\� L�ϑ�M� ^T*' started with executable stack
------------[ cut here ]------------
WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771
Modules linked in:
CPU: 1 PID: 5069 Comm: 4 Not tainted 6.7.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
RIP: 0010:signal_wake_up_state+0xfa/0x120 kernel/signal.c:771
Code: 00 be ff ff ff ff 48 8d 78 18 e8 31 6c 2e 09 31 ff 41 89 c4 89 c6 e8 55 e8 35 00 45 85 e4 0f 85 62 ff ff ff e8 d7 ec 35 00 90 <0f> 0b 90 e9 54 ff ff ff 48 c7 c7 38 71 19 8f e8 12 96 8c 00 e9 2d
RSP: 0018:ffffc900039979f0 EFLAGS: 00010093
RAX: 0000000000000000 RBX: ffff888020380000 RCX: ffffffff8151856b
RDX: ffff888023c40000 RSI: ffffffff81518579 RDI: 0000000000000005
RBP: 0000000000000108 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: dffffc0000000000 R14: ffff888020380000 R15: ffff888023c40000
FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 00000000b7000000 CR3: 00000000288f3000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
signal_wake_up include/linux/sched/signal.h:448 [inline]
zap_process fs/coredump.c:373 [inline]
zap_threads fs/coredump.c:392 [inline]
coredump_wait fs/coredump.c:410 [inline]
do_coredump+0x784/0x3f70 fs/coredump.c:571
get_signal+0x242f/0x2790 kernel/signal.c:2890
arch_do_signal_or_restart+0x90/0x7f0 arch/x86/kernel/signal.c:309
exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
exit_to_user_mode_prepare+0x121/0x240 kernel/entry/common.c:204
irqentry_exit_to_user_mode+0xa/0x40 kernel/entry/common.c:309
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:570
RIP: 0023:0xb7000000
Code: Unable to access opcode bytes at 0xb6ffffd6.
RSP: 002b:00000000ff8cdad0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Linus Torvalds

Jan 9, 2024, 2:06:06 PM
to syzbot, Oleg Nesterov, Eric W. Biederman, linux-...@vger.kernel.org, lu...@kernel.org, michael....@oracle.com, m...@redhat.com, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de
Oleg/Eric, can you make any sense of this?

On Tue, 9 Jan 2024 at 10:18, syzbot
<syzbot+c6d438...@syzkaller.appspotmail.com> wrote:
>
> The issue was bisected to:
>
> commit f9010dbdce911ee1f1af1398a24b1f9f992e0080

Hmm. This smells more like a "that triggers the problem" than a cause.

Because the warning itself is

> WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771

That's

lockdep_assert_held(&t->sighand->siglock);

at the top of the function, with the call trace being

> signal_wake_up include/linux/sched/signal.h:448 [inline]

just a wrapper setting 'state'.

> zap_process fs/coredump.c:373 [inline]

That's zap_process() that does a

for_each_thread(start, t) {

and then does a

signal_wake_up(t, 1);

on each thread.

> zap_threads fs/coredump.c:392 [inline]

And this is zap_threads(), which does

spin_lock_irq(&tsk->sighand->siglock);
...
nr = zap_process(tsk, exit_code);

Strange. The sighand->siglock is definitely taken.

The for_each_thread() must be hitting a thread with a different
sighand, but it's basically a

list_for_each_entry_rcu(..)

walking over the tsk->signal->thread_head list.

But if CLONE_THREAD is set (so that we share that 'tsk->signal'), then
we always require that CLONE_SIGHAND is also set:

if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
return ERR_PTR(-EINVAL);

so we most definitely should have the same ->sighand if we have the
same ->signal. And that's true very much for that vhost_task_create()
case too.

So as far as I can see, that bisected commit does add a new case of
threaded signal handling, but in no way explains the problem.

Is there some odd exit race? The thread is removed with

list_del_rcu(&p->thread_node);

in __exit_signal -> __unhash_process(), and despite the RCU
annotations, all these parts seem to hold the right locks too (ie
sighand->siglock is held by __exit_signal too), so I don't even see
any delayed de-allocation issue or anything like that.

Thus bringing in Eric/Oleg to see if they see something I miss.

Original email at

https://lore.kernel.org/all/000000000000a4...@google.com/

for your pleasure.

Linus

Oleg Nesterov

Jan 10, 2024, 11:04:42 AM
to Linus Torvalds, syzbot, Eric W. Biederman, linux-...@vger.kernel.org, lu...@kernel.org, michael....@oracle.com, m...@redhat.com, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de
On 01/09, Linus Torvalds wrote:
>
> Oleg/Eric, can you make any sense of this?
>
> On Tue, 9 Jan 2024 at 10:18, syzbot
> <syzbot+c6d438...@syzkaller.appspotmail.com> wrote:
> >
> > The issue was bisected to:
> >
> > commit f9010dbdce911ee1f1af1398a24b1f9f992e0080
>
> Hmm. This smells more like a "that triggers the problem" than a cause.
>
> Because the warning itself is
>
> > WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771
>
> That's
>
> lockdep_assert_held(&t->sighand->siglock);

I have a fever, possibly I am totally confused, but this commit added

+ /* Don't require de_thread to wait for the vhost_worker */
+ if ((t->flags & (PF_IO_WORKER | PF_USER_WORKER)) != PF_USER_WORKER)
+ count++;

into zap_other_threads().

So it seems the caller can do unshare_sighand() before the vhost thread
exits, and actually unshare ->sighand, because oldsighand->count > 1.

This is already very wrong (plus it seems this breaks the signal->notify_count
logic). IIRC I even tried to argue with this change... not sure.

And this can explain the warning, this task can start the coredump after exec
and hit vhost_worker with the old sighand != current->sighand.

Oleg.

Eric W. Biederman

Jan 10, 2024, 11:11:43 AM
to Linus Torvalds, syzbot, Oleg Nesterov, linux-...@vger.kernel.org, lu...@kernel.org, michael....@oracle.com, m...@redhat.com, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de
I expect this would take going through the rest of the reproducer
to see what is going on.
Hmm. The reproducer is in something other than C:

> # https://syzkaller.appspot.com/bug?id=b7640dae2467568f05425b289a1f004faa2dc292
> # See https://goo.gl/kgGztJ for information about syzkaller reproducers.
> #{"repeat":true,"procs":1,"slowdown":1,"sandbox":"","sandbox_arg":0,"close_fds":false}
> r0 = openat$vnet(0xffffffffffffff9c, &(0x7f0000000040), 0x2, 0x0)
> ioctl$int_in(r0, 0x40000000af01, 0x0)
> r1 = memfd_create(&(0x7f0000000400)='\xa3\x9fn\xb4dR\x04i5\x02\xac\xce\xe1\x88\x9d[@8\xd7\xce\x1f 9I\x7f\x15\x1d\x93=\xb5\xe7\\\'L\xe6\xd2\x8e\xbc)JtTDq\x81\xcf\x81\xba\xe51\xf5 \xc8\x10>\xc9\\\x85\x17L\xbf\xcf\x91\xdfM\xf3\x02^T*\x00\x02\xb9~B\x9f\xacl\x1d3\x06o\xf8\x16H\xaa*\x02\xf7\xfb\x06\xf1\x83\x92\xa8\xc2\xcb\xae\xb0\xb4\x93\xb8\x04\xf1\x99\xc2yY+\xd9y\x8a\xd5b\xe8\"q\x1b0)\xccm\xacz\xc1\xadd\x9b6a\xf3\xdds\xbb\x88\xff\b\x85\xb3s\x00\x0e\xbcfvi\x85\xfc.|\xd4h\xec\x82o\x8e\x93\x11\xc1\xd4\xae\x05\x17=\xd9R\xd0\xd4\x90\xcf\x9b\xdc\xaeV\x88\x94\x9f\xe3\xefqi\xed\xa8w\xbe\xd0\xd0-tBl\x9e+\xd3\xed\xce\x9f\x83\x86\xf9\x12\x16Ts\x80\x13]C\xfb`\xc2`\xf7\x1a\x00\x00\x00\x00\x00\x00\x00k\xae\xcb\x1a.\xc2\x8f\xd1x4]PZ\x9e\xd5Y\xf0L\xa4\xbc\x84\xf6\x04L\xff0\x8b\\*\xf9,\xb6\r\x97\xedy\xe0\x8a\xe2\x8ck\xc6S\xc3g\xb9\x1a\xf8\x8f \x9d\x00u7\xd8\'\xf1E\xa4(Q\x80Fy\xb5\xe4q\xc9\xff \xd8\x9d\xad\x11\xf8m\xd3\xbc\x9e\x10D\x7f!\xca\x0ev\x15h$\x01\xdd\xe5\xce\xf8*\xb3\x01\x85\a\xe4qv&\x9c\xac\x9aN~o\xe5\x89\xd5\a\x9f\f\x1f\xc2e/\x8d\x1e\n\xd0_\xbd!^\xa46\xb8j\xc0x\n\xdb\xe1\xa3\xd6\xae;\r\x92@\xa5I\x88Z1F\xf0\x1at\t\xd0\x8a\x04m\x06\xf3BL\xffS\x9eY\xf4\xb0U \xf8\xd00\x88y\xebX\x92\xd5\xbb\xa1h7\xf3\xe0\x0f\xbd\x02\xe4%\xf9\xb1\x87\x8aM\xfeG\xb2L\xbd\x92-\xcd\x1f\xf4\xe1,\xb7G|\xec\"\xa2\xab\xf6\x84\xe0\xcf1\x9a', 0x0)
> write$binfmt_elf32(r1, &(0x7f0000000140)=ANY=[@ANYBLOB="7f454c466000002ed8e4f97765ce27b90300060000000000000000b738000000000035f4c38422a3bc8220000500000004020300b300000000002a002400b3d7c52ebf31a8d5c8c3c6cb00000009e500d5ffffff05ffffff03004f9ef4"], 0xd8)
> execveat(r1, &(0x7f0000000000)='\x00', 0x0, 0x0, 0x1000)

If I read that correctly it is intending to put an elf32 executable into
a memfd and then execute it.

Exec will possibly unshare SIGHAND struct if there is still a reference
to it from somewhere else to ensure the new process has a clean one.

But all of the other threads should get shutdown by de_thread before any
of that happens. And de_thread should take care of the weird non-leader
execve case as well. So after that point the process really should
be single threaded. Which is why de_thread is the point of no return.

That whole "an interrupt comes in, and a fatal signal is processed"
scenario is confusing.

Hmm. That weird vnet ioctl at the beginning must be what is starting
the vhost logic. So I guess it makes sense if the signal is received
by the magic vhost thread.

Perhaps there is some weird vhost logic where the thread lingers.

Ugh. I seem to remember a whole conversation about the vhost logic
(immediately after it was merged) and how it had a bug where it exited
differently from everything else. I remember people figuring it was
ok for the moment, since everything had to be performed as root and
no one actually uses the vhost logic like that. It has been long
enough that I thought it would have been sorted out by now.

Looking back to refresh my memory at the original conversation:
https://lore.kernel.org/all/20230601183232.8384...@oracle.com/

The bisect is 100% correct, and this was a known issue with that code at
the time it was merged.

I will let someone else take it from here.

Eric

Mike Christie

Jan 11, 2024, 12:20:09 PM
to Eric W. Biederman, Linus Torvalds, syzbot, Oleg Nesterov, linux-...@vger.kernel.org, lu...@kernel.org, m...@redhat.com, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de
The vhost code just wanted to wait for running IO from the vhost thread
context before exiting instead of working like io_uring which would wait
from a secondary thread.

> immediately ok, after the code was merged, and because everything had to
> be performed as root, and no one actually uses the vhost logic like
> that. It has been long enough I thought that would have been sorted
> out by now.
>
> Looking back to refresh my memory at the original conversation:
> https://lore.kernel.org/all/20230601183232.8384...@oracle.com/
>
> The bisect is 100% correct, and this was a known issue with that code at
> the time it was merged.
>

I have patches for this that I've been testing. Will post to the virt list
for the next feature window.

