WARNING in get_signal


syzbot

Oct 2, 2020, 11:48:21 AM
to ax...@kernel.dk, chri...@brauner.io, ebie...@xmission.com, linux-...@vger.kernel.org, liuzhi...@huawei.com, ol...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: fcadab74 Merge tag 'drm-fixes-2020-10-01-1' of git://anong..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=116865bd900000
kernel config: https://syzkaller.appspot.com/x/.config?x=89ab6a0c48f30b49
dashboard link: https://syzkaller.appspot.com/bug?extid=3485e3773f7da290eecc
compiler: gcc (GCC) 10.1.0-syz 20200507
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1211120b900000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16474c67900000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3485e3...@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 1 PID: 6899 at kernel/signal.c:2431 do_jobctl_trap kernel/signal.c:2431 [inline]
WARNING: CPU: 1 PID: 6899 at kernel/signal.c:2431 get_signal+0x1b5c/0x1f00 kernel/signal.c:2621
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 6899 Comm: syz-executor116 Not tainted 5.9.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x198/0x1fd lib/dump_stack.c:118
panic+0x382/0x7fb kernel/panic.c:231
__warn.cold+0x20/0x4b kernel/panic.c:600
report_bug+0x1bd/0x210 lib/bug.c:198
handle_bug+0x38/0x90 arch/x86/kernel/traps.c:234
exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:254
asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:536
RIP: 0010:do_jobctl_trap kernel/signal.c:2431 [inline]
RIP: 0010:get_signal+0x1b5c/0x1f00 kernel/signal.c:2621
Code: 00 48 c7 c2 40 da 8a 88 be d1 09 00 00 48 c7 c7 a0 da 8a 88 c6 05 09 8c 09 0a 01 e8 43 97 11 00 e9 42 f5 ff ff e8 14 78 2b 00 <0f> 0b 41 bc 00 80 00 00 e9 49 f9 ff ff 4c 89 ef e8 bf 4d 6c 00 e9
RSP: 0018:ffffc90005537ce8 EFLAGS: 00010093
RAX: 0000000000000000 RBX: 0000000100000000 RCX: ffffffff814abfc3
RDX: ffff88809315c580 RSI: ffffffff814ac67c RDI: 0000000000000005
RBP: 0000000000000000 R08: 0000000000000001 R09: ffff88809315ca0f
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000008000
R13: 0000000000000000 R14: 0000000000000000 R15: dffffc0000000000
arch_do_signal+0x82/0x2520 arch/x86/kernel/signal.c:811
exit_to_user_mode_loop kernel/entry/common.c:161 [inline]
exit_to_user_mode_prepare+0x1ae/0x200 kernel/entry/common.c:192
syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:267
ret_from_fork+0x15/0x30 arch/x86/entry/entry_64.S:287
RIP: 0033:0x446809
Code: e8 5c b3 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 8b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fbb8cdd1db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
RAX: 0000000000000000 RBX: 00000000006dbc28 RCX: 0000000000446809
RDX: 9999999999999999 RSI: 0000000000000000 RDI: 000000000007a900
RBP: 00000000006dbc20 R08: ffffffffffffffff R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc2c
R13: 00007ffeca1e9fef R14: 00007fbb8cdd29c0 R15: 20c49ba5e353f7cf
Shutting down cpus with NMI
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Eric W. Biederman

Oct 2, 2020, 11:56:55 AM
to syzbot, ax...@kernel.dk, chri...@brauner.io, linux-...@vger.kernel.org, liuzhi...@huawei.com, ol...@redhat.com, syzkall...@googlegroups.com, Tejun Heo
syzbot <syzbot+3485e3...@syzkaller.appspotmail.com> writes:

> Hello,
>
> syzbot found the following issue on:

So this is:

static void do_jobctl_trap(void)
{
	struct signal_struct *signal = current->signal;
	int signr = current->jobctl & JOBCTL_STOP_SIGMASK;

	if (current->ptrace & PT_SEIZED) {
		if (!signal->group_stop_count &&
		    !(signal->flags & SIGNAL_STOP_STOPPED))
			signr = SIGTRAP;
		WARN_ON_ONCE(!signr);
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
		ptrace_do_notify(signr, signr | (PTRACE_EVENT_STOP << 8),
				 CLD_STOPPED);
	} else {
		WARN_ON_ONCE(!signr);
		ptrace_stop(signr, CLD_STOPPED, 0, NULL);
		current->exit_code = 0;
	}
}
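To see exactly when the warning fires, the signr selection in the PT_SEIZED branch above can be modeled in user space. This is a minimal sketch, not kernel code: `struct model_task` and `seized_trap_signr()` are hypothetical, and the flag values are modeled on the 5.9 jobctl header but should be treated as assumptions (only the masking matters).

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Stand-ins for kernel flags; values modeled on the 5.9 headers (assumed). */
#define JOBCTL_STOP_SIGMASK	0xffffUL
#define JOBCTL_TRAP_STOP	(1UL << 19)

/* Hypothetical snapshot of the state do_jobctl_trap() inspects. */
struct model_task {
	unsigned long jobctl;	/* current->jobctl */
	int group_stop_count;	/* signal->group_stop_count */
	bool stop_stopped;	/* signal->flags & SIGNAL_STOP_STOPPED */
};

/* Mirrors the PT_SEIZED branch: returns the signr that would be reported. */
int seized_trap_signr(const struct model_task *t)
{
	int signr = (int)(t->jobctl & JOBCTL_STOP_SIGMASK);

	/* No group stop in progress or completed: report a plain SIGTRAP. */
	if (!t->group_stop_count && !t->stop_stopped)
		signr = SIGTRAP;

	/* A result of 0 is the WARN_ON_ONCE(!signr) case. */
	return signr;
}
```

A return value of 0 corresponds to the warning: the group is already stopped (so SIGTRAP is not substituted) while jobctl carries JOBCTL_TRAP_STOP with no stop signal number.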

I have the state of this paged out of my head at the moment.

Oleg or Tejun do you remember what is supposed to keep signr from being
NULL?


It looks like this code was introduced in commit 73ddff2bee15 ("job
control: introduce JOBCTL_TRAP_STOP and use it for group stop trap").

Eric

Oleg Nesterov

Oct 5, 2020, 9:49:32 AM
to Eric W. Biederman, syzbot, ax...@kernel.dk, chri...@brauner.io, linux-...@vger.kernel.org, liuzhi...@huawei.com, syzkall...@googlegroups.com, Tejun Heo
On 10/02, Eric W. Biederman wrote:
>
> I have the state of this paged out of my head at the moment.
>
> Oleg or Tejun do you remember what is supposed to keep signr from being
> NULL?

This nearly killed me, but I seem to understand what's going on.
ptrace_init_task() does task_set_jobctl_pending(JOBCTL_TRAP_STOP) while
SIGNAL_STOP_STOPPED is set and child->jobctl & JOBCTL_STOP_SIGMASK == 0.

> It looks like this code was introduced in commit 73ddff2bee15 ("job
> control: introduce JOBCTL_TRAP_STOP and use it for group stop trap").

Yes, but I bet this was broken later, _maybe_ by 924de3b8c9410c4.

I need to take a rest and read this code again. I too forgot how this
is all supposed to work.

Oleg.

Oleg Nesterov

Oct 5, 2020, 12:30:28 PM
to Eric W. Biederman, syzbot, ax...@kernel.dk, chri...@brauner.io, linux-...@vger.kernel.org, liuzhi...@huawei.com, syzkall...@googlegroups.com, Tejun Heo
On 10/05, Oleg Nesterov wrote:
>
> > It looks like this code was introduced in commit 73ddff2bee15 ("job
> > control: introduce JOBCTL_TRAP_STOP and use it for group stop trap").
>
> Yes, but I bet this was broken later, _may be_ by 924de3b8c9410c4.

No, it seems this bug is really old. I'll try to make the fix tomorrow.

Oleg.

Oleg Nesterov

Oct 6, 2020, 1:05:32 PM
to Eric W. Biederman, syzbot, ax...@kernel.dk, chri...@brauner.io, linux-...@vger.kernel.org, liuzhi...@huawei.com, syzkall...@googlegroups.com, Tejun Heo
I still do not see a good fix. I am crying ;)

For the moment, let's forget about this problem. 924de3b8c9410c4 was wrong
anyway; task_join_group_stop() should be fixed:

- if current is traced, "jobctl & JOBCTL_STOP_PENDING" is not
  enough; we need to check SIGNAL_STOP_STOPPED/group_stop_count

- if the new thread is traced, task_join_group_stop() should do
  nothing; we should rely on ptrace_init_task()
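The first point can be illustrated with a user-space model of the current task_join_group_stop(), which only looks at current->jobctl. A minimal sketch under stated assumptions: `struct model_sig` and `old_join_group_stop()` are hypothetical, task_set_jobctl_pending() is assumed to always succeed, and the bit values are assumptions. The scenario assumes a tracee resumed by PTRACE_CONT after the group stop completed, so (as we understand it) JOBCTL_STOP_PENDING is no longer set while SIGNAL_STOP_STOPPED still is.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Stand-ins for kernel flags; values are assumptions for this model. */
#define JOBCTL_STOP_SIGMASK	0xffffUL
#define JOBCTL_STOP_PENDING	(1UL << 17)
#define JOBCTL_STOP_CONSUME	(1UL << 18)

/* Hypothetical view of current->signal. */
struct model_sig {
	int group_stop_count;	/* signal->group_stop_count */
	bool stop_stopped;	/* SIGNAL_STOP_STOPPED (ignored by the old logic) */
};

/* Pre-fix logic: returns the jobctl bits given to the new thread (0 = no join). */
unsigned long old_join_group_stop(unsigned long cur_jobctl, struct model_sig *sig)
{
	unsigned long new_jobctl = 0;

	/* Only a stop that current itself still has pending is joined. */
	if (cur_jobctl & JOBCTL_STOP_PENDING) {
		unsigned long signr = cur_jobctl & JOBCTL_STOP_SIGMASK;

		new_jobctl = signr | JOBCTL_STOP_PENDING | JOBCTL_STOP_CONSUME;
		sig->group_stop_count++;	/* assumes the set succeeded */
	}
	return new_jobctl;
}
```

In the resumed-tracee case the new thread gets no stop bits at all, even though the group is still stopped.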


Now let's return to this bug report. This (incomplete) test case

#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <pthread.h>

void *tf(void *arg)
{
	return NULL;
}

int main(void)
{
	int pid = fork();
	if (!pid) {
		setpgrp();
		kill(getpid(), SIGTSTP);

		pthread_t th;
		pthread_create(&th, NULL, tf, NULL);

		return 0;
	}

	waitpid(pid, NULL, WSTOPPED);

	ptrace(PTRACE_SEIZE, pid, 0, PTRACE_O_TRACECLONE);
	waitpid(pid, NULL, 0);

	ptrace(PTRACE_CONT, pid, 0, 0);
	waitpid(pid, NULL, 0);

	int status;
	int thr = waitpid(-1, &status, 0);
	printf("pids: %d %d status: %x\n", pid, thr, status);

	return 0;
}

triggers WARN_ON_ONCE(!signr) in do_jobctl_trap() and shows that the
auto-attached sub-thread reports the wrong status.

This patch

--- x/include/linux/ptrace.h
+++ x/include/linux/ptrace.h
@@ -218,7 +218,7 @@ static inline void ptrace_init_task(stru
 		__ptrace_link(child, current->parent, current->ptracer_cred);
 
 		if (child->ptrace & PT_SEIZED)
-			task_set_jobctl_pending(child, JOBCTL_TRAP_STOP);
+			task_set_jobctl_pending(child, JOBCTL_TRAP_STOP|SIGTRAP);
 		else
 			sigaddset(&child->pending.signal, SIGSTOP);
 	}

should fix the problem, but it is not enough even if we forget about
task_join_group_stop().

- it is not clear to me if the new thread should join the group stop
  after (say) PTRACE_CONT. If yes, it is not clear how we can do this.

- in any case it should stop after ptrace_detach(), but in this case
jobctl & JOBCTL_STOP_SIGMASK == SIGTRAP doesn't look right.

So perhaps we can change the patch above to use
current->jobctl & JOBCTL_STOP_SIGMASK instead of SIGTRAP ?

This too doesn't look good: the first ptrace_stop() should probably
always report SIGTRAP...

Oleg.

Oleg Nesterov

Oct 15, 2020, 8:49:35 AM
to Eric W. Biederman, syzbot, ax...@kernel.dk, chri...@brauner.io, linux-...@vger.kernel.org, liuzhi...@huawei.com, syzkall...@googlegroups.com, Tejun Heo
On 10/06, Oleg Nesterov wrote:
>
> I still do not see a good fix. I am crying ;)

Sorry for the delay... Finally, I think I have a simple and clean fix.
We can leave ptrace_init_task() alone and fix task_join_group_stop().

I need to test it a bit and write the changelog. Do you see any problem
in the patch below?

(TODO: SIGCONT should clear JOBCTL_STOP_SIGMASK, needs another patch)

Oleg.

--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -391,16 +391,17 @@ static bool task_participate_group_stop(struct task_struct *task)
 
 void task_join_group_stop(struct task_struct *task)
 {
+	struct signal_struct *sig = current->signal;
+	unsigned long mask = current->jobctl & JOBCTL_STOP_SIGMASK;
+
+	if (sig->group_stop_count) {
+		sig->group_stop_count++;
+		mask |= JOBCTL_STOP_CONSUME;
+	} else if (!(sig->flags & SIGNAL_STOP_STOPPED))
+		return;
+
 	/* Have the new thread join an on-going signal group stop */
-	unsigned long jobctl = current->jobctl;
-	if (jobctl & JOBCTL_STOP_PENDING) {
-		struct signal_struct *sig = current->signal;
-		unsigned long signr = jobctl & JOBCTL_STOP_SIGMASK;
-		unsigned long gstop = JOBCTL_STOP_PENDING | JOBCTL_STOP_CONSUME;
-		if (task_set_jobctl_pending(task, signr | gstop)) {
-			sig->group_stop_count++;
-		}
-	}
+	task_set_jobctl_pending(task, mask | JOBCTL_STOP_PENDING);
 }
 
 /*
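For comparison with the removed lines, the proposed logic can be modeled in user space as well. A minimal sketch: `new_join_group_stop()` and `struct model_sig` are hypothetical stand-ins, the function returns the bits the patch would pass to task_set_jobctl_pending() (0 when it returns early), and the bit values are assumptions.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Stand-ins for kernel flags; values are assumptions for this model. */
#define JOBCTL_STOP_SIGMASK	0xffffUL
#define JOBCTL_STOP_PENDING	(1UL << 17)
#define JOBCTL_STOP_CONSUME	(1UL << 18)

/* Hypothetical view of current->signal. */
struct model_sig {
	int group_stop_count;	/* signal->group_stop_count */
	bool stop_stopped;	/* SIGNAL_STOP_STOPPED */
};

/* Proposed logic: returns the jobctl bits the new thread gets (0 = no join). */
unsigned long new_join_group_stop(unsigned long cur_jobctl, struct model_sig *sig)
{
	unsigned long mask = cur_jobctl & JOBCTL_STOP_SIGMASK;

	if (sig->group_stop_count) {
		/* Stop still in progress: the new thread must be counted too. */
		sig->group_stop_count++;
		mask |= JOBCTL_STOP_CONSUME;
	} else if (!sig->stop_stopped) {
		/* No group stop at all: nothing to join. */
		return 0;
	}
	/* Join; no CONSUME if the group already stopped, so no extra report. */
	return mask | JOBCTL_STOP_PENDING;
}
```

In the model, a group that is already stopped is now joined without JOBCTL_STOP_CONSUME, and only a stop still in progress bumps group_stop_count.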

Oleg Nesterov

Oct 19, 2020, 9:42:46 AM
to Andrew Morton, ax...@kernel.dk, chri...@brauner.io, ebie...@xmission.com, linux-...@vger.kernel.org, liuzhi...@huawei.com, syzkall...@googlegroups.com, Tejun Heo, syzbot
This testcase

#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <pthread.h>
#include <assert.h>

void *tf(void *arg)
{
	return NULL;
}

int main(void)
{
	int pid = fork();
	if (!pid) {
		kill(getpid(), SIGSTOP);

		pthread_t th;
		pthread_create(&th, NULL, tf, NULL);

		return 0;
	}

	waitpid(pid, NULL, WSTOPPED);

	ptrace(PTRACE_SEIZE, pid, 0, PTRACE_O_TRACECLONE);
	waitpid(pid, NULL, 0);

	ptrace(PTRACE_CONT, pid, 0, 0);
	waitpid(pid, NULL, 0);

	int status;
	int thread = waitpid(-1, &status, 0);
	assert(thread > 0 && thread != pid);
	assert(status == 0x80137f);

	return 0;
}

fails and triggers WARN_ON_ONCE(!signr) in do_jobctl_trap().
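The expected status 0x80137f in the second assertion can be decoded with the standard wait macros: 0x7f in the low byte means "stopped", 0x13 is the stop signal (SIGSTOP on x86-64), and 0x80 in the upper bits is PTRACE_EVENT_STOP, which is how ptrace(2) reports a group-stop of a PTRACE_SEIZE'd tracee. A small sketch; `is_event_stop()` is a hypothetical helper, and the value 128 is defined locally to mirror PTRACE_EVENT_STOP rather than relying on a particular libc header.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/* Mirrors PTRACE_EVENT_STOP from the Linux UAPI (defined locally). */
enum { MODEL_PTRACE_EVENT_STOP = 128 };

/* True if a waitpid() status is a PTRACE_EVENT_STOP report for signal sig. */
bool is_event_stop(int status, int sig)
{
	return WIFSTOPPED(status) &&
	       WSTOPSIG(status) == sig &&
	       (status >> 16) == MODEL_PTRACE_EVENT_STOP;
}
```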

This is because task_join_group_stop() has 2 problems when current is traced:

1. We can't rely on the "JOBCTL_STOP_PENDING" check: a stopped tracee
can be woken up by the debugger, and it can clone another thread which
should join the group stop.

We need to check group_stop_count || SIGNAL_STOP_STOPPED.

2. If SIGNAL_STOP_STOPPED is already set, we should not increment
sig->group_stop_count and add JOBCTL_STOP_CONSUME. The new thread
should stop without another do_notify_parent_cldstop() report.

To clarify, the problem is very old and we should blame ptrace_init_task().
But now that we have task_join_group_stop(), it makes more sense to fix this
helper to avoid code duplication.

Reported-by: syzbot+3485e3...@syzkaller.appspotmail.com
Cc: sta...@vger.kernel.org
Signed-off-by: Oleg Nesterov <ol...@redhat.com>
---
kernel/signal.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index a38b3edc6851..ef8f2a28d37c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -391,16 +391,17 @@ static bool task_participate_group_stop(struct task_struct *task)
 
 void task_join_group_stop(struct task_struct *task)
 {
+	unsigned long mask = current->jobctl & JOBCTL_STOP_SIGMASK;
+	struct signal_struct *sig = current->signal;
+
+	if (sig->group_stop_count) {
+		sig->group_stop_count++;
+		mask |= JOBCTL_STOP_CONSUME;
+	} else if (!(sig->flags & SIGNAL_STOP_STOPPED))
+		return;
+
 	/* Have the new thread join an on-going signal group stop */
-	unsigned long jobctl = current->jobctl;
-	if (jobctl & JOBCTL_STOP_PENDING) {
-		struct signal_struct *sig = current->signal;
-		unsigned long signr = jobctl & JOBCTL_STOP_SIGMASK;
-		unsigned long gstop = JOBCTL_STOP_PENDING | JOBCTL_STOP_CONSUME;
-		if (task_set_jobctl_pending(task, signr | gstop)) {
-			sig->group_stop_count++;
-		}
-	}
+	task_set_jobctl_pending(task, mask | JOBCTL_STOP_PENDING);
 }
 
 /*
--
2.25.1.362.g51ebf55

