INFO: task hung in perf_trace_event_unreg

syzbot

Apr 2, 2018, 5:20:03 AM
to linux-...@vger.kernel.org, mi...@redhat.com, ros...@goodmis.org, syzkall...@googlegroups.com
Hello,

syzbot hit the following crash on upstream commit
0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
Linux 4.16
syzbot dashboard link:
https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd

Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
Kernel config:
https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+2dbc55...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for
details.
If you forward the report, please keep this part and the footer.

REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
option "g �;e�K�׫>pquota"
INFO: task syz-executor3:10803 blocked for more than 120 seconds.
Not tainted 4.16.0+ #10
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor3 D20944 10803 4492 0x80000002
Call Trace:
context_switch kernel/sched/core.c:2862 [inline]
__schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
schedule+0xf5/0x430 kernel/sched/core.c:3499
schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
do_wait_for_common kernel/sched/completion.c:86 [inline]
__wait_for_common kernel/sched/completion.c:107 [inline]
wait_for_common kernel/sched/completion.c:118 [inline]
wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
__wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
perf_trace_event_unreg.isra.2+0xb7/0x1f0
kernel/trace/trace_event_perf.c:161
perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
_free_event+0x3bd/0x10f0 kernel/events/core.c:4121
put_event+0x24/0x30 kernel/events/core.c:4204
perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
perf_release+0x37/0x50 kernel/events/core.c:4320
__fput+0x327/0x7e0 fs/file_table.c:209
____fput+0x15/0x20 fs/file_table.c:243
task_work_run+0x199/0x270 kernel/task_work.c:113
exit_task_work include/linux/task_work.h:22 [inline]
do_exit+0x9bb/0x1ad0 kernel/exit.c:865
do_group_exit+0x149/0x400 kernel/exit.c:968
get_signal+0x73a/0x16d0 kernel/signal.c:2469
do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x455269
RSP: 002b:00007f8976371ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: 0000000000000000 RBX: 000000000072bec8 RCX: 0000000000455269
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bec8
RBP: 000000000072bec8 R08: 0000000000000000 R09: 000000000072bea0
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe793f79cf R14: 00007f89763729c0 R15: 0000000000000000

Showing all locks held in the system:
2 locks held by khungtaskd/876:
#0: (rcu_read_lock){....}, at: [<000000008f2bec4b>]
check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
#0: (rcu_read_lock){....}, at: [<000000008f2bec4b>] watchdog+0x1c5/0xd60
kernel/hung_task.c:249
#1: (tasklist_lock){.+.+}, at: [<0000000006b3009f>]
debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
2 locks held by getty/4414:
#0: (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4415:
#0: (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4416:
#0: (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4417:
#0: (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4418:
#0: (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4419:
#0: (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4420:
#0: (&tty->ldisc_sem){++++}, at: [<00000000e51437c8>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000762a7320>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
1 lock held by syz-executor3/10803:
#0: (event_mutex){+.+.}, at: [<00000000c507b78a>]
perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
4 locks held by syz-executor5/10816:
#0: (&tty->legacy_mutex){+.+.}, at: [<00000000567b7b94>]
tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
#1: (&tty->legacy_mutex/1){+.+.}, at: [<00000000567b7b94>]
tty_lock+0x5d/0x90 drivers/tty/tty_mutex.c:19
#2: (&tty->ldisc_sem){++++}, at: [<000000002b6b6a29>]
tty_ldisc_ref+0x1b/0x80 drivers/tty/tty_ldisc.c:298
#3: (&o_tty->termios_rwsem/1){++++}, at: [<0000000007d9a7a4>]
n_tty_flush_buffer+0x21/0x320 drivers/tty/n_tty.c:357
1 lock held by syz-executor2/10827:
#0: (event_mutex){+.+.}, at: [<00000000c507b78a>]
perf_trace_destroy+0x28/0x100 kernel/trace/trace_event_perf.c:234
1 lock held by blkid/10832:
#0: (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
1 lock held by syz-executor4/10835:
#0: (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
1 lock held by syz-executor4/10845:
#0: (&lo->lo_ctl_mutex/1){+.+.}, at: [<000000006e2f031e>]
lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 876 Comm: khungtaskd Not tainted 4.16.0+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x24d lib/dump_stack.c:53
nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
check_hung_task kernel/hung_task.c:132 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
watchdog+0x90c/0xd60 kernel/hung_task.c:249
INFO: rcu_sched self-detected stall on CPU
0-....: (124996 ticks this GP) idle=75e/1/4611686018427387906
softirq=33205/33205 fqs=30980

(t=125000 jiffies g=17618 c=17617 q=921)
kthread+0x33c/0x400 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 7457 Comm: kworker/u4:5 Not tainted 4.16.0+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Workqueue: events_unbound flush_to_ldisc
RIP: 0010:__process_echoes+0x641/0x770 drivers/tty/n_tty.c:733
RSP: 0018:ffff8801af4ff078 EFLAGS: 00000217
RAX: 0000000000000000 RBX: ffffc90003673000 RCX: ffffffff8352d4c2
RDX: 0000000000000006 RSI: 1ffff10039602994 RDI: ffffc9000367515e
RBP: ffff8801af4ff0e0 R08: 1ffff10035e9fdb5 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000625628efd
R13: dffffc0000000000 R14: 0000000000000efe R15: 0000000000001b15
FS: 0000000000000000(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd5bfa4ca8 CR3: 000000000846a005 CR4: 00000000001606f0
DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Call Trace:
commit_echoes+0x147/0x1b0 drivers/tty/n_tty.c:764
n_tty_receive_char_fast drivers/tty/n_tty.c:1416 [inline]
n_tty_receive_buf_fast drivers/tty/n_tty.c:1576 [inline]
__receive_buf drivers/tty/n_tty.c:1611 [inline]
n_tty_receive_buf_common+0x1156/0x2520 drivers/tty/n_tty.c:1709
n_tty_receive_buf2+0x33/0x40 drivers/tty/n_tty.c:1744
tty_ldisc_receive_buf+0xa7/0x180 drivers/tty/tty_buffer.c:456
tty_port_default_receive_buf+0x106/0x160 drivers/tty/tty_port.c:38
receive_buf drivers/tty/tty_buffer.c:475 [inline]
flush_to_ldisc+0x3c4/0x590 drivers/tty/tty_buffer.c:524
process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
worker_thread+0x223/0x1990 kernel/workqueue.c:2247
kthread+0x33c/0x400 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
Code: 60 12 00 00 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 42 0f b6 04 28 38
d0 7f 08 84 c0 0f 85 21 01 00 00 42 80 bc 33 60 12 00 00 82 <74> 0f e8 48
90 1e fe 4d 8d 74 24 02 e9 58 ff ff ff e8 39 90 1e


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzk...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line in the email body.

Steven Rostedt

Apr 2, 2018, 9:40:43 AM
to syzbot, linux-...@vger.kernel.org, mi...@redhat.com, syzkall...@googlegroups.com, Peter Zijlstra, Paul E. McKenney
I don't think this is a perf issue. Looks like something is preventing
rcu_sched from completing. If there's a CPU that is running in kernel
space and never scheduling, that can cause this issue. Or if RCU
somehow missed a transition into idle or user space.

-- Steve
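
For illustration only, the first scenario above (a CPU that stays in kernel space and never schedules) can be modelled with a toy user-space sketch. This is not kernel code: the `toy_*` names are invented here, and the single bit mask is a simplification loosely inspired by the `qsmask` field that shows up in the patch later in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one bit per CPU that still owes a quiescent state. */
struct toy_gp {
	unsigned long qsmask;
};

/* Start a grace period that must wait for CPUs 0..ncpus-1. */
static void toy_gp_start(struct toy_gp *gp, unsigned int ncpus)
{
	assert(ncpus > 0 && ncpus < 8 * sizeof(gp->qsmask));
	gp->qsmask = (1UL << ncpus) - 1;
}

/* A CPU reports a quiescent state (context switch, idle, user mode). */
static void toy_report_qs(struct toy_gp *gp, unsigned int cpu)
{
	assert(cpu < 8 * sizeof(gp->qsmask));
	gp->qsmask &= ~(1UL << cpu);
}

/* The grace period ends only when no CPU is outstanding. */
static bool toy_gp_done(const struct toy_gp *gp)
{
	return gp->qsmask == 0;
}
```

With two CPUs, if CPU 1 calls `toy_report_qs()` but CPU 0 never does (for example because it is looping in `__process_echoes`, as in the NMI backtrace above), `toy_gp_done()` keeps returning false, and a waiter such as `synchronize_sched()` would remain blocked.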

Paul E. McKenney

Apr 2, 2018, 11:32:42 AM
to Steven Rostedt, syzbot, linux-...@vger.kernel.org, mi...@redhat.com, syzkall...@googlegroups.com, Peter Zijlstra
On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
> On Mon, 02 Apr 2018 02:20:02 -0700
> syzbot <syzbot+2dbc55...@syzkaller.appspotmail.com> wrote:
>
> > Hello,
> >
> > syzbot hit the following crash on upstream commit
> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +0000)
> > Linux 4.16
> > syzbot dashboard link:
> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
> > Raw console output:
> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> > Kernel config:
> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> > compiler: gcc (GCC) 7.1.1 20170620
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+2dbc55...@syzkaller.appspotmail.com
> > It will help syzbot understand when the bug is fixed. See footer for
> > details.
> > If you forward the report, please keep this part and the footer.
> >
> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
> > option "g �;e�K�׫>pquota"

Might not hurt to look into the above, though perhaps this is just syzkaller
playing around with mount options.

> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> > Not tainted 4.16.0+ #10
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > syz-executor3 D20944 10803 4492 0x80000002
> > Call Trace:
> > context_switch kernel/sched/core.c:2862 [inline]
> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> > schedule+0xf5/0x430 kernel/sched/core.c:3499
> > schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> > do_wait_for_common kernel/sched/completion.c:86 [inline]
> > __wait_for_common kernel/sched/completion.c:107 [inline]
> > wait_for_common kernel/sched/completion.c:118 [inline]
> > wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> > synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>
> I don't think this is a perf issue. Looks like something is preventing
> rcu_sched from completing. If there's a CPU that is running in kernel
> space and never scheduling, that can cause this issue. Or if RCU
> somehow missed a transition into idle or user space.

The RCU CPU stall warning below strongly supports this position ...
... And two places to start looking are the two above rcu_read_lock() calls.
Especially given that khungtaskd shows up below.
And the above is another good place to look.

Thanx, Paul

Dmitry Vyukov

Apr 2, 2018, 12:04:57 PM
to Paul McKenney, Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkall...@googlegroups.com, Peter Zijlstra, syzkaller
I think this is this guy then:

https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40

#syz dup: INFO: rcu detected stall in __process_echoes


Looking retrospectively at the various hang/stall bugs that we have, I
think we need some kind of priority between them. I.e. we have rcu
stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
hang and maybe something else. It would be useful if they fire
deterministically according to priorities. If there is an rcu stall,
that's always detected as CPU stall. Then if there is no RCU stall,
but a workqueue stall, then that's always detected as workqueue stall,
etc.
Currently, if we have an RCU stall (effectively a CPU stall), it can be
detected either as an RCU stall or as a task hang, producing 2 different bug
reports (which is bad).
One can say that it's only a matter of tuning timeouts, but at least
the task hung detector has a problem: if you set the timeout to X, it can
detect a hang anywhere between X and 2*X. And on one hand we need quite
a large timeout (a minute may not be enough), and on the other hand we
can't wait for an hour just to make sure that the machine is indeed
dead (these things happen every few minutes).
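
The X to 2*X window can be reproduced with a small model. This is a sketch under an assumption, namely that the detector wakes up every X seconds, merely records a task's context-switch count the first time it sees the task blocked, and reports only on the following wake-up if the count is unchanged. The function names are made up for this example.

```c
#include <assert.h>

/*
 * Toy model of a periodic hang detector. Checks run at 0, X, 2X, ...
 * The first check at or after the task blocks only records its
 * context-switch count; the next check sees no change and reports.
 */
static unsigned long toy_hang_report_time(unsigned long blocked_at,
					  unsigned long x)
{
	unsigned long first_check;

	assert(x > 0);
	/* Round blocked_at up to the next multiple of x. */
	first_check = (blocked_at + x - 1) / x * x;
	return first_check + x;
}

/* Delay between the task blocking and the report. */
static unsigned long toy_hang_latency(unsigned long blocked_at,
				      unsigned long x)
{
	return toy_hang_report_time(blocked_at, x) - blocked_at;
}
```

For X = 120, a task that blocks at t = 120 is reported after 120 seconds, while one that blocks at t = 121 is reported only after 239 seconds.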

Paul E. McKenney

Apr 2, 2018, 12:20:48 PM
to Dmitry Vyukov, Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkall...@googlegroups.com, Peter Zijlstra, syzkaller
Seems likely to me!

> Looking retrospectively at the various hang/stall bugs that we have, I
> think we need some kind of priority between them. I.e. we have rcu
> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> hang and maybe something else. It would be useful if they fire
> deterministically according to priorities. If there is an rcu stall,
> that's always detected as CPU stall. Then if there is no RCU stall,
> but a workqueue stall, then that's always detected as workqueue stall,
> etc.
> Currently, if we have an RCU stall (effectively a CPU stall), it can be
> detected either as an RCU stall or as a task hang, producing 2 different bug
> reports (which is bad).
> One can say that it's only a matter of tuning timeouts, but at least
> the task hung detector has a problem: if you set the timeout to X, it can
> detect a hang anywhere between X and 2*X. And on one hand we need quite
> a large timeout (a minute may not be enough), and on the other hand we
> can't wait for an hour just to make sure that the machine is indeed
> dead (these things happen every few minutes).

I suppose that we could have a global variable that was set to the
priority of the complaint in question, which would suppress all
lower-priority complaints. Might need to be opt-in, though -- I would
guess that not everyone is going to be happy with one complaint suppressing
others, especially given the possibility that the two complaints might
be about different things.

Or did you have something more deft in mind?

Thanx, Paul
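
A minimal user-space sketch of the global-variable idea, using C11 atomics rather than kernel primitives. The priority order (RCU stall above workqueue stall above task hang) is taken from the proposal quoted above; the enum, variable, and function names are hypothetical, and the opt-in switch mentioned above is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical priorities; higher values win. */
enum toy_complaint {
	TOY_NONE = 0,
	TOY_TASK_HUNG,
	TOY_WORKQUEUE_STALL,
	TOY_RCU_STALL,
};

/* Highest-priority complaint reported so far. */
static _Atomic int toy_reported = TOY_NONE;

/*
 * Returns true if the caller may print its complaint, i.e. nothing of
 * strictly higher priority has been reported yet. Lower-priority
 * complaints are suppressed once a higher one has been seen.
 */
static bool toy_may_complain(enum toy_complaint prio)
{
	int want = (int)prio;
	int cur = atomic_load(&toy_reported);

	assert(prio != TOY_NONE);
	while (cur <= want) {
		/* On failure, cur is reloaded with the current value. */
		if (cur == want ||
		    atomic_compare_exchange_weak(&toy_reported, &cur, want))
			return true;
	}
	return false;
}
```

Once `toy_may_complain(TOY_RCU_STALL)` has returned true, later calls with `TOY_WORKQUEUE_STALL` or `TOY_TASK_HUNG` return false, while further RCU stall complaints are still allowed.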

Dmitry Vyukov

Apr 2, 2018, 12:32:25 PM
to Paul McKenney, Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkall...@googlegroups.com, Peter Zijlstra, syzkaller
On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney
syzkaller generally looks only at the first report. One does not know
if/when there will be a second one, or the second one can be induced
by the first one, and we generally want clean reports on a non-tainted
kernel. So we don't just need to suppress lower priority ones, we need
to produce the right report first.
I am thinking of maybe setting:
- rcu stalls at 1.5 minutes
- workqueue stalls at 2 minutes
- task hangs at 2.5 minutes
- and no output whatsoever at 3 minutes
Am I missing anything? I think at least spinlocks. Should they go before
or after rcu?

This will require fixing task hung. Have not yet looked at workqueue detector.
Does at least RCU respect the given timeout more or less precisely?
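
The interaction between these timeouts and the X to 2*X behaviour of the task hung detector can be checked with a short sketch. The timeouts are the ones proposed above, converted to seconds; the worst-case values are assumptions made for illustration (the RCU and workqueue detectors are treated as exact, which the thread has not established for workqueues), and all names are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One detector: nominal timeout and worst-case firing time, in seconds. */
struct toy_detector {
	const char *name;
	unsigned int timeout;
	unsigned int worst_case;
};

/*
 * The proposed ordering is deterministic only if every detector's
 * worst-case firing time precedes the next detector's nominal timeout.
 */
static bool toy_order_is_deterministic(const struct toy_detector *d, size_t n)
{
	size_t i;

	assert(d != NULL);
	for (i = 0; i + 1 < n; i++) {
		if (d[i].worst_case >= d[i + 1].timeout)
			return false;
	}
	return true;
}

/* Proposed timeouts with a task hung detector that may take up to 2*X. */
static const struct toy_detector toy_current[] = {
	{ "rcu stall",        90,  90 },
	{ "workqueue stall", 120, 120 },
	{ "task hung",       150, 300 },
	{ "no output",       180, 180 },
};

/* Same timeouts, assuming the task hung detector fired close to X. */
static const struct toy_detector toy_fixed[] = {
	{ "rcu stall",        90,  90 },
	{ "workqueue stall", 120, 120 },
	{ "task hung",       150, 150 },
	{ "no output",       180, 180 },
};
```

`toy_order_is_deterministic(toy_current, 4)` is false because a task hang timeout of 150 seconds may take up to 300 seconds to fire, past the 180-second no-output limit; with the assumed fix the check passes.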

Paul E. McKenney

Apr 2, 2018, 12:38:24 PM
to Dmitry Vyukov, Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkall...@googlegroups.com, Peter Zijlstra, syzkaller
That is what I know of, but the Linux kernel being what it is, there is
probably something more out there. If not now, in a few months. The
RCU CPU stall timeout can be set on the kernel-boot command line, but
you probably already knew that.

Just for comparison, back in DYNIX/ptx days the RCU CPU stall timeout
was 1.5 -seconds-. ;-)

> This will require fixing task hung. Have not yet looked at workqueue detector.
> Does at least RCU respect the given timeout more or less precisely?

Assuming that there is at least one CPU capable of taking scheduling-clock
interrupts, it should respect the timeout to within a few jiffies.

Thanx, Paul

Dmitry Vyukov

Apr 2, 2018, 1:12:12 PM
to Paul McKenney, Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkall...@googlegroups.com, Peter Zijlstra, syzkaller
On Mon, Apr 2, 2018 at 6:39 PM, Paul E. McKenney
Well, it's all based solely on a large number of patches and stopgaps.
If we fix main problems for today, it's already good.


> Just for comparison, back in DYNIX/ptx days the RCU CPU stall timeout
> was 1.5 -seconds-. ;-)

Have you tried to instrument every basic block with a function call to
collect coverage, check every damn memory access for validity, enable
all thinkable and unthinkable debug configs and put the insanest load
one can imagine from a swarm of parallel threads? It makes things a
bit slower ;)


>> This will require fixing task hung. Have not yet looked at workqueue detector.
>> Does at least RCU respect the given timeout more or less precisely?
>
> Assuming that there is at least one CPU capable of taking scheduling-clock
> interrupts, it should respect the timeout to within a few jiffies.

This is good!

Paul E. McKenney

Apr 2, 2018, 1:22:36 PM
to Dmitry Vyukov, Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkall...@googlegroups.com, Peter Zijlstra, syzkaller
Fair enough!

> > Just for comparison, back in DYNIX/ptx days the RCU CPU stall timeout
> > was 1.5 -seconds-. ;-)
>
> Have you tried to instrument every basic block with a function call to
> collect coverage, check every damn memory access for validity, enable
> all thinkable and unthinkable debug configs and put the insanest load
> one can imagine from a swarm of parallel threads? It makes things a
> bit slower ;)

Given that we wouldn't have had enough CPU or memory to accommodate
all of that back in DYNIX/ptx days, I am forced to answer "no". ;-)

> >> This will require fixing task hung. Have not yet looked at workqueue detector.
> >> Does at least RCU respect the given timeout more or less precisely?
> >
> > Assuming that there is at least one CPU capable of taking scheduling-clock
> > interrupts, it should respect the timeout to within a few jiffies.
>
> This is good!

;-)

Dmitry Vyukov

Apr 9, 2018, 8:54:42 AM
to Paul McKenney, Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkall...@googlegroups.com, Peter Zijlstra, syzkaller
Hi Paul,

Speaking of stalls and rcu, we are seeing lots of crashes that go like this:

INFO: rcu_sched self-detected stall on CPU[ 404.992530] INFO:
rcu_sched detected stalls on CPUs/tasks:
INFO: rcu_sched self-detected stall on CPU[ 454.347448] INFO:
rcu_sched detected stalls on CPUs/tasks:
INFO: rcu_sched self-detected stall on CPU[ 396.073634] INFO:
rcu_sched detected stalls on CPUs/tasks:

or like this:

INFO: rcu_sched self-detected stall on CPU
INFO: rcu_sched detected stalls on CPUs/tasks:
0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
softirq=57641/57641 fqs=31151
0-....: (125000 ticks this GP) idle=0ba/1/4611686018427387906
softirq=57641/57641 fqs=31151
(t=125002 jiffies g=31656 c=31655 q=910)

INFO: rcu_sched self-detected stall on CPU
INFO: rcu_sched detected stalls on CPUs/tasks:
0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
softirq=65194/65194 fqs=31231
0-....: (125000 ticks this GP) idle=49a/1/4611686018427387906
softirq=65194/65194 fqs=31231
(t=125002 jiffies g=34421 c=34420 q=1119)
(detected by 1, t=125002 jiffies, g=34421, c=34420, q=1119)


and then there is an unintelligible mess of 2 reports. Such crashes go
to the trash bin, because we can't even say which function hung. It
seems that in all cases 2 different rcu stall detection facilities
race with each other. Is it possible to make them not race?

Paul E. McKenney

Apr 9, 2018, 12:19:55 PM
to Dmitry Vyukov, Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkall...@googlegroups.com, Peter Zijlstra, syzkaller
How about the following (untested, not for mainline) patch? It suppresses
all but the "main" RCU flavor, which is rcu_sched for !PREEMPT builds and
rcu_preempt otherwise. Either way, this is the RCU flavor corresponding
to synchronize_rcu(). This works well in the common case where there
is almost always an RCU grace period in flight.

One reason that this patch is not for mainline is that I am working on
merging the RCU-bh, RCU-preempt, and RCU-sched flavors into one thing,
at which point there won't be any races. But that might be a couple
merge windows away from now.

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 381b47a68ac6..31f7818f2d63 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1552,7 +1552,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
 	struct rcu_node *rnp;

 	if ((rcu_cpu_stall_suppress && !rcu_kick_kthreads) ||
-	    !rcu_gp_in_progress(rsp))
+	    !rcu_gp_in_progress(rsp) || rsp != rcu_state_p)
 		return;
 	rcu_stall_kick_kthreads(rsp);
 	j = jiffies;

Dmitry Vyukov

Apr 9, 2018, 12:28:38 PM
to Paul McKenney, Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkall...@googlegroups.com, Peter Zijlstra, syzkaller
On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney
But don't they both relate to the same rcu flavor? They both say
rcu_sched. I assumed that the difference is "self-detected" vs "on
CPUs/tasks", i.e. on the current CPU vs on other CPUs.

Paul E. McKenney

Apr 9, 2018, 2:10:07 PM
to Dmitry Vyukov, Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkall...@googlegroups.com, Peter Zijlstra, syzkaller
Right you are!

One approach would be to increase the value of RCU_STALL_RAT_DELAY,
which is currently two jiffies, to (say) 20 jiffies. This is in
kernel/rcu/tree.h. But this would fail on a sufficiently overloaded
system -- and the failure of the two-jiffy delay is a bit of a surprise,
given interrupts disabled and all that. Are you by any chance loaded
heavily enough to see vCPU preemption?

I could avoid at least some of these timing issues by instead using cmpxchg()
on ->jiffies_stall to allow only one CPU in, but leave the non-atomic
update to discourage overly long stall prints from running into the
next one. This is not perfect, either, and is roughly equivalent to
setting RCU_STALL_RAT_DELAY to many seconds' worth of jiffies, but
avoiding that minute's delay. But it should get rid of the duplication
in almost all cases, though it could allow a stall warning to overlap
with a later stall warning for that same grace period. Which can
already happen anyway. Also, a tens-of-seconds vCPU preemption can
still cause concurrent stall warnings, but if that is happening to you,
the concurrent stall warnings are probably the least of your problems.
Besides, we do need at least one CPU to actually report the stall, which
won't happen if that CPU's vCPU is indefinitely preempted. So there is
only so much I can do about that particular corner case.

So how does the following (untested) patch work for you?

Thanx, Paul

------------------------------------------------------------------------

commit 6a5ab1e68f8636d8823bb5a9aee35fc44c2be866
Author: Paul E. McKenney <pau...@linux.vnet.ibm.com>
Date: Mon Apr 9 11:04:46 2018 -0700

rcu: Exclude near-simultaneous RCU CPU stall warnings

There is a two-jiffy delay between the time that a CPU will self-report
an RCU CPU stall warning and the time that some other CPU will report a
warning on behalf of the first CPU. This has worked well in the past,
but on busy systems, it is possible for the two warnings to overlap,
which makes interpreting them extremely difficult.

This commit therefore uses a cmpxchg-based timing decision that
allows only one report in a given one-minute period (assuming default
stall-warning Kconfig parameters). This approach will of course fail
if you are seeing minute-long vCPU preemption, but in that case the
overlapping RCU CPU stall warnings are the least of your worries.

Reported-by: Dmitry Vyukov <dvy...@google.com>
Signed-off-by: Paul E. McKenney <pau...@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 381b47a68ac6..b7246bcbf633 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1429,8 +1429,6 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		return;
 	}
-	WRITE_ONCE(rsp->jiffies_stall,
-		   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);

 	/*
@@ -1481,6 +1479,10 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 			sched_show_task(current);
 		}
 	}
+	/* Rewrite if needed in case of slow consoles. */
+	if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
+		WRITE_ONCE(rsp->jiffies_stall,
+			   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);

 	rcu_check_gp_kthread_starvation(rsp);

@@ -1525,6 +1527,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
 	rcu_dump_cpu_stacks(rsp);

 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
+	/* Rewrite if needed in case of slow consoles. */
 	if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
 		WRITE_ONCE(rsp->jiffies_stall,
 			   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
@@ -1548,6 +1551,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
 	unsigned long gpnum;
 	unsigned long gps;
 	unsigned long j;
+	unsigned long jn;
 	unsigned long js;
 	struct rcu_node *rnp;

@@ -1586,14 +1590,17 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
 	    ULONG_CMP_GE(gps, js))
 		return; /* No stall or GP completed since entering function. */
 	rnp = rdp->mynode;
+	jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
 	if (rcu_gp_in_progress(rsp) &&
-	    (READ_ONCE(rnp->qsmask) & rdp->grpmask)) {
+	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
+	    cmpxchg(&rsp->jiffies_stall, js, jn) == js) {

 		/* We haven't checked in, so go dump stack. */
 		print_cpu_stall(rsp);

 	} else if (rcu_gp_in_progress(rsp) &&
-		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY)) {
+		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
+		   cmpxchg(&rsp->jiffies_stall, js, jn) == js) {

 		/* They had a few time units to dump stack, so complain. */
 		print_other_cpu_stall(rsp, gpnum);
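
The gating idea in the patch above can be tried out in user space. The following is a rough sketch with C11 atomics standing in for the kernel's `cmpxchg()`; `toy_jiffies_stall` and `toy_claim_stall_report()` are invented names, and the comparison macro is modelled on how `ULONG_CMP_GE()` is understood to work (an unsigned subtraction compared against half the counter range), which should be checked against the kernel headers.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Wraparound-safe "a >= b" for free-running counters such as jiffies. */
#define TOY_ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

/* Stand-in for rsp->jiffies_stall: time at which a stall is declared. */
static _Atomic unsigned long toy_jiffies_stall;

/*
 * Returns true for exactly one caller per stall deadline. Each caller
 * passes the deadline it sampled (js) and the proposed next one (jn);
 * only the caller whose compare-and-exchange succeeds may report.
 */
static bool toy_claim_stall_report(unsigned long now, unsigned long js,
				   unsigned long jn)
{
	unsigned long expected = js;

	assert(jn != js);
	if (!TOY_ULONG_CMP_GE(now, js))
		return false;	/* Deadline not reached yet. */
	return atomic_compare_exchange_strong(&toy_jiffies_stall,
					      &expected, jn);
}
```

If two CPUs both sample the same deadline `js`, the first successful exchange moves the stored deadline to `jn`, so the second CPU's exchange fails and it stays quiet, which is the behaviour the commit message describes.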

Dmitry Vyukov

Apr 10, 2018, 7:13:35 AM
to Paul McKenney, Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs, Peter Zijlstra, syzkaller
On Mon, Apr 9, 2018 at 8:11 PM, Paul E. McKenney
Looks good to me.

We run on VMs, so we can well have vCPU preemption.

Paul E. McKenney

Apr 10, 2018, 1:01:09 PM
to Dmitry Vyukov, Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs, Peter Zijlstra, syzkaller
Very good! Please do get me a Tested-by when you get to that point.

Dmitry Vyukov

Apr 11, 2018, 6:06:49 AM
to Paul McKenney, Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs, Peter Zijlstra, syzkaller
Unfortunately I don't have a good way to test it until it's submitted
upstream. While we are seeing thousands of such instances, they happen
episodically on a farm of test machines. But they are still harmful,
especially when the system tries to reproduce a bug, because it's
mid-way through and thinks it got a hook, but then suddenly boom! it
gets some mess that it can't parse and now it does not know if it's
still the same bug, or maybe a different bug triggered by the same
program, so it does not know how to properly attribute the reproducer.
You can see these cases as they happen here (under report/log links in
the table):
https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452
When the patch is submitted, the rate should go down.

Paul E. McKenney

Apr 11, 2018, 3:38:41 PM
to Dmitry Vyukov, Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs, Peter Zijlstra, syzkaller
OK, I will bite... How do you test fixes to problems that syzkaller finds?

Dmitry Vyukov

Apr 12, 2018, 5:40:04 AM
to Paul McKenney, Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs, Peter Zijlstra, syzkaller
I don't. I can't. No one can test that many fixes.

Normally syzbot provides reproducers for bugs. Then you have 2
choices: (1) test it yourself (if you debugged it, you probably
already have everything setup for this), or (2) ask syzbot to test the
patch on this particular reproducer.
Some bugs don't have reproducers. Then you either localize the bug and
write a test, or go with the good old "it must be correct, right?".
Even in the second case, syzbot will notify you if the bug happens again
after the fix has landed; if it stays silent, then presumably the fix
indeed fixed the bug.

Now, this is not a syzbot bug (syzbot reports bugs itself from its own
email address). This is more like you looked at somebody else's dmesg and
went "oh, this looks bad, let me copy-paste and report it".
So you can also go with the good old "it must be correct, right?" and
assess how well it goes after a few weeks when it reaches syzbot, or
someone needs to write a test for rcu.

This could have been handled with some kind of "cluster-wide" test,
but I don't see how it is feasible. See this for details:
https://groups.google.com/d/msg/syzkaller-bugs/7ucgCkAJKSk/skZjgavRAQAJ
Especially the part where someone will need to go through and triage
hundreds of crashes and assess that they are not related to the new
patch, and do something with them afterwards.

Paul E. McKenney

Apr 12, 2018, 11:06:20 AM
to Dmitry Vyukov, Steven Rostedt, syzbot, LKML, Ingo Molnar, syzkaller-bugs, Peter Zijlstra, syzkaller
Fair enough, and apologies for the hassle. I don't expect that the
patch will be controversial, so it should go into the next merge
window.

Thanx, Paul
