INFO: rcu detected stall in sys_exit_group


syzbot

Sep 18, 2019, 7:19:08 AM
to aarc...@redhat.com, ak...@linux-foundation.org, ava...@gmail.com, chri...@brauner.io, dbu...@suse.de, linux-...@vger.kernel.org, ol...@redhat.com, prs...@codeaurora.org, syzkall...@googlegroups.com, tg...@linutronix.de, t...@kernel.org
Hello,

syzbot found the following crash on:

HEAD commit: a7f89616 Merge branch 'for-5.3-fixes' of git://git.kernel...
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15c33079600000
kernel config: https://syzkaller.appspot.com/x/.config?x=861a6f31647968de
dashboard link: https://syzkaller.appspot.com/bug?extid=18379f2a19bc62c12565
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1066bb85600000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13e9f75e600000

Bisection is inconclusive: the bug happens on the oldest tested release.

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=154d4969600000
final crash: https://syzkaller.appspot.com/x/report.txt?x=174d4969600000
console output: https://syzkaller.appspot.com/x/log.txt?x=134d4969600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+18379f...@syzkaller.appspotmail.com

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 1-...!: (10499 ticks this GP) idle=63a/1/0x4000000000000002
softirq=10978/10978 fqs=0
(t=10501 jiffies g=10601 q=227)
rcu: rcu_preempt kthread starved for 10502 jiffies! g10601 f0x0
RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
rcu: RCU grace-period kthread stack dump:
rcu_preempt I29040 10 2 0x80004000
Call Trace:
context_switch kernel/sched/core.c:3254 [inline]
__schedule+0x755/0x1580 kernel/sched/core.c:3880
schedule+0xd9/0x260 kernel/sched/core.c:3947
schedule_timeout+0x486/0xc50 kernel/time/timer.c:1807
rcu_gp_fqs_loop kernel/rcu/tree.c:1611 [inline]
rcu_gp_kthread+0x9b2/0x18c0 kernel/rcu/tree.c:1768
kthread+0x361/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 1 to CPUs 0:
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 1.403
msecs
NMI backtrace for cpu 0
CPU: 0 PID: 10344 Comm: syz-executor933 Not tainted 5.3.0-rc8+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:hhf_dequeue+0x552/0xa20 net/sched/sch_hhf.c:436
Code: ff ff 45 31 ff e9 b0 02 00 00 e8 49 05 ac fb 48 8d 43 f0 41 be 01 00
00 00 49 8d 95 c0 02 00 00 48 39 c2 74 34 e8 2e 05 ac fb <49> 8d bd ac 03
00 00 48 89 f8 48 c1 e8 03 42 0f b6 14 20 48 89 f8
RSP: 0018:ffff8880ae809038 EFLAGS: 00000206
RAX: ffff8880a3970100 RBX: ffff8880a8b1d538 RCX: ffffffff85c66b39
RDX: 0000000000000100 RSI: ffffffff85c66fd2 RDI: 0000000000000005
RBP: ffff8880ae809088 R08: ffff8880a3970100 R09: 0000000000000000
R10: fffffbfff134afaf R11: ffff8880a3970100 R12: dffffc0000000000
R13: ffff8880a8b1d240 R14: 0000000000000001 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000006dab10 CR3: 0000000008c6d000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
dequeue_skb net/sched/sch_generic.c:258 [inline]
qdisc_restart net/sched/sch_generic.c:361 [inline]
__qdisc_run+0x1e7/0x19d0 net/sched/sch_generic.c:379
__dev_xmit_skb net/core/dev.c:3533 [inline]
__dev_queue_xmit+0x16f1/0x3650 net/core/dev.c:3838
dev_queue_xmit+0x18/0x20 net/core/dev.c:3902
br_dev_queue_push_xmit+0x3f3/0x5c0 net/bridge/br_forward.c:52
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
br_forward_finish+0xfa/0x400 net/bridge/br_forward.c:65
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
__br_forward+0x641/0xb00 net/bridge/br_forward.c:109
deliver_clone+0x61/0xc0 net/bridge/br_forward.c:125
maybe_deliver+0x2c7/0x390 net/bridge/br_forward.c:181
br_flood+0x13a/0x3d0 net/bridge/br_forward.c:223
br_dev_xmit+0x98c/0x15a0 net/bridge/br_device.c:100
__netdev_start_xmit include/linux/netdevice.h:4406 [inline]
netdev_start_xmit include/linux/netdevice.h:4420 [inline]
xmit_one net/core/dev.c:3280 [inline]
dev_hard_start_xmit+0x1a3/0x9c0 net/core/dev.c:3296
__dev_queue_xmit+0x2b15/0x3650 net/core/dev.c:3869
dev_queue_xmit+0x18/0x20 net/core/dev.c:3902
neigh_hh_output include/net/neighbour.h:500 [inline]
neigh_output include/net/neighbour.h:509 [inline]
ip_finish_output2+0x1726/0x2570 net/ipv4/ip_output.c:228
__ip_finish_output net/ipv4/ip_output.c:308 [inline]
__ip_finish_output+0x5fc/0xb90 net/ipv4/ip_output.c:290
ip_finish_output+0x38/0x1f0 net/ipv4/ip_output.c:318
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip_output+0x21f/0x640 net/ipv4/ip_output.c:432
dst_output include/net/dst.h:436 [inline]
ip_local_out+0xbb/0x190 net/ipv4/ip_output.c:125
igmpv3_sendpack+0x1b5/0x2c0 net/ipv4/igmp.c:426
igmpv3_send_cr net/ipv4/igmp.c:721 [inline]
igmp_ifc_timer_expire+0x687/0xa00 net/ipv4/igmp.c:809
call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x697/0x17a0 kernel/time/timer.c:1698
__do_softirq+0x262/0x98c kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0x19b/0x1e0 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:537 [inline]
smp_apic_timer_interrupt+0x1a3/0x610 arch/x86/kernel/apic/apic.c:1137
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
</IRQ>
RIP: 0010:__raw_write_unlock_irq include/linux/rwlock_api_smp.h:268 [inline]
RIP: 0010:_raw_write_unlock_irq+0x54/0x90 kernel/locking/spinlock.c:343
Code: c0 60 f4 d2 88 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00
75 33 48 83 3d 05 bf 94 01 00 74 20 fb 66 0f 1f 44 00 00 <bf> 01 00 00 00
e8 62 8c 10 fa 65 8b 05 93 c8 c3 78 85 c0 74 06 41
RSP: 0018:ffff8880a406fd70 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: 1ffffffff11a5e8c RBX: ffff88809fbda740 RCX: 1ffffffff134b5ee
RDX: dffffc0000000000 RSI: ffffffff8177f15e RDI: ffffffff873e3538
RBP: ffff8880a406fd78 R08: ffff8880a3970100 R09: fffffbfff134afb0
R10: fffffbfff134afaf R11: ffffffff89a57d7f R12: ffffffff88c090c0
R13: 0000000000000011 R14: ffff8880a3970100 R15: 0000000000000000
exit_notify kernel/exit.c:745 [inline]
do_exit+0x13ab/0x2e50 kernel/exit.c:900
do_group_exit+0x135/0x360 kernel/exit.c:983
__do_sys_exit_group kernel/exit.c:994 [inline]
__se_sys_exit_group kernel/exit.c:992 [inline]
__x64_sys_exit_group+0x44/0x50 kernel/exit.c:992
do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x440f88
Code: Bad RIP value.
RSP: 002b:00007ffe302d02b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440f88
RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
RBP: 00000000004c6eb0 R08: 00000000000000e7 R09: ffffffffffffffd0
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00000000006d95e0 R14: 0000000000000000 R15: 0000000000000000
NMI backtrace for cpu 1
CPU: 1 PID: 10345 Comm: syz-executor933 Not tainted 5.3.0-rc8+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101
nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree_stall.h:254
print_cpu_stall kernel/rcu/tree_stall.h:455 [inline]
check_cpu_stall kernel/rcu/tree_stall.h:529 [inline]
rcu_pending kernel/rcu/tree.c:2736 [inline]
rcu_sched_clock_irq.cold+0x4dd/0xc13 kernel/rcu/tree.c:2183
update_process_times+0x32/0x80 kernel/time/timer.c:1639
tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:167
tick_sched_timer+0x53/0x140 kernel/time/tick-sched.c:1296
__run_hrtimer kernel/time/hrtimer.c:1389 [inline]
__hrtimer_run_queues+0x364/0xe40 kernel/time/hrtimer.c:1451
hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline]
smp_apic_timer_interrupt+0x160/0x610 arch/x86/kernel/apic/apic.c:1135
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
</IRQ>
RIP: 0010:cpu_relax arch/x86/include/asm/processor.h:656 [inline]
RIP: 0010:virt_spin_lock arch/x86/include/asm/qspinlock.h:84 [inline]
RIP: 0010:native_queued_spin_lock_slowpath+0x132/0x9f0
kernel/locking/qspinlock.c:325
Code: 00 00 00 48 8b 45 d0 65 48 33 04 25 28 00 00 00 0f 85 37 07 00 00 48
81 c4 98 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 f3 90 <e9> 73 ff ff ff
8b 45 98 4c 8d 65 d8 3d 00 01 00 00 0f 84 e5 00 00
RSP: 0018:ffff8880946ef2f8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: ffff8880a8b1d328 RCX: ffffffff81595c17
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8880a8b1d328
RBP: ffff8880946ef3b8 R08: 1ffff11015163a65 R09: ffffed1015163a66
R10: ffffed1015163a65 R11: ffff8880a8b1d32b R12: 0000000000000001
R13: 0000000000000003 R14: ffffed1015163a65 R15: 0000000000000001
pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:654 [inline]
queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:50 [inline]
queued_spin_lock include/asm-generic/qspinlock.h:81 [inline]
do_raw_spin_lock+0x20e/0x2e0 kernel/locking/spinlock_debug.c:113
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:136 [inline]
_raw_spin_lock_bh+0x3b/0x50 kernel/locking/spinlock.c:175
spin_lock_bh include/linux/spinlock.h:343 [inline]
sch_tree_lock include/net/sch_generic.h:570 [inline]
hhf_change+0x2e3/0xad0 net/sched/sch_hhf.c:537
qdisc_change net/sched/sch_api.c:1321 [inline]
tc_modify_qdisc+0xfcf/0x1c50 net/sched/sch_api.c:1623
rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5223
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5241
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:657
___sys_sendmsg+0x803/0x920 net/socket.c:2311
__sys_sendmsg+0x105/0x1d0 net/socket.c:2356
__do_sys_sendmsg net/socket.c:2365 [inline]
__se_sys_sendmsg net/socket.c:2363 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2363
do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x442399
Code: e8 9c 07 03 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 3b 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffe302d02f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000442399
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000003
RBP: 000000000003c361 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00000000004032f0 R14: 0000000000000000 R15: 0000000000000000


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

Dmitry Vyukov

Sep 18, 2019, 11:05:41 AM
to syzbot, Andrea Arcangeli, Andrew Morton, ava...@gmail.com, Christian Brauner, dbu...@suse.de, LKML, Oleg Nesterov, prs...@codeaurora.org, syzkaller-bugs, Thomas Gleixner, Tejun Heo, Paul E. McKenney, Tetsuo Handa
This should have been parsed as "INFO: rcu detected stall in
igmp_ifc_timer_expire" which was already reported:
https://syzkaller.appspot.com/bug?id=330ce4f7626354cc6444c457c9a5e82d8a8c5055
So let's do:
#syz fix: sch_hhf: ensure quantum and hhf_non_hh_weight are non-zero

+Paul, Tetsuo

However, I cannot make sense of this kernel output (nor can syzbot).
Here is full console output:
https://syzkaller.appspot.com/x/log.txt?x=15c33079600000

This is "self-detected stall" which was detected in rcu_gp_kthread (?
usually these are detected in interrupts, no?)
and then the kthread runs on CPU 1 on top of the igmp_ifc_timer_expire
handler running in an interrupt (how can a kthread run on the
interrupt stack?)
and then it does NMI traceback for CPU 0, but that runs on CPU 1
(shouldn't NMI traceback run on CPU 0 too?)

Any ideas what exactly happened here and how one can make sense of
such output to attribute it to some kernel activity that caused the
stall?

Paul E. McKenney

Sep 19, 2019, 1:07:55 PM
to Dmitry Vyukov, syzbot, Andrea Arcangeli, Andrew Morton, ava...@gmail.com, Christian Brauner, dbu...@suse.de, LKML, Oleg Nesterov, prs...@codeaurora.org, syzkaller-bugs, Thomas Gleixner, Tejun Heo, Tetsuo Handa
On Wed, Sep 18, 2019 at 05:05:26PM +0200, Dmitry Vyukov wrote:
> On Wed, Sep 18, 2019 at 1:19 PM syzbot
> <syzbot+18379f...@syzkaller.appspotmail.com> wrote:
> >
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit: a7f89616 Merge branch 'for-5.3-fixes' of git://git.kernel...
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=15c33079600000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=861a6f31647968de
> > dashboard link: https://syzkaller.appspot.com/bug?extid=18379f2a19bc62c12565
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1066bb85600000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13e9f75e600000
> >
> > Bisection is inconclusive: the bug happens on the oldest tested release.
> >
> > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=154d4969600000
> > final crash: https://syzkaller.appspot.com/x/report.txt?x=174d4969600000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=134d4969600000
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+18379f...@syzkaller.appspotmail.com
> >
> > rcu: INFO: rcu_preempt self-detected stall on CPU
> > rcu: 1-...!: (10499 ticks this GP) idle=63a/1/0x4000000000000002
> > softirq=10978/10978 fqs=0
> > (t=10501 jiffies g=10601 q=227)
> > rcu: rcu_preempt kthread starved for 10502 jiffies! g10601 f0x0

The key point is the above line: RCU's grace-period kthread has not
had a chance to run for 10,502 jiffies.

> > RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0

And it is sleeping normally. The "RCU_GP_WAIT_FQS(5)" says that it was
doing a fixed-time wait, which is normally for three jiffies, but never
10,000 of them. Note that this kthread last ran on CPU 0.

> > rcu: RCU grace-period kthread stack dump:
> > rcu_preempt I29040 10 2 0x80004000
> > Call Trace:
> > context_switch kernel/sched/core.c:3254 [inline]
> > __schedule+0x755/0x1580 kernel/sched/core.c:3880
> > schedule+0xd9/0x260 kernel/sched/core.c:3947
> > schedule_timeout+0x486/0xc50 kernel/time/timer.c:1807
> > rcu_gp_fqs_loop kernel/rcu/tree.c:1611 [inline]
> > rcu_gp_kthread+0x9b2/0x18c0 kernel/rcu/tree.c:1768
> > kthread+0x361/0x430 kernel/kthread.c:255
> > ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

This stack trace is expected: The RCU grace-period kthread is doing
a fixed-time wait. Note that this is a stack trace of this kthread,
not necessarily of the CPU it was last running on.

> > Sending NMI from CPU 1 to CPUs 0:
> > INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 1.403
> > msecs

This is surprising. Is this a guest OS? If so, is the vCPU for CPU 0
stuck somehow? Did it get a SIGSTOP or some such?

Clearly, if CPU 0 isn't running, RCU's grace-period kthread, which was
last seen on CPU 0, might not be doing so well.

OK, but we eventually did get a stack trace.
So this stack trace leads me to ask if networking has been hogging
the CPU for the past 10,000 jiffies. Perhaps there is a corner case
that is not being addressed by the code that is supposed to move
long-term processing from softirq to ksoftirqd? Or perhaps more
likely, the networking code isn't exiting its softirq handler, thus
preventing the timer softirq handler from running, thus preventing
RCU's grace-period kthread's sleep from ever ending.

Is this consistent with what you are seeing?

> This should have been parsed as "INFO: rcu detected stall in
> igmp_ifc_timer_expire" which was already reported:
> https://syzkaller.appspot.com/bug?id=330ce4f7626354cc6444c457c9a5e82d8a8c5055
> So let's do:
> #syz fix: sch_hhf: ensure quantum and hhf_non_hh_weight are non-zero
>
> +Paul, Tetsuo
>
> However, I cannot make sense of this kernel output (nor syzbot).
> Here is full console output:
> https://syzkaller.appspot.com/x/log.txt?x=15c33079600000

I will bite... What are all the "executing program" outputs?

> This is "self-detected stall" which was detected in rcu_gp_kthread (?
> usually these are detected in interrupts, no?)

They are detected by the scheduling-clock interrupt handler, but
stalls can be generated both at process and at interrupt levels.

> and then the kthread runs on CPU 1 on top of the igmp_ifc_timer_expire
> handler running in an interrupt (how can a kthread run on the
> interrupt stack?)
> and then it does NMI traceback for CPU 0, but that runs on CPU 1
> (shouldn't NMI traceback run on CPU 0 too?)
>
> Any ideas what exactly happened here and how one can make sense of
> such output to attribute it to some kernel activity that caused the
> stall?

My best guess based on what I am seeing is that a softirq handler
is running for about ten seconds, which is too long.

Do you have means for tracking softirq-handler durations?

Thanx, Paul

Dmitry Vyukov

Sep 19, 2019, 1:39:17 PM
to pau...@kernel.org, syzbot, Andrea Arcangeli, Andrew Morton, ava...@gmail.com, Christian Brauner, dbu...@suse.de, LKML, Oleg Nesterov, prs...@codeaurora.org, syzkaller-bugs, Thomas Gleixner, Tejun Heo, Tetsuo Handa
The "executing program" are produced by userspace. Kernel and
userspace outputs are multiplexed later to restore order of events.
Kernel output is prefixed with "[ 351.648071][ C1]".

Yes, the networking is stuck dead in an infinite loop. That's a known
bug, already fixed by:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d4d6ec6dac07f263f06d847d6f732d6855522845

But I am interested in why this report does not look like all the other rcu
stalls, and what the parsing logic should be to conclude that the stall
happened in igmp_ifc_timer_expire.
Why is it detected by the kthread? How does it run on top of an interrupt?
And why does one CPU traceback another one?
As of now syzkaller parsed it as "in sys_exit_group", which led to
the creation of a new bug and another email, which is suboptimal.

Paul E. McKenney

Sep 19, 2019, 4:12:05 PM
to Dmitry Vyukov, syzbot, Andrea Arcangeli, Andrew Morton, ava...@gmail.com, Christian Brauner, dbu...@suse.de, LKML, Oleg Nesterov, prs...@codeaurora.org, syzkaller-bugs, Thomas Gleixner, Tejun Heo, Tetsuo Handa
Because the infinite loop happened to prevent the RCU grace-period
kthread from making progress. The odds of that are a bit low, so
most of the stall warnings would look different.

I do expect to be revisiting the RCU CPU stall warning logic at some
point. Or are you asking how you could modify a script that figures
this out?

> Why it's detected by the kthread?

It is not detected by the kthread. The kthread has been interrupted
indefinitely by the softirq processing, so is not in a position to
detect much of anything. It is instead detected as usual by the
scheduling-clock interrupt.

What is different is that the scheduling-clock interrupt detected
that the grace-period kthread had not been running for an extended
period of time. It can detect this because the grace period kthread
stores timestamps before each activity it undertakes.
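
For illustration, that timestamp-and-compare scheme can be sketched in user
space. This is not the kernel code; only the 2 * HZ threshold, the
gp_activity name and HZ=100 are taken from this thread (the actual check is
quoted further down), everything else is made up for the sketch:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define HZ 100					/* CONFIG_HZ in this report */

static _Atomic unsigned long gp_activity;	/* last-activity stamp, in "jiffies" */

static unsigned long jiffies_now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * HZ + ts.tv_nsec / (1000000000L / HZ);
}

/* Stand-in for the grace-period kthread: stamp, then do a short wait. */
static void *gp_kthread(void *arg)
{
	(void)arg;
	for (;;) {
		gp_activity = jiffies_now();
		usleep(30000);			/* ~3 jiffies at HZ=100 */
	}
	return NULL;
}

int main(void)
{
	pthread_t t;

	gp_activity = jiffies_now();
	pthread_create(&t, NULL, gp_kthread, NULL);
	for (;;) {
		unsigned long j = jiffies_now() - gp_activity;

		/* Same threshold as the kernel's starvation check; in this
		 * toy program the worker never starves, so nothing prints. */
		if (j > 2 * HZ)
			printf("kthread starved for %lu jiffies!\n", j);
		sleep(1);
	}
}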

> How it runs on top of an interrupt?

It is not running on top of an interrupt. Its stack was dumped
separately.

> And why one cpu tracebacks another one?

The usual reason is because neither CPU's quiescent state was reported
to the RCU core, so the stall-warning code dumped both stacks.

My turn. Why do there appear to be multiple levels of interrupt, as
in one interrupt interrupting another interrupt?

> As of now syzkaller parsed it as "in sys_exit_group", which lead to
> the creation of a new bug and another email, which is suboptimal.

I suggest having syzkaller look for something like "rcu: rcu_[a-z]*
kthread starved for". I don't have advice for handling the apparent
multiple levels of interrupt.
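
For illustration only (syzkaller's actual report parser is written in Go,
not C), a minimal check of that pattern against a line taken from this
report could look like this:

#include <regex.h>
#include <stdio.h>

int main(void)
{
	const char *line =
		"rcu: rcu_preempt kthread starved for 10502 jiffies! g10601 f0x0";
	regex_t re;

	/* The pattern suggested above. */
	if (regcomp(&re, "rcu: rcu_[a-z]* kthread starved for", REG_EXTENDED))
		return 1;
	if (regexec(&re, line, 0, NULL, 0) == 0)
		printf("kthread-starvation stall detected\n");
	regfree(&re);
	return 0;
}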

Thanx, Paul

Dmitry Vyukov

Sep 20, 2019, 3:50:38 AM
to pau...@kernel.org, syzbot, Andrea Arcangeli, Andrew Morton, ava...@gmail.com, Christian Brauner, dbu...@suse.de, LKML, Oleg Nesterov, prs...@codeaurora.org, syzkaller-bugs, Thomas Gleixner, Tejun Heo, Tetsuo Handa
But doesn't that happen in all cases of infinite loops? Yet most other
reports look different and don't mention the kthread. What's different
here?
Or is it another case of flaky reporting? We had that other case where
self-stall and remote-stall had the same timeout, so they were reported
non-deterministically, or worse, intermixed at the same time.


> I do expect to be revisiting the RCU CPU stall warning logic at some
> point. Or are you asking how you could modify a script that figures
> this out?

Both. Fixing the parsing may be easier and will fix all old kernels too.
You know nobody uses kernel HEAD; people use everything down to 3.18 and maybe
older. You are not going to backport these changes all the way back,
right? :)

> > Why it's detected by the kthread?
>
> It is not detected by the kthread. The kthread has been interrupted
> indefinitely by the softirq processing, so is not in a position to
> detect much of anything. It is instead detected as usual by the
> scheduling-clock interrupt.

I don't see any rcu-liveness-checking interrupt handler in any of the
tracebacks. Where is it? Is it useful to traceback the kthread in such a
case at all?

What we do is the following: we find the first apic_timer_interrupt on the
current CPU (presumably that is the rcu-health-checking interrupt) and
then take the next "anchor" frame after that. This does not work in
this case, because the rcu-health-checking interrupt is missing.
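
A rough sketch of that anchor heuristic, for illustration only (the real
logic lives in syzkaller's Go report parser): applied to the first backtrace
in this report, everything found below apic_timer_interrupt is the
interrupted exit_group path, which is how the bogus "in sys_exit_group"
title came about:

#include <stdio.h>
#include <string.h>

/* Frames from the first (CPU 0) backtrace in this report, innermost first,
 * heavily abbreviated. */
static const char *frames[] = {
	"hhf_dequeue", "__qdisc_run", "igmp_ifc_timer_expire",
	"run_timer_softirq", "__do_softirq", "irq_exit",
	"smp_apic_timer_interrupt", "apic_timer_interrupt",
	"_raw_write_unlock_irq", "exit_notify", "do_exit",
	"do_group_exit", "__x64_sys_exit_group",
};

int main(void)
{
	size_t n = sizeof(frames) / sizeof(frames[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(frames[i], "apic_timer_interrupt") != 0)
			continue;
		/* The real parser picks the next "anchor" frame from here on;
		 * for this trace that is the exit_group path, hence the
		 * "in sys_exit_group" title. */
		for (size_t j = i + 1; j < n; j++)
			printf("candidate: %s\n", frames[j]);
		break;
	}
	return 0;
}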


> What is different is that the scheduling-clock interrupt detected
> that the grace-period kthread had not been running for an extended
> period of time. It can detect this because the grace period kthread
> stores timestamps before each activity it undertakes.
>
> > How it runs on top of an interrupt?
>
> It is not running on top of an interrupt. Its stack was dumped
> separately.

I see. Usually the first stack trace printed is that of the current context,
so I was confused.

> > And why one cpu tracebacks another one?
>
> The usual reason is because neither CPU's quiescent state was reported
> to the RCU core, so the stall-warning code dumped both stacks.

But shouldn't the other CPU traceback _itself_, rather than being traced
back by another CPU?
E.g. see this report:
https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/350#L61-L83
Here the overall problem was detected by C2, but then C1 traces back itself.

... however even in that case C0 and C3 are traced by C2:
https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/350#L84-L149
I can't understand this...
This makes understanding what happened harder because it's not easy to
exclude things on other CPUs.



>
> My turn. Why do there appear to be multiple levels of interrupt, as
> in one interrupt interrupting another interrupt?

I don't understand the question.
I see 1 apic_timer_interrupt on CPU 0 and 1 apic_timer_interrupt on CPU 1.

> > As of now syzkaller parsed it as "in sys_exit_group", which lead to
> > the creation of a new bug and another email, which is suboptimal.
>
> I suggest having syzkaller look for something like "rcu: rcu_[a-z]*
> kthread starved for".

And then what do we need to do to detect this as "in igmp_ifc_timer_expire"?

Dmitry Vyukov

Sep 20, 2019, 3:55:05 AM
to pau...@kernel.org, syzbot, Andrea Arcangeli, Andrew Morton, ava...@gmail.com, Christian Brauner, dbu...@suse.de, LKML, Oleg Nesterov, prs...@codeaurora.org, syzkaller-bugs, Thomas Gleixner, Tejun Heo, Tetsuo Handa
Another thing in this report that confuses me is this:
The machine has 2 CPUs. The report contains 2 NMI tracebacks, one for
CPU 0 and one for CPU 1. A CPU generally does not do an NMI traceback of
itself (right? it could just do a plain traceback). The question is: which
CPU does the code that prints this run on, then?

Dmitry Vyukov

Sep 20, 2019, 4:03:27 AM
to pau...@kernel.org, syzbot, Andrea Arcangeli, Andrew Morton, ava...@gmail.com, Christian Brauner, dbu...@suse.de, LKML, Oleg Nesterov, prs...@codeaurora.org, syzkaller-bugs, Thomas Gleixner, Tejun Heo, Tetsuo Handa
Oh, I think I am starting to understand what happened here.
The rcu-health-checking interrupt which prints all of this is
rcu_sched_clock_irq, which is at the very bottom of the report (why?
the current traceback is at the top in all other cases). It also
tracebacks itself with "NMI backtrace for cpu 1" message (why? why not
the normal "Call Trace:"?).
Then, CPU 1 is spinning on a spinlock in hhf_change (presumably the timer
handler on CPU 0 holds the lock).
So this should have been detected as "spinlock lockup in hhf_change".

Dmitry Vyukov

Sep 20, 2019, 4:21:36 AM
to pau...@kernel.org, syzbot, Andrea Arcangeli, Andrew Morton, ava...@gmail.com, Christian Brauner, dbu...@suse.de, LKML, Oleg Nesterov, prs...@codeaurora.org, syzkaller-bugs, Thomas Gleixner, Tejun Heo, Tetsuo Handa
We have CONFIG_DEBUG_SPINLOCK=y

The spinlock lockups now seem to be detected by normal watchdog:
/*
* We are now relying on the NMI watchdog to detect lockup instead of doing
* the detection here with an unfair lock which can cause problem of its own.
*/
void do_raw_spin_lock(raw_spinlock_t *lock)

That's kernel.watchdog_thresh = 55 that we have.

But then the softlockup period is actually twice as long (110 seconds):

static int get_softlockup_thresh(void)
{
	return watchdog_thresh * 2;
}

But for rcu we have 100:

CONFIG_RCU_CPU_STALL_TIMEOUT=100

So our current assumption is that rcu should have detected this
in some parsable way.

Presumably if the rcu kthread is starving, rcu overall is starving
from all other sides as well, right? So couldn't it produce the normal
stall message?

Wait, why is this just 2 seconds?

static void rcu_check_gp_kthread_starvation(void)
{
	struct task_struct *gpk = rcu_state.gp_kthread;
	unsigned long j;

	j = jiffies - READ_ONCE(rcu_state.gp_activity);
	if (j > 2 * HZ) {
		pr_err("%s kthread starved for %ld jiffies! g%ld f%#x %s(%d) ->state=%#lx ->cpu=%d\n",

Is it? 2 seconds is too short a period for any checks on a heavily debugged
kernel (as heavily as you can possibly imagine, with tons of compiler
instrumentation), running under an insanely heavy stress workload.
We bump all timeouts towards 100-300 seconds. But this one does not
seem to be tunable.
Or am I missing something? It says "starved for 10502 jiffies" and we
have CONFIG_HZ=100, so that's about 105 seconds...
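
Working those numbers out (all values are taken from the config and the
report quoted above; the snippet only does the arithmetic):

#include <stdio.h>

int main(void)
{
	int hz = 100;			/* CONFIG_HZ */
	int watchdog_thresh = 55;	/* kernel.watchdog_thresh */
	int rcu_stall_timeout = 100;	/* CONFIG_RCU_CPU_STALL_TIMEOUT, seconds */
	long starved = 10502;		/* jiffies, from the stall report */

	printf("softlockup threshold: %d s\n", watchdog_thresh * 2);	/* 110 s */
	printf("RCU stall timeout:    %d s\n", rcu_stall_timeout);	/* fires first */
	printf("kthread starvation:   %ld.%02ld s (check threshold: 2 s)\n",
	       starved / hz, starved % hz);
	return 0;
}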

Tetsuo Handa

Sep 20, 2019, 6:22:31 AM
to Dmitry Vyukov, pau...@kernel.org, Petr Mladek, Sergey Senozhatsky, syzbot, Andrea Arcangeli, Andrew Morton, ava...@gmail.com, Christian Brauner, dbu...@suse.de, LKML, Oleg Nesterov, prs...@codeaurora.org, syzkaller-bugs, Thomas Gleixner, Tejun Heo
Calling printk() people.

On 2019/09/20 16:50, Dmitry Vyukov wrote:
>>> How it runs on top of an interrupt?
>>
>> It is not running on top of an interrupt. Its stack was dumped
>> separately.
>
> I see. Usually the first stack is the traceback of the current stack.
> So I was confused.
>
>>> And why one cpu tracebacks another one?
>>
>> The usual reason is because neither CPU's quiescent state was reported
>> to the RCU core, so the stall-warning code dumped both stacks.
>
> But should the other CPU traceback _itself_? Rather than being traced
> back by another CPU?
> E.g. see this report:
> https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/350#L61-L83
> Here the overall problem was detected by C2, but then C1 traces back itself.
>
> ... however even in that case C0 and C3 are traced by C2:
> https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/350#L84-L149
> I can't understand this...
> This makes understanding what happened harder because it's not easy to
> exclude things on other CPUs.

I think this should be
https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/350#L84-L172
rather than #L84-L149.

Is the reason these lines have "[ C2]" that these lines were flushed (printk_caller_id()
was called) from log_output() from vprintk_store() from vprintk_emit() from vprintk_deferred()
from printk_deferred() from printk_safe_flush_line() from __printk_safe_flush() from
printk_safe_flush() from printk_safe_flush_on_panic() from panic() ?

Paul E. McKenney

Sep 20, 2019, 9:36:21 AM
to Dmitry Vyukov, syzbot, Andrea Arcangeli, Andrew Morton, ava...@gmail.com, Christian Brauner, dbu...@suse.de, LKML, Oleg Nesterov, prs...@codeaurora.org, syzkaller-bugs, Thomas Gleixner, Tejun Heo, Tetsuo Handa
The difference is that in this case the infinite loop just happened
to interrupt the actual execution of the specific kthread named
rcu_preempt.

> Or is it another case of flaky reporting? We had that other case where
> self-stall or remote-stall had the same timeout, so were reported
> non-deterministically, or worse intermixed at the same time.

The bug manifested in a different way, so it looks different.

> > I do expect to be revisiting the RCU CPU stall warning logic at some
> > point. Or are you asking how you could modify a script that figures
> > this out?
>
> Both. Fixing parsing may be easier and will fix all old kernels too.
> You know nobody uses kernel HEAD, people use down to 3.18 and maybe
> older. You are not going to backport these changes all the way back,
> right? :)

Not for something like this, no, backporting would introduce more
risk than benefit.

But time does pass and people do eventually move forward, for varying
values of "eventually". ;-)

> > > Why it's detected by the kthread?
> >
> > It is not detected by the kthread. The kthread has been interrupted
> > indefinitely by the softirq processing, so is not in a position to
> > detect much of anything. It is instead detected as usual by the
> > scheduling-clock interrupt.
>
> I don't see any rcu-liveness-checking interrupt handler in any
> tracebacks. Where is it? Is it useful to traceback the ktread in such
> case at all?

It is the scheduling-clock interrupt, which invokes an RCU hook that
does the checking. AKA apic_timer_interrupt for many x86 systems.

> What we do is the following, we find first apic_timer_interrupt on the
> current CPU (presumably that the rcu-health-checking interrupt) and
> then take the next "anchor" frame after that. This does not work in
> this case, because the rcu-health-checking interrupt in missing.

In current kernels, you would be looking for rcu_sched_clock_irq(), which
calls rcu_pending(), which is often inlined and which calls check_cpu_stall(),
which can also be inlined. In older kernels, rcu_sched_clock_irq() is
instead called rcu_check_callbacks() and you might see __rcu_pending()
instead of rcu_pending().

And yes, that stack looked truncated.

> > What is different is that the scheduling-clock interrupt detected
> > that the grace-period kthread had not been running for an extended
> > period of time. It can detect this because the grace period kthread
> > stores timestamps before each activity it undertakes.
> >
> > > How it runs on top of an interrupt?
> >
> > It is not running on top of an interrupt. Its stack was dumped
> > separately.
>
> I see. Usually the first stack is the traceback of the current stack.
> So I was confused.

Yes, when you see the "starved" message, the next stack is that of
the RCU grace-period kthread.

> > > And why one cpu tracebacks another one?
> >
> > The usual reason is because neither CPU's quiescent state was reported
> > to the RCU core, so the stall-warning code dumped both stacks.
>
> But should the other CPU traceback _itself_? Rather than being traced
> back by another CPU?
> E.g. see this report:
> https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/350#L61-L83
> Here the overall problem was detected by C2, but then C1 traces back itself.
>
> ... however even in that case C0 and C3 are traced by C2:
> https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/350#L84-L149
> I can't understand this...
> This makes understanding what happened harder because it's not easy to
> exclude things on other CPUs.

It would be in CPU-number order, normally.

Given that CPU 0's stack dump seems to have been truncated, might
CPU 1's stack have been lost entirely?

But this is something that I should be able to fix, give or take
architecture constraints. In most cases, I should be able to dump the
self-detecting CPU's stack first, then sequence through the others.
Would that help?

> > My turn. Why do there appear to be multiple levels of interrupt, as
> > in one interrupt interrupting another interrupt?
>
> I don't understand the question.
> I see 1 apic_timer_interrupt on CPU 0 and 1 apic_timer_interrupt on CPU 1.

Ah, I was confused by the "Code: Bad RIP value." and subsequent register
dump, and failed to see the immediately following "NMI backtrace for
cpu 1":

[ 351.815741][ C1] NMI backtrace for cpu 1
[ 352.590657][ C1] CPU: 1 PID: 10345 Comm: syz-executor933 Not tainted 5.3.0-rc8+ #0
[ 352.598782][ C1] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 352.609011][ C1] Call Trace:
[ 352.612468][ C1] <IRQ>
[ 352.615577][ C1] dump_stack+0x172/0x1f0
[ 352.620089][ C1] nmi_cpu_backtrace.cold+0x70/0xb2
[ 352.625376][ C1] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 352.631787][ C1] ? lapic_can_unplug_cpu.cold+0x45/0x45
[ 352.637425][ C1] nmi_trigger_cpumask_backtrace+0x23b/0x28b
[ 352.643685][ C1] arch_trigger_cpumask_backtrace+0x14/0x20
[ 352.649832][ C1] rcu_dump_cpu_stacks+0x183/0x1cf
[ 352.654949][ C1] rcu_sched_clock_irq.cold+0x4dd/0xc13

The above is the scheduling-clock hook for RCU.

It looks like rcu_pending() and check_cpu_stall() have been inlined
or tail-optimized or whatever.

[ 352.660584][ C1] ? raise_softirq+0x138/0x340
[ 352.665357][ C1] update_process_times+0x32/0x80
[ 352.670564][ C1] tick_sched_handle+0xa2/0x190
[ 352.675822][ C1] tick_sched_timer+0x53/0x140
[ 352.680592][ C1] __hrtimer_run_queues+0x364/0xe40
[ 352.686112][ C1] ? tick_sched_do_timer+0x1b0/0x1b0
[ 352.691403][ C1] ? hrtimer_start_range_ns+0xcb0/0xcb0
[ 352.697161][ C1] ? __sanitizer_cov_trace_cmp4+0x16/0x20
[ 352.704973][ C1] ? ktime_get_update_offsets_now+0x2d3/0x440
[ 352.711269][ C1] hrtimer_interrupt+0x314/0x770
[ 352.716225][ C1] smp_apic_timer_interrupt+0x160/0x610
[ 352.721907][ C1] apic_timer_interrupt+0xf/0x20
[ 352.727454][ C1] </IRQ>
[ 352.730759][ C1] RIP: 0010:native_queued_spin_lock_slowpath+0x132/0x9f0
[ 352.737876][ C1] Code: 00 00 00 48 8b 45 d0 65 48 33 04 25 28 00 00 00 0f 85 37 07 00 00 48 81 c4 98 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 f3 90 <e9> 73 ff ff ff 8b 45 98 4c 8d 65 d8 3d 00 01 00 00 0f 84 e5 00 00
[ 352.757777][ C1] RSP: 0018:ffff8880946ef2f8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 352.766276][ C1] RAX: 0000000000000000 RBX: ffff8880a8b1d328 RCX: ffffffff81595c17
[ 352.774247][ C1] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8880a8b1d328
[ 352.782332][ C1] RBP: ffff8880946ef3b8 R08: 1ffff11015163a65 R09: ffffed1015163a66
[ 352.790301][ C1] R10: ffffed1015163a65 R11: ffff8880a8b1d32b R12: 0000000000000001
[ 352.798893][ C1] R13: 0000000000000003 R14: ffffed1015163a65 R15: 0000000000000001
[ 352.807058][ C1] ? native_queued_spin_lock_slowpath+0xb7/0x9f0
[ 352.813663][ C1] ? native_queued_spin_lock_slowpath+0xb7/0x9f0
[ 352.820078][ C1] ? __pv_queued_spin_lock_slowpath+0xd10/0xd10
[ 352.826443][ C1] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 352.832775][ C1] ? __nla_validate_parse+0x2d0/0x1ed0
[ 352.838239][ C1] ? mark_held_locks+0xf0/0xf0
[ 352.843008][ C1] do_raw_spin_lock+0x20e/0x2e0
[ 352.847950][ C1] ? rwlock_bug.part.0+0x90/0x90
[ 352.852887][ C1] ? lock_acquire+0x190/0x410
[ 352.857741][ C1] ? hhf_change+0x2e3/0xad0

(And so on.)

> > > As of now syzkaller parsed it as "in sys_exit_group", which lead to
> > > the creation of a new bug and another email, which is suboptimal.
> >
> > I suggest having syzkaller look for something like "rcu: rcu_[a-z]*
> > kthread starved for".
>
> And then what do we need to do to detect this as "in igmp_ifc_timer_expire"?

Skip the first stack dump in this case, anyway.

Then see the igmp_ifc_timer_expire() in the CPU 1 stack dump.

But of course, bugs can do whatever they want, so given any bug-detection
strategy, there will be a bug that defeats that strategy. But you knew
that already. ;-)

Thanx, Paul

Paul E. McKenney

Sep 20, 2019, 9:50:56 AM
to Dmitry Vyukov, syzbot, Andrea Arcangeli, Andrew Morton, ava...@gmail.com, Christian Brauner, dbu...@suse.de, LKML, Oleg Nesterov, prs...@codeaurora.org, syzkaller-bugs, Thomas Gleixner, Tejun Heo, Tetsuo Handa
Because both CPUs were blocking the grace period. CPU 0 was doing so
directly due to the infinite loop. Because that infinite loop just so
happened to prevent RCU's grace-period kthread from making progress,
CPU 1's idle periods went undetected, so CPU 0 was also causing CPU 1
to indirectly block the grace period. When multiple CPUs are blocking
the grace period, RCU dumps them in CPU-number order.

This could be changed, as noted in the earlier email. Would seeing
CPU 1's trace first have helped in this case, given that CPU 0 just
happened to be the root cause of the stall?

> > It also
> > tracebacks itself with "NMI backtrace for cpu 1" message (why? why not
> > the normal "Call Trace:"?).
> > Then, CPU 1 is in spinlock lock at hhf_change (presumably the timer
> > handler on CPU 0 holds the lock).
> > So this should have been detected as "spinlock lockup in hhf_change".
>
> We have CONFIG_DEBUG_SPINLOCK=y
>
> The spinlock lockups now seem to be detected by normal watchdog:
> /*
> * We are now relying on the NMI watchdog to detect lockup instead of doing
> * the detection here with an unfair lock which can cause problem of its own.
> */
> void do_raw_spin_lock(raw_spinlock_t *lock)
>
> That's kernel.watchdog_thresh = 55 that we have.
>
> But then softlock period is actually twice as long (110):
>
> static int get_softlockup_thresh(void)
> {
> return watchdog_thresh * 2;
> }
>
> But for rcu we have 100:
>
> CONFIG_RCU_CPU_STALL_TIMEOUT=100
>
> So our current assumption is that rcu should have been detected this
> in some parsable way.
>
> Presumably if the rcu kthread is starving, rcu overall is starving
> from all other sides as well, right? So couldn't it produce the normal
> stall message?

Yes. Had the rcu_preempt kthread been able to run, it would
have detected CPU 1's quiescent states, and the resulting RCU CPU
stall warning would have had only CPU 0's stack. Plus the stall
would have been labeled as being due to CPU 0, detected by CPU 1.

> Wait, why is this just 2 seconds?
>
> static void rcu_check_gp_kthread_starvation(void)
> {
> struct task_struct *gpk = rcu_state.gp_kthread;
> unsigned long j;
> j = jiffies - READ_ONCE(rcu_state.gp_activity);
> if (j > 2 * HZ) {
> pr_err("%s kthread starved for %ld jiffies! g%ld f%#x %s(%d)
> ->state=%#lx ->cpu=%d\n",
>
> Is it? 2 seconds is too short period for any checks on heavily debug
> kernel (as heavily as you can possibly imaging with tons of compiler
> instrumentation), running under insanely heavy stress workload.
> We bump all timeouts towards 100-300 seconds. But this one does not
> seem to be tunable.

In theory, I could make it tunable. But the rcu_preempt kthread was
trying to sleep for something like three milliseconds, and ...

> Or am I missing something? It says "starved for 10502 jiffies" and we
> have CONFIG_HZ=100, so that's about 105 seconds...

Yes, and 105 seconds is greater than two seconds. The above code gets
called when the stall occurs.

... So you have to have both a stall -and- rcu_preempt not having had
a chance to run for more than two seconds.

I suppose I could make the check be (say) 10% of the stall-warning
timeout.

Except that in this case, it was a legitimate complaint, and it did help
me figure out what was going on.

So do you really want me to scale this check? Is it normal for your
systems to be loaded so heavily that a three-millisecond wait turns into
two seconds? Just at the time that an RCU CPU stall warning takes
place? At the moment, I am not convinced that such scaling would be
a good idea.
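
Just to make the arithmetic behind that "10%" idea concrete (this is not
something the kernel does today; the current check uses the fixed 2 * HZ
shown in the snippet quoted earlier):

#include <stdio.h>

#define HZ 100					/* CONFIG_HZ in this report */

int main(void)
{
	unsigned long stall_timeout = 100;	/* CONFIG_RCU_CPU_STALL_TIMEOUT, seconds */
	unsigned long fixed = 2 * HZ;		/* the current hard-coded threshold */
	unsigned long scaled = stall_timeout * HZ / 10;	/* the "10%" idea above */

	printf("fixed threshold:  %lu jiffies (%lu s)\n", fixed, fixed / HZ);
	printf("scaled threshold: %lu jiffies (%lu s)\n", scaled, scaled / HZ);
	return 0;
}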

Thanx, Paul

Petr Mladek

Sep 24, 2019, 10:02:44 AM
to Tetsuo Handa, Sergey Senozhatsky, Dmitry Vyukov, pau...@kernel.org, Christian Brauner, prs...@codeaurora.org, ava...@gmail.com, syzkaller-bugs, Tejun Heo, Thomas Gleixner, Andrew Morton, Andrea Arcangeli, Oleg Nesterov, dbu...@suse.de, syzbot, LKML, John Ogness
It seems to be the case. CPU2 is clearly flushing per-CPU buffers
from NMI context, for example:

[ 1098.703114][ C2] NMI backtrace for cpu 0
[...]
[ 1098.703295][ C2] NMI backtrace for cpu 3

A solution would be to store all this metadata (timestamp, caller
info) directly into the per-CPU buffers. I think that it would be
doable.

But a much better solution is a lockless ring buffer. John Ogness is
working hard on it. The plan is to have it ready for 5.5 or 5.6.
I would prefer to concentrate on this solution for the moment.

Best Regards,
Petr