upstream boot error: BUG: soft lockup in __do_softirq


syzbot

Jul 31, 2020, 2:44:22 AM
to b...@alien8.de, h...@zytor.com, linux-...@vger.kernel.org, lu...@kernel.org, mi...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de, x...@kernel.org
Hello,

syzbot found the following issue on:

HEAD commit: 92ed3019 Linux 5.8-rc7
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10e84cdf100000
kernel config: https://syzkaller.appspot.com/x/.config?x=b45e47f6d958ae82
dashboard link: https://syzkaller.appspot.com/bug?extid=8472ea265fe32cc3bf78
compiler: gcc (GCC) 10.1.0-syz 20200507

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8472ea...@syzkaller.appspotmail.com

hrtimer: interrupt took 42698779 ns
random: crng init done
watchdog: BUG: soft lockup - CPU#3 stuck for 21s! [grep:4749]
Modules linked in:
hardirqs last enabled at (2780): [<ffffffff88200204>] __do_softirq+0x204/0xa60 kernel/softirq.c:276
hardirqs last disabled at (2781): [<ffffffff87e5b2ed>] idtentry_enter_cond_rcu+0x1d/0x50 arch/x86/entry/common.c:649
softirqs last enabled at (2760): [<ffffffff88200748>] __do_softirq+0x748/0xa60 kernel/softirq.c:319
softirqs last disabled at (2779): [<ffffffff88000f0f>] asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711
CPU: 3 PID: 4749 Comm: grep Not tainted 5.8.0-rc7-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
RIP: 0010:__do_softirq+0x22f/0xa60 kernel/softirq.c:278
Code: c7 c0 98 e0 b4 89 48 c1 e8 03 42 80 3c 30 00 0f 85 70 07 00 00 48 83 3d 76 de 94 01 00 0f 84 4a 06 00 00 fb 66 0f 1f 44 00 00 <48> c7 44 24 08 c0 90 a0 89 b8 ff ff ff ff 0f bc 04 24 83 c0 01 89
RSP: 0000:ffffc90000598f70 EFLAGS: 00000286
RAX: 1ffffffff1369c13 RBX: ffff8880294683c0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff88200204
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff88000c3a
R13: 0000000000000000 R14: dffffc0000000000 R15: 0000000000000000
FS: 00007fe02dd23700(0000) GS:ffff88802d100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000f5d1e8 CR3: 000000001efc6000 CR4: 0000000000340ee0
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
</IRQ>
invoke_softirq kernel/softirq.c:387 [inline]
__irq_exit_rcu kernel/softirq.c:417 [inline]
irq_exit_rcu+0x229/0x270 kernel/softirq.c:429
sysvec_apic_timer_interrupt+0x54/0x120 arch/x86/kernel/apic/apic.c:1091
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:585
Code: ba 63 03 00 00 e8 f5 bf 00 00 0f 1f 44 00 00 48 89 5c 24 e0 48 89 6c 24 e8 48 89 fb 4c 89 64 24 f0 4c 89 6c 24 f8 48 83 ec 38 <0f> b6 47 04 83 e0 0f 83 f8 06 0f 85 3e 01 00 00 31 d2 66 83 7b 06
RSP: 002b:00007ffd13103ee0 EFLAGS: 00010202
RAX: 00000000000003b7 RBX: 00007fe02d584588 RCX: 0000000000000000
RDX: 00007fe02d57d94c RSI: 0000000000000edc RDI: 00007fe02d584588
RBP: 00007fe02dd2aef8 R08: 00000000004282a7 R09: 0000000000000004
R10: 00007ffd13103f70 R11: 00007ffd13103f70 R12: 0000000000000004
R13: 0000000010a0a9c4 R14: 0000000000000002 R15: 00007ffd131040f8


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Dmitry Vyukov

Jul 31, 2020, 2:50:38 AM
to syzbot, Borislav Petkov, H. Peter Anvin, LKML, Andy Lutomirski, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers
On Fri, Jul 31, 2020 at 8:44 AM syzbot
<syzbot+8472ea...@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 92ed3019 Linux 5.8-rc7
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=10e84cdf100000
> kernel config: https://syzkaller.appspot.com/x/.config?x=b45e47f6d958ae82
> dashboard link: https://syzkaller.appspot.com/bug?extid=8472ea265fe32cc3bf78
> compiler: gcc (GCC) 10.1.0-syz 20200507
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+8472ea...@syzkaller.appspotmail.com

This is a qemu-kvm instance somehow killing the host kernel: the host
kernel running the qemu instances is itself full of RCU stalls. I think
this is not a bug in the tested kernel.
We change the RCU stall timeout to 120 seconds from the default 21s, but
this happens only after boot, via sysctls. I did not find any way to
change the RCU timeout via cmdline/config (it would be useful).

Randy Dunlap

Jul 31, 2020, 12:08:18 PM
to Dmitry Vyukov, syzbot, Borislav Petkov, H. Peter Anvin, LKML, Andy Lutomirski, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, Paul E. McKenney
On 7/30/20 11:50 PM, Dmitry Vyukov wrote:
> On Fri, Jul 31, 2020 at 8:44 AM syzbot
> <syzbot+8472ea...@syzkaller.appspotmail.com> wrote:
>>
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit: 92ed3019 Linux 5.8-rc7
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=10e84cdf100000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=b45e47f6d958ae82
>> dashboard link: https://syzkaller.appspot.com/bug?extid=8472ea265fe32cc3bf78
>> compiler: gcc (GCC) 10.1.0-syz 20200507
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+8472ea...@syzkaller.appspotmail.com
>
> This is a qemu-kvm instance somehow killing the host kernel: the host
> kernel running the qemu instances is itself full of RCU stalls. I think
> this is not a bug in the tested kernel.
> We change the RCU stall timeout to 120 seconds from the default 21s, but
> this happens only after boot, via sysctls. I did not find any way to
> change the RCU timeout via cmdline/config (it would be useful).

(adding Paul)


Documentation/RCU/stallwarn.rst says there is a Kconfig:

CONFIG_RCU_CPU_STALL_TIMEOUT

	This kernel configuration parameter defines the period of time
	that RCU will wait from the beginning of a grace period until it
	issues an RCU CPU stall warning. This time period is normally
	21 seconds.

and Documentation/admin-guide/kernel-parameters.txt has 2 RCU stall timeouts,
one for CPU and one for tasks:

	rcupdate.rcu_cpu_stall_timeout= [KNL]
			Set timeout for RCU CPU stall warning messages.

	rcupdate.rcu_task_stall_timeout= [KNL]
			Set timeout in jiffies for RCU task stall warning
			messages. Disable with a value less than or equal
			to zero.


--
~Randy

Dmitry Vyukov

Jul 31, 2020, 12:21:18 PM
to Randy Dunlap, syzbot, Borislav Petkov, H. Peter Anvin, LKML, Andy Lutomirski, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, Paul E. McKenney
Hi Randy,

Thanks for looking into this.
But I think I messed things up. The config has
CONFIG_RCU_CPU_STALL_TIMEOUT=100, but this is not an RCU stall:

watchdog: BUG: soft lockup - CPU#3 stuck for 21s! [grep:4749]

This is what is controlled by kernel.watchdog_thresh sysctl (?).
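
For reference, a minimal user-space sketch of how that sysctl relates to the
message above. It assumes the relationship described in the kernel's sysctl
documentation, where the soft lockup threshold is 2 * watchdog_thresh (so the
default of 10 gives 20 seconds, which fits a report of "stuck for 21s"). The
helper names are hypothetical and are not kernel APIs:

```c
#include <stdio.h>

/*
 * Assumption (from the kernel sysctl documentation): the soft lockup
 * threshold is twice kernel.watchdog_thresh, in seconds.
 */
static int softlockup_thresh_secs(int watchdog_thresh)
{
	return 2 * watchdog_thresh;
}

/*
 * Hypothetical helper: read a single integer from a sysctl file such as
 * /proc/sys/kernel/watchdog_thresh. Returns -1 if the file cannot be
 * opened or does not start with an integer.
 */
static int read_sysctl_int(const char *path)
{
	FILE *f = fopen(path, "r");
	int value;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

Calling read_sysctl_int("/proc/sys/kernel/watchdog_thresh") on a running
system and passing the result to softlockup_thresh_secs() should give the
number of seconds a CPU may be stuck before the watchdog reports it.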

Dmitry Vyukov

Jul 31, 2020, 12:23:19 PM
to Randy Dunlap, syzbot, Borislav Petkov, H. Peter Anvin, LKML, Andy Lutomirski, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, Paul E. McKenney
And there is actually a cmdline parameter for this:

static int __init watchdog_thresh_setup(char *str)
{
	get_option(&str, &watchdog_thresh);
	return 1;
}
__setup("watchdog_thresh=", watchdog_thresh_setup);

I will write it down somewhere.
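
To illustrate what the handler above does with a boot line such as
"watchdog_thresh=55" (55 is an arbitrary example value), here is a rough
user-space analogue. It is only a sketch: the real kernel path goes through
__setup() and get_option(), and parse_watchdog_thresh() below is a
hypothetical helper that does not exist in the kernel:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical user-space analogue of the __setup() handler: look for a
 * "watchdog_thresh=N" token in a space-separated command line.
 * Returns 1 and stores N in *out for the first such token, 0 otherwise.
 */
static int parse_watchdog_thresh(const char *cmdline, int *out)
{
	static const char key[] = "watchdog_thresh=";
	const size_t key_len = sizeof(key) - 1;
	const char *p = cmdline;

	while ((p = strstr(p, key)) != NULL) {
		/* Accept the key only at the start of a token. */
		if (p == cmdline || p[-1] == ' ') {
			char *end;
			long value = strtol(p + key_len, &end, 10);

			/* Require at least one digit after the '='. */
			if (end != p + key_len) {
				*out = (int)value;
				return 1;
			}
		}
		p += key_len;
	}
	return 0;
}
```

With the sample line "console=ttyS0 watchdog_thresh=55 root=/dev/sda" this
stores 55. Under the 2 * watchdog_thresh relationship from the sysctl
documentation, that would put the soft lockup threshold at 110 seconds.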

syzbot

Dec 6, 2020, 5:20:08 AM
to syzkall...@googlegroups.com
Auto-closing this bug as obsolete.
Crashes did not happen for a while, no reproducer and no activity.