Re: INFO: rcu detected stall in sys_kill


Dmitry Vyukov

Dec 3, 2019, 3:38:35 AM
to syzbot, Casey Schaufler, linux-security-module, Daniel Axtens, Andrey Ryabinin, kasan-dev, Andrea Arcangeli, Andrew Morton, Christian Brauner, chri...@kellner.me, cyp...@cyphar.com, Reshetova, Elena, Jason Gunthorpe, Kees Cook, l...@altlinux.org, LKML, Andy Lutomirski, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, Al Viro, Will Drewry
On Tue, Dec 3, 2019 at 9:27 AM syzbot
<syzbot+de8d93...@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: 596cf45c Merge branch 'akpm' (patches from Andrew)
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=15f11c2ae00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=9bbcda576154a4b4
> dashboard link: https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb
> compiler: clang version 9.0.0 (/home/glider/llvm/clang
> 80fee25776c2fb61e74c1ecb1a523375c2500b69)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+de8d93...@syzkaller.appspotmail.com

Something is seriously broken in smack+kasan+vmap stacks; we now have 60
rcu stalls all over the place and counting. This is one of the
samples. I've duped 2 other samples to this one; you can see them on
the dashboard:
https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb

I see 2 things in common across all stalls:
1. They all happen on the instance that uses smack (which is now
effectively dead), see smack instance here:
https://syzkaller.appspot.com/upstream
2. They all contain this frame in the stack trace:
free_thread_stack+0x168/0x590 kernel/fork.c:280
The last commit that touches this file is "fork: support VMAP_STACK
with KASAN_VMALLOC".
That may well be the root cause. +Daniel


> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> (detected by 1, t=10502 jiffies, g=6629, q=331)
> rcu: All QSes seen, last rcu_preempt kthread activity 10503
> (4294953794-4294943291), jiffies_till_next_fqs=1, root ->qsmask 0x0
> syz-executor.0 R running task 24648 8293 8292 0x0000400a
> Call Trace:
> <IRQ>
> sched_show_task+0x40f/0x560 kernel/sched/core.c:5954
> print_other_cpu_stall kernel/rcu/tree_stall.h:410 [inline]
> check_cpu_stall kernel/rcu/tree_stall.h:538 [inline]
> rcu_pending kernel/rcu/tree.c:2827 [inline]
> rcu_sched_clock_irq+0x1861/0x1ad0 kernel/rcu/tree.c:2271
> update_process_times+0x12d/0x180 kernel/time/timer.c:1726
> tick_sched_handle kernel/time/tick-sched.c:167 [inline]
> tick_sched_timer+0x263/0x420 kernel/time/tick-sched.c:1310
> __run_hrtimer kernel/time/hrtimer.c:1514 [inline]
> __hrtimer_run_queues+0x403/0x840 kernel/time/hrtimer.c:1576
> hrtimer_interrupt+0x38c/0xda0 kernel/time/hrtimer.c:1638
> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline]
> smp_apic_timer_interrupt+0x109/0x280 arch/x86/kernel/apic/apic.c:1135
> apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
> </IRQ>
> RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline]
> RIP: 0010:check_kcov_mode kernel/kcov.c:70 [inline]
> RIP: 0010:__sanitizer_cov_trace_pc+0x1c/0x50 kernel/kcov.c:102
> Code: cc 07 48 89 de e8 64 02 3b 00 5b 5d c3 cc 48 8b 04 24 65 48 8b 0c 25
> c0 1d 02 00 65 8b 15 b8 81 8b 7e f7 c2 00 01 1f 00 75 2c <8b> 91 80 13 00
> 00 83 fa 02 75 21 48 8b 91 88 13 00 00 48 8b 32 48
> RSP: 0018:ffffc900021c7c28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> RAX: ffffffff81487433 RBX: 0000000000000000 RCX: ffff88809428a100
> RDX: 0000000000000001 RSI: 00000000fffffffc RDI: ffffea0002479240
> RBP: ffffc900021c7c50 R08: dffffc0000000000 R09: fffffbfff1287025
> R10: fffffbfff1287025 R11: 0000000000000000 R12: dffffc0000000000
> R13: dffffc0000000000 R14: 00000000fffffffc R15: ffff888091c57428
> free_thread_stack+0x168/0x590 kernel/fork.c:280
> release_task_stack kernel/fork.c:440 [inline]
> put_task_stack+0xa3/0x130 kernel/fork.c:451
> finish_task_switch+0x3f1/0x550 kernel/sched/core.c:3256
> context_switch kernel/sched/core.c:3388 [inline]
> __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081
> preempt_schedule_common kernel/sched/core.c:4236 [inline]
> preempt_schedule+0xdb/0x120 kernel/sched/core.c:4261
> ___preempt_schedule+0x16/0x18 arch/x86/entry/thunk_64.S:50
> __raw_read_unlock include/linux/rwlock_api_smp.h:227 [inline]
> _raw_read_unlock+0x3a/0x40 kernel/locking/spinlock.c:255
> kill_something_info kernel/signal.c:1586 [inline]
> __do_sys_kill kernel/signal.c:3640 [inline]
> __se_sys_kill+0x5e9/0x6c0 kernel/signal.c:3634
> __x64_sys_kill+0x5b/0x70 kernel/signal.c:3634
> do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x422a17
> Code: 44 00 00 48 c7 c2 d4 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e
> 0f 1f 84 00 00 00 00 00 0f 1f 40 00 b8 3e 00 00 00 0f 05 <48> 3d 01 f0 ff
> ff 0f 83 dd 32 ff ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007fff38dca538 EFLAGS: 00000293 ORIG_RAX: 000000000000003e
> RAX: ffffffffffffffda RBX: 0000000000000064 RCX: 0000000000422a17
> RDX: 0000000000000bb8 RSI: 0000000000000009 RDI: 00000000fffffffe
> RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000001c62940
> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000008
> R13: 00007fff38dca570 R14: 000000000000f0b6 R15: 00007fff38dca580
> rcu: rcu_preempt kthread starved for 10533 jiffies! g6629 f0x2
> RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> rcu: RCU grace-period kthread stack dump:
> rcu_preempt R running task 29032 10 2 0x80004008
> Call Trace:
> context_switch kernel/sched/core.c:3388 [inline]
> __schedule+0x9a8/0xcc0 kernel/sched/core.c:4081
> schedule+0x181/0x210 kernel/sched/core.c:4155
> schedule_timeout+0x14f/0x240 kernel/time/timer.c:1895
> rcu_gp_fqs_loop kernel/rcu/tree.c:1661 [inline]
> rcu_gp_kthread+0xed8/0x1770 kernel/rcu/tree.c:1821
> kthread+0x332/0x350 kernel/kthread.c:255
> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzk...@googlegroups.com.
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bug...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/00000000000036decf0598c8762e%40google.com.

Dmitry Vyukov

Dec 4, 2019, 8:58:23 AM
to syzbot, Casey Schaufler, linux-security-module, Daniel Axtens, Andrey Ryabinin, kasan-dev, Andrea Arcangeli, Andrew Morton, Christian Brauner, chri...@kellner.me, cyp...@cyphar.com, Reshetova, Elena, Jason Gunthorpe, Kees Cook, l...@altlinux.org, LKML, Andy Lutomirski, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, Al Viro, Will Drewry
I've stopped the smack syzbot instance because it produces an infinite
stream of assorted crashes due to this.
Please ping syzk...@googlegroups.com when this is fixed, and I will
re-enable the instance.

Casey Schaufler

Dec 4, 2019, 11:05:44 AM
to Dmitry Vyukov, syzbot, linux-security-module, Daniel Axtens, Andrey Ryabinin, kasan-dev, Andrea Arcangeli, Andrew Morton, Christian Brauner, chri...@kellner.me, cyp...@cyphar.com, Reshetova, Elena, Jason Gunthorpe, Kees Cook, l...@altlinux.org, LKML, Andy Lutomirski, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, Al Viro, Will Drewry, Casey Schaufler
On 12/4/2019 5:58 AM, Dmitry Vyukov wrote:
> On Tue, Dec 3, 2019 at 9:38 AM Dmitry Vyukov <dvy...@google.com> wrote:
>> On Tue, Dec 3, 2019 at 9:27 AM syzbot
>> <syzbot+de8d93...@syzkaller.appspotmail.com> wrote:
>>> Hello,
>>>
>>> syzbot found the following crash on:
>>>
>>> HEAD commit: 596cf45c Merge branch 'akpm' (patches from Andrew)
>>> git tree: upstream
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=15f11c2ae00000
>>> kernel config: https://syzkaller.appspot.com/x/.config?x=9bbcda576154a4b4
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb
>>> compiler: clang version 9.0.0 (/home/glider/llvm/clang
>>> 80fee25776c2fb61e74c1ecb1a523375c2500b69)
>>>
>>> Unfortunately, I don't have any reproducer for this crash yet.
>>>
>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>> Reported-by: syzbot+de8d93...@syzkaller.appspotmail.com
>> Something seriously broken in smack+kasan+vmap stacks, we now have 60
>> rcu stalls all over the place and counting. This is one of the
>> samples. I've duped 2 other samples to this one, you can see them on
>> the dashboard:
>> https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb

There haven't been Smack changes recently, so this is
going to have been introduced elsewhere. I'm perfectly
willing to accept that Smack is doing something horribly
wrong WRT RCU, and that it needs repair, but it's going to
be tough for me to track down. I hope someone else is looking
into this, as my chances of finding the problem are pretty
slim.

Daniel Axtens

Dec 4, 2019, 6:34:08 PM
to Casey Schaufler, Dmitry Vyukov, syzbot, linux-security-module, Andrey Ryabinin, kasan-dev, Andrea Arcangeli, Andrew Morton, Christian Brauner, chri...@kellner.me, cyp...@cyphar.com, Reshetova, Elena, Jason Gunthorpe, Kees Cook, l...@altlinux.org, LKML, Andy Lutomirski, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, Al Viro, Will Drewry, Casey Schaufler
Hi Casey,

> There haven't been Smack changes recently, so this is
> going to have been introduced elsewhere. I'm perfectly
> willing to accept that Smack is doing something horribly
> wrong WRT rcu, and that it needs repair, but its going to
> be tough for me to track down. I hope someone else is looking
> into this, as my chances of finding the problem are pretty
> slim.

Yeah, I'm having a look, it's probably related to my kasan-vmalloc
stuff. It's currently in a bit of flux as syzkaller finds a bunch of
other bugs with it; once that stabilises a bit I'll come back to Smack.

Regards,
Daniel

Daniel Axtens

Dec 17, 2019, 8:39:01 AM
to Casey Schaufler, Dmitry Vyukov, syzbot, linux-security-module, Andrey Ryabinin, kasan-dev, Andrea Arcangeli, Andrew Morton, Christian Brauner, chri...@kellner.me, cyp...@cyphar.com, Reshetova, Elena, Jason Gunthorpe, Kees Cook, l...@altlinux.org, LKML, Andy Lutomirski, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, Al Viro, Will Drewry, Casey Schaufler
Daniel Axtens <d...@axtens.net> writes:

> Hi Casey,
>
>> There haven't been Smack changes recently, so this is
>> going to have been introduced elsewhere. I'm perfectly
>> willing to accept that Smack is doing something horribly
>> wrong WRT rcu, and that it needs repair, but its going to
>> be tough for me to track down. I hope someone else is looking
>> into this, as my chances of finding the problem are pretty
>> slim.
>
> Yeah, I'm having a look, it's probably related to my kasan-vmalloc
> stuff. It's currently in a bit of flux as syzkaller finds a bunch of
> other bugs with it, once that stablises a bit I'll come back to Smack.

I have had a brief and wildly unsuccessful look at this. I'm happy to
come back to it and go over it with a finer-toothed comb, but it will
almost certainly have to wait until next year.

I don't think it's specific to RCU; we also have a plain lockup:
https://syzkaller.appspot.com/bug?id=be03729d17bb3b2df1754a7486a8f8628f6ff1ec

Dmitry, I've been really struggling to repro this locally, even with
your config. Is there an easy way to see the kernel command line you
booted with and anything else that makes this image special? I have zero
experience with smack so this is a steep learning curve.

Regards,
Daniel

Dmitry Vyukov

Jan 8, 2020, 1:20:45 AM
to Daniel Axtens, Casey Schaufler, syzbot, linux-security-module, Andrey Ryabinin, kasan-dev, Andrea Arcangeli, Andrew Morton, Christian Brauner, chri...@kellner.me, cyp...@cyphar.com, Reshetova, Elena, Jason Gunthorpe, Kees Cook, l...@altlinux.org, LKML, Andy Lutomirski, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, Al Viro, Will Drewry
On Tue, Dec 17, 2019 at 2:39 PM Daniel Axtens <d...@axtens.net> wrote:
>
> Daniel Axtens <d...@axtens.net> writes:
>
> > Hi Casey,
> >
> >> There haven't been Smack changes recently, so this is
> >> going to have been introduced elsewhere. I'm perfectly
> >> willing to accept that Smack is doing something horribly
> >> wrong WRT rcu, and that it needs repair, but its going to
> >> be tough for me to track down. I hope someone else is looking
> >> into this, as my chances of finding the problem are pretty
> >> slim.
> >
> > Yeah, I'm having a look, it's probably related to my kasan-vmalloc
> > stuff. It's currently in a bit of flux as syzkaller finds a bunch of
> > other bugs with it, once that stablises a bit I'll come back to Smack.
>
> I have had a brief and wildly unsuccessful look at this. I'm happy to
> come back to it and go over it with a finer toothed comb, but it will
> almost certainly have to wait until next year.
>
> I don't think it's related to RCU, we also have a plain lockup:
> https://syzkaller.appspot.com/bug?id=be03729d17bb3b2df1754a7486a8f8628f6ff1ec
>
> Dmitry, I've been really struggling to repro this locally, even with
> your config. Is there an easy way to see the kernel command line you
> booted with and anything else that makes this image special? I have zero
> experience with smack so this is a steep learning curve.

I temporarily re-enabled the smack instance and it produced another 50
stalls all over the kernel, and it now keeps spewing a dozen every hour.

I've mailed 3 new samples, you can see them here:
https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb

The config is provided, command line args are here:
https://github.com/google/syzkaller/blob/master/dashboard/config/upstream-smack.cmdline
Some non-default sysctls that syzbot sets are here:
https://github.com/google/syzkaller/blob/master/dashboard/config/upstream.sysctl
Image can be downloaded from here:
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce
syzbot uses GCE VMs with 2 CPUs and 7.5GB memory, but this does not
look to be virtualization-related (?), so it should probably reproduce
in qemu too.

Tetsuo Handa

Jan 8, 2020, 5:25:46 AM
to Dmitry Vyukov, Casey Schaufler, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs
On 2020/01/08 15:20, Dmitry Vyukov wrote:
> I temporarily re-enabled smack instance and it produced another 50
> stalls all over the kernel, and now keeps spewing a dozen every hour.

Since we can get stall reports rather easily, can we try modifying the
kernel command line (e.g. lsm=smack) and/or the kernel config (e.g. no KASAN)?
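
Concretely, the knobs I mean look something like this (illustrative
fragments, not the exact syzbot settings):

```
# kernel command line: make Smack the active major LSM
lsm=smack

# .config: drop the sanitizer dimension for comparison
# CONFIG_KASAN is not set
# CONFIG_KASAN_VMALLOC is not set
```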

>
> I've mailed 3 new samples, you can see them here:
> https://syzkaller.appspot.com/bug?extid=de8d933e7d153aa0c1bb
>
> The config is provided, command line args are here:
> https://github.com/google/syzkaller/blob/master/dashboard/config/upstream-smack.cmdline
> Some non-default sysctls that syzbot sets are here:
> https://github.com/google/syzkaller/blob/master/dashboard/config/upstream.sysctl
> Image can be downloaded from here:
> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce
> syzbot uses GCE VMs with 2 CPUs and 7.5GB memory, but this does not
> look to be virtualization-related (?) so probably should reproduce in
> qemu too.

Is it possible to add an instance for linux-next.git that uses these configs?
If yes, we could try adding some debug printk() under CONFIG_DEBUG_AID_FOR_SYZBOT=y.

Casey Schaufler

Jan 8, 2020, 12:19:41 PM
to Tetsuo Handa, Dmitry Vyukov, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs, Casey Schaufler
On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
> On 2020/01/08 15:20, Dmitry Vyukov wrote:
>> I temporarily re-enabled smack instance and it produced another 50
>> stalls all over the kernel, and now keeps spewing a dozen every hour.

Do I have to be using clang to test this? I'm setting up to work on this,
and don't want to waste time using my current tool chain if the problem
is clang-specific.

Dmitry Vyukov

Jan 9, 2020, 3:20:12 AM
to Casey Schaufler, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs
On Wed, Jan 8, 2020 at 6:19 PM Casey Schaufler <ca...@schaufler-ca.com> wrote:
>
> On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
> > On 2020/01/08 15:20, Dmitry Vyukov wrote:
> >> I temporarily re-enabled smack instance and it produced another 50
> >> stalls all over the kernel, and now keeps spewing a dozen every hour.
>
> Do I have to be using clang to test this? I'm setting up to work on this,
> and don't want to waste time using my current tool chain if the problem
> is clang specific.

Humm, interesting. Initially I was going to say that most likely it's
not clang-related. But the smack instance is actually the only one that
uses clang as well (except for KMSAN of course). So maybe it's indeed
clang-related rather than smack-related. Let me try to build a kernel
with clang.

Dmitry Vyukov

Jan 9, 2020, 3:50:25 AM
to Casey Schaufler, Daniel Axtens, Alexander Potapenko, clang-built-linux, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs
On Thu, Jan 9, 2020 at 9:19 AM Dmitry Vyukov <dvy...@google.com> wrote:
>
> On Wed, Jan 8, 2020 at 6:19 PM Casey Schaufler <ca...@schaufler-ca.com> wrote:
> >
> > On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
> > > On 2020/01/08 15:20, Dmitry Vyukov wrote:
> > >> I temporarily re-enabled smack instance and it produced another 50
> > >> stalls all over the kernel, and now keeps spewing a dozen every hour.
> >
> > Do I have to be using clang to test this? I'm setting up to work on this,
> > and don't want to waste time using my current tool chain if the problem
> > is clang specific.
>
> Humm, interesting. Initially I was going to say that most likely it's
> not clang-related. Bug smack instance is actually the only one that
> uses clang as well (except for KMSAN of course). So maybe it's indeed
> clang-related rather than smack-related. Let me try to build a kernel
> with clang.

+clang-built-linux, glider

[clang-built linux has been severely broken since early Dec]

Building kernel with clang I can immediately reproduce this locally:

$ syz-manager
2020/01/09 09:27:15 loading corpus...
2020/01/09 09:27:17 serving http on http://0.0.0.0:50001
2020/01/09 09:27:17 serving rpc on tcp://[::]:45851
2020/01/09 09:27:17 booting test machines...
2020/01/09 09:27:17 wait for the connection from test machine...
2020/01/09 09:29:23 machine check:
2020/01/09 09:29:23 syscalls : 2961/3195
2020/01/09 09:29:23 code coverage : enabled
2020/01/09 09:29:23 comparison tracing : enabled
2020/01/09 09:29:23 extra coverage : enabled
2020/01/09 09:29:23 setuid sandbox : enabled
2020/01/09 09:29:23 namespace sandbox : enabled
2020/01/09 09:29:23 Android sandbox : /sys/fs/selinux/policy
does not exist
2020/01/09 09:29:23 fault injection : enabled
2020/01/09 09:29:23 leak checking : CONFIG_DEBUG_KMEMLEAK is
not enabled
2020/01/09 09:29:23 net packet injection : enabled
2020/01/09 09:29:23 net device setup : enabled
2020/01/09 09:29:23 concurrency sanitizer : /sys/kernel/debug/kcsan
does not exist
2020/01/09 09:29:23 devlink PCI setup : PCI device 0000:00:10.0
is not available
2020/01/09 09:29:27 corpus : 50226 (0 deleted)
2020/01/09 09:29:27 VMs 20, executed 0, cover 0, crashes 0, repro 0
2020/01/09 09:29:37 VMs 20, executed 45, cover 0, crashes 0, repro 0
2020/01/09 09:29:47 VMs 20, executed 74, cover 0, crashes 0, repro 0
2020/01/09 09:29:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:30:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:30:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:30:27 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:30:37 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:30:47 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:30:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:31:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:31:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:31:26 vm-10: crash: INFO: rcu detected stall in do_idle
2020/01/09 09:31:27 VMs 13, executed 80, cover 0, crashes 0, repro 0
2020/01/09 09:31:28 vm-1: crash: INFO: rcu detected stall in sys_futex
2020/01/09 09:31:29 vm-4: crash: INFO: rcu detected stall in sys_futex
2020/01/09 09:31:31 vm-0: crash: INFO: rcu detected stall in sys_getsockopt
2020/01/09 09:31:33 vm-18: crash: INFO: rcu detected stall in sys_clone3
2020/01/09 09:31:35 vm-3: crash: INFO: rcu detected stall in sys_futex
2020/01/09 09:31:36 vm-8: crash: INFO: rcu detected stall in do_idle
2020/01/09 09:31:37 VMs 7, executed 80, cover 0, crashes 6, repro 0
2020/01/09 09:31:38 vm-19: crash: INFO: rcu detected stall in schedule_tail
2020/01/09 09:31:40 vm-6: crash: INFO: rcu detected stall in schedule_tail
2020/01/09 09:31:42 vm-2: crash: INFO: rcu detected stall in schedule_tail
2020/01/09 09:31:44 vm-12: crash: INFO: rcu detected stall in sys_futex
2020/01/09 09:31:46 vm-15: crash: INFO: rcu detected stall in sys_nanosleep
2020/01/09 09:31:47 VMs 1, executed 80, cover 0, crashes 11, repro 0
2020/01/09 09:31:48 vm-16: crash: INFO: rcu detected stall in sys_futex
2020/01/09 09:31:50 vm-9: crash: INFO: rcu detected stall in schedule
2020/01/09 09:31:52 vm-13: crash: INFO: rcu detected stall in schedule_tail
2020/01/09 09:31:54 vm-11: crash: INFO: rcu detected stall in schedule_tail
2020/01/09 09:31:56 vm-17: crash: INFO: rcu detected stall in sys_futex
2020/01/09 09:31:57 VMs 0, executed 80, cover 0, crashes 16, repro 0
2020/01/09 09:31:58 vm-7: crash: INFO: rcu detected stall in sys_futex
2020/01/09 09:32:00 vm-5: crash: INFO: rcu detected stall in dput
2020/01/09 09:32:02 vm-14: crash: INFO: rcu detected stall in sys_nanosleep


Then I switched the LSM to SELinux and I can _still_ reproduce this. So,
Casey, you may relax: this is not smack-specific :)

Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it
started working normally.

So this is somehow related to both clang and KASAN/VMAP_STACK.

The clang I used is:
https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz
(the one we use on syzbot).

Dmitry Vyukov

Jan 9, 2020, 4:29:36 AM
to Casey Schaufler, Daniel Axtens, Alexander Potapenko, clang-built-linux, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs
Clustering the hangs, they all happen within a very limited section of the code:

1 free_thread_stack+0x124/0x590 kernel/fork.c:284
5 free_thread_stack+0x12e/0x590 kernel/fork.c:280
39 free_thread_stack+0x12e/0x590 kernel/fork.c:284
6 free_thread_stack+0x133/0x590 kernel/fork.c:280
5 free_thread_stack+0x13d/0x590 kernel/fork.c:280
2 free_thread_stack+0x141/0x590 kernel/fork.c:280
6 free_thread_stack+0x14c/0x590 kernel/fork.c:280
9 free_thread_stack+0x151/0x590 kernel/fork.c:280
3 free_thread_stack+0x15b/0x590 kernel/fork.c:280
67 free_thread_stack+0x168/0x590 kernel/fork.c:280
6 free_thread_stack+0x16d/0x590 kernel/fork.c:284
2 free_thread_stack+0x177/0x590 kernel/fork.c:284
1 free_thread_stack+0x182/0x590 kernel/fork.c:284
1 free_thread_stack+0x186/0x590 kernel/fork.c:284
16 free_thread_stack+0x18b/0x590 kernel/fork.c:284
4 free_thread_stack+0x195/0x590 kernel/fork.c:284
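
(A tally like the one above can be produced from a directory of saved
crash reports with a pipeline along these lines; the crashes/*/report
layout is just an illustrative assumption:)

```shell
# Count how often each free_thread_stack frame shows up across reports,
# least common first. Paths are hypothetical.
grep -h 'free_thread_stack' crashes/*/report | sed 's/^[[:space:]]*//' | sort | uniq -c | sort -n
```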

Here is a disassembly of the function:
https://gist.githubusercontent.com/dvyukov/a283d1aaf2ef7874001d56525279ccbd/raw/ac2478bff6472bc473f57f91a75f827cd72bb6bf/gistfile1.txt

But if I am not mistaken, the function only ever jumps down. So how
can it loop?...

Dmitry Vyukov

Jan 9, 2020, 5:05:26 AM
to Casey Schaufler, Daniel Axtens, Alexander Potapenko, clang-built-linux, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs
This is a miscompilation related to static branches.

objdump shows:

ffffffff814878f8: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
./arch/x86/include/asm/jump_label.h:25
asm_volatile_goto("1:"

However, the actual instruction in memory at the time is:

0xffffffff814878f8 <+408>: jmpq 0xffffffff8148787f <free_thread_stack+287>

This jumps to the wrong location in free_thread_stack and makes it loop.

The static branch is this:

static inline bool memcg_kmem_enabled(void)
{
	return static_branch_unlikely(&memcg_kmem_enabled_key);
}

static inline void memcg_kmem_uncharge(struct page *page, int order)
{
	if (memcg_kmem_enabled())
		__memcg_kmem_uncharge(page, order);
}

I suspect it may have something to do with loop unrolling. It may jump
to the right location, but in the wrong unrolled iteration.

Dmitry Vyukov

Jan 9, 2020, 5:39:19 AM
to Casey Schaufler, Daniel Axtens, Alexander Potapenko, clang-built-linux, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs
Kernel built with clang version 10.0.0
(https://github.com/llvm/llvm-project.git
c2443155a0fb245c8f17f2c1c72b6ea391e86e81) works fine.

Alex, please update clang on syzbot machines.

Alexander Potapenko

Jan 9, 2020, 11:23:33 AM
to Dmitry Vyukov, Casey Schaufler, Daniel Axtens, clang-built-linux, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs
Done ~3 hours ago, guess we'll see the results within a day.

--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Nick Desaulniers

Jan 9, 2020, 12:17:08 PM
to Alexander Potapenko, Dmitry Vyukov, Casey Schaufler, Daniel Axtens, clang-built-linux, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs
On Thu, Jan 9, 2020 at 8:23 AM 'Alexander Potapenko' via Clang Built
Linux <clang-bu...@googlegroups.com> wrote:
>
> On Thu, Jan 9, 2020 at 11:39 AM Dmitry Vyukov <dvy...@google.com> wrote:
> >
> > On Thu, Jan 9, 2020 at 11:05 AM Dmitry Vyukov <dvy...@google.com> wrote:
> > > > > > > On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
> > > > > > > > On 2020/01/08 15:20, Dmitry Vyukov wrote:
> > > > > > > >> I temporarily re-enabled smack instance and it produced another 50
> > > > > > > >> stalls all over the kernel, and now keeps spewing a dozen every hour.
> > > > > > >
> > > > > > > Do I have to be using clang to test this? I'm setting up to work on this,
> > > > > > > and don't want to waste time using my current tool chain if the problem
> > > > > > > is clang specific.
> > > > > >
> > > > > > Humm, interesting. Initially I was going to say that most likely it's
> > > > > > not clang-related. Bug smack instance is actually the only one that
> > > > > > uses clang as well (except for KMSAN of course). So maybe it's indeed
> > > > > > clang-related rather than smack-related. Let me try to build a kernel
> > > > > > with clang.
> > > > >
> > > > > +clang-built-linux, glider
> > > > >
> > > > > [clang-built linux is severe broken since early Dec]

Is there automated reporting? Consider adding our mailing list for
Clang-specific failures.
clang-built-linux <clang-bu...@googlegroups.com>
Our CI looks green, but there's a very long tail of combinations of
configs that we don't have coverage of, so bug reports are
appreciated:
https://github.com/ClangBuiltLinux/linux/issues
I disabled loop unrolling and loop unswitching in LLVM when the loop
contained asm goto in:
https://github.com/llvm/llvm-project/commit/c4f245b40aad7e8627b37a8bf1bdcdbcd541e665
I have a fix for loop unrolling in:
https://reviews.llvm.org/D64101
that I should dust off. I haven't looked into loop unswitching yet.

> >
> >
> > Kernel built with clang version 10.0.0
> > (https://github.com/llvm/llvm-project.git
> > c2443155a0fb245c8f17f2c1c72b6ea391e86e81) works fine.
> >
> > Alex, please update clang on syzbot machines.
>
> Done ~3 hours ago, guess we'll see the results within a day.

Please let me know if you encounter any other miscompiles with
Clang; `asm goto` miscompiles in particular I treat as P0.
--
Thanks,
~Nick Desaulniers

Dmitry Vyukov

Jan 9, 2020, 12:23:33 PM
to Nick Desaulniers, Alexander Potapenko, Casey Schaufler, Daniel Axtens, clang-built-linux, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs
On Thu, Jan 9, 2020 at 6:17 PM Nick Desaulniers <ndesau...@google.com> wrote:
> > > > > > > > On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
> > > > > > > > > On 2020/01/08 15:20, Dmitry Vyukov wrote:
> > > > > > > > >> I temporarily re-enabled smack instance and it produced another 50
> > > > > > > > >> stalls all over the kernel, and now keeps spewing a dozen every hour.
> > > > > > > >
> > > > > > > > Do I have to be using clang to test this? I'm setting up to work on this,
> > > > > > > > and don't want to waste time using my current tool chain if the problem
> > > > > > > > is clang specific.
> > > > > > >
> > > > > > > Humm, interesting. Initially I was going to say that most likely it's
> > > > > > > not clang-related. Bug smack instance is actually the only one that
> > > > > > > uses clang as well (except for KMSAN of course). So maybe it's indeed
> > > > > > > clang-related rather than smack-related. Let me try to build a kernel
> > > > > > > with clang.
> > > > > >
> > > > > > +clang-built-linux, glider
> > > > > >
> > > > > > [clang-built linux is severe broken since early Dec]
>
> Is there automated reporting? Consider adding our mailing list for
> Clang specific failures.
> clang-built-linux <clang-bu...@googlegroups.com>
> Our CI looks green, but there's a very long tail of combinations of
> configs that we don't have coverage of, so bug reports are
> appreciated:
> https://github.com/ClangBuiltLinux/linux/issues

syzbot does automatic reporting, but it does not automatically
classify bugs as clang-specific.
FTR, this combination is clang+KASAN+VMAP_STACK (relatively recent
changes, and that's what triggered the infinite loop). But note that
the kernel booted; you could ssh in and do some basic things.
c4f245b40aad7e8627b37a8bf1bdcdbcd541e665 is in the range between the
broken compiler and the newer compiler that seems to work, so I would
assume that that commit fixes this.
We will get the final stamp from syzbot hopefully by tomorrow.

Nick Desaulniers

Jan 9, 2020, 12:39:04 PM
to Dmitry Vyukov, Alexander Potapenko, Casey Schaufler, Daniel Axtens, clang-built-linux, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs
How often do you refresh the build of Clang in syzbot? Is it manual? I
understand the tradeoffs of living on the tip of the spear, but
c4f245b40aad7e8627b37a8bf1bdcdbcd541e665 is 6 months old. So upstream
LLVM could be regressing more often, and you wouldn't notice for 1/2 a
year or more. :-/

--
Thanks,
~Nick Desaulniers

Daniel Axtens

Jan 9, 2020, 6:25:33 PM
to Dmitry Vyukov, Casey Schaufler, Alexander Potapenko, clang-built-linux, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs
Wow, what a bug. Very happy to be off the hook for causing it, and
feeling a lot better about my inability to reproduce it with a GCC-built
kernel!

Regards,
Daniel

Alexander Potapenko

Jan 10, 2020, 3:37:19 AM
to Nick Desaulniers, Dmitry Vyukov, Casey Schaufler, Daniel Axtens, clang-built-linux, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs
KMSAN used to be the only user of Clang on syzbot, so I didn't bother updating it too often.
Now that there are other users, we'll need a better strategy.
Clang revisions I've been picking previously came from Chromium's
Clang distributions. This is nice, because Chromium folks usually pick
a revision that has been extensively tested at Google already, plus
they make sure Chromium tests also pass.
They don't roll the compiler often, however (typically once a month or
two, but this time there were holidays, plus some nasty breakages).

Dmitry Vyukov

Jan 14, 2020, 5:15:39 AM
to Alexander Potapenko, Nick Desaulniers, Casey Schaufler, Daniel Axtens, clang-built-linux, Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs
The clang instances are back to life (incl. smack).

#syz invalid
