INFO: rcu detected stall in xfrm_confirm_neigh

15 views
Skip to first unread message

syzbot

unread,
Feb 12, 2018, 10:23:45ā€ÆAM2/12/18
to babu....@oracle.com, dzi...@redhat.com, gre...@linuxfoundation.org, h...@zytor.com, kste...@linuxfoundation.org, linux-...@vger.kernel.org, mi...@redhat.com, npi...@gmail.com, pombr...@nexb.com, syzkall...@googlegroups.com, tg...@linutronix.de, x...@kernel.org
Hello,

syzbot hit the following crash on net-next commit
9515a2e082f91457db0ecff4b65371d0fb5d9aad (Thu Jan 25 03:37:38 2018 +0000)
net/ipv4: Allow send to local broadcast from a socket bound to a VRF

So far this crash happened 6 times on net-next.
Unfortunately, I don't have any reproducer for this crash yet.
Raw console output is attached.
compiler: gcc (GCC) 7.1.1 20170620
.config is attached.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+7d03c8...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for
details.
If you forward the report, please keep this part and the footer.

INFO: rcu_sched self-detected stall on CPU
1-...!: (124998 ticks this GP) idle=376/140000000000001/0
softirq=506054/506054 fqs=19
(t=125000 jiffies g=289415 c=289414 q=312)
rcu_sched kthread starved for 124920 jiffies! g289415 c289414 f0x0
RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
rcu_sched R running task 23456 8 2 0x80000000
Call Trace:
context_switch kernel/sched/core.c:2799 [inline]
__schedule+0x8eb/0x2060 kernel/sched/core.c:3375
schedule+0xf5/0x430 kernel/sched/core.c:3434
schedule_timeout+0x118/0x230 kernel/time/timer.c:1793
rcu_gp_kthread+0x9e5/0x1930 kernel/rcu/tree.c:2314
kthread+0x33c/0x400 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:541
NMI backtrace for cpu 1
CPU: 1 PID: 15893 Comm: syz-executor0 Not tainted 4.15.0-rc9+ #210
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
nmi_trigger_cpumask_backtrace+0x122/0x180 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
rcu_dump_cpu_stacks+0x186/0x1d9 kernel/rcu/tree.c:1459
print_cpu_stall kernel/rcu/tree.c:1608 [inline]
check_cpu_stall.isra.61+0xbb8/0x15b0 kernel/rcu/tree.c:1676
__rcu_pending kernel/rcu/tree.c:3440 [inline]
rcu_pending kernel/rcu/tree.c:3502 [inline]
rcu_check_callbacks+0x256/0xd00 kernel/rcu/tree.c:2842
update_process_times+0x30/0x60 kernel/time/timer.c:1628
tick_sched_handle+0x85/0x160 kernel/time/tick-sched.c:162
tick_sched_timer+0x42/0x120 kernel/time/tick-sched.c:1194
__run_hrtimer kernel/time/hrtimer.c:1211 [inline]
__hrtimer_run_queues+0x358/0xe20 kernel/time/hrtimer.c:1275
hrtimer_interrupt+0x1c2/0x5e0 kernel/time/hrtimer.c:1309
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
smp_apic_timer_interrupt+0x14a/0x700 arch/x86/kernel/apic/apic.c:1050
apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:937
</IRQ>
RIP: 0010:__read_once_size include/linux/compiler.h:183 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x3b/0x50 kernel/kcov.c:106
RSP: 0018:ffff8801a6867820 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
RAX: 0000000000010000 RBX: ffff8801caf7d200 RCX: ffffffff84acc87d
RDX: 000000000000ffff RSI: ffffc90002aa6000 RDI: ffff8801c54c4f50
RBP: ffff8801a6867820 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801c54c4e00
R13: dffffc0000000000 R14: ffffed00395efa5f R15: ffff8801caf7d2fc
xfrm_get_dst_nexthop net/xfrm/xfrm_policy.c:2732 [inline]
xfrm_confirm_neigh+0xad/0x270 net/xfrm/xfrm_policy.c:2759
dst_confirm_neigh include/net/dst.h:419 [inline]
raw_sendmsg+0xece/0x23b0 net/ipv4/raw.c:702
inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:764
sock_sendmsg_nosec net/socket.c:630 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:640
SYSC_sendto+0x361/0x5c0 net/socket.c:1747
SyS_sendto+0x40/0x50 net/socket.c:1715
entry_SYSCALL_64_fastpath+0x29/0xa0
RIP: 0033:0x452f19
RSP: 002b:00007f00a389ec58 EFLAGS: 00000212 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 000000000071bf58 RCX: 0000000000452f19
RDX: 000000000000001e RSI: 0000000020098000 RDI: 0000000000000013
RBP: 0000000000000510 R08: 0000000020cf9000 R09: 0000000000000010
R10: fffffffffffffffe R11: 0000000000000212 R12: 00000000006f6a20
R13: 00000000ffffffff R14: 00007f00a389f6d4 R15: 0000000000000001


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzk...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line in the email body.
raw.log.txt
config.txt

Dmitry Vyukov

unread,
Feb 12, 2018, 10:27:16ā€ÆAM2/12/18
to syzbot, babu....@oracle.com, dzi...@redhat.com, Greg Kroah-Hartman, H. Peter Anvin, Kate Stewart, LKML, Ingo Molnar, npi...@gmail.com, Philippe Ombredanne, syzkall...@googlegroups.com, Thomas Gleixner, the arch/x86 maintainers, Steffen Klassert, Herbert Xu, David Miller, netdev
On Mon, Feb 12, 2018 at 4:23 PM, syzbot
<syzbot+7d03c8...@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot hit the following crash on net-next commit
> 9515a2e082f91457db0ecff4b65371d0fb5d9aad (Thu Jan 25 03:37:38 2018 +0000)
> net/ipv4: Allow send to local broadcast from a socket bound to a VRF
>
> So far this crash happened 6 times on net-next.
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.


+xfrm maintainers
> --
> You received this message because you are subscribed to the Google Groups
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bug...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/001a1141ba9ea381f70565057687%40google.com.
> For more options, visit https://groups.google.com/d/optout.

Dmitry Vyukov

unread,
Feb 13, 2018, 4:19:39ā€ÆAM2/13/18
to syzbot, dzi...@redhat.com, Greg Kroah-Hartman, H. Peter Anvin, Kate Stewart, LKML, Ingo Molnar, npi...@gmail.com, Philippe Ombredanne, syzkall...@googlegroups.com, Thomas Gleixner, the arch/x86 maintainers, Steffen Klassert, Herbert Xu, David Miller, netdev
On Mon, Feb 12, 2018 at 4:26 PM, Dmitry Vyukov <dvy...@google.com> wrote:
> On Mon, Feb 12, 2018 at 4:23 PM, syzbot
> <syzbot+7d03c8...@syzkaller.appspotmail.com> wrote:
>> Hello,
>>
>> syzbot hit the following crash on net-next commit
>> 9515a2e082f91457db0ecff4b65371d0fb5d9aad (Thu Jan 25 03:37:38 2018 +0000)
>> net/ipv4: Allow send to local broadcast from a socket bound to a VRF
>>
>> So far this crash happened 6 times on net-next.
>> Unfortunately, I don't have any reproducer for this crash yet.
>> Raw console output is attached.
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached.
>
>
> +xfrm maintainers

Here is a C repro:
https://gist.githubusercontent.com/dvyukov/92c67ba9afaaa960bcfbdc6ef549ac10/raw/786f9221c1d707c7f4a15effcb1d5997dd4f8638/gistfile1.txt
Somewhat messy, but it gives me:

[64360.028053] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [a.out:7891]
[64360.030043] Modules linked in:
[64360.030355] CPU: 0 PID: 7891 Comm: a.out Not tainted 4.15.0+ #96
[64360.031334] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[64360.033632] RIP: 0010:__sanitizer_cov_trace_pc+0x20/0x50
[64360.035114] RSP: 0018:ffff88006b96f6e0 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff11
[64360.037238] RAX: ffff8800636ce040 RBX: ffff880067fadb00 RCX: ffffffff84d147a6
[64360.039238] RDX: 0000000000000000 RSI: ffffffff84d147b0 RDI: 0000000000000001
[64360.041357] RBP: ffff88006b96f6e0 R08: ffff8800636ce040 R09: ffffed000d1008c9
[64360.043364] R10: 0000000000000000 R11: 0000000000000000 R12: ffffed000cff5b7b
[64360.045366] R13: ffff880067fadbdc R14: ffff88006586ae00 R15: 0000000000000000
[64360.047367] FS: 00007f2a5bab5700(0000) GS:ffff88006ca00000(0000)
knlGS:0000000000000000
[64360.049486] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[64360.051095] CR2: 0000000020cf9000 CR3: 00000000626a7001 CR4: 00000000001606f0
[64360.053065] Call Trace:
[64360.053822] xfrm_confirm_neigh+0xd0/0x2c0
[64360.056166] raw_sendmsg+0x1f1e/0x2640
[64360.068396] inet_sendmsg+0x145/0x490
[64360.074198] sock_sendmsg+0xd2/0x120
[64360.075094] SYSC_sendto+0x3de/0x640
[64360.085269] SyS_sendto+0x40/0x50

syzbot

unread,
Feb 19, 2018, 2:17:05ā€ÆAM2/19/18
to Steffen Klassert, steffen....@secunet.com, syzkall...@googlegroups.com

> #syz test:
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

This crash does not have a reproducer. I cannot test it.


> Subject: [PATCH] xfrm: Fix infinite loop in xfrm_get_dst_nexthop with
> transport mode.

> On transport mode we forget to fetch the child dst_entry
> before we continue the while loop, this leads to an infinite
> loop. Fix this by fetching the child dst_entry before we
> continue the while loop.

> Reported-by: syzbot+7d03c8...@syzkaller.appspotmail.com
> Fixes: 0f6c480f23f4 ("xfrm: Move dst->path into struct xfrm_dst")
> Signed-off-by: Steffen Klassert <steffen....@secunet.com>
> ---
> net/xfrm/xfrm_policy.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)

> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 7a23078132cf..ba5e795b64b2 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -2729,14 +2729,14 @@ static const void *xfrm_get_dst_nexthop(const
> struct dst_entry *dst,
> while (dst->xfrm) {
> const struct xfrm_state *xfrm = dst->xfrm;

> + dst = xfrm_dst_child(dst);
> +
> if (xfrm->props.mode == XFRM_MODE_TRANSPORT)
> continue;
> if (xfrm->type->flags & XFRM_TYPE_REMOTE_COADDR)
> daddr = xfrm->coaddr;
> else if (!(xfrm->type->flags & XFRM_TYPE_LOCAL_COADDR))
> daddr = &xfrm->id.daddr;
> -
> - dst = xfrm_dst_child(dst);
> }
> return daddr;
> }
> --
> 2.14.1

Steffen Klassert

unread,
Feb 19, 2018, 2:22:45ā€ÆAM2/19/18
to Dmitry Vyukov, syzbot, dzi...@redhat.com, Greg Kroah-Hartman, H. Peter Anvin, Kate Stewart, LKML, Ingo Molnar, npi...@gmail.com, Philippe Ombredanne, syzkall...@googlegroups.com, Thomas Gleixner, the arch/x86 maintainers, Herbert Xu, David Miller, netdev
On Tue, Feb 13, 2018 at 10:19:17AM +0100, Dmitry Vyukov wrote:
> On Mon, Feb 12, 2018 at 4:26 PM, Dmitry Vyukov <dvy...@google.com> wrote:
> > On Mon, Feb 12, 2018 at 4:23 PM, syzbot
> > <syzbot+7d03c8...@syzkaller.appspotmail.com> wrote:
> >> Hello,
> >>
> >> syzbot hit the following crash on net-next commit
> >> 9515a2e082f91457db0ecff4b65371d0fb5d9aad (Thu Jan 25 03:37:38 2018 +0000)
> >> net/ipv4: Allow send to local broadcast from a socket bound to a VRF
> >>
> >> So far this crash happened 6 times on net-next.
> >> Unfortunately, I don't have any reproducer for this crash yet.
> >> Raw console output is attached.
> >> compiler: gcc (GCC) 7.1.1 20170620
> >> .config is attached.
> >
> >
> > +xfrm maintainers
>
> Here is a C repro:
> https://gist.githubusercontent.com/dvyukov/92c67ba9afaaa960bcfbdc6ef549ac10/raw/786f9221c1d707c7f4a15effcb1d5997dd4f8638/gistfile1.txt

Seems like syzbot does not know about this reproducer.

I've send a patch to test and got this as the reply:

Dmitry Vyukov

unread,
Feb 19, 2018, 5:06:00ā€ÆAM2/19/18
to Steffen Klassert, syzbot, dzi...@redhat.com, Greg Kroah-Hartman, H. Peter Anvin, Kate Stewart, LKML, Ingo Molnar, npi...@gmail.com, Philippe Ombredanne, syzkall...@googlegroups.com, Thomas Gleixner, the arch/x86 maintainers, Herbert Xu, David Miller, netdev
On Mon, Feb 19, 2018 at 8:22 AM, Steffen Klassert
<steffen....@secunet.com> wrote:
>> > <syzbot+7d03c8...@syzkaller.appspotmail.com> wrote:
>> >> Hello,
>> >>
>> >> syzbot hit the following crash on net-next commit
>> >> 9515a2e082f91457db0ecff4b65371d0fb5d9aad (Thu Jan 25 03:37:38 2018 +0000)
>> >> net/ipv4: Allow send to local broadcast from a socket bound to a VRF
>> >>
>> >> So far this crash happened 6 times on net-next.
>> >> Unfortunately, I don't have any reproducer for this crash yet.
>> >> Raw console output is attached.
>> >> compiler: gcc (GCC) 7.1.1 20170620
>> >> .config is attached.
>> >
>> >
>> > +xfrm maintainers
>>
>> Here is a C repro:
>> https://gist.githubusercontent.com/dvyukov/92c67ba9afaaa960bcfbdc6ef549ac10/raw/786f9221c1d707c7f4a15effcb1d5997dd4f8638/gistfile1.txt
>
> Seems like syzbot does not know about this reproducer.
>
> I've send a patch to test and got this as the reply:
>
> This crash does not have a reproducer. I cannot test it.

Yes, it does not know about the reproducer. I've extracted it
manually, these hangs are sometimes hard to reproduce. For syzbot this
bug does not have a reproducer.
Have you tried to run the reproducer? For me it reproduced the bug
quite reliably.

Steffen Klassert

unread,
Feb 20, 2018, 4:53:11ā€ÆAM2/20/18
to Dmitry Vyukov, syzbot, dzi...@redhat.com, Greg Kroah-Hartman, H. Peter Anvin, Kate Stewart, LKML, Ingo Molnar, npi...@gmail.com, Philippe Ombredanne, syzkall...@googlegroups.com, Thomas Gleixner, the arch/x86 maintainers, Herbert Xu, David Miller, netdev
I've tried the reproducer, it does not trigger with my kernel
config and with your config my machine does not boot. So it
was just easy to ask syzbot for a test.

Anyway, Florian Westphal did a test and confirmed that
the patch fixes the issue. I've just applied the fix
below to the ipsec tree.


Subject: [PATCH] xfrm: Fix infinite loop in xfrm_get_dst_nexthop with transport mode.

On transport mode we forget to fetch the child dst_entry
before we continue the while loop, this leads to an infinite
loop. Fix this by fetching the child dst_entry before we
continue the while loop.

Fixes: 0f6c480f23f4 ("xfrm: Move dst->path into struct xfrm_dst")
Reported-by: syzbot+7d03c8...@syzkaller.appspotmail.com
Tested-by: Florian Westphal <f...@strlen.de>
Signed-off-by: Steffen Klassert <steffen....@secunet.com>
---
net/xfrm/xfrm_policy.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 150d46633ce6..625b3fca5704 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2732,14 +2732,14 @@ static const void *xfrm_get_dst_nexthop(const struct dst_entry *dst,
Reply all
Reply to author
Forward
0 new messages