INFO: rcu detected stall in is_bpf_text_address

27 views
Skip to first unread message

syzbot

unread,
May 19, 2018, 11:55:03 AM5/19/18
to a...@kernel.org, dan...@iogearbox.net, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following crash on:

HEAD commit: 73fcb1a370c7 Merge branch 'akpm' (patches from Andrew)
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1462ec0f800000
kernel config: https://syzkaller.appspot.com/x/.config?x=f3b4e30da84ec1ed
dashboard link: https://syzkaller.appspot.com/bug?extid=3dcd59a1f907245f891f
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1079cf8f800000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=156daf97800000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+3dcd59...@syzkaller.appspotmail.com

8021q: adding VLAN 0 to HW filter on device team0
8021q: adding VLAN 0 to HW filter on device team0
8021q: adding VLAN 0 to HW filter on device team0
8021q: adding VLAN 0 to HW filter on device team0
8021q: adding VLAN 0 to HW filter on device team0
INFO: rcu_sched self-detected stall on CPU
0-...!: (124998 ticks this GP) idle=0be/1/4611686018427387908
softirq=15234/15234 fqs=59
(t=125000 jiffies g=7610 c=7609 q=351640)
rcu_sched kthread starved for 124739 jiffies! g7610 c7609 f0x2
RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=1
RCU grace-period kthread stack dump:
rcu_sched R running task 23896 9 2 0x80000000
Call Trace:
context_switch kernel/sched/core.c:2859 [inline]
__schedule+0x801/0x1e30 kernel/sched/core.c:3501
schedule+0xef/0x430 kernel/sched/core.c:3545
schedule_timeout+0x138/0x240 kernel/time/timer.c:1801
rcu_gp_kthread+0x6b5/0x1940 kernel/rcu/tree.c:2231
kthread+0x345/0x410 kernel/kthread.c:240
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
NMI backtrace for cpu 0
CPU: 0 PID: 6381 Comm: sh Not tainted 4.17.0-rc5+ #58
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
print_cpu_stall kernel/rcu/tree.c:1525 [inline]
check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
__rcu_pending kernel/rcu/tree.c:3356 [inline]
rcu_pending kernel/rcu/tree.c:3401 [inline]
rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
update_process_times+0x2d/0x70 kernel/time/timer.c:1636
tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
__run_hrtimer kernel/time/hrtimer.c:1398 [inline]
__hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1460
hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
[inline]
RIP: 0010:lock_acquire+0x257/0x520 kernel/locking/lockdep.c:3923
RSP: 0000:ffff8801dae06b60 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
RAX: dffffc0000000000 RBX: 1ffff1003b5c0d71 RCX: 0000000000000000
RDX: 1ffffffff11a30e5 RSI: ffff8801c7b54c38 RDI: 0000000000000286
RBP: ffff8801dae06c50 R08: 0000000000000008 R09: 0000000000000003
R10: ffff8801c7b54cb0 R11: ffff8801c7b54400 R12: ffff8801c7b54400
R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
rcu_lock_acquire include/linux/rcupdate.h:246 [inline]
rcu_read_lock include/linux/rcupdate.h:632 [inline]
is_bpf_text_address+0x3b/0x170 kernel/bpf/core.c:478
kernel_text_address+0x79/0xf0 kernel/extable.c:152
__kernel_text_address+0xd/0x40 kernel/extable.c:107
unwind_get_return_address+0x61/0xa0 arch/x86/kernel/unwind_frame.c:18
__save_stack_trace+0x7e/0xd0 arch/x86/kernel/stacktrace.c:45
save_stack_trace+0x1a/0x20 arch/x86/kernel/stacktrace.c:60
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
slab_post_alloc_hook mm/slab.h:444 [inline]
slab_alloc_node mm/slab.c:3335 [inline]
kmem_cache_alloc_node+0x131/0x780 mm/slab.c:3642
__alloc_skb+0x111/0x780 net/core/skbuff.c:193
alloc_skb include/linux/skbuff.h:987 [inline]
_sctp_make_chunk+0x58/0x280 net/sctp/sm_make_chunk.c:1417
sctp_make_control net/sctp/sm_make_chunk.c:1464 [inline]
sctp_make_heartbeat+0x8f/0x430 net/sctp/sm_make_chunk.c:1177
sctp_sf_heartbeat.isra.23+0x26/0x180 net/sctp/sm_statefuns.c:1005
sctp_sf_sendbeat_8_3+0x38e/0x550 net/sctp/sm_statefuns.c:1049
sctp_do_sm+0x1ab/0x7160 net/sctp/sm_sideeffect.c:1188
sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
expire_timers kernel/time/timer.c:1363 [inline]
__run_timers+0x79e/0xc50 kernel/time/timer.c:1666
run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
__do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
invoke_softirq kernel/softirq.c:365 [inline]
irq_exit+0x1d1/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:525 [inline]
smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
</IRQ>
RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline]
RIP: 0010:arch_atomic_read arch/x86/include/asm/atomic.h:31 [inline]
RIP: 0010:atomic_read include/asm-generic/atomic-instrumented.h:22 [inline]
RIP: 0010:rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:349 [inline]
RIP: 0010:rcu_is_watching+0xc4/0x140 kernel/rcu/tree.c:1075
RSP: 0000:ffff8801c957f398 EFLAGS: 00000a06 ORIG_RAX: ffffffffffffff13
RAX: 00000000000130be RBX: 1ffff100392afe74 RCX: 1ffff100392afe78
RDX: 0000000000000004 RSI: 0000000000000004 RDI: ffff8801dae23610
RBP: ffff8801c957f428 R08: ffffed003b5c46c3 R09: ffffed003b5c46c2
R10: ffffed003b5c46c2 R11: ffff8801dae23613 R12: ffff8801c957f3c0
R13: ffff8801dae23610 R14: ffff8801c957f400 R15: dffffc0000000000
rcu_read_lock_sched_held+0x8d/0x120 kernel/rcu/update.c:112
trace_kmem_cache_alloc include/trace/events/kmem.h:54 [inline]
kmem_cache_alloc+0x5fa/0x760 mm/slab.c:3555
anon_vma_alloc mm/rmap.c:81 [inline]
__anon_vma_prepare+0x3b3/0x700 mm/rmap.c:189
anon_vma_prepare include/linux/rmap.h:153 [inline]
wp_page_copy+0xdf1/0x1440 mm/memory.c:2481
do_wp_page+0x425/0x1990 mm/memory.c:2776
handle_pte_fault mm/memory.c:3979 [inline]
__handle_mm_fault+0x2996/0x4310 mm/memory.c:4087
handle_mm_fault+0x53a/0xc70 mm/memory.c:4124
__do_page_fault+0x60b/0xe40 arch/x86/mm/fault.c:1399
do_page_fault+0xee/0x8a7 arch/x86/mm/fault.c:1474
page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1160
RIP: 0033:0x7f4232a6d1fb
RSP: 002b:00007fff2aeebd28 EFLAGS: 00010246
RAX: 000000000000000d RBX: 00000000756e6547 RCX: 000000006c65746e
RDX: 0000000049656e69 RSI: 0000000000000025 RDI: 0000000000000002
RBP: 00007fff2aeebe20 R08: 0000000000000001 R09: 000000000000002f
R10: 00007f4232a6caf0 R11: 00007fff2aeebb70 R12: 00007f4232a4e000
R13: 00007f4232dd5038 R14: 00007f4232a6cac0 R15: 00007f4232dd5d98
BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 126s!
BUG: workqueue lockup - pool cpus=0-1 flags=0x4 nice=0 stuck for 125s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=7/256
pending: defense_work_handler, defense_work_handler,
defense_work_handler, defense_work_handler, defense_work_handler,
vmstat_shepherd, cache_reap
workqueue events_power_efficient: flags=0x80
pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
in-flight: 2107:neigh_periodic_work
workqueue mm_percpu_wq: flags=0x8
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
pending: vmstat_update
workqueue writeback: flags=0x4e
pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/256
in-flight: 4784:wb_workfn
workqueue ib_addr: flags=0xa0002
workqueue dm_bufio_cache: flags=0x8
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
pending: work_fn
workqueue ipv6_addrconf: flags=0x40008
pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1
pending: addrconf_dad_work
delayed: addrconf_dad_work, addrconf_dad_work, addrconf_dad_work,
addrconf_dad_work, addrconf_dad_work, addrconf_dad_work, addrconf_dad_work,
addrconf_dad_work, addrconf_dad_work, addrconf_dad_work, addrconf_dad_work,
addrconf_dad_work, addrconf_dad_work, addrconf_dad_work, addrconf_dad_work,
addrconf_dad_work, addrconf_dad_work, addrconf_dad_work, addrconf_dad_work,
addrconf_dad_work, addrconf_dad_work, addrconf_dad_work, addrconf_dad_work,
addrconf_dad_work, addrconf_dad_work, addrconf_dad_work, addrconf_dad_work,
addrconf_dad_work, addrconf_verify_work
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
pending: addrconf_dad_work
delayed: addrconf_dad_work, addrconf_dad_work, addrconf_dad_work,
addrconf_dad_work, addrconf_dad_work, addrconf_dad_work, addrconf_dad_work,
addrconf_dad_work, addrconf_dad_work, addrconf_dad_work, addrconf_dad_work,
addrconf_dad_work, addrconf_dad_work, addrconf_dad_work, addrconf_dad_work,
addrconf_dad_work, addrconf_dad_work, addrconf_dad_work, addrconf_dad_work,
addrconf_dad_work, addrconf_dad_work, addrconf_dad_work, addrconf_dad_work,
addrconf_dad_work, addrconf_dad_work, addrconf_dad_work, addrconf_dad_work,
addrconf_dad_work
pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=0s workers=4 idle: 4707 24 4924
pool 4: cpus=0-1 flags=0x4 nice=0 hung=0s workers=4 idle: 44 4846 6


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

Eric Dumazet

unread,
May 19, 2018, 11:57:23 AM5/19/18
to syzbot, a...@kernel.org, dan...@iogearbox.net, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, Xin Long, Marcelo Ricardo Leitner
SCTP experts, please take a look.

Marcelo Ricardo Leitner

unread,
May 19, 2018, 9:45:47 PM5/19/18
to Eric Dumazet, syzbot, a...@kernel.org, dan...@iogearbox.net, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, Xin Long
On Sat, May 19, 2018 at 08:57:19AM -0700, Eric Dumazet wrote:
> SCTP experts, please take a look.

Ugh, same sctp footprint as in the other reports.

Xin Long

unread,
May 20, 2018, 4:26:04 AM5/20/18
to Eric Dumazet, Marcelo Ricardo Leitner, syzbot, a...@kernel.org, Daniel Borkmann, LKML, network dev, syzkaller-bugs
On Sat, May 19, 2018 at 11:57 PM, Eric Dumazet <eric.d...@gmail.com> wrote:
> SCTP experts, please take a look.
>
> On 05/19/2018 08:55 AM, syzbot wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit: 73fcb1a370c7 Merge branch 'akpm' (patches from Andrew)
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1462ec0f800000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=f3b4e30da84ec1ed
>> dashboard link: https://syzkaller.appspot.com/bug?extid=3dcd59a1f907245f891f
>> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1079cf8f800000
Thank you.
The Reproducer is more than helpful.

setsockopt$inet_sctp6_SCTP_RTOINFO(r0, 0x84, 0x0,
&(0x7f0000000140)={0x0, 0x6, 0x7, 0x4}, 0x10)

It set rto_min=6 and rto_max=7, these are too small values.
t3_rtx timer works fine with it. But hb_timer will get stuck there, as
in its timer handler it starts this timer again with this value, then
it goes to the timer handler again...

HB has to repeat this and the hb timer's expire may also have to use
'trans->rto >> 1 ...' stuff. But we can limit the RTO's min value, like
HZ/20, which is 'Try again later.' number used when sock lock is
owned by others in all timer handlers.

Marcelo Ricardo Leitner

unread,
May 21, 2018, 8:43:25 PM5/21/18
to Xin Long, Eric Dumazet, syzbot, a...@kernel.org, Daniel Borkmann, LKML, network dev, syzkaller-bugs, linux...@vger.kernel.org
On Sun, May 20, 2018 at 04:26:03PM +0800, Xin Long wrote:
> On Sat, May 19, 2018 at 11:57 PM, Eric Dumazet <eric.d...@gmail.com> wrote:
> > SCTP experts, please take a look.
> >
> > On 05/19/2018 08:55 AM, syzbot wrote:
> >> Hello,
> >>
> >> syzbot found the following crash on:
> >>
> >> HEAD commit: 73fcb1a370c7 Merge branch 'akpm' (patches from Andrew)
> >> git tree: upstream
> >> console output: https://syzkaller.appspot.com/x/log.txt?x=1462ec0f800000
> >> kernel config: https://syzkaller.appspot.com/x/.config?x=f3b4e30da84ec1ed
> >> dashboard link: https://syzkaller.appspot.com/bug?extid=3dcd59a1f907245f891f
> >> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> >> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1079cf8f800000
> Thank you.
> The Reproducer is more than helpful.
>
> setsockopt$inet_sctp6_SCTP_RTOINFO(r0, 0x84, 0x0,
> &(0x7f0000000140)={0x0, 0x6, 0x7, 0x4}, 0x10)
>
> It set rto_min=6 and rto_max=7, these are too small values.
> t3_rtx timer works fine with it. But hb_timer will get stuck there, as
> in its timer handler it starts this timer again with this value, then
> it goes to the timer handler again...

Nice, thanks Xin.

>
> HB has to repeat this and the hb timer's expire may also have to use
> 'trans->rto >> 1 ...' stuff. But we can limit the RTO's min value, like
> HZ/20, which is 'Try again later.' number used when sock lock is
> owned by others in all timer handlers.

I think a good fix for this is to not allow the application to go
below net.sctp.rto_min, and neither above net.sctp.rto_max.

Then they can even be close to each other, won't be an issue, as long
as rto_min is something sensible. Which then brings it to the second
step of a fix: to restrict rto_min to be >= HZ/5 (copying from TCP
here).

Marcelo Ricardo Leitner

unread,
May 28, 2018, 1:55:51 PM5/28/18
to Xin Long, Eric Dumazet, syzbot, a...@kernel.org, Daniel Borkmann, LKML, network dev, syzkaller-bugs
On Sun, May 20, 2018 at 04:26:03PM +0800, Xin Long wrote:
> On Sat, May 19, 2018 at 11:57 PM, Eric Dumazet <eric.d...@gmail.com> wrote:
> > SCTP experts, please take a look.
> >
> > On 05/19/2018 08:55 AM, syzbot wrote:
> >> Hello,
> >>
> >> syzbot found the following crash on:
> >>
> >> HEAD commit: 73fcb1a370c7 Merge branch 'akpm' (patches from Andrew)
> >> git tree: upstream
> >> console output: https://syzkaller.appspot.com/x/log.txt?x=1462ec0f800000
> >> kernel config: https://syzkaller.appspot.com/x/.config?x=f3b4e30da84ec1ed
> >> dashboard link: https://syzkaller.appspot.com/bug?extid=3dcd59a1f907245f891f
> >> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> >> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1079cf8f800000
> Thank you.
> The Reproducer is more than helpful.
>
> setsockopt$inet_sctp6_SCTP_RTOINFO(r0, 0x84, 0x0,
> &(0x7f0000000140)={0x0, 0x6, 0x7, 0x4}, 0x10)
>
> It set rto_min=6 and rto_max=7, these are too small values.

Considering
struct sctp_rtoinfo {
sctp_assoc_t srto_assoc_id;
__u32 srto_initial;
__u32 srto_max;
__u32 srto_min;
};

Isn't this actually equivalent to:
struct sctp_rtoinfo foo = {
.srto_assoc_id = 0,
.srto_initial = 6,
.srto_max = 7,
.srto_min = 4 /* instead of 6 */
};

?

This doesn't change a thing in the analysis, it's just to be sure
which one is right.

Marcelo Ricardo Leitner

unread,
May 28, 2018, 2:22:34 PM5/28/18
to Xin Long, Eric Dumazet, syzbot, a...@kernel.org, Daniel Borkmann, LKML, network dev, syzkaller-bugs
On Sun, May 20, 2018 at 04:26:03PM +0800, Xin Long wrote:
> On Sat, May 19, 2018 at 11:57 PM, Eric Dumazet <eric.d...@gmail.com> wrote:
> > SCTP experts, please take a look.
> >
> > On 05/19/2018 08:55 AM, syzbot wrote:
> >> Hello,
> >>
> >> syzbot found the following crash on:
> >>
> >> HEAD commit: 73fcb1a370c7 Merge branch 'akpm' (patches from Andrew)
> >> git tree: upstream
> >> console output: https://syzkaller.appspot.com/x/log.txt?x=1462ec0f800000
> >> kernel config: https://syzkaller.appspot.com/x/.config?x=f3b4e30da84ec1ed
> >> dashboard link: https://syzkaller.appspot.com/bug?extid=3dcd59a1f907245f891f
> >> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> >> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1079cf8f800000

The reproducer has:
r0 = socket$inet6(0xa, 0x1, 0x8010000000000084)

Considering socket() prototype uses an int for the 3rd argument, how
should I interpret this 64b value?

0x84 is IPPROTO_SCTP, but don't know about the 0x8010000000000000 in
there.

Andrey Konovalov

unread,
May 29, 2018, 7:53:23 AM5/29/18
to Marcelo Ricardo Leitner, Xin Long, Eric Dumazet, syzbot, a...@kernel.org, Daniel Borkmann, LKML, network dev, syzkaller-bugs
On Mon, May 28, 2018 at 8:22 PM, Marcelo Ricardo Leitner
<marcelo...@gmail.com> wrote:
>
> The reproducer has:
> r0 = socket$inet6(0xa, 0x1, 0x8010000000000084)
>
> Considering socket() prototype uses an int for the 3rd argument, how
> should I interpret this 64b value?
>
> 0x84 is IPPROTO_SCTP, but don't know about the 0x8010000000000000 in
> there.

Hi Marcelo,

It gets stripped to int the same way C does it, so the value would be 0x84.

Thanks!
Reply all
Reply to author
Forward
0 new messages