INFO: task hung in tls_sk_proto_close

14 views
Skip to first unread message

syzbot

unread,
Mar 30, 2020, 3:09:13ā€ÆAM3/30/20
to avi...@mellanox.com, bor...@mellanox.com, dan...@iogearbox.net, da...@davemloft.net, john.fa...@gmail.com, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following crash on:

HEAD commit: 1a147b74 Merge branch 'DSA-mtu'
git tree: net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=124fcacbe00000
kernel config: https://syzkaller.appspot.com/x/.config?x=46ee14d4915944bc
dashboard link: https://syzkaller.appspot.com/bug?extid=ca1345cca66556f3d79b
compiler: gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+ca1345...@syzkaller.appspotmail.com

INFO: task syz-executor.3:11325 blocked for more than 143 seconds.
Not tainted 5.6.0-rc7-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.3 D28352 11325 7299 0x00004004
Call Trace:
schedule+0xd0/0x2a0 kernel/sched/core.c:4154
schedule_timeout+0x6db/0xba0 kernel/time/timer.c:1871
do_wait_for_common kernel/sched/completion.c:83 [inline]
__wait_for_common kernel/sched/completion.c:104 [inline]
wait_for_common kernel/sched/completion.c:115 [inline]
wait_for_completion+0x26a/0x3c0 kernel/sched/completion.c:136
__flush_work+0x4f0/0xa70 kernel/workqueue.c:3043
__cancel_work_timer+0x3a2/0x500 kernel/workqueue.c:3130
tls_sk_proto_close+0x4a9/0xaf0 net/tls/tls_main.c:305
inet_release+0xe4/0x1f0 net/ipv4/af_inet.c:427
inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:470
__sock_release+0xcd/0x280 net/socket.c:605
sock_close+0x18/0x20 net/socket.c:1283
__fput+0x2da/0x850 fs/file_table.c:280
task_work_run+0x13f/0x1b0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop+0x2fa/0x360 arch/x86/entry/common.c:164
prepare_exit_to_usermode arch/x86/entry/common.c:195 [inline]
syscall_return_slowpath arch/x86/entry/common.c:278 [inline]
do_syscall_64+0x6b1/0x7d0 arch/x86/entry/common.c:304
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4163e1
Code: Bad RIP value.
RSP: 002b:00007ffe2e1990c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000006 RCX: 00000000004163e1
RDX: 0000000000000000 RSI: 0000000000001a5e RDI: 0000000000000005
RBP: 0000000000000001 R08: 00000000c2e43a5e R09: 00000000c2e43a62
R10: 00007ffe2e1991a0 R11: 0000000000000293 R12: 000000000076c900
R13: 000000000076c900 R14: 0000000000042661 R15: 000000000076bfac

Showing all locks held in the system:
1 lock held by khungtaskd/1134:
#0: ffffffff899accc0 (rcu_read_lock){....}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:5331
3 locks held by kworker/1:3/2831:
#0: ffff8880aa026d28 ((wq_completion)events){+.+.}, at: __write_once_size include/linux/compiler.h:226 [inline]
#0: ffff8880aa026d28 ((wq_completion)events){+.+.}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff8880aa026d28 ((wq_completion)events){+.+.}, at: atomic64_set include/asm-generic/atomic-instrumented.h:855 [inline]
#0: ffff8880aa026d28 ((wq_completion)events){+.+.}, at: atomic_long_set include/asm-generic/atomic-long.h:40 [inline]
#0: ffff8880aa026d28 ((wq_completion)events){+.+.}, at: set_work_data kernel/workqueue.c:615 [inline]
#0: ffff8880aa026d28 ((wq_completion)events){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:642 [inline]
#0: ffff8880aa026d28 ((wq_completion)events){+.+.}, at: process_one_work+0x82a/0x1690 kernel/workqueue.c:2237
#1: ffffc9000825fdd0 ((work_completion)(&(&sw_ctx_tx->tx_work.work)->work)){+.+.}, at: process_one_work+0x85e/0x1690 kernel/workqueue.c:2241
#2: ffff88809fb020d0 (&ctx->tx_lock){+.+.}, at: tx_work_handler+0x127/0x190 net/tls/tls_sw.c:2209
1 lock held by in:imklog/6642:
#0: ffff8880a794bae0 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xe3/0x100 fs/file.c:826
1 lock held by syz-executor.3/11325:
#0: ffff88808602f740 (&sb->s_type->i_mutex_key#14){+.+.}, at: inode_lock include/linux/fs.h:792 [inline]
#0: ffff88808602f740 (&sb->s_type->i_mutex_key#14){+.+.}, at: __sock_release+0x86/0x280 net/socket.c:604

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 1134 Comm: khungtaskd Not tainted 5.6.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
nmi_cpu_backtrace.cold+0x70/0xb1 lib/nmi_backtrace.c:101
nmi_trigger_cpumask_backtrace+0x231/0x27e lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline]
watchdog+0xa8c/0x1010 kernel/hung_task.c:289
kthread+0x357/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 55 Comm: kworker/u4:2 Not tainted 5.6.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: bat_events batadv_nc_worker
RIP: 0010:check_preemption_disabled lib/smp_processor_id.c:14 [inline]
RIP: 0010:debug_smp_processor_id+0x8/0x185 lib/smp_processor_id.c:57
Code: 1b fe e9 e3 fe ff ff 48 c7 c7 84 02 86 8a e8 9f d6 1b fe eb 87 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 41 54 53 e8 c8 01 df fd <65> 44 8b 25 b0 86 6e 7c 65 8b 1d d1 f1 6e 7c 81 e3 ff ff ff 7f 31
RSP: 0018:ffffc90000ef7cb0 EFLAGS: 00000293
RAX: ffff8880a91be100 RBX: 0000000000038300 RCX: ffffffff87a49a8d
RDX: 0000000000000000 RSI: ffffffff83932ca8 RDI: 0000000000000001
RBP: ffff88805be14b80 R08: ffff8880a91be100 R09: ffffed1015ce707c
R10: ffffed1015ce707b R11: ffff8880ae7383db R12: 0000000000000001
R13: 0000000000000104 R14: dffffc0000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f70bad32000 CR3: 00000000a608e000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
rcu_dynticks_curr_cpu_in_eqs+0x12/0xb0 kernel/rcu/tree.c:299
rcu_is_watching+0xc/0x20 kernel/rcu/tree.c:919
rcu_read_unlock include/linux/rcupdate.h:651 [inline]
batadv_nc_purge_orig_hash net/batman-adv/network-coding.c:411 [inline]
batadv_nc_worker+0x47b/0x760 net/batman-adv/network-coding.c:718
process_one_work+0x94b/0x1690 kernel/workqueue.c:2266
worker_thread+0x96/0xe20 kernel/workqueue.c:2412
kthread+0x357/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

syzbot

unread,
Aug 29, 2020, 2:09:22ā€ÆAM8/29/20
to avi...@mellanox.com, avi...@nvidia.com, bor...@mellanox.com, bor...@nvidia.com, dan...@iogearbox.net, da...@davemloft.net, john.fa...@gmail.com, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com
syzbot has found a reproducer for the following issue on:

HEAD commit: b36c9697 Add linux-next specific files for 20200828
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=11ae3d61900000
kernel config: https://syzkaller.appspot.com/x/.config?x=5e3cf99580b5542c
dashboard link: https://syzkaller.appspot.com/bug?extid=ca1345cca66556f3d79b
compiler: gcc (GCC) 10.1.0-syz 20200507
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=170cdfe5900000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=139a768e900000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+ca1345...@syzkaller.appspotmail.com

INFO: task syz-executor014:6815 blocked for more than 143 seconds.
Not tainted 5.9.0-rc2-next-20200828-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor014 state:D stack:23536 pid: 6815 ppid: 6814 flags:0x00004000
Call Trace:
context_switch kernel/sched/core.c:3778 [inline]
__schedule+0x8e5/0x21e0 kernel/sched/core.c:4527
schedule+0xd0/0x2a0 kernel/sched/core.c:4602
schedule_timeout+0x1d8/0x250 kernel/time/timer.c:1855
do_wait_for_common kernel/sched/completion.c:85 [inline]
__wait_for_common kernel/sched/completion.c:106 [inline]
wait_for_common kernel/sched/completion.c:117 [inline]
wait_for_completion+0x163/0x260 kernel/sched/completion.c:138
__flush_work+0x51f/0xab0 kernel/workqueue.c:3046
__cancel_work_timer+0x5de/0x700 kernel/workqueue.c:3133
tls_sk_proto_close+0x4a7/0xaf0 net/tls/tls_main.c:305
inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:475
__sock_release+0xcd/0x280 net/socket.c:596
sock_close+0x18/0x20 net/socket.c:1277
__fput+0x285/0x920 fs/file_table.c:281
task_work_run+0xdd/0x190 kernel/task_work.c:141
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_user_mode_loop kernel/entry/common.c:140 [inline]
exit_to_user_mode_prepare+0x195/0x1c0 kernel/entry/common.c:167
syscall_exit_to_user_mode+0x59/0x2b0 kernel/entry/common.c:242
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x403950
Code: Bad RIP value.
RSP: 002b:00007fff6bd1cf98 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000403950
RDX: 00000000000000d8 RSI: 00000000200005c0 RDI: 0000000000000004
RBP: 00007fff6bd1cfa0 R08: 0000000000000000 R09: 00000000000000d8
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff6bd1cfb0
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Showing all locks held in the system:
4 locks held by kworker/u4:0/7:
1 lock held by khungtaskd/1167:
#0: ffffffff89c67640 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:5825
1 lock held by in:imklog/6517:
#0: ffff8880a8b1c930 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:930
1 lock held by syz-executor014/6815:
#0: ffff888085128750 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}, at: inode_lock include/linux/fs.h:779 [inline]
#0: ffff888085128750 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}, at: __sock_release+0x86/0x280 net/socket.c:595
3 locks held by kworker/0:3/7021:
#0: ffff8880aa063d38 ((wq_completion)events){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff8880aa063d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
#0: ffff8880aa063d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
#0: ffff8880aa063d38 ((wq_completion)events){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:616 [inline]
#0: ffff8880aa063d38 ((wq_completion)events){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
#0: ffff8880aa063d38 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x82b/0x1670 kernel/workqueue.c:2240
#1: ffffc90006137da8 ((work_completion)(&(&sw_ctx_tx->tx_work.work)->work)){+.+.}-{0:0}, at: process_one_work+0x85f/0x1670 kernel/workqueue.c:2244
#2: ffff88809dc300d8 (&ctx->tx_lock){+.+.}-{3:3}, at: tx_work_handler+0x127/0x190 net/tls/tls_sw.c:2251

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 1167 Comm: khungtaskd Not tainted 5.9.0-rc2-next-20200828-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x18f/0x20d lib/dump_stack.c:118
nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
nmi_trigger_cpumask_backtrace+0x1b3/0x223 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:147 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:253 [inline]
watchdog+0xd89/0xf30 kernel/hung_task.c:339
kthread+0x3b5/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 3893 Comm: systemd-journal Not tainted 5.9.0-rc2-next-20200828-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__read_seqcount_t_begin include/linux/seqlock.h:276 [inline]
RIP: 0010:path_init+0x1d1/0x13c0 fs/namei.c:2218
Code: 24 40 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 73 0e 00 00 44 8b 2d cf 44 de 07 <31> ff 45 89 ef 41 83 e7 01 44 89 fe e8 4e 87 b1 ff 45 84 ff 0f 85
RSP: 0018:ffffc900052f7a48 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000040 RCX: ffffffff81c2f109
RDX: 1ffffffff13426c8 RSI: ffffffff81c2f117 RDI: ffffc900052f7c98
RBP: ffffc900052f7ae8 R08: 0000000000000000 R09: ffffffff8c6a59e7
R10: 0000000000000000 R11: 0000000000000000 R12: ffffc900052f7c58
R13: 0000000000001444 R14: ffffffff89a13640 R15: 0000000000000000
FS: 00007fcc879848c0(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcc84d25000 CR3: 0000000094282000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
path_openat+0x185/0x2730 fs/namei.c:3363
do_filp_open+0x17e/0x3c0 fs/namei.c:3395
do_sys_openat2+0x16d/0x420 fs/open.c:1168
do_sys_open fs/open.c:1184 [inline]
__do_sys_open fs/open.c:1192 [inline]
__se_sys_open fs/open.c:1188 [inline]
__x64_sys_open+0x119/0x1c0 fs/open.c:1188
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fcc86f14840
Code: 73 01 c3 48 8b 0d 68 77 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 bb 20 00 00 75 10 b8 02 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e f6 ff ff 48 89 04 24
RSP: 002b:00007ffe226b7b58 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
RAX: ffffffffffffffda RBX: 00007ffe226b7e60 RCX: 00007fcc86f14840
RDX: 00000000000001a0 RSI: 0000000000080042 RDI: 0000563cfc995a50
RBP: 000000000000000d R08: 000000000000c0ff R09: 00000000ffffffff
R10: 0000000000000069 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000563cfc989040 R14: 00007ffe226b7e20 R15: 0000563cfc995820

syzbot

unread,
Aug 29, 2020, 2:23:09ā€ÆPM8/29/20
to a...@unstable.cc, avi...@mellanox.com, avi...@nvidia.com, b.a.t...@lists.open-mesh.org, bor...@mellanox.com, bor...@nvidia.com, dan...@iogearbox.net, da...@davemloft.net, ido...@mellanox.com, john.fa...@gmail.com, ku...@kernel.org, linux-...@vger.kernel.org, marekl...@neomailbox.ch, net...@vger.kernel.org, pe...@mellanox.com, s...@simonwunderlich.de, syzkall...@googlegroups.com
syzbot has bisected this issue to:

commit 02d21b59d5cc4b4b395bbc2a29319b8a529ebeff
Author: Ido Schimmel <ido...@mellanox.com>
Date: Wed Jan 23 14:32:59 2019 +0000

mlxsw: spectrum_nve: Enable VXLAN on Spectrum-2

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=14e89b05900000
start commit: 5438dd45 net_sched: fix error path in red_init()
git tree: net
final oops: https://syzkaller.appspot.com/x/report.txt?x=16e89b05900000
console output: https://syzkaller.appspot.com/x/log.txt?x=12e89b05900000
kernel config: https://syzkaller.appspot.com/x/.config?x=a0437fdd630bee11
dashboard link: https://syzkaller.appspot.com/bug?extid=ca1345cca66556f3d79b
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14acdfe5900000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1792598e900000

Reported-by: syzbot+ca1345...@syzkaller.appspotmail.com
Fixes: 02d21b59d5cc ("mlxsw: spectrum_nve: Enable VXLAN on Spectrum-2")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Hillf Danton

unread,
Aug 30, 2020, 2:03:50ā€ÆAM8/30/20
to syzbot, avi...@mellanox.com, avi...@nvidia.com, bor...@mellanox.com, bor...@nvidia.com, dan...@iogearbox.net, da...@davemloft.net, john.fa...@gmail.com, ku...@kernel.org, hda...@sina.com, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com

Fri, 28 Aug 2020 23:09:20 -0700
Bail out if sock is being closed paving a brick for kworker to
make progress. This is not a cure but a probe to help narrow
down the ground zero if it makes syzbot happy first.

--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -926,6 +926,11 @@ int tls_sw_sendmsg(struct sock *sk, stru
goto send_end;
}

+ if (test_bit(BIT_TX_CLOSING, &ctx->tx_bitmask)) {
+ ret = -ENXIO;
+ goto send_end;
+ }
+
if (ctx->open_rec)
rec = ctx->open_rec;
else
@@ -1127,6 +1132,11 @@ static int tls_sw_do_sendpage(struct soc
goto sendpage_end;
}

+ if (test_bit(BIT_TX_CLOSING, &ctx->tx_bitmask)) {
+ ret = -ENXIO;
+ goto sendpage_end;
+ }
+
if (ctx->open_rec)
rec = ctx->open_rec;
else

Reply all
Reply to author
Forward
0 new messages