INFO: task hung in stop_sync

syzbot

unread,

Mar 29, 2018, 3:25:02 AM3/29/18

to core...@netfilter.org, da...@davemloft.net, f...@strlen.de, ho...@verge.net.au, j...@ssi.bg, kad...@blackhole.kfki.hu, linux-...@vger.kernel.org, lvs-...@vger.kernel.org, net...@vger.kernel.org, netfilt...@vger.kernel.org, pa...@netfilter.org, syzkall...@googlegroups.com, wen...@linux-vs.org

Hello,

syzbot hit the following crash on net-next commit
5d22d47b9ed96eddb35821dc2cc4f629f45827f7 (Tue Mar 27 17:33:21 2018 +0000)
Merge branch 'sfc-filter-locking'
syzbot dashboard link:
https://syzkaller.appspot.com/bug?extid=5fe074c01b2032ce9618

Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=6119456711376896
Kernel config:
https://syzkaller.appspot.com/x/.config?id=4372867303600475372
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+5fe074...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for
details.
If you forward the report, please keep this part and the footer.

IPVS: sync thread started: state = BACKUP, mcast_ifn = syz_tun, syncid = 4,
id = 0
IPVS: stopping backup sync thread 25415 ...
INFO: task syz-executor7:25421 blocked for more than 120 seconds.
Not tainted 4.16.0-rc6+ #284
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor7 D23688 25421 4408 0x00000004
Call Trace:
context_switch kernel/sched/core.c:2862 [inline]
__schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
schedule+0xf5/0x430 kernel/sched/core.c:3499
schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
do_wait_for_common kernel/sched/completion.c:86 [inline]
__wait_for_common kernel/sched/completion.c:107 [inline]
wait_for_common kernel/sched/completion.c:118 [inline]
wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
kthread_stop+0x14a/0x7a0 kernel/kthread.c:530
stop_sync_thread+0x3d9/0x740 net/netfilter/ipvs/ip_vs_sync.c:1996
do_ip_vs_set_ctl+0x2b1/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2394
nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1253
sctp_setsockopt+0x2ca/0x63e0 net/sctp/socket.c:4154
sock_common_setsockopt+0x95/0xd0 net/core/sock.c:3039
SYSC_setsockopt net/socket.c:1850 [inline]
SyS_setsockopt+0x189/0x360 net/socket.c:1829
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x454889
RSP: 002b:00007fc927626c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00007fc9276276d4 RCX: 0000000000454889
RDX: 000000000000048c RSI: 0000000000000000 RDI: 0000000000000017
RBP: 000000000072bf58 R08: 0000000000000018 R09: 0000000000000000
R10: 0000000020000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 000000000000051c R14: 00000000006f9b40 R15: 0000000000000001

Showing all locks held in the system:
2 locks held by khungtaskd/868:
#0: (rcu_read_lock){....}, at: [<00000000a1a8f002>]
check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
#0: (rcu_read_lock){....}, at: [<00000000a1a8f002>] watchdog+0x1c5/0xd60
kernel/hung_task.c:249
#1: (tasklist_lock){.+.+}, at: [<0000000037c2f8f9>]
debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
1 lock held by rsyslogd/4247:
#0: (&f->f_pos_lock){+.+.}, at: [<000000000d8d6983>]
__fdget_pos+0x12b/0x190 fs/file.c:765
2 locks held by getty/4338:
#0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4339:
#0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4340:
#0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4341:
#0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4342:
#0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4343:
#0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4344:
#0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
3 locks held by kworker/0:5/6494:
#0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
[<00000000a062b18e>] work_static include/linux/workqueue.h:198 [inline]
#0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
[<00000000a062b18e>] set_work_data kernel/workqueue.c:619 [inline]
#0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
[<00000000a062b18e>] set_work_pool_and_clear_pending kernel/workqueue.c:646
[inline]
#0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
[<00000000a062b18e>] process_one_work+0xb12/0x1bb0 kernel/workqueue.c:2084
#1: ((addr_chk_work).work){+.+.}, at: [<00000000278427d5>]
process_one_work+0xb89/0x1bb0 kernel/workqueue.c:2088
#2: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syz-executor7/25421:
#0: (ipvs->sync_mutex){+.+.}, at: [<00000000d414a689>]
do_ip_vs_set_ctl+0x277/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2393
2 locks held by syz-executor7/25427:
#0: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
#1: (ipvs->sync_mutex){+.+.}, at: [<00000000e6d48489>]
do_ip_vs_set_ctl+0x10f8/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2388
1 lock held by syz-executor7/25435:
#0: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by ipvs-b:2:0/25415:
#0: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 868 Comm: khungtaskd Not tainted 4.16.0-rc6+ #284
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x24d lib/dump_stack.c:53
nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
check_hung_task kernel/hung_task.c:132 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
watchdog+0x90c/0xd60 kernel/hung_task.c:249
kthread+0x33c/0x400 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0x6/0x10
arch/x86/include/asm/irqflags.h:54

---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzk...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line in the email body.

syzbot

unread,

Mar 29, 2018, 2:33:20 PM3/29/18

to Julian Anastasov, j...@ssi.bg, syzkall...@googlegroups.com

> #syz test:
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master

This crash does not have a reproducer. I cannot test it.

Julian Anastasov

unread,

Mar 29, 2018, 4:11:15 PM3/29/18

to syzbot, core...@netfilter.org, da...@davemloft.net, f...@strlen.de, ho...@verge.net.au, kad...@blackhole.kfki.hu, linux-...@vger.kernel.org, lvs-...@vger.kernel.org, net...@vger.kernel.org, netfilt...@vger.kernel.org, pa...@netfilter.org, syzkall...@googlegroups.com, wen...@linux-vs.org

Hello,

On Thu, 29 Mar 2018, syzbot wrote:

> syzbot hit the following crash on net-next commit
> 5d22d47b9ed96eddb35821dc2cc4f629f45827f7 (Tue Mar 27 17:33:21 2018 +0000)
> Merge branch 'sfc-filter-locking'
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=5fe074c01b2032ce9618
>
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=6119456711376896
> Kernel config: https://syzkaller.appspot.com/x/.config?id=4372867303600475372
> compiler: gcc (GCC) 7.1.1 20170620
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+5fe074...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for details.
> If you forward the report, please keep this part and the footer.
>
> IPVS: sync thread started: state = BACKUP, mcast_ifn = syz_tun, syncid = 4, id

> IPVS: = 0

> IPVS: stopping backup sync thread 25415 ...
> INFO: task syz-executor7:25421 blocked for more than 120 seconds.

I think, I know what happens: start_sync_thread holds
rtnl_lock while calling kthread_stop on error. The backup kthread is
signalled and it tries to rtnl_lock in sock_release => lockup.
I think, my fix will solve both reports, only that there is no
reproducer for this one to test the fix with test command.

Above is start_sync_thread() waiting kthread to stop...

> 1 lock held by syz-executor7/25435:
> #0: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20
> net/core/rtnetlink.c:74
> 1 lock held by ipvs-b:2:0/25415:
> #0: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20
> net/core/rtnetlink.c:74

backup kthread needs rtnl_lock to stop...

Regards

--
Julian Anastasov <j...@ssi.bg>

syzbot

unread,

Mar 30, 2018, 12:46:02 PM3/30/18

to core...@netfilter.org, da...@davemloft.net, f...@strlen.de, ho...@verge.net.au, j...@ssi.bg, kad...@blackhole.kfki.hu, linux-...@vger.kernel.org, lvs-...@vger.kernel.org, net...@vger.kernel.org, netfilt...@vger.kernel.org, pa...@netfilter.org, syzkall...@googlegroups.com, wen...@linux-vs.org

syzbot has found reproducer for the following crash on net-next commit
18845557fd6fc1998f2d0d8c30467f86db587529 (Thu Mar 29 20:24:06 2018 +0000)
Merge tag 'wireless-drivers-next-for-davem-2018-03-29' of
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

syzbot dashboard link:
https://syzkaller.appspot.com/bug?extid=5fe074c01b2032ce9618

So far this crash happened 2 times on net-next.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5531070485233664
syzkaller reproducer:
https://syzkaller.appspot.com/x/repro.syz?id=5632385408303104
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=6520566391504896
Kernel config: https://syzkaller.appspot.com/x/.config?id=-37309782588693906

compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+5fe074...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed.

INFO: task syzkaller923914:4319 blocked for more than 120 seconds.
Not tainted 4.16.0-rc6+ #286

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

syzkaller923914 D23312 4319 4316 0x00000000

Call Trace:
context_switch kernel/sched/core.c:2862 [inline]
__schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
schedule+0xf5/0x430 kernel/sched/core.c:3499
schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
do_wait_for_common kernel/sched/completion.c:86 [inline]
__wait_for_common kernel/sched/completion.c:107 [inline]
wait_for_common kernel/sched/completion.c:118 [inline]
wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
kthread_stop+0x14a/0x7a0 kernel/kthread.c:530
stop_sync_thread+0x3d9/0x740 net/netfilter/ipvs/ip_vs_sync.c:1996
do_ip_vs_set_ctl+0x2b1/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2394
nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1253

udp_setsockopt+0x45/0x80 net/ipv4/udp.c:2400

sock_common_setsockopt+0x95/0xd0 net/core/sock.c:3039
SYSC_setsockopt net/socket.c:1850 [inline]
SyS_setsockopt+0x189/0x360 net/socket.c:1829
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7

RIP: 0033:0x4459d9
RSP: 002b:00007f1d6f47cdb8 EFLAGS: 00000297 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00000000006dac24 RCX: 00000000004459d9
RDX: 000000000000048c RSI: 0000000000000000 RDI: 0000000000000008
RBP: 00000000006dac20 R08: 0000000000000018 R09: 0000000000000000
R10: 0000000020000000 R11: 0000000000000297 R12: 0000000000000000
R13: 00007fffcf128acf R14: 00007f1d6f47d9c0 R15: 0000000000000001
INFO: lockdep is turned off.

NMI backtrace for cpu 1

CPU: 1 PID: 869 Comm: khungtaskd Not tainted 4.16.0-rc6+ #286

Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x24d lib/dump_stack.c:53
nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
check_hung_task kernel/hung_task.c:132 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
watchdog+0x90c/0xd60 kernel/hung_task.c:249
kthread+0x33c/0x400 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0

CPU: 0 PID: 4317 Comm: syzkaller923914 Not tainted 4.16.0-rc6+ #286

Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011

RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x50
RSP: 0018:ffff8801b56df138 EFLAGS: 00000093
RAX: ffff8801cbaae640 RBX: 0000000000000000 RCX: ffffffff866a3971
RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff8801db218038
RBP: ffff8801b56df168 R08: ffff88021fff801c R09: ffff88021fff8008
R10: ffff88021fff8010 R11: ffff88021fff801d R12: ffff8801db218038
R13: 0000000000000040 R14: ffff8801db218038 R15: 00000000ffffffff
FS: 0000000001968880(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 00000001c94dc001 CR4: 00000000001606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
cpumask_next+0x24/0x30 lib/cpumask.c:21
select_idle_smt kernel/sched/fair.c:6116 [inline]
select_idle_sibling+0x86d/0xda0 kernel/sched/fair.c:6238
select_task_rq_fair+0xe0a/0x2910 kernel/sched/fair.c:6394
select_task_rq kernel/sched/core.c:1554 [inline]
try_to_wake_up+0x4ee/0x15f0 kernel/sched/core.c:2064
default_wake_function+0x30/0x50 kernel/sched/core.c:3693
autoremove_wake_function+0x78/0x350 kernel/sched/wait.c:377
__wake_up_common+0x18e/0x780 kernel/sched/wait.c:97
__wake_up_common_lock+0x1b4/0x310 kernel/sched/wait.c:125
__wake_up_sync_key+0x19/0x20 kernel/sched/wait.c:203
pipe_write+0x63b/0xd60 fs/pipe.c:477
call_write_iter include/linux/fs.h:1782 [inline]
new_sync_write fs/read_write.c:469 [inline]
__vfs_write+0x684/0x970 fs/read_write.c:482
vfs_write+0x189/0x510 fs/read_write.c:544
SYSC_write fs/read_write.c:589 [inline]
SyS_write+0xef/0x220 fs/read_write.c:581
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x4459d9
RSP: 002b:00007fffcf128b48 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000006dada0 RCX: 00000000004459d9
RDX: 0000000000000012 RSI: 00000000004adf28 RDI: 0000000000000001
RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dad8c
R13: 00000000006dada0 R14: 0000000000000005 R15: 0000000000000001
Code: ff 4c 89 e7 e8 62 b2 38 00 e9 32 fd ff ff 48 8b bd d0 fe ff ff e8 f1
b2 38 00 e9 c3 fc ff ff 90 90 90 90 90 90 90 90 90 90 90 90 <55> 65 48 8b
04 25 c0 ed 01 00 48 89 e5 65 8b 15 8c 20 90 7e 81

Dmitry Vyukov

unread,

Mar 30, 2018, 12:54:54 PM3/30/18

to syzbot, core...@netfilter.org, David Miller, Florian Westphal, ho...@verge.net.au, Julian Anastasov, Jozsef Kadlecsik, LKML, lvs-...@vger.kernel.org, netdev, netfilt...@vger.kernel.org, Pablo Neira Ayuso, syzkall...@googlegroups.com, wen...@linux-vs.org

On Fri, Mar 30, 2018 at 6:46 PM, syzbot
<syzbot+5fe074...@syzkaller.appspotmail.com> wrote:
> syzbot has found reproducer for the following crash on net-next commit

/\/\/\/\/\/\

now syzbot can test fixes

syzbot

unread,

Mar 30, 2018, 3:55:02 PM3/30/18

to j...@ssi.bg, syzkall...@googlegroups.com

Hello,

syzbot has tested the proposed patch but the reproducer still triggered
crash:
no output from test machine

Tested on net-next commit
c0b6edef0bf0e33c12eaf80c676ff09def011518 (Thu Mar 29 19:58:10 2018 +0000)
tc-testing: Add newline when writing test case files

compiler: gcc (GCC) 7.1.1 20170620

Patch: https://syzkaller.appspot.com/x/patch.diff?id=6101462140583936
Kernel config: https://syzkaller.appspot.com/x/.config?id=-37309782588693906
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=6328338956156928

Dmitry Vyukov

unread,

Mar 31, 2018, 11:41:25 AM3/31/18

to Julian Anastasov, syzkall...@googlegroups.com, syzbot

On Fri, Mar 30, 2018 at 10:55 PM, Julian Anastasov <j...@ssi.bg> wrote:
>
> Hello Dmitry,
>
> I'm trying to test a patch but it tells me
> "no output from test machine". Should I worry? May be
> this is indication that the executed program hangs
> while locking some mutex? Here is the output from test
> command:

Hi Julian,

What this error means is that syzbot started running the reproducer on
the kernel, it seen some output (provided as "Raw console output"),
but then it seen no output on both console and from the program for 3+
minutes (while at least the program should have been printed some
output periodically). In such cases kernel _generally_ should
self-detect an issue as a stall/deadlock/hang/etc. However, sometimes
it does not. Usually it means there something wrong with the kernel.
Episodically it happens due to various infrastructure problems.
Unfortunately it's the nature of kernel testing, one can' even say
reliably if the kernel is still good or not.

Let's try it again and see what happens:

#syz test: git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

If it happens again, then most likely reproducer still gets kernel
into some broken state.
Or, you can try to run the reproducer locally.

ipvs.patch

syzbot

unread,

Mar 31, 2018, 11:41:25 AM3/31/18

to Dmitry Vyukov, dvy...@google.com, j...@ssi.bg, syzkall...@googlegroups.com

> On Fri, Mar 30, 2018 at 10:55 PM, Julian Anastasov <j...@ssi.bg> wrote:

>> Hello Dmitry,

>> I'm trying to test a patch but it tells me
>> "no output from test machine". Should I worry? May be
>> this is indication that the executed program hangs
>> while locking some mutex? Here is the output from test
>> command:

> Hi Julian,

> What this error means is that syzbot started running the reproducer on
> the kernel, it seen some output (provided as "Raw console output"),
> but then it seen no output on both console and from the program for 3+
> minutes (while at least the program should have been printed some
> output periodically). In such cases kernel _generally_ should
> self-detect an issue as a stall/deadlock/hang/etc. However, sometimes
> it does not. Usually it means there something wrong with the kernel.
> Episodically it happens due to various infrastructure problems.
> Unfortunately it's the nature of kernel testing, one can' even say
> reliably if the kernel is still good or not.

> Let's try it again and see what happens:

> #syz test: git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> master

"git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git" does not look
like a valid git repo address.

Dmitry Vyukov

unread,

Mar 31, 2018, 11:43:33 AM3/31/18

to syzbot, Julian Anastasov, syzkall...@googlegroups.com

right

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
master

ipvs.patch

syzbot

unread,

Mar 31, 2018, 11:58:02 AM3/31/18

to dvy...@google.com, j...@ssi.bg, syzkall...@googlegroups.com

Hello,

syzbot has tested the proposed patch but the reproducer still triggered
crash:

INFO: task hung in reg_check_chans_work

INFO: task kworker/1:1:23 blocked for more than 120 seconds.
Not tainted 4.16.0-rc7+ #81

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

kworker/1:1 D20360 23 2 0x80000000
Workqueue: events_power_efficient reg_check_chans_work

Call Trace:
context_switch kernel/sched/core.c:2862 [inline]
__schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
schedule+0xf5/0x430 kernel/sched/core.c:3499

schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3557
__mutex_lock_common kernel/locking/mutex.c:833 [inline]
__mutex_lock+0xaad/0x1a80 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
reg_check_chans_work+0xaa/0xff0 net/wireless/reg.c:2017
process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
worker_thread+0x223/0x1990 kernel/workqueue.c:2247

kthread+0x33c/0x400 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406

Showing all locks held in the system:

3 locks held by kworker/1:1/23:
#0: ((wq_completion)"events_power_efficient"){+.+.}, at:
[<000000006331d0a2>] work_static include/linux/workqueue.h:198 [inline]
#0: ((wq_completion)"events_power_efficient"){+.+.}, at:
[<000000006331d0a2>] set_work_data kernel/workqueue.c:619 [inline]
#0: ((wq_completion)"events_power_efficient"){+.+.}, at:
[<000000006331d0a2>] set_work_pool_and_clear_pending kernel/workqueue.c:646
[inline]
#0: ((wq_completion)"events_power_efficient"){+.+.}, at:
[<000000006331d0a2>] process_one_work+0xb12/0x1bb0 kernel/workqueue.c:2084
#1: ((reg_check_chans).work){+.+.}, at: [<000000000ee81609>]
process_one_work+0xb89/0x1bb0 kernel/workqueue.c:2088
#2: (rtnl_mutex){+.+.}, at: [<0000000071b33009>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74

2 locks held by khungtaskd/868:

#0: (rcu_read_lock){....}, at: [<00000000f2dfa387>]
check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
#0: (rcu_read_lock){....}, at: [<00000000f2dfa387>] watchdog+0x1c5/0xd60
kernel/hung_task.c:249
#1: (tasklist_lock){.+.+}, at: [<0000000066aa294f>]
debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
3 locks held by kworker/1:3/2039:

#0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:

[<000000006331d0a2>] work_static include/linux/workqueue.h:198 [inline]

#0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:

[<000000006331d0a2>] set_work_data kernel/workqueue.c:619 [inline]

#0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:

[<000000006331d0a2>] set_work_pool_and_clear_pending kernel/workqueue.c:646

[inline]
#0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:

[<000000006331d0a2>] process_one_work+0xb12/0x1bb0 kernel/workqueue.c:2084
#1: ((addr_chk_work).work){+.+.}, at: [<000000000ee81609>]
process_one_work+0xb89/0x1bb0 kernel/workqueue.c:2088
#2: (rtnl_mutex){+.+.}, at: [<0000000071b33009>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by rsyslogd/4358:
#0: (&f->f_pos_lock){+.+.}, at: [<00000000a23fd36d>]
__fdget_pos+0x12b/0x190 fs/file.c:765
2 locks held by getty/4448:
#0: (&tty->ldisc_sem){++++}, at: [<00000000058b38d6>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<000000002fb7d582>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4449:
#0: (&tty->ldisc_sem){++++}, at: [<00000000058b38d6>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<000000002fb7d582>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4450:
#0: (&tty->ldisc_sem){++++}, at: [<00000000058b38d6>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<000000002fb7d582>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4451:
#0: (&tty->ldisc_sem){++++}, at: [<00000000058b38d6>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<000000002fb7d582>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4452:
#0: (&tty->ldisc_sem){++++}, at: [<00000000058b38d6>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<000000002fb7d582>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4453:
#0: (&tty->ldisc_sem){++++}, at: [<00000000058b38d6>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<000000002fb7d582>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4454:
#0: (&tty->ldisc_sem){++++}, at: [<00000000058b38d6>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<000000002fb7d582>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by syz-executor5/4698:
#0: (net_mutex){+.+.}, at: [<00000000f0508473>] copy_net_ns+0x1f5/0x580
net/core/net_namespace.c:417
#1: (rtnl_mutex){+.+.}, at: [<0000000071b33009>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syz-executor3/4712:
#0: (ipvs->sync_mutex){+.+.}, at: [<0000000043742e84>]
do_ip_vs_set_ctl+0x277/0x1c90 net/netfilter/ipvs/ip_vs_ctl.c:2389
2 locks held by syz-executor3/4714:
#0: (rtnl_mutex){+.+.}, at: [<0000000071b33009>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
#1: (ipvs->sync_mutex){+.+.}, at: [<00000000017799ba>]
start_sync_thread+0x140/0x2b30 net/netfilter/ipvs/ip_vs_sync.c:1774
1 lock held by ipvs-b:2:0/4713:
#0: (rtnl_mutex){+.+.}, at: [<0000000071b33009>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74

=============================================

NMI backtrace for cpu 0

CPU: 0 PID: 868 Comm: khungtaskd Not tainted 4.16.0-rc7+ #81

Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x24d lib/dump_stack.c:53
nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
check_hung_task kernel/hung_task.c:132 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
watchdog+0x90c/0xd60 kernel/hung_task.c:249
kthread+0x33c/0x400 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406

Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0x6/0x10
arch/x86/include/asm/irqflags.h:54

Tested on upstream commit
b5dbc28762fd3fd40ba76303be0c7f707826f982 (Sat Mar 31 04:53:57 2018 +0000)
Merge tag 'kbuild-fixes-v4.16-3' of
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

compiler: gcc (GCC) 7.1.1 20170620

Patch: https://syzkaller.appspot.com/x/patch.diff?id=6035708204548096
Kernel config:
https://syzkaller.appspot.com/x/.config?id=-8440362230543204781
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=4796180642922496

Dmitry Vyukov

unread,

Mar 31, 2018, 12:12:13 PM3/31/18

to syzbot, Julian Anastasov, syzkall...@googlegroups.com

On Sat, Mar 31, 2018 at 5:58 PM, syzbot
<syzbot+5fe074...@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot has tested the proposed patch but the reproducer still triggered
> crash:
> INFO: task hung in reg_check_chans_work

This is now actionable, right?

I think I know what happens here. It's a bug/problem in kernel/hung_task.c:
https://github.com/google/syzkaller/issues/516#issuecomment-377703632
We really need quite large timeouts for everything because kernel is
slow due to lots of debug checks and insane things happen in the
fuzzer. Lower timeouts lead to flakes. We set task hung timeout to 2
mins and "no output" timeout to 3 mins. But due to implementation of
task hung, it can detect a hung only after 4 mins. And there are also
cpu stalls and rcu stalls. But we can't wait for too long too, because
these things happen all the time and we can't wait for a day just to
confirm that the machine is indeed broke. Ideally somebody need to go
and fix kernel/hung_task.c.

Julian Anastasov

unread,

Mar 31, 2018, 12:54:20 PM3/31/18

to Dmitry Vyukov, syzbot, syzkall...@googlegroups.com

Hello,

On Sat, 31 Mar 2018, Dmitry Vyukov wrote:

> On Sat, Mar 31, 2018 at 5:58 PM, syzbot
> <syzbot+5fe074...@syzkaller.appspotmail.com> wrote:
> > Hello,
> >
> > syzbot has tested the proposed patch but the reproducer still triggered
> > crash:
> > INFO: task hung in reg_check_chans_work
>
> This is now actionable, right?
>
> I think I know what happens here. It's a bug/problem in kernel/hung_task.c:
> https://github.com/google/syzkaller/issues/516#issuecomment-377703632
> We really need quite large timeouts for everything because kernel is
> slow due to lots of debug checks and insane things happen in the
> fuzzer. Lower timeouts lead to flakes. We set task hung timeout to 2
> mins and "no output" timeout to 3 mins. But due to implementation of
> task hung, it can detect a hung only after 4 mins. And there are also
> cpu stalls and rcu stalls. But we can't wait for too long too, because
> these things happen all the time and we can't wait for a day just to
> confirm that the machine is indeed broke. Ideally somebody need to go
> and fix kernel/hung_task.c.

Thanks! I now understand what should mean "no output",
I'll investigate your findings. Also, I have to do my tests
properly, it seems I tested IPVS via netlink while the
reproducing program uses setsockopt interface.

BTW, my tests were with #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master

Dmitry Vyukov

unread,

Mar 31, 2018, 12:56:09 PM3/31/18

to Julian Anastasov, syzbot, syzkall...@googlegroups.com

I've used upstream tree in case there is something wrong with
net-next, and it's actually net-next causes "no output". If we would
get "no output" on different trees, it would be a stronger signal.

syzbot

unread,

Apr 1, 2018, 9:47:02 AM4/1/18

to j...@ssi.bg, syzkall...@googlegroups.com

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger
crash:

Reported-and-tested-by:
syzbot+5fe074...@syzkaller.appspotmail.com

Note: the tag will also help syzbot to understand when the bug is fixed.

Tested on net-next commit
06b19fe9a6df7aaa423cd8404ebe5ac9ec4b2960 (Sun Apr 1 03:37:33 2018 +0000)
Merge branch 'chelsio-inline-tls'

compiler: gcc (GCC) 7.1.1 20170620

Patch: https://syzkaller.appspot.com/x/patch.diff?id=5969872664133632
Kernel config: https://syzkaller.appspot.com/x/.config?id=-37309782588693906

---
There is no WARRANTY for the result, to the extent permitted by applicable
law.
Except when otherwise stated in writing syzbot provides the result "AS IS"
without warranty of any kind, either expressed or implied, but not limited
to,
the implied warranties of merchantability and fittness for a particular
purpose.
The entire risk as to the quality of the result is with you. Should the
result
prove defective, you assume the cost of all necessary servicing, repair or
correction.

Julian Anastasov

unread,

Apr 1, 2018, 10:49:29 AM4/1/18

to Dmitry Vyukov, syzbot, syzkall...@googlegroups.com

I found the culprit, it is in IPVS. When start_sync_thread
is called in parallel, some tool can lock rtnl_mutex and then block
on sync_mutex. This prevents the kthreads to call sock_release
which block on rtnl_mutex. The solution was to use rtnl_trylock
and to restart the call. Thanks again!

syzbot

unread,

Apr 2, 2018, 4:27:02 PM4/2/18

to j...@ssi.bg, syzkall...@googlegroups.com

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger
crash:

Reported-and-tested-by:
syzbot+5fe074...@syzkaller.appspotmail.com

Note: the tag will also help syzbot to understand when the bug is fixed.

Tested on net-next commit

159f02977b2feb18a4bece5e586c838a6d26d44b (Mon Apr 2 15:14:03 2018 +0000)
Merge branch 'net-mvneta-improve-suspend-resume'

compiler: gcc (GCC) 7.1.1 20170620

Patch: https://syzkaller.appspot.com/x/patch.diff?id=6213255039025152
Kernel config:
https://syzkaller.appspot.com/x/.config?id=-2829566955235999811

syzbot

unread,

Apr 4, 2018, 3:49:02 PM4/4/18

to j...@ssi.bg, syzkall...@googlegroups.com

Hello,

syzbot tried to test the proposed patch but build/boot failed:

kernel build failed: failed to run /usr/bin/make [make bzImage -j 32
CC=/syzkaller/gcc/bin/gcc]: exit status 2
scripts/kconfig/conf --syncconfig Kconfig
CHK include/config/kernel.release
CHK include/generated/uapi/linux/version.h
DESCEND objtool
CHK include/generated/utsrelease.h
CC scripts/mod/empty.o
CC scripts/mod/devicetable-offsets.s
gcc: error: unrecognized command line option
‘-fsanitize-coverage=trace-cmp’; did you mean
‘-fsanitize-coverage=trace-pc’?
scripts/Makefile.build:312: recipe for target 'scripts/mod/empty.o' failed
make[2]: *** [scripts/mod/empty.o] Error 1
make[2]: *** Waiting for unfinished jobs....
gcc: error: unrecognized command line option
‘-fsanitize-coverage=trace-cmp’; did you mean
‘-fsanitize-coverage=trace-pc’?
scripts/Makefile.build:138: recipe for
target 'scripts/mod/devicetable-offsets.s' failed
make[2]: *** [scripts/mod/devicetable-offsets.s] Error 1
scripts/Makefile.build:546: recipe for target 'scripts/mod' failed
make[1]: *** [scripts/mod] Error 2
Makefile:1073: recipe for target 'scripts' failed
make: *** [scripts] Error 2
make: *** Waiting for unfinished jobs....
CC kernel/bounds.s
CHK include/generated/timeconst.h
gcc: error: unrecognized command line option
‘-fsanitize-coverage=trace-cmp’; did you mean
‘-fsanitize-coverage=trace-pc’?
Kbuild:21: recipe for target 'kernel/bounds.s' failed
make[1]: *** [kernel/bounds.s] Error 1
make[1]: *** Waiting for unfinished jobs....
Makefile:1109: recipe for target 'prepare0' failed
make: *** [prepare0] Error 2

Tested on net-next commit
17dec0a949153d9ac00760ba2f5b78cb583e995f (Wed Apr 4 02:15:32 2018 +0000)
Merge branch 'userns-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

compiler: gcc (GCC) 7.1.1 20170620

Patch: https://syzkaller.appspot.com/x/patch.diff?id=5791045526749184

Dmitry Vyukov

unread,

Apr 5, 2018, 2:34:38 PM4/5/18

to syzbot, Julian Anastasov, syzkall...@googlegroups.com

There turned out to be some make problem, distclean healed it. Let's try again:

#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
master

> --
> You received this message because you are subscribed to the Google Groups
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bug...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/94eb2c05dc5a5e950b05690b1d1f%40google.com.
>
> For more options, visit https://groups.google.com/d/optout.

hung.txt

syzbot

unread,

Apr 5, 2018, 3:35:02 PM4/5/18

to dvy...@google.com, j...@ssi.bg, syzkall...@googlegroups.com

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger
crash:

Reported-and-tested-by:
syzbot+5fe074...@syzkaller.appspotmail.com

Note: the tag will also help syzbot to understand when the bug is fixed.

Tested on

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master

commit
17dec0a949153d9ac00760ba2f5b78cb583e995f (Wed Apr 4 02:15:32 2018 +0000)
Merge branch 'userns-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

compiler: gcc (GCC) 8.0.1 20180301 (experimental)
Patch: https://syzkaller.appspot.com/x/patch.diff?id=5792616947908608
Kernel config:
https://syzkaller.appspot.com/x/.config?id=-7654691824488752904

syzbot

unread,

Apr 5, 2018, 4:11:02 PM4/5/18

to j...@ssi.bg, syzkall...@googlegroups.com

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger
crash:

Reported-and-tested-by:
syzbot+5fe074...@syzkaller.appspotmail.com

Note: the tag will also help syzbot to understand when the bug is fixed.

Tested on net-next commit

17dec0a949153d9ac00760ba2f5b78cb583e995f (Wed Apr 4 02:15:32 2018 +0000)
Merge branch 'userns-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

compiler: gcc (GCC) 8.0.1 20180301 (experimental)

Patch: https://syzkaller.appspot.com/x/patch.diff?id=6626007972839424

Reply all

Reply to author

Forward

INFO: task hung in stop_sync_thread (2)

syzbot

syzbot

Julian Anastasov

syzbot

Dmitry Vyukov

syzbot

Dmitry Vyukov

syzbot

Dmitry Vyukov

syzbot

Dmitry Vyukov

Julian Anastasov

Dmitry Vyukov

syzbot

Julian Anastasov

syzbot

syzbot

Dmitry Vyukov

syzbot

syzbot