INFO: task hung in ip6gre_exit_batch_net

syzbot

Jun 4, 2018, 11:03:28 AM
to christia...@ubuntu.com, da...@davemloft.net, dsa...@gmail.com, f...@strlen.de, jb...@redhat.com, ktk...@virtuozzo.com, linux-...@vger.kernel.org, lucie...@gmail.com, msch...@universe-factory.net, net...@vger.kernel.org, syzkall...@googlegroups.com, vyas...@gmail.com
Hello,

syzbot found the following crash on:

HEAD commit: bc2dbc5420e8 Merge branch 'akpm' (patches from Andrew)
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=164e42b7800000
kernel config: https://syzkaller.appspot.com/x/.config?x=982e2df1b9e60b02
dashboard link: https://syzkaller.appspot.com/bug?extid=bf78a74f82c1cf19069e
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+bf78a7...@syzkaller.appspotmail.com

INFO: task kworker/u4:1:22 blocked for more than 120 seconds.
Not tainted 4.17.0-rc6+ #68
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u4:1 D13944 22 2 0x80000000
Workqueue: netns cleanup_net
Call Trace:
context_switch kernel/sched/core.c:2859 [inline]
__schedule+0x801/0x1e30 kernel/sched/core.c:3501
schedule+0xef/0x430 kernel/sched/core.c:3545
schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3603
__mutex_lock_common kernel/locking/mutex.c:833 [inline]
__mutex_lock+0xe38/0x17f0 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
ip6gre_exit_batch_net+0xd5/0x7d0 net/ipv6/ip6_gre.c:1585
ops_exit_list.isra.7+0x105/0x160 net/core/net_namespace.c:155
cleanup_net+0x51d/0xb20 net/core/net_namespace.c:523
process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
kthread+0x345/0x410 kernel/kthread.c:240
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

Showing all locks held in the system:
4 locks held by kworker/u4:1/22:
#0: 0000000049a7b590 ((wq_completion)"%s""netns"){+.+.}, at:
__write_once_size include/linux/compiler.h:215 [inline]
#0: 0000000049a7b590 ((wq_completion)"%s""netns"){+.+.}, at:
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: 0000000049a7b590 ((wq_completion)"%s""netns"){+.+.}, at: atomic64_set
include/asm-generic/atomic-instrumented.h:40 [inline]
#0: 0000000049a7b590 ((wq_completion)"%s""netns"){+.+.}, at:
atomic_long_set include/asm-generic/atomic-long.h:57 [inline]
#0: 0000000049a7b590 ((wq_completion)"%s""netns"){+.+.}, at: set_work_data
kernel/workqueue.c:617 [inline]
#0: 0000000049a7b590 ((wq_completion)"%s""netns"){+.+.}, at:
set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
#0: 0000000049a7b590 ((wq_completion)"%s""netns"){+.+.}, at:
process_one_work+0xaef/0x1b50 kernel/workqueue.c:2116
#1: 0000000030a00b6b (net_cleanup_work){+.+.}, at:
process_one_work+0xb46/0x1b50 kernel/workqueue.c:2120
#2: 000000007eb35e65 (pernet_ops_rwsem){++++}, at: cleanup_net+0x11a/0xb20
net/core/net_namespace.c:490
#3: 000000007eb32c75 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
3 locks held by kworker/1:1/24:
#0: 000000001c9e6580 ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
__write_once_size include/linux/compiler.h:215 [inline]
#0: 000000001c9e6580 ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: 000000001c9e6580 ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
atomic64_set include/asm-generic/atomic-instrumented.h:40 [inline]
#0: 000000001c9e6580 ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
atomic_long_set include/asm-generic/atomic-long.h:57 [inline]
#0: 000000001c9e6580 ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
set_work_data kernel/workqueue.c:617 [inline]
#0: 000000001c9e6580 ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
#0: 000000001c9e6580 ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
process_one_work+0xaef/0x1b50 kernel/workqueue.c:2116
#1: 000000009edcfbe7 ((addr_chk_work).work){+.+.}, at:
process_one_work+0xb46/0x1b50 kernel/workqueue.c:2120
#2: 000000007eb32c75 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
2 locks held by khungtaskd/893:
#0: 000000007eeb621a (rcu_read_lock){....}, at:
check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
#0: 000000007eeb621a (rcu_read_lock){....}, at: watchdog+0x1ff/0xf60
kernel/hung_task.c:249
#1: 00000000239f1b5e (tasklist_lock){.+.+}, at:
debug_show_all_locks+0xde/0x34a kernel/locking/lockdep.c:4470
2 locks held by getty/4481:
#0: 00000000cc114025 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: 000000006ad1f3fc (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4482:
#0: 00000000226a16cc (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: 000000008cee8cdc (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4483:
#0: 0000000067bd3c39 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: 000000005d8bc81d (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4484:
#0: 00000000f0f8d839 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: 00000000a9d5f091 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4485:
#0: 000000002c96ee9a (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: 0000000033338ac7 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4486:
#0: 00000000f6db39b5 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: 00000000bb7c6099 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4487:
#0: 000000006be9659d (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: 00000000e2edd3d0 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
3 locks held by kworker/1:3/27147:
#0: 00000000c208aa7f ((wq_completion)"events"){+.+.}, at:
__write_once_size include/linux/compiler.h:215 [inline]
#0: 00000000c208aa7f ((wq_completion)"events"){+.+.}, at:
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: 00000000c208aa7f ((wq_completion)"events"){+.+.}, at: atomic64_set
include/asm-generic/atomic-instrumented.h:40 [inline]
#0: 00000000c208aa7f ((wq_completion)"events"){+.+.}, at: atomic_long_set
include/asm-generic/atomic-long.h:57 [inline]
#0: 00000000c208aa7f ((wq_completion)"events"){+.+.}, at: set_work_data
kernel/workqueue.c:617 [inline]
#0: 00000000c208aa7f ((wq_completion)"events"){+.+.}, at:
set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
#0: 00000000c208aa7f ((wq_completion)"events"){+.+.}, at:
process_one_work+0xaef/0x1b50 kernel/workqueue.c:2116
#1: 00000000fa87e61f (deferred_process_work){+.+.}, at:
process_one_work+0xb46/0x1b50 kernel/workqueue.c:2120
#2: 000000007eb32c75 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syz-executor7/29665:
#0: 000000006e20d618 (sk_lock-AF_INET6){+.+.}, at: lock_sock
include/net/sock.h:1469 [inline]
#0: 000000006e20d618 (sk_lock-AF_INET6){+.+.}, at:
tls_sw_sendmsg+0x1b9/0x12e0 net/tls/tls_sw.c:384
1 lock held by syz-executor7/29689:
#0: 000000007eb32c75 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 893 Comm: khungtaskd Not tainted 4.17.0-rc6+ #68
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
check_hung_task kernel/hung_task.c:132 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
watchdog+0xc10/0xf60 kernel/hung_task.c:249
kthread+0x345/0x410 kernel/kthread.c:240
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0x6/0x10
arch/x86/include/asm/irqflags.h:54


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.

Dmitry Vyukov

Jun 4, 2018, 11:23:16 AM
to syzbot, Christian Brauner, David Miller, David Ahern, Florian Westphal, Jiri Benc, Kirill Tkhai, LKML, Xin Long, msch...@universe-factory.net, netdev, syzkaller-bugs, Vladislav Yasevich
On Mon, Jun 4, 2018 at 5:03 PM, syzbot
<syzbot+bf78a7...@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: bc2dbc5420e8 Merge branch 'akpm' (patches from Andrew)
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=164e42b7800000
> kernel config: https://syzkaller.appspot.com/x/.config?x=982e2df1b9e60b02
> dashboard link: https://syzkaller.appspot.com/bug?extid=bf78a74f82c1cf19069e
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+bf78a7...@syzkaller.appspotmail.com

Another hang on rtnl lock:

#syz dup: INFO: task hung in netdev_run_todo

May be related to "unregister_netdevice: waiting for DEV to become free":
https://syzkaller.appspot.com/bug?id=1a97a5bd119fd97995f752819fd87840ab9479a9

Any other explanations for massive hangs on rtnl lock for minutes?

Kirill Tkhai

Jun 5, 2018, 5:03:33 AM
to Dmitry Vyukov, syzbot, Christian Brauner, David Miller, David Ahern, Florian Westphal, Jiri Benc, LKML, Xin Long, msch...@universe-factory.net, netdev, syzkaller-bugs, Vladislav Yasevich
Hi, Dmitry!

On 04.06.2018 18:22, Dmitry Vyukov wrote:
> On Mon, Jun 4, 2018 at 5:03 PM, syzbot
> <syzbot+bf78a7...@syzkaller.appspotmail.com> wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit: bc2dbc5420e8 Merge branch 'akpm' (patches from Andrew)
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=164e42b7800000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=982e2df1b9e60b02
>> dashboard link: https://syzkaller.appspot.com/bug?extid=bf78a74f82c1cf19069e
>> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+bf78a7...@syzkaller.appspotmail.com
>
> Another hang on rtnl lock:
>
> #syz dup: INFO: task hung in netdev_run_todo
>
> May be related to "unregister_netdevice: waiting for DEV to become free":
> https://syzkaller.appspot.com/bug?id=1a97a5bd119fd97995f752819fd87840ab9479a9
>
> Any other explanations for massive hangs on rtnl lock for minutes?

To exclude the situation where a task exits with rtnl_mutex still held:

would the pr_warn() output from print_held_locks_bug() be included in the console
output if it appeared?

Kirill

Dmitry Vyukov

Jun 5, 2018, 5:36:52 AM
to Kirill Tkhai, syzbot, Christian Brauner, David Miller, David Ahern, Florian Westphal, Jiri Benc, LKML, Xin Long, msch...@universe-factory.net, netdev, syzkaller-bugs, Vladislav Yasevich
On Tue, Jun 5, 2018 at 11:03 AM, Kirill Tkhai <ktk...@virtuozzo.com> wrote:
> Hi, Dmirty!
>
> On 04.06.2018 18:22, Dmitry Vyukov wrote:
>> On Mon, Jun 4, 2018 at 5:03 PM, syzbot
>> <syzbot+bf78a7...@syzkaller.appspotmail.com> wrote:
>>> Hello,
>>>
>>> syzbot found the following crash on:
>>>
>>> HEAD commit: bc2dbc5420e8 Merge branch 'akpm' (patches from Andrew)
>>> git tree: upstream
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=164e42b7800000
>>> kernel config: https://syzkaller.appspot.com/x/.config?x=982e2df1b9e60b02
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=bf78a74f82c1cf19069e
>>> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>>>
>>> Unfortunately, I don't have any reproducer for this crash yet.
>>>
>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>> Reported-by: syzbot+bf78a7...@syzkaller.appspotmail.com
>>
>> Another hang on rtnl lock:
>>
>> #syz dup: INFO: task hung in netdev_run_todo
>>
>> May be related to "unregister_netdevice: waiting for DEV to become free":
>> https://syzkaller.appspot.com/bug?id=1a97a5bd119fd97995f752819fd87840ab9479a9

netdev_wait_allrefs() does not hold the rtnl lock while it waits, so it must
be something different.
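
For reference, that waiting happens roughly like this (a heavily simplified,
illustrative sketch of the netdev_run_todo()/netdev_wait_allrefs() pattern,
not the actual net/core/dev.c code):

/* Illustrative sketch only; the real code lives in net/core/dev.c.
 * The point is that the rtnl lock is released before the refcount wait
 * loop, so a device stuck waiting here does not keep rtnl_lock() callers
 * (such as ip6gre_exit_batch_net()) blocked.
 */
void netdev_run_todo(void)
{
	LIST_HEAD(list);

	/* ... splice net_todo_list onto the local list ... */

	__rtnl_unlock();		/* rtnl is dropped before the slow wait */

	while (!list_empty(&list)) {
		struct net_device *dev;

		dev = list_first_entry(&list, struct net_device, todo_list);
		list_del(&dev->todo_list);

		/* msleep()s without rtnl held, only briefly re-taking it
		 * to rebroadcast NETDEV_UNREGISTER */
		netdev_wait_allrefs(dev);

		/* ... final per-device cleanup ... */
	}
}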


>> Any other explanations for massive hangs on rtnl lock for minutes?
>
> To exclude the situation, when a task exists with rtnl_mutex held:
>
> would the pr_warn() from print_held_locks_bug() be included in the console output
> if they appear?

Yes, everything containing "WARNING:" is detected as a bug.

Kirill Tkhai

Jun 5, 2018, 9:55:35 AM
to Dmitry Vyukov, syzbot, Christian Brauner, David Miller, David Ahern, Florian Westphal, Jiri Benc, LKML, Xin Long, msch...@universe-factory.net, netdev, syzkaller-bugs, Vladislav Yasevich
OK, then a dead task not releasing the lock is excluded.

One more assumption: someone corrupted the memory around rtnl_mutex so that it looks locked.
(I read the lockdep "(rtnl_mutex){+.+.}" prints in the initial message as "nobody owns rtnl_mutex".)
A crash dump of the VM would help here.

Also, there may be a bug in the locking code, but that seems the least probable to me.

Kirill

Dmitry Vyukov

Jun 7, 2018, 2:23:35 PM
to Kirill Tkhai, syzbot, Christian Brauner, David Miller, David Ahern, Florian Westphal, Jiri Benc, LKML, Xin Long, msch...@universe-factory.net, netdev, syzkaller-bugs, Vladislav Yasevich
I can't find any legend for these +'s and .'s, but {+.+.} shows up
in large numbers in just about any task-hung report for different mutexes,
so I would not expect it to mean corruption.

There are dozens of known corruptions that syzkaller can trigger, but
they are usually caught reliably by KASAN. If any of them led
to silent memory corruption, we would get dozens of assorted crashes
throughout the kernel. We've seen that at some points, but not
recently. So I would assume that memory is not corrupted in all these
cases:
https://syzkaller.appspot.com/bug?id=2503c576cabb08d41812e732b390141f01a59545

I wonder if it is just that slow, rather than actually hung... net
namespace destruction is super slow, so perhaps under heavy load it
all stalls for minutes...

Kirill Tkhai

Jun 7, 2018, 2:55:05 PM
to Dmitry Vyukov, syzbot, Christian Brauner, David Miller, David Ahern, Florian Westphal, Jiri Benc, LKML, Xin Long, msch...@universe-factory.net, netdev, syzkaller-bugs, Vladislav Yasevich
This BUG clarifies the {+.+.}:

4 locks held by kworker/0:145/381:
#0: ((wq_completion)"hwsim_wq"){+.+.}, at: [<000000003f9487f0>] work_static include/linux/workqueue.h:198 [inline]
#0: ((wq_completion)"hwsim_wq"){+.+.}, at: [<000000003f9487f0>] set_work_data kernel/workqueue.c:619 [inline]
#0: ((wq_completion)"hwsim_wq"){+.+.}, at: [<000000003f9487f0>] set_work_pool_and_clear_pending kernel/workqueue.c:646 [inline]
#0: ((wq_completion)"hwsim_wq"){+.+.}, at: [<000000003f9487f0>] process_one_work+0xb12/0x1bb0 kernel/workqueue.c:2084
#1: ((work_completion)(&data->destroy_work)){+.+.}, at: [<00000000bbdd2115>] process_one_work+0xb89/0x1bb0 kernel/workqueue.c:2088
#2: (rtnl_mutex){+.+.}, at: [<000000009c9d14f8>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
#3: (rcu_sched_state.exp_mutex){+.+.}, at: [<000000001ba1a807>] exp_funnel_lock kernel/rcu/tree_exp.h:272 [inline]
#3: (rcu_sched_state.exp_mutex){+.+.}, at: [<000000001ba1a807>] _synchronize_rcu_expedited.constprop.72+0x9fa/0xac0 kernel/rcu/tree_exp.h:596

There we have rtnl_mutex locked, and the {+.+.} annotation is the same as above. It's
definitely locked, since there is one more lock taken after it.

That BUG happens because there are many rtnl_mutex waiters while the owner
is synchronizing RCU. It is rather clear to me, in contrast to this topic's hang.
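
(For illustration, the pattern producing that state is roughly the sketch
below. The handler and its name are made up; this is not the actual
mac80211_hwsim code, just the shape of "hold rtnl across an expedited RCU
grace period".)

/* Hypothetical sketch, not real driver code: a work handler that holds
 * rtnl_mutex across synchronize_rcu_expedited() (in practice reached via
 * device-unregister paths, which wait for an RCU grace period under rtnl).
 * While it sleeps in the grace period, every other rtnl_lock() caller
 * queues up behind it, which is exactly what the lock dump above shows.
 */
static void destroy_work_handler(struct work_struct *work)
{
	rtnl_lock();

	/* stands in for unregister_netdevice_many() and friends */
	synchronize_rcu_expedited();

	rtnl_unlock();
}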

> I wonder if it can be just that slow, but not actually hanged... net
> namespace destruction is super slow, so perhaps under heavy load it
> all stalls for minutes...

Thanks,
Kirill

Dmitry Vyukov

Jun 7, 2018, 3:03:56 PM
to Kirill Tkhai, syzbot, Christian Brauner, David Miller, David Ahern, Florian Westphal, Jiri Benc, LKML, Xin Long, msch...@universe-factory.net, netdev, syzkaller-bugs, Vladislav Yasevich
You mean that it's not hung, but rather needs more than 2 minutes to
complete, right?

Kirill Tkhai

Jun 7, 2018, 4:00:03 PM
to Dmitry Vyukov, syzbot, Christian Brauner, David Miller, David Ahern, Florian Westphal, Jiri Benc, LKML, Xin Long, msch...@universe-factory.net, netdev, syzkaller-bugs, Vladislav Yasevich
Yeah, I think this is possible; I've seen situations like that.
Say synchronize_rcu_expedited() takes X seconds to execute. Then it
takes just 120/X calls of "this function under rtnl_mutex" to produce
a hung-task report for someone else who wants the mutex too.

Also, even though CFS is a fair scheduler, when the calls are made
from a workqueue, every work item sleeps, so every work item gets
executed in a separate worker task. Every worker then has its own
time slice. This increases the probability that these tasks will
take CPU time before the task in the header of the hang.
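
As a back-of-the-envelope check, here is a user-space toy model of that
arithmetic (hypothetical numbers: a 5-second critical section and 24
back-to-back holders already add up to the 120-second hung-task threshold;
pthread mutexes are not strictly FIFO, so treat the measured wait as "up to"
NHOLDERS * HOLD_SECS, whereas the kernel mutex wait list makes the worst case
much easier to hit):

#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Toy model: NHOLDERS workers each take the mutex and sleep HOLD_SECS
 * inside it (standing in for a slow synchronize_rcu_expedited() under
 * rtnl_mutex), while one "victim" measures how long its rtnl_lock()
 * equivalent stays blocked.  24 * 5s = 120s, with no deadlock anywhere.
 */
#define HOLD_SECS 5
#define NHOLDERS  24

static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

static void *holder(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&fake_rtnl);
	sleep(HOLD_SECS);		/* "synchronize_rcu_expedited()" */
	pthread_mutex_unlock(&fake_rtnl);
	return NULL;
}

int main(void)
{
	pthread_t h[NHOLDERS];
	time_t start;
	int i;

	for (i = 0; i < NHOLDERS; i++)
		pthread_create(&h[i], NULL, holder, NULL);

	sleep(1);			/* let the holders queue up first */
	start = time(NULL);
	pthread_mutex_lock(&fake_rtnl);	/* the "hung" rtnl_lock() caller */
	printf("waited %ld seconds for the mutex\n", (long)(time(NULL) - start));
	pthread_mutex_unlock(&fake_rtnl);

	for (i = 0; i < NHOLDERS; i++)
		pthread_join(h[i], NULL);
	return 0;
}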

Kirill

Dmitry Vyukov

Jun 8, 2018, 4:18:40 AM
to Kirill Tkhai, syzbot, Christian Brauner, David Miller, David Ahern, Florian Westphal, Jiri Benc, LKML, Xin Long, msch...@universe-factory.net, netdev, syzkaller-bugs, Vladislav Yasevich
OK, let's stick with this theory for now. Looking at the crash frequencies here:
https://syzkaller.appspot.com/bug?id=2503c576cabb08d41812e732b390141f01a59545
I can actually believe that these are just flakes due to overly slow execution.

I've noted that we need to either reduce the load and/or increase the timeouts:
https://github.com/google/syzkaller/issues/516#issuecomment-395685629

Let's keep duping new reports like this onto "INFO: task hung in
netdev_run_todo" so that they are all collected in a single location.

Thanks for the help

Kirill Tkhai

Jun 8, 2018, 4:31:45 AM
to Dmitry Vyukov, syzbot, Christian Brauner, David Miller, David Ahern, Florian Westphal, Jiri Benc, LKML, Xin Long, msch...@universe-factory.net, netdev, syzkaller-bugs, Vladislav Yasevich
Hm, I'm not sure we should hide such situations from syzbot,
because load like this may occur in real life under a real workload.
These statistics may help us understand whether rtnl_mutex already
needs a redesign. Also, these hangs may happen in a place that can
be rewritten without rtnl_mutex, so they focus attention on it.

> Let's keep duping new such reports onto "INFO: task hung in
> netdev_run_todo" so that they are all collected at a single location.
>
> Thanks for help

Thanks,
Kirill

Dmitry Vyukov

Jun 8, 2018, 4:38:28 AM
to Kirill Tkhai, syzbot, Christian Brauner, David Miller, David Ahern, Florian Westphal, Jiri Benc, LKML, Xin Long, msch...@universe-factory.net, netdev, syzkaller-bugs, Vladislav Yasevich
If somebody wants to act on these reports:
https://syzkaller.appspot.com/bug?id=2503c576cabb08d41812e732b390141f01a59545
that's even better. The point is that testing must not have false
positives, one way or another. If we do nothing, then syzbot will
slowly discover all 250 usages of rtnl_lock and produce a unique bug
for each of them. Each and every one of these bug reports will need
to be handled by somebody.

Does somebody want to act on these and improve rtnl performance in the
foreseeable future?

Kirill Tkhai

Jun 8, 2018, 4:43:49 AM
to Dmitry Vyukov, syzbot, Christian Brauner, David Miller, David Ahern, Florian Westphal, Jiri Benc, LKML, Xin Long, msch...@universe-factory.net, netdev, syzkaller-bugs, Vladislav Yasevich
I looked into this question a little, and it looks like just the
preparations for improving the performance will take a very long time,
and the performance won't change until the preparations are finished.
So this looks like "not in the foreseeable future".
Kirill