net: suspicious RCU usage in nf_hook

38 views
Skip to first unread message

Dmitry Vyukov

unread,
Jan 27, 2017, 4:16:18 PM1/27/17
to David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilt...@vger.kernel.org, syzkaller
Hello,

I've got the following report while running syzkaller fuzzer on
fd694aaa46c7ed811b72eb47d5eb11ce7ab3f7f1:

[ INFO: suspicious RCU usage. ]
4.10.0-rc5+ #192 Not tainted
-------------------------------
./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side
critical section!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 0
2 locks held by syz-executor14/23111:
#0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] lock_sock
include/net/sock.h:1454 [inline]
#0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>]
rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919
#1: (rcu_read_lock){......}, at: [<ffffffff83ae2678>] nf_hook
include/linux/netfilter.h:201 [inline]
#1: (rcu_read_lock){......}, at: [<ffffffff83ae2678>]
__ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160

stack backtrace:
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
___might_sleep+0x560/0x650 kernel/sched/core.c:7748
__might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
__static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
__sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
sk_destruct+0x47/0x80 net/core/sock.c:1460
__sk_free+0x57/0x230 net/core/sock.c:1468
sock_wfree+0xae/0x120 net/core/sock.c:1645
skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
skb_release_all+0x15/0x60 net/core/skbuff.c:668
__kfree_skb+0x15/0x20 net/core/skbuff.c:684
kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
inet_frag_put include/net/inet_frag.h:133 [inline]
nf_ct_frag6_gather+0x1106/0x3840 net/ipv6/netfilter/nf_conntrack_reasm.c:617
ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
nf_hook include/linux/netfilter.h:212 [inline]
__ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
sock_write_iter+0x326/0x600 net/socket.c:848
do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
vfs_writev+0x87/0xc0 fs/read_write.c:911
do_writev+0x110/0x2c0 fs/read_write.c:944
SYSC_writev fs/read_write.c:1017 [inline]
SyS_writev+0x27/0x30 fs/read_write.c:1014
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559
RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559
RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005
RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000
R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:752
in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14
INFO: lockdep is turned off.
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
___might_sleep+0x47e/0x650 kernel/sched/core.c:7780
__might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
__static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
__sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
sk_destruct+0x47/0x80 net/core/sock.c:1460
__sk_free+0x57/0x230 net/core/sock.c:1468
sock_wfree+0xae/0x120 net/core/sock.c:1645
skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
skb_release_all+0x15/0x60 net/core/skbuff.c:668
__kfree_skb+0x15/0x20 net/core/skbuff.c:684
kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
inet_frag_put include/net/inet_frag.h:133 [inline]
nf_ct_frag6_gather+0x1106/0x3840 net/ipv6/netfilter/nf_conntrack_reasm.c:617
ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
nf_hook include/linux/netfilter.h:212 [inline]
__ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
sock_write_iter+0x326/0x600 net/socket.c:848
do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
vfs_writev+0x87/0xc0 fs/read_write.c:911
do_writev+0x110/0x2c0 fs/read_write.c:944
SYSC_writev fs/read_write.c:1017 [inline]
SyS_writev+0x27/0x30 fs/read_write.c:1014
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559
RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559
RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005
RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000
R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400
BUG: scheduling while atomic: syz-executor14/23111/0x00000002
INFO: lockdep is turned off.
Modules linked in:
Kernel panic - not syncing: scheduling while atomic

Cong Wang

unread,
Jan 27, 2017, 6:23:05 PM1/27/17
to Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilt...@vger.kernel.org, syzkaller
On Fri, Jan 27, 2017 at 1:15 PM, Dmitry Vyukov <dvy...@google.com> wrote:
> stack backtrace:
> CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:15 [inline]
> dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
> lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
> rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
> ___might_sleep+0x560/0x650 kernel/sched/core.c:7748
> __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
> mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
> atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
> __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
> static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
> net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
> sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
> __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
> sk_destruct+0x47/0x80 net/core/sock.c:1460

jump label uses a mutex and we call jump label API in softIRQ context...
Maybe we have to move it to a work struct as what we did for netlink.

Cong Wang

unread,
Jan 27, 2017, 6:31:12 PM1/27/17
to Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilt...@vger.kernel.org, syzkaller
Correct myself: process context but with RCU read lock held...

Eric Dumazet

unread,
Jan 27, 2017, 6:35:27 PM1/27/17
to Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilt...@vger.kernel.org, syzkaller
Oh well, I forgot to submit the official patch I think, Jan 9th.

https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ





Cong Wang

unread,
Jan 27, 2017, 8:00:46 PM1/27/17
to Eric Dumazet, Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilt...@vger.kernel.org, syzkaller
On Fri, Jan 27, 2017 at 3:35 PM, Eric Dumazet <eric.d...@gmail.com> wrote:
> Oh well, I forgot to submit the official patch I think, Jan 9th.
>
> https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
>

Hmm, but why only fragments need skb_orphan()? It seems like
any kfree_skb() inside a nf hook needs to have a preceding
skb_orphan().

Also, I am not convinced it is similar to commit 8282f27449bf15548
which is on RX path.

Eric Dumazet

unread,
Jan 27, 2017, 8:31:33 PM1/27/17
to Cong Wang, Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilt...@vger.kernel.org, syzkaller
Well, we clearly see IPv6 reassembly being part of the equation in both
cases.

I was replying to first part of the splat [1], which was already
diagnosed and had a non official patch.

use after free is also a bug, regardless of jump label being used or
not.

I still do not really understand this nf_hook issue, I thought we were
disabling BH in netfilter.

So the in_interrupt() check in net_disable_timestamp() should trigger,
this was the intent of netstamp_needed_deferred existence.

Not sure if we can test for rcu_read_lock() as well.

[1]

Cong Wang

unread,
Jan 31, 2017, 1:20:13 AM1/31/17
to Eric Dumazet, Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilt...@vger.kernel.org, syzkaller
On Fri, Jan 27, 2017 at 5:31 PM, Eric Dumazet <eric.d...@gmail.com> wrote:
> On Fri, 2017-01-27 at 17:00 -0800, Cong Wang wrote:
>> On Fri, Jan 27, 2017 at 3:35 PM, Eric Dumazet <eric.d...@gmail.com> wrote:
>> > Oh well, I forgot to submit the official patch I think, Jan 9th.
>> >
>> > https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
>> >
>>
>> Hmm, but why only fragments need skb_orphan()? It seems like
>> any kfree_skb() inside a nf hook needs to have a preceding
>> skb_orphan().
>
>
>>
>> Also, I am not convinced it is similar to commit 8282f27449bf15548
>> which is on RX path.
>
> Well, we clearly see IPv6 reassembly being part of the equation in both
> cases.

Yeah, of course. My worry is that this problem is more than just
IPv6 reassembly.

>
> I was replying to first part of the splat [1], which was already
> diagnosed and had a non official patch.
>
> use after free is also a bug, regardless of jump label being used or
> not.
>
> I still do not really understand this nf_hook issue, I thought we were
> disabling BH in netfilter.

It is a different warning from use-after-free, this one is about sleep
in atomic context, mutex lock is acquired with RCU read lock held.


>
> So the in_interrupt() check in net_disable_timestamp() should trigger,
> this was the intent of netstamp_needed_deferred existence.
>
> Not sure if we can test for rcu_read_lock() as well.
>

The context is process context (TX path before hitting qdisc), and
BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this
makes me thinking maybe we really need to disable BH in this
case for nf_hook()? But it is called in RX path too, and BH is
already disabled there.

Eric Dumazet

unread,
Jan 31, 2017, 10:45:00 AM1/31/17
to Cong Wang, Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilt...@vger.kernel.org, syzkaller
On Mon, 2017-01-30 at 22:19 -0800, Cong Wang wrote:

>
> The context is process context (TX path before hitting qdisc), and
> BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this
> makes me thinking maybe we really need to disable BH in this
> case for nf_hook()? But it is called in RX path too, and BH is
> already disabled there.

ipt_do_table() and similar netfilter entry points disable BH.

Maybe it is done too late.



Cong Wang

unread,
Feb 1, 2017, 3:52:07 PM2/1/17
to Eric Dumazet, Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilt...@vger.kernel.org, syzkaller
I think we need a fix like the following one for minimum impact.

diff --git a/net/core/dev.c b/net/core/dev.c
index 727b6fd..eee7d63 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1720,12 +1720,10 @@ EXPORT_SYMBOL(net_enable_timestamp);
void net_disable_timestamp(void)
{
#ifdef HAVE_JUMP_LABEL
- if (in_interrupt()) {
- atomic_inc(&netstamp_needed_deferred);
- return;
- }
-#endif
+ atomic_inc(&netstamp_needed_deferred);
+#else
static_key_slow_dec(&netstamp_needed);
+#endif
}
EXPORT_SYMBOL(net_disable_timestamp);

Eric Dumazet

unread,
Feb 1, 2017, 4:16:13 PM2/1/17
to Cong Wang, Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilt...@vger.kernel.org, syzkaller
This would permanently leave the kernel in the netstamp_needed state.

I would prefer the patch using a process context to perform the
cleanup ? Note there is a race window, but probably not a big deal.

net/core/dev.c | 30 ++++++++++--------------------
1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 7f218e095361520d11c243d650e053321ea7274f..1cae681b6cfd1cf2c9bee7072eb8af9cf79cced8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1695,37 +1695,27 @@ EXPORT_SYMBOL_GPL(net_dec_egress_queue);

static struct static_key netstamp_needed __read_mostly;
#ifdef HAVE_JUMP_LABEL
-/* We are not allowed to call static_key_slow_dec() from irq context
- * If net_disable_timestamp() is called from irq context, defer the
- * static_key_slow_dec() calls.
- */
static atomic_t netstamp_needed_deferred;
-#endif
-
-void net_enable_timestamp(void)
+static void netstamp_clear(struct work_struct *work)
{
-#ifdef HAVE_JUMP_LABEL
int deferred = atomic_xchg(&netstamp_needed_deferred, 0);

- if (deferred) {
- while (--deferred)
- static_key_slow_dec(&netstamp_needed);
- return;
- }
+ while (deferred--)
+ static_key_slow_dec(&netstamp_needed);
+}
+static DECLARE_WORK(netstamp_work, netstamp_clear);
#endif
+
+void net_enable_timestamp(void)
+{
static_key_slow_inc(&netstamp_needed);
}
EXPORT_SYMBOL(net_enable_timestamp);

void net_disable_timestamp(void)
{
-#ifdef HAVE_JUMP_LABEL
- if (in_interrupt()) {
- atomic_inc(&netstamp_needed_deferred);
- return;
- }
-#endif
- static_key_slow_dec(&netstamp_needed);
+ atomic_inc(&netstamp_needed_deferred);
+ schedule_work(&netstamp_work);
}
EXPORT_SYMBOL(net_disable_timestamp);



Eric Dumazet

unread,
Feb 1, 2017, 4:22:45 PM2/1/17
to Cong Wang, Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilt...@vger.kernel.org, syzkaller
On Wed, 2017-02-01 at 13:16 -0800, Eric Dumazet wrote:

> This would permanently leave the kernel in the netstamp_needed state.
>
> I would prefer the patch using a process context to perform the
> cleanup ? Note there is a race window, but probably not a big deal.
>
> net/core/dev.c | 30 ++++++++++--------------------
> 1 file changed, 10 insertions(+), 20 deletions(-)

Patch is not complete (for the HAVE_JUMP_LABEL=n case)

Would you like to author/submit it ?

Thanks.


Cong Wang

unread,
Feb 1, 2017, 6:30:15 PM2/1/17
to Eric Dumazet, Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilt...@vger.kernel.org, syzkaller
Not sure if it is better. The difference is caught up in net_enable_timestamp(),
which is called setsockopt() path and sk_clone() path, so we could be
in netstamp_needed state for a long time too until user-space exercises
these paths.

I am feeling we probably need to get rid of netstamp_needed_deferred,
and simply defer the whole static_key_slow_dec(), like the attached patch
(compile only).

What do you think?
net-timestamp.diff

Eric Dumazet

unread,
Feb 1, 2017, 6:48:35 PM2/1/17
to Cong Wang, Eric Dumazet, Dmitry Vyukov, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilt...@vger.kernel.org, syzkaller
On Wed, Feb 1, 2017 at 3:29 PM, Cong Wang <xiyou.w...@gmail.com> wrote:

> Not sure if it is better. The difference is caught up in net_enable_timestamp(),
> which is called setsockopt() path and sk_clone() path, so we could be
> in netstamp_needed state for a long time too until user-space exercises
> these paths.
>
> I am feeling we probably need to get rid of netstamp_needed_deferred,
> and simply defer the whole static_key_slow_dec(), like the attached patch
> (compile only).
>
> What do you think?

I think we need to keep the atomic.

If two cpus call net_disable_timestamp() roughly at the same time, the
work will be scheduled once.

Eric Dumazet

unread,
Feb 1, 2017, 6:59:46 PM2/1/17
to Eric Dumazet, Cong Wang, Dmitry Vyukov, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilt...@vger.kernel.org, syzkaller
Updated patch (but not tested yet)

diff --git a/net/core/dev.c b/net/core/dev.c
index 7f218e095361520d11c243d650e053321ea7274f..29101c98399f40b6b8e42c31a255d8f1fb6bd7a1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1695,24 +1695,19 @@ EXPORT_SYMBOL_GPL(net_dec_egress_queue);
@@ -1720,12 +1715,12 @@ EXPORT_SYMBOL(net_enable_timestamp);
void net_disable_timestamp(void)
{
#ifdef HAVE_JUMP_LABEL
- if (in_interrupt()) {
- atomic_inc(&netstamp_needed_deferred);
- return;
- }
-#endif
+ /* net_disable_timestamp() can be called from non process context */
+ atomic_inc(&netstamp_needed_deferred);
+ schedule_work(&netstamp_work);
+#else
static_key_slow_dec(&netstamp_needed);
+#endif
}
EXPORT_SYMBOL(net_disable_timestamp);



Cong Wang

unread,
Feb 2, 2017, 1:01:23 PM2/2/17
to Eric Dumazet, Eric Dumazet, Dmitry Vyukov, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilt...@vger.kernel.org, syzkaller
On Wed, Feb 1, 2017 at 3:59 PM, Eric Dumazet <eric.d...@gmail.com> wrote:
> On Wed, 2017-02-01 at 15:48 -0800, Eric Dumazet wrote:
>> On Wed, Feb 1, 2017 at 3:29 PM, Cong Wang <xiyou.w...@gmail.com> wrote:
>>
>> > Not sure if it is better. The difference is caught up in net_enable_timestamp(),
>> > which is called setsockopt() path and sk_clone() path, so we could be
>> > in netstamp_needed state for a long time too until user-space exercises
>> > these paths.
>> >
>> > I am feeling we probably need to get rid of netstamp_needed_deferred,
>> > and simply defer the whole static_key_slow_dec(), like the attached patch
>> > (compile only).
>> >
>> > What do you think?
>>
>> I think we need to keep the atomic.
>>
>> If two cpus call net_disable_timestamp() roughly at the same time, the
>> work will be scheduled once.

Good point! Yeah, the same work will not be schedule twice.

>
> Updated patch (but not tested yet)

I can't think out a better way to fix this. I expect jump_label to provide
an API for this, but it doesn't, static_key_slow_dec_deferred()
is just for batching. Probably we should introduce one to avoid these
ugly #ifdef HAVE_JUMP_LABEL here, but that is a -next material.

So, please feel free to send it formally.

Thanks.
Reply all
Reply to author
Forward
0 new messages