
Deleting child qdisc doesn't reset parent to default qdisc?

Jiri Kosina

14 Apr 2016, 10:50:08 a.m.
Hi,

I've come across the behavior where adding a child qdisc and then deleting
it again makes the networking dysfunctional (I guess that's because all of
a sudden there is absolutely no working qdisc on the device, although
there originally was a default one in the parent).

In a nutshell, is this expected behavior or a bug?

=====
jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
jikos:~ # ping -c 1 nix.cz | head -2
PING nix.cz (195.47.235.3) 56(84) bytes of data.
64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.59 ms

jikos:~ # tc qdisc add dev eth0 parent 10:1 sfq
jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
qdisc sfq 8008: dev eth0 parent 10:1 limit 127p quantum 1514b depth 127 divisor 1024

jikos:~ # ping -c 1 nix.cz | head -2
PING nix.cz (195.47.235.3) 56(84) bytes of data.
64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.67 ms

jikos:~ # tc qdisc del dev eth0 parent 10:1 sfq
jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
jikos:~ # ping -c 1 nix.cz | head -2
PING nix.cz (195.47.235.3) 56(84) bytes of data.
[ ... nothing happens ... ]
^C
jikos:~ # tc qdisc add dev eth0 parent 10:1 sfq
jikos:~ # ping -c 1 nix.cz | head -2
PING nix.cz (195.47.235.3) 56(84) bytes of data.
64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.66 ms
=====

Thanks,

--
Jiri Kosina

Jiri Kosina

14 Apr 2016, 11:00:12 a.m.
On Thu, 14 Apr 2016, Jiri Kosina wrote:

> In a nutshell, is this expected behavior or a bug?

Just to clarify: what suggests to me that this is a bug that needs to be
fixed (although apparently one that has been there for quite a long time)
can be demonstrated by this:

>
> =====
> jikos:~ # tc qdisc show
> qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms

The above configuration works.

> jikos:~ # ping -c 1 nix.cz | head -2
> PING nix.cz (195.47.235.3) 56(84) bytes of data.
> 64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.59 ms
>
> jikos:~ # tc qdisc add dev eth0 parent 10:1 sfq
> jikos:~ # tc qdisc show
> qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
> qdisc sfq 8008: dev eth0 parent 10:1 limit 127p quantum 1514b depth 127 divisor 1024
>
> jikos:~ # ping -c 1 nix.cz | head -2
> PING nix.cz (195.47.235.3) 56(84) bytes of data.
> 64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.67 ms
>
> jikos:~ # tc qdisc del dev eth0 parent 10:1 sfq
> jikos:~ # tc qdisc show
> qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms

The above configuration doesn't, although it's identical to the working one
at the beginning.

> jikos:~ # ping -c 1 nix.cz | head -2
> PING nix.cz (195.47.235.3) 56(84) bytes of data.
> [ ... nothing happens ... ]
> ^C

--
Jiri Kosina
SUSE Labs

Eric Dumazet

14 Apr 2016, 11:10:11 a.m.
On Thu, 2016-04-14 at 16:44 +0200, Jiri Kosina wrote:
> Hi,
>
> I've come across the behavior where adding a child qdisc and then deleting
> it again makes the networking dysfunctional (I guess that's because all of
> a sudden there is absolutely no working qdisc on the device, although
> there originally was a default one in the parent).
>
> In a nutshell, is this expected behavior or a bug?

This is the expected behavior.

If the kernel suddenly did a 'replace' when you ask for a delete,
then scripts doing a delete, then an add, would break.

tc users are skilled admins ;)

Phil Sutter

14 Apr 2016, 11:20:07 a.m.
On Thu, Apr 14, 2016 at 08:01:39AM -0700, Eric Dumazet wrote:
> On Thu, 2016-04-14 at 16:44 +0200, Jiri Kosina wrote:
> > Hi,
> >
> > I've come across the behavior where adding a child qdisc and then deleting
> > it again makes the networking dysfunctional (I guess that's because all of
> > a sudden there is absolutely no working qdisc on the device, although
> > there originally was a default one in the parent).
> >
> > In a nutshell, is this expected behavior or a bug?
>
> This is the expected behavior.

OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
one upon deletion instead of noop_qdisc, hence I would describe
the situation using the words 'inconsistent' and 'accident' rather than
'expected'. :)

Anyhow, the problem with skilled admins is that they accept quirks too
easily and just build their scripts around them: the same scripts we then
have to stay compatible with.

Cheers, Phil

Jiri Kosina

14 Apr 2016, 11:40:06 a.m.
On Thu, 14 Apr 2016, Phil Sutter wrote:

> OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> one upon deletion instead of noop_qdisc, hence I would describe
> the situation using the words 'inconsistent' and 'accident' rather than
> 'expected'. :)

Exactly. I'd again like to stress the fact that this configuration works:

jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms

and this (after performing add/delete operation) doesn't:

jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms

It's hard to spot a difference (hint: there is none).

Thanks,

Eric Dumazet

14 Apr 2016, 11:50:08 a.m.
This is because some qdiscs are not visible in the dump.


qdisc_list_add() uses a single list, so adding too much stuff to it
could slow down the fast path (qdisc_lookup(), called from
qdisc_tree_reduce_backlog()).

Jiri Kosina

14 Apr 2016, 12:10:07 p.m.
On Thu, 14 Apr 2016, Phil Sutter wrote:

> > > I've come across the behavior where adding a child qdisc and then deleting
> > > it again makes the networking dysfunctional (I guess that's because all of
> > > a sudden there is absolutely no working qdisc on the device, although
> > > there originally was a default one in the parent).
> > >
> > > In a nutshell, is this expected behavior or a bug?
> >
> > This is the expected behavior.
>
> OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> one upon deletion instead of noop_qdisc, hence I would describe
> the situation using the words 'inconsistent' and 'accident' rather than
> 'expected'. :)

Would a patch that unifies this, in the sense that all qdiscs would assign
the default one upon deletion, be acceptable?

Phil Sutter

14 Apr 2016, 12:30:07 p.m.
On Thu, Apr 14, 2016 at 08:44:40AM -0700, Eric Dumazet wrote:
> On Thu, 2016-04-14 at 17:34 +0200, Jiri Kosina wrote:
> > On Thu, 14 Apr 2016, Phil Sutter wrote:
> >
> > > OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> > > one upon deletion instead of noop_qdisc, hence I would describe
> > > the situation using the words 'inconsistent' and 'accident' rather than
> > > 'expected'. :)
> >
> > Exactly. I'd again like to stress the fact that this configuration works:
> >
> > jikos:~ # tc qdisc show
> > qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
> >
> > and this (after performing add/delete operation) doesn't:
> >
> > jikos:~ # tc qdisc show
> > qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms
> >
> > It's hard to spot a difference (hint: there is none).
>
> This is because some qdiscs are not visible in the dump.

And those being invisible can be overridden using 'tc qd add', right?
AFAIR they're not listed because they don't properly register, so the
system doesn't care to override them. In this case we could change all
classful qdiscs to restore the default qdisc if a leaf qdisc is being
deleted instead of noop (which is probably not what the user wants
anyway).

Cheers, Phil

Eric Dumazet

14 Apr 2016, 12:51:25 p.m.
On Thu, 2016-04-14 at 18:22 +0200, Phil Sutter wrote:

> And those being invisible can be overridden using 'tc qd add', right?
> AFAIR they're not listed because they don't properly register, so the
> system doesn't care to override them. In this case we could change all
> classful qdiscs to restore the default qdisc if a leaf qdisc is being
> deleted instead of noop (which is probably not what the user wants
> anyway).

Even if they properly register, they are not visible.

Take a look at
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=95dc19299f741c986227ec33e23cbf9b3321f812

for some context.

When a default pfifo is created on say a HTB class, you do not see it by
default in a dump.

If you have 100 HTB classes, HTB created 100 pfifo just fine, and it
works, unless an admin tries to delete them maybe ;)

Eric Dumazet

14 Apr 2016, 1:50:08 p.m.
And what would be the chosen behavior ?

Relying on TBF installing a bfifo for you at delete would be hazardous.

For example CBQ got it differently than HFSC

If qdisc_create_dflt() fails in CBQ, we fail the 'delete', while HFSC
falls back to noop_qdisc, without warning the user :(

At least always using noop_qdisc is consistent. No magic there.

Doing 'unification' right now would break existing scripts.

This is too late, I am afraid.

Jamal Hadi Salim

15 Apr 2016, 8:50:06 a.m.
On 16-04-14 01:49 PM, Eric Dumazet wrote:

> And what would be the chosen behavior ?
>

TBF is probably a bad example because it started life as
a classless qdisc. There was only one built-in fifo queue
that was shaped. Then someone made it classful and changed
this behavior. To me it sounds reasonable to have the
default behavior restored. At a minimum, for consistency.

> Relying on TBF installing a bfifo for you at delete would be hazardous.
>
> For example CBQ got it differently than HFSC
>
> If qdisc_create_dflt() fails in CBQ, we fail the 'delete', while HFSC
> falls back to noop_qdisc, without warning the user :(
>
> At least always using noop_qdisc is consistent. No magic there.
>
> Doing 'unification' right now would break existing scripts.
>
> This is too late, I am afraid.


Sigh. So rant:
IMO, we should let any new APIs and API updates stay longer
in discussion. Or better, mark them as unstable for some time.
The excuse that "it is out in the wild, therefore it can't be changed"
is harmful because the timeline is "forever", whereas
patches are applied after a short period of posting
and discussion, sometimes without involving the right people.
It is like having a jury issue a death sentence after 1 week of
deliberation. You can't take it back after execution.

cheers,
jamal

Eric Dumazet

15 Apr 2016, 11:00:06 a.m.
On Fri, 2016-04-15 at 08:42 -0400, Jamal Hadi Salim wrote:
> On 16-04-14 01:49 PM, Eric Dumazet wrote:
>
> > And what would be the chosen behavior ?
> >
>
> TBF is probably a bad example because it started life as
> a classless qdisc. There was only one built-in fifo queue
> that was shaped. Then someone made it classful and changed
> this behavior. To me it sounds reasonable to have the
> default behavior restored. At a minimum, for consistency.


Then you need to save the initial qdisc (bfifo for TBF) in a special
place, to make sure the delete operation is guaranteed to succeed.

Or fail the delete if the bfifo can not be allocated.

I can tell that determinism is far more interesting than usability for
some users occasionally playing with tc.

Surely the silent fallback to noop_qdisc is wrong.

Anyway, we probably need to improve our ability to understand qdisc
hierarchies. Having some hidden qdiscs is the real problem here.

We need to add some hash table so that qdisc_match_from_root() does not
have to scan hundreds of qdiscs.

David Miller

15 Apr 2016, 1:20:08 p.m.
From: Eric Dumazet <eric.d...@gmail.com>
Date: Fri, 15 Apr 2016 07:58:48 -0700

> Having some hidden qdiscs is the real problem here.

+1

Jiri Kosina

28 Jun 2016, 11:20:06 a.m.
On Fri, 15 Apr 2016, Eric Dumazet wrote:

> > TBF is probably a bad example because it started life as a classless
> > qdisc. There was only one built-in fifo queue that was shaped. Then
> > someone made it classful and changed this behavior. To me it sounds
> > reasonable to have the default behavior restored. At a minimum, for
> > consistency.
>
>
> Then you need to save the initial qdisc (bfifo for TBF) in a special
> place, to make sure the delete operation is guaranteed to succeed.
>
> Or fail the delete if the bfifo can not be allocated.
>
> I can tell that determinism is far more interesting than usability for
> some users occasionally playing with tc.

BTW, I've started to actually work on fixing this, and I've noticed that
TBF behavior actually violates what's stated in pfifo_fast manpage:

==========
Whenever an interface is created, the pfifo_fast qdisc is
automatically used as a queue. If another qdisc is
attached, it preempts the default pfifo_fast, which automatically
returns to function when an existing qdisc is detached.

In this sense this qdisc is magic, and unlike other qdiscs.
==========

Jiri Kosina

28 Jun 2016, 12:50:06 p.m.
On Fri, 15 Apr 2016, Eric Dumazet wrote:

> Then you need to save the initial qdisc (bfifo for TBF) in a special
> place, to make sure the delete operation is guaranteed to succeed.
>
> Or fail the delete if the bfifo can not be allocated.
>
> I can tell that determinism is far more interesting than usability for
> some users occasionally playing with tc.
>
> Surely the silent fallback to noop_qdisc is wrong.

So before we go further and fix the fact that we actually do have hidden
qdiscs (by refactoring qdisc_match_from_root() and friends), I'd still
like to bring the patch below up for consideration.

Thanks.




From: Jiri Kosina <jko...@suse.cz>
Subject: [PATCH] sch_tbf: avoid silent fallback to noop_qdisc

TBF started its life as a classless qdisc with a single builtin FIFO queue
which was being shaped.

When it got later turned into classful qdisc, it was written in a way that
the fallback qdisc was noop_qdisc, which produces bad user experience (delete
of last manually added class doesn't reset it to initial default, but renders
networking unusable instead).

Switch the default fallback to bfifo; this also mimics how the other guys
(HTB, HFSC, CBQ, ...) are behaving.

Signed-off-by: Jiri Kosina <jko...@suse.cz>
---
 net/sched/sch_tbf.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 3161e49..b06dffe 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -508,8 +508,12 @@ static int tbf_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
 {
 	struct tbf_sched_data *q = qdisc_priv(sch);

-	if (new == NULL)
-		new = &noop_qdisc;
+	if (new == NULL) {
+		/* reset to default qdisc */
+		new = qdisc_create_dflt(sch->dev_queue, &bfifo_qdisc_ops, sch->parent);
+		if (!new)
+			return -ENOBUFS;
+	}

 	*old = qdisc_replace(sch, new, &q->qdisc);
 	return 0;

Cong Wang

28 Jun 2016, 1:30:06 p.m.
On Tue, Jun 28, 2016 at 8:19 AM, Jiri Kosina <ji...@kernel.org> wrote:
> BTW, I've started to actually work on fixing this, and I've noticed that
> TBF behavior actually violates what's stated in pfifo_fast manpage:
>
> ==========
> Whenever an interface is created, the pfifo_fast qdisc is
> automatically used as a queue. If another qdisc is
> attached, it preempts the default pfifo_fast, which automatically
> returns to function when an existing qdisc is detached.
>
> In this sense this qdisc is magic, and unlike other qdiscs.
> ==========

It is out of date; the default qdisc can now be set to any other qdisc
via /proc. Also, probably for historical reasons, we don't have
a unified default default qdisc: some use bfifo, some use pfifo.
We may break some existing scripts if we change that.

Jiri Kosina

28 Jun 2016, 1:40:06 p.m.
While I do understand that reasoning, I'd argue that unpredictable and
unexpected behavior of TBF, leaving systems with non-working networking, is
much more likely than any userspace having a hard dependency on the fact
that the default (*) qdisc for TBF is noop.

(*) where 'default upon creation' != 'default when reset'

Thanks,

Jiri Kosina

7 Jul 2016, 5:10:06 a.m.
On Fri, 15 Apr 2016, Eric Dumazet wrote:

> Anyway, we probably need to improve our ability to understand qdisc
> hierarchies. Having some hidden qdiscs is the real problem here.
>
> We need to add some hash table so that qdisc_match_from_root() does not
> have to scan hundreds of qdiscs.

So how about something like the patch below? I already have preliminary
patches on top which unhide the default qdiscs, but let's make this one
step after the other.

Thanks.




From: Jiri Kosina <jko...@suse.cz>
Subject: [PATCH] net: sched: convert qdisc linked list to hashtable

Convert the per-device linked list into a hashtable. The primary motivation
for this change is that currently, we're not tracking all the qdiscs in
hierarchy (e.g. excluding default qdiscs), as the lookup performed over the
linked list by qdisc_match_from_root() is rather expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will bring
much more determinism in user experience.

Signed-off-by: Jiri Kosina <jko...@suse.cz>
---
include/linux/netdevice.h | 2 ++
include/net/pkt_sched.h | 4 ++--
include/net/sch_generic.h | 2 +-
net/core/dev.c | 1 +
net/sched/sch_api.c | 23 +++++++++++++----------
net/sched/sch_generic.c | 6 +++---
net/sched/sch_mq.c | 2 +-
net/sched/sch_mqprio.c | 2 +-
8 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..630838e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
#include <uapi/linux/netdevice.h>
#include <uapi/linux/if_bonding.h>
#include <uapi/linux/pkt_cls.h>
+#include <linux/hashtable.h>

struct netpoll_info;
struct device;
@@ -1778,6 +1779,7 @@ struct net_device {
unsigned int num_tx_queues;
unsigned int real_num_tx_queues;
struct Qdisc *qdisc;
+ DECLARE_HASHTABLE (qdisc_hash, 16);
unsigned long tx_queue_len;
spinlock_t tx_global_lock;
int watchdog_timeo;
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fea53f4..8ba11b4 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops);
void qdisc_get_default(char *id, size_t len);
int qdisc_set_default(const char *id);

-void qdisc_list_add(struct Qdisc *q);
-void qdisc_list_del(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_del(struct Qdisc *q);
struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 62d5531..26f5cb3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -67,7 +67,7 @@ struct Qdisc {
u32 limit;
const struct Qdisc_ops *ops;
struct qdisc_size_table __rcu *stab;
- struct list_head list;
+ struct hlist_node hash;
u32 handle;
u32 parent;
int (*reshape_fail)(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff43..edc1617 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7511,6 +7511,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
INIT_LIST_HEAD(&dev->all_adj_list.lower);
INIT_LIST_HEAD(&dev->ptype_all);
INIT_LIST_HEAD(&dev->ptype_specific);
+ hash_init(dev->qdisc_hash);
dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
setup(dev);

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ddf047d..82953cb 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -29,6 +29,7 @@
#include <linux/hrtimer.h>
#include <linux/lockdep.h>
#include <linux/slab.h>
+#include <linux/hashtable.h>

#include <net/net_namespace.h>
#include <net/sock.h>
@@ -265,33 +266,33 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
root->handle == handle)
return root;

- list_for_each_entry_rcu(q, &root->list, list) {
+ hash_for_each_possible_rcu(qdisc_dev(root)->qdisc_hash, q, hash, handle) {
if (q->handle == handle)
return q;
}
return NULL;
}

-void qdisc_list_add(struct Qdisc *q)
+void qdisc_hash_add(struct Qdisc *q)
{
if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
struct Qdisc *root = qdisc_dev(q)->qdisc;

WARN_ON_ONCE(root == &noop_qdisc);
ASSERT_RTNL();
- list_add_tail_rcu(&q->list, &root->list);
+ hash_add_rcu(qdisc_dev(q)->qdisc_hash, &q->hash, q->handle);
}
}
-EXPORT_SYMBOL(qdisc_list_add);
+EXPORT_SYMBOL(qdisc_hash_add);

-void qdisc_list_del(struct Qdisc *q)
+void qdisc_hash_del(struct Qdisc *q)
{
if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
ASSERT_RTNL();
- list_del_rcu(&q->list);
+ hash_del_rcu(&q->hash);
}
}
-EXPORT_SYMBOL(qdisc_list_del);
+EXPORT_SYMBOL(qdisc_hash_del);

struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle)
{
@@ -1004,7 +1005,7 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
goto err_out4;
}

- qdisc_list_add(sch);
+ qdisc_hash_add(sch);

return sch;
}
@@ -1440,6 +1441,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
{
int ret = 0, q_idx = *q_idx_p;
struct Qdisc *q;
+ int b;

if (!root)
return 0;
@@ -1454,7 +1456,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
goto done;
q_idx++;
}
- list_for_each_entry(q, &root->list, list) {
+ hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
if (q_idx < s_q_idx) {
q_idx++;
continue;
@@ -1771,6 +1773,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
int *t_p, int s_t)
{
struct Qdisc *q;
+ int b;

if (!root)
return 0;
@@ -1778,7 +1781,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
return -1;

- list_for_each_entry(q, &root->list, list) {
+ hash_for_each_rcu(qdisc_dev(root)->qdisc_hash, b, q, hash) {
if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
return -1;
}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f9e0e9c..7efc923 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -378,7 +378,6 @@ struct Qdisc noop_qdisc = {
.dequeue = noop_dequeue,
.flags = TCQ_F_BUILTIN,
.ops = &noop_qdisc_ops,
- .list = LIST_HEAD_INIT(noop_qdisc.list),
.q.lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock),
.dev_queue = &noop_netdev_queue,
.busylock = __SPIN_LOCK_UNLOCKED(noop_qdisc.busylock),
@@ -565,7 +564,6 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
sch = (struct Qdisc *) QDISC_ALIGN((unsigned long) p);
sch->padded = (char *) sch - (char *) p;
}
- INIT_LIST_HEAD(&sch->list);
skb_queue_head_init(&sch->q);

spin_lock_init(&sch->busylock);
@@ -645,7 +643,7 @@ void qdisc_destroy(struct Qdisc *qdisc)
return;

#ifdef CONFIG_NET_SCHED
- qdisc_list_del(qdisc);
+ qdisc_hash_del(qdisc);

qdisc_put_stab(rtnl_dereference(qdisc->stab));
#endif
@@ -732,6 +730,8 @@ static void attach_default_qdiscs(struct net_device *dev)
qdisc->ops->attach(qdisc);
}
}
+ if (dev->qdisc)
+ qdisc_hash_add(dev->qdisc);
}

static void transition_one_qdisc(struct net_device *dev,
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 56a77b8..3bee15d 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -88,7 +88,7 @@ static void mq_attach(struct Qdisc *sch)
qdisc_destroy(old);
#ifdef CONFIG_NET_SCHED
if (ntx < dev->real_num_tx_queues)
- qdisc_list_add(qdisc);
+ qdisc_hash_add(qdisc);
#endif

}
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index b8002ce..dbfb3a5 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -182,7 +182,7 @@ static void mqprio_attach(struct Qdisc *sch)
if (old)
qdisc_destroy(old);
if (ntx < dev->real_num_tx_queues)
- qdisc_list_add(qdisc);
+ qdisc_hash_add(qdisc);
}
kfree(priv->qdiscs);
priv->qdiscs = NULL;

Eric Dumazet

7 Jul 2016, 10:00:11 a.m.
On Thu, 2016-07-07 at 11:04 +0200, Jiri Kosina wrote:

>
>
> From: Jiri Kosina <jko...@suse.cz>
> Subject: [PATCH] net: sched: convert qdisc linked list to hashtable
>
> Convert the per-device linked list into a hashtable. The primary motivation
> for this change is that currently, we're not tracking all the qdiscs in
> hierarchy (e.g. excluding default qdiscs), as the lookup performed over the
> linked list by qdisc_match_from_root() is rather expensive.

...
Not sure why you used the rcu version here, but the non rcu version in
tc_dump_qdisc_root()

Thanks.

Jiri Kosina

7 Jul 2016, 12:40:08 p.m.
Good catch.

Actually even the current code is odd in this regard --
qdisc_match_from_root() uses RCU iterator, while tc_dump_*() use the
non-RCU one; addition and deletion is performed using RCU primitives.

I haven't got my head around this yet; if it's correct at all, it'd at
least deserve a comment somewhere.

I'll respin v2 of the patch (there is also a conflict on HASH_SIZE
definition in ip6_tunnel.c, ip6_gre.c and sit.c due to hashtable.h include
in netdevice.h that needs to be resolved as well) that'd make RCU usage
consistent.

Any other objections/comments? I was namely curious about any opinions
regarding the hashtable size.

Thanks,

Eric Dumazet

7 Jul 2016, 1:00:08 p.m.
Because it can be run from qdisc enqueue()/dequeue(), without holding RTNL.

> while tc_dump_*() use the
> non-RCU one; addition and deletion is performed using RCU primitives.

It really is protected by RTNL (qdiscs can not change during the dump)

>
> I haven't got my head around this yet; if it's correct at all, it'd at
> least deserve a comment somewhere.
>
> I'll respin v2 of the patch (there is also a conflict on HASH_SIZE
> definition in ip6_tunnel.c, ip6_gre.c and sit.c due to hashtable.h include
> in netdevice.h that needs to be resolved as well) that'd make RCU usage
> consistent.
>
> Any other objections/comments? I was namely curious about any opinions
> regarding the hashtable size.

Well, this is the tricky part, but rhashtable would mean way more
changes...

Jiri Kosina

7 Jul 2016, 4:40:07 p.m.
From: Jiri Kosina <jko...@suse.cz>

Convert the per-device linked list into a hashtable. The primary
motivation for this change is that currently, we're not tracking all the
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup
performed over the linked list by qdisc_match_from_root() is rather
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will
bring much more determinism in user experience.

As we're adding hashtable.h include into generic netdevice.h, we have to make
sure HASH_SIZE macro is now non-conflicting with local definitions.

Signed-off-by: Jiri Kosina <jko...@suse.cz>
---

v1 -> v2: fix up RCU hashtable usage wrt. rtnl
fix compilation of .c files which define their own
HASH_SIZE that now conflicts with the one from
hashtable.h (newly included via netdevice.h)

include/linux/netdevice.h | 2 ++
include/net/pkt_sched.h | 4 ++--
include/net/sch_generic.h | 2 +-
net/core/dev.c | 1 +
net/ipv6/ip6_gre.c | 8 ++++----
net/ipv6/ip6_tunnel.c | 6 +++---
net/ipv6/ip6_vti.c | 6 +++---
net/ipv6/sit.c | 10 +++++-----
net/sched/sch_api.c | 23 +++++++++++++----------
net/sched/sch_generic.c | 6 +++---
net/sched/sch_mq.c | 2 +-
net/sched/sch_mqprio.c | 2 +-
12 files changed, 39 insertions(+), 33 deletions(-)
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index fdc9de2..0f70ecc 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -62,11 +62,11 @@ module_param(log_ecn_error, bool, 0644);
MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");

#define HASH_SIZE_SHIFT 5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define __HASH_SIZE (1 << HASH_SIZE_SHIFT)

static int ip6gre_net_id __read_mostly;
struct ip6gre_net {
- struct ip6_tnl __rcu *tunnels[4][HASH_SIZE];
+ struct ip6_tnl __rcu *tunnels[4][__HASH_SIZE];

struct net_device *fb_tunnel_dev;
};
@@ -96,7 +96,7 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu);
will match fallback tunnel.
*/

-#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(HASH_SIZE - 1))
+#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(__HASH_SIZE - 1))
static u32 HASH_ADDR(const struct in6_addr *addr)
{
u32 hash = ipv6_addr_hash(addr);
@@ -1089,7 +1089,7 @@ static void ip6gre_destroy_tunnels(struct net *net, struct list_head *head)

for (prio = 0; prio < 4; prio++) {
int h;
- for (h = 0; h < HASH_SIZE; h++) {
+ for (h = 0; h < __HASH_SIZE; h++) {
struct ip6_tnl *t;

t = rtnl_dereference(ign->tunnels[prio][h]);
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 7b0481e..a9da620 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -65,7 +65,7 @@ MODULE_ALIAS_RTNL_LINK("ip6tnl");
MODULE_ALIAS_NETDEV("ip6tnl0");

#define HASH_SIZE_SHIFT 5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define __HASH_SIZE (1 << HASH_SIZE_SHIFT)

static bool log_ecn_error = true;
module_param(log_ecn_error, bool, 0644);
@@ -87,7 +87,7 @@ struct ip6_tnl_net {
/* the IPv6 tunnel fallback device */
struct net_device *fb_tnl_dev;
/* lists for storing tunnels in use */
- struct ip6_tnl __rcu *tnls_r_l[HASH_SIZE];
+ struct ip6_tnl __rcu *tnls_r_l[__HASH_SIZE];
struct ip6_tnl __rcu *tnls_wc[1];
struct ip6_tnl __rcu **tnls[2];
};
@@ -2031,7 +2031,7 @@ static void __net_exit ip6_tnl_destroy_tunnels(struct net *net)
if (dev->rtnl_link_ops == &ip6_link_ops)
unregister_netdevice_queue(dev, &list);

- for (h = 0; h < HASH_SIZE; h++) {
+ for (h = 0; h < __HASH_SIZE; h++) {
t = rtnl_dereference(ip6n->tnls_r_l[h]);
while (t) {
/* If dev is in the same netns, it has already
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d90a11f..2d192af 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -51,7 +51,7 @@
#include <net/netns/generic.h>

#define HASH_SIZE_SHIFT 5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define __HASH_SIZE (1 << HASH_SIZE_SHIFT)

static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2)
{
@@ -69,7 +69,7 @@ struct vti6_net {
/* the vti6 tunnel fallback device */
struct net_device *fb_tnl_dev;
/* lists for storing tunnels in use */
- struct ip6_tnl __rcu *tnls_r_l[HASH_SIZE];
+ struct ip6_tnl __rcu *tnls_r_l[__HASH_SIZE];
struct ip6_tnl __rcu *tnls_wc[1];
struct ip6_tnl __rcu **tnls[2];
};
@@ -1040,7 +1040,7 @@ static void __net_exit vti6_destroy_tunnels(struct vti6_net *ip6n)
struct ip6_tnl *t;
LIST_HEAD(list);

- for (h = 0; h < HASH_SIZE; h++) {
+ for (h = 0; h < __HASH_SIZE; h++) {
t = rtnl_dereference(ip6n->tnls_r_l[h]);
while (t) {
unregister_netdevice_queue(t->dev, &list);
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 0a5a255..9f776d7 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -62,7 +62,7 @@
For comments look at net/ipv4/ip_gre.c --ANK
*/

-#define HASH_SIZE 16
+#define __HASH_SIZE 16
#define HASH(addr) (((__force u32)addr^((__force u32)addr>>4))&0xF)

static bool log_ecn_error = true;
@@ -78,9 +78,9 @@ static struct rtnl_link_ops sit_link_ops __read_mostly;

static int sit_net_id __read_mostly;
struct sit_net {
- struct ip_tunnel __rcu *tunnels_r_l[HASH_SIZE];
- struct ip_tunnel __rcu *tunnels_r[HASH_SIZE];
- struct ip_tunnel __rcu *tunnels_l[HASH_SIZE];
+ struct ip_tunnel __rcu *tunnels_r_l[__HASH_SIZE];
+ struct ip_tunnel __rcu *tunnels_r[__HASH_SIZE];
+ struct ip_tunnel __rcu *tunnels_l[__HASH_SIZE];
struct ip_tunnel __rcu *tunnels_wc[1];
struct ip_tunnel __rcu **tunnels[4];

@@ -1773,7 +1773,7 @@ static void __net_exit sit_destroy_tunnels(struct net *net,

for (prio = 1; prio < 4; prio++) {
int h;
- for (h = 0; h < HASH_SIZE; h++) {
+ for (h = 0; h < __HASH_SIZE; h++) {
struct ip_tunnel *t;

t = rtnl_dereference(sitn->tunnels[prio][h]);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ddf047d..c093d32 100644

Craig Gallek

7 Jul 2016, 7:00:06 p.m.
On Thu, Jul 7, 2016 at 4:36 PM, Jiri Kosina <ji...@kernel.org> wrote:
> From: Jiri Kosina <jko...@suse.cz>
>
> Convert the per-device linked list into a hashtable. The primary
> motivation for this change is that currently, we're not tracking all the
> qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup
> performed over the linked list by qdisc_match_from_root() is rather
> expensive.
>
> The ultimate goal is to get rid of hidden qdiscs completely, which will
> bring much more determinism in user experience.
>
> As we're adding hashtable.h include into generic netdevice.h, we have to make
> sure HASH_SIZE macro is now non-conflicting with local definitions.
>
> Signed-off-by: Jiri Kosina <jko...@suse.cz>
> ---
>
> v1 -> v2: fix up RCU hashtable usage wrt. rtnl
> fix compilation of .c files which define their own
> HASH_SIZE that now conflicts with the one from
> hashtable.h (newly included via netdevice.h)

This sort of seems like it's just side-stepping the problem. Given
that the size of this hash table is fixed, the lookup time of this
operation is still going to approach linear as the number of qdiscs
increases. I took a quick pass at trying to use rhashtable for this
purpose a few weeks ago but dropped it when I realized many of the
table operations (which can trigger resize events) need to happen
while holding the rtnl lock. I still think it would be possible to
use a dynamically sizable datastructure for this purpose, but it will
be a fair amount of work to change the current locking semantics to
make it work...
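
The scaling concern can be made concrete with a small userspace sketch. It assumes a trivial modulo hash and hand-rolled chains (the kernel's hashtable.h uses hlist chains and its own hash helpers); struct node, struct table, table_add() and table_lookup_cost() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_BUCKETS 16 /* fixed size, like a 4-bit hashtable */

struct node {
    unsigned int handle;
    struct node *next;
};

struct table {
    struct node *bucket[SKETCH_BUCKETS];
};

/* Insert at the head of the chain chosen by a trivial modulo hash. */
static int table_add(struct table *t, unsigned int handle)
{
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return -1;
    n->handle = handle;
    n->next = t->bucket[handle % SKETCH_BUCKETS];
    t->bucket[handle % SKETCH_BUCKETS] = n;
    return 0;
}

/* Return how many nodes were visited to find handle, or 0 if absent. */
static size_t table_lookup_cost(const struct table *t, unsigned int handle)
{
    const struct node *n;
    size_t visited = 0;

    assert(t != NULL);
    for (n = t->bucket[handle % SKETCH_BUCKETS]; n; n = n->next) {
        visited++;
        if (n->handle == handle)
            return visited;
    }
    return 0;
}
```

With 1600 handles spread evenly over 16 buckets, every chain holds 100 nodes, so a worst-case lookup still walks 100 entries: 16 times fewer than a single list, but still linear in the number of qdiscs.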

Jiri Kosina

8 Jul 2016, 4:20:07 a.m.
On Thu, 7 Jul 2016, Craig Gallek wrote:

> This sort of seems like it's just side-stepping the problem. Given
> that the size of this hash table is fixed, the lookup time of this
> operation is still going to approach linear as the number of qdiscs
> increases.

That's true; however, the primary goal here is not to improve the speed
of qdisc lookup per se, but rather to make it possible to unhide the
qdiscs which are currently omitted because the linked list takes too
long to walk. The static hashtable is going to help here.

Thanks,

Eric Dumazet

8 Jul 2016, 5:00:09 a.m.
On Thu, 2016-07-07 at 22:36 +0200, Jiri Kosina wrote:
> From: Jiri Kosina <jko...@suse.cz>
>
> Convert the per-device linked list into a hashtable. The primary
> motivation for this change is that currently, we're not tracking all the
> qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup
> performed over the linked list by qdisc_match_from_root() is rather
> expensive.
>
> The ultimate goal is to get rid of hidden qdiscs completely, which will
> bring much more determinism in user experience.
>
> As we're adding hashtable.h include into generic netdevice.h, we have to make
> sure HASH_SIZE macro is now non-conflicting with local definitions.
>
> Signed-off-by: Jiri Kosina <jko...@suse.cz>
> ---


> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index fdc9de2..0f70ecc 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -62,11 +62,11 @@ module_param(log_ecn_error, bool, 0644);
> MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
>
> #define HASH_SIZE_SHIFT 5
> -#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
> +#define __HASH_SIZE (1 << HASH_SIZE_SHIFT)

__ prefix is mostly used for functions having some kind of
shells/helpers.

I would rather use IP6_GRE_HASH_SIZE or something which has lower
chances of being used elsewhere.

Or maybe you could use the new HASH_SIZE(name), providing a proper 'name'.

> @@ -732,6 +730,8 @@ static void attach_default_qdiscs(struct net_device *dev)
> qdisc->ops->attach(qdisc);
> }
> }
> + if (dev->qdisc)
> + qdisc_hash_add(dev->qdisc);
> }
>

I do not understand this addition; could you comment on it?

Jiri Kosina

8 Jul 2016, 5:10:07 a.m.
On Fri, 8 Jul 2016, Eric Dumazet wrote:

> > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> > index fdc9de2..0f70ecc 100644
> > --- a/net/ipv6/ip6_gre.c
> > +++ b/net/ipv6/ip6_gre.c
> > @@ -62,11 +62,11 @@ module_param(log_ecn_error, bool, 0644);
> > MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
> >
> > #define HASH_SIZE_SHIFT 5
> > -#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
> > +#define __HASH_SIZE (1 << HASH_SIZE_SHIFT)
>
> __ prefix is mostly used for functions having some kind of
> shells/helpers.
>
> I would rather use IP6_GRE_HASH_SIZE or something which has lower
> chances of being used elsewhere.

Alright, makes sense, will do this in v3.
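
The naming clash can be sketched in plain C. linux/hashtable.h defines a function-like macro HASH_SIZE(name) as ARRAY_SIZE(name), so a file that also defines an object-like HASH_SIZE ends up with two incompatible definitions of one identifier. The sketch below mirrors the shape of the header macro in userspace and uses a prefixed constant for the file-local size, as suggested above; ARRAY_SIZE_SKETCH, tunnels_sketch and tunnel_buckets() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's ARRAY_SIZE(). */
#define ARRAY_SIZE_SKETCH(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Same shape as the function-like macro in linux/hashtable.h. */
#define HASH_SIZE(name) (ARRAY_SIZE_SKETCH(name))

/* File-local size under a prefixed name, so it cannot clash with the above. */
#define IP6_GRE_HASH_SIZE_SHIFT 5
#define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT)

static_assert(IP6_GRE_HASH_SIZE == 32, "5 hash bits give 32 buckets");

/* Hypothetical bucket array sized by the prefixed constant. */
void *tunnels_sketch[IP6_GRE_HASH_SIZE];

/* Both macros are usable side by side once the names differ. */
static size_t tunnel_buckets(void)
{
    return HASH_SIZE(tunnels_sketch);
}
```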

> > @@ -732,6 +730,8 @@ static void attach_default_qdiscs(struct net_device *dev)
> > qdisc->ops->attach(qdisc);
> > }
> > }
> > + if (dev->qdisc)
> > + qdisc_hash_add(dev->qdisc);
> > }
> >
>
> I do not understand this addition, could you comment on it ?

With linked lists, assigning to struct net_device's Qdisc pointer is
enough to "initialize" the linked list and have it contain one (root)
item. With a hashtable, this is not the case; the root qdisc needs to be
added explicitly.

Hmm, dev_init_scheduler() (and perhaps also dev_shutdown()) would possibly
need similar treatment in order to have accurate data there 100% of the
time even during initialization.
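
The difference can be sketched in userspace C. This is a simplified model under stated assumptions, not the kernel implementation: struct qdisc_sketch, struct dev_sketch, sketch_hash_add() and sketch_lookup() are hypothetical, and a modulo hash stands in for the kernel's hash helpers.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BUCKETS 16

struct qdisc_sketch {
    unsigned int handle;
    struct qdisc_sketch *hash_next; /* chain link inside one bucket */
};

struct dev_sketch {
    struct qdisc_sketch *root;                 /* models dev->qdisc */
    struct qdisc_sketch *hash[SKETCH_BUCKETS]; /* models dev->qdisc_hash */
};

/* Explicitly link a qdisc into the per-device table. */
static void sketch_hash_add(struct dev_sketch *dev, struct qdisc_sketch *q)
{
    unsigned int b;

    assert(dev != NULL && q != NULL);
    b = q->handle % SKETCH_BUCKETS;
    q->hash_next = dev->hash[b];
    dev->hash[b] = q;
}

/* Look a handle up through the table only, never through dev->root. */
static struct qdisc_sketch *sketch_lookup(const struct dev_sketch *dev,
                                          unsigned int handle)
{
    struct qdisc_sketch *q;

    for (q = dev->hash[handle % SKETCH_BUCKETS]; q; q = q->hash_next)
        if (q->handle == handle)
            return q;
    return NULL;
}
```

Setting dev.root alone leaves the table empty, so a lookup by handle misses the root until sketch_hash_add() has been called for it.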

Thomas Graf

8 Jul 2016, 7:10:07 a.m.
On 07/07/16 at 10:36pm, Jiri Kosina wrote:
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index f45929c..630838e 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -52,6 +52,7 @@
> #include <uapi/linux/netdevice.h>
> #include <uapi/linux/if_bonding.h>
> #include <uapi/linux/pkt_cls.h>
> +#include <linux/hashtable.h>
>
> struct netpoll_info;
> struct device;
> @@ -1778,6 +1779,7 @@ struct net_device {
> unsigned int num_tx_queues;
> unsigned int real_num_tx_queues;
> struct Qdisc *qdisc;
> + DECLARE_HASHTABLE (qdisc_hash, 16);

This blows up net_device to an insane size: 64K * sizeof(struct
hlist_head). Can we allocate this on demand for net_devices where
it is actually needed? The majority of virtual devices won't need
this. Doesn't have to be rhashtable, can still be fixed size but
at least allocate it.
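
To put rough numbers on the size concern above: DECLARE_HASHTABLE(name, bits) declares an array of 1 << bits bucket heads, and each struct hlist_head holds a single pointer. The following is a minimal userspace sketch of that arithmetic, not kernel code; struct bucket_head_sketch and table_bytes() are hypothetical stand-ins introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct hlist_head: one pointer. */
struct bucket_head_sketch {
    void *first;
};

/* Bytes taken by a table declared with the given number of hash bits. */
static size_t table_bytes(unsigned int bits)
{
    assert(bits < 8 * sizeof(size_t));
    return ((size_t)1 << bits) * sizeof(struct bucket_head_sketch);
}
```

With 8-byte pointers, table_bytes(16) comes to 512 KiB per net_device, while table_bytes(4) is 128 bytes.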

Eric Dumazet

8 Jul 2016, 10:00:10 a.m.
Jiri probably misread the API and should have used:

DECLARE_HASHTABLE (qdisc_hash, 4);

Google has a very similar patch with 16 buckets, and it is 'good enough',
although we do not hit the qdisc_tree_reduce_backlog() penalty.

Jiri Kosina

11 Jul 2016, 10:10:07 a.m.
From: Jiri Kosina <jko...@suse.cz>

Convert the per-device linked list into a hashtable. The primary
motivation for this change is that currently, we're not tracking all the
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup
performed over the linked list by qdisc_match_from_root() is rather
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will
bring much more determinism in user experience.

As we're adding hashtable.h include into generic netdevice.h, we have to
make sure HASH_SIZE macro is now non-conflicting with local definitions.

Signed-off-by: Jiri Kosina <jko...@suse.cz>
---

v1 -> v2: fix up RCU hashtable usage wrt. rtnl
fix compilation of .c files which define their own
HASH_SIZE that now conflicts with the one from
hashtable.h (newly included via netdevice.h)

v2 -> v3: resolve HASH_SIZE identifier conflicts in a cleaner way
fix up the number of hash bucket bits (4 bits for 16 buckets)

include/linux/netdevice.h | 2 ++
include/net/pkt_sched.h | 4 ++--
include/net/sch_generic.h | 2 +-
net/core/dev.c | 1 +
net/ipv6/ip6_gre.c | 12 ++++++------
net/ipv6/ip6_tunnel.c | 10 +++++-----
net/ipv6/ip6_vti.c | 10 +++++-----
net/ipv6/sit.c | 10 +++++-----
net/sched/sch_api.c | 23 +++++++++++++----------
net/sched/sch_generic.c | 6 +++---
net/sched/sch_mq.c | 2 +-
net/sched/sch_mqprio.c | 2 +-
12 files changed, 45 insertions(+), 39 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..0b5c172e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
#include <uapi/linux/netdevice.h>
#include <uapi/linux/if_bonding.h>
#include <uapi/linux/pkt_cls.h>
+#include <linux/hashtable.h>

struct netpoll_info;
struct device;
@@ -1778,6 +1779,7 @@ struct net_device {
unsigned int num_tx_queues;
unsigned int real_num_tx_queues;
struct Qdisc *qdisc;
+ DECLARE_HASHTABLE (qdisc_hash, 4);
index fdc9de2..d3697a4 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -61,12 +61,12 @@ static bool log_ecn_error = true;
module_param(log_ecn_error, bool, 0644);
MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");

-#define HASH_SIZE_SHIFT 5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_GRE_HASH_SIZE_SHIFT 5
+#define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT)

static int ip6gre_net_id __read_mostly;
struct ip6gre_net {
- struct ip6_tnl __rcu *tunnels[4][HASH_SIZE];
+ struct ip6_tnl __rcu *tunnels[4][IP6_GRE_HASH_SIZE];

struct net_device *fb_tunnel_dev;
};
@@ -96,12 +96,12 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu);
will match fallback tunnel.
*/

-#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(HASH_SIZE - 1))
+#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(IP6_GRE_HASH_SIZE - 1))
static u32 HASH_ADDR(const struct in6_addr *addr)
{
u32 hash = ipv6_addr_hash(addr);

- return hash_32(hash, HASH_SIZE_SHIFT);
+ return hash_32(hash, IP6_GRE_HASH_SIZE_SHIFT);
}

#define tunnels_r_l tunnels[3]
@@ -1089,7 +1089,7 @@ static void ip6gre_destroy_tunnels(struct net *net, struct list_head *head)

for (prio = 0; prio < 4; prio++) {
int h;
- for (h = 0; h < HASH_SIZE; h++) {
+ for (h = 0; h < IP6_GRE_HASH_SIZE; h++) {
struct ip6_tnl *t;

t = rtnl_dereference(ign->tunnels[prio][h]);
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 7b0481e..2050217 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -64,8 +64,8 @@ MODULE_LICENSE("GPL");
MODULE_ALIAS_RTNL_LINK("ip6tnl");
MODULE_ALIAS_NETDEV("ip6tnl0");

-#define HASH_SIZE_SHIFT 5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_TUNNEL_HASH_SIZE_SHIFT 5
+#define IP6_TUNNEL_HASH_SIZE (1 << IP6_TUNNEL_HASH_SIZE_SHIFT)

static bool log_ecn_error = true;
module_param(log_ecn_error, bool, 0644);
@@ -75,7 +75,7 @@ static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2)
{
u32 hash = ipv6_addr_hash(addr1) ^ ipv6_addr_hash(addr2);

- return hash_32(hash, HASH_SIZE_SHIFT);
+ return hash_32(hash, IP6_TUNNEL_HASH_SIZE_SHIFT);
}

static int ip6_tnl_dev_init(struct net_device *dev);
@@ -87,7 +87,7 @@ struct ip6_tnl_net {
/* the IPv6 tunnel fallback device */
struct net_device *fb_tnl_dev;
/* lists for storing tunnels in use */
- struct ip6_tnl __rcu *tnls_r_l[HASH_SIZE];
+ struct ip6_tnl __rcu *tnls_r_l[IP6_TUNNEL_HASH_SIZE];
struct ip6_tnl __rcu *tnls_wc[1];
struct ip6_tnl __rcu **tnls[2];
};
@@ -2031,7 +2031,7 @@ static void __net_exit ip6_tnl_destroy_tunnels(struct net *net)
if (dev->rtnl_link_ops == &ip6_link_ops)
unregister_netdevice_queue(dev, &list);

- for (h = 0; h < HASH_SIZE; h++) {
+ for (h = 0; h < IP6_TUNNEL_HASH_SIZE; h++) {
t = rtnl_dereference(ip6n->tnls_r_l[h]);
while (t) {
/* If dev is in the same netns, it has already
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d90a11f..cc7e058 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -50,14 +50,14 @@
#include <net/net_namespace.h>
#include <net/netns/generic.h>

-#define HASH_SIZE_SHIFT 5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_VTI_HASH_SIZE_SHIFT 5
+#define IP6_VTI_HASH_SIZE (1 << IP6_VTI_HASH_SIZE_SHIFT)

static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2)
{
u32 hash = ipv6_addr_hash(addr1) ^ ipv6_addr_hash(addr2);

- return hash_32(hash, HASH_SIZE_SHIFT);
+ return hash_32(hash, IP6_VTI_HASH_SIZE_SHIFT);
}

static int vti6_dev_init(struct net_device *dev);
@@ -69,7 +69,7 @@ struct vti6_net {
/* the vti6 tunnel fallback device */
struct net_device *fb_tnl_dev;
/* lists for storing tunnels in use */
- struct ip6_tnl __rcu *tnls_r_l[HASH_SIZE];
+ struct ip6_tnl __rcu *tnls_r_l[IP6_VTI_HASH_SIZE];
struct ip6_tnl __rcu *tnls_wc[1];
struct ip6_tnl __rcu **tnls[2];
};
@@ -1040,7 +1040,7 @@ static void __net_exit vti6_destroy_tunnels(struct vti6_net *ip6n)
struct ip6_tnl *t;
LIST_HEAD(list);

- for (h = 0; h < HASH_SIZE; h++) {
+ for (h = 0; h < IP6_VTI_HASH_SIZE; h++) {
t = rtnl_dereference(ip6n->tnls_r_l[h]);
while (t) {
unregister_netdevice_queue(t->dev, &list);
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 0a5a255..94dd0f0 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -62,7 +62,7 @@
For comments look at net/ipv4/ip_gre.c --ANK
*/

-#define HASH_SIZE 16
+#define IP6_SIT_HASH_SIZE 16
#define HASH(addr) (((__force u32)addr^((__force u32)addr>>4))&0xF)

static bool log_ecn_error = true;
@@ -78,9 +78,9 @@ static struct rtnl_link_ops sit_link_ops __read_mostly;

static int sit_net_id __read_mostly;
struct sit_net {
- struct ip_tunnel __rcu *tunnels_r_l[HASH_SIZE];
- struct ip_tunnel __rcu *tunnels_r[HASH_SIZE];
- struct ip_tunnel __rcu *tunnels_l[HASH_SIZE];
+ struct ip_tunnel __rcu *tunnels_r_l[IP6_SIT_HASH_SIZE];
+ struct ip_tunnel __rcu *tunnels_r[IP6_SIT_HASH_SIZE];
+ struct ip_tunnel __rcu *tunnels_l[IP6_SIT_HASH_SIZE];
struct ip_tunnel __rcu *tunnels_wc[1];
struct ip_tunnel __rcu **tunnels[4];

@@ -1773,7 +1773,7 @@ static void __net_exit sit_destroy_tunnels(struct net *net,

for (prio = 1; prio < 4; prio++) {
int h;
- for (h = 0; h < HASH_SIZE; h++) {
+ for (h = 0; h < IP6_SIT_HASH_SIZE; h++) {

Cong Wang

12 Jul 2016, 1:40:10 p.m.
On Mon, Jul 11, 2016 at 7:02 AM, Jiri Kosina <ji...@kernel.org> wrote:
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index f45929c..0b5c172e 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -52,6 +52,7 @@
> #include <uapi/linux/netdevice.h>
> #include <uapi/linux/if_bonding.h>
> #include <uapi/linux/pkt_cls.h>
> +#include <linux/hashtable.h>
>
> struct netpoll_info;
> struct device;
> @@ -1778,6 +1779,7 @@ struct net_device {
> unsigned int num_tx_queues;
> unsigned int real_num_tx_queues;
> struct Qdisc *qdisc;
> + DECLARE_HASHTABLE (qdisc_hash, 4);
> unsigned long tx_queue_len;
> spinlock_t tx_global_lock;
> int watchdog_timeo;

Should it be surrounded by #ifdef CONFIG_NET_SCHED, to save several
bytes in the !CONFIG_NET_SCHED case?

Jiri Kosina

13 Jul 2016, 10:00:07 a.m.
Makes sense. I'll wait a bit for more feedback (if there is any) before
including this in a potential v4.

Thanks,

Jiri Kosina

14 Jul 2016, 10:20:06 a.m.

[ added CCs ]

On Tue, 12 Jul 2016, kbuild test robot wrote:

> Hi,
>
> [auto build test ERROR on net/master]
> [also build test ERROR on v4.7-rc7 next-20160711]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160711-220527
> config: arm-tct_hammer_defconfig (attached as .config)
> compiler: arm-linux-gnueabi-gcc (Debian 5.3.1-8) 5.3.1 20160205
> reproduce:
> wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=arm
>
> All errors (new ones prefixed by >>):
>
> net/built-in.o: In function `dev_activate':
> >> wext-proc.c:(.text+0x38544): undefined reference to `qdisc_hash_add'

This issue is there even without my patch (but with qdisc_list_add
instead), isn't it?

The problem is that sch_generic.c (where dev_activate() is) is being
compiled every time CONFIG_NET is set, but sch_api.c (where
qdisc_list_add() is defined) only when CONFIG_NET_SCHED is set (and there
is no stub for the !CONFIG_NET_SCHED case).
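
One common way to avoid this kind of undefined reference is a no-op stub for the disabled configuration; the other is to guard the call site with #ifdef. The sketch below illustrates the stub variant in userspace under stated assumptions: SKETCH_NET_SCHED, struct stub_qdisc, stub_hash_add() and stub_activate() are hypothetical and only model the pattern, not the kernel's actual headers.

```c
#include <assert.h>
#include <stddef.h>

struct stub_qdisc {
    int hashed;
};

/* Models the config switch; defaults to "scheduler API not built". */
#ifndef SKETCH_NET_SCHED
#define SKETCH_NET_SCHED 0
#endif

#if SKETCH_NET_SCHED
/* Real implementation, normally in a file built only when enabled. */
static void stub_hash_add(struct stub_qdisc *q)
{
    q->hashed = 1;
}
#else
/* No-op stub so that unconditionally built callers still link. */
static inline void stub_hash_add(struct stub_qdisc *q)
{
    (void)q;
}
#endif

/* Caller that is always compiled, like dev_activate() in sch_generic.c. */
static int stub_activate(struct stub_qdisc *root)
{
    assert(root != NULL);
    stub_hash_add(root);
    return root->hashed;
}
```

The stub keeps call sites free of preprocessor conditionals, at the cost of one extra declaration per configuration.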

Jiri Kosina

28 Jul 2016, 6:00:05 a.m.
From: Jiri Kosina <jko...@suse.cz>

Convert the per-device linked list into a hashtable. The primary
motivation for this change is that currently, we're not tracking all the
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup
performed over the linked list by qdisc_match_from_root() is rather
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will
bring much more determinism in user experience.

As we're adding hashtable.h include into generic netdevice.h, we have to
make sure HASH_SIZE macro is now non-conflicting with local definitions.

Signed-off-by: Jiri Kosina <jko...@suse.cz>
---
v1 -> v2: fix up RCU hashtable usage wrt. rtnl
fix compilation of .c files which define their own
HASH_SIZE that now conflicts with the one from
hashtable.h (newly included via netdevice.h)

v2 -> v3: resolve HASH_SIZE identifier conflicts in a cleaner way
fix up the number of hash bucket bits (4 bits for 16 buckets)

v3 -> v4: put the hashtable into struct netdevice only if
CONFIG_NET_SCHED has been enabled

include/linux/netdevice.h | 4 ++++
include/net/pkt_sched.h | 4 ++--
include/net/sch_generic.h | 2 +-
net/core/dev.c | 3 +++
net/ipv6/ip6_gre.c | 12 ++++++------
net/ipv6/ip6_tunnel.c | 10 +++++-----
net/ipv6/ip6_vti.c | 10 +++++-----
net/ipv6/sit.c | 10 +++++-----
net/sched/sch_api.c | 23 +++++++++++++----------
net/sched/sch_generic.c | 6 +++---
net/sched/sch_mq.c | 2 +-
net/sched/sch_mqprio.c | 2 +-
12 files changed, 49 insertions(+), 39 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..17c6499 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
#include <uapi/linux/netdevice.h>
#include <uapi/linux/if_bonding.h>
#include <uapi/linux/pkt_cls.h>
+#include <linux/hashtable.h>

struct netpoll_info;
struct device;
@@ -1778,6 +1779,9 @@ struct net_device {
unsigned int num_tx_queues;
unsigned int real_num_tx_queues;
struct Qdisc *qdisc;
+#ifdef CONFIG_NET_SCHED
+ DECLARE_HASHTABLE (qdisc_hash, 4);
+#endif
index 904ff43..d3736d5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7511,6 +7511,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
INIT_LIST_HEAD(&dev->all_adj_list.lower);
INIT_LIST_HEAD(&dev->ptype_all);
INIT_LIST_HEAD(&dev->ptype_specific);
+#ifdef CONFIG_NET_SCHED
+ hash_init(dev->qdisc_hash);
+#endif

Jiri Kosina

28 Jul 2016, 7:20:06 a.m.
On Thu, 28 Jul 2016, kbuild test robot wrote:

> [auto build test ERROR on v4.7-rc7]
> [also build test ERROR on next-20160728]
> [cannot apply to net/master net-next/master ipsec-next/master]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160728-182303
> config: i386-randconfig-s0-201630 (attached as .config)
> compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386
>
> All errors (new ones prefixed by >>):
>
> net/built-in.o: In function `dev_activate':
> >> (.text+0x37ccb): undefined reference to `qdisc_hash_add'

Dear 0-day team,

could you please check my question regarding this very build failure here?

lkml.kernel.org/r/alpine.LNX.2.00.1...@cbobk.fhfr.pm

Thanks,

Fengguang Wu

28 Jul 2016, 9:00:05 a.m.
Sorry I missed that. For your convenience, here is the answer to the
original email:

>This issue is there even without my patch (but with qdisc_list_add
>instead), isn't it?

Yes, it looks so; this error happens in a number of places:

dns_query.c:(.text+0x39b84): undefined reference to `qdisc_hash_add'
include/linux/netdevice.h:1935: undefined reference to `qdisc_hash_add'
net/core/netevent.c:31: undefined reference to `qdisc_hash_add'
net/sched/sch_generic.c:789: undefined reference to `qdisc_hash_add'
sch_generic.c:(.text+0x33487): undefined reference to `qdisc_hash_add'
switchdev.c:(.text+0x3bf58): undefined reference to `qdisc_hash_add'
sysctl_net.c:(.text+0x31f70): undefined reference to `qdisc_hash_add'
(.text.dev_activate+0x228): undefined reference to `qdisc_hash_add'
(.text+0x37d0b): undefined reference to `qdisc_hash_add'
wext-proc.c:(.text+0x390a8): undefined reference to `qdisc_hash_add'

>The problem is that sch_generic.c (where dev_activate() is) is being
>compiled every time CONFIG_NET is set, but sch_api.c (where
>qdisc_list_add() is defined) only when CONFIG_NET_SCHED is set (and there
>is no stub for the !CONFIG_NET_SCHED case).

So it looks like a more general problem than specific to this patch.

Thanks,
Fengguang

Fengguang Wu

28 Jul 2016, 9:00:05 a.m.
Hi Jiri,

On Thu, Jul 14, 2016 at 04:14:58PM +0200, Jiri Kosina wrote:
>
>[ added CCs ]
>
>On Tue, 12 Jul 2016, kbuild test robot wrote:
>
>> Hi,
>>
>> [auto build test ERROR on net/master]
>> [also build test ERROR on v4.7-rc7 next-20160711]
>> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>>
>> url: https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160711-220527
>> config: arm-tct_hammer_defconfig (attached as .config)
>> compiler: arm-linux-gnueabi-gcc (Debian 5.3.1-8) 5.3.1 20160205
>> reproduce:
>> wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>> chmod +x ~/bin/make.cross
>> # save the attached .config to linux build tree
>> make.cross ARCH=arm
>>
>> All errors (new ones prefixed by >>):
>>
>> net/built-in.o: In function `dev_activate':
>> >> wext-proc.c:(.text+0x38544): undefined reference to `qdisc_hash_add'
>
>This issue is there even without my patch (but with qdisc_list_add
>instead), isn't it?

Yes, it looks so; this error happens in a number of places:

dns_query.c:(.text+0x39b84): undefined reference to `qdisc_hash_add'
include/linux/netdevice.h:1935: undefined reference to `qdisc_hash_add'
net/core/netevent.c:31: undefined reference to `qdisc_hash_add'
net/sched/sch_generic.c:789: undefined reference to `qdisc_hash_add'
sch_generic.c:(.text+0x33487): undefined reference to `qdisc_hash_add'
switchdev.c:(.text+0x3bf58): undefined reference to `qdisc_hash_add'
sysctl_net.c:(.text+0x31f70): undefined reference to `qdisc_hash_add'
(.text.dev_activate+0x228): undefined reference to `qdisc_hash_add'
(.text+0x37d0b): undefined reference to `qdisc_hash_add'
wext-proc.c:(.text+0x390a8): undefined reference to `qdisc_hash_add'

>The problem is that sch_generic.c (where dev_activate() is) is being
>compiled every time CONFIG_NET is set, but sch_api.c (where
>qdisc_list_add() is defined) only when CONFIG_NET_SCHED is set (and there
>is no stub for the !CONFIG_NET_SCHED case).

Cong Wang

28 Jul 2016, 1:00:06 p.m.
On Thu, Jul 28, 2016 at 2:56 AM, Jiri Kosina <ji...@kernel.org> wrote:
> From: Jiri Kosina <jko...@suse.cz>
>
> Convert the per-device linked list into a hashtable. The primary
> motivation for this change is that currently, we're not tracking all the
> qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup
> performed over the linked list by qdisc_match_from_root() is rather
> expensive.
>
> The ultimate goal is to get rid of hidden qdiscs completely, which will
> bring much more determinism in user experience.
>
> As we're adding hashtable.h include into generic netdevice.h, we have to
> make sure HASH_SIZE macro is now non-conflicting with local definitions.
>
> Signed-off-by: Jiri Kosina <jko...@suse.cz>
> ---
> v1 -> v2: fix up RCU hashtable usage wrt. rtnl
> fix compilation of .c files which define their own
> HASH_SIZE that now conflicts with the one from
> hashtable.h (newly included via netdevice.h)
>
> v2 -> v3: resolve HASH_SIZE identifier conflicts in a cleaner way
> fix up the number of hash bucket bits (4 bits for 16 buckets)
>
> v3 -> v4: put the hashtable into struct netdevice only if
> CONFIG_NET_SCHED has been enabled

Reviewed-by: Cong Wang <xiyou.w...@gmail.com>

Thanks!

Cong Wang

28 Jul 2016, 1:00:07 p.m.
Agreed. I can send a patch if Jiri doesn't. ;)

Jiri Kosina

29 Jul 2016, 4:00:10 a.m.
From: Jiri Kosina <jko...@suse.cz>

Convert the per-device linked list into a hashtable. The primary
motivation for this change is that currently, we're not tracking all the
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup
performed over the linked list by qdisc_match_from_root() is rather
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will
bring much more determinism in user experience.

As we're adding hashtable.h include into generic netdevice.h, we have to
make sure HASH_SIZE macro is now non-conflicting with local definitions.

Reviewed-by: Cong Wang <xiyou.w...@gmail.com>
Signed-off-by: Jiri Kosina <jko...@suse.cz>
---

v1 -> v2: fix up RCU hashtable usage wrt. rtnl
fix compilation of .c files which define their own
HASH_SIZE that now conflicts with the one from
hashtable.h (newly included via netdevice.h)

v2 -> v3: resolve HASH_SIZE identifier conflicts in a cleaner way
fix up the number of hash bucket bits (4 bits for 16 buckets)

v3 -> v4: put the hashtable into struct netdevice only if
CONFIG_NET_SCHED has been enabled

v4 -> v5: fix !CONFIG_NET_SCHED build (reported by Fengguang Wu)
add Cong Wang's reviewed-by

include/linux/netdevice.h | 4 ++++
include/net/pkt_sched.h | 4 ++--
include/net/sch_generic.h | 2 +-
net/core/dev.c | 3 +++
net/ipv6/ip6_gre.c | 12 ++++++------
net/ipv6/ip6_tunnel.c | 10 +++++-----
net/ipv6/ip6_vti.c | 10 +++++-----
net/ipv6/sit.c | 10 +++++-----
net/sched/sch_api.c | 23 +++++++++++++----------
net/sched/sch_generic.c | 8 +++++---
net/sched/sch_mq.c | 2 +-
net/sched/sch_mqprio.c | 2 +-
12 files changed, 51 insertions(+), 39 deletions(-)
index f9e0e9c..94d5999 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -378,7 +378,6 @@ struct Qdisc noop_qdisc = {
.dequeue = noop_dequeue,
.flags = TCQ_F_BUILTIN,
.ops = &noop_qdisc_ops,
- .list = LIST_HEAD_INIT(noop_qdisc.list),
.q.lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock),
.dev_queue = &noop_netdev_queue,
.busylock = __SPIN_LOCK_UNLOCKED(noop_qdisc.busylock),
@@ -565,7 +564,6 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
sch = (struct Qdisc *) QDISC_ALIGN((unsigned long) p);
sch->padded = (char *) sch - (char *) p;
}
- INIT_LIST_HEAD(&sch->list);
skb_queue_head_init(&sch->q);

spin_lock_init(&sch->busylock);
@@ -645,7 +643,7 @@ void qdisc_destroy(struct Qdisc *qdisc)
return;

#ifdef CONFIG_NET_SCHED
- qdisc_list_del(qdisc);
+ qdisc_hash_del(qdisc);

qdisc_put_stab(rtnl_dereference(qdisc->stab));
#endif
@@ -732,6 +730,10 @@ static void attach_default_qdiscs(struct net_device *dev)
qdisc->ops->attach(qdisc);
}
}
+#ifdef CONFIG_NET_SCHED
+ if (dev->qdisc)
+ qdisc_hash_add(dev->qdisc);
+#endif

Fengguang Wu

31 Jul 2016, 7:30:04 a.m.
Jiri, I just double-checked and found no similar errors related to
qdisc_list_add(). The parent commit 95556a8838 ("dccp: avoid deadlock
in dccp_v4_ctl_send_reset") builds fine without error.

Thanks,
Fengguang

Jiri Kosina

1 Aug 2016, 6:20:07 a.m.
On Sun, 31 Jul 2016, Fengguang Wu wrote:

> Jiri, I just double checked and find no similar errors related to
> qdisc_list_add(). The parent commit 95556a8838 ("dccp: avoid deadlock
> in dccp_v4_ctl_send_reset") builds fine without error.

You are right, I realized my mistake afterwards. This is fixed in v5 of
the patch.

Jiri Kosina

1 Aug 2016, 6:30:07 a.m.
From: Jiri Kosina <jko...@suse.cz>

Convert the per-device linked list into a hashtable. The primary
motivation for this change is that currently, we're not tracking all the
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup
performed over the linked list by qdisc_match_from_root() is rather
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will
bring much more determinism in user experience.

As we're adding hashtable.h include into generic netdevice.h, we have to
make sure HASH_SIZE macro is now non-conflicting with local definitions.

Reviewed-by: Cong Wang <xiyou.w...@gmail.com>
Signed-off-by: Jiri Kosina <jko...@suse.cz>
---

v1 -> v2: fix up RCU hashtable usage wrt. rtnl
fix compilation of .c files which define their own
HASH_SIZE that now conflicts with the one from
hashtable.h (newly included via netdevice.h)

v2 -> v3: resolve HASH_SIZE identifier conflicts in a cleaner way
fix up the number of hash bucket bits (4 bits for 16 buckets)

v3 -> v4: put the hashtable into struct netdevice only if
CONFIG_NET_SCHED has been enabled

v4 -> v5: fix !CONFIG_NET_SCHED build (reported by Fengguang Wu)
add Cong Wang's reviewed-by

v5 -> v6: build fix for davinci_emac driver that got symbol conflict
due to hashtable.h include, reported by 0day bot

drivers/net/ethernet/ti/davinci_emac.c | 14 +++++++-------
include/linux/netdevice.h | 4 ++++
include/net/pkt_sched.h | 4 ++--
include/net/sch_generic.h | 2 +-
net/core/dev.c | 3 +++
net/ipv6/ip6_gre.c | 12 ++++++------
net/ipv6/ip6_tunnel.c | 10 +++++-----
net/ipv6/ip6_vti.c | 10 +++++-----
net/ipv6/sit.c | 10 +++++-----
net/sched/sch_api.c | 23 +++++++++++++----------
net/sched/sch_generic.c | 8 +++++---
net/sched/sch_mq.c | 2 +-
net/sched/sch_mqprio.c | 2 +-
13 files changed, 58 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index f56d66e..91ca2b2 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -726,14 +726,14 @@ static u32 hash_get(u8 *addr)
}

/**
- * hash_add - Hash function to add mac addr from hash table
+ * emac_hash_add - Hash function to add mac addr from hash table
* @priv: The DaVinci EMAC private adapter structure
* @mac_addr: mac address to delete from hash table
*
* Adds mac address to the internal hash table
*
*/
-static int hash_add(struct emac_priv *priv, u8 *mac_addr)
+static int emac_hash_add(struct emac_priv *priv, u8 *mac_addr)
{
struct device *emac_dev = &priv->ndev->dev;
u32 rc = 0;
@@ -742,7 +742,7 @@ static int hash_add(struct emac_priv *priv, u8 *mac_addr)

if (hash_value >= EMAC_NUM_MULTICAST_BITS) {
if (netif_msg_drv(priv)) {
- dev_err(emac_dev, "DaVinci EMAC: hash_add(): Invalid "\
+ dev_err(emac_dev, "DaVinci EMAC: emac_hash_add(): Invalid "\
"Hash %08x, should not be greater than %08x",
hash_value, (EMAC_NUM_MULTICAST_BITS - 1));
}
@@ -768,14 +768,14 @@ static int hash_add(struct emac_priv *priv, u8 *mac_addr)
}

/**
- * hash_del - Hash function to delete mac addr from hash table
+ * emac_hash_del - Hash function to delete mac addr from hash table
* @priv: The DaVinci EMAC private adapter structure
* @mac_addr: mac address to delete from hash table
*
* Removes mac address from the internal hash table
*
*/
-static int hash_del(struct emac_priv *priv, u8 *mac_addr)
+static int emac_hash_del(struct emac_priv *priv, u8 *mac_addr)
{
u32 hash_value;
u32 hash_bit;
@@ -825,10 +825,10 @@ static void emac_add_mcast(struct emac_priv *priv, u32 action, u8 *mac_addr)

switch (action) {
case EMAC_MULTICAST_ADD:
- update = hash_add(priv, mac_addr);
+ update = emac_hash_add(priv, mac_addr);
break;
case EMAC_MULTICAST_DEL:
- update = hash_del(priv, mac_addr);
+ update = emac_hash_del(priv, mac_addr);
break;
case EMAC_ALL_MULTI_SET:
update = 1;

Jiri Kosina

10 Aug 2016, 4:30:09 p.m.
From: Jiri Kosina <jko...@suse.cz>

Convert the per-device linked list into a hashtable. The primary
motivation for this change is that currently, we're not tracking all the
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup
performed over the linked list by qdisc_match_from_root() is rather
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will
bring much more determinism in user experience.

Reviewed-by: Cong Wang <xiyou.w...@gmail.com>
Signed-off-by: Jiri Kosina <jko...@suse.cz>
---
include/linux/netdevice.h | 4 ++++
include/net/pkt_sched.h | 4 ++--
include/net/sch_generic.h | 2 +-
net/core/dev.c | 3 +++
net/sched/sch_api.c | 23 +++++++++++++----------
net/sched/sch_generic.c | 8 +++++---
net/sched/sch_mq.c | 2 +-
net/sched/sch_mqprio.c | 2 +-
8 files changed, 30 insertions(+), 18 deletions(-)
1.9.2
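
To make the motivation concrete, here is a minimal user-space sketch of a handle-keyed bucket lookup, as opposed to walking one long per-device list. It is not the kernel's hashtable.h API: the names model_dev, model_qdisc, model_hash_add and model_lookup, the 16-bucket table size, and the multiplicative hash are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_HASH_BITS 4                        /* assumed: 16 buckets */
#define MODEL_HASH_SIZE (1u << MODEL_HASH_BITS)

/* Hypothetical stand-in for struct Qdisc: only what the lookup needs. */
struct model_qdisc {
	uint32_t handle;
	struct model_qdisc *next;                /* chain within one bucket */
};

/* Hypothetical stand-in for the per-device table. */
struct model_dev {
	struct model_qdisc *buckets[MODEL_HASH_SIZE];
};

/* Multiplicative hash keeping the top MODEL_HASH_BITS bits (an assumption). */
static unsigned int model_bucket(uint32_t handle)
{
	return (uint32_t)(handle * 0x61c88647u) >> (32 - MODEL_HASH_BITS);
}

/* Insert at the head of the bucket selected by the handle. */
static void model_hash_add(struct model_dev *dev, struct model_qdisc *q)
{
	unsigned int b = model_bucket(q->handle);

	assert(b < MODEL_HASH_SIZE);
	q->next = dev->buckets[b];
	dev->buckets[b] = q;
}

/* Only the one bucket that can contain the handle is scanned. */
static struct model_qdisc *model_lookup(const struct model_dev *dev,
					uint32_t handle)
{
	struct model_qdisc *q;

	for (q = dev->buckets[model_bucket(handle)]; q; q = q->next)
		if (q->handle == handle)
			return q;
	return NULL;
}
```

With a lookup of this shape, adding every qdisc of the hierarchy (default ones included) to the table makes each bucket longer but does not turn every lookup into a walk over all of them.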

Daniel Borkmann

Aug 12, 2016, 9:00:06 AM
Hi Jiri,

On 08/10/2016 11:05 AM, Jiri Kosina wrote:
> From: Jiri Kosina <jko...@suse.cz>
>
> Convert the per-device linked list into a hashtable. The primary
> motivation for this change is that currently, we're not tracking all the
> qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup
> performed over the linked list by qdisc_match_from_root() is rather
> expensive.
>
> The ultimate goal is to get rid of hidden qdiscs completely, which will
> bring much more determinism in user experience.
>
> Reviewed-by: Cong Wang <xiyou.w...@gmail.com>
> Signed-off-by: Jiri Kosina <jko...@suse.cz>

This results in the panic below. I tested reverting this patch, and that
fixes the panic.

Did you also test this with an ingress or clsact qdisc (just try adding
it to the lo dev, for example)?

What happens is the following in qdisc_match_from_root():

[ 995.422187] XXX qdisc:ffff88025e4fc800 queue:ffff880262759000 dev:ffff880261cc2000 handle:ffff0000
[ 995.422200] XXX qdisc:ffffffff81cf8100 queue:ffffffff81cf8240 dev: (null) handle:ffff0000

I believe this is due to dev_ingress_queue_create() assigning the
global noop_qdisc instance as qdisc_sleeping, which later qdisc_lookup()
uses for qdisc_match_from_root().

But anything that uses something like noop_qdisc cannot work with the
new qdisc_match_from_root(), because qdisc_dev(root) will always trigger
a NULL pointer dereference there. The reason is that the dev is always
NULL for noop: it's a singleton, see noop_qdisc and noop_netdev_queue
in sch_generic.c.

Now how to fix it? Creating separate noop instances each time it's set
would be quite a waste of memory. Even fuglier would be to hack a static
net device struct into sch_generic.c and let noop_netdev_queue point there
to get to the hash table. Or we just don't use qdisc_dev().

I've tried the patch below, which hands in the dev pointer instead of using
qdisc_dev(), but I think this is not sound yet. Despite fixing the panic, I
get something weird like:

# tc qdisc show dev wlp2s0b1
qdisc mq 0: root
qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc clsact ffff: parent ffff:fff1
qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

Not yet sure whether the really quickly hacked patch below is just buggy
(I guess so) or whether it's another side effect of the original patch.

If you have some cycles to look into fixing the panic, that would be great.

Thanks

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 25aada7..c2c9799 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -256,7 +256,8 @@ int qdisc_set_default(const char *name)
* Note: caller either uses rtnl or rcu_read_lock()
*/

-static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
+static struct Qdisc *qdisc_match_from_root(struct net_device *dev,
+ struct Qdisc *root, u32 handle)
{
struct Qdisc *q;

@@ -264,7 +265,7 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
root->handle == handle)
return root;

- hash_for_each_possible_rcu(qdisc_dev(root)->qdisc_hash, q, hash, handle) {
+ hash_for_each_possible_rcu(dev->qdisc_hash, q, hash, handle) {
if (q->handle == handle)
return q;
}
@@ -296,12 +297,12 @@ struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle)
{
struct Qdisc *q;

- q = qdisc_match_from_root(dev->qdisc, handle);
+ q = qdisc_match_from_root(dev, dev->qdisc, handle);
if (q)
goto out;

if (dev_ingress_queue(dev))
- q = qdisc_match_from_root(
+ q = qdisc_match_from_root(dev,
dev_ingress_queue(dev)->qdisc_sleeping,
handle);
out:
@@ -1430,8 +1431,8 @@ err_out:
return -EINVAL;
}

-static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
- struct netlink_callback *cb,
+static int tc_dump_qdisc_root(struct net_device *dev, struct Qdisc *root,
+ struct sk_buff *skb, struct netlink_callback *cb,
int *q_idx_p, int s_q_idx)
{
int ret = 0, q_idx = *q_idx_p;
@@ -1451,7 +1452,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
goto done;
q_idx++;
}
- hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
+ hash_for_each(dev->qdisc_hash, b, q, hash) {
if (q_idx < s_q_idx) {
q_idx++;
continue;
@@ -1492,12 +1493,12 @@ static int tc_dump_qdisc(struct sk_buff *skb, struct netlink_callback *cb)
s_q_idx = 0;
q_idx = 0;

- if (tc_dump_qdisc_root(dev->qdisc, skb, cb, &q_idx, s_q_idx) < 0)
+ if (tc_dump_qdisc_root(dev, dev->qdisc, skb, cb, &q_idx, s_q_idx) < 0)
goto done;

dev_queue = dev_ingress_queue(dev);
if (dev_queue &&
- tc_dump_qdisc_root(dev_queue->qdisc_sleeping, skb, cb,
+ tc_dump_qdisc_root(dev, dev_queue->qdisc_sleeping, skb, cb,
&q_idx, s_q_idx) < 0)
goto done;

@@ -1762,9 +1763,9 @@ static int tc_dump_tclass_qdisc(struct Qdisc *q, struct sk_buff *skb,
return 0;
}

-static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
- struct tcmsg *tcm, struct netlink_callback *cb,
- int *t_p, int s_t)
+static int tc_dump_tclass_root(struct net_device *dev, struct Qdisc *root,
+ struct sk_buff *skb, struct tcmsg *tcm,
+ struct netlink_callback *cb, int *t_p, int s_t)
{
struct Qdisc *q;
int b;
@@ -1775,7 +1776,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
return -1;

- hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
+ hash_for_each(dev->qdisc_hash, b, q, hash) {
if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
return -1;
}
@@ -1800,12 +1801,12 @@ static int tc_dump_tclass(struct sk_buff *skb, struct netlink_callback *cb)
s_t = cb->args[0];
t = 0;

- if (tc_dump_tclass_root(dev->qdisc, skb, tcm, cb, &t, s_t) < 0)
+ if (tc_dump_tclass_root(dev, dev->qdisc, skb, tcm, cb, &t, s_t) < 0)
goto done;

dev_queue = dev_ingress_queue(dev);
if (dev_queue &&
- tc_dump_tclass_root(dev_queue->qdisc_sleeping, skb, tcm, cb,
+ tc_dump_tclass_root(dev, dev_queue->qdisc_sleeping, skb, tcm, cb,
&t, s_t) < 0)
goto done;

Panic output:

[ 1243.459280] BUG: unable to handle kernel NULL pointer dereference at 0000000000000410
[ 1243.459430] IP: [<ffffffff8167efac>] qdisc_match_from_root+0x2c/0x70
[ 1243.459528] PGD 1aceba067 PUD 1aceb7067 PMD 0
[ 1243.459604] Oops: 0000 [#1] PREEMPT SMP
[ 1243.459659] Modules linked in: ccm br_netfilter bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison libcrc32c loop xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 arc4 brcmsmac nf_nat nf_conntrack cordic brcmutil iptable_mangle b43 iptable_security iptable_raw iptable_filter ip_tables x_tables mac80211 bnep cfg80211 snd_hda_codec_hdmi snd_hda_codec_cirrus snd_hda_codec_generic ssb snd_hda_intel snd_hda_codec x86_pkg_temp_thermal coretemp kvm_intel btusb mmc_core kvm btrtl btbcm snd_hda_core btintel uvcvideo bluetooth nls_utf8 snd_hwdep hfsplus snd_seq videobuf2_vmalloc
[ 1243.476357] videobuf2_memops iTCO_wdt videobuf2_v4l2 iTCO_vendor_support videobuf2_core videodev snd_seq_device bcma irqbypass snd_pcm crc32_pclmul crc32c_intel applesmc input_polldev ghash_clmulni_intel pcspkr nfsd i2c_i801 lpc_ich rfkill bcm5974 media mfd_core i2c_smbus joydev snd_timer snd mei_me auth_rpcgss sbs mei tpm_tis sbshc nfs_acl tpm_tis_core lockd tpm apple_bl soundcore grace sunrpc i915 i2c_algo_bit drm_kms_helper drm i2c_core video
[ 1243.494439] CPU: 2 PID: 2223 Comm: tc Not tainted 4.7.0+ #1181
[ 1243.499015] Hardware name: Apple Inc. MacBookAir5,1/Mac-66F35F19FE2A0D05, BIOS MBA51.88Z.00EF.B02.1211271028 11/27/2012
[ 1243.503630] task: ffff8801ec996e00 task.stack: ffff8801ec934000
[ 1243.508311] RIP: 0010:[<ffffffff8167efac>] [<ffffffff8167efac>] qdisc_match_from_root+0x2c/0x70
[ 1243.513207] RSP: 0018:ffff8801ec937ab0 EFLAGS: 00010203
[ 1243.518053] RAX: 0000000000000408 RBX: ffff88025e612000 RCX: ffffffffffffffd8
[ 1243.522893] RDX: 0000000000000000 RSI: 00000000ffff0000 RDI: ffffffff81cf8100
[ 1243.527598] RBP: ffff8801ec937ab0 R08: 000000000001c160 R09: ffff8802668032c0
[ 1243.532378] R10: ffffffff81cf8100 R11: 0000000000000030 R12: 00000000ffff0000
[ 1243.537221] R13: ffff88025e612000 R14: ffffffff81cf3140 R15: 0000000000000000
[ 1243.542051] FS: 00007f24b9af6740(0000) GS:ffff88026f280000(0000) knlGS:0000000000000000
[ 1243.542059] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1243.542062] CR2: 0000000000000410 CR3: 00000001aceec000 CR4: 00000000001406e0
[ 1243.542064] Stack:
[ 1243.542075] ffff8801ec937ad0 ffffffff81681210 ffff88025dd51a00 00000000fffffff1
[ 1243.542082] ffff8801ec937b88 ffffffff81681e4e ffffffff81c42bc0 ffff880262431500
[ 1243.542090] ffffffff81cf3140 ffff88025dd51a10 ffff88025dd51a24 00000000ec937b38
[ 1243.542092] Call Trace:
[ 1243.542104] [<ffffffff81681210>] qdisc_lookup+0x40/0x50
[ 1243.542113] [<ffffffff81681e4e>] tc_modify_qdisc+0x21e/0x550
[ 1243.542125] [<ffffffff8166ae25>] rtnetlink_rcv_msg+0x95/0x220
[ 1243.542136] [<ffffffff81209602>] ? __kmalloc_track_caller+0x172/0x230
[ 1243.542144] [<ffffffff8166ad90>] ? rtnl_newlink+0x870/0x870
[ 1243.542151] [<ffffffff816897b7>] netlink_rcv_skb+0xa7/0xc0
[ 1243.542158] [<ffffffff816657c8>] rtnetlink_rcv+0x28/0x30
[ 1243.542164] [<ffffffff8168919b>] netlink_unicast+0x15b/0x210
[ 1243.542170] [<ffffffff81689569>] netlink_sendmsg+0x319/0x390
[ 1243.542180] [<ffffffff816379f8>] sock_sendmsg+0x38/0x50
[ 1243.542187] [<ffffffff81638296>] ___sys_sendmsg+0x256/0x260
[ 1243.542197] [<ffffffff811b1275>] ? __pagevec_lru_add_fn+0x135/0x280
[ 1243.542206] [<ffffffff811b1a90>] ? pagevec_lru_move_fn+0xd0/0xf0
[ 1243.542214] [<ffffffff811b1140>] ? trace_event_raw_event_mm_lru_insertion+0x180/0x180
[ 1243.542222] [<ffffffff811b1b85>] ? __lru_cache_add+0x75/0xb0
[ 1243.542230] [<ffffffff817708a6>] ? _raw_spin_unlock+0x16/0x40
[ 1243.542237] [<ffffffff811d8dff>] ? handle_mm_fault+0x39f/0x1160
[ 1243.542245] [<ffffffff81638b15>] __sys_sendmsg+0x45/0x80
[ 1243.542254] [<ffffffff81638b62>] SyS_sendmsg+0x12/0x20
[ 1243.542261] [<ffffffff810038e7>] do_syscall_64+0x57/0xb0
[ 1243.542268] [<ffffffff81770fa1>] entry_SYSCALL64_slow_path+0x25/0x25
[ 1243.542357] Code: 1f 44 00 00 f6 47 10 01 55 48 89 e5 75 05 39 77 38 74 48 48 8b 57 48 69 c6 47 86 c8 61 48 8b 12 c1 e8 1c 48 8d 84 c2 d0 03 00 00 <48> 8b 50 08 31 c0 48 8d 4a d8 48 85 d2 48 0f 45 c1 eb 04 48 83
[ 1243.542366] RIP [<ffffffff8167efac>] qdisc_match_from_root+0x2c/0x70
[ 1243.542368] RSP <ffff8801ec937ab0>
[ 1243.542370] CR2: 0000000000000410
[ 1243.565166] ---[ end trace aee041a86366c4d4 ]---
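
The faulting address can be cross-checked against the registers in the oops with a little arithmetic. The sketch below is an editorial aid, not part of the original report: the constants are read off the "Code:" line and the instruction decoding is my own, so treat both as assumptions. The helper names oops_bucket, oops_fault_address and oops_print are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Constants as decoded (by assumption) from the "Code:" bytes of the oops. */
#define OOPS_HASH_MULT  0x61c88647u /* imul $0x61c88647,%esi,%eax */
#define OOPS_HASH_SHIFT 28          /* shr $0x1c,%eax: keeps 4 hash bits */
#define OOPS_TABLE_DISP 0x3d0u      /* lea 0x3d0(%rdx,%rax,8),%rax */
#define OOPS_LOAD_DISP  0x8u        /* faulting load: mov 0x8(%rax),%rdx */

/* Bucket index computed from the handle that was passed in %esi. */
static unsigned int oops_bucket(uint32_t handle)
{
	return (uint32_t)(handle * OOPS_HASH_MULT) >> OOPS_HASH_SHIFT;
}

/* Address the faulting load touches for a given device pointer (%rdx). */
static uint64_t oops_fault_address(uint64_t dev, uint32_t handle)
{
	unsigned int bucket = oops_bucket(handle);

	assert(bucket < 16);
	return dev + OOPS_TABLE_DISP + 8u * bucket + OOPS_LOAD_DISP;
}

/* Print both values for comparison with RAX and CR2 in the report. */
static void oops_print(uint64_t dev, uint32_t handle)
{
	printf("bucket %u, fault address %#llx\n", oops_bucket(handle),
	       (unsigned long long)oops_fault_address(dev, handle));
}
```

Calling oops_print(0, 0xffff0000) with the values from the report (RDX = 0, RSI = 0xffff0000) gives bucket 7 and address 0x410, which matches CR2, while the intermediate 0x3d0 + 7 * 8 = 0x408 matches RAX. That is consistent with the device pointer being NULL when the per-device table is indexed.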

Jiri Kosina

Aug 12, 2016, 10:00:06 AM
On Fri, 12 Aug 2016, Daniel Borkmann wrote:

> This results in the panic below. I tested reverting this patch, and that
> fixes the panic.
>
> Did you also test this with an ingress or clsact qdisc (just try adding
> it to the lo dev, for example)?

Hi Daniel,

thanks for the report. Hmm, I am pretty sure clsact worked for me, but
I'll recheck.

> What happens is the following in qdisc_match_from_root():
>
> [ 995.422187] XXX qdisc:ffff88025e4fc800 queue:ffff880262759000
> dev:ffff880261cc2000 handle:ffff0000
> [ 995.422200] XXX qdisc:ffffffff81cf8100 queue:ffffffff81cf8240 dev:
> (null) handle:ffff0000
>
> I believe this is due to dev_ingress_queue_create() assigning the
> global noop_qdisc instance as qdisc_sleeping, which later qdisc_lookup()
> uses for qdisc_match_from_root().
>
> But anything that uses something like noop_qdisc cannot work with the
> new qdisc_match_from_root(), because qdisc_dev(root) will always trigger
> a NULL pointer dereference there. The reason is that the dev is always
> NULL for noop: it's a singleton, see noop_qdisc and noop_netdev_queue
> in sch_generic.c.
>
> Now how to fix it? Creating separate noop instances each time it's set
> would be quite a waste of memory. Even fuglier would be to hack a static
> net device struct into sch_generic.c and let noop_netdev_queue point there
> to get to the hash table. Or we just don't use qdisc_dev().

How about we actually extend the TCQ_F_BUILTIN special-case test in
qdisc_match_from_root() a little bit?

After the change, the only way qdisc_dev() could be NULL should be the
TCQ_F_BUILTIN case, right?

I was thinking about something like the patch below (the reasoning being
that ->dev would be NULL only in cases of singletonish qdiscs) ...
wouldn't that also fix the issue you're seeing? I have to think it through a
little bit more ..


diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 25aada7..1c9faed 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -260,6 +260,9 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
 {
 	struct Qdisc *q;
 
+	if (!qdisc_dev(root))
+		return (root->handle == handle ? root : NULL);
+
 	if (!(root->flags & TCQ_F_BUILTIN) &&
 	    root->handle == handle)
 		return root;


Thanks!
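
The behaviour of the proposed special case can be tried out in user space with a small model. This is only a sketch under stated assumptions: model_dev, model_queue, model_qdisc, MODEL_F_BUILTIN and model_match_from_root are hypothetical stand-ins, and the per-device hash table is reduced to a single chain to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_F_BUILTIN 0x1u          /* stand-in for TCQ_F_BUILTIN */

struct model_qdisc;

struct model_dev {
	struct model_qdisc *hashed;   /* stand-in for the per-device table */
};

struct model_queue {
	struct model_dev *dev;        /* NULL for the shared singleton queue */
};

struct model_qdisc {
	uint32_t handle;
	unsigned int flags;
	struct model_queue *dev_queue;
	struct model_qdisc *next;
};

/* Model of qdisc_dev(): may legitimately return NULL for a singleton. */
static struct model_dev *model_qdisc_dev(const struct model_qdisc *q)
{
	assert(q && q->dev_queue);
	return q->dev_queue->dev;
}

static struct model_qdisc *model_match_from_root(struct model_qdisc *root,
						 uint32_t handle)
{
	struct model_dev *dev = model_qdisc_dev(root);
	struct model_qdisc *q;

	/* Proposed special case: no device means there is no table to walk. */
	if (!dev)
		return root->handle == handle ? root : NULL;

	if (!(root->flags & MODEL_F_BUILTIN) && root->handle == handle)
		return root;

	for (q = dev->hashed; q; q = q->next)
		if (q->handle == handle)
			return q;
	return NULL;
}
```

In this model, a lookup that starts from a singleton root whose queue has no device returns early instead of touching the device's table, while roots attached to a real device are handled as before.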

Daniel Borkmann

Aug 12, 2016, 10:30:06 AM
Ahh, so this has the same effect as previously observed with the other fix.

Perhaps it's just a dumping issue, but with the clsact qdisc added below, there
shouldn't be any extra pfifo_fast instances appearing.

# tc qdisc show dev wlp2s0b1
qdisc mq 0: root
qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
# tc qdisc add dev wlp2s0b1 clsact
# tc qdisc show dev wlp2s0b1
qdisc mq 0: root
qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc clsact ffff: parent ffff:fff1
qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
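
For readers matching the tc output against the debug lines earlier in the thread: tc handles are 32-bit values that are shown as major:minor in hexadecimal, 16 bits each. The sketch below splits and rebuilds such a value. The helper names are hypothetical; they are meant to mirror what the TC_H_MAJ/TC_H_MIN style macros in the kernel's pkt_sched header do, which is an assumption on my side rather than something quoted from the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Upper 16 bits: the "major" part printed before the colon. */
static uint32_t model_major(uint32_t h)
{
	return h >> 16;
}

/* Lower 16 bits: the "minor" part printed after the colon. */
static uint32_t model_minor(uint32_t h)
{
	return h & 0xffffu;
}

/* Rebuild a 32-bit handle from its two halves. */
static uint32_t model_make(uint32_t major, uint32_t minor)
{
	assert(major <= 0xffffu && minor <= 0xffffu);
	return (major << 16) | minor;
}

/* Print in a major:minor form similar to what tc shows. */
static void model_print(uint32_t h)
{
	printf("%x:%x\n", (unsigned int)model_major(h),
	       (unsigned int)model_minor(h));
}
```

Under this reading, the handle ffff0000 in the debug output corresponds to the "clsact ffff:" line, and the parent ffff:fff1 corresponds to the value fffffff1, which also shows up in the stack dump of the oops.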

Jiri Kosina

Aug 12, 2016, 10:40:25 AM
On Fri, 12 Aug 2016, Daniel Borkmann wrote:

> > I was thinking about something like the patch below (the reasoning being
> > that ->dev would be NULL only in cases of singletonish qdiscs) ...
> > wouldn't that also fix the issue you're seeing? I have to think it
> > through a little bit more ..
>
> Ahh, so this has the same effect as previously observed with the other fix.

Thanks a lot for confirming that this fixes the panic. I still have to
think a little bit more about this though.

> Perhaps it's just a dumping issue, but with the clsact qdisc added below, there
> shouldn't be any extra pfifo_fast instances appearing.
>
> # tc qdisc show dev wlp2s0b1
> qdisc mq 0: root
> qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> # tc qdisc add dev wlp2s0b1 clsact
> # tc qdisc show dev wlp2s0b1
> qdisc mq 0: root
> qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc clsact ffff: parent ffff:fff1
> qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

Hmm, no immediate idea where those are coming from, we'll have to figure
it out. The mq device used here has 4 queues, right?

Daniel Borkmann

Aug 12, 2016, 10:50:07 AM
Yes, the first tc qdisc show is after boot and is how it should normally
look, so 4 tx queues.

# ls /sys/class/net/wlp2s0b1/queues/
rx-0 tx-0 tx-1 tx-2 tx-3

When adding clsact, only the 'qdisc clsact' line should be extra. Given
that the extra pfifo_fast ones look the same as those above, I would suspect a
htab dumping issue, perhaps. I can debug this a bit later tonight.
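
One way a dump could produce exactly this pattern is sketched below. It is a hypothesis that the thread does not confirm: if the dump routine runs once per root (egress and ingress) and each run walks the same per-device table, every hashed qdisc is emitted twice. The names model_dev, model_dump_root and model_dump, and the idea of counting lines instead of printing them, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device model: just enough state to count dumped lines. */
struct model_dev {
	int has_ingress_root;    /* e.g. a clsact qdisc is attached */
	size_t nr_hashed;        /* qdiscs stored in the per-device table */
};

/* Emit one root and, optionally, every qdisc in the per-device table. */
static size_t model_dump_root(const struct model_dev *dev, int walk_table)
{
	size_t lines = 1;        /* the root itself */

	assert(dev != NULL);
	if (walk_table)
		lines += dev->nr_hashed;
	return lines;
}

/* Dump the egress root, then the ingress root if there is one. */
static size_t model_dump(const struct model_dev *dev,
			 int walk_table_for_ingress)
{
	size_t lines = model_dump_root(dev, 1);   /* egress root, e.g. mq */

	if (dev->has_ingress_root)
		lines += model_dump_root(dev, walk_table_for_ingress);
	return lines;
}
```

With four hashed pfifo_fast children, the model yields 5 lines without an ingress root and 10 lines once clsact is attached and the table is walked for both roots, which is the shape of the output quoted in this thread. Walking the table for only one of the roots would yield the expected 6 lines.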

Cong Wang

Aug 14, 2016, 4:30:06 AM
I think this is probably why we never show noop qdisc in dump. So I think
we should relax the singleton rule for noop_qdisc, to save some code
for noop_qdisc case and also for dumping noop_qdisc.

I will try to work on a patch tomorrow.

Jiri Kosina

Aug 15, 2016, 7:30:04 PM
On Sat, 13 Aug 2016, Cong Wang wrote:

> > How about we actually extend the TCQ_F_BUILTIN special-case test in
> > qdisc_match_from_root() a little bit?
> >
> > After the change, the only way qdisc_dev() could be NULL should be the
> > TCQ_F_BUILTIN case, right?
> >
> > I was thinking about something like the patch below (the reasoning being
> > that ->dev would be NULL only in cases of singletonish qdiscs) ...
> > wouldn't that also fix the issue you're seeing? I have to think it through a
> > little bit more ..
>
> I think this is probably why we never show noop qdisc in dump.

Well, partially. A lot of 'default' qdiscs are omitted in a not really
uniform and deterministic way. That's actually the primary point of this
whole effort -- to get rid of the hidden qdiscs entirely.

> So I think we should relax the singleton rule for noop_qdisc, to save
> some code for noop_qdisc case and also for dumping noop_qdisc.

Completely moving away from singleton qdiscs is one of the possibilities,
but OTOH I think that my special-casing of !qdisc_dev(root) in
qdisc_match_from_root() is the correct handling of singletons. I've been
completely off the grid for the past three days, but I plan to submit this
as a proper followup fix tomorrow if no one has any objections.

> I will try to work on a patch tomorrow.

What still needs to be looked into are the duplicate clsact entries for
multiqueue.

Thanks,