Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Re: [GIT] Networking

106 views
Skip to first unread message

Linus Torvalds

unread,
May 1, 2013, 9:30:01 PM5/1/13
to
On Wed, May 1, 2013 at 1:47 PM, David Miller <da...@davemloft.net> wrote:
>
> Highlights (1721 non-merge commits, this has to be a record of some
> sort):

Lowlight: it completely breaks my machine with r8169 ethernet. In the
networkmanager applet, it claims no cable connection, which is a bit
odd, because (a) it works with an older kernel and (b) the kernel
messages actually say

r8169 0000:01:00.0 eth0: link up

but nothing actually works.

r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:01:00.0 eth0: RTL8168d/8111d at 0xffffc90010ece000,
e0:cb:4e:95:1a:d7, XID 083000c0 IRQ 53
r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx
checksumming: ko]

Any ideas?

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linus Torvalds

unread,
May 2, 2013, 12:40:02 AM5/2/13
to
On Wed, May 1, 2013 at 6:28 PM, Linus Torvalds
<torv...@linux-foundation.org> wrote:
>
> Lowlight: it completely breaks my machine with r8169 ethernet. In the
> networkmanager applet, it claims no cable connection, which is a bit
> odd, because (a) it works with an older kernel and (b) the kernel
> messages actually say
>
> r8169 0000:01:00.0 eth0: link up
>
> but nothing actually works.
>
> r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> r8169 0000:01:00.0 eth0: RTL8168d/8111d at 0xffffc90010ece000,
> e0:cb:4e:95:1a:d7, XID 083000c0 IRQ 53
> r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx
> checksumming: ko]
>
> Any ideas?

Hmm. I bisected it.

And my machine is broken by commit 8ad227ff89a7 ("net: vlan: add
802.1ad support"). I booted several times just to make sure, and it's
consistent. The previous commit (86a9bad3ab6b) works fine, and
8ad227ff89a7 is broken.

I don't see what could be broken in that commit, and I'd *like* to
just revert it on top of current -git, but that causes problems
("error: ‘NETIF_F_HW_VLAN_STAG_TX_BIT’ undeclared"), so I can't just
do a straight revert to double-check with the current tree state. But
the bisection was very straightforward, and as mentioned, I checked
that boundary several times just because it looked so odd.

Maybe somebody who knows the code better just goes "Duh!". Anybody?

Linus Torvalds

unread,
May 2, 2013, 1:00:01 AM5/2/13
to
On Wed, May 1, 2013 at 9:37 PM, Linus Torvalds
<torv...@linux-foundation.org> wrote:
>
> I don't see what could be broken in that commit, and I'd *like* to
> just revert it on top of current -git, but that causes problems
> ("error: ‘NETIF_F_HW_VLAN_STAG_TX_BIT’ undeclared"), so I can't just
> do a straight revert to double-check with the current tree state. But
> the bisection was very straightforward, and as mentioned, I checked
> that boundary several times just because it looked so odd.

Ok, this is just f*cking odd.

So I first tried to revert commit 8ad227ff89a7 but leave the new
*_HW_VLAN_STAG_* bit definitions in place so that it would compile
without that error. That still resulted in a non-working network.

So then I start getting desperate, and say to myself "maybe the bit
positions matter". So do a full revert (so that those bits are no
longer enumerated), and then to make things compile for me I comment
out the uses I hit in my build:

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 5a934ef90f8b..df019a7ab51e 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -64,9 +64,11 @@ static const char
netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GST

[NETIF_F_HW_VLAN_CTAG_RX_BIT] = "rx-vlan-ctag-hw-parse",
[NETIF_F_HW_VLAN_CTAG_FILTER_BIT] = "rx-vlan-ctag-filter",
+#if 0
[NETIF_F_HW_VLAN_STAG_TX_BIT] = "tx-vlan-stag-hw-insert",
[NETIF_F_HW_VLAN_STAG_RX_BIT] = "rx-vlan-stag-hw-parse",
[NETIF_F_HW_VLAN_STAG_FILTER_BIT] = "rx-vlan-stag-filter",
+#endif
[NETIF_F_VLAN_CHALLENGED_BIT] = "vlan-challenged",
[NETIF_F_GSO_BIT] = "tx-generic-segmentation",
[NETIF_F_LLTX_BIT] = "tx-lockless",

and guess what? I have working networking again.

So either this is some very odd heisenbug (but quite frankly, it
bisected perfectly, and reverting it *does* fix it with the above
addition), or the bit positions for those NETIF constants matter.

I think the positions of those bits matter, and adding
NETIF_F_HW_VLAN_STAG_*_BIT randomly in the middle broke things. That's
backed up by the fact that we have things like

__UNUSED_NETIF_F_1

and

/**/NETIF_F_GSO_SHIFT, /* keep the order of SKB_GSO_* bits */
NETIF_F_TSO_BIT /* ... TCPv4 segmentation */
= NETIF_F_GSO_SHIFT,

in that array. There is some ordering, and there is some meaning to
the bit numbers, and adding the *_STAG_* bits in the middle broke some
subtle dependency.

That's as far as I'm going to be able to debug this. I've pinpointed
the commit, and I think I've pinpointed the approximate cause. Pls get
my networking going again without my disgusting local hack..

David Miller

unread,
May 2, 2013, 1:30:01 AM5/2/13
to
From: Linus Torvalds <torv...@linux-foundation.org>
Date: Wed, 1 May 2013 18:28:38 -0700

> On Wed, May 1, 2013 at 1:47 PM, David Miller <da...@davemloft.net> wrote:
>>
>> Highlights (1721 non-merge commits, this has to be a record of some
>> sort):
>
> Lowlight: it completely breaks my machine with r8169 ethernet. In the
> networkmanager applet, it claims no cable connection, which is a bit
> odd, because (a) it works with an older kernel and (b) the kernel
> messages actually say
>
> r8169 0000:01:00.0 eth0: link up
>
> but nothing actually works.
>
> r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> r8169 0000:01:00.0 eth0: RTL8168d/8111d at 0xffffc90010ece000,
> e0:cb:4e:95:1a:d7, XID 083000c0 IRQ 53
> r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx
> checksumming: ko]
>
> Any ideas?

I'll take a look at this first thing tomorrow if someone doesn't beat
me to it.

David Miller

unread,
May 2, 2013, 2:50:02 AM5/2/13
to
From: Linus Torvalds <torv...@linux-foundation.org>
Date: Wed, 1 May 2013 21:55:38 -0700

> I think the positions of those bits matter, and adding
> NETIF_F_HW_VLAN_STAG_*_BIT randomly in the middle broke things. That's
> backed up by the fact that we have things like
>
> __UNUSED_NETIF_F_1
>
> and
>
> /**/NETIF_F_GSO_SHIFT, /* keep the order of SKB_GSO_* bits */
> NETIF_F_TSO_BIT /* ... TCPv4 segmentation */
> = NETIF_F_GSO_SHIFT,
>
> in that array. There is some ordering, and there is some meaning to
> the bit numbers, and adding the *_STAG_* bits in the middle broke some
> subtle dependency.

The other thing this does is it pushes some bits past bit 31.

netdev_features_t, which holds these masks, is 64-bit but we've
already seen one place in a driver where a 32-bit value was being
used.

I'll look more deeply into this, thanks.

Francois Romieu

unread,
May 2, 2013, 2:50:02 AM5/2/13
to
Linus Torvalds <torv...@linux-foundation.org> :
[...]
> Any ideas?

1a9646497b163a8b9da5e70008d809dc91b32855 ("r8169: adjust the flow of hw_start"),
eee3786f7d3134e3edc54c1134511d520dd74285 ("r8169: Modify the method for setting
firmware") and 0427d0152eb3c2c2712afa427dd593c68fc09299
("r8169: Remove firmware code") may play a role on your 8168D setup, either by
themselves or combined with something else.

Expect no sign of life from me until the late end of the european working
hours.

--
Ueimor

Patrick McHardy

unread,
May 2, 2013, 3:20:02 AM5/2/13
to
On Thu, May 02, 2013 at 02:45:52AM -0400, David Miller wrote:
> From: Linus Torvalds <torv...@linux-foundation.org>
> Date: Wed, 1 May 2013 21:55:38 -0700
>
> > I think the positions of those bits matter, and adding
> > NETIF_F_HW_VLAN_STAG_*_BIT randomly in the middle broke things. That's
> > backed up by the fact that we have things like
> >
> > __UNUSED_NETIF_F_1
> >
> > and
> >
> > /**/NETIF_F_GSO_SHIFT, /* keep the order of SKB_GSO_* bits */
> > NETIF_F_TSO_BIT /* ... TCPv4 segmentation */
> > = NETIF_F_GSO_SHIFT,
> >
> > in that array. There is some ordering, and there is some meaning to
> > the bit numbers, and adding the *_STAG_* bits in the middle broke some
> > subtle dependency.
>
> The other thing this does is it pushes some bits past bit 31.
>
> netdev_features_t, which holds these masks, is 64-bit but we've
> already seen one place in a driver where a 32-bit value was being
> used.
>
> I'll look more deeply into this, thanks.

I'll also have a look at this.

David Miller

unread,
May 2, 2013, 4:20:02 AM5/2/13
to
From: Patrick McHardy <ka...@trash.net>
Date: Thu, 2 May 2013 09:03:37 +0200

> I'll also have a look at this.

By the mere existence of /sys/devices/${DEV_PATH}/net/${netdev_name}/flags
we have to preserve the bit layout.

So Linus was right.

So network manager is probably reading that flags sysfs file and
interpreting it.

I'll fix the layout to how it was before.

Patrick McHardy

unread,
May 2, 2013, 4:40:02 AM5/2/13
to
On Thu, May 02, 2013 at 04:16:25AM -0400, David Miller wrote:
> From: Patrick McHardy <ka...@trash.net>
> Date: Thu, 2 May 2013 09:03:37 +0200
>
> > I'll also have a look at this.
>
> By the mere existence of /sys/devices/${DEV_PATH}/net/${netdev_name}/flags
> we have to preserve the bit layout.
>
> So Linus was right.
>
> So network manager is probably reading that flags sysfs file and
> interpreting it.

Right, that seems plausible.

> I'll fix the layout to how it was before.

I also found one spot in net/core/dev.c which was using an int for the
features. Patch attached.


01.patch

David Miller

unread,
May 2, 2013, 4:40:02 AM5/2/13
to

Commit 8ad227ff89a7e6f05d07cd0acfd95ed3a24450ca ("net: vlan: add
802.1ad support") added some new NETIF_F_* features bits, but it
added them in the middle of existing values.

Userland depends upon the flag bits via the per-netdevice 'flags'
sysfs file.

So restore the previous ordering by adding the new flags at the
end.

Reported-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: David S. Miller <da...@davemloft.net>
---

Linus, just apply this to your tree directly if it works for you.

It obviously should since this is essentially what you tested already.

Thanks.

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index cbaa027..09906b7 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -25,9 +25,6 @@ enum {
NETIF_F_HW_VLAN_CTAG_TX_BIT, /* Transmit VLAN CTAG HW acceleration */
NETIF_F_HW_VLAN_CTAG_RX_BIT, /* Receive VLAN CTAG HW acceleration */
NETIF_F_HW_VLAN_CTAG_FILTER_BIT,/* Receive filtering on VLAN CTAGs */
- NETIF_F_HW_VLAN_STAG_TX_BIT, /* Transmit VLAN STAG HW acceleration */
- NETIF_F_HW_VLAN_STAG_RX_BIT, /* Receive VLAN STAG HW acceleration */
- NETIF_F_HW_VLAN_STAG_FILTER_BIT,/* Receive filtering on VLAN STAGs */
NETIF_F_VLAN_CHALLENGED_BIT, /* Device cannot handle VLAN packets */
NETIF_F_GSO_BIT, /* Enable software GSO. */
NETIF_F_LLTX_BIT, /* LockLess TX - deprecated. Please */
@@ -59,6 +56,9 @@ enum {
NETIF_F_LOOPBACK_BIT, /* Enable loopback */
NETIF_F_RXFCS_BIT, /* Append FCS to skb pkt data */
NETIF_F_RXALL_BIT, /* Receive errored frames too */
+ NETIF_F_HW_VLAN_STAG_TX_BIT, /* Transmit VLAN STAG HW acceleration */
+ NETIF_F_HW_VLAN_STAG_RX_BIT, /* Receive VLAN STAG HW acceleration */
+ NETIF_F_HW_VLAN_STAG_FILTER_BIT,/* Receive filtering on VLAN STAGs */

/*
* Add your fresh new feature above and remember to update
@@ -86,9 +86,6 @@ enum {
#define NETIF_F_HW_VLAN_CTAG_FILTER __NETIF_F(HW_VLAN_CTAG_FILTER)
#define NETIF_F_HW_VLAN_CTAG_RX __NETIF_F(HW_VLAN_CTAG_RX)
#define NETIF_F_HW_VLAN_CTAG_TX __NETIF_F(HW_VLAN_CTAG_TX)
-#define NETIF_F_HW_VLAN_STAG_FILTER __NETIF_F(HW_VLAN_STAG_FILTER)
-#define NETIF_F_HW_VLAN_STAG_RX __NETIF_F(HW_VLAN_STAG_RX)
-#define NETIF_F_HW_VLAN_STAG_TX __NETIF_F(HW_VLAN_STAG_TX)
#define NETIF_F_IP_CSUM __NETIF_F(IP_CSUM)
#define NETIF_F_IPV6_CSUM __NETIF_F(IPV6_CSUM)
#define NETIF_F_LLTX __NETIF_F(LLTX)
@@ -110,6 +107,9 @@ enum {
#define NETIF_F_RXALL __NETIF_F(RXALL)
#define NETIF_F_GSO_GRE __NETIF_F(GSO_GRE)
#define NETIF_F_GSO_UDP_TUNNEL __NETIF_F(GSO_UDP_TUNNEL)
+#define NETIF_F_HW_VLAN_STAG_FILTER __NETIF_F(HW_VLAN_STAG_FILTER)
+#define NETIF_F_HW_VLAN_STAG_RX __NETIF_F(HW_VLAN_STAG_RX)
+#define NETIF_F_HW_VLAN_STAG_TX __NETIF_F(HW_VLAN_STAG_TX)

/* Features valid for ethtool to change */
/* = all defined minus driver/device-class-related */

Bjørn Mork

unread,
May 2, 2013, 5:10:02 AM5/2/13
to
And a couple more attached.

I am also wondering about the consequences of the
ETHTOOL_DEV_FEATURE_WORDS calculation in ethtool.c. Adding the new
netdev features will make it go from 1 to 2:

#define ETHTOOL_DEV_FEATURE_WORDS ((NETDEV_FEATURE_COUNT + 31) / 32)


But this constant seems to be part of the userspace API AFAICS, so it
cannot just change like that:

static int ethtool_set_features(struct net_device *dev, void __user *useraddr)
{
struct ethtool_sfeatures cmd;
struct ethtool_set_features_block features[ETHTOOL_DEV_FEATURE_WORDS];
netdev_features_t wanted = 0, valid = 0;
int i, ret = 0;

if (copy_from_user(&cmd, useraddr, sizeof(cmd)))
return -EFAULT;
useraddr += sizeof(cmd);

if (cmd.size != ETHTOOL_DEV_FEATURE_WORDS)
return -EINVAL;

..


Is this correctly analyzed? If so, then I have no clue how to fix
that...



Bjørn


0001-net-vlan-ethtool-netdev_features_t-is-more-than-32-b.patch

David Miller

unread,
May 2, 2013, 5:20:01 AM5/2/13
to
From: Bjørn Mork <bj...@mork.no>
Date: Thu, 02 May 2013 11:06:42 +0200

> Adding the new netdev features will make it go from 1 to 2:

We already had more than 31 feature bits before Patrick's
changes, and I'm pretty sure this was the case when we added
that ethtool API.

Bjørn Mork

unread,
May 2, 2013, 6:30:01 AM5/2/13
to
David Miller <da...@davemloft.net> writes:
> From: Bjørn Mork <bj...@mork.no>
> Date: Thu, 02 May 2013 11:06:42 +0200
>
>> Adding the new netdev features will make it go from 1 to 2:
>
> We already had more than 31 feature bits before Patrick's
> changes, and I'm pretty sure this was the case when we added
> that ethtool API.

Oh, thanks. Then I misunderstood the issue. This is one case where I'm
happy to be wrong :)


Bjørn

Ben Hutchings

unread,
May 2, 2013, 6:30:02 AM5/2/13
to
On Thu, 2013-05-02 at 05:17 -0400, David Miller wrote:
> From: Bjørn Mork <bj...@mork.no>
> Date: Thu, 02 May 2013 11:06:42 +0200
>
> > Adding the new netdev features will make it go from 1 to 2:
>
> We already had more than 31 feature bits before Patrick's
> changes, and I'm pretty sure this was the case when we added
> that ethtool API.

It wasn't, but this should be OK. Userland is supposed to query the
number of features using ETHTOOL_GSSET_INFO and then work out the number
of words/blocks using FEATURE_BITS_TO_BLOCKS().

Ben.

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

Bjørn Mork

unread,
May 2, 2013, 8:00:01 AM5/2/13
to
Ben Hutchings <bhutc...@solarflare.com> writes:
> On Thu, 2013-05-02 at 05:17 -0400, David Miller wrote:
>> From: Bjørn Mork <bj...@mork.no>
>> Date: Thu, 02 May 2013 11:06:42 +0200
>>
>> > Adding the new netdev features will make it go from 1 to 2:
>>
>> We already had more than 31 feature bits before Patrick's
>> changes, and I'm pretty sure this was the case when we added
>> that ethtool API.
>
> It wasn't, but this should be OK. Userland is supposed to query the
> number of features using ETHTOOL_GSSET_INFO and then work out the number
> of words/blocks using FEATURE_BITS_TO_BLOCKS().


Looking at
http://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/src/platform/nm-linux-platform.c#n1025
there seems to be a couple of bugs in this area. This is certainly
abusing the exported API, but it does mean that NM breaks if you ever
move NETIF_F_VLAN_CHALLENGED (like the 802.1ad patch did):

----
#define NETIF_F_VLAN_CHALLENGED (1 << 10)

static gboolean
link_supports_vlans (NMPlatform *platform, int ifindex)
{
auto_nl_object struct rtnl_link *rtnllink = link_get (platform, ifindex);
const char *name = nm_platform_link_get_name (ifindex);
struct {
struct ethtool_gfeatures features;
struct ethtool_get_features_block features_block;
} edata = { .features = { .cmd = ETHTOOL_GFEATURES, .size = 1 } };

/* Only ARPHRD_ETHER links can possibly support VLANs. */
if (!rtnllink || rtnl_link_get_arptype (rtnllink) != ARPHRD_ETHER)
return FALSE;

if (!name || !ethtool_get (name, &edata))
return FALSE;

return !(edata.features.features[0].active & NETIF_F_VLAN_CHALLENGED);
}
----


Not that I see how this particular bug matters unless you need VLAN
support in NM. But there could be similar issues around. I guess
avoiding unnecessary renumbering of the NETIF_F bits can save us some
trouble. Although you can certainly argue that those bits never were
intended to be part of the API, and that using them like this is a user
application bug.



Bjørn

Michał Mirosław

unread,
May 2, 2013, 12:30:03 PM5/2/13
to
2013/5/2 Bjørn Mork <bj...@mork.no>:
This is certainly a bug in NM, and a fresh one: commit
b636ea86b1c0a28b77eda311c84d3b2417cad22e from 2013-04-10 14:40:58
(GMT). Userspace is expected to use ETHTOOL_GSTRINGS for
ETH_SS_FEATURES and find a corresponding bit position by feature name
("vlan-challenged" in this case).

Cc: commit's author.

Best Regards,
Michał Mirosław

Dan Williams

unread,
May 2, 2013, 12:30:03 PM5/2/13
to
On Thu, 2013-05-02 at 13:51 +0200, Bjørn Mork wrote:
> Ben Hutchings <bhutc...@solarflare.com> writes:
> > On Thu, 2013-05-02 at 05:17 -0400, David Miller wrote:
> >> From: Bjørn Mork <bj...@mork.no>
> >> Date: Thu, 02 May 2013 11:06:42 +0200
> >>
> >> > Adding the new netdev features will make it go from 1 to 2:
> >>
> >> We already had more than 31 feature bits before Patrick's
> >> changes, and I'm pretty sure this was the case when we added
> >> that ethtool API.
> >
> > It wasn't, but this should be OK. Userland is supposed to query the
> > number of features using ETHTOOL_GSSET_INFO and then work out the number
> > of words/blocks using FEATURE_BITS_TO_BLOCKS().
>
>
> Looking at
> http://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/src/platform/nm-linux-platform.c#n1025
> there seems to be a couple of bugs in this area. This is certainly
> abusing the exported API, but it does mean that NM breaks if you ever
> move NETIF_F_VLAN_CHALLENGED (like the 802.1ad patch did):

NM doesn't actually use any of the code in src/platform/ yet, so it
wouldn't be affecting anything NM does at this time. However, comments
like this are quite useful so we can fix it before NM does start to
depend on the code :)

Dan
> To unsubscribe from this list: send the line "unsubscribe netdev" in

Dan Williams

unread,
May 2, 2013, 12:40:02 PM5/2/13
to
On Wed, 2013-05-01 at 18:28 -0700, Linus Torvalds wrote:
> On Wed, May 1, 2013 at 1:47 PM, David Miller <da...@davemloft.net> wrote:
> >
> > Highlights (1721 non-merge commits, this has to be a record of some
> > sort):
>
> Lowlight: it completely breaks my machine with r8169 ethernet. In the
> networkmanager applet, it claims no cable connection, which is a bit
> odd, because (a) it works with an older kernel and (b) the kernel
> messages actually say
>
> r8169 0000:01:00.0 eth0: link up

NM calls ETHTOOL_GLINK and if that returns success, NM expects the
driver to support carrier detection. NM then listens to netlink for
device flags changes, and uses IFF_LOWER_UP to determine carrier on/off
state. NM does not use NETIF_F_VLAN_CHALLENGED or any of the other
ETHTOOL_GFEATURES flags yet, but will in the future.

Is the link status accurately reflected by /sys/class/net/eth0/carrier ?

Dan


> but nothing actually works.
>
> r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> r8169 0000:01:00.0 eth0: RTL8168d/8111d at 0xffffc90010ece000,
> e0:cb:4e:95:1a:d7, XID 083000c0 IRQ 53
> r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx
> checksumming: ko]
>
> Any ideas?
>
> Linus
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in

David Miller

unread,
May 2, 2013, 1:50:02 PM5/2/13
to
From: Dan Williams <dc...@redhat.com>
Date: Thu, 02 May 2013 11:34:27 -0500

> On Wed, 2013-05-01 at 18:28 -0700, Linus Torvalds wrote:
>> On Wed, May 1, 2013 at 1:47 PM, David Miller <da...@davemloft.net> wrote:
>> >
>> > Highlights (1721 non-merge commits, this has to be a record of some
>> > sort):
>>
>> Lowlight: it completely breaks my machine with r8169 ethernet. In the
>> networkmanager applet, it claims no cable connection, which is a bit
>> odd, because (a) it works with an older kernel and (b) the kernel
>> messages actually say
>>
>> r8169 0000:01:00.0 eth0: link up
>
> NM calls ETHTOOL_GLINK and if that returns success, NM expects the
> driver to support carrier detection. NM then listens to netlink for
> device flags changes, and uses IFF_LOWER_UP to determine carrier on/off
> state. NM does not use NETIF_F_VLAN_CHALLENGED or any of the other
> ETHTOOL_GFEATURES flags yet, but will in the future.
>
> Is the link status accurately reflected by /sys/class/net/eth0/carrier ?

Something cares about the .../eth0/flags value because with the bit
ordering different for bits 10 and above things break.

David Miller

unread,
May 2, 2013, 2:00:02 PM5/2/13
to
From: Michał Mirosław <mir...@gmail.com>
Date: Thu, 2 May 2013 19:47:25 +0200

> 2013/5/2 David Miller <da...@davemloft.net>:
>> From: Dan Williams <dc...@redhat.com>
>> Date: Thu, 02 May 2013 11:34:27 -0500
>>
>>> On Wed, 2013-05-01 at 18:28 -0700, Linus Torvalds wrote:
>>>> On Wed, May 1, 2013 at 1:47 PM, David Miller <da...@davemloft.net> wrote:
>>>> >
>>>> > Highlights (1721 non-merge commits, this has to be a record of some
>>>> > sort):
>>>>
>>>> Lowlight: it completely breaks my machine with r8169 ethernet. In the
>>>> networkmanager applet, it claims no cable connection, which is a bit
>>>> odd, because (a) it works with an older kernel and (b) the kernel
>>>> messages actually say
>>>>
>>>> r8169 0000:01:00.0 eth0: link up
>>>
>>> NM calls ETHTOOL_GLINK and if that returns success, NM expects the
>>> driver to support carrier detection. NM then listens to netlink for
>>> device flags changes, and uses IFF_LOWER_UP to determine carrier on/off
>>> state. NM does not use NETIF_F_VLAN_CHALLENGED or any of the other
>>> ETHTOOL_GFEATURES flags yet, but will in the future.
>>>
>>> Is the link status accurately reflected by /sys/class/net/eth0/carrier ?
>> Something cares about the .../eth0/flags value because with the bit
>> ordering different for bits 10 and above things break.
>
> Are you sure it's "flags"? /sys/class/net/*/features were removed some
> time ago, and flags don't depend on NETIF_F_*.

Oh, it's flags, which is printed in raw hex by net/core/net-sysfs.c:

NETDEVICE_SHOW(flags, fmt_hex);

David Miller

unread,
May 2, 2013, 2:00:02 PM5/2/13
to
From: Patrick McHardy <ka...@trash.net>
Date: Thu, 2 May 2013 10:36:49 +0200

> commit d0ed96507b21da2668766207f68518bef3193111
> Author: Patrick McHardy <ka...@trash.net>
> Date: Thu May 2 10:35:02 2013 +0200
>
> net: use netdev_features_t in skb_needs_linearize()
>
> Signed-off-by: Patrick McHardy <ka...@trash.net>

Applied and queued up for -stable, thanks.

Michał Mirosław

unread,
May 2, 2013, 2:00:03 PM5/2/13
to
2013/5/2 David Miller <da...@davemloft.net>:
> From: Dan Williams <dc...@redhat.com>
> Date: Thu, 02 May 2013 11:34:27 -0500
>
>> On Wed, 2013-05-01 at 18:28 -0700, Linus Torvalds wrote:
>>> On Wed, May 1, 2013 at 1:47 PM, David Miller <da...@davemloft.net> wrote:
>>> >
>>> > Highlights (1721 non-merge commits, this has to be a record of some
>>> > sort):
>>>
>>> Lowlight: it completely breaks my machine with r8169 ethernet. In the
>>> networkmanager applet, it claims no cable connection, which is a bit
>>> odd, because (a) it works with an older kernel and (b) the kernel
>>> messages actually say
>>>
>>> r8169 0000:01:00.0 eth0: link up
>>
>> NM calls ETHTOOL_GLINK and if that returns success, NM expects the
>> driver to support carrier detection. NM then listens to netlink for
>> device flags changes, and uses IFF_LOWER_UP to determine carrier on/off
>> state. NM does not use NETIF_F_VLAN_CHALLENGED or any of the other
>> ETHTOOL_GFEATURES flags yet, but will in the future.
>>
>> Is the link status accurately reflected by /sys/class/net/eth0/carrier ?
> Something cares about the .../eth0/flags value because with the bit
> ordering different for bits 10 and above things break.

Are you sure it's "flags"? /sys/class/net/*/features were removed some
time ago, and flags don't depend on NETIF_F_*.

Best Regards,
Michał Mirosław

David Miller

unread,
May 2, 2013, 2:10:01 PM5/2/13
to
From: David Miller <da...@davemloft.net>
Date: Thu, 02 May 2013 13:55:42 -0400 (EDT)

> From: Michał Mirosław <mir...@gmail.com>
> Date: Thu, 2 May 2013 19:47:25 +0200
>
>> Are you sure it's "flags"? /sys/class/net/*/features were removed some
>> time ago, and flags don't depend on NETIF_F_*.
>
> Oh, it's flags, which is printed in raw hex by net/core/net-sysfs.c:
>
> NETDEVICE_SHOW(flags, fmt_hex);

Oh you're right, I'm confusing netdev->flags with netdev->features, my bad.

More coffee needed :)

David Miller

unread,
May 2, 2013, 2:10:01 PM5/2/13
to
From: Bjørn Mork <bj...@mork.no>
Date: Thu, 02 May 2013 11:06:42 +0200

> From d957cf339bf625869c39d852ac6733ef597ecef9 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bj...@mork.no>
> Date: Thu, 2 May 2013 10:37:05 +0200
> Subject: [PATCH] net: vlan,ethtool: netdev_features_t is more than 32 bit
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Signed-off-by: Bjørn Mork <bj...@mork.no>

Also applied and queued up for -stable.

These changes show me that this special type isn't providing type
safety in the way that we actually need it.

Something like how we do the MM page table types would work better:

typedef struct { u64 val; } netdev_features_t;

#define __netdev_feature(X) ((netdev_features_t) { X } )

and also with the appropriate set of accessors.

Then you can't get it wrong without a compile error.

But this is net-next material of course.

Dan Williams

unread,
May 2, 2013, 2:20:01 PM5/2/13
to
On Thu, 2013-05-02 at 14:06 -0400, David Miller wrote:
> From: David Miller <da...@davemloft.net>
> Date: Thu, 02 May 2013 13:55:42 -0400 (EDT)
>
> > From: Michał Mirosław <mir...@gmail.com>
> > Date: Thu, 2 May 2013 19:47:25 +0200
> >
> >> Are you sure it's "flags"? /sys/class/net/*/features were removed some
> >> time ago, and flags don't depend on NETIF_F_*.
> >
> > Oh, it's flags, which is printed in raw hex by net/core/net-sysfs.c:
> >
> > NETDEVICE_SHOW(flags, fmt_hex);
>
> Oh you're right, I'm confusing netdev->flags with netdev->features, my bad.

Was just going to ask about that. NM only cares about 'flags', not
features, at this point in time. Relevant code for carrier notification
in NM, using libnl helpers:

link_obj = (struct rtnl_link *) obj;
flags = rtnl_link_get_flags (link_obj);
ifidx = rtnl_link_get_ifindex (link_obj);

nm_log_dbg (LOGD_HW, "netlink link message: iface idx %d flags 0x%X",
ifidx, flags);

if (flags & IFF_LOWER_UP)
g_signal_emit (self, signals[CARRIER_ON], 0, ifidx);
else
g_signal_emit (self, signals[CARRIER_OFF], 0, ifidx);

So it's got to be something else other than the netdev features ordering
that's screwing up for Linus. I'm sure we'd have heard about it a long,
long time ago if something had messed up dev->flags bits...

Dan

Dan Williams

unread,
May 2, 2013, 2:30:02 PM5/2/13
to
On Thu, 2013-05-02 at 11:34 -0500, Dan Williams wrote:
> On Wed, 2013-05-01 at 18:28 -0700, Linus Torvalds wrote:
> > On Wed, May 1, 2013 at 1:47 PM, David Miller <da...@davemloft.net> wrote:
> > >
> > > Highlights (1721 non-merge commits, this has to be a record of some
> > > sort):
> >
> > Lowlight: it completely breaks my machine with r8169 ethernet. In the
> > networkmanager applet, it claims no cable connection, which is a bit
> > odd, because (a) it works with an older kernel and (b) the kernel
> > messages actually say
> >
> > r8169 0000:01:00.0 eth0: link up
>
> NM calls ETHTOOL_GLINK and if that returns success, NM expects the
> driver to support carrier detection. NM then listens to netlink for
> device flags changes, and uses IFF_LOWER_UP to determine carrier on/off
> state. NM does not use NETIF_F_VLAN_CHALLENGED or any of the other
> ETHTOOL_GFEATURES flags yet, but will in the future.
>
> Is the link status accurately reflected by /sys/class/net/eth0/carrier ?

One more question Linus; what version of NetworkManager, and can you
grab some syslog 'daemon' facility logs grepped for "
NetworkManger" | "eth0"?

When r8169 says "link up" I would expect
that /sys/class/net/eth0/carrier is '1', and that NetworkManager then
prints something in the logs about "carrier is ON". Output of the eth0
section of 'nm-tool' would be useful here as well, specifically the
"State" and "Carrier" lines.

Dan

David Miller

unread,
May 2, 2013, 2:30:03 PM5/2/13
to
From: Dan Williams <dc...@redhat.com>
Date: Thu, 02 May 2013 13:15:32 -0500

> So it's got to be something else other than the netdev features ordering
> that's screwing up for Linus. I'm sure we'd have heard about it a long,
> long time ago if something had messed up dev->flags bits...

We know for a fact that reordering the feature bits makes the link not
show as up in NM.

And we know that, when this happens, the driver and the kernel
internally both know that the link is up.

But how this ties into NM being confused about the link state is what
we're trying to find out. It's still a mystery to me too.

Linus Torvalds

unread,
May 2, 2013, 3:00:05 PM5/2/13
to
On Thu, May 2, 2013 at 11:24 AM, Dan Williams <dc...@redhat.com> wrote:
>
> One more question Linus; what version of NetworkManager,

NetworkManager-0.9.8.1-1.git20130327.fc18.x86_64

> and can you
> grab some syslog 'daemon' facility logs grepped for "
> NetworkManger" | "eth0"?

Here's the NetworkManager part of /var/log/messages from a bad boot:

NetworkManager[540]: <info> WEXT support is enabled
NetworkManager[540]: <info> VPN: loaded org.freedesktop.NetworkManager.openvpn
NetworkManager[540]: <info> VPN: loaded org.freedesktop.NetworkManager.vpnc
NetworkManager[540]: <info> VPN: loaded
org.freedesktop.NetworkManager.openconnect
NetworkManager[540]: <info> VPN: loaded org.freedesktop.NetworkManager.pptp
NetworkManager[540]: ifcfg-rh: Acquired D-Bus service com.redhat.ifcfgrh1
NetworkManager[540]: <info> Loaded plugin ifcfg-rh: (c) 2007 - 2010
Red Hat, Inc. To report bugs please use the NetworkManager mailing
list.
NetworkManager[540]: <info> Loaded plugin keyfile: (c) 2007 - 2010
Red Hat, Inc. To report bugs please use the NetworkManager mailing
list.
NetworkManager[540]: ifcfg-rh: parsing
/etc/sysconfig/network-scripts/ifcfg-lo ...
NetworkManager[540]: ifcfg-rh: parsing
/etc/sysconfig/network-scripts/ifcfg-eth0 ...
NetworkManager[540]: ifcfg-rh: read connection 'System eth0'
NetworkManager[540]: <info> monitoring kernel firmware directory
'/lib/firmware'.
NetworkManager[540]: <info> WiFi enabled by radio killswitch;
enabled by state file
NetworkManager[540]: <info> WWAN enabled by radio killswitch;
enabled by state file
NetworkManager[540]: <info> WiMAX enabled by radio killswitch;
enabled by state file
NetworkManager[540]: <info> Networking is enabled by state file
NetworkManager[540]: <warn> failed to allocate link cache: (-10)
Operation not supported
NetworkManager[540]: <info> (eth0): carrier is OFF
NetworkManager[540]: <info> (eth0): new Ethernet device (driver:
'r8169' ifindex: 2)
NetworkManager[540]: <info> (eth0): exported as
/org/freedesktop/NetworkManager/Devices/0
NetworkManager[540]: <info> (eth0): device state change: unmanaged
-> unavailable (reason 'managed') [10 20 2]
NetworkManager[540]: <info> (eth0): bringing up device.
NetworkManager[540]: <info> (eth0): preparing device.
NetworkManager[540]: <info> (eth0): deactivating device (reason 'managed') [2]
NetworkManager[540]: <warn> /sys/devices/virtual/net/lo: couldn't
determine device driver; ignoring...
NetworkManager[540]: <warn> /sys/devices/virtual/net/lo: couldn't
determine device driver; ignoring...
NetworkManager[540]: <info> (eth0): carrier now ON (device state 20)

and that's all it says. hHere's a good boot:

NetworkManager[547]: <info> NetworkManager (version
0.9.8.1-1.git20130327.fc18) is starting...
NetworkManager[547]: <info> Read config file
/etc/NetworkManager/NetworkManager.conf
NetworkManager[547]: <info> WEXT support is enabled
NetworkManager[547]: <info> VPN: loaded org.freedesktop.NetworkManager.openvpn
NetworkManager[547]: <info> VPN: loaded org.freedesktop.NetworkManager.vpnc
NetworkManager[547]: <info> VPN: loaded
org.freedesktop.NetworkManager.openconnect
NetworkManager[547]: <info> VPN: loaded org.freedesktop.NetworkManager.pptp
NetworkManager[547]: ifcfg-rh: Acquired D-Bus service com.redhat.ifcfgrh1
NetworkManager[547]: <info> Loaded plugin ifcfg-rh: (c) 2007 - 2010
Red Hat, Inc. To report bugs please use the NetworkManager mailing
list.
NetworkManager[547]: <info> Loaded plugin keyfile: (c) 2007 - 2010
Red Hat, Inc. To report bugs please use the NetworkManager mailing
list.
NetworkManager[547]: ifcfg-rh: parsing
/etc/sysconfig/network-scripts/ifcfg-lo ...
NetworkManager[547]: ifcfg-rh: parsing
/etc/sysconfig/network-scripts/ifcfg-eth0 ...
NetworkManager[547]: ifcfg-rh: read connection 'System eth0'
NetworkManager[547]: <info> monitoring kernel firmware directory
'/lib/firmware'.
NetworkManager[547]: <info> WiFi enabled by radio killswitch;
enabled by state file
NetworkManager[547]: <info> WWAN enabled by radio killswitch;
enabled by state file
NetworkManager[547]: <info> WiMAX enabled by radio killswitch;
enabled by state file
NetworkManager[547]: <info> Networking is enabled by state file
NetworkManager[547]: <warn> failed to allocate link cache: (-10)
Operation not supported
NetworkManager[547]: <info> (eth0): carrier is OFF
NetworkManager[547]: <info> (eth0): new Ethernet device (driver:
'r8169' ifindex: 2)
NetworkManager[547]: <info> (eth0): exported as
/org/freedesktop/NetworkManager/Devices/0
NetworkManager[547]: <info> (eth0): device state change: unmanaged
-> unavailable (reason 'managed') [10 20 2]
NetworkManager[547]: <info> (eth0): bringing up device.
NetworkManager[547]: <info> (eth0): preparing device.
NetworkManager[547]: <info> (eth0): deactivating device (reason 'managed') [2]
NetworkManager[547]: <warn> /sys/devices/virtual/net/lo: couldn't
determine device driver; ignoring...
NetworkManager[547]: <warn> /sys/devices/virtual/net/lo: couldn't
determine device driver; ignoring...
NetworkManager[547]: <info> (eth0): carrier now ON (device state 20)
NetworkManager[547]: <info> (eth0): device state change: unavailable
-> disconnected (reason 'carrier-changed') [20 30 40]
NetworkManager[547]: <info> Auto-activating connection 'System eth0'.
NetworkManager[547]: <info> Activation (eth0) starting connection
'System eth0'
NetworkManager[547]: <info> (eth0): device state change:
disconnected -> prepare (reason 'none') [30 40 0]
NetworkManager[547]: <info> Activation (eth0) Stage 1 of 5 (Device
Prepare) scheduled...
NetworkManager[547]: <info> Activation (eth0) Stage 1 of 5 (Device
Prepare) started...
NetworkManager[547]: <info> Activation (eth0) Stage 2 of 5 (Device
Configure) scheduled...
NetworkManager[547]: <info> Activation (eth0) Stage 1 of 5 (Device
Prepare) complete.
NetworkManager[547]: <info> Activation (eth0) Stage 2 of 5 (Device
Configure) starting...
NetworkManager[547]: <info> (eth0): device state change: prepare ->
config (reason 'none') [40 50 0]
NetworkManager[547]: <info> Activation (eth0) Stage 2 of 5 (Device
Configure) successful.
NetworkManager[547]: <info> Activation (eth0) Stage 3 of 5 (IP
Configure Start) scheduled.
NetworkManager[547]: <info> Activation (eth0) Stage 2 of 5 (Device
Configure) complete.
NetworkManager[547]: <info> Activation (eth0) Stage 3 of 5 (IP
Configure Start) started...
NetworkManager[547]: <info> (eth0): device state change: config ->
ip-config (reason 'none') [50 70 0]
NetworkManager[547]: <info> Activation (eth0) Beginning DHCPv4
transaction (timeout in 45 seconds)
NetworkManager[547]: <info> dhclient started with pid 910
NetworkManager[547]: <info> Activation (eth0) Stage 3 of 5 (IP
Configure Start) complete.
NetworkManager[547]: <info> (eth0): DHCPv4 state changed nbi -> preinit
NetworkManager[547]: <info> (eth0): DHCPv4 state changed preinit -> reboot
NetworkManager[547]: <info> address 192.168.0.52
NetworkManager[547]: <info> prefix 24 (255.255.255.0)
NetworkManager[547]: <info> gateway 192.168.0.1
NetworkManager[547]: <info> nameserver '192.168.0.1'
NetworkManager[547]: <info> Activation (eth0) Stage 5 of 5 (IPv4
Configure Commit) scheduled...
NetworkManager[547]: <info> Activation (eth0) Stage 5 of 5 (IPv4
Commit) started...
NetworkManager[547]: <info> (eth0): device state change: ip-config
-> secondaries (reason 'none') [70 90 0]
NetworkManager[547]: <info> Activation (eth0) Stage 5 of 5 (IPv4
Commit) complete.
NetworkManager[547]: <info> (eth0): device state change: secondaries
-> activated (reason 'none') [90 100 0]
NetworkManager[547]: <info> Policy set 'System eth0' (eth0) as
default for IPv4 routing and DNS.
NetworkManager[547]: <info> Activation (eth0) successful, device activated.

so apparently NM does see carrier even in the bad state, and it is
just the NM-applet that then says "cable disconnected". Who knows what
goes through NM-applets tiny little mind? Not me.

Anyway, they seem to be identical (apart from the pid change, of
course) up until the good case says

NetworkManager[540]: <info> (eth0): device state change: unavailable
-> disconnected (reason 'carrier-changed') [20 30 40]

and with a bad kernel, that just never happens. Don't ask me why. In
both cases, the preceding kernel messages were

r8169 0000:01:00.0 eth0: link up
IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

but for some reason in the bad case NM just doesn't react past the
"carrier now ON" part.

Linus
--

Linus Torvalds

unread,
May 2, 2013, 3:00:05 PM5/2/13
to
On Thu, May 2, 2013 at 11:52 AM, Linus Torvalds
<torv...@linux-foundation.org> wrote:
>
> but for some reason in the bad case NM just doesn't react past the
> "carrier now ON" part.

Grr. That was much more readable while editing, then gmail ended up
doing its "smart line wrap" thing and my nice long lines became a
mess. I think you can still figure it out,

John Stoffel

unread,
May 2, 2013, 3:10:02 PM5/2/13
to
>>>>> "David" == David Miller <da...@davemloft.net> writes:

David> From: Bjørn Mork <bj...@mork.no>
David> Date: Thu, 02 May 2013 11:06:42 +0200

>> From d957cf339bf625869c39d852ac6733ef597ecef9 Mon Sep 17 00:00:00 2001
>> From: Bjørn Mork <bj...@mork.no>
>> Date: Thu, 2 May 2013 10:37:05 +0200
>> Subject: [PATCH] net: vlan,ethtool: netdev_features_t is more than 32 bit
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset=UTF-8
>> Content-Transfer-Encoding: 8bit
>>
>> Signed-off-by: Bjørn Mork <bj...@mork.no>

David> Also applied and queued up for -stable.

David> These changes show me that this special type isn't providing type
David> safety in the way that we actually need it.

David> Something like how we do the MM page table types would work better:

David> typedef struct { u64 val; } netdev_features_t;

David> #define __netdev_feature(X) ((netdev_features_t) { X } )

David> and also with the appropriate set of accessors.

David> Then you can't get it wrong without a compile error.

Isn't part of the problem that you're exporting it into /sys in a
binary format? Why not just have each flag as it's own file and
value? Sure, it's a waste in some ways, but then it makes it simpler
to just do an 'opendir()' to see if the flag exists, much less what
it's set to.

John

Dan Williams

unread,
May 2, 2013, 3:20:01 PM5/2/13
to
On Thu, 2013-05-02 at 11:53 -0700, Linus Torvalds wrote:
> On Thu, May 2, 2013 at 11:52 AM, Linus Torvalds
> <torv...@linux-foundation.org> wrote:
> >
> > but for some reason in the bad case NM just doesn't react past the
> > "carrier now ON" part.
>
> Grr. That was much more readable while editing, then gmail ended up
> doing its "smart line wrap" thing and my nice long lines became a
> mess. I think you can still figure it out,

Yeah, the dump is good, and it seems to point to a problem in NM for
now. I'm investigating.

Dan

Ben Hutchings

unread,
May 2, 2013, 4:20:01 PM5/2/13
to
On Thu, 2013-05-02 at 14:53 -0400, John Stoffel wrote:
> >>>>> "David" == David Miller <da...@davemloft.net> writes:
>
> David> From: Bjørn Mork <bj...@mork.no>
> David> Date: Thu, 02 May 2013 11:06:42 +0200
>
> >> From d957cf339bf625869c39d852ac6733ef597ecef9 Mon Sep 17 00:00:00 2001
> >> From: Bjørn Mork <bj...@mork.no>
> >> Date: Thu, 2 May 2013 10:37:05 +0200
> >> Subject: [PATCH] net: vlan,ethtool: netdev_features_t is more than 32 bit
> >> MIME-Version: 1.0
> >> Content-Type: text/plain; charset=UTF-8
> >> Content-Transfer-Encoding: 8bit
> >>
> >> Signed-off-by: Bjørn Mork <bj...@mork.no>
>
> David> Also applied and queued up for -stable.
>
> David> These changes show me that this special type isn't providing type
> David> safety in the way that we actually need it.
>
> David> Something like how we do the MM page table types would work better:
>
> David> typedef struct { u64 val; } netdev_features_t;
>
> David> #define __netdev_feature(X) ((netdev_features_t) { X } )
>
> David> and also with the appropriate set of accessors.
>
> David> Then you can't get it wrong without a compile error.
>
> Isn't part of the problem that you're exporting it into /sys in a
> binary format?
[...]

Features are exported through SIOCETHTOOL, not sysfs (though they *used*
to be there).

The 'flags' attribue in sysfs is something different.

Ben.

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

Dan Williams

unread,
May 2, 2013, 4:30:02 PM5/2/13
to
On Thu, 2013-05-02 at 14:17 -0500, Dan Williams wrote:
> On Thu, 2013-05-02 at 11:53 -0700, Linus Torvalds wrote:
> > On Thu, May 2, 2013 at 11:52 AM, Linus Torvalds
> > <torv...@linux-foundation.org> wrote:
> > >
> > > but for some reason in the bad case NM just doesn't react past the
> > > "carrier now ON" part.
> >
> > Grr. That was much more readable while editing, then gmail ended up
> > doing its "smart line wrap" thing and my nice long lines became a
> > mess. I think you can still figure it out,
>
> Yeah, the dump is good, and it seems to point to a problem in NM for
> now. I'm investigating.

If you don't mind helping me confirm/reject a theory, would you mind:

1) drop "level=debug" into the [logging] section
of /etc/NetworkManager/NetworkManager.conf (create the section if it
doesn't exist)

2) reboot (or just rmmod/modprobe r8169 if that's enough to trigger it)

3) grab syslog and send it to me (privately if you wish)

Thanks!

John Stoffel

unread,
May 2, 2013, 4:50:03 PM5/2/13
to
>>>>> "Ben" == Ben Hutchings <bhutc...@solarflare.com> writes:

Ben> On Thu, 2013-05-02 at 14:53 -0400, John Stoffel wrote:
>> >>>>> "David" == David Miller <da...@davemloft.net> writes:
>>
David> From: Bj�rn Mork <bj...@mork.no>
David> Date: Thu, 02 May 2013 11:06:42 +0200
>>
>> >> From d957cf339bf625869c39d852ac6733ef597ecef9 Mon Sep 17 00:00:00 2001
>> >> From: Bj�rn Mork <bj...@mork.no>
>> >> Date: Thu, 2 May 2013 10:37:05 +0200
>> >> Subject: [PATCH] net: vlan,ethtool: netdev_features_t is more than 32 bit
>> >> MIME-Version: 1.0
>> >> Content-Type: text/plain; charset=UTF-8
>> >> Content-Transfer-Encoding: 8bit
>> >>
>> >> Signed-off-by: Bj�rn Mork <bj...@mork.no>
>>
David> Also applied and queued up for -stable.
>>
David> These changes show me that this special type isn't providing type
David> safety in the way that we actually need it.
>>
David> Something like how we do the MM page table types would work better:
>>
David> typedef struct { u64 val; } netdev_features_t;
>>
David> #define __netdev_feature(X) ((netdev_features_t) { X } )
>>
David> and also with the appropriate set of accessors.
>>
David> Then you can't get it wrong without a compile error.
>>
>> Isn't part of the problem that you're exporting it into /sys in a
>> binary format?
Ben> [...]

Ben> Features are exported through SIOCETHTOOL, not sysfs (though they *used*
Ben> to be there).

Ben> The 'flags' attribue in sysfs is something different.

THanks for the clarification.

Pavel Simerda

unread,
May 3, 2013, 7:40:02 PM5/3/13
to
Recorded the bug with NetworkManager bugzilla blocking the upcoming release:

https://bugzilla.gnome.org/show_bug.cgi?id=699649

The code is experimental and is not used by NetworkManager even in the 'master' branch except by nm-platform's automated tests. Once we fix it, you don't need to worry about it.

Thanks!

Pavel

David Miller

unread,
May 4, 2013, 10:50:01 PM5/4/13
to

1) Several routines do not use netdev_features_t to hold such bitmasks,
fixes from Patrick McHardy and Bj�rn Mork.

2) Update cpsw IRQ software state and the actual HW irq enabling in
the correct order. From Mugunthan V N.

3) When sending tipc packets to multiple bearers, we have to make copies
of the SKB rather than just giving the original SKB directly. Fix
from Gerlando Falauto.

4) Fix race with bridging topology change timer, from Stephen
Hemminger.

5) Fix TCPv6 segmentation handling in GRE and VXLAN, from Pravin B
Shelar.

6) Endian bug in USB pegasus driver, from Dan Carpenter.

7) Fix crashes on MTU reduction in USB asix driver, from Holger
Eitzenberger.

8) Don't allow the kernel to BUG() just because the user puts some
crap in an AF_PACKET mmap() ring descriptor. Fix from Daniel
Borkmann.

9) Don't use variable sized arrays on the stack in xen-netback, from
Wei Liu.

10) Fix stats reporting and an unbalanced napi_disable() in be2net
driver. From Somnath Kotur and Ajit Khaparde.

Please pull, thanks a lot!

The following changes since commit 99c6bcf46d2233d33e441834e958ed0bc22b190a:

Merge tag 'multiplatform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc (2013-05-02 09:38:16 -0700)

are available in the git repository at:


git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

for you to fetch changes up to 777c2300865cb9b1b1791862ed23da677abfe6dc:

cxgb4: fix error recovery when t4_fw_hello returns a positive value (2013-05-03 16:10:34 -0400)

----------------------------------------------------------------
Ajit Khaparde (4):
be2net: Fix to use version 2 of cq_create for SkyHawk-R devices
be2net: Fix to use 32-bit stats to report rx_drops_no_fragment
be2net: Fix to show tx priority pause counter in ethtool -S
be2net: Fix to receive Multicast Packets when Promiscuous mode is enabled on certain devices

Bj�rn Mork (1):
net: vlan,ethtool: netdev_features_t is more than 32 bit

Dan Carpenter (1):
usbnet: pegasus: endian bug in write_mii_word()

Daniel Borkmann (1):
packet: tpacket_v3: do not trigger bug() on wrong header status

Gerlando Falauto (3):
tipc: cosmetic: clean up comments and break a long line
tipc: tipc_bcbearer_send(): simplify bearer selection
tipc: pskb_copy() buffers when sending on more than one bearer

Kirill Smelkov (1):
sky2: Fix crash on receiving VLAN frames

Mugunthan V N (1):
drivers: net: cpsw: irq not disabled in cpsw isr in particular sequence

Patrick McHardy (1):
net: use netdev_features_t in skb_needs_linearize()

Pravin B Shelar (2):
gre: Fix GREv4 TCPv6 segmentation.
vxlan: Fix TCPv6 segmentation.

Somnath Kotur (3):
be2net: Fix firmware download for Lancer
be2net: avoid napi_disable() when it has not been enabled
be2net: Fix to fail probe if MSI-X enable fails for a VF

Teppo Kotilainen (1):
net: qmi_wwan: Add Telewell TW-LTE 4G

Thadeu Lima de Souza Cascardo (1):
cxgb4: fix error recovery when t4_fw_hello returns a positive value

Wei Liu (3):
xen-netback: remove redundent parameter in netbk_count_requests
xen-netback: avoid allocating variable size array on stack
xen-netback: better names for thresholds

hol...@eitzenberger.org (1):
asix: fix BUG in receive path when lowering MTU

stephen hemminger (1):
bridge: fix race with topology change timer

drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 2 +-
drivers/net/ethernet/emulex/benet/be.h | 1 +
drivers/net/ethernet/emulex/benet/be_cmds.c | 35 +++++++++++++++++---------------
drivers/net/ethernet/emulex/benet/be_cmds.h | 2 +-
drivers/net/ethernet/emulex/benet/be_ethtool.c | 1 +
drivers/net/ethernet/emulex/benet/be_main.c | 54 +++++++++++++++++++++++++++++++++++--------------
drivers/net/ethernet/marvell/sky2.c | 2 ++
drivers/net/ethernet/ti/cpsw.c | 2 +-
drivers/net/usb/asix_common.c | 3 +++
drivers/net/usb/pegasus.c | 3 ++-
drivers/net/usb/qmi_wwan.c | 1 +
drivers/net/xen-netback/netback.c | 69 ++++++++++++++++++++++++++++++++++++++++-----------------------
net/8021q/vlan_dev.c | 2 +-
net/bridge/br_stp_timer.c | 2 +-
net/core/dev.c | 2 +-
net/core/ethtool.c | 2 +-
net/ipv4/af_inet.c | 1 +
net/ipv4/gre.c | 4 +++-
net/ipv4/udp.c | 7 ++++++-
net/packet/af_packet.c | 53 +++++++++++++++++++++---------------------------
net/tipc/bcast.c | 40 +++++++++++++++++++++++-------------
21 files changed, 178 insertions(+), 110 deletions(-)

Maarten Lankhorst

unread,
May 5, 2013, 6:30:02 AM5/5/13
to
Hey,

Op 05-05-13 04:42, David Miller schreef:
> 1) Several routines do not use netdev_features_t to hold such bitmasks,
> fixes from Patrick McHardy and Bjørn Mork.
>
> 2) Update cpsw IRQ software state and the actual HW irq enabling in
> the correct order. From Mugunthan V N.
>
> 3) When sending tipc packets to multiple bearers, we have to make copies
> of the SKB rather than just giving the original SKB directly. Fix
> from Gerlando Falauto.
>
> 4) Fix race with bridging topology change timer, from Stephen
> Hemminger.
>
> 5) Fix TCPv6 segmentation handling in GRE and VXLAN, from Pravin B
> Shelar.
>
> 6) Endian bug in USB pegasus driver, from Dan Carpenter.
>
> 7) Fix crashes on MTU reduction in USB asix driver, from Holger
> Eitzenberger.
>
> 8) Don't allow the kernel to BUG() just because the user puts some
> crap in an AF_PACKET mmap() ring descriptor. Fix from Daniel
> Borkmann.
>
> 9) Don't use variable sized arrays on the stack in xen-netback, from
> Wei Liu.
>
> 10) Fix stats reporting and an unbalanced napi_disable() in be2net
> driver. From Somnath Kotur and Ajit Khaparde.
>
I finally decided to try 3.10rc0, but I'm getting a lot of this:

BUG: scheduling while atomic: irq/43-eth1/1465/0x00000102
Modules linked in: adt7475 ebtable_nat ebtables nouveau ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM iptable_mangle bridge snd_hda_codec_hdmi stp llc ttm drm_kms_helper drm snd_hda_codec_realtek mxm_wmi intel_powerclamp kvm_intel kvm snd_hda_intel snd_hda_codec snd_hwdep e1000e snd_pcm video parport_pc snd_page_alloc ptp ppdev pps_core parport nfsd lockd nfs_acl auth_rpcgss sunrpc oid_registry(P)
CPU: 0 PID: 1465 Comm: irq/43-eth1 Tainted: P W 3.10.0-rc0-patser+ #70
Hardware name: Acer Aspire M3985/Aspire M3985, BIOS P01-A1 03/12/2012
0000000000000000 ffff88041dc03cc8 ffffffff8165ea16 ffff88041dc03cd8
ffffffff8165a84a ffff88041dc03d58 ffffffff8166200d 000000000000c350
000000000000c350 ffff8803f827a180 ffff8803f8275fd8 ffff8803f8275fd8
Call Trace:
<IRQ> [<ffffffff8165ea16>] dump_stack+0x19/0x1b
[<ffffffff8165a84a>] __schedule_bug+0x48/0x56
[<ffffffff8166200d>] __schedule+0x7fa/0x832
[<ffffffff81662154>] schedule+0x29/0x59
[<ffffffff81660aa0>] schedule_hrtimeout_range_clock+0xeb/0x129
[<ffffffff81078c5f>] ? update_rmtp+0x65/0x65
[<ffffffff81660af1>] schedule_hrtimeout_range+0x13/0x15
[<ffffffff8106140a>] usleep_range+0x40/0x42
[<ffffffffa012a39a>] e1000_irq_enable+0x18b/0x202 [e1000e]
[<ffffffffa012e022>] e1000e_poll+0x1d1/0x2e2 [e1000e]
[<ffffffff81535e1b>] net_rx_action+0xaa/0x1f3
[<ffffffff8105b055>] __do_softirq+0xcc/0x2a9
[<ffffffff810d49e1>] ? irq_thread_fn+0x48/0x48
[<ffffffff8166b3cc>] call_softirq+0x1c/0x30
<EOI> [<ffffffff810042d3>] do_softirq+0x4d/0x8a
[<ffffffff8105ad57>] local_bh_enable+0xa0/0xa2
[<ffffffff810d4a2a>] irq_forced_thread_fn+0x49/0x5a
[<ffffffff810d467d>] irq_thread+0x122/0x145
[<ffffffff810d48e9>] ? irq_finalize_oneshot.part.33+0xe7/0xe7
[<ffffffff810d455b>] ? wake_threads_waitq+0x44/0x44
[<ffffffff81075fe5>] kthread+0xc0/0xc5
[<ffffffff81075f25>] ? flush_kthread_work+0x10c/0x10c
[<ffffffff8166a16c>] ret_from_fork+0x7c/0xb0
[<ffffffff81075f25>] ? flush_kthread_work+0x10c/0x10c

BUG: scheduling while atomic: Socket Thread/3163/0x00000402

etc, from various places..

I was using CONFIG_RCU_NOCB_CPU_ALL=y and threadirqs, I'm not sure if it's related or not..
Looking at e1000e there has been a few commits to change everything to usleep_range, maybe
it was a bit too much?

~Maarten

Allan, Bruce W

unread,
May 6, 2013, 12:50:03 PM5/6/13
to
> -----Original Message-----
> From: Maarten Lankhorst [mailto:m.b.la...@gmail.com]
> Sent: Sunday, May 05, 2013 3:24 AM
> To: David Miller; Allan, Bruce W
> Cc: torv...@linux-foundation.org; ak...@linux-foundation.org;
> net...@vger.kernel.org; linux-...@vger.kernel.org
> Subject: Re: [GIT] Networking
>
> Hey,
>
[snip]
Yup, my bad (commit ce43a216 e1000e: cleanup USLEEP_RANGE checkpatch checks).
I'll send along a patch to fix this shortly.

Bruce.

David Miller

unread,
May 6, 2013, 5:10:03 PM5/6/13
to

Just a small pile of fixes:

1) Fix race conditions in IP fragmentation LRU list handling,
from Konstantin Khlebnikov.

2) vfree() is no longer verboten in interrupts, so deferring is
pointless, from Al Viro.

3) Conversion from mutex to semaphore in netpoll left trylock
test inverted, caught by Dan Carpenter.

4) 3c59x uses wrong base address when releasing regions, from
Sergei Shtylyov.

5) Bounds checking in TIPC from Dan Carpenter.

6) Fastopen cookies should not be expired as aggressively as other
TCP metrics. From Eric Dumazet.

7) Fix retrieval of MAC address in ibmveth, from Ben Herrenschmidt.

8) Don't use "u16" in virtio user headers, from Stephen Hemminger.

Please pull, thanks a lot!

The following changes since commit f8ce1faf55955de62e0a12e330c6d9a526071f65:

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux (2013-05-05 10:58:06 -0700)

are available in the git repository at:


git://git.kernel.org/pub/scm/linux/kernel/git/davem/net master

for you to fetch changes up to 6bf15191f666c5965d212561d7a5c7b78b808dfa:

tipc: potential divide by zero in tipc_link_recv_fragment() (2013-05-06 16:16:52 -0400)

----------------------------------------------------------------
Al Viro (2):
fib_trie: no need to delay vfree()
rps_dev_flow_table_release(): no need to delay vfree()

Benjamin Herrenschmidt (1):
net/eth/ibmveth: Fixup retrieval of MAC address

Dan Carpenter (3):
netpoll: inverted down_trylock() test
tipc: add a bounds check in link_recv_changeover_msg()
tipc: potential divide by zero in tipc_link_recv_fragment()

Eric Dumazet (1):
tcp: do not expire TCP fastopen cookies

Konstantin Khlebnikov (1):
net: frag, fix race conditions in LRU list maintenance

Sergei Shtylyov (1):
3c59x: fix freeing nonexistent resource on driver unload

hayeswang (1):
net/usb: new driver for RTL8152

stephen hemminger (1):
virtio: don't expose u16 in userspace api

drivers/net/ethernet/3com/3c59x.c | 2 +-
drivers/net/ethernet/ibm/ibmveth.c | 23 +-
drivers/net/usb/Kconfig | 11 +
drivers/net/usb/Makefile | 1 +
drivers/net/usb/cdc_ether.c | 10 +
drivers/net/usb/r8152.c | 1767 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/netdevice.h | 1 -
include/net/inet_frag.h | 5 +-
include/uapi/linux/virtio_net.h | 2 +-
net/core/net-sysfs.c | 12 +-
net/core/netpoll.c | 2 +-
net/ipv4/fib_trie.c | 13 +-
net/ipv4/inet_fragment.c | 1 +
net/ipv4/tcp_metrics.c | 15 +-
net/tipc/link.c | 11 +-
15 files changed, 1826 insertions(+), 50 deletions(-)
create mode 100644 drivers/net/usb/r8152.c

David Miller

unread,
May 6, 2013, 7:10:04 PM5/6/13
to
From: Linus Torvalds <torv...@linux-foundation.org>
Date: Mon, 6 May 2013 15:57:15 -0700

> On Mon, May 6, 2013 at 2:05 PM, David Miller <da...@davemloft.net> wrote:
>>
>> Just a small pile of fixes:
>
> Ok, apparently the e1000e problem didn't get fixed yet. Should I just
> revert ce43a2168c59 for now, or do we have a fix for that forthcoming
> shortly?

It is coming very shortly.

Dan Williams

unread,
May 6, 2013, 7:40:01 PM5/6/13
to
On Thu, 2013-05-02 at 15:22 -0500, Dan Williams wrote:
> On Thu, 2013-05-02 at 14:17 -0500, Dan Williams wrote:
> > On Thu, 2013-05-02 at 11:53 -0700, Linus Torvalds wrote:
> > > On Thu, May 2, 2013 at 11:52 AM, Linus Torvalds
> > > <torv...@linux-foundation.org> wrote:
> > > >
> > > > but for some reason in the bad case NM just doesn't react past the
> > > > "carrier now ON" part.
> > >
> > > Grr. That was much more readable while editing, then gmail ended up
> > > doing its "smart line wrap" thing and my nice long lines became a
> > > mess. I think you can still figure it out,
> >
> > Yeah, the dump is good, and it seems to point to a problem in NM for
> > now. I'm investigating.
>
> If you don't mind helping me confirm/reject a theory, would you mind:
>
> 1) drop "level=debug" into the [logging] section
> of /etc/NetworkManager/NetworkManager.conf (create the section if it
> doesn't exist)
>
> 2) reboot (or just rmmod/modprobe r8169 if that's enough to trigger it)
>
> 3) grab syslog and send it to me (privately if you wish)

Spent a while trying to reproduce this on a tg3 today, twiddling the
carrier from the code, but failed to do so... if you're at all able to
get the debug logs here as mentioned here, that would help greatly.

Contrary to the suggestion above, changing the log level in
NetworkManager.conf does not immediately start logging in that level,
you do need to restart NM. So with that in mind, I'd suggest a full
reboot of the system just to make sure we have the exact same situation
as when you experience the problem.

(I'm still mystified how the netdev features thing "fixes" this
though...)

Jeff Kirsher

unread,
May 6, 2013, 7:40:02 PM5/6/13
to
On Mon, 2013-05-06 at 19:09 -0400, David Miller wrote:
> From: Linus Torvalds <torv...@linux-foundation.org>
> Date: Mon, 6 May 2013 15:57:15 -0700
>
> > On Mon, May 6, 2013 at 2:05 PM, David Miller <da...@davemloft.net> wrote:
> >>
> >> Just a small pile of fixes:
> >
> > Ok, apparently the e1000e problem didn't get fixed yet. Should I just
> > revert ce43a2168c59 for now, or do we have a fix for that forthcoming
> > shortly?
>
> It is coming very shortly.

I am preparing the patch now to send upstream...
signature.asc

Linus Torvalds

unread,
May 6, 2013, 8:00:03 PM5/6/13
to
On Mon, May 6, 2013 at 2:05 PM, David Miller <da...@davemloft.net> wrote:
>
> Just a small pile of fixes:

Ok, apparently the e1000e problem didn't get fixed yet. Should I just
revert ce43a2168c59 for now, or do we have a fix for that forthcoming
shortly?

Linus

David Miller

unread,
May 6, 2013, 10:20:02 PM5/6/13
to
From: Jeff Kirsher <jeffrey....@intel.com>
Date: Mon, 06 May 2013 16:32:46 -0700
Jeff, I have an early plane to catch so I have to get to bed now.

Please just send Linus the fix directly, thanks.

Jeff Kirsher

unread,
May 7, 2013, 2:10:01 AM5/7/13
to
On Mon, 2013-05-06 at 22:10 -0400, David Miller wrote:
> From: Jeff Kirsher <jeffrey....@intel.com>
> Date: Mon, 06 May 2013 16:32:46 -0700
>
> > On Mon, 2013-05-06 at 19:09 -0400, David Miller wrote:
> >> From: Linus Torvalds <torv...@linux-foundation.org>
> >> Date: Mon, 6 May 2013 15:57:15 -0700
> >>
> >> > On Mon, May 6, 2013 at 2:05 PM, David Miller <da...@davemloft.net> wrote:
> >> >>
> >> >> Just a small pile of fixes:
> >> >
> >> > Ok, apparently the e1000e problem didn't get fixed yet. Should I just
> >> > revert ce43a2168c59 for now, or do we have a fix for that forthcoming
> >> > shortly?
> >>
> >> It is coming very shortly.
> >
> > I am preparing the patch now to send upstream...
>
> Jeff, I have an early plane to catch so I have to get to bed now.
>
> Please just send Linus the fix directly, thanks.

Done.
signature.asc

David Miller

unread,
May 8, 2013, 9:10:03 PM5/8/13
to

1) Propagate return error values properly in irda, spider_net, sfc,
and bfin_mac. From Wei Yongjun.

2) Fix fec driver OOPS on rapid link up/down, from Frank Li.

3) FIX VF resource allocation and chip message payload length errors
in be2net driver, from Sathya Perla.

4) Fix inner protocol inspection during GSO from Pravin B Shelar.

Please pull, thanks a lot!

The following changes since commit de9c9f86be0dc3495de98dc65c80abe6e7c7d643:

Merge tag 'remoteproc-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc (2013-05-07 14:04:56 -0700)

are available in the git repository at:


git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

for you to fetch changes up to 4f924b2aa4d3cb30f07e57d6b608838edcbc0d88:

if_cablemodem.h: Add parenthesis around ioctl macros (2013-05-08 13:13:30 -0700)

----------------------------------------------------------------
Dan Williams (3):
qmi_wwan/cdc_ether: add device IDs for Dell 5804 (Novatel E371) WWAN card
usbnet: allow status interrupt URB to always be active
sierra_net: keep status interrupt URB active

Frank Li (1):
net: fec: fix kernel oops when plug/unplug cable many times

Josh Boyer (1):
if_cablemodem.h: Add parenthesis around ioctl macros

Pravin B Shelar (1):
gso: Handle Trans-Ether-Bridging protocol in skb_network_protocol()

Sathya Perla (4):
be2net: provision VF resources before enabling SR-IOV
be2net: fix payload_len value for GET_MAC_LIST cmd req
be2net: fix EQ from getting full while cleaning RX CQ
be2net: disable TX in be_close()

Sebastian Hesselbarth (1):
net: of_mdio: fix behavior on missing phy device

Wei Yongjun (4):
net/irda: fix error return code in bfin_sir_open()
net/spider_net: fix error return code in spider_net_open()
sfc: fix return value check in efx_ptp_probe_channel()
bfin_mac: fix error return code in bfin_mac_probe()

drivers/net/ethernet/adi/bfin_mac.c | 3 ++-
drivers/net/ethernet/emulex/benet/be_cmds.c | 5 ++---
drivers/net/ethernet/emulex/benet/be_main.c | 30 +++++++++++++++-------------
drivers/net/ethernet/freescale/fec.h | 10 ++++++----
drivers/net/ethernet/freescale/fec_main.c | 44 ++++++++++++++++++++++++++++++-----------
drivers/net/ethernet/sfc/ptp.c | 4 +++-
drivers/net/ethernet/toshiba/spider_net.c | 3 ++-
drivers/net/irda/bfin_sir.c | 6 ++++--
drivers/net/usb/cdc_ether.c | 7 +++++++
drivers/net/usb/qmi_wwan.c | 7 +++++++
drivers/net/usb/sierra_net.c | 38 ++++++++++++++++++++++++++++-------
drivers/net/usb/usbnet.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
drivers/of/of_mdio.c | 11 ++++-------
include/linux/usb/usbnet.h | 5 +++++
include/uapi/linux/if_cablemodem.h | 12 +++++------
net/core/dev.c | 11 +++++++++++
net/ipv4/gre.c | 8 +-------
net/ipv4/udp.c | 4 +---
18 files changed, 212 insertions(+), 73 deletions(-)

Pavel Simerda

unread,
May 9, 2013, 5:10:06 AM5/9/13
to
Fixed in NetworkManager master:

commit 7aefd5b5f4547c294092753b5cc355c95763134a
Author: Dan Winship <da...@gnome.org>
Date: Mon May 6 12:10:01 2013 -0400

platform: fix use of ethtool

The bits in the result of ETHTOOL_GFEATURES are not in any defined
order; you need to use ETHTOOL_GSTRINGS to get the names associated
with each bit to find what each one does. Fix
NMPlatformLinux:link_supports_vlans() to do this.

https://bugzilla.gnome.org/show_bug.cgi?id=699649

diff --git a/src/platform/nm-linux-platform.c b/src/platform/nm-linux-platform.c
index 699c107..eeaeb2e 100644
--- a/src/platform/nm-linux-platform.c
+++ b/src/platform/nm-linux-platform.c
@@ -1024,6 +1024,40 @@ ethtool_get (const char *name, gpointer edata)
return TRUE;
}

+static int
+ethtool_get_stringset_index (const char *ifname, int stringset_id, const char *string)
+{
+ auto_g_free struct ethtool_sset_info *info;
+ auto_g_free struct ethtool_gstrings *strings;
+ guint32 len, i;
+
+ info = g_malloc0 (sizeof (*info) + sizeof (guint32));
+ info->cmd = ETHTOOL_GSSET_INFO;
+ info->reserved = 0;
+ info->sset_mask = 1ULL << stringset_id;
+
+ if (!ethtool_get (ifname, info))
+ return -1;
+ if (!info->sset_mask)
+ return -1;
+
+ len = info->data[0];
+
+ strings = g_malloc0 (sizeof (*strings) + len * ETH_GSTRING_LEN);
+ strings->cmd = ETHTOOL_GSTRINGS;
+ strings->string_set = stringset_id;
+ strings->len = len;
+ if (!ethtool_get (ifname, strings))
+ return -1;
+
+ for (i = 0; i < len; i++) {
+ if (!strcmp ((char *) &strings->data[i * ETH_GSTRING_LEN], string))
+ return i;
+ }
+
+ return -1;
+}
+
static gboolean
link_supports_carrier_detect (NMPlatform *platform, int ifindex)
{
@@ -1041,26 +1075,39 @@ link_supports_carrier_detect (NMPlatform *platform, int ifindex)
return name && ethtool_get (name, &edata);
}

-#define NETIF_F_VLAN_CHALLENGED (1 << 10)
-
static gboolean
link_supports_vlans (NMPlatform *platform, int ifindex)
{
auto_nl_object struct rtnl_link *rtnllink = link_get (platform, ifindex);
const char *name = nm_platform_link_get_name (ifindex);
- struct {
- struct ethtool_gfeatures features;
- struct ethtool_get_features_block features_block;
- } edata = { .features = { .cmd = ETHTOOL_GFEATURES, .size = 1 } };
+ auto_g_free struct ethtool_gfeatures *features;
+ int index, block, bit, size;

/* Only ARPHRD_ETHER links can possibly support VLANs. */
if (!rtnllink || rtnl_link_get_arptype (rtnllink) != ARPHRD_ETHER)
return FALSE;

- if (!name || !ethtool_get (name, &edata))
+ if (!name)
+ return FALSE;
+
+ index = ethtool_get_stringset_index (name, ETH_SS_FEATURES, "vlan-challenged");
+ if (index == -1) {
+ debug ("vlan-challenged ethtool feature does not exist?");
+ return FALSE;
+ }
+
+ block = index / 32;
+ bit = index % 32;
+ size = block + 1;
+
+ features = g_malloc0 (sizeof (*features) + size * sizeof (struct ethtool_get_features_block));
+ features->cmd = ETHTOOL_GFEATURES;
+ features->size = size;
+
+ if (!ethtool_get (name, features))
return FALSE;

- return !(edata.features.features[0].active & NETIF_F_VLAN_CHALLENGED);
+ return !(features->features[block].active & (1 << bit));
}

static gboolean

David Miller

unread,
May 13, 2013, 4:10:03 PM5/13/13
to

Several small bug fixes all over:

1) be2net driver uses wrong payload length when submitting MAC list get
requests to the chip. From Sathya Perla.

2) Fix mwifiex memory leak on driver unload, from Amitkumar Karwar.

3) Prevent random memory access in batman-adv, from Marek Lindner.

4) batman-adv doesn't check for pskb_trim_rcsum() errors, also from Marek
Lindner.

5) Fix fec crashes on rapid link up/down, from Frank Li.

6) Fix inner protocol grovelling in GSO, from Pravin B Shelar.

7) Link event validation fix in qlcnic from Rajesh Borundia.

8) Not all FEC chips can support checksum offload, fix from Shawn Guo.

9) EXPORT_SYMBOL + inline doesn't make any sense, from Denis Efremov.

10) Fix race in passthru mode during device removal in macvlan, from
Jiri Pirko.

11) Fix RCU hash table lookup socket state race in ipv6, leading to NULL
pointer derefs, from Eric Dumazet.

12) Add several missing HAS_DMA kconfig dependencies, from Geert
Uyttterhoeven.

13) Fix bogus PCI resource management in 3c59x driver, from Sergei
Shtylyov.

14) Fix info leak in ipv6 GRE tunnel driver, from Amerigo Wang.

15) Fix device leak in ipv6 IPSEC policy layer, from Cong Wang.

16) DMA mapping leak fix in qlge from Thadeu Lima de Souza Cascardo.

17) Missing iounmap on probe failure in bna driver, from Wei Yongjun.

Please pull, thanks a lot!

The following changes since commit 70eba4226d9718946941c7be0c8cb66d431e7686:

Merge tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux (2013-05-09 16:42:10 -0700)

are available in the git repository at:


git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

for you to fetch changes up to ba21fc696dd28ea7a5880345faf0168619a478d2:

bna: add missing iounmap() on error in bnad_init() (2013-05-13 12:54:38 -0700)

----------------------------------------------------------------
Amerigo Wang (2):
virtio_net: use default napi weight by default
ipv6,gre: do not leak info to user-space

Amitkumar Karwar (1):
mwifiex: fix memory leak issue when driver unload

Antonio Quartulli (2):
batman-adv: make DAT drop ARP requests targeting local clients
batman-adv: reorder clean up routine in order to avoid race conditions

Bing Zhao (1):
mwifiex: clear is_suspended flag when interrupt is received early

Cong Wang (1):
xfrm6: release dev before returning error

Daniel Drake (1):
mwifiex: fix setting of multicast filter

David S. Miller (3):
Merge branch 'wireless'
Merge branch 'qlcnic'
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Denis Efremov (1):
ipv4: ip_output: remove inline marking of EXPORT_SYMBOL functions

Eric Dumazet (1):
ipv6: do not clear pinet6 field

Felix Fietkau (1):
ath9k: fix key allocation error handling for powersave keys

Geert Uytterhoeven (6):
net/ethernet: NET_CALXEDA_XGMAC should depend on HAS_DMA
net/ethernet: STMMAC_ETH should depend on HAS_DMA
net/wireless: ATH9K should depend on HAS_DMA
net/ethernet: ARM_AT91_ETHER should depend on HAS_DMA
net/ethernet: MACB should depend on HAS_DMA
caif: CAIF_VIRTIO should depend on HAS_DMA

Himanshu Madhani (2):
qlcnic: Fix missing bracket in module parameter.
qlcnic: Fix ethtool supported port status for 83xx

Jiri Pirko (1):
macvlan: fix passthru mode race between dev removal and rx path

John W. Linville (1):
Merge branch 'master' of git://git.kernel.org/.../linville/wireless into for-davem

Manish Chopra (2):
qlcnic: Fix setting MAC address
qlcnic: Fix bug in diagnostics test reset recovery path

Marek Lindner (2):
batman-adv: check proto length before accessing proto string buffer
batman-adv: check return value of pskb_trim_rcsum()

Or Gerlitz (1):
net/mlx4_core: Add missing report on VST and spoof-checking dev caps

Petri Gynther (1):
emac: Fix EMAC soft reset on 460EX/GT

Rajesh Borundia (2):
qlcnic: Fix mailbox response handling.
qlcnic: Fix validation of link event command.

Rony Efraim (1):
net/mlx4: Strengthen VLAN tags/priorities enforcement in VST mode

Sergei Shtylyov (1):
3c59x: fix PCI resource management

Shahed Shaikh (1):
qlcnic: Fix ethtool strings

Shawn Guo (1):
net: fec: enable hardware checksum only on imx6q-fec

Sony Chacko (1):
qlcnic: Fix reset recovery after transmit timeout

Stanislaw Gruszka (2):
ath5k: do not reschedule tx_complete_work on stop
iwl4965: workaround connection regression on passive channel

Sujith Manoharan (2):
ath9k: Fix beacon reconfiguration
ath9k: Update initvals for AR9565

Thadeu Lima de Souza Cascardo (1):
qlge: fix dma map leak when the last chunk is not allocated

Thommy Jakobsson (1):
B43: Handle DMA RX descriptor underrun

Wei Yongjun (1):
bna: add missing iounmap() on error in bnad_init()

drivers/net/caif/Kconfig | 2 +-
drivers/net/ethernet/3com/3c59x.c | 25 ++++++-----
drivers/net/ethernet/brocade/bna/bnad.c | 5 ++-
drivers/net/ethernet/cadence/Kconfig | 3 +-
drivers/net/ethernet/calxeda/Kconfig | 2 +-
drivers/net/ethernet/freescale/fec_main.c | 20 ++++++---
drivers/net/ethernet/ibm/emac/core.c | 36 +++++++++++----
drivers/net/ethernet/mellanox/mlx4/en_resources.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/fw.c | 4 +-
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 29 +++++++-----
drivers/net/ethernet/qlogic/qlcnic/qlcnic.h | 2 +
drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 95 ++++++++++++++++++++++++++++++++++-----
drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h | 4 +-
drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c | 22 +++++----
drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c | 54 +++++++++++-----------
drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h | 2 +-
drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 46 ++++++++++++++-----
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c | 8 ++--
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c | 3 --
drivers/net/ethernet/qlogic/qlge/qlge_main.c | 7 +++
drivers/net/ethernet/stmicro/stmmac/Kconfig | 2 +-
drivers/net/macvlan.c | 7 +--
drivers/net/virtio_net.c | 2 +-
drivers/net/wireless/ath/ath5k/base.c | 5 +++
drivers/net/wireless/ath/ath9k/Kconfig | 2 +-
drivers/net/wireless/ath/ath9k/ar9565_1p0_initvals.h | 138 +++++++++++++++++++++++++++++----------------------------
drivers/net/wireless/ath/ath9k/main.c | 10 +++--
drivers/net/wireless/b43/dma.c | 19 ++++++++
drivers/net/wireless/b43/dma.h | 4 +-
drivers/net/wireless/b43/main.c | 43 ++++++++----------
drivers/net/wireless/iwlegacy/4965-mac.c | 3 +-
drivers/net/wireless/mwifiex/cfg80211.c | 3 --
drivers/net/wireless/mwifiex/cmdevt.c | 1 +
drivers/net/wireless/mwifiex/main.c | 1 +
drivers/net/wireless/mwifiex/sta_ioctl.c | 21 +++------
include/linux/mlx4/qp.h | 29 +++++++++++-
include/net/sock.h | 12 +++++
net/batman-adv/distributed-arp-table.c | 13 ++++++
net/batman-adv/main.c | 18 +++++---
net/batman-adv/network-coding.c | 8 +++-
net/core/sock.c | 12 -----
net/ipv4/ip_output.c | 2 +-
net/ipv6/ip6_gre.c | 2 +
net/ipv6/tcp_ipv6.c | 12 +++++
net/ipv6/udp.c | 13 +++++-
net/ipv6/udp_impl.h | 2 +
net/ipv6/udplite.c | 2 +-
net/ipv6/xfrm6_policy.c | 4 +-
48 files changed, 498 insertions(+), 263 deletions(-)

Sergei Shtylyov

unread,
May 13, 2013, 5:30:03 PM5/13/13
to
Hello.

On 05/14/2013 12:08 AM, David Miller wrote:

> Several small bug fixes all over:
>
[...]
> 13) Fix bogus PCI resource management in 3c59x driver, from Sergei
> Shtylyov.

I must note that this coming August I will be celebrating 7 (!) years
from the original posting of that patch. It has really been an unlucky one,
even in this list it goes by #13. :-)

WBR, Sergei
0 new messages