freebsd-net Digest, Vol 364, Issue 2

freebsd-n...@freebsd.org

Mar 23, 2010, 8:00:15 AM

to freeb...@freebsd.org

Send freebsd-net mailing list submissions to
freeb...@freebsd.org

To subscribe or unsubscribe via the World Wide Web, visit
http://lists.freebsd.org/mailman/listinfo/freebsd-net
or, via email, send a message with subject or body 'help' to
freebsd-n...@freebsd.org

You can reach the person managing the list at
freebsd-...@freebsd.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of freebsd-net digest..."

Today's Topics:

1. RE: Bug in tcp_output? (Chris Harrer)
2. Re: Please pay attention to fix bug kern/141285 (Prokofyev S.P.)
3. MFC of igb fixes? (Charles Owens)

----------------------------------------------------------------------

Message: 1
Date: Mon, 22 Mar 2010 08:27:18 -0400
From: "Chris Harrer" <cjha...@comcast.net>
Subject: RE: Bug in tcp_output?
To: "'Rui Paulo'" <rpa...@freebsd.org>, "'Bruce Evans'"
<br...@optusnet.com.au>
Cc: freeb...@freebsd.org
Message-ID: <000001cac9bb$03bdfb10$0b39f130$@net>
Content-Type: text/plain; charset="iso-8859-1"

Bruce and Rui,

I owe you an apology, sorry. As I was reading Bruce's response, I was
saying to myself that something isn't adding up and sure enough I went back
and looked and noticed that I was running with a locally modified tcp_var.h
(which changed the snd_cwnd container).

The "clean" version does indeed work as expected.

Again, sorry for the "noise".

Thanks.

Chris
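
One way a changed snd_cwnd container could produce the symptom in the
original report (quoted below) is sketched here in C. This is an assumption
for illustration only: the thread does not say what the local tcp_var.h
change was. The sketch supposes that snd_cwnd became a 32-bit unsigned type
while cwin stayed a 64-bit long. The function names and the fixed-width
stand-in types are hypothetical and not taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;	/* 32-bit sequence number, as in the thread */

/*
 * HYPOTHETICAL: snd_cwnd narrowed to 32 bits (not confirmed by the thread),
 * cwin still a 64-bit long (modelled here as int64_t).
 */
static int64_t
cwin_narrow_cwnd(uint32_t snd_cwnd, tcp_seq snd_nxt, tcp_seq sack_newdata,
    int32_t sack_bytes_rxmt)
{
	/* Every operand is 32 bits wide, so the result wraps modulo 2^32. */
	uint32_t rhs = snd_cwnd - (tcp_seq)(snd_nxt - sack_newdata) -
	    (uint32_t)sack_bytes_rxmt;

	/* Widening an unsigned 32-bit value zero-extends: never negative. */
	return ((int64_t)rhs);
}

/* Mirrors the "if (cwin < 0) cwin = 0; len = lmin(len, cwin);" step. */
static int64_t
limit_len(int64_t len, int64_t cwin)
{
	if (cwin < 0)
		cwin = 0;
	return (len < cwin ? len : cwin);
}

/* Usage example with the values from the original report. */
static void
narrow_cwnd_selfcheck(void)
{
	int64_t cwin = cwin_narrow_cwnd(0x2238, 0xdd6d7974, 0xdd6d6858, 0x2238);

	assert(cwin == INT64_C(0xffffeee4));		/* large and positive */
	assert(limit_len(0xa19c, cwin) == 0xa19c);	/* len is not reduced */
}
```

Under this assumption the result has the same shape as the value in the
report (upper 32 bits zero, so not < 0), though not the same number. With an
unmodified 64-bit snd_cwnd the same inputs give a negative cwin, the clamp
sets it to 0, and len becomes 0.
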
-----Original Message-----
From: Rui Paulo [mailto:rpa...@gmail.com] On Behalf Of Rui Paulo
Sent: Sunday, March 21, 2010 9:15 AM
To: Bruce Evans
Cc: Chris Harrer; freeb...@freebsd.org
Subject: Re: Bug in tcp_output?

On 21 Mar 2010, at 07:17, Bruce Evans wrote:

> On Sat, 20 Mar 2010, Rui Paulo wrote:
>
>> On 18 Mar 2010, at 20:19, Chris Harrer wrote:
>>>
>>> In the following block of code, running on an x86_64 platform, I believe
>>> that cwin should be declared as an int:
>>> ...
>>> else {
>>>
>>> long cwin; <-- Should be an int
>>> ...
>>> if (len > 0) {
>>>
>>> cwin = tp->snd_cwnd -
>>>
>>> (tp->snd_nxt - tp->sack_newdata) -
>>>
>>> sack_bytes_rxmt;
>>>
>>> if (cwin < 0)
>>>
>>> cwin = 0;
>>>
>>> len = lmin(len, cwin);
>>>
>>> }
>>>
>>> }
>>>
>>> }
>>>
>>>
>>>
>>> Consider the case where:
>>>
>>> sack_rxmit = 0
>>>
>>> sack_bytes_rxmt = 0x2238
>>>
>>> off = 0
>>>
>>> len = 0xa19c
>>>
>>> tp->snd_cwnd = 0x2238
>>>
>>> tp->snd_nxt = 0xdd6d7974
>>>
>>> tp->sack_newdata = 0xdd6d6858
>>>
>>> In this case cwin evaluates to 0x00000000ffffe37c, which is not < 0, but
>>> instead huge. This causes the remaining data in the socket's so->so_snd
>>> buffer to be sent to the network, causing more problems at the receiver,
>>> which is already dropping frames.
>>
>> I see. This is most likely a bug. Can you send-pr so this doesn't get
>> lost?
>
> What bug do you see? This is most likely not a bug. I only see the
> following bugs
> - the suggestion to increase the fragility of the code by changing cwin to
> int
> - lots of whitespace lossage
> - the style bug in the declaration of cwin (nested declaration)
> - lots of fragile but working code. It depends on the machine being a normal
> 2's complement one. It would fail on normal 1's complement machines and
> on abnormal 2's complement ones, but so would many other things in the
> kernel.
> - type and arithmetic errors that are not made at runtime resulting in a
> value that wouldn't work, though the runtime value would.
>
> Relevant code quoted again, with the whitespace fixed:
>
>>> cwin = tp->snd_cwnd -
>>> (tp->snd_nxt - tp->sack_newdata) -
>>> sack_bytes_rxmt;
>
> On 64-bit machines, with the above values, this is:
>
> rhs = (u_long)0x2238UL -
> ((tcp_seq)0xdd6d7974 -
> (tcp_seq)0xdd6d6858) -
> (int)0x2238;
> = (u_long)0x2238UL -
> ((uint32_t)0xdd6d7974 -
> (uint32_t)0xdd6d6858) -
> (int)0x2238;
> = (u_long)0x2238UL -
> (u_int)0x111c -
> (int)0x2238;
> = (u_long)0x111c -
> (int)0x2238;
> = (u_long)0x111c -
> (u_long)0x2238;
> = (u_long)0xffffffffffffeee4;
> cwin = (long)rhs;
> = -(long)0x111c;
>
> I might have made arithmetic errors too, but I'm sure that I got the
> conversions essentially correct. On machines with 64-bit u_longs,
> almost everything is evaluated modulo 2^64. This gives a large positive
> value, but not one with the top bits set up to only the 31st as would
> happen on machines with 32-bit u_longs. Then the final conversion to
> long gives a negative value.

Right. I made some bad calculations.

>
> This is fragile, but it is a standard 2's complement hack. It would
> fail mainly on normal 1's complement machines when the rhs is
> (u_long)0xFF...FF. Then the lhs is probably negative 0, which is
> not less than 0.
>
> The fragility is essentially the same on machines with 32-bit u_longs.
> Almost everything is evaluated modulo 2^32...
>
> Using 64-bit u_longs for tp->snd_cwnd (and thus for almost the entire
> calculation) is excessive but doesn't cause any problems.
>
> Using a signed type for sack_bytes_rxmt asks for sign extension bugs but
> doesn't get them. Here it is promoted to a u_long so there are no
> sign extension bugs for it here.
>
> Using a signed type for cwin is essential for the comparison of cwin
> with 0 to work.

Right.

> This signed type should have the same size as the rhs
> to avoid even more fragility (if it were int, then you would have to
> worry about the value being changed to a non-working value by the
> implementation-defined conversion of the rhs to cwin not just for
> values larger than LONG_MAX but also for ones larger than INT_MAX.
> `int' should work in practice. This and other things depend on the
> difference of the tcp_seq's not being anywhere near as large as
> 0x7fffffff).

I assumed that Chris saw a problem with this code after being hit by some
TCP/IP interop issue. Was this the case?

--
Rui Paulo
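
The evaluation Bruce walks through above can be checked with a small C
sketch. It is not the tcp_output() code: the function names are made up for
illustration, and fixed-width types stand in for the kernel types
(uint64_t/int64_t for a 64-bit u_long/long, uint32_t/int32_t for the 32-bit
case, int32_t for the int sack_bytes_rxmt). The final unsigned-to-signed
conversions are implementation-defined in C; the sketch assumes the usual
2's complement wraparound.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;	/* 32-bit sequence number, as in the thread */

/* Model of a machine with 64-bit u_long and long. */
static int64_t
cwin_ulong64(uint64_t snd_cwnd, tcp_seq snd_nxt, tcp_seq sack_newdata,
    int32_t sack_bytes_rxmt)
{
	/*
	 * The tcp_seq difference is taken modulo 2^32; everything else is
	 * evaluated modulo 2^64. The cast on sack_bytes_rxmt spells out the
	 * conversion the usual arithmetic conversions perform implicitly.
	 */
	uint64_t rhs = snd_cwnd - (tcp_seq)(snd_nxt - sack_newdata) -
	    (uint64_t)sack_bytes_rxmt;

	/* Implementation-defined conversion; wraps on common compilers. */
	return ((int64_t)rhs);
}

/* Model of a machine with 32-bit u_long and long. */
static int32_t
cwin_ulong32(uint32_t snd_cwnd, tcp_seq snd_nxt, tcp_seq sack_newdata,
    int32_t sack_bytes_rxmt)
{
	/* Here the whole expression is evaluated modulo 2^32. */
	uint32_t rhs = snd_cwnd - (tcp_seq)(snd_nxt - sack_newdata) -
	    (uint32_t)sack_bytes_rxmt;

	return ((int32_t)rhs);
}

/* Usage example with the values from the report. */
static void
cwin_selfcheck(void)
{
	assert(cwin_ulong64(0x2238, 0xdd6d7974, 0xdd6d6858, 0x2238) == -0x111c);
	assert(cwin_ulong32(0x2238, 0xdd6d7974, 0xdd6d6858, 0x2238) == -0x111c);
}
```

With the values from the report, both models yield -0x111c, so the
"cwin < 0" test fires regardless of whether u_long is 32 or 64 bits wide.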

------------------------------

Message: 2
Date: Mon, 22 Mar 2010 15:40:13 +0200
From: "Prokofyev S.P." <pr...@skylinetele.com>
Subject: Re: Please pay attention to fix bug kern/141285
To: pyu...@gmail.com
Cc: j...@FreeBSD.org, freeb...@freebsd.org
Message-ID: <4BA7733D...@skylinetele.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 19.03.2010 19:47, Pyun YongHyeon wrote:
> On Fri, Mar 19, 2010 at 10:44:50AM -0700, Pyun YongHyeon wrote:
>
>> On Fri, Mar 19, 2010 at 04:40:44PM +0200, Prokofyev S.P. wrote:
>>
>>> Hi ALL !
>>>
>>> Please pay attention to fix bug kern/141285(kern/141843) !
>>>
>>>
>> igb(4) also has a similar issue but it seems igb(4) does not even
>> advertise IFCAP_VLAN_HWFILTER capability. igb(4) may have to remove
>> VLAN event handler or should implement IFCAP_VLAN_HWFILTER to
>> support VLAN hardware filtering.
>>
>> I have a patch for the hardware VLAN filtering of em(4). But it
>> wouldn't address the issue reported in the PR. The root cause of the
>> issue is that em(4) wants to reset the controller whenever a new VLAN
>> is registered/unregistered. I'm not sure this is a requirement of the
>> hardware. If it is, there is no way to avoid the controller reset
>> unless you disable the vlanhwfilter feature.
>>
>> #ifconfig em0 -vlanhwfilter
>>
>> em(4) in HEAD disabled VLAN hardware filtering by default, so if you
>> use that version you wouldn't encounter the issue again. The attached
>> patch is a small diff for VLAN hardware filtering which tries to
>> avoid unnecessary controller resets and adds a missing lock. If the
>> hardware allows the VLAN filtering table to be changed dynamically, we
>> could completely bypass the controller reset. Jack may know the details.
>>
> Oops, posted old patch. Here is new one.
>
Thank you, Pyun.
I applied your patch (rebuilt and reinstalled the kernel) and lost
access to the test server after reboot.

ping Test Server from Server A
Server A (10.25.223.4) -> Test Server(10.25.223.2)

tcpdump -enp -i em0 (via console on Test Server):
14:39:44.752754 00:1b:fc:af:a1:b4 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q
(0x8100), length 64: vlan 77, p 0, ethertype ARP, Request who-has
10.25.223.2 tell 10.25.223.4, length 46
14:39:44.752765 00:30:48:96:cc:56 > 00:1b:fc:af:a1:b4, ethertype 802.1Q
(0x8100), length 46: vlan 77, p 0, ethertype ARP, Reply 10.25.223.2
is-at 00:30:48:96:cc:56, length 28
.........
but I do not see the reply on Server A.

I can "see" the Test Server again after ifconfig em0 down/up. The problem
with bug kern/141285 disappears.

If I write ifconfig_em0="up -vlanhwtag" in /etc/rc.conf, then after
reboot I have network access to the Test Server, but this is not a very
nice solution to the problem.

------------------------------

Message: 3
Date: Mon, 22 Mar 2010 11:33:27 -0400
From: Charles Owens <cow...@greatbaysoftware.com>
Subject: MFC of igb fixes?
To: Jack Vogel <jfv...@gmail.com>
Cc: freeb...@freebsd.org
Message-ID: <4BA78DC7...@greatbaysoftware.com>
Content-Type: text/plain; charset=ISO-8859-1

Hello Jack,

I'm wondering if you have any thoughts with regard to the expected
timeframe for MFC of commit *203049*
<http://svn.freebsd.org/viewvc/base?view=revision&revision=203049> to
RELENG_8? With igb NICs we're seeing flakiness with link-state
handling... and I'm worried that we could fall victim to problems others
have seen when putting the NIC under load.

Would you happen to have a version of the patch that is ready for
RELENG_8? (we'd actually be applying it against RELENG_8_0, as that's
what we're currently bundling with our product). I do note that you
made a few other minor fixes in the week or so following -- should these
be in the picture as well?

Any help or advice is appreciated.

The system in question is based on the Intel S5520UR motherboard... the
NICs are on-board (_not_ PCI-card-based). During boot, the NICs are
detected as shown below. At boot the NICs will always show link-state
as being active, whether or not a cable is plugged in.

igb0: <Intel(R) PRO/1000 Network Connection version - 1.7.3> port 0x3020-0x303f mem 0xb1b20000-0xb1b3ffff,0xb1b44000-0xb1b47fff irq 40 at device 0.0 on pci1
igb0: Using MSIX interrupts with 3 vectors
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: Ethernet address: 00:15:17:b4:cf:e4
igb1: <Intel(R) PRO/1000 Network Connection version - 1.7.3> port 0x3000-0x301f mem 0xb1b00000-0xb1b1ffff,0xb1b40000-0xb1b43fff irq 28 at device 0.1 on pci1
igb1: Using MSIX interrupts with 3 vectors
igb1: [ITHREAD]
igb1: [ITHREAD]
igb1: [ITHREAD]
igb1: Ethernet address: 00:15:17:b4:cf:e5

Thank you,

Charles

--
Charles Owens
Great Bay Software, Inc.

------------------------------

End of freebsd-net Digest, Vol 364, Issue 2
*******************************************
