freebsd-net Digest, Vol 363, Issue 7

freebsd-n...@freebsd.org

Mar 21, 2010, 8:00:25 AM

to freeb...@freebsd.org

Send freebsd-net mailing list submissions to
freeb...@freebsd.org

To subscribe or unsubscribe via the World Wide Web, visit
http://lists.freebsd.org/mailman/listinfo/freebsd-net
or, via email, send a message with subject or body 'help' to
freebsd-n...@freebsd.org

You can reach the person managing the list at
freebsd-...@freebsd.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of freebsd-net digest..."

Today's Topics:

1. Re: kern/144898: [wpi] [panic] wpi panics system
(lin...@FreeBSD.org)
2. Re: why zero-copy sockets(9) are not popular? (Bruce Simpson)
3. Re: kern/144689: [re] TCP transfer corruption using if_re
(Steven Noonan)
4. Re: why zero-copy sockets(9) are not popular? (Alexander Bubnov)
5. Re: Bug in tcp_output? (Bruce Evans)
6. Re: kern/144917: Flowtable crashes system (Evgenii Davidov)

----------------------------------------------------------------------

Message: 1
Date: Sat, 20 Mar 2010 13:54:10 GMT
From: lin...@FreeBSD.org
Subject: Re: kern/144898: [wpi] [panic] wpi panics system
To: lin...@FreeBSD.org, freebs...@FreeBSD.org,
freeb...@FreeBSD.org
Message-ID: <201003201354....@freefall.freebsd.org>

Old Synopsis: wpi panics system
New Synopsis: [wpi] [panic] wpi panics system

Responsible-Changed-From-To: freebsd-bugs->freebsd-net
Responsible-Changed-By: linimon
Responsible-Changed-When: Sat Mar 20 13:53:51 UTC 2010
Responsible-Changed-Why:
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=144898

------------------------------

Message: 2
Date: Sat, 20 Mar 2010 17:53:37 +0000
From: Bruce Simpson <b...@incunabulum.net>
Subject: Re: why zero-copy sockets(9) are not popular?
To: freeb...@freebsd.org
Message-ID: <4BA50BA1...@incunabulum.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 03/20/10 10:06, Alexander Bubnov wrote:
> Hello, all!
> Anybody knows why zero copy is not popular although this technique allows
> to increase performance of servers? It is very hard to find any examples of
> zero-copy for FreeBSD.
>

Transmit is easy. Receive is hard.

The whole concept of zero-copy revolves around being able to use
page-flipping to map buffers in user and kernel space, to amortize the
cost of copies across that system boundary.

The compromise usually taken is to use the sendfile() API, or rely on
TCP Segmentation Offload (TSO), much like Microsoft's Chimney stack does
in Windows 7. Unfortunately, sendfile() only covers transmit. TSO only
offloads up to the point where sockets hit the card; TSO can offload TCP
stream reassembly, but you still have to copy from the kernel buffers
into userland.

True zero-copy sockets generally require scatter/gather DMA engine
support, and TCP/IP header splitting, to do zero-copy receive.

S/G PCI DMA cores are often custom designed, and you tend not to find
them in off-the-shelf VHDL libraries. That IP (as in intellectual
property) still has cost.

Historically the only cards in FreeBSD which supported this, were the
Tigon-II, which got bought by Broadcom (bge is the Tigon-III). Modified
firmware was required to do this.
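
A minimal sketch of the sendfile() transmit path mentioned above, assuming a
connected stream socket. The helper names send_file_prefix() and
sendfile_demo() are hypothetical and not from the thread. FreeBSD and Linux
declare sendfile() with different arguments, so both are shown; other
platforms are left out.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/sendfile.h>
#endif

/*
 * Hypothetical helper: transmit the first nbytes (nbytes > 0) of file_fd
 * over the connected stream socket sock_fd without passing the data
 * through a userland buffer. Returns the bytes sent, or -1 on error.
 */
ssize_t
send_file_prefix(int file_fd, int sock_fd, size_t nbytes)
{
	assert(nbytes > 0);
#if defined(__FreeBSD__)
	off_t sent = 0;

	/* FreeBSD: sendfile(fd, s, offset, nbytes, hdtr, sbytes, flags). */
	if (sendfile(file_fd, sock_fd, 0, nbytes, NULL, &sent, 0) == -1)
		return (-1);
	return ((ssize_t)sent);
#elif defined(__linux__)
	off_t offset = 0;
	size_t total = 0;

	/* Linux: sendfile(out_fd, in_fd, &offset, count) may send less. */
	while (total < nbytes) {
		ssize_t n = sendfile(sock_fd, file_fd, &offset, nbytes - total);

		if (n == -1)
			return (-1);
		if (n == 0)
			break;		/* reached end of file early */
		total += (size_t)n;
	}
	return ((ssize_t)total);
#else
#error "this sketch only covers the FreeBSD and Linux sendfile() calls"
#endif
}

/* Round trip over a local socket pair; returns 0 if the payload arrives. */
int
sendfile_demo(const char *path)
{
	static const char msg[] = "hello, world";
	char buf[sizeof(msg)] = { 0 };
	int sv[2];
	int ok = -1;
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

	if (fd == -1)
		return (-1);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0) {
		if (write(fd, msg, sizeof(msg)) == (ssize_t)sizeof(msg) &&
		    send_file_prefix(fd, sv[0], sizeof(msg)) ==
		    (ssize_t)sizeof(msg) &&
		    read(sv[1], buf, sizeof(buf)) == (ssize_t)sizeof(msg) &&
		    memcmp(buf, msg, sizeof(msg)) == 0)
			ok = 0;
		close(sv[0]);
		close(sv[1]);
	}
	close(fd);
	unlink(path);
	return (ok);
}
```

The receiving side in sendfile_demo() still uses read(), which copies from
the kernel buffer into buf. That is the asymmetry described above: only the
transmit direction avoids the copy.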

------------------------------

Message: 3
Date: Sat, 20 Mar 2010 14:38:55 -0700
From: Steven Noonan <ste...@uplinklabs.net>
Subject: Re: kern/144689: [re] TCP transfer corruption using if_re
To: pyu...@gmail.com
Cc: freeb...@freebsd.org, bug-fo...@freebsd.org,
yon...@freebsd.org
Message-ID:
<f488382f1003201438w549...@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

On Tue, Mar 16, 2010 at 1:46 PM, Pyun YongHyeon <pyu...@gmail.com> wrote:
> On Tue, Mar 16, 2010 at 12:31:22PM -0700, Steven Noonan wrote:
>> On Tue, Mar 16, 2010 at 11:23 AM, Pyun YongHyeon <pyu...@gmail.com> wrote:
>
> [...]
>
>> > The real issue looks like PHY read failure which can result in
>> > unexpected behavior. I don't see rgephy(4) related message here,
>> > would you show me the output of "devinfo -rv | grep phy"?
>> > By chance are you using PCMCIA ethernet controller?
>>
>> I am. It's a Netgear GA511. I think I said in my original post that it
>> was connected via cardbus.
>>
>> xerxes ~ # devinfo -rv | grep phy
>>           rgephy0 pnpinfo oui=0x732 model=0x11 rev=0x3 at phyno=1
>>         inphy0 pnpinfo oui=0xaa00 model=0x33 rev=0x0 at phyno=1
>>
>
> Ok, thanks for the info. Did the controller ever work before?
> Or you start seeing the issue on 8.0-RELEASE?
>

Uh, hm. This is weird, now I'm getting the problem not just using
re(4), but also with fxp(4) (which is my on-board card). I don't think
it's a driver bug here.

Could this be a TCP stack bug?

- Steven

------------------------------

Message: 4
Date: Sun, 21 Mar 2010 08:21:25 +0300
From: Alexander Bubnov <alexande...@gmail.com>
Subject: Re: why zero-copy sockets(9) are not popular?
To: Bruce Simpson <b...@incunabulum.net>
Cc: freeb...@freebsd.org
Message-ID:
<c3e287ff1003202221p15f...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Bruce, many thanks for the comprehensive answer!

2010/3/20 Bruce Simpson <b...@incunabulum.net>

> On 03/20/10 10:06, Alexander Bubnov wrote:
>
>> Hello, all!
>> Anybody knows why zero copy is not popular although this technique allows
>> to increase performance of servers? It is very hard to find any examples
>> of
>> zero-copy for FreeBSD.
>>
>>
>
> Transmit is easy. Receive is hard.
>
> The whole concept of zero-copy revolves around being able to use
> page-flipping to map buffers in user and kernel space, to amortize the cost
> of copies across that system boundary.
>
> The compromise usually taken is to use the sendfile() API, or rely on TCP
> Segmentation Offload (TSO), much like Microsoft's Chimney stack does in
> Windows 7. Unfortunately, sendfile() only covers transmit. TSO only offloads
> up to the point where sockets hit the card; TSO can offload TCP stream
> reassembly, but you still have to copy from the kernel buffers into
> userland.
>
> True zero-copy sockets generally require scatter/gather DMA engine support,
> and TCP/IP header splitting, to do zero-copy receive.
>
> S/G PCI DMA cores are often custom designed, and you tend not to find them
> in off-the-shelf VHDL libraries. That IP (as in intellectual property) still
> has cost.
>
> Historically the only cards in FreeBSD which supported this, were the
> Tigon-II, which got bought by Broadcom (bge is the Tigon-III). Modified
> firmware was required to do this.
>

--
/BR, Alexander

------------------------------

Message: 5
Date: Sun, 21 Mar 2010 18:17:23 +1100 (EST)
From: Bruce Evans <br...@optusnet.com.au>
Subject: Re: Bug in tcp_output?
To: Rui Paulo <rpa...@freebsd.org>
Cc: freeb...@freebsd.org, Chris Harrer <cjha...@comcast.net>
Message-ID: <2010032117...@delplex.bde.org>
Content-Type: text/plain; charset="x-unknown"

On Sat, 20 Mar 2010, Rui Paulo wrote:

> On 18 Mar 2010, at 20:19, Chris Harrer wrote:
>>
>> In the following block of code, running on a x86_64 platform, I believe that
>> cwin should be declared as an int:
>> ...
>> else {
>>
>> long cwin; <-- Should be an int
>> ...
>> if (len > 0) {
>>
>> cwin = tp->snd_cwnd -
>>
>> (tp->snd_nxt - tp->sack_newdata) -
>>
>> sack_bytes_rxmt;
>>
>> if (cwin < 0)
>>
>> cwin = 0;
>>
>> len = lmin(len, cwin);
>>
>> }
>>
>> }
>>
>> }
>>
>>
>>
>> Consider the case where:
>>
>> sack_rxmit = 0
>>
>> sack_bytes_rxmt = 0x2238
>>
>> off = 0
>>
>> len =0xa19c
>>
>> tp->snd_cwnd = 0x2238
>>
>> tp->snd_nxt = 0xdd6d7974
>>
>> tp->sack_newdata = 0xdd6d6858
>>
>> In this case cwin evaluates to 0x00000000ffffe37c, which is not <0, but
>> instead huge. This causes the remaining data on the socket's so->so_snd
>> buffer to be sent to the network causing more problems at the receiver which
>> is already dropping frames.
>
> I see. This is most likely a bug. Can you send-pr so this doesn't get lost?

What bug do you see? This is most likely not a bug. I only see the
following bugs
- the suggestion to increase the fragility of the code by changing cwin to
int
- lots of whitespace lossage
- the style bug in the declaration of cwin (nested declaration)
- lots of fragile but working code. It depends on the machine being a normal
2's complement one. It would fail on normal 1's complement machines and
on abnormal 2's complement ones, but so would many other things in the
kernel.
- type and arithmetic errors in the hand calculation (errors that are not
made at runtime), resulting in a value that wouldn't work, though the
runtime value would.

Relevant code quoted again, with the whitespace fixed:

>> cwin = tp->snd_cwnd -
>>     (tp->snd_nxt - tp->sack_newdata) -
>>     sack_bytes_rxmt;

On 64-bit machines, with the above values, this is:

    rhs = (u_long)0x2238UL -
          ((tcp_seq)0xdd6d7974 -
           (tcp_seq)0xdd6d6858) -
          (int)0x2238;
        = (u_long)0x2238UL -
          ((uint32_t)0xdd6d7974 -
           (uint32_t)0xdd6d6858) -
          (int)0x2238;
        = (u_long)0x2238UL -
          (u_int)0x111c -
          (int)0x2238;
        = (u_long)0x111c -
          (int)0x2238;
        = (u_long)0x111c -
          (u_long)0x2238;
        = (u_long)0xffffffffffffeee4;
    cwin = (long)rhs;
         = -(long)0x111c;

I might have made arithmetic errors too, but I'm sure that I got the
conversions essentially correct. On machines with 64-bit u_longs,
almost everything is evaluated modulo 2^64. This gives a large positive
value, but not one with the top bits set up to only the 31st as would
happen on machines with 32-bit u_longs. Then the final conversion to
long gives a negative value.
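
A minimal sketch of the evaluation above, assuming an LP64 machine. The
fixed-width types stand in for the kernel types (uint64_t for a 64-bit
u_long, uint32_t for tcp_seq, int64_t for a 64-bit long), and the function
names are hypothetical. The final conversion to int64_t is
implementation-defined; the sketch assumes the usual 2's complement
wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* cwin before clamping, with every conversion spelled out. */
int64_t
cwin_raw_lp64(uint64_t snd_cwnd, uint32_t snd_nxt, uint32_t sack_newdata,
    int32_t sack_bytes_rxmt)
{
	/* The tcp_seq difference is evaluated modulo 2^32. */
	uint32_t in_flight = snd_nxt - sack_newdata;

	/* Everything else is converted to the 64-bit unsigned type. */
	uint64_t rhs = snd_cwnd - (uint64_t)in_flight -
	    (uint64_t)sack_bytes_rxmt;

	/* Implementation-defined conversion; assumed to wrap. */
	return ((int64_t)rhs);
}

/* The same value after the "if (cwin < 0) cwin = 0;" step. */
int64_t
cwin_clamped_lp64(uint64_t snd_cwnd, uint32_t snd_nxt, uint32_t sack_newdata,
    int32_t sack_bytes_rxmt)
{
	int64_t cwin = cwin_raw_lp64(snd_cwnd, snd_nxt, sack_newdata,
	    sack_bytes_rxmt);

	return (cwin < 0 ? 0 : cwin);
}

/* Usage with the values quoted in the thread. */
void
cwin_lp64_example(void)
{
	assert(cwin_raw_lp64(0x2238, 0xdd6d7974, 0xdd6d6858, 0x2238) ==
	    -0x111c);
	assert(cwin_clamped_lp64(0x2238, 0xdd6d7974, 0xdd6d6858, 0x2238) == 0);
}
```

With these inputs the signed result is negative, so the existing clamp to 0
takes effect and len is limited to 0.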

This is fragile, but it is a standard 2's complement hack. It would
fail mainly on normal ones complement machines when the rhs is
(u_long)0xFF...FF. Then the lhs is probably negative 0, which is
not less than 0.

The fragility is essentially the same on machines with 32-bit u_longs.
Almost everything is evaluated modulo 2^32...

Using 64-bit u_longs for tp->snd_cwnd (and thus for almost the entire
calculation) is excessive but doesn't cause any problems.

Using a signed type for sack_bytes_rxmt asks for sign extension bugs but
doesn't get them. Here it is promoted to a u_long so there are no
sign extension bugs for it here.

Using a signed type for cwin is essential for the comparison of cwin
with 0 to work. This signed type should have the same size as the rhs
to avoid even more fragility (if it were int, then you would have to
worry about the value being changed to a non-working value by the
implementation-defined conversion of the rhs to cwin not just for
values larger than LONG_MAX but also for ones larger than INT_MAX.
`int' should work in practice. This and other things depend on the
difference of the tcp_seq's not being anywhere near as large as
0x7fffffff).
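
A small sketch of the point about the destination type, again assuming
64-bit u_long and wrapping conversions. The function names are
hypothetical. It compares a signed destination as wide as the rhs, a
narrower signed one, and an unsigned one.

```c
#include <assert.h>
#include <stdint.h>

/* Signed destination as wide as the rhs (what the existing code does). */
int64_t
clamp_signed64(uint64_t rhs)
{
	int64_t cwin = (int64_t)rhs;

	return (cwin < 0 ? 0 : cwin);
}

/*
 * Narrower signed destination: only the low 32 bits survive the
 * conversion, so values above INT32_MAX can also turn negative.
 */
int32_t
clamp_signed32(uint64_t rhs)
{
	int32_t cwin = (int32_t)rhs;

	return (cwin < 0 ? 0 : cwin);
}

/*
 * Unsigned destination: a test such as "cwin < 0" can never be true,
 * so no clamping happens and the wrapped value is returned unchanged.
 */
uint64_t
clamp_unsigned64(uint64_t rhs)
{
	return (rhs);
}

/* Usage with the rhs from the thread, -0x111c modulo 2^64. */
void
clamp_example(void)
{
	uint64_t rhs = UINT64_C(0xffffffffffffeee4);

	assert(clamp_signed64(rhs) == 0);
	assert(clamp_signed32(rhs) == 0);
	assert(clamp_unsigned64(rhs) == rhs);
}
```

For the rhs in this thread both signed variants clamp to 0. They differ only
for values such as 0x80000000, which is positive as a 64-bit long but
negative after conversion to a 32-bit int.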

Bruce

------------------------------

Message: 6
Date: Sun, 21 Mar 2010 12:04:55 +0300
From: Evgenii Davidov <da...@korolev-net.ru>
Subject: Re: kern/144917: Flowtable crashes system
To: freeb...@FreeBSD.org
Message-ID: <20100321090...@korolev-net.ru>
Content-Type: text/plain; charset=koi8-r

Hello,

On Sat, Mar 20, 2010 at 11:06:35PM +0000, Doychin Dokov wrote:

> >Description:
> It seems like flowtable has been merged and enabled by default in 8.0.... which is a really really bad idea.
> On a system which handles two full BGP tables it makes one of the CPU cores run at 100% right after most of the prefixes get installed in the routing table.

I saw the same effect with OSPF.

--
Evgenii V Davidov

------------------------------

End of freebsd-net Digest, Vol 363, Issue 7
*******************************************