Short description
-----------------
TCP Segmentation Offload (TSO) results in broken IPv4 packets being sent
from Atheros AR8121/AR8113/AR8114 with the atl1e driver.
Workaround
----------
Turn off TSO.
Long description
----------------
When I run NFS over TCP (default options) and read large files from a
server with an Atheros AR8121/AR8113/AR8114 Ethernet chip, I only get
~25Mbyte/s performance. I get ~5000 retransmitted packets per GByte of
data, according to RetransSegs in /proc/net/snmp. Wireshark on the
client shows that the server sends out a sequence of frames. All but the
last one are 1500-byte IP packets. The last one is shorter, but its
IP header still says 1500 bytes. The client then requests a retransmit,
and the retransmitted frame arrives with a correct IP header.
If I mount NFS using UDP instead, performance is ~110Mbyte/s.
TCP Segmentation Offload (TSO) is enabled by default in the atl1e
Ethernet driver. When I run a patched 2.6.30.10 that lets ethtool
turn off TSO (using ac936929092dc6a5409b627c4c67305ab9b626b3 by Ben
Hutchings) and turn TSO off, the problem disappears. Performance is
~110Mbyte/s and no broken IP packets arrive.
Capture of 146-byte Ethernet frame with bad IP-header:
No. Time Source Destination Protocol Info
98329 11.034129 flash.netinsight.se sid.netinsight.se RPC Continuation
Frame 98329 (146 bytes on wire, 146 bytes captured)
Arrival Time: Jan 15, 2010 13:35:16.224491000
[Time delta from previous captured frame: 0.000009000 seconds]
[Time delta from previous displayed frame: 0.000009000 seconds]
[Time since reference or first frame: 11.034129000 seconds]
Frame Number: 98329
Frame Length: 146 bytes
Capture Length: 146 bytes
[Frame is marked: False]
[Protocols in frame: eth:ip:tcp:rpc]
[Coloring Rule Name: TCP]
[Coloring Rule String: tcp]
Ethernet II, Src: AsustekC_ae:69:6d (00:26:18:ae:69:6d), Dst: sid.netinsight.se (00:18:f3:52:22:3f)
Internet Protocol, Src: flash.netinsight.se (10.100.0.88), Dst: sid.netinsight.se (10.100.1.25)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
Total Length: 1500
Identification: 0x331e (13086)
Flags: 0x02 (Don't Fragment)
Fragment offset: 0
Time to live: 64
Protocol: TCP (0x06)
Header checksum: 0xebc5 [correct]
Source: flash.netinsight.se (10.100.0.88)
Destination: sid.netinsight.se (10.100.1.25)
Transmission Control Protocol, Src Port: nfs (2049), Dst Port: accessbuilder (888), Seq: 93989617, Ack: 516997, Len: 80
Remote Procedure Call
0000 00 18 f3 52 22 3f 00 26 18 ae 69 6d 08 00 45 00 ...R"?.&..im..E.
0010 05 dc 33 1e 40 00 40 06 eb c5 0a 64 00 58 0a 64 ..3.@.@....d.X.d
0020 01 19 08 01 03 78 a4 07 57 23 e6 50 1f 1b 80 10 .....x..W#.P....
0030 01 f5 dd 8d 00 00 01 01 08 0a 05 28 ca 7e 38 93 ...........(.~8.
0040 67 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 g...............
0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0090 00 00 ..
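The mismatch can be read straight out of the hex dump above. The following is a minimal Python sketch (an illustration added here, not part of the capture tooling) that compares the IPv4 Total Length field with the number of bytes the frame really carries. It assumes an untagged Ethernet II frame with a 14-byte header; the function name check_ip_length is made up for this example.

```python
import struct

ETH_HEADER_LEN = 14  # untagged Ethernet II header (assumption: no VLAN tag)

# Ethernet + IPv4 + TCP headers of frame 98329, copied from the hex dump above.
HEADERS = bytes.fromhex(
    "0018f352223f002618ae696d0800"              # Ethernet: dst, src, EtherType IPv4
    "450005dc331e40004006ebc50a6400580a640119"  # IPv4 header (20 bytes)
    "08010378a4075723e6501f1b801001f5dd8d0000"  # TCP fixed header (20 bytes)
    "0101080a0528ca7e38936710"                  # TCP options: nop, nop, timestamp
)
FRAME_LEN = 146  # bytes on the wire, as reported by Wireshark


def check_ip_length(headers, frame_len):
    """Return (claimed IP length, IP bytes on the wire, TCP payload on the wire)."""
    ihl = (headers[ETH_HEADER_LEN] & 0x0F) * 4            # IPv4 header length
    (claimed,) = struct.unpack_from("!H", headers, ETH_HEADER_LEN + 2)
    tcp_start = ETH_HEADER_LEN + ihl
    tcp_header_len = (headers[tcp_start + 12] >> 4) * 4   # TCP data offset
    on_wire = frame_len - ETH_HEADER_LEN
    return claimed, on_wire, on_wire - ihl - tcp_header_len


claimed, on_wire, payload = check_ip_length(HEADERS, FRAME_LEN)
print(f"IP header claims {claimed} bytes, frame carries {on_wire}, TCP payload {payload}")
```

For this frame the header claims 1500 bytes while only 132 bytes of IP data are present, which leaves the 80 bytes of TCP payload that Wireshark reports as Len: 80.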
Software info:
I've tested with Debian 2.6.26 (stable) and 2.6.30 (testing), as well
as 2.6.30.10 from kernel.org. Same result.
Architecture: amd64 (x86_64)
Hardware info:
lspci -vvv:
03:00.0 Ethernet controller: Attansic Technology Corp. Atheros AR8121/AR8113/AR8114 PCI-E Ethernet Controller (rev b0)
Subsystem: ASUSTeK Computer Inc. Device 831c
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 27
Region 0: Memory at fbfc0000 (64-bit, non-prefetchable) [size=256K]
Region 2: I/O ports at ec00 [size=128]
Capabilities: [40] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
Address: 00000000fee0f00c Data: 4189
Capabilities: [58] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag- AttnBtn+ AttnInd+ PwrInd+ RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Latency L0 unlimited, L1 unlimited
ClockPM- Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [100] Advanced Error Reporting <?>
Capabilities: [180] Device Serial Number ff-18-26-00-6d-69-ae-ff
Kernel driver in use: ATL1E
Kernel modules: atl1e
ethtool -i eth0:
driver: ATL1E
version: 1.0.0.7-NAPI
firmware-version: L1e
bus-info: 0000:03:00.0
I've also reported this upstream.
/ Anders
Do you know which specific chip it is?
> ~25Mbyte/s performance. I get ~5000 retransmitted packets per GByte
> data, according to RetransSegs in /proc/net/snmp . wireshark in the
> client show that the server send out a sequence of frames. All but the
> last one are 1500 bytes IP-packets. The last one is shorter, but the
> IP-header still say 1500 byte. The client then requests retransmit,
> and the retransmitted frame arrives with correct IP-header.
Please can you send a longer packet capture in pcap format?
[...]
> I've also reported this upstream.
Since this is network-related, you should mail net...@vger.kernel.org
not linux-kernel@vger.
Ben.
--
Ben Hutchings
I'm not a reverse psychological virus. Please don't copy me into your sig.
> It is an ASUS M4A78 PRO motherboard with the Atheros
> AR8121/AR8113/AR8114 on-board.
>
> >> ~25Mbyte/s performance. I get ~5000 retransmitted packets per GByte
> >> data, according to RetransSegs in /proc/net/snmp . wireshark in the
> >> client show that the server send out a sequence of frames. All but the
> >> last one are 1500 bytes IP-packets. The last one is shorter, but the
> >> IP-header still say 1500 byte. The client then requests retransmit,
> >> and the retransmitted frame arrives with correct IP-header.
I just tested it on Linux localhost.localdomain 2.6.31.5-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux,
with this hardware: Atheros AR8121/AR8113/AR8114 PCI-E Ethernet Controller (rev b0),
device ID 1969:1026 (rev b0).
I uploaded and downloaded a 382M file and it works well. The retransmit counters:
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 1 200 120000 -1 2 4 2 0 2 532501 220631 6 0 2
I also tried kernel 2.6.33-rc1 synced from git, but that kernel fails to boot.
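To compare numbers between machines it helps to pull RetransSegs out of /proc/net/snmp programmatically. A minimal Python sketch, using the two Tcp: lines quoted earlier in this message as sample input; the helper name parse_snmp_tcp is made up for this example.

```python
def parse_snmp_tcp(text):
    """Parse the 'Tcp:' header and value lines of /proc/net/snmp into a dict."""
    tcp_lines = [line.split()[1:] for line in text.splitlines()
                 if line.startswith("Tcp:")]
    names, values = tcp_lines[0], tcp_lines[1]
    return dict(zip(names, map(int, values)))


# Sample input: the counters posted in this message.
SAMPLE = (
    "Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens "
    "AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts\n"
    "Tcp: 1 200 120000 -1 2 4 2 0 2 532501 220631 6 0 2\n"
)

stats = parse_snmp_tcp(SAMPLE)
print(f"{stats['RetransSegs']} retransmitted out of {stats['OutSegs']} segments sent")
```

On a live system the same function can be fed the contents of /proc/net/snmp, read before and after a transfer, and the two RetransSegs values subtracted.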
JY> Anders Boström <and...@netinsight.net> wrote:
>> It is an ASUS M4A78 PRO motherboard with the Atheros
>> AR8121/AR8113/AR8114 on-board.
>>
>> >> ~25Mbyte/s performance. I get ~5000 retransmitted packets per GByte
>> >> data, according to RetransSegs in /proc/net/snmp . wireshark in the
>> >> client show that the server send out a sequence of frames. All but the
>> >> last one are 1500 bytes IP-packets. The last one is shorter, but the
>> >> IP-header still say 1500 byte. The client then requests retransmit,
>> >> and the retransmitted frame arrives with correct IP-header.
JY> i just test it on Linux localhost.localdomain 2.6.31.5-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux.
JY> with hardware, Atheros AR8121/AR8113/AR8114 PCI-E Ethernet Controller (rev b0)
JY> device id : 1969:1026 (rev b0)
JY> i upload/download a 382M it work well with retransmit packet:
Have you tested NFS over TCP? The block-size the application uses can
have an effect on this. What application did you use? Block-size?
/ Anders
>> Have you tested NFS over TCP? The block-size the application
>> uses can have an effect on this. What application did you
>> use? Block-size?
>>
JY> yes, I tested NFS over TCP.
One strange observation is that I can only reproduce this problem when
transmitting data from an NFS server using TCP with the Atheros
AR8121/AR8113/AR8114.
I've tried to reproduce the problem using test programs, like nttcp
and netpipe, without any success. One observation is that the
test programs generate *only* 1500-byte IP packets. When
the NFS server sends data, a sequence of 1500-byte IP packets is
generated, ending with a shorter packet. And this last packet in the
sequence has 1500 in the IP header's total-length field, but is shorter.
/ Anders
I ran tcpdump over your packet capture and saw:
13:48:39.122723 00:26:18:ae:69:6d > 00:18:f3:52:22:3f, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 64, id 32664, offset 0, flags [DF], proto TCP (6), length 1500)
10.100.0.88.2049 > 10.100.1.25.888: Flags [.], cksum 0x3ebd (correct), seq 21720:23168, ack 157, win 501, options [nop,nop,TS val 152460082 ecr 1212787170], length 1448
13:48:39.122733 00:18:f3:52:22:3f > 00:26:18:ae:69:6d, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 39773, offset 0, flags [DF], proto TCP (6), length 52)
10.100.1.25.888 > 10.100.0.88.2049: Flags [.], cksum 0x5cfc (correct), ack 23168, win 58293, options [nop,nop,TS val 1212787170 ecr 152460082], length 0
13:48:39.122742 00:26:18:ae:69:6d > 00:18:f3:52:22:3f, ethertype IPv4 (0x0800), length 1462: truncated-ip - 52 bytes missing! (tos 0x0, ttl 64, id 32664, offset 0, flags [DF], proto TCP (6), length 1500)
10.100.0.88.2049 > 10.100.1.25.888: Flags [.], seq 23168:24616, ack 157, win 501, options [nop,nop,TS val 152460082 ecr 1212787170], length 1448
13:48:39.122747 00:26:18:ae:69:6d > 00:18:f3:52:22:3f, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 64, id 32666, offset 0, flags [DF], proto TCP (6), length 1500)
10.100.0.88.2049 > 10.100.1.25.888: Flags [.], cksum 0x33a1 (correct), seq 24564:26012, ack 157, win 501, options [nop,nop,TS val 152460082 ecr 1212787170], length 1448
Based on the TCP sequence numbers, it seems that the length of the
broken packet is correct but its IP header is wrong.
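The arithmetic behind that conclusion can be checked with the numbers from the tcpdump output above. A minimal Python sketch, assuming untagged Ethernet (14-byte header) and taking the combined IPv4 + TCP header size from the intact packets (1500 - 1448 = 52 bytes); the helper name is made up for this example.

```python
ETH_HEADER_LEN = 14                             # untagged Ethernet II (assumption)
IP_TOTAL_LEN = 1500                             # IP length field of every data packet shown
FULL_TCP_PAYLOAD = 1448                         # "length 1448" of the intact packets
HEADERS_LEN = IP_TOTAL_LEN - FULL_TCP_PAYLOAD   # IPv4 + TCP headers with timestamps


def tcp_payload_on_wire(frame_len):
    """TCP payload bytes actually carried by a frame of the given length."""
    return frame_len - ETH_HEADER_LEN - HEADERS_LEN


broken_seq_start = 23168   # first sequence number of the truncated packet
broken_frame_len = 1462    # its length on the wire
next_seq_start = 24564     # first sequence number of the packet that follows

payload = tcp_payload_on_wire(broken_frame_len)
missing = IP_TOTAL_LEN - (broken_frame_len - ETH_HEADER_LEN)

print(f"payload on wire: {payload}, missing per IP header: {missing}")
print("next sequence number matches:", broken_seq_start + payload == next_seq_start)
```

The truncated frame carries 1396 bytes of payload, and 23168 + 1396 is exactly the sequence number at which the next packet starts, so the data on the wire is contiguous. Only the length written in the IP header (and the end of the sequence range tcpdump derives from it) is off, by the 52 bytes tcpdump reports as missing.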
My understanding is that the length of the TCP payload in a GSO skb must
always be a multiple of the gso_size, so that hardware is not required
to adjust length fields. So I see several possible explanations:
1. Something generated invalid GSO skbs (unlikely; other hardware should
show the same problem)
2. The driver constructed TSO DMA descriptors for a non-GSO skb
3. The hardware is continuing to apply TSO to packets with non-TSO DMA
descriptors
Ben.
--
Ben Hutchings
Any smoothly functioning technology is indistinguishable from a rigged demo.
No, there is no such requirement. The trailer skb can be of any
size less than or equal to gso_size.
However, if the hardware assumed this then yes it would explain
the problem.
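To make the two cases concrete, here is a small Python sketch of how one GSO payload is cut into segments and which IP total length each segment should carry, next to what a device would emit if it wrongly treated every segment as full-sized. This models only the hypothesis discussed here, not confirmed hardware behaviour. The function names and the 4292-byte payload are invented for illustration; gso_size 1448 and the 52 header bytes are taken from the capture.

```python
def segment_ip_lengths(payload_len, gso_size, headers_len):
    """Correct IP total length for each segment of one GSO payload."""
    lengths = []
    offset = 0
    while offset < payload_len:
        chunk = min(gso_size, payload_len - offset)  # last chunk may be shorter
        lengths.append(headers_len + chunk)
        offset += chunk
    return lengths


def assumed_full_size_lengths(payload_len, gso_size, headers_len):
    """Hypothetical faulty device: stamps every segment as a full-sized one."""
    count = -(-payload_len // gso_size)  # ceiling division
    return [headers_len + gso_size] * count


# Illustrative payload: two full segments plus a 1396-byte trailer.
PAYLOAD_LEN = 2 * 1448 + 1396

print(segment_ip_lengths(PAYLOAD_LEN, 1448, 52))         # [1500, 1500, 1448]
print(assumed_full_size_lengths(PAYLOAD_LEN, 1448, 52))  # [1500, 1500, 1500]
```

The two lists differ only in the last entry, which matches the symptom in the capture: the final, shorter packet of a burst carries 1500 in its IP header.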
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <her...@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> Cc: b...@decadent.org.uk; net...@vger.kernel.org;
> 565...@bugs.debian.org; Xiong Huang
> Subject: Re: Bug#565404: linux-image-2.6.26-2-amd64: atl1e:
> TSO is broken
> One strange observation is that I can only reproduce this
> problem when transmitting data from a NFS-server using TCP
> with Atheros AR8121/AR8113/AR8114.
>
> I've tried to reproduce the problem using test-programs, like
> nttcp and netpipe, without any success. One observation is
> that the test-programs *only* generates 1500 bytes
> IP-packets. When the NFS-server sends data, a sequence of
> 1500 bytes IP-packets are generated, ending with a shorter
> packet. And this last packet in the sequence has 1500 in the
> IP-header length field, but is shorter.
>
The following is my test case:
an NFS server with an ar8131 chip, device ID 1063, exports the /tmp/ dir as the NFS share directory.
The client mounts server_ip:/tmp on the local dir /mnt/nfs and uses a Python script to write and read data on
/mnt/nfs/testnfs.log. It works fine.
Can you give me some advice on how to reproduce this bug??
Best wishes
jie
JY> Anders Boström <and...@netinsight.net> wrote:
>> One strange observation is that I can only reproduce this
>> problem when transmitting data from a NFS-server using TCP
>> with Atheros AR8121/AR8113/AR8114.
>>
>> I've tried to reproduce the problem using test-programs, like
>> nttcp and netpipe, without any success. One observation is
>> that the test-programs *only* generates 1500 bytes
>> IP-packets. When the NFS-server sends data, a sequence of
>> 1500 bytes IP-packets are generated, ending with a shorter
>> packet. And this last packet in the sequence has 1500 in the
>> IP-header length field, but is shorter.
>>
JY> following is my test cese,
JY> a nfs server server with ar8131chip, device id 1063. export /tmp/ dir as the nfs share directory,
JY> the client, mount the server_ip:/tmp to local dir /mnt/nfs, ust a python script to write and read data on the
JY> /mnt/nfs/testnfs.log. it works fine.
OK, the device-ID in our NFS-server is 1026, rev. b0. So it is
possible that the problem is specific to that chip/version.
JY> Can you give me some advice on how to reproduce this bug??
The only suggestion I have is to try to find a board with a 1026-chip
on it.
My test case is just a copy of a 1 Gbyte file from the
NFS server to /dev/null, after making sure that the file isn't cached
on the client by reading huge amounts of other data.
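For anyone who wants a number instead of watching a copy run, the same kind of read test can be sketched in Python. This is only an approximation of the test described above, not the exact procedure used: the function name, the 1 MiB block size and the path in the comment are assumptions, and it does nothing about the client cache, which still has to be emptied first.

```python
import time


def read_throughput(path, block_size=1 << 20):
    """Read `path` sequentially, discard the data, return (bytes, seconds)."""
    total = 0
    start = time.monotonic()
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(block_size)
            if not block:       # empty read means end of file
                break
            total += len(block)
    return total, time.monotonic() - start


# Example use (hypothetical path on the NFS mount):
#   total, seconds = read_throughput("/mnt/nfs/bigfile")
#   print(f"{total / seconds / 1e6:.1f} Mbyte/s")
```

Comparing RetransSegs on the server before and after such a run shows whether the retransmissions are occurring.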
/ Anders
>
> JY> Can you give me some advice on how to reproduce this bug??
>
> The only suggestion I have is to try to find a board with a
> 1026-chip on it.
>
> My test-case is just copy of a 1 Gbyte file from the
> NFS-server to /dev/null , after making sure that the file
> isn't cached on the client by reading huge amounts of other data.
>
Just to check: is the kernel version 2.6.26-2?
Best wishes
jie
JY> Anders Boström <and...@netinsight.net> wrote:
>> JY> following is my test cese,
>> JY> a nfs server server with ar8131chip, device id 1063. export /tmp/ dir as the nfs share directory,
>> JY> the client, mount the server_ip:/tmp to local dir /mnt/nfs, ust a python script to write and read data on the
>> JY> /mnt/nfs/testnfs.log. it works fine.
>>
>> OK, the device-ID in our NFS-server is 1026, rev. b0. So it
>> is possible that the problem is specific to that chip/version.
JY> oops, its my mistake in writing, my case is 1026 device ID
>>
JY> Can you give me some advice on how to reproduce this bug??
>>
>> The only suggestion I have is to try to find a board with a
>> 1026-chip on it.
>>
>> My test-case is just copy of a 1 Gbyte file from the
>> NFS-server to /dev/null , after making sure that the file
>> isn't cached on the client by reading huge amounts of other data.
>>
JY> just to check, if the kernel version is 2.6.26-2 ??
I've tested with
Debian linux-image-2.6.26-2-amd64 version 2.6.26-19lenny2,
Debian linux-image-2.6.30-bpo.2-amd64 version 2.6.30-8~bpo50+2 and
kernel.org 2.6.30.10 amd64 with ethtool patch for setting of tso. Same
result.
/ Anders