Gigabit Ethernet performance with Realtek 8111E

Michael Laß

Nov 5, 2011, 7:53:23 AM

Hi!

I've got a small NAS with an Intel D525MW (Atom) board inside, running
FreeBSD 9.0-RC1. It has an onboard Realtek 8111E Ethernet
adapter. I'm experiencing heavy performance problems when transferring
files from a specific PC in my network to that NAS. I ran the following
tests by transferring large amounts of data between the different machines
(using dd and nc):

NAS -> Linux1: ~ 400Mbit/s
NAS -> Linux2: ~ 400Mbit/s
Linux1 -> NAS: heavy fluctuation, between 700Mbit/s and 0bit/s
Linux2 -> NAS: ~ 400Mbit/s
Linux1 -> Linux2: ~ 400Mbit/s
Linux2 -> Linux1: ~ 400Mbit/s
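
A memory-to-memory test of this kind can be sketched in Python as well. This is only an illustration of the dd/nc approach, not the commands actually used: the function names, the 64 KiB chunk size and the loopback address are assumptions. The sender writes zeros (like reading /dev/zero) and the receiver discards them (like writing to /dev/null), so disk I/O plays no part.

```python
import socket
import threading
import time


def receive_all(server_sock, result):
    """Accept one connection and discard everything received (like /dev/null)."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            total += len(chunk)
    result["bytes"] = total


def measure_throughput(total_bytes, host="127.0.0.1"):
    """Send total_bytes of zeros and return (bytes_received, rate in Mbit/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=receive_all, args=(server, result), daemon=True)
    receiver.start()

    payload = bytes(65536)  # a buffer of zeros, like /dev/zero
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            n = min(len(payload), total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    receiver.join()  # wait until the receiver has seen the end of the stream
    elapsed = time.perf_counter() - start
    server.close()
    return result["bytes"], result["bytes"] * 8 / elapsed / 1e6


if __name__ == "__main__":
    received, mbit = measure_throughput(50_000_000)
    print("received %d bytes at about %.0f Mbit/s" % (received, mbit))
```

Run over loopback this only exercises the local stack. For a test between two machines the receiver and sender halves would have to run on different hosts.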

As you can see, everything works fine except for transferring data from
Linux1 to that NAS box. The following graph shows the problem:
http://dl.dropbox.com/u/25455527/network-problems.png

While the transfer rate drops to zero, the NAS also shows very high ping
times of up to one second. Pinging Linux1 is perfectly fine during these outages.

I also had a quick look at the data stream with Wireshark on Linux1 and
it shows a lot of TCP Dup ACK (up to 263 Dup ACKs created by NAS for one
frame).
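
For readers unfamiliar with the term: a duplicate ACK is an ACK that repeats the acknowledgment number of the previous one, which a receiver sends when data arrives after a gap. A simplified way to count them from a list of ACK numbers is sketched below. It assumes the ACK numbers have already been extracted from a capture, and it ignores the extra conditions a real analyzer checks (no payload, unchanged window, and so on). The sample numbers are made up.

```python
def count_duplicate_acks(ack_numbers):
    """Return {ack_number: duplicates} for ACK numbers repeated back-to-back.

    The first ACK carrying a given number is the original; every immediate
    repeat of the same number counts as one duplicate.
    """
    duplicates = {}
    previous = None
    for ack in ack_numbers:
        if ack == previous:
            duplicates[ack] = duplicates.get(ack, 0) + 1
        previous = ack
    return duplicates


if __name__ == "__main__":
    # Made-up ACK stream: the receiver keeps acknowledging 2460 until the gap is filled.
    acks = [1000, 2460, 2460, 2460, 2460, 3920]
    print(count_duplicate_acks(acks))
```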

What can be eliminated as a cause is:
- Switch (I tried connecting Linux1 and NAS directly)
- Cable (I changed that a few times)
- Harddisk I/O (I'm only writing from /dev/zero to /dev/null)

The severity of the problem varies from one minute to the next, but it can
always be reproduced within a few tries.

When limiting either NAS or Linux1 to 100Mbit I'm getting a steady
transfer rate of about 90Mbit/s.
When decreasing the MTU on NAS to 1200 the problem seems to disappear,
getting a transfer rate of about 160Mbit/s.
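
As a rough cross-check of what a smaller MTU costs on paper, the arithmetic below compares the share of each frame that carries TCP payload at MTU 1500 and MTU 1200. It assumes plain IPv4 and TCP headers without options and standard Ethernet framing. The constants and function names are illustrative and are not taken from any measurement in this thread.

```python
# Per-frame overhead on the wire, in bytes (standard Ethernet framing):
# 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap.
ETH_OVERHEAD = 14 + 4 + 8 + 12
# IPv4 and TCP headers without options (assumption; TCP options would add more).
IP_TCP_HEADERS = 20 + 20


def tcp_payload_fraction(mtu):
    """Fraction of the on-wire bytes that is TCP payload for full-sized frames."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


def frames_per_second(rate_mbit, mtu):
    """Full-sized frames per second needed to fill rate_mbit on the wire."""
    return rate_mbit * 1e6 / ((mtu + ETH_OVERHEAD) * 8)


if __name__ == "__main__":
    for mtu in (1500, 1200):
        print("MTU %d: payload fraction %.4f, %.0f frames/s at 400 Mbit/s"
              % (mtu, tcp_payload_fraction(mtu), frames_per_second(400, mtu)))
```

Under these assumptions the payload fraction differs by only about one percentage point between the two MTUs, so framing overhead alone would not account for a drop from about 400 to about 160 Mbit/s.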

ifconfig re0:
> re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> options=388b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
> ether 38:60:77:3e:af:a5
> inet 192.168.178.54 netmask 0xffffff00 broadcast 192.168.178.255
> nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
> media: Ethernet autoselect (1000baseT <full-duplex>)
> status: active

pciconf -lv:
> re0@pci0:1:0:0: class=0x020000 card=0xd6258086 chip=0x816810ec rev=0x06 hdr=0x00
> vendor = 'Realtek Semiconductor Co., Ltd.'
> device = 'RTL8111/8168B PCI Express Gigabit Ethernet controller'
> class = network
> subclass = ethernet

Because Linux1 seems to be involved in that problem: It's running Linux
3.0 and it has an "Atheros Communications AR8121/AR8113/AR8114" onboard.

Does anyone have an idea what could be the problem here? Decreasing the
MTU is a workaround of sorts, but the performance is still not optimal
and an MTU of 1500 should be no problem.

Greetings,
Michael Laß

_______________________________________________
freeb...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net...@freebsd.org"

Rick Macklem

Nov 5, 2011, 11:03:50 AM

try typing:
# sysctl dev.re.0.stats=1
- this will dump out the stats on the chip
if the "Rx missed frames" count is non-zero, you're probably snookered,
to put it technically:-)
- That's what I get for a re chip in this laptop and I haven't found
a way around it. I just live with flakey net performance.
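
Once the statistics dump has been captured as text, the counter can be pulled out with a few lines of Python. This is a sketch under assumptions: the thread does not say where the dump ends up (the sysctl output or the kernel message buffer), and the sample text below is made up, with only the label "Rx missed frames" taken from this mail. The parser expects "name : value" lines.

```python
import re


def parse_stats(text):
    """Parse 'name : value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*:\s*(\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


if __name__ == "__main__":
    # Made-up sample; only the "Rx missed frames" label is taken from the thread.
    sample = """re0 statistics:
Rx frames : 500000
Rx missed frames : 42
"""
    stats = parse_stats(sample)
    if stats.get("Rx missed frames", 0) > 0:
        print("receiver is dropping frames:", stats["Rx missed frames"])
```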

rick

Adrian Chadd

Nov 5, 2011, 12:57:26 PM

.. I do seem to recall dup acks being a problem in the stack, no?

Adrian

Michael Laß

Nov 6, 2011, 7:02:03 AM

Hi!

On Saturday, Nov 5, 2011, at 11:03 -0400, Rick Macklem wrote:
> try typing:
> # sysctl dev.re.0.stats=1
> - this will dump out the stats on the chip
> if the "Rx missed frames" count is non-zero, you're probably snookered,
> to put it technically:-)
> - That's what I get for a re chip in this laptop and I haven't found
> a way around it. I just live with flakey net performance.

Rx missed frames is >0 indeed. Every time I see those drops in speed the
number of missed frames increases by approx. 20-50.

When searching for this problem I found your old thread on
freebsd-current[1]. It seems that the problem is far less severe here.
Some transfers don't cause any problems at all. Others, however, spend more
time at 0kbit/s than actually transferring data...
It also seems like transfers are stabilizing after some seconds but that
is not always the case.
In good times the rate of missed frames is below 0.01%.
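
A percentage like this can be derived from two snapshots of the counters taken before and after a transfer. The sketch below assumes that the "Rx frames" counter does not include the missed frames, which the thread does not confirm. The snapshot values are invented for illustration.

```python
def missed_frame_rate(before, after):
    """Percentage of frames missed between two (rx_frames, rx_missed) snapshots.

    Assumes rx_frames counts only frames that were actually received, so the
    total offered to the NIC is received + missed.
    """
    received = after[0] - before[0]
    missed = after[1] - before[1]
    total = received + missed
    if total <= 0:
        return 0.0
    return 100.0 * missed / total


if __name__ == "__main__":
    # Invented snapshots: 30 frames missed while 499,970 were received.
    rate = missed_frame_rate((1_000_000, 100), (1_499_970, 130))
    print("missed %.4f%% of frames" % rate)
```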

I think the Dup ACKs are just a result of these lost packets. I do not
always see them when these problems occur.

Was there any progress after your last mail on 8th of Nov.?

Greetings,
Michael

[1]:
http://lists.freebsd.org/pipermail/freebsd-current/2010-October/020793.html
http://lists.freebsd.org/pipermail/freebsd-current/2010-November/020797.html

Rick Macklem

Nov 6, 2011, 10:49:57 AM

Nope. For my case, when Rx frames are missed, there is a Fifo overflow
reported. I'm no hardware guy, but my understanding is that, sometimes,
the dma engine transferring data to the receive buffers doesn't keep up
and the fifo fills up.
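
The mechanism described here can be pictured with a toy model: frames arrive from the wire into a small FIFO, DMA drains it, and whatever does not fit while DMA is stalled is counted as missed. The capacity, tick granularity and frame counts below are invented. A real FIFO is sized in bytes and nothing here reflects the actual chip.

```python
def simulate_fifo(arrivals, drains, capacity):
    """Return the number of frames dropped because the FIFO was full.

    In tick i, arrivals[i] frames arrive and afterwards up to drains[i]
    frames are moved out by DMA. Frames that do not fit are counted as missed.
    """
    level = 0
    missed = 0
    for arrived, drained in zip(arrivals, drains):
        level += arrived
        if level > capacity:
            missed += level - capacity  # overflow: these frames are lost
            level = capacity
        level = max(0, level - drained)
    return missed


if __name__ == "__main__":
    arrivals = [4] * 6
    print("DMA keeps up:", simulate_fifo(arrivals, [4] * 6, capacity=8))
    print("DMA stalls for three ticks:",
          simulate_fifo(arrivals, [4, 4, 0, 0, 0, 4], capacity=8))
```

The point of the model is that a short stall is enough: the average drain rate can match the arrival rate and frames are still lost once the buffer is full.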

I did try assorted hacks on the driver, but none of them got rid of
the problem. For my case the combination of these two things did
reduce the # of Rx packets missed, but not down to 0.
- disable msi interrupts (there's an option in the driver)
- comment out the few lines of code that disabled/re-enabled
interrupts (I don't think this code is broken, but for some reason,
leaving the interrupts enabled reduced the # of Rx missed for this
laptop. Maybe the dma engine stops running when interrupts are being
switched on/off? Just pure conjecture, of course.)
Also, only both of the above together made a difference. Each one
individually didn't help.

I heard that there was a driver for BSD out there somewhere that puts
all the Realtek chips in 8139 compatible mode and drives them that way,
but I never even got as far as searching for this driver.

Good luck with it, rick

Adrian Chadd

Nov 6, 2011, 12:30:02 PM

Hi,

You've triggered off a little memory cell deep in my ath(4) 11n
hacking. I saw similar issues with TX/RX interrupt handling and fifo
underrun/overrun/timeouts. It turns out there were bugs in the
interrupt handling code. :-)

Someone who can read the driver and/or has access to the datasheets
should check to make sure that interrupt disable/enable:

* Doesn't clear the RX interrupt condition. I.e., if you disable
interrupts and an RX interrupt status occurs, then when you re-enable
them, it should _immediately_ trigger another interrupt rather than
waiting for the next interrupt to occur;
* Whether MSI is doing the same thing.
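
The difference between the two behaviours in the first point can be shown with a toy model. It is not based on the re(4) driver or on any datasheet. The class and its flag are hypothetical and only contrast hardware that remembers an RX event while interrupts are disabled with hardware that loses it.

```python
class InterruptController:
    """Toy model: an RX event during the masked window is either kept or lost."""

    def __init__(self, latch_while_masked):
        self.latch_while_masked = latch_while_masked
        self.enabled = True
        self.pending = False
        self.delivered = 0

    def rx_event(self):
        if self.enabled:
            self.delivered += 1
        elif self.latch_while_masked:
            self.pending = True  # remembered until interrupts are re-enabled

    def disable(self):
        self.enabled = False

    def enable(self):
        self.enabled = True
        if self.pending:
            self.pending = False
            self.delivered += 1  # fires immediately on re-enable


if __name__ == "__main__":
    for latch in (True, False):
        ctrl = InterruptController(latch_while_masked=latch)
        ctrl.disable()
        ctrl.rx_event()  # a frame arrives while interrupts are off
        ctrl.enable()
        print("latching=%s: interrupts delivered=%d" % (latch, ctrl.delivered))
```

In the non-latching case the frame sits unserviced until some later event raises a new interrupt, which is the kind of delay that could let a receive FIFO fill up.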

This has me a little concerned. I.e., given the trouble people have with
e1000 and MSI, I wonder if there's either a bug in the interrupt
handling in both cases, or whether we're doing something "wrong" with
MSI interrupts that show up under network load.

I'm CC'ing jhb@ so he may provide some helpful hints on how legacy/MSI
interrupts are expected to work in this instance.

Adrian

YongHyeon PYUN

Nov 6, 2011, 6:40:54 PM

Some revisions of the RealTek controller have a FIFO overrun issue, but
I'm not sure whether that is what you're seeing. Try enabling flow
control and see whether that makes any difference. You can enable
it by issuing 'ifconfig re0 media flow'.

> When decreasing the MTU on NAS to 1200 the problem seems to disappear,
> getting a transfer rate of about 160Mbit/s.
>
> ifconfig re0:
> > re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> > options=388b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
> > ether 38:60:77:3e:af:a5
> > inet 192.168.178.54 netmask 0xffffff00 broadcast 192.168.178.255
> > nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
> > media: Ethernet autoselect (1000baseT <full-duplex>)
> > status: active
>

> pciconf -lv:
> > re0@pci0:1:0:0: class=0x020000 card=0xd6258086 chip=0x816810ec rev=0x06 hdr=0x00
> > vendor = 'Realtek Semiconductor Co., Ltd.'
> > device = 'RTL8111/8168B PCI Express Gigabit Ethernet controller'
> > class = network
> > subclass = ethernet
>

Show me the dmesg output. RealTek uses the same PCI device IDs for
several controllers, so it's impossible to know which controller you
have from the pciconf(8) output.

> Because Linux1 seems to be involved in that problem: It's running Linux
> 3.0 and it has an "Atheros Communications AR8121/AR8113/AR8114" onboard.
>
> Does anyone have an idea what could be the problem here? Decreasing the
> MTU is some kind of solution but the performance is still not optimal
> and a MTU of 1500 should be no problem.
>
> Greetings,
> Michael Laß

YongHyeon PYUN

Nov 7, 2011, 12:59:53 PM

This should be read as 'ifconfig re0 mediaopt flow'.
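
After changing the media options, it helps to confirm what the link actually negotiated by looking at the "media:" line of the ifconfig output. The sketch below extracts the options between the angle brackets. The first sample line is the one posted earlier in this thread. That a negotiated flow-control link lists a "flowcontrol" option there is an assumption, and the second sample line is hypothetical.

```python
import re


def media_options(ifconfig_output):
    """Return the options listed in <...> on the 'media:' line, or []."""
    for line in ifconfig_output.splitlines():
        if "media:" in line:
            match = re.search(r"<([^>]*)>", line)
            return match.group(1).split(",") if match else []
    return []


if __name__ == "__main__":
    posted = "media: Ethernet autoselect (1000baseT <full-duplex>)"
    # Hypothetical line; the option name "flowcontrol" is an assumption.
    hoped_for = "media: Ethernet autoselect (1000baseT <full-duplex,flowcontrol>)"
    for text in (posted, hoped_for):
        options = media_options(text)
        print(options, "flow control active:", "flowcontrol" in options)
```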

Michael Laß

Nov 11, 2011, 4:21:20 PM

Hi!

Sorry for my late response.

On Monday, Nov 7, 2011, at 09:59 -0800, YongHyeon PYUN wrote:
> >
> > Some revisions of RealTek controller have FIFO overrun issue but
> > I'm not sure whether you're seeing the issue. Try enabling flow
> > control and see whether that makes any difference. You can enable
> > it by issuing 'ifconfig re0 media flow'.
>
> This should be read as 'ifconfig re0 mediaopt flow'.

It may be that enabling flow control helps a bit, but it definitely does
not solve the problem. There are still hundreds of packets missed in
just one or two minutes. Maybe there is no difference at all.

> > Show me the dmesg output. RealTek uses the same device PCI ids so it's
> > impossible to know which controller you have from the pciconf(8)
> > output.

I think the relevant part is this one:
> re0: <RealTek 8168/8111 B/C/CP/D/DP/E PCIe Gigabit Ethernet> port 0x1000-0x10ff mem 0xf0004000-0xf0004fff,0xf0000000-0xf0003fff irq 16 at device 0.0 on pci1
> re0: Using 1 MSI-X message
> re0: Chip rev. 0x2c000000
> re0: MAC rev. 0x00000000
> miibus0: <MII bus> on re0
> rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
> rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
> re0: Ethernet address: 38:60:77:3e:af:a5

Full dmesg output is also attached.

Greetings,
Michael

PS: In my first mail I wrote that I can reproduce the problem only with
one of two connected hosts. I think the reason is that the other host
only produces a maximum of 250Mbit/s while the problematic transfers go
up to 550Mbit/s.

Michael Laß

Nov 11, 2011, 4:36:03 PM

Attachment has been removed. So this is another try:
http://pastebin.com/PArR9D8N

YongHyeon PYUN

Nov 11, 2011, 6:55:26 PM

On Fri, Nov 11, 2011 at 10:21:20PM +0100, Michael Laß wrote:
> Hi!
>
> Sorry for my late response.
>
> On Monday, Nov 7, 2011, at 09:59 -0800, YongHyeon PYUN wrote:
> > >
> > > Some revisions of RealTek controller have FIFO overrun issue but
> > > I'm not sure whether you're seeing the issue. Try enabling flow
> > > control and see whether that makes any difference. You can enable
> > > it by issuing 'ifconfig re0 media flow'.
> >
> > This should be read as 'ifconfig re0 mediaopt flow'.
>
> It may be that enabling flow control helps a bit, but it definitely does
> not solve the problem. There are still hundreds of packets missed in
> just one or two minutes. Maybe there is no difference at all.
>

OK, try the attached patch and let me know how it works.

> > > Show me the dmesg output. RealTek uses the same device PCI ids so it's
> > > impossible to know which controller you have from the pciconf(8)
> > > output.
>
> I think the relevant part is this one:
> > re0: <RealTek 8168/8111 B/C/CP/D/DP/E PCIe Gigabit Ethernet> port 0x1000-0x10ff mem 0xf0004000-0xf0004fff,0xf0000000-0xf0003fff irq 16 at device 0.0 on pci1
> > re0: Using 1 MSI-X message
> > re0: Chip rev. 0x2c000000
> > re0: MAC rev. 0x00000000
> > miibus0: <MII bus> on re0
> > rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
> > rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
> > re0: Ethernet address: 38:60:77:3e:af:a5
>

Your controller is RTL8168E.
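
The identification comes from the "Chip rev." line in the dmesg output rather than from the PCI IDs. A small sketch of that lookup follows. The table holds only the one mapping established in this thread (0x2c000000 to RTL8168E). Other revisions are deliberately left out rather than guessed, and the function name is hypothetical.

```python
import re

# Only the mapping confirmed in this thread; other revisions are intentionally absent.
KNOWN_CHIP_REVS = {0x2C000000: "RTL8168E"}


def identify_controller(dmesg_text):
    """Map the 're<N>: Chip rev. 0x...' line of a dmesg excerpt to a controller name."""
    match = re.search(r"re\d+: Chip rev\. (0x[0-9a-fA-F]+)", dmesg_text)
    if not match:
        return None
    rev = int(match.group(1), 16)
    return KNOWN_CHIP_REVS.get(rev, "unknown (rev 0x%08x)" % rev)


if __name__ == "__main__":
    excerpt = "re0: Using 1 MSI-X message\nre0: Chip rev. 0x2c000000\nre0: MAC rev. 0x00000000"
    print(identify_controller(excerpt))
```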

re.oflow.diff

Michael Laß

Nov 12, 2011, 7:44:28 AM

Hi!

On Friday, Nov 11, 2011, at 15:55 -0800, YongHyeon PYUN wrote:
>
> Ok, try attached patch and let me know how it works.

Unfortunately it does not make any difference at all.

> Your controller is RTL8168E.

So I should not trust that Intel data sheet... ;)

Greetings,
Michael

Michael Laß

Nov 12, 2011, 9:04:13 AM

I should have tried this earlier:

When using Windows instead of Linux on the other host, I don't have any
problems. I'm getting a constant transfer rate of over 500Mbit/s without
a single frame missed on the FreeBSD machine.

So maybe this problem is more related to the ATL1E driver in the Linux
kernel.

Greetings,
Michael