Re: NetBSD in BSD Router / Firewall Testing

Thor Lancelot Simon

unread,

Nov 30, 2006, 6:49:38 PM11/30/06

to Hubert Feyrer

On Fri, Dec 01, 2006 at 12:06:17AM +0100, Hubert Feyrer wrote:
>
> [adding tech-net@ as I don't really know what to answer...
>
> Context: adding NetBSD in the benchmark at
> http://www.tancsa.com/blast.html, with the wm(4) driver in
> -current, as it's not available in 3.1]
>
>
> On Thu, 30 Nov 2006, Mike Tancsa wrote:
> >Gave it a try and I posted the results on the web page. The Intel driver
> >doesnt seem to work too well. Is there debugging in this kernel ?
>
> That sounds indeed not so bright. I do not know about the wm(4) driver,
> but maybe someone on tech-net@ (CC:d) has an idea. IIRC that's with a
> -current (HEAD) GENERIC kernel and the wm(4) driver, while bge(4) driver
> works ok.

There are some severe problems with the test configuration.

1) The published test results freely mix configurations where the switch
applies and removes the vlan tags with configurations where the host
does so. This is not a good idea:

1) The efficiency of the switch itself will differ in these configurations
2) The difference in frame size will actually measurably impact the PPS.
3) One of the device drivers you're testing doesn't do hardware VLAN
tag insertion/removal in NetBSD due to a bug (wm). Obviously, this
one is our fault, not yours.

2) The NetBSD kernels you're testing don't have options GATEWAY, so they
don't have the fastroute code.

3) There is a problem with autonegotiation either on your switch, on the
particular wm adapter you're using, or in NetBSD -- there's not quite
enough data to tell which. But look at the number of input errors on
the wm adapter in your test with NetBSD-current: it's 3 million. This
alone is probably responsible for most of the performance difference
between the wm and bge test cases with NetBSD kernels (and the hardware
vlan support in the bge driver may be responsible for the rest).

4) You don't appear to be using hardware IP checksum offload. You're going
to have trouble turning this on with a mismatched kernel and ifconfig
executable, however. :-/

With these fixed, we can probably help diagnose any remaining issues
pretty quickly.

Thor

Mike Tancsa

unread,

Nov 30, 2006, 7:44:10 PM11/30/06

to t...@rek.tjls.com, Hubert Feyrer

At 06:49 PM 11/30/2006, Thor Lancelot Simon wrote:

>There are some severe problems with the test configuration.
>
>1) The published test results freely mix configurations where the switch
> applies and removes the vlan tags with configurations where the host
> does so. This is not a good idea:

Hi,
The switch is always involved. The ports are only in trunk mode for
the trunking tests. Otherwise, its switchport access. The same
"limitations" apply to all tested configurations. When I swapped in
a faster CPU briefly, I was seeing rates of +1Mpps on RELENG_4 with
no dropped packets with no firewall in the kernel. Thats the same
hardware, so I am not sure how its inadequate hardware on NetBSD
tests all of a sudden.

> 1) The efficiency of the switch itself will differ in these configurations

Why ? The only thing being changed from test to test is the OS.

> 2) The difference in frame size will actually measurably impact the PPS.

Framesize is always the same. UDP packet with a 10byte payload. The
generators are the same devices all the time. I am not using
different frame sizes for different setups to try and make something
look good and other things bad.

> 3) One of the device drivers you're testing doesn't do hardware VLAN
> tag insertion/removal in NetBSD due to a bug (wm). Obviously, this
> one is our fault, not yours.

When I did the wm tests, this was just plugged into the switch with a
port based VLAN (the cisco equiv of switchport access). There was no
trunking going on.

>2) The NetBSD kernels you're testing don't have options GATEWAY, so they
> don't have the fastroute code.

Like I said to Hubert, I am not a NetBSD person and just did the
default install. I am happy to re-test with a more appropriate kernel
config. FreeBSD and dfly both have fastforward (I am guessing like
your fastroute) as a sysctl tuneable so I could add that to the kernel...

>3) There is a problem with autonegotiation either on your switch, on the
> particular wm adapter you're using, or in NetBSD -- there's not quite
> enough data to tell which. But look at the number of input errors on
> the wm adapter in your test with NetBSD-current: it's 3 million. This
> alone is probably responsible for most of the performance difference

... Or the kernel just was not able forward fast enough. FYI, The
switch proper negotiated with all the other OSes tested nor were
there errors on the switchport.

---Mike

Thor Lancelot Simon

unread,

Nov 30, 2006, 9:43:39 PM11/30/06

to Mike Tancsa

On Thu, Nov 30, 2006 at 07:41:45PM -0500, Mike Tancsa wrote:
> At 06:49 PM 11/30/2006, Thor Lancelot Simon wrote:
>
> > 1) The efficiency of the switch itself will differ in these
> > configurations
>
> Why ? The only thing being changed from test to test is the OS.

Because the switch hardware does not forward packets at the same rate
when it is inserting and removing VLAN tags as it does when it's not.
The effect will be small, but measurable.

> > 2) The difference in frame size will actually measurably impact the PPS.
>
> Framesize is always the same. UDP packet with a 10byte payload.

No. The Ethernet packets with the VLAN tag on them are not, in fact,
the same size as those without it; and for a packet as small as a 10
byte UDP packet, this will make quite a large difference if you actually
have a host that can inject packets at anywhere near wire speed.

> generators are the same devices all the time. I am not using
> different frame sizes for different setups to try and make something
> look good and other things bad.

I didn't say that you were, just to be clear. But that does not mean
that running some tests with tagging turned on, and others not, is
good benchmarking practice: you should run the exact same set of tests
for all host configurations, because doing otherwise yields distorted
results.

> >3) There is a problem with autonegotiation either on your switch, on the
> > particular wm adapter you're using, or in NetBSD -- there's not quite
> > enough data to tell which. But look at the number of input errors on
> > the wm adapter in your test with NetBSD-current: it's 3 million. This
> > alone is probably responsible for most of the performance difference
>

> .... Or the kernel just was not able forward fast enough.

No; that will simply not cause the device driver to report an input
error, whereas your netstat output shows that it reported three *million*
of them. Something is wrong at the link layer. It could be in the NetBSD
driver for the Intel gigabit PHY, but there's not enough data in your
report to be sure. FWIW, I work for a server load balancer vendor that
ships a FreeBSD-based product, and I consequently do a lot of load testing.
Even with tiny UDP packets, I get better forwarding performance from
basically _every_ OS you tested than you seem to, which is why I think
there's something that's not quite right with your test rig. I am just
doing my best to point out the first things that come to mind when I look
at the data you've put online.

I note that you snipped the text where I noted that because you're
testing the wm card with mismatched kernel and ifconfig, you're not
using its hardware checksum offload. That's one thing you should
definitely fix, and if you don't have that turned on for other
kernels you're testing, of course you should probably fix it there too.

--
Thor Lancelot Simon t...@rek.tjls.com
"The liberties...lose much of their value whenever those who have greater
private means are permitted to use their advantages to control the course
of public debate." -John Rawls

Mike Tancsa

unread,

Nov 30, 2006, 10:17:22 PM11/30/06

to t...@rek.tjls.com

At 09:43 PM 11/30/2006, Thor Lancelot Simon wrote:
>On Thu, Nov 30, 2006 at 07:41:45PM -0500, Mike Tancsa wrote:
> > At 06:49 PM 11/30/2006, Thor Lancelot Simon wrote:
> >
> > > 1) The efficiency of the switch itself will differ in these
> > > configurations
> >
> > Why ? The only thing being changed from test to test is the OS.
>
>Because the switch hardware does not forward packets at the same rate
>when it is inserting and removing VLAN tags as it does when it's not.
>The effect will be small, but measurable.

But the same impact will hurt *all* the OSes tested equally, not just
NetBSD. Besides, supposedly the switch is rated to 17Mpps. No doubt
there is a bit of vendor exaggeration, but I doubt they would stretch
the number by a factor of 10. Still, even if they did, I would not
be able to push over 1Mpps on my RELENG_4 setup.

> > > 2) The difference in frame size will actually measurably
> impact the PPS.
> >
> > Framesize is always the same. UDP packet with a 10byte payload.
>
>No. The Ethernet packets with the VLAN tag on them are not, in fact,

I did both sets of tests. eg . the line

RELENG_6 UP i386 FastFWD Polling

means that em0 was in the equiv of 0/4 and em1 0/5 with port 0/4
switchport access 44
and port 0/5
switchport access 88

where as the test

RELENG_4, FastFWD, vlan44 and vlan88 off single int, em1 Polling, HZ=2000

has a switch config of
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 44,88
switchport mode trunk

on port 5.

I tested NetBSD 3.1 bge nic against config b). So I dont see how you
cant compare the results of that to

HEAD, FastFWD, vlan44 and vlan88 off single int, em1 (Nov 24th sources)
RELENG_4, FastFWD, vlan44 and vlan88 off single int, em1 Polling, HZ=2000
HEAD, FastFWD, vlan44 and vlan88 off single int, bge0 (Nov 24th sources)
RELENG_6, FastFWD, INTR_FAST, vlan44 and vlan44 off single int, em1

which had the exact same switch config.

>the same size as those without it; and for a packet as small as a 10
>byte UDP packet, this will make quite a large difference if you actually
>have a host that can inject packets at anywhere near wire speed.

Thats why I use at least 2...

> > generators are the same devices all the time. I am not using
> > different frame sizes for different setups to try and make something
> > look good and other things bad.
>
>I didn't say that you were, just to be clear. But that does not mean
>that running some tests with tagging turned on, and others not, is
>good benchmarking practice: you should run the exact same set of tests
>for all host configurations, because doing otherwise yields distorted
>results.

I did where I could. I am not saying compare the trunking performance
of NetBSD to the non trunking performance of FreeBSD. I am looking
at trunking to trunking, non trunking to non trunking. I did the
majority of my testing with the Intel PCIe dual port card which
NetBSD 3.1 does not support. So, since I had some bge tests, I ran
the bge tests in vlan mode which I dont see why you cant compare that
to vlan mode on FreeBSD using the same bge card. Its the exact same
switch config for both set of tests, and the same traffic generators
so I dont see why its not a valid comparison.

> > >3) There is a problem with autonegotiation either on your switch, on the
> > > particular wm adapter you're using, or in NetBSD -- there's not quite
> > > enough data to tell which. But look at the number of input errors on
> > > the wm adapter in your test with NetBSD-current: it's 3 million. This
> > > alone is probably responsible for most of the performance difference
> >
> > .... Or the kernel just was not able forward fast enough.
>
>No; that will simply not cause the device driver to report an input
>error, whereas your netstat output shows that it reported three *million*
>of them. Something is wrong at the link layer. It could be in the NetBSD
>driver for the Intel gigabit PHY, but there's not enough data in your
>report to be sure. FWIW, I work for a server load balancer vendor that
>ships a FreeBSD-based product, and I consequently do a lot of load testing.
>Even with tiny UDP packets, I get better forwarding performance from
>basically _every_ OS you tested than you seem to, which is why I think
>there's something that's not quite right with your test rig. I am just
>doing my best to point out the first things that come to mind when I look
>at the data you've put online.

Stock FreeBSD, or modified FreeBSD ? With RELENG_4 I can push over
1Mpps. All of the test setups I used saw input errors when I tried
to push too many packets through the box. I really dont know much
about NetBSD but it too will have some sort of limit as to how much
it can forward. Once its limit is hit, how does it report that
? Does it just silently drop the packet ? Or does it show up as an
input error ?

>I note that you snipped the text where I noted that because you're
>testing the wm card with mismatched kernel and ifconfig, you're not
>using its hardware checksum offload. That's one thing you should
>definitely fix, and if you don't have that turned on for other
>kernels you're testing, of course you should probably fix it there too.

It didnt seem to make much difference on FreeBSD (i.e. turn hardware
checksums on or off for routing performance) but I will see if I can
get the box rebuilt to sync the base with the kernel.

---Mike

Mike Tancsa

unread,

Dec 1, 2006, 1:08:43 AM12/1/06

to t...@rek.tjls.com

At 09:43 PM 11/30/2006, Thor Lancelot Simon wrote:

>I note that you snipped the text where I noted that because you're
>testing the wm card with mismatched kernel and ifconfig, you're not
>using its hardware checksum offload. That's one thing you should
>definitely fix, and if you don't have that turned on for other
>kernels you're testing, of course you should probably fix it there too.

OK, I updated the base as well and rebuilt the kernel. There doesnt
seem to be much difference, perhaps +5Kpps by turning it on. But it
seems to be the driver, as I get FAR better results with the bge nic
(see below)

# ifconfig wm0
wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx,TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=300<IP4CSUM_Rx,IP4CSUM_Tx>
address: 00:15:17:0b:70:98
media: Ethernet autoselect (1000baseT
full-duplex,flowcontrol,rxpause,txpause)
status: active
inet 192.168.88.223 netmask 0xffffff00 broadcast 192.168.88.255
inet6 fe80::215:17ff:fe0b:7098%wm0 prefixlen 64 scopeid 0x5
# ifconfig wm1
wm1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx,TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=300<IP4CSUM_Rx,IP4CSUM_Tx>
address: 00:15:17:0b:70:99
media: Ethernet autoselect (1000baseT
full-duplex,flowcontrol,rxpause,txpause)
status: active
inet 192.168.44.223 netmask 0xffffff00 broadcast 192.168.44.255
inet6 fe80::215:17ff:fe0b:7099%wm1 prefixlen 64 scopeid 0x6
# netstat -ni
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Colls
wm0 1500 <Link> 00:15:17:0b:70:98 32226898
281780 15 0 0
wm0 1500 192.168.88/24 192.168.88.223 32226898
281780 15 0 0
wm0 1500 fe80::/64 fe80::215:17ff:fe 32226898
281780 15 0 0
wm1 1500 <Link> 00:15:17:0b:70:99 34 0 7117358 0 0
wm1 1500 192.168.44/24 192.168.44.223 34 0 7117358 0 0
wm1 1500 fe80::/64 fe80::215:17ff:fe 34 0 7117358 0 0

There are no errors on the switchport.

And in SMP which is GERNIC.MP with

options GATEWAY # packet forwarding

NetBSD 4.99.4 (ROUTER) #1: Thu Nov 30 19:23:52 EST 2006
mdta...@r2-netbsd.sentex.ca:/usr/obj/sys/arch/i386/compile/ROUTER
total memory = 2047 MB
avail memory = 2002 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
BIOS32 rev. 0 found at 0xf21a0
mainbus0 (root)
mainbus0: Intel MP Specification (Version 1.4) (OEM00000 PROD00000000)
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Unknown K7 (Athlon) (686-class), 2015.10 MHz, id 0x20fb1
cpu0: features f7dbfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features f7dbfbff<PGE,MCA,CMOV,PAT,PSE36,MPC,NOX,MMXX,MMX>
cpu0: features f7dbfbff<FXSR,SSE,SSE2,HTT,LONG,3DNOW2,3DNOW>
cpu0: features2 1<SSE3>
cpu0: "AMD Athlon(tm) 64 X2 Dual Core Processor 3800+"
cpu0: I-cache 64 KB 64B/line 2-way, D-cache 64 KB 64B/line 2-way
cpu0: L2 cache 512 KB 64B/line 16-way
cpu0: ITLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu0: DTLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu0: AMD Power Management features: f<TTP,VID,FID,TS>
cpu0: calibrating local timer
cpu0: apic clock running at 201 MHz
cpu0: 8 page colors
cpu1 at mainbus0: apid 1 (application processor)
cpu1: starting
cpu1: AMD Unknown K7 (Athlon) (686-class), 2015.00 MHz, id 0x20fb1
cpu1: features f7dbfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu1: features f7dbfbff<PGE,MCA,CMOV,PAT,PSE36,MPC,NOX,MMXX,MMX>
cpu1: features f7dbfbff<FXSR,SSE,SSE2,HTT,LONG,3DNOW2,3DNOW>
cpu1: features2 1<SSE3>
cpu1: "AMD Athlon(tm) 64 X2 Dual Core Processor 3800+"
cpu1: I-cache 64 KB 64B/line 2-way, D-cache 64 KB 64B/line 2-way
cpu1: L2 cache 512 KB 64B/line 16-way
cpu1: ITLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu1: DTLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu1: AMD Power Management features: f<TTP,VID,FID,TS>
mpbios: bus 0 is type PCI
mpbios: bus 1 is type PCI
mpbios: bus 2 is type PCI
mpbios: bus 3 is type PCI
mpbios: bus 4 is type PCI
mpbios: bus 5 is type PCI
mpbios: bus 6 is type ISA
ioapic0 at mainbus0 apid 2 (I/O APIC)
ioapic0: pa 0xfec00000, version 11, 24 pins
ioapic0: misconfigured as apic 0
ioapic0: remapped to apic 2
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
NVIDIA nForce4 Memory Controller (miscellaneous memory, revision
0xa3) at pci0 dev 0 function 0 not configured
pcib0 at pci0 dev 1 function 0
pcib0: NVIDIA product 0x0050 (rev. 0xa3)
NVIDIA nForce4 SMBus (SMBus serial bus, revision 0xa2) at pci0 dev 1
function 1 not configured
viaide0 at pci0 dev 6 function 0
viaide0: NVIDIA nForce4 IDE Controller (rev. 0xf2)
viaide0: bus-master DMA support present
viaide0: primary channel configured to compatibility mode
viaide0: primary channel interrupting at ioapic0 pin 14 (irq 14)
atabus0 at viaide0 channel 0
viaide0: secondary channel configured to compatibility mode
viaide0: secondary channel interrupting at ioapic0 pin 15 (irq 15)
atabus1 at viaide0 channel 1
ppb0 at pci0 dev 9 function 0: NVIDIA nForce4 PCI Host Bridge (rev. 0xa2)
pci1 at ppb0 bus 5
pci1: i/o space, memory space enabled
vga1 at pci1 dev 8 function 0: ATI Technologies Rage XL (AGP) (rev. 0x65)
wsdisplay0 at vga1 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
nfe0 at pci0 dev 10 function 0: ioapic0 pin 3 (irq 3), address
00:13:d4:ae:9b:6b
makphy0 at nfe0 phy 1: Marvell 88E1111 Gigabit PHY, rev. 2
makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
ppb1 at pci0 dev 11 function 0: NVIDIA nForce4 PCIe Host Bridge (rev. 0xa3)
pci2 at ppb1 bus 4
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
bge0 at pci2 dev 0 function 0: Broadcom BCM5751 Gigabit Ethernet
bge0: interrupting at ioapic0 pin 11 (irq 11)
bge0: pcie mode=0x105000
bge0: ASIC BCM5750 A1 (0x4001), Ethernet address 00:10:18:14:15:12
bge0: setting short Tx thresholds
brgphy0 at bge0 phy 1: BCM5750 1000BASE-T media interface, rev. 0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
ppb2 at pci0 dev 12 function 0: NVIDIA nForce4 PCIe Host Bridge (rev. 0xa3)
pci3 at ppb2 bus 3
pci3: i/o space, memory space enabled, rd/line, wr/inv ok
bge1 at pci3 dev 0 function 0: Broadcom BCM5751 Gigabit Ethernet
bge1: interrupting at ioapic0 pin 10 (irq 10)
bge1: pcie mode=0x105000
bge1: ASIC BCM5750 A1 (0x4001), Ethernet address 00:10:18:14:27:d5
bge1: setting short Tx thresholds
brgphy1 at bge1 phy 1: BCM5750 1000BASE-T media interface, rev. 0
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
ppb3 at pci0 dev 13 function 0: NVIDIA nForce4 PCIe Host Bridge (rev. 0xa3)
pci4 at ppb3 bus 2
pci4: i/o space, memory space enabled, rd/line, wr/inv ok
bge2 at pci4 dev 0 function 0: Broadcom BCM5751 Gigabit Ethernet
bge2: interrupting at ioapic0 pin 5 (irq 5)
bge2: pcie mode=0x105000
bge2: ASIC BCM5750 A1 (0x4001), Ethernet address 00:10:18:14:38:d2
bge2: setting short Tx thresholds
brgphy2 at bge2 phy 1: BCM5750 1000BASE-T media interface, rev. 0
brgphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
ppb4 at pci0 dev 14 function 0: NVIDIA nForce4 PCIe Host Bridge (rev. 0xa3)
pci5 at ppb4 bus 1
pci5: i/o space, memory space enabled, rd/line, wr/inv ok
wm0 at pci5 dev 0 function 0: Intel PRO/1000 PT (82571EB), rev. 6
wm0: interrupting at ioapic0 pin 7 (irq 7)
wm0: PCI-Express bus
wm0: 65536 word (16 address bits) SPI EEPROM
wm0: Ethernet address 00:15:17:0b:70:98
igphy0 at wm0 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
wm1 at pci5 dev 0 function 1: Intel PRO/1000 PT (82571EB), rev. 6
wm1: interrupting at ioapic0 pin 5 (irq 5)
wm1: PCI-Express bus
wm1: 65536 word (16 address bits) SPI EEPROM
wm1: Ethernet address 00:15:17:0b:70:99
igphy1 at wm1 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
pchb0 at pci0 dev 24 function 0
pchb0: Advanced Micro Devices AMD64 HyperTransport configuration (rev. 0x00)
pchb1 at pci0 dev 24 function 1
pchb1: Advanced Micro Devices AMD64 Address Map configuration (rev. 0x00)
pchb2 at pci0 dev 24 function 2
pchb2: Advanced Micro Devices AMD64 DRAM configuration (rev. 0x00)
pchb3 at pci0 dev 24 function 3
pchb3: Advanced Micro Devices AMD64 Miscellaneous configuration (rev. 0x00)
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
attimer0 at isa0 port 0x40-0x43: AT Timer
pcppi0 at isa0 port 0x61
pcppi0: children must have an explicit unit
midi0 at pcppi0: PC speaker (CPU-intensive output)
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff
npx0: reported by CPUID; using exception 16
pcppi0: attached to attimer0
isapnp0: no ISA Plug 'n Play devices found
ioapic0: enabling
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
Kernelized RAIDframe activated
wd0 at atabus0 drive 0: <ST340014A>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 38166 MB, 77545 cyl, 16 head, 63 sec, 512 bytes/sect x 78165360 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(viaide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
atapibus0 at atabus1: 2 targets
cd0 at atapibus0 drive 1: <AOPEN 8X8 DVD Dual AAN, , 1.4A> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
wd1 at atabus1 drive 0: <ST340014A>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 38166 MB, 77545 cyl, 16 head, 63 sec, 512 bytes/sect x 78165360 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1(viaide0:1:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
cd0(viaide0:1:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
boot device: wd0
root on wd0a dumps on wd0b
root file system type: ffs
cpu1: CPU 1 running
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
#

The best I can get is about 125Kpps

However, if I switch to the 2 bge nics (ie NON trunked mode), I get
close to 600 Kpps on the one stream and a max of 360Kpps when I have
the stream in the opposite direction going. This is comparable to
the other boxes. However, the driver did wedge and I had to ifconfig
down/up it to recover once during testing.

Nov 30 19:36:21 r2-netbsd /netbsd: bge1: pcie mode=0x105000
Nov 30 19:38:00 r2-netbsd /netbsd: bge2: pcie mode=0x105000
Nov 30 19:54:18 r2-netbsd /netbsd: bge: failed on len 52?
Nov 30 19:54:49 r2-netbsd last message repeated 10930 times
Nov 30 19:55:55 r2-netbsd last message repeated 14526 times
Nov 30 19:56:11 r2-netbsd /netbsd: ed on len 52?
Nov 30 19:56:11 r2-netbsd /netbsd: bge: failed on len 52?
Nov 30 19:56:12 r2-netbsd last message repeated 719 times
Nov 30 19:56:20 r2-netbsd /netbsd: ed on len 52?
Nov 30 19:56:20 r2-netbsd /netbsd: bge: failed on len 52?
Nov 30 19:56:21 r2-netbsd last message repeated 717 times

---Mike

Mike Tancsa

unread,

Dec 1, 2006, 9:34:29 AM12/1/06

to Hubert Feyrer

At 06:06 PM 11/30/2006, Hubert Feyrer wrote:

>[adding tech-net@ as I don't really know what to answer...
>
> Context: adding NetBSD in the benchmark at
> http://www.tancsa.com/blast.html, with the wm(4) driver in
> -current, as it's not available in 3.1]
>
>
>On Thu, 30 Nov 2006, Mike Tancsa wrote:
>>Gave it a try and I posted the results on the web page. The Intel
>>driver doesnt seem to work too well. Is there debugging in this kernel ?
>
>That sounds indeed not so bright. I do not know about the wm(4)
>driver, but maybe someone on tech-net@ (CC:d) has an idea. IIRC
>that's with a -current (HEAD) GENERIC kernel and the wm(4) driver,
>while bge(4) driver works ok.
>

>What I wonder is: how does the bge(4) driver perform under -current,
>do you have numbers for that? (Just to make sure it's not -curren that's hosed)

Done and posted. I also looked at netstat -q and indeed it reports
dropped packets

# netstat -q
arpintrq:
queue length: 0
maximum queue length: 50
packets dropped: 151
ipintrq:
queue length: 0
maximum queue length: 256
packets dropped: 133721212
ip6intrq:
queue length: 0
maximum queue length: 256
packets dropped: 0
atintrq1:
queue length: 0
maximum queue length: 256
packets dropped: 0
atintrq2:
queue length: 0
maximum queue length: 256
packets dropped: 0
clnlintrq:
queue length: 0
maximum queue length: 256
packets dropped: 0
ppoediscinq:
queue length: 0
maximum queue length: 256
packets dropped: 0
ppoeinq:
queue length: 0
maximum queue length: 256
packets dropped: 0

Steven M. Bellovin

unread,

Dec 1, 2006, 9:55:29 AM12/1/06

to Mike Tancsa

On Fri, 01 Dec 2006 09:31:23 -0500
Mike Tancsa <mi...@sentex.net> wrote:

>
> # netstat -q
> arpintrq:
> queue length: 0
> maximum queue length: 50
> packets dropped: 151

I'm not sure this one matters much in the real world -- I suspect it can
only happen when a large number of addresses are polled in a very short
time. (OTOH, it might happen if a scanning worm was working through
the router.)

> ipintrq:
> queue length: 0
> maximum queue length: 256
> packets dropped: 133721212

This is the second report we've seen recently of packet drops in this
queue. We need to understand what's going on, I think.

--Steve Bellovin, http://www.cs.columbia.edu/~smb

Steven M. Bellovin

unread,

Dec 1, 2006, 11:26:04 AM12/1/06

to Mike Tancsa

On Fri, 01 Dec 2006 10:00:09 -0500
Mike Tancsa <mi...@sentex.net> wrote:

> At 09:55 AM 12/1/2006, Steven M. Bellovin wrote:
>
> > > ipintrq:
> > > queue length: 0
> > > maximum queue length: 256
> > > packets dropped: 133721212
> >
> >This is the second report we've seen recently of packet drops in this
> >queue. We need to understand what's going on, I think.
>

> Hi,
>
> I am guessing I am just overwhelming the box no ? Each of my
> generator boxes are blasting about 600Kpps in opposite directions
> through the box 10 byte UDP packets. Even when doing just the one
> stream in NetBSD, the box (r2) acting as the router is totally
> unresponsive from the serial console and OOB NIC.
>
I'd have expected the problem to show as drops on the output queue, not
ipintrq, unless you're running at near-100% CPU. The previous case did
not involve CPU exhaustion -- does yours?

--Steve Bellovin, http://www.cs.columbia.edu/~smb

Mike Tancsa

unread,

Dec 1, 2006, 11:57:57 AM12/1/06

to Steven M. Bellovin

At 11:25 AM 12/1/2006, Steven M. Bellovin wrote:
> >
>I'd have expected the problem to show as drops on the output queue, not
>ipintrq, unless you're running at near-100% CPU. The previous case did
>not involve CPU exhaustion -- does yours?

Hi,

I think it does in this case. As I cannot interact with the box at
the time of testing its hard to tell. But if I moderate the blast to
a slower rate, top seems to indicate its approaching full utilization
for interrupt processing. I am using FreeBSD's
/usr/src/tools/tools/netrate to generate the traffic.

At 100K, interrupt usage gets to 30%

load
averages: 0.06, 0.08, 0.08
up 0 days, 11:22 06:49:02
37 processes: 1 runnable, 35 sleeping, 1 on processor
CPU0 states: 0.0% user, 0.0% nice, 0.0% system, 28.3% interrupt, 71.7% idle
CPU1 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Memory: 31M Act, 484K Wired, 4100K Exec, 5284K File, 1950M Free
Swap: 128M Total, 128M Free

200K

load
averages: 0.13, 0.09, 0.08
up 0 days, 11:23 06:50:06
38 processes: 1 runnable, 36 sleeping, 1 on processor
CPU0 states: 0.0% user, 0.0% nice, 0.0% system, 50.0% interrupt, 50.0% idle
CPU1 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Memory: 31M Act, 484K Wired, 4100K Exec, 5300K File, 1950M Free
Swap: 128M Total, 128M Free

As it gets to 450Kpps, the box gets a little sluggish and difficult
to interact with.

load
averages: 0.15, 0.11, 0.09
up 0 days, 11:26 06:53:19
38 processes: 1 runnable, 36 sleeping, 1 on processor
CPU0 states: 0.0% user, 0.0% nice, 0.0% system, 97.2% interrupt, 2.8% idle
CPU1 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Memory: 31M Act, 484K Wired, 4100K Exec, 5300K File, 1950M Free
Swap: 128M Total, 128M Free

Jason Thorpe

unread,

Dec 1, 2006, 12:21:43 PM12/1/06

to Mike Tancsa

On Nov 30, 2006, at 10:06 PM, Mike Tancsa wrote:

> wm0 1500 <Link> 00:15:17:0b:70:98 32226898 281780
> 15 0 0

That's still a lot of input errors.

There are a few reasons for these to accumulate:

- Receive ring overrun. Unfortunately, the log message for this is
wrapped in #ifdef WM_DEBUG, so you'll need to tweak the driver and
rebuild the kernel to see the log message.

- Failure to allocate a new receive buffer. When this happens, the
received packet is dropped the its buffer recycled. Again,
unfortunately, this has a debug-only kernel printf associated with it.

- The chip reported some sort of error with the packet. It logs
messages in system log for the following:

- symbol error
- receive sequence error
- CRC error

A carrier extension error or a Rx data error could also occur, but
these are not logged.

-- thorpej

Jason Thorpe

unread,

Dec 1, 2006, 12:24:21 PM12/1/06

to Mike Tancsa

On Dec 1, 2006, at 7:00 AM, Mike Tancsa wrote:

> At 09:55 AM 12/1/2006, Steven M. Bellovin wrote:
>
>> > ipintrq:
>> > queue length: 0
>> > maximum queue length: 256
>> > packets dropped: 133721212
>>
>> This is the second report we've seen recently of packet drops in this
>> queue. We need to understand what's going on, I think.
>
> Hi,
>
> I am guessing I am just overwhelming the box no ? Each of my
> generator boxes are blasting about 600Kpps in opposite directions
> through the box 10 byte UDP packets. Even when doing just the one
> stream in NetBSD, the box (r2) acting as the router is totally
> unresponsive from the serial console and OOB NIC.

I've jumped into this thread late -- what exactly is your
configuration? Are you using IP Filter or PF anywhere in the mix
here? If not, then it would be good to know why IP Fast Forwarding
isn't kicking in here (bypasses the IP input queue completely).

-- thorpej

Mike Tancsa

unread,

Dec 1, 2006, 1:10:08 PM12/1/06

to Jason Thorpe

At 12:23 PM 12/1/2006, Jason Thorpe wrote:

>On Dec 1, 2006, at 7:00 AM, Mike Tancsa wrote:
>
>>At 09:55 AM 12/1/2006, Steven M. Bellovin wrote:
>>
>>> > ipintrq:
>>> > queue length: 0
>>> > maximum queue length: 256
>>> > packets dropped: 133721212
>>>
>>>This is the second report we've seen recently of packet drops in this
>>>queue. We need to understand what's going on, I think.
>>
>>Hi,
>>
>>I am guessing I am just overwhelming the box no ? Each of my
>>generator boxes are blasting about 600Kpps in opposite directions
>>through the box 10 byte UDP packets. Even when doing just the one
>>stream in NetBSD, the box (r2) acting as the router is totally
>>unresponsive from the serial console and OOB NIC.
>
>I've jumped into this thread late -- what exactly is your
>configuration?

Hi,

Details of the test setup at
http://www.tancsa.com/blast.html

>Are you using IP Filter

On NetBSD, enabled and disabled.... But not removed from the kernel

>or PF anywhere in the mix

Only on FreeBSD, but it was far too slow

>here? If not, then it would be good to know why IP Fast Forwarding
>isn't kicking in here (bypasses the IP input queue completely).

I was told options GATEWAY would do it. Perhaps because I am testing
SMP ? Dont know. This week was my first experience with NetBSD.

---Mike

Mike Tancsa

unread,

Dec 1, 2006, 3:36:47 PM12/1/06

to Jonathan Stone

At 01:49 PM 12/1/2006, Jonathan Stone wrote:

>As sometime principial maintaner of NetBSD's bge(4) driver, and the
>author of many of the changes and chip-variant support subsequently
>folded into OpenBSD's bge(4) by br...@openbsd.org, I'd like to speak
>to a couple of points here.

First off, thanks for the extended insights! This has been a most
interesting exercise for me.

>I beleive the UDP packets in Mike's tests are all so small that, even
>with a VLAN tag added, the Ethernet payload (IPv4 header, UDP header,
>10 bytes UDP payload), plus 14-byte Ethernet header, plus 4-byte CRC,
>is still less than the ETHER_MIN_MTU. If so, I don't see how
>framesize is a factor, since the packets will be padded to the minimum
>valid Ethernet payload in any case. OTOH, Switch forwarding PPS may
>well show a marginal degradation due to VLAN insertion; but we're
>still 2 or 3 orders of magnitude away from those limits.

Unfortunately, my budget is not so high that I can afford to have a
high end gigE switch in my test area. I started off with a linksys,
which I managed to hang under moderately high loads. I had an
opportunity to test the Netgear and it was a pretty reasonable price
(~$650 USD) for what it claims its capable of (17Mpps). It certainly
hasnt locked up and I tried putting a bunch of boxes on line and
forwarding packets as fast as all 8 of the boxes could and there
didnt seem to be any ill effects on the switch. Similarly, trunking,
although a bit wonky to configure (I am far more used to Cisco land)
at least works and doesnt seem to degrade overall performance.

>Second point: NetBSD's bge(4) driver includes support for runtime
>manual tuning of interrupt mitigation. I chose the tuning values
>based on empirical measurements of large TCP flows on bcm5700s and bcm5704s.
>
>If my (dimming) memory serves, the default value of 0 yields
>thresh-holds close to Bill Paul's original FreeBSD driver. A value of
>1 yields an bge interrrupt for every two full-sized Ethernet
>frames. Each increment of the sysctl knob will, roughly, halve receive
>interrupt rate, up to a maximum of 5, which interrupts about every 30
>to 40 full-sized TCP segments.

I take it this is it
# sysctl -d hw.bge.rx_lvl
hw.bge.rx_lvl: BGE receive interrupt mitigation level
# sysctl hw.bge.rx_lvl
hw.bge.rx_lvl = 0
#

With ipf enabled and 10 poorly written rules.

rx_lvl pps

0 219,181
1 229,334
2 280,508
3 328,896
4 333,585
5 346,974

Blasting for 10 seconds with the value set to 5, here is the before
and after for netstat -i and netstat -q after doing
[4600X2-88-176]# ./netblast 192.168.44.1 500 10 10

start: 1165001022.659075049
finish: 1165001032.659352738
send calls: 5976399
send errors: 0
approx send rate: 597639
approx error rate: 0
[4600X2-88-176]#

# netstat -q
arpintrq:
queue length: 0
maximum queue length: 50
packets dropped: 153

ipintrq:
queue length: 0
maximum queue length: 256

packets dropped: 180561075
ip6intrq: