
FreeBSD 10 network performance problems


Rumen Telbizov

Sep 20, 2014, 12:12:27 PM
to freebsd...@freebsd.org
Hello everyone,

I am in the process of upgrading our main PF firewalls from 9.2-RC4 to
10.1-BETA1 (r271684), and as part of the process I have been testing the
forwarding capability of FreeBSD 10 (pf firewall disabled) to establish a
baseline and find any bottlenecks on a 10GbE network.

My tests show that for the first 3-4Gbit/s of traffic things are great and
then the machine simply "hits the wall" at around 4-5Gbit/s of traffic.
There's no gradual degradation but a hard drop to 0% idle and 50-50% split
of system and interrupt CPU load. I ran some diagnostics and I was hoping
someone could point me in the right direction as to what is happening and
what I could do to improve the situation. The details I have collected so
far are below:

I run multiple multithreaded iperf TCP instances to generate traffic that
traverses the firewall. As mentioned above, for the first 3-4Gbit/s of
traffic the machine doesn't even break a sweat.
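For the record, the generator side looks roughly like this (a sketch; the
address 10.0.0.2 is a placeholder for the host on the far side of the
firewall, and the flags are standard iperf2 options):

```shell
# On the sink host behind the firewall:
iperf -s

# On the source host: several instances in parallel,
# each with 8 parallel TCP streams, running for 10 minutes
iperf -c 10.0.0.2 -P 8 -t 600 &
iperf -c 10.0.0.2 -P 8 -t 600 &
wait
```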

*top header* during this load when things are good:
CPU 0: 0.0% user, 0.0% nice, 0.0% system, 27.6% interrupt, 72.4% idle
CPU 1: 0.0% user, 0.0% nice, 0.0% system, 27.6% interrupt, 72.4% idle
CPU 2: 0.0% user, 0.0% nice, 0.0% system, 8.7% interrupt, 91.3% idle
CPU 3: 0.0% user, 0.0% nice, 0.0% system, 17.3% interrupt, 82.7% idle
CPU 4: 0.0% user, 0.0% nice, 0.0% system, 12.6% interrupt, 87.4% idle
CPU 5: 0.0% user, 0.0% nice, 0.0% system, 4.7% interrupt, 95.3% idle
CPU 6: 0.0% user, 0.0% nice, 0.0% system, 13.4% interrupt, 86.6% idle
CPU 7: 0.0% user, 0.0% nice, 0.0% system, 26.0% interrupt, 74.0% idle
CPU 8: 0.0% user, 0.0% nice, 0.0% system, 16.5% interrupt, 83.5% idle
CPU 9: 0.0% user, 0.0% nice, 0.0% system, 1.6% interrupt, 98.4% idle
CPU 10: 0.0% user, 0.0% nice, 0.0% system, 19.7% interrupt, 80.3% idle
CPU 11: 0.0% user, 0.0% nice, 0.0% system, 7.1% interrupt, 92.9% idle
Full output at http://pastebin.com/gaaisXV8

*bmon* at the same time:
Interfaces │ RX bps      pps     │ TX bps      pps
ix0        │ 240.33MiB   242.20K │ 221.41MiB   236.68K
ix1        │ 246.51MiB   248.80K │ 261.61MiB   250.95K
>lagg0     │ 483.45MiB   492.42K │ 479.54MiB   488.02K

[bmon rate-history graphs trimmed: RX and TX hold steady across the full
60s window at roughly 500MiB/s and 505-508Kpps in each direction]

To profile the system without tipping it over the edge, I reduced traffic
to 2Gbit/s and ran dtrace hotkernel and lock profiling. Here are the results:

*/usr/share/dtrace/toolkit/hotkernel* for 60 seconds:
kernel`__mtx_lock_flags 5812 0.8%
kernel`__mtx_unlock_flags 7200 1.0%
kernel`acpi_cpu_c1 7799 1.1%
kernel`__rw_rlock 11196 1.5%
kernel`spinlock_exit 14547 2.0%
kernel`cpu_idle 166700 22.8%
kernel`sched_idletd 461883 63.1%
Full output at http://pastebin.com/w7WfFwPG

*lock profiling* for 60 seconds:
$ head -n 2 good-locks ; cat good-locks | sort -n -k 4 | tail -n6
debug.lock.prof.stats:
   max  wait_max     total  wait_total     count    avg  wait_avg  cnt_hold  cnt_lock  name
    22        24     94378       18481    264549      0         0         0     18639  /usr/src/sys/kern/sched_ule.c:886 (spin mutex:sched lock 10)
    22        24    112366       20360    219179      0         0         0     17220  /usr/src/sys/kern/sched_ule.c:888 (spin mutex:sched lock 2)
    18       319      3486       22352      4233      0         5         0      1640  /usr/src/sys/kern/subr_taskqueue.c:344 (sleep mutex:taskqueue)
    26        66   3219768      204875  14616220      0         0         0    133154  /usr/src/sys/net/route.c:439 (sleep mutex:rtentry)
    25        90   1923012     2353820  14615738      0         0         0   1562097  /usr/src/sys/netinet/ip_fastfwd.c:593 (sleep mutex:rtentry)
    26        91   1398443     2391458  14616137      0         0         0   1604332  /usr/src/sys/netinet/in_rmx.c:114 (sleep mutex:rtentry)
Full output at http://pastebin.com/qiG3ZuAH
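For anyone who wants to reproduce this: the lock profile was gathered
roughly as follows. This requires a kernel built with options
LOCK_PROFILING (see LOCK_PROFILING(9)); sysctl names as on my system:

```shell
# clear any previously accumulated statistics
sysctl debug.lock.prof.reset=1
# collect for 60 seconds under load
sysctl debug.lock.prof.enable=1
sleep 60
sysctl debug.lock.prof.enable=0
# dump the stats to a file for sorting
sysctl debug.lock.prof.stats > good-locks
```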

Again, the measurements above demonstrate the state of the good/healthy
system under a moderate traffic load of 3-4Gbit/s, with pf disabled and
fast forwarding enabled. Here are the same measurements once I add an
additional 1Gbit/s of traffic. The point where performance tanks varies
between 4 and 5Gbit/s, but it's always sudden: no gradual degradation,
idle simply drops to 0. Let's take a look:

*top header* during this load when things are bad:
CPU 0: 0.0% user, 0.0% nice, 44.6% system, 55.4% interrupt, 0.0% idle
CPU 1: 0.0% user, 0.0% nice, 15.1% system, 84.9% interrupt, 0.0% idle
CPU 2: 0.0% user, 0.0% nice, 59.0% system, 40.3% interrupt, 0.7% idle
CPU 3: 0.0% user, 0.0% nice, 57.6% system, 42.4% interrupt, 0.0% idle
CPU 4: 0.0% user, 0.0% nice, 34.5% system, 63.3% interrupt, 2.2% idle
CPU 5: 0.0% user, 0.0% nice, 51.8% system, 48.2% interrupt, 0.0% idle
CPU 6: 0.0% user, 0.0% nice, 33.8% system, 66.2% interrupt, 0.0% idle
CPU 7: 0.7% user, 0.0% nice, 49.6% system, 49.6% interrupt, 0.0% idle
CPU 8: 0.0% user, 0.0% nice, 66.2% system, 33.8% interrupt, 0.0% idle
CPU 9: 0.0% user, 0.0% nice, 35.3% system, 64.7% interrupt, 0.0% idle
CPU 10: 0.0% user, 0.0% nice, 54.7% system, 45.3% interrupt, 0.0% idle
CPU 11: 0.0% user, 0.0% nice, 34.5% system, 65.5% interrupt, 0.0% idle
Full output at http://pastebin.com/9an9ZWv2

*bmon* at the same time shows inconsistent performance with big dips:
Interfaces │ RX bps      pps     │ TX bps      pps
ix0        │ 153.89MiB   151.53K │ 179.69MiB   159.91K
ix1        │ 176.56MiB   161.29K │ 145.17MiB   148.13K
>lagg0     │ 327.23MiB   333.05K │ 322.14MiB   328.13K

[bmon rate-history graphs trimmed: rates swing between near-zero and peaks
of ~650MiB/s / ~670Kpps, with deep dips throughout the 60s window]

*/usr/share/dtrace/toolkit/hotkernel* for 60 seconds:
kernel`_rw_runlock_cookie 7709 1.1%
kernel`__rw_rlock 11182 1.6%
kernel`ip_fastforward 12231 1.7%
kernel`__mtx_lock_flags 22004 3.1%
kernel`__mtx_unlock_flags 35614 5.0%
kernel`__mtx_lock_sleep 560768 78.5%
Full output at http://pastebin.com/NurKwkWL

*lock profiling* for 60 seconds:
$ head -n 2 bad-locks ; cat bad-locks | sort -n -k 4 | tail -n6
debug.lock.prof.stats:
   max  wait_max     total  wait_total     count    avg  wait_avg  cnt_hold  cnt_lock  name
401766    191987   1857397      194162       179  10376      1084         0         2  /usr/src/sys/kern/kern_sysctl.c:1601 (sx:sysctl mem)
 21064    207907     62556      249066       396    157       628         0         3  /usr/src/sys/kern/kern_sysctl.c:1499 (sleep mutex:Giant)
     1    370663        17      372573        86      0      4332         0         2  /usr/src/sys/kern/kern_exit.c:429 (sx:allproc)
    14       648   8856844    46296098  15513956      0         2         0   1467849  /usr/src/sys/net/route.c:439 (sleep mutex:rtentry)
    15       958  13107581   252445472  15513486      0        16         0   9444644  /usr/src/sys/netinet/ip_fastfwd.c:593 (sleep mutex:rtentry)
    12       779  12500816   286324556  15513872      0        18         0   9788497  /usr/src/sys/netinet/in_rmx.c:114 (sleep mutex:rtentry)
Full output at http://pastebin.com/d54QmP13

System hardware: 12 x E5-2440 @ 2.40GHz, 24GB RAM, dual-port Intel
PRO/10GbE fiber NIC in an LACP lagg.
System configuration details available at http://pastebin.com/tPvs1MeD

This looks to me like heavy lock contention, but I don't understand why it
sets in so abruptly past a given threshold.
Other things I tried:

- Upgrading the ix (ixgbe) driver to the latest from Intel (2.5.25) - for
some reason I then cannot send any packets out
- Enabling flowtable - no difference

Any help is highly appreciated. I could provide further details and run
additional tests upon request.

Regards,
--
Rumen Telbizov
Unix Systems Administrator <http://telbizov.com>
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stabl...@freebsd.org"

Adrian Chadd

Sep 20, 2014, 4:32:06 PM
to Rumen Telbizov, freebsd...@freebsd.org
Hi,

The forwarding paths don't use flowtable 100% of the time, so there are
still routing-table lookups.

All those highly contended locks are the rtentry locking because the
forwarding and fast forwarding paths don't use flowtable.

This shouldn't be a difficult problem to solve; someone just has to go
through those places above, figure out what code is doing the lookups
and convert them to use flowtable.

The one place that's currently a pain is how IPv4 redirects are
handled; you can turn them off with a sysctl.
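i.e. something like the following (this is the stock IPv4 knob; run it at
runtime or persist it in /etc/sysctl.conf):

```shell
# stop generating IPv4 ICMP redirects on the forwarding path
sysctl net.inet.ip.redirect=0
```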


-a

Vladislav Prodan

Sep 20, 2014, 8:48:25 PM
to freebsd...@freebsd.org



--- Original message ---
From: "Rumen Telbizov" <telb...@gmail.com>
Date: 20 September 2014, 19:12:17



> Hello everyone,
>
> I am in the process of upgrading our main PF firewalls from 9.2-RC4 to
> 10.1-BETA1 (r271684) and as part of the process I have been testing the
> forwarding capability of FreeBSD 10 (pf firewall disabled) to have a
> base-line and find any bottlenecks on a 10GbE network.
>


Please show the output of these commands:

pciconf -lv | grep -A 4 ix\[0-9\]
netstat -m
ngctl list | wc -l
sysctl -a | egrep 'net.(inet.(tcp|udp)|graph|isr)'

--
Vladislav V. Prodan
System & Network Administrator
support.od.ua

Rumen Telbizov

Sep 21, 2014, 5:31:55 PM
to freebsd...@freebsd.org, qin...@freebsd.org, km...@freebsd.org
Thank you for your answers Adrian and Vladislav.

Adrian:
I read this paper,
http://conferences.sigcomm.org/sigcomm/2009/workshops/presto/papers/p37.pdf,
and I was left with the impression that the lock contention on *rtentry*
had been solved around the FreeBSD 8 release with the new routing
architecture and flowtable. I was wondering whether that is really the case
or whether I am dealing with an edge case here. I am cc'ing Qing Li and Kip
Macy for further visibility and comments (original report at
http://lists.freebsd.org/pipermail/freebsd-stable/2014-September/080170.html).

On the other hand https://wiki.freebsd.org/NetworkPerformanceTuning
advises: "*Do not use FLOWTABLE. It is still untested (2012-02-23).*" Is
that still the case? As mentioned previously I tried this kernel option
earlier and it had no effect.
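For completeness, this is how I had enabled it - the kernel option plus the
runtime knob (sysctl name as on my build; double-check with
sysctl -a | grep flowtable):

```shell
# kernel config line, then rebuild/installkernel:
options FLOWTABLE

# runtime toggle:
sysctl net.flowtable.enable=1
```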

Additionally, on https://wiki.freebsd.org/NewNetworking I saw that there
are still open items with regards to "*rtentry locking*" and "*Contention
between CPUs when forwarding between multi-queue interfaces*". Not quite
sure if this is what I am dealing with.

I also wonder whether this lock contention is something new or I am hitting
some strange edge case. I have read that people are able to push 10Gbit/s
on FreeBSD 9.2 (https://calomel.org/network_performance.html). Is anybody
else seeing this wall around 4-5Gbit/s?


Vladislav:
Here are the details that you requested (freshly booted system):
# pciconf -lv | grep -A 4 ix\[0-9\]
ix0@pci0:5:0:0: class=0x020000 card=0x00038086 chip=0x10fb8086 rev=0x01
hdr=0x00
vendor = 'Intel Corporation'
device = '82599EB 10-Gigabit SFI/SFP+ Network Connection'
class = network
subclass = ethernet
ix1@pci0:5:0:1: class=0x020000 card=0x00038086 chip=0x10fb8086 rev=0x01
hdr=0x00
vendor = 'Intel Corporation'
device = '82599EB 10-Gigabit SFI/SFP+ Network Connection'
class = network
subclass = ethernet

# netstat -m
100358/50182/150540 mbufs in use (current/cache/total)
2048/47288/49336/1526116 mbuf clusters in use (current/cache/total/max)
2048/47287 mbuf+clusters out of packet secondary zone in use (current/cache)
0/7/7/763057 4k (page size) jumbo clusters in use (current/cache/total/max)
98300/11/98311/226091 9k jumbo clusters in use (current/cache/total/max)
0/0/0/127176 16k jumbo clusters in use (current/cache/total/max)
913885K/107248K/1021134K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile

# ngctl list | wc -l
2

# sysctl -a | egrep 'net.(inet.(tcp|udp)|graph|isr)'
net.inet.tcp.rfc1323: 1
net.inet.tcp.mssdflt: 536
net.inet.tcp.keepidle: 7200000
net.inet.tcp.keepintvl: 75000
net.inet.tcp.sendspace: 32768
net.inet.tcp.recvspace: 65536
net.inet.tcp.keepinit: 75000
net.inet.tcp.delacktime: 100
net.inet.tcp.v6mssdflt: 1220
net.inet.tcp.cc.algorithm: newreno
net.inet.tcp.cc.available: newreno
net.inet.tcp.hostcache.cachelimit: 15360
net.inet.tcp.hostcache.hashsize: 512
net.inet.tcp.hostcache.bucketlimit: 30
net.inet.tcp.hostcache.count: 6
net.inet.tcp.hostcache.expire: 3600
net.inet.tcp.hostcache.prune: 300
net.inet.tcp.hostcache.purge: 0
net.inet.tcp.log_in_vain: 0
net.inet.tcp.blackhole: 0
net.inet.tcp.delayed_ack: 1
net.inet.tcp.drop_synfin: 0
net.inet.tcp.rfc3042: 1
net.inet.tcp.rfc3390: 1
net.inet.tcp.experimental.initcwnd10: 1
net.inet.tcp.rfc3465: 1
net.inet.tcp.abc_l_var: 2
net.inet.tcp.ecn.enable: 0
net.inet.tcp.ecn.maxretries: 1
net.inet.tcp.insecure_rst: 0
net.inet.tcp.recvbuf_auto: 1
net.inet.tcp.recvbuf_inc: 16384
net.inet.tcp.recvbuf_max: 2097152
net.inet.tcp.path_mtu_discovery: 1
net.inet.tcp.tso: 1
net.inet.tcp.sendbuf_auto: 1
net.inet.tcp.sendbuf_inc: 8192
net.inet.tcp.sendbuf_max: 2097152
net.inet.tcp.reass.maxsegments: 95400
net.inet.tcp.reass.cursegments: 0
net.inet.tcp.reass.overflows: 0
net.inet.tcp.sack.enable: 1
net.inet.tcp.sack.maxholes: 128
net.inet.tcp.sack.globalmaxholes: 65536
net.inet.tcp.sack.globalholes: 0
net.inet.tcp.minmss: 216
net.inet.tcp.log_debug: 0
net.inet.tcp.tcbhashsize: 262144
net.inet.tcp.do_tcpdrain: 1
net.inet.tcp.pcbcount: 17
net.inet.tcp.icmp_may_rst: 1
net.inet.tcp.isn_reseed_interval: 0
net.inet.tcp.soreceive_stream: 0
net.inet.tcp.syncookies: 1
net.inet.tcp.syncookies_only: 0
net.inet.tcp.syncache.bucketlimit: 30
net.inet.tcp.syncache.cachelimit: 15375
net.inet.tcp.syncache.count: 0
net.inet.tcp.syncache.hashsize: 512
net.inet.tcp.syncache.rexmtlimit: 3
net.inet.tcp.syncache.rst_on_sock_fail: 1
net.inet.tcp.msl: 30000
net.inet.tcp.rexmit_min: 30
net.inet.tcp.rexmit_slop: 200
net.inet.tcp.always_keepalive: 1
net.inet.tcp.fast_finwait2_recycle: 0
net.inet.tcp.finwait2_timeout: 60000
net.inet.tcp.keepcnt: 8
net.inet.tcp.rexmit_drop_options: 0
net.inet.tcp.per_cpu_timers: 0
net.inet.tcp.timer_race: 0
net.inet.tcp.maxtcptw: 27767
net.inet.tcp.nolocaltimewait: 0
net.inet.udp.checksum: 1
net.inet.udp.maxdgram: 9216
net.inet.udp.recvspace: 42080
net.inet.udp.log_in_vain: 0
net.inet.udp.blackhole: 0
net.isr.dispatch: direct
net.isr.maxthreads: 1
net.isr.bindthreads: 0
net.isr.maxqlimit: 10240
net.isr.defaultqlimit: 256
net.isr.maxprot: 16
net.isr.numthreads: 1
net.graph.threads: 12
net.graph.maxalloc: 4096
net.graph.maxdata: 512
net.graph.abi_version: 12
net.graph.msg_version: 8
net.graph.maxdgram: 20480
net.graph.recvspace: 20480
net.graph.family: 32
net.graph.data.proto: 1
net.graph.control.proto: 2

Once again, I am ready to provide additional metrics and run more tests
upon request.

Thank you,
Rumen Telbizov

K. Macy

Sep 21, 2014, 6:08:39 PM
to Rumen Telbizov, Tom Elite, freebsd...@freebsd.org
What you're dealing with is hardly an edge case. Most people don't need to
push more than a couple of Gbps in production.

Flowtable is hardly "untested." However, it has been a source of friction
at times because it can be somewhat brittle, having limits on the number of
cache entries that it can store that are frequently too low for people with
very large numbers of active flows. Without raising this limit
substantially these systems will fail in a rather spectacular fashion.
Additionally, flowtable was not written with the intent of being a routing
cache. It was developed to support stateful multipath routing for load
balancing. In its current incarnation, stripped of much of the code for its
initial purpose, it's really just a band-aid around locking problems in
routing. That said, the handful of commercial users of FreeBSD that do have
large amounts of traffic (10s of Gbps) per system that I personally know of
all have flowtable enabled.
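For the archives: raising that limit is a boot-time tunable, something like
the following in /boot/loader.conf (tunable name from memory and it may
vary by branch - verify with sysctl -d net.flowtable):

```shell
# /boot/loader.conf - raise the flow cache size so it doesn't
# fall over under a large number of active flows
net.flowtable.nmbflows="524288"
```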

Unfortunately, at least in terms of what is in HEAD, little has been done
to fix the contention that flowtable works around. For your purposes the
response that Adrian gave you is the closest to "optimal."

I hope that helps.
-K

Vladislav Prodan

Sep 21, 2014, 6:20:10 PM
to Rumen Telbizov, freebsd...@freebsd.org



--- Original message ---
From: "Rumen Telbizov" <telb...@gmail.com>
Date: 22 September 2014, 00:31:51



> Thank you for your answers Adrian and Vladislav.
>
> Vladislav:

> Here are the details that you requested (freshly booted system):
> Once again, I am ready to provide additional metrics and run more tests
> upon request.
>
> Thank you,
> Rumen Telbizov


1) Try to add to the config /boot/loader.conf :
net.isr.defaultqlimit=2048
net.isr.maxqlimit=40960

2) Try to add to the config /etc/sysctl.conf :
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.recvbuf_max=16777216

net.inet.tcp.sendbuf_inc=524288
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.sendspace=65536

3) On the network card turn off TSO, LRO, rxcsum and txcsum:
ifconfig ix0 -rxcsum -txcsum -tso -lro


--
Vladislav V. Prodan
System & Network Administrator
support.od.ua

Oliver Pinter

Sep 21, 2014, 6:51:32 PM
to Rumen Telbizov, freebsd...@freebsd.org
> [quoted bmon output snipped]
Try re-enabling idle ticks. I observed a similar problem with the em driver.

sysctl kern.eventtimer.idletick=1
sysctl kern.eventtimer.periodic=1

Adrian Chadd

Sep 22, 2014, 2:29:41 AM
to K. Macy, Rumen Telbizov, Tom Elite, freebsd...@freebsd.org
Hi!

On 21 September 2014 15:08, K. Macy <km...@freebsd.org> wrote:
> What you're dealing with is hardly an edge case. Most people don't need to
> push more than a couple of Gbps in production.
>
> Flowtable is hardly "untested." However, it has been a source of friction
> at times because it can be somewhat brittle, having limits on the number of
> cache entries that it can store that are frequently too low for people with
> very large numbers of active flows. Without raising this limit
> substantially these systems will fail in a rather spectacular fashion.
> Additionally, flowtable was not written with the intent of being a routing
> cache. It was developed to support stateful multipath routing for load
> balancing. In its current incarnation, stripped of much of the code for its
> initial purpose, it's really just a band-aid around locking problems in
> routing. That said, the handful of commercial users of FreeBSD that do have
> large amounts of traffic (10s of Gbps) per system that I personally know of
> all have flowtable enabled.
>
> Unfortunately, at least in terms of what is in HEAD, little has been done
> to fix the contention that flowtable works around. For your purposes the
> response that Adrian gave you is the closest to "optimal."

Gleb and I spent a bunch of time late last year and early this year
finding and fixing a lot of the corner cases with Flowtable handling.

I'm pretty sure that it'll behave predictably and reliably for people
now. If it doesn't then please file PRs. It's no longer some corner of
the codebase that isn't well understood. At least two people besides
the author (Gleb and I) understand what it is, how it works and how it
ties into things.

What's missing is someone sitting down and adding flowtable support to
the rest of the forwarding path(s). It shouldn't be too hard - as long as
you have an mbuf with the IPv4/IPv6 header set up, since that's what
flowtable currently expects when doing lookups - but it has to be done.

So I thoroughly recommend this to anyone who has the test setup and the
desire to do it - and please post results. I have enough equipment now to
test this out and develop it, but I'm really busy with work and wireless
RSS stuff, so I just don't have the spare cycles to do it.

I do think it'll be a pretty simple task.

In theory - once you have the flowtable code working correctly in the
forwarding path you shouldn't see any rtentry lock contention except
during route changes (which will invalidate flowtable entries and
cause normal routing table lookups until the flowtable has all the
route entries in question.)



-a

Rumen Telbizov

Sep 22, 2014, 2:11:23 PM
to Adrian Chadd, Tom Elite, freebsd...@freebsd.org, K. Macy
Thank you all for the answers and directions.

I tried all of the suggested sysctl.conf and loader.conf changes above -
they made no difference whatsoever. I believe they might help in certain
situations, but only marginally; what I am dealing with is a hard, sudden
drop in performance due to lock contention.

Adrian:
What you're saying makes sense. If we can avoid the locking completely,
this problem might go away. I'll see if I can get some help and try to
tackle this challenge.

Cheers,
Rumen Telbizov
--
Rumen Telbizov
Unix Systems Administrator <http://telbizov.com>

Alexander V. Chernikov

Sep 30, 2014, 9:13:40 AM
to Rumen Telbizov, freebsd...@freebsd.org
On 20.09.2014 20:12, Rumen Telbizov wrote:
> Hello everyone,
>
> I am in the process of upgrading our main PF firewalls from 9.2-RC4 to
> 10.1-BETA1 (r271684) and as part of the process I have been testing the
> forwarding capability of FreeBSD 10 (pf firewall disabled) to have a
> base-line and find any bottlenecks on a 10GbE network.
Can you try splitting the default route into two (0/1, 128/1) and see if
it makes any difference?
Can you share "pmcstat -TS instructions -w1" output?
There are some things that can be improved in the lagg case - are you OK
with testing some patches?
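A sketch of the split, with 192.0.2.1 standing in for your real next-hop
(the idea is to spread lookups across two rtentries instead of one):

```shell
# replace the single default route with two half-space routes
route delete default
route add -net 0.0.0.0/1 192.0.2.1
route add -net 128.0.0.0/1 192.0.2.1
```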
> [quoted bmon output snipped]