Strange packet drops with heavy firewalling

Benny Amorsen

Apr 9, 2010, 5:56:06 AM
to net...@vger.kernel.org

I have a netfilter box that is dropping packets. ethtool -S counts
10-20 rx_discards per second on the interface.
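
(For what it's worth, I'm sampling the counter with a crude loop along these
lines; the per-second delta between samples is the drop rate:)

while sleep 1; do
  echo -n "$(date +%T) "
  ethtool -S eth0 | awk '/rx_discards:/ {print $2}'
done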

The switch does not have flow control enabled; with flow control enabled
the rx_discards just turn into tx_xon_sent, which ultimately causes the same
problem (the load is pretty constant, so the switch has to drop the
packets instead).

perf top shows something like:
5201.00 - 6.7% : _spin_unlock_irqrestore
4232.00 - 5.5% : finish_task_switch
3597.00 - 4.6% : tg3_poll [tg3]
3257.00 - 4.2% : handle_IRQ_event
2515.00 - 3.2% : tick_nohz_restart_sched_tick
1947.00 - 2.5% : nf_ct_tuple_equal
1927.00 - 2.5% : tg3_start_xmit [tg3]
1879.00 - 2.4% : kmem_cache_alloc_node
1625.00 - 2.1% : tick_nohz_stop_sched_tick
1619.00 - 2.1% : ipt_do_table
1595.00 - 2.1% : ip_route_input
1547.00 - 2.0% : kmem_cache_free
1474.00 - 1.9% : __alloc_skb
1424.00 - 1.8% : fget_light
1391.00 - 1.8% : nf_iterate

The rule set is quite large (more than 4000 rules), but organized so
that each packet only has to traverse a few rules before getting
accepted or rejected.
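
The layout is roughly the usual pattern (the chain names below are made up,
just to illustrate): accept established traffic first, then dispatch once per
interface so a new flow only traverses its own chain.

iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -N from_eth0
iptables -N from_eth1
iptables -A FORWARD -i eth0 -j from_eth0
iptables -A FORWARD -i eth1 -j from_eth1
# the bulk of the 4000+ rules then lives inside the per-interface chains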

When the problem started we were using a different server, an old
two-socket 32-bit Xeon with hyperthreading. CPU usage often hit 100% on
one CPU with that server. After replacing the server with a ProLiant
DL160 G5 with a quad-core Xeon (without hyperthreading) the CPU usage
rarely exceeds 10% on any CPU, but the packet loss persists.

We're using the built-in dual Broadcom NetXtreme BCM5722 Gigabit
Ethernet PCI Express NICs, and the kernel is
kernel-2.6.32.9-70.fc12.x86_64 from Fedora. The next step is probably
installing a better Ethernet card, perhaps an Intel 82576-based one, so
that we can get multiqueue support.

The traffic is about 300Mbps (twice that if you count both in and out,
like Cisco).


/Benny

Eric Dumazet

Apr 9, 2010, 7:47:42 AM
to Benny Amorsen, net...@vger.kernel.org

Might be micro-bursts; check the 'ethtool -g eth0' RX parameters (increase
the RX ring from 200 to 511 if you want more buffers?)
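
i.e. something like:

ethtool -G eth0 rx 511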

> We're using the built-in dual Broadcom Corporation NetXtreme BCM5722 Gigabit
> Ethernet PCI Express nics, and the kernel is
> kernel-2.6.32.9-70.fc12.x86_64 from Fedora. Next step is probably
> installing a better ethernet card, perhaps an Intel 82576-based one, so
> that we can get multiqueue support.
>

Sure, but before this, could you check

cat /proc/net/softnet_stat
cat /proc/interrupts
(check eth0 IRQS are delivered to one cpu)

grep . /proc/sys/net/ipv4/netfilter/ip_conntrack_*
(might need to increase ip_conntrack_buckets)

ethtool -c eth0
(might change coalesce params to reduce number of irqs)

ethtool -g eth0

Benny Amorsen

Apr 9, 2010, 8:33:31 AM
to Eric Dumazet, net...@vger.kernel.org
Eric Dumazet <eric.d...@gmail.com> writes:

> Might be micro-bursts; check the 'ethtool -g eth0' RX parameters (increase
> the RX ring from 200 to 511 if you want more buffers?)

I tried that already actually. (I didn't expect it to cause traffic
interruption, but it did. Oh well.)

It didn't make a difference, at least not one I could detect from the
number of packet drops and the CPU utilization.

> cat /proc/net/softnet_stat

000002d9 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
42bc8143 00000000 0000024c 00000000 00000000 00000000 00000000 00000000 00000000
0000031b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1c5a35e9 00000000 000005f7 00000000 00000000 00000000 00000000 00000000 00000000

I am not quite sure how to interpret that...
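
(If I read the 2.6.32 source right, each row is one CPU and the first three
columns are packets processed, packets dropped, and time_squeeze, in hex;
with gawk something like this makes it readable:)

awk '{printf "cpu%d processed=%d dropped=%d squeezed=%d\n",
      NR-1, strtonum("0x"$1), strtonum("0x"$2), strtonum("0x"$3)}' \
    /proc/net/softnet_stat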

> cat /proc/interrupts

79: 1240 4050590849 1253 1263 PCI-MSI-edge eth0
80: 12 9 14 3613521843 PCI-MSI-edge eth1

> (check eth0 IRQS are delivered to one cpu)

Yes CPU1 handles eth0 and CPU3 handles eth1.

> grep . /proc/sys/net/ipv4/netfilter/ip_conntrack_*

nf_conntrack_acct:1
nf_conntrack_buckets:8192
nf_conntrack_checksum:1
nf_conntrack_count:49311
nf_conntrack_events:1
nf_conntrack_events_retry_timeout:15
nf_conntrack_expect_max:2048
nf_conntrack_generic_timeout:600
nf_conntrack_icmp_timeout:30
nf_conntrack_log_invalid:1
nf_conntrack_max:1048576
nf_conntrack_tcp_be_liberal:0
nf_conntrack_tcp_loose:1
nf_conntrack_tcp_max_retrans:3
nf_conntrack_tcp_timeout_close:10
nf_conntrack_tcp_timeout_close_wait:60
nf_conntrack_tcp_timeout_established:432000
nf_conntrack_tcp_timeout_fin_wait:120
nf_conntrack_tcp_timeout_last_ack:30
nf_conntrack_tcp_timeout_max_retrans:300
nf_conntrack_tcp_timeout_syn_recv:60
nf_conntrack_tcp_timeout_syn_sent:120
nf_conntrack_tcp_timeout_time_wait:120
nf_conntrack_tcp_timeout_unacknowledged:300
nf_conntrack_udp_timeout:30
nf_conntrack_udp_timeout_stream:180

> (might need to increase ip_conntrack_buckets)

You got me there. I had forgotten nf_conntrack.hashsize=1048576
and nf_conntrack.expect_hashsize=32768 on the kernel command line. It
was on the hot standby firewall, but not on the primary one. I will do a
failover to the hot standby sometime during the weekend.

It still isn't possible to change without a reboot, is it?
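
(I vaguely recall that the hash size is also exposed as a module parameter
under /sys, so something like the following might work at runtime, though I
haven't tried it on this kernel:)

echo 1048576 > /sys/module/nf_conntrack/parameters/hashsize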

> ethtool -c eth0
> (might change coalesce params to reduce number of irqs)

Coalesce parameters for eth0:
Adaptive RX: off TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 20
rx-frames: 5
rx-usecs-irq: 0
rx-frames-irq: 5

tx-usecs: 72
tx-frames: 53
tx-usecs-irq: 0
tx-frames-irq: 5

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

I played quite a lot with the parameters, but it did not seem to make any
difference. I didn't try adaptive mode, though; the load is fairly static,
so it didn't seem appropriate.

> ethtool -g eth0

Ring parameters for eth0:
Pre-set maximums:
RX: 511
RX Mini: 0
RX Jumbo: 0
TX: 511
Current hardware settings:
RX: 200
RX Mini: 0
RX Jumbo: 0
TX: 511

Right now RX is 200, but when it was 511 it didn't seem to make a
difference.

Thank you very much for the help! I will report back whether it was the
hash buckets.


/Benny

Eric Dumazet

Apr 9, 2010, 9:29:22 AM
to Benny Amorsen, net...@vger.kernel.org
On Friday, April 9, 2010 at 14:33 +0200, Benny Amorsen wrote:

> Thank you very much for the help! I will report back whether it was the
> hash buckets.

OK

You could try :

ethtool -C eth0 tx-usecs 200 tx-frames 100 tx-frames-irq 100
ethtool -C eth1 tx-usecs 200 tx-frames 100 tx-frames-irq 100

(to reduce tx completion irqs)


Before buying multiqueue devices, you could also try the net-next-2.6 kernel,
because RPS (Receive Packet Steering) is in.

In your setup this might help a bit, by distributing the packets to all CPUs
with appropriate cache handling.
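
Once on net-next, enabling it is per RX queue, something like this to spread
the load over all four cpus (mask f):

echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo f > /sys/class/net/eth1/queues/rx-0/rps_cpus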

Benny Amorsen

Apr 12, 2010, 2:20:46 AM
to Eric Dumazet, net...@vger.kernel.org
Eric Dumazet <eric.d...@gmail.com> writes:

> On Friday, April 9, 2010 at 14:33 +0200, Benny Amorsen wrote:
>
>> Thank you very much for the help! I will report back whether it was the
>> hash buckets.
>
> OK
>
> You could try :
>
> ethtool -C eth0 tx-usecs 200 tx-frames 100 tx-frames-irq 100
> ethtool -C eth1 tx-usecs 200 tx-frames 100 tx-frames-irq 100
>
> (to reduce tx completion irqs)

Alas, even with the hash buckets I still have the same problem. Perhaps
slightly less severe, but it's still there.

I implemented the other changes you suggested as well, except for the
ethtool -G change. I may try to switch to net-next if I can find an easy way
to make an RPM out of it.
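
(If the kernel packaging targets work the way I think they do, it should be
roughly this, untested:)

git clone git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6.git
cd net-next-2.6
cp /boot/config-$(uname -r) .config && make oldconfig
make binrpm-pkg   # should leave a binary kernel RPM under ~/rpmbuild/RPMS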

Thank you for the help!

/proc/sys/net/netfilter/nf_conntrack_acct:1
/proc/sys/net/netfilter/nf_conntrack_buckets:1048576
/proc/sys/net/netfilter/nf_conntrack_checksum:1
/proc/sys/net/netfilter/nf_conntrack_count:43430
/proc/sys/net/netfilter/nf_conntrack_events:1
/proc/sys/net/netfilter/nf_conntrack_events_retry_timeout:15
/proc/sys/net/netfilter/nf_conntrack_expect_max:2048
/proc/sys/net/netfilter/nf_conntrack_generic_timeout:600
/proc/sys/net/netfilter/nf_conntrack_icmp_timeout:30
/proc/sys/net/netfilter/nf_conntrack_log_invalid:1
/proc/sys/net/netfilter/nf_conntrack_max:1048576
/proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal:0
/proc/sys/net/netfilter/nf_conntrack_tcp_loose:1
/proc/sys/net/netfilter/nf_conntrack_tcp_max_retrans:3
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close:10
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close_wait:60
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established:432000
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_fin_wait:120
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_last_ack:30
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_max_retrans:300
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_syn_recv:60
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_syn_sent:120
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_time_wait:120
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_unacknowledged:300
/proc/sys/net/netfilter/nf_conntrack_udp_timeout:30
/proc/sys/net/netfilter/nf_conntrack_udp_timeout_stream:180


/Benny

Benny Lyne Amorsen

Apr 12, 2010, 10:44:39 AM
to zhigang gong, net...@vger.kernel.org
On Monday, April 12, 2010 at 16:16 +0800, zhigang gong wrote:

> How do you get the per-CPU usage data, from oprofile? I'm just a little
> surprised by the result, as it shows your new core running 10x
> faster than your old core :).

Well, the old server had only two CPUs plus hyperthreading, and the
CPUs were Pentium 4-based. Add a slow memory bus to that and you have a
fairly slow system. It's almost 5 years old, so Moore's law suggests roughly
a 2**3 increase in transistor count since then...

In about the same time frame, Linux has gone from being able to fill
1Gbps Ethernet to being able to fill 10Gbps Ethernet.

> What's the average packet size?

I asked the switch (I can't find a handy equivalent to ifstat that
counts packets instead of bytes). The 5-minute average packet sizes seem
to vary in the range of 450 to 550 bytes.
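
(Failing that, a crude sampler over the interface statistics in /sys along
these lines would probably give pps and average packet size per second:)

prev_p=$(cat /sys/class/net/eth0/statistics/rx_packets)
prev_b=$(cat /sys/class/net/eth0/statistics/rx_bytes)
while sleep 1; do
  p=$(cat /sys/class/net/eth0/statistics/rx_packets)
  b=$(cat /sys/class/net/eth0/statistics/rx_bytes)
  echo "pps=$((p - prev_p)) avg_bytes=$(( (b - prev_b) / (p - prev_p + 1) ))"
  prev_p=$p; prev_b=$b
done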

> If your packet size is 64 bytes, then the pps (packets per second) rate
> should be about 585 Kpps. As far as I know, that value is about the best
> you can get when the standard Linux kernel processes networking
> traffic with a normal 1Gb Ethernet card (without multi-queue support)
> on an Intel box. If that is the case, buying a better Ethernet card with
> multi-queue support should be a good choice. Otherwise, it may not
> help.

I am far from that, perhaps 1/10th of that. I do a lot more processing
on at least some of the packets though (the ones starting new flows).

Benny Amorsen

Apr 12, 2010, 1:06:30 PM
to zhigang gong, net...@vger.kernel.org
On Monday, April 12, 2010 at 23:33 +0800, zhigang gong wrote:

>
> Now I agree with Eric's analysis: there may be some bursts, for
> example a burst of first packets for many different new flows.
> What mode are you running the Ethernet driver in? I guess it's NAPI,
> right?

I presume so.

> And is your time-consuming workload handled in soft-irq
> context or in a user-space process?

Soft-irq; the box is doing pure iptables. The only time it does a little
bit of user-space work is when I use conntrackd, and killing conntrackd
does not affect the packet loss measurably.

I switched to an 82576-based card, and now I get:

3341.00 - 4.9% : _spin_lock
2506.00 - 3.7% : irq_entries_start
2163.00 - 3.2% : _spin_lock_irqsave
1616.00 - 2.4% : native_read_tsc
1572.00 - 2.3% : igb_poll [igb]
1386.00 - 2.0% : get_partial_node
1236.00 - 1.8% : igb_clean_tx_irq [igb]
1205.00 - 1.8% : igb_xmit_frame_adv [igb]
1170.00 - 1.7% : ipt_do_table
1049.00 - 1.6% : fget_light
1015.00 - 1.5% : tick_nohz_stop_sched_tick
967.00 - 1.4% : fput
945.00 - 1.4% : __slab_free
919.00 - 1.4% : datagram_poll
874.00 - 1.3% : dev_queue_xmit

And it seems the packet loss is gone!

# ethtool -S eth0|fgrep drop
tx_dropped: 0
rx_queue_drop_packet_count: 0
dropped_smbus: 0
rx_queue_0_drops: 0
rx_queue_1_drops: 0
rx_queue_2_drops: 0
rx_queue_3_drops: 0

I'm a bit surprised by this though:

99: 24 1306226 3 2 PCI-MSI-edge eth1-tx-0
100: 15735 1648774 3 7 PCI-MSI-edge eth1-tx-1
101: 8 11 9 1083022 PCI-MSI-edge eth1-tx-2
102: 0 0 0 0 PCI-MSI-edge eth1-tx-3
103: 18 15 6131 1095383 PCI-MSI-edge eth1-rx-0
104: 217 32 46544 1335325 PCI-MSI-edge eth1-rx-1
105: 154 1305595 218 16 PCI-MSI-edge eth1-rx-2
106: 17 16 8229 1467509 PCI-MSI-edge eth1-rx-3
107: 0 0 1 0 PCI-MSI-edge eth1
108: 2 14 15 1003053 PCI-MSI-edge eth0-tx-0
109: 8226 1668924 478 487 PCI-MSI-edge eth0-tx-1
110: 3 1188874 17 12 PCI-MSI-edge eth0-tx-2
111: 0 0 0 0 PCI-MSI-edge eth0-tx-3
112: 203 185 5324 1015263 PCI-MSI-edge eth0-rx-0
113: 4141 1600793 153 159 PCI-MSI-edge eth0-rx-1
114: 16242 1210108 436 3124 PCI-MSI-edge eth0-rx-2
115: 267 4173 19471 1321252 PCI-MSI-edge eth0-rx-3
116: 0 1 0 0 PCI-MSI-edge eth0


irqbalanced seems to have picked CPU1 and CPU3 for all the interrupts,
which to my mind should cause the same problem as before (where CPU1 and
CPU3 were handling all packets). Yet the box clearly works much better
than before.

Anyway, this brings the saga to an end from my point of view. Thank you
very much for looking into this, you and Eric Dumazet have been
invaluable!

Eric Dumazet

Apr 13, 2010, 1:56:26 AM
to Changli Gao, Benny Amorsen, zhigang gong, net...@vger.kernel.org
On Tuesday, April 13, 2010 at 07:18 +0800, Changli Gao wrote:

> On Tue, Apr 13, 2010 at 1:06 AM, Benny Amorsen <benny+...@amorsen.dk> wrote:
> >
> > 99: 24 1306226 3 2 PCI-MSI-edge eth1-tx-0
> > 100: 15735 1648774 3 7 PCI-MSI-edge eth1-tx-1
> > 101: 8 11 9 1083022 PCI-MSI-edge eth1-tx-2
> > 102: 0 0 0 0 PCI-MSI-edge eth1-tx-3
> > 103: 18 15 6131 1095383 PCI-MSI-edge eth1-rx-0
> > 104: 217 32 46544 1335325 PCI-MSI-edge eth1-rx-1
> > 105: 154 1305595 218 16 PCI-MSI-edge eth1-rx-2
> > 106: 17 16 8229 1467509 PCI-MSI-edge eth1-rx-3
> > 107: 0 0 1 0 PCI-MSI-edge eth1
> > 108: 2 14 15 1003053 PCI-MSI-edge eth0-tx-0
> > 109: 8226 1668924 478 487 PCI-MSI-edge eth0-tx-1
> > 110: 3 1188874 17 12 PCI-MSI-edge eth0-tx-2
> > 111: 0 0 0 0 PCI-MSI-edge eth0-tx-3
> > 112: 203 185 5324 1015263 PCI-MSI-edge eth0-rx-0
> > 113: 4141 1600793 153 159 PCI-MSI-edge eth0-rx-1
> > 114: 16242 1210108 436 3124 PCI-MSI-edge eth0-rx-2
> > 115: 267 4173 19471 1321252 PCI-MSI-edge eth0-rx-3
> > 116: 0 1 0 0 PCI-MSI-edge eth0
> >
> >
> > irqbalanced seems to have picked CPU1 and CPU3 for all the interrupts,
> > which to my mind should cause the same problem as before (where CPU1 and
> > CPU3 were handling all packets). Yet the box clearly works much better
> > than before.
>
> irqbalanced? I don't think it can work properly. Try RPS in netdev and
> linux-next tree, and if cpu load isn't even, try this patch:
> http://patchwork.ozlabs.org/patch/49915/ .
>
>

Don't try RPS on multiqueue devices!

If the number of queues matches the number of CPUs, it brings nothing but
extra latency!

Benny, I am not sure your irqbalance is up to date with multiqueue devices;
you might need to disable it and manually set the IRQ affinity of each interrupt:

echo 01 >/proc/irq/100/smp_affinity
echo 02 >/proc/irq/101/smp_affinity
echo 04 >/proc/irq/102/smp_affinity
echo 08 >/proc/irq/103/smp_affinity
echo 10 >/proc/irq/104/smp_affinity
echo 20 >/proc/irq/105/smp_affinity
echo 40 >/proc/irq/106/smp_affinity
echo 80 >/proc/irq/107/smp_affinity

echo 01 >/proc/irq/108/smp_affinity
echo 02 >/proc/irq/109/smp_affinity
echo 04 >/proc/irq/110/smp_affinity
echo 08 >/proc/irq/111/smp_affinity
echo 10 >/proc/irq/112/smp_affinity
echo 20 >/proc/irq/113/smp_affinity
echo 40 >/proc/irq/114/smp_affinity
echo 80 >/proc/irq/115/smp_affinity

Benny Amorsen

Apr 13, 2010, 3:56:25 AM
to Eric Dumazet, Changli Gao, zhigang gong, net...@vger.kernel.org
Eric Dumazet <eric.d...@gmail.com> writes:

> Benny, I am not sure your irqbalance is up to date with multiqueue devices;
> you might need to disable it and manually set the IRQ affinity of each interrupt

True, that would probably help. Irqbalance might just believe that the
load is so low that it isn't worth rebalancing. The CPUs are spending
more than 90% of their time idle.

I'll keep monitoring the server, and if it starts dropping packets again
or load increases I'll check whether irqbalanced does the right thing,
and if not I'll implement your suggestion.

Thank you very much!


/Benny

Eric Dumazet

Apr 13, 2010, 8:53:04 AM
to Paweł Staszewski, Changli Gao, Benny Amorsen, zhigang gong, net...@vger.kernel.org
On Tuesday, April 13, 2010 at 14:33 +0200, Paweł Staszewski wrote:
> On 2010-04-13 01:18, Changli Gao wrote:

> > On Tue, Apr 13, 2010 at 1:06 AM, Benny Amorsen<benny+...@amorsen.dk> wrote:
> >
> >> 99: 24 1306226 3 2 PCI-MSI-edge eth1-tx-0
> >> 100: 15735 1648774 3 7 PCI-MSI-edge eth1-tx-1
> >> 101: 8 11 9 1083022 PCI-MSI-edge eth1-tx-2
> >> 102: 0 0 0 0 PCI-MSI-edge eth1-tx-3
> >> 103: 18 15 6131 1095383 PCI-MSI-edge eth1-rx-0
> >> 104: 217 32 46544 1335325 PCI-MSI-edge eth1-rx-1
> >> 105: 154 1305595 218 16 PCI-MSI-edge eth1-rx-2
> >> 106: 17 16 8229 1467509 PCI-MSI-edge eth1-rx-3
> >> 107: 0 0 1 0 PCI-MSI-edge eth1
> >> 108: 2 14 15 1003053 PCI-MSI-edge eth0-tx-0
> >> 109: 8226 1668924 478 487 PCI-MSI-edge eth0-tx-1
> >> 110: 3 1188874 17 12 PCI-MSI-edge eth0-tx-2
> >> 111: 0 0 0 0 PCI-MSI-edge eth0-tx-3
> >> 112: 203 185 5324 1015263 PCI-MSI-edge eth0-rx-0
> >> 113: 4141 1600793 153 159 PCI-MSI-edge eth0-rx-1
> >> 114: 16242 1210108 436 3124 PCI-MSI-edge eth0-rx-2
> >> 115: 267 4173 19471 1321252 PCI-MSI-edge eth0-rx-3
> >> 116: 0 1 0 0 PCI-MSI-edge eth0
> >>
> >>
> >> irqbalanced seems to have picked CPU1 and CPU3 for all the interrupts,
> >> which to my mind should cause the same problem as before (where CPU1 and
> >> CPU3 were handling all packets). Yet the box clearly works much better
> >> than before.
> >>
> > irqbalanced? I don't think it can work properly. Try RPS in netdev and
> > linux-next tree, and if cpu load isn't even, try this patch:
> > http://patchwork.ozlabs.org/patch/49915/ .
> >
> >
> >
> Yes, without irqbalance and with the IRQ affinity set by hand the router will
> work much better.
>
> But I don't think that RPS will help him. I made some tests with RPS
> and affinity; the results are in the attached file.
> The test router does traffic management (HFSC) for almost 9k users.

Thanks for sharing, Paweł.

But obviously you are mixing apples and oranges.

Are you aware that HFSC and other traffic shapers serialize access to their
data structures? If many CPUs try to access these structures in parallel, you
get a lot of cache line misses. HFSC is a real memory hog :(

Benny does have firewalling (highly parallelized these days; iptables has been
much improved in this area), but no traffic control.

Anyway, Benny now has multiqueue devices, and therefore RPS will not
help him. I suggested RPS before his move to multiqueue, and multiqueue
is the most sensible way to improve things when no central lock is
used. Every CPU can really work in parallel.

Paweł Staszewski

Apr 13, 2010, 9:39:06 AM
to Eric Dumazet, Changli Gao, Benny Amorsen, zhigang gong, net...@vger.kernel.org
On 2010-04-13 14:53, Eric Dumazet wrote:
Thanks, Eric, for the explanation of why RPS is useless for traffic-management
routers.

> Benny do have firewalling (highly parallelized these days, iptables was
> well improved in this area), but no traffic control.
>
>

Hmm, so maybe a better choice for traffic management is to use iptables for
"filter classification" instead of "u32 filters", i.e. something like the
iptables CLASSIFY target.
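
Purely as an illustration (addresses and classids invented), something like:

iptables -t mangle -A POSTROUTING -o eth1 -s 10.0.0.0/24 -j CLASSIFY --set-class 1:10
iptables -t mangle -A POSTROUTING -o eth1 -s 10.0.1.0/24 -j CLASSIFY --set-class 1:20

and then skip the per-user u32 filters on the qdisc entirely.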

Benny Amorsen

Apr 15, 2010, 9:23:50 AM
to Eric Dumazet, Changli Gao, zhigang gong, net...@vger.kernel.org
Benny Amorsen <benny+...@amorsen.dk> writes:

> I'll keep monitoring the server, and if it starts dropping packets again
> or load increases I'll check whether irqbalanced does the right thing,
> and if not I'll implement your suggestion.

It did start dropping packets (although very few, a few packets dropped
at once perhaps every ten minutes). Irqbalanced didn't move the
interrupts.

Doing

echo 01 >/proc/irq/99/smp_affinity
echo 02 >/proc/irq/100/smp_affinity
echo 04 >/proc/irq/101/smp_affinity

and so on, as Eric Dumazet suggested, seems to have helped, but not
entirely solved the problem.

The problem now manifests itself this way in ethtool -S:
rx_no_buffer_count: 270
rx_queue_drop_packet_count: 270

I can't be sure that I'm not just getting hit by a 1Gbps traffic spike,
of course, but it is a bit strange that a machine which can do 200Mbps
at 92% idle can't handle subsecond peaks close to 1Gbps...

I wish ifstat could report errors so I could see what the traffic rate
was when the problem occurred...
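
(Until then I suppose I can log the drop counters with timestamps and
correlate them with the traffic graphs, something like:)

while sleep 1; do
  echo "$(date +%T) $(ethtool -S eth0 | grep -E 'rx_no_buffer_count|rx_queue_drop_packet_count' | tr -s ' \n' ' ')"
done >> /var/tmp/eth0-drops.log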

Eric Dumazet

Apr 15, 2010, 9:42:53 AM
to Benny Amorsen, Changli Gao, zhigang gong, net...@vger.kernel.org
On Thursday, April 15, 2010 at 15:23 +0200, Benny Amorsen wrote:
> Benny Amorsen <benny+...@amorsen.dk> writes:
>
> > I'll keep monitoring the server, and if it starts dropping packets again
> > or load increases I'll check whether irqbalanced does the right thing,
> > and if not I'll implement your suggestion.
>
> It did start dropping packets (although very few, a few packets dropped
> at once perhaps every ten minutes). Irqbalanced didn't move the
> interrupts.
>
> Doing
>
> echo 01 >/proc/irq/99/smp_affinity
> echo 02 >/proc/irq/100/smp_affinity
> echo 04 >/proc/irq/101/smp_affinity
>
> and so on, as Eric Dumazet suggested, seems to have helped, but not
> entirely solved the problem.
>
> The problem now manifests itself this way in ethtool -S:
> rx_no_buffer_count: 270
> rx_queue_drop_packet_count: 270
>
> I can't be sure that I'm not just getting hit by a 1Gbps traffic spike,
> of course, but it is a bit strange that a machine which can do 200Mbps
> at 92% idle can't handle subsecond peaks close to 1Gbps...
>

Even with multiqueue, it's quite possible one queue gets more than one
packet per microsecond. The time to process a packet might be greater than
1 us even on recent hardware. So a burst of 1000 small packets with the same
flow information hits one queue, one CPU, and fills the RX ring.
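
Back-of-the-envelope: at 1 Gbit/s a minimum-size frame takes 84 bytes on the
wire (64 bytes plus preamble and inter-frame gap), i.e. 672 bits, so the line
rate is about 1.49 Mpps, one packet every ~0.67 us. A 511-entry RX ring
therefore covers only about 0.34 ms of such a burst if the CPU handling that
queue falls behind.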

Losing these packets is OK; it's very likely an attack :)

> I wish ifstat could report errors so I could see what the traffic rate
> was when the problem occurred...

yes, it could be added I guess.
