Bizarre Problem: Packets being dropped at edge routers

Jacob Chappell

Aug 12, 2015, 4:05:59 PM
to GENI Users
Hello, everyone.

I'm facing an interesting and bizarre problem with a slice, OVS, and default Linux kernel routing. I have attached an image of my topology as well as my manifest.

The problem is as follows. Packets are being dropped at the edge routers in my topology (srcr, dstr). Here are the results of some pings.

h1 ping srcr = SUCCESS
h1 ping hb = SUCCESS
h1 ping hlhb = SUCCESS
h1 ping lb = SUCCESS
h1 ping dstr = FAILED
h1 ping h2 = FAILED

h2 ping dstr = SUCCESS
h2 ping hb = SUCCESS
h2 ping hlhb = SUCCESS
h2 ping lb = SUCCESS
h2 ping srcr = FAILED
h2 ping h1 = FAILED

As you can see, the problem is symmetric. When pinging from the h1 side, packets are eaten by dstr (but make it through srcr). When pinging from the h2 side, packets are eaten by srcr (but make it through dstr).

Based on my debugging, I believe this problem is happening somewhere in the Linux kernel. To support this, I will provide more details about my configuration. All routers in my topology (srcr, hb, hlhb, lb, dstr) have the same configuration. It can be summarized as follows:

1. Each is running OVS 2.3.1.
2. There is one OVS bridge per physical link (eth1, eth2, etc.), with the physical link attached to that bridge, and the IP address of the link moved from the interface to the bridge (see <http://groups.geni.net/geni/wiki/HowTo/ConfigureOVSWithLayer3Routing>).
3. All default routes installed by GENI are preserved with ethX replaced by br-ethX (the naming convention of my bridges, see routes below).
4. Each OVS bridge has the following two rules installed, which redirect all traffic to and from the Linux kernel:
 cookie=0x0, duration=X, table=0, n_packets=X, n_bytes=X, idle_age=X, priority=1,in_port=1 actions=LOCAL
 cookie=0x0, duration=X, table=0, n_packets=X, n_bytes=X, idle_age=X, priority=1,in_port=LOCAL actions=output:1
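
For readers reproducing the setup, the per-link configuration in items 2 and 4 can be sketched roughly as follows. This is a minimal sketch, not the script actually used on the slice: the helper name `bridge_commands` and its arguments are illustrative assumptions, and it assumes the physical interface ends up as OpenFlow port 1, as in the rules above. The function only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/bash
# Print (do not run) the commands for one bridge-per-link setup.
# $1 = physical interface, $2 = address in CIDR form (both illustrative).
bridge_commands() {
    local iface cidr bridge
    iface=$1
    cidr=$2
    bridge="br-$iface"
    cat <<EOF
ovs-vsctl add-br $bridge
ovs-vsctl add-port $bridge $iface
ip addr flush dev $iface
ip addr add $cidr dev $bridge
ip link set $bridge up
ovs-ofctl add-flow $bridge priority=1,in_port=1,actions=LOCAL
ovs-ofctl add-flow $bridge priority=1,in_port=LOCAL,actions=output:1
EOF
}
```

For example, `bridge_commands eth1 10.10.8.2/24` prints the seven commands for dstr's br-eth1. Flushing the address from the physical interface also removes routes that referenced it, so static routes would need to be re-added against the bridge afterwards. `ovs-ofctl show br-eth1` lists the port numbers actually assigned.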

Here is the output of the routing tables for srcr and dstr, just for reference.

Routing table of srcr:
default via 172.16.0.1 dev eth0
10.0.0.0/8 via 10.10.7.2 dev br-eth2  proto static
10.10.1.0/24 dev br-eth1  proto kernel  scope link  src 10.10.1.2
10.10.2.0/24 dev br-eth3  proto kernel  scope link  src 10.10.2.1
10.10.3.2/31 via 10.10.2.2 dev br-eth3  proto static
10.10.5.0/24 dev br-eth4  proto kernel  scope link  src 10.10.5.2
10.10.6.0/31 via 10.10.5.1 dev br-eth4  proto static
10.10.7.0/24 dev br-eth2  proto kernel  scope link  src 10.10.7.1
172.16.0.0/12 dev eth0  proto kernel  scope link  src 172.17.5.26

Routing table of dstr:
10.0.0.0/8 via 10.10.6.1 dev br-eth2  proto static
10.10.2.0/31 via 10.10.6.1 dev br-eth2  proto static
10.10.2.0/23 via 10.10.3.2 dev br-eth3  proto static
10.10.3.0/24 dev br-eth3  proto kernel  scope link  src 10.10.3.1
10.10.4.0/24 dev br-eth4  proto kernel  scope link  src 10.10.4.2
10.10.6.0/24 dev br-eth2  proto kernel  scope link  src 10.10.6.2
10.10.6.0/23 via 10.10.8.1 dev br-eth1  proto static
10.10.7.0/31 via 10.10.6.1 dev br-eth2  proto static
10.10.8.0/24 dev br-eth1  proto kernel  scope link  src 10.10.8.2
172.16.0.0/12 dev eth0  proto kernel  scope link  src 172.17.5.24

Let us suppose we are pinging from h1 to h2. I am able to verify that the ping packet makes it to dstr by running tcpdump on the incoming interface:
sudo tcpdump -i br-eth1 icmp
16:03:26.158032 IP H1-lan0 > H2-lan3: ICMP echo request, id 18466, seq 13, length 64
16:03:27.165882 IP H1-lan0 > H2-lan3: ICMP echo request, id 18466, seq 14, length 64
16:03:28.173689 IP H1-lan0 > H2-lan3: ICMP echo request, id 18466, seq 15, length 64
...

I am also able to check the packet counter for the OVS rule on br-eth1, which shows a match, indicating that the ping packet is being sent to LOCAL (the Linux kernel). However, the packet never comes out of any of the other interfaces. I ran the same tcpdump command on br-eth2, br-eth3, br-eth4, and even eth0 (the control interface, which the default route forwards to) and saw nothing. This seems to indicate that the Linux kernel is eating the packet.

Does anyone have any intuitions about what might be going on here? I am completely stumped.

Please let me know if you need any more details or want me to try anything else.

Thanks,
Jacob Chappell
rspec.xml
topology.png

Sarah Edwards

Aug 12, 2015, 5:50:14 PM
to geni-...@googlegroups.com, Sarah Edwards, chapp...@gmail.com
Hi Jacob,

First, thank you for all of the detail as it is very helpful.

Second, this is an interesting question and there is a lot to potentially go wrong here.

In general, when I see an issue with traffic being dropped in a multi-homed system running Ubuntu, I immediately think about reverse path filtering [1]. I'm not saying that's what's happening here, but this is a frequent enough problem that I would be remiss not to mention it. So check out [1] and see if that seems relevant to you.
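
One quick way to check is to read the rp_filter sysctls on the router. The sketch below is illustrative only: the function names are made up for this example, and it assumes the standard Linux layout under /proc/sys/net/ipv4/conf, where 0 means no source validation, 1 means strict, and 2 means loose, and where the kernel applies the larger of the `all` value and the per-interface value.

```shell
#!/bin/bash
# Effective mode for an interface is max(conf/all, conf/<iface>).
effective_rp_filter() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Map the numeric mode to a human-readable label.
describe_rp_filter() {
    case "$1" in
        0) echo "off" ;;
        1) echo "strict" ;;
        2) echo "loose" ;;
        *) echo "unknown" ;;
    esac
}

# Report configured and effective rp_filter for every interface.
# $1 = sysctl directory (defaults to the live kernel settings).
report_rp_filter() {
    local base all dir name val eff
    base=${1:-/proc/sys/net/ipv4/conf}
    all=$(cat "$base/all/rp_filter")
    for dir in "$base"/*/; do
        name=$(basename "$dir")
        val=$(cat "$dir/rp_filter")
        eff=$(effective_rp_filter "$all" "$val")
        printf '%-10s configured=%s effective=%s (%s)\n' \
            "$name" "$val" "$eff" "$(describe_rp_filter "$eff")"
    done
}
```

Running `report_rp_filter` on dstr would list br-eth1 among the interfaces. If its effective value is 1 (strict), a packet from h1 arriving on br-eth1 is dropped unless the best route back to h1's address also points out br-eth1. As a temporary test, `sudo sysctl -w net.ipv4.conf.all.rp_filter=2` makes every interface at least loose.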

Have you tried any simpler topologies?  If not, keep your current setup, and then in a new slice try the following to see if you still see packet loss:
 * A linear topology of H1 - srcr - dstr - H2 (or maybe H1 - srcr - HB - dstr - H2 but I think simpler is better)
 * The above with two paths between srcr and dstr (this may be tricky to draw in Jacks, if you need help let me know)

If neither of those yields insight, and if no one else has any thoughts, we can get someone added to your slice to poke around some.

Thanks,

Sarah



--
GENI Users is a community supported mailing list, so please help by responding to questions you know the answer to.
 
If this is your first time posting a question to this list, please review http://groups.geni.net/geni/wiki/GENIExperimenter/CommunityMailingList
---
You received this message because you are subscribed to the Google Groups "GENI Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geni-users+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

*******************************************************************************
Sarah Edwards
GENI Project Office

BBN Technologies
Cambridge, MA
phone:    (617) 873-2329
email:    sedw...@bbn.com





Sarah Edwards

Aug 12, 2015, 6:31:33 PM
to geni-...@googlegroups.com, Sarah Edwards, chapp...@gmail.com
Hi Jacob,

Some more questions for context:
 * What is your overarching goal?  
 * You are running OVS, but don't mention why you are using OVS.  Do you have a controller?  If so, what is the controller doing?
 * How do you want the traffic to flow through your network?

Thanks,
Sarah

Nicholas Bastin

Aug 12, 2015, 6:33:40 PM
to geni-...@googlegroups.com
On Wed, Aug 12, 2015 at 10:05 AM, Jacob Chappell <chapp...@gmail.com> wrote:
> I'm facing an interesting and bizarre problem with a slice, OVS, and default
> Linux kernel routing. I have attached an image of my topology as well as my
> manifest.

So first, regardless of all the things I will ask below, why are you
using OVS? You seem to want to use OVS to do something that Linux will
just do by default, but in a more difficult and complicated way.

> Routing table of dstr:
> 10.0.0.0/8 via 10.10.6.1 dev br-eth2 proto static
> 10.10.2.0/31 via 10.10.6.1 dev br-eth2 proto static
> 10.10.2.0/23 via 10.10.3.2 dev br-eth3 proto static
> 10.10.3.0/24 dev br-eth3 proto kernel scope link src 10.10.3.1
> 10.10.4.0/24 dev br-eth4 proto kernel scope link src 10.10.4.2

Linux isn't really built to make this work. As per your manifest, your
interfaces are all assigned /24 subnets. Setting a /31 route on a
different interface will not work in most cases: Linux ARP needs
to know which interface to answer on, and you have confused it. As a
result, the binding will be for the "first" interface used, and then
flip-flop, but never function on both at the same time.

> Let us suppose we are pinging from h1 to h2. I am able to verify that the
> ping packet makes it to dstr by running tcp dump on the incoming interface:

What do your arp tables look like on all the vms in question? (h1/h2,
and srcr/dstr) If you ask for the arp table while traffic is flowing,
does it change?

--
Nick

Jacob Chappell

Aug 12, 2015, 7:10:36 PM
to geni-...@googlegroups.com
Hi Nick and Sarah,

I'll have to get back to you on the whole ARP thing, but I can go ahead and answer some of your questions now.

I omitted my reasoning for using OVS, because I didn't think it was relevant and didn't want to overcomplicate the matter. There is another rule in all the switches I didn't tell you about. It sends all UDP traffic to our OpenFlow controller for processing. The controller may install more specific rules that modify fields in the packet.

The reason the switches are set up the way they are is _precisely_ so that we can use Linux for routing and handling normal traffic. Before we can test the controller-specific stuff, though, we need to make sure normal traffic (ping, TCP traffic, etc.) can make it through as usual.

Regarding the weirdness with the routes, GENI set those up upon slice creation. All I did was change ethX to br-ethX and add proto static so that they wouldn't disappear in case of reboot. I agree a couple of the routes are weird and maybe even unnecessarily complicated. The difference between the /23 and /31 routes seems to be a roundabout, obfuscated way of matching 10.10.2.1 and 10.10.2.2 differently.
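
The kernel picks the most specific (longest) matching prefix, which is how overlapping /31 and /23 routes can send neighbouring addresses to different next hops. Below is a small sketch of that selection over the prefixes from dstr's routing table in the first message. The helper names are illustrative, and the real kernel lookup also considers things like metrics and policy rules, which this ignores.

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    # Split on dots in a subshell so the caller's IFS is untouched.
    (
        IFS=.
        set -- $1
        echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    )
}

# Succeed if address $1 falls inside CIDR prefix $2.
in_prefix() {
    local addr net len mask
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    len=${2#*/}
    mask=$(( (4294967295 << (32 - $len)) & 4294967295 ))
    [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
}

# Print the longest prefix among the remaining arguments that contains $1.
longest_match() {
    local target best best_len prefix plen
    target=$1
    shift
    best=""
    best_len=-1
    for prefix in "$@"; do
        plen=${prefix#*/}
        if [ "$plen" -gt "$best_len" ] && in_prefix "$target" "$prefix"; then
            best=$prefix
            best_len=$plen
        fi
    done
    echo "$best"
}

# Destination prefixes copied from dstr's routing table.
DSTR_PREFIXES="10.0.0.0/8 10.10.2.0/31 10.10.2.0/23 10.10.3.0/24 10.10.4.0/24 10.10.6.0/24 10.10.6.0/23 10.10.7.0/31 10.10.8.0/24 172.16.0.0/12"
```

With these definitions, `longest_match 10.10.2.1 $DSTR_PREFIXES` prints 10.10.2.0/31 (next hop 10.10.6.1 in the table), while `longest_match 10.10.2.2 $DSTR_PREFIXES` prints 10.10.2.0/23 (next hop 10.10.3.2). On a live router, `ip route get 10.10.2.1` reports the kernel's actual choice.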

Lastly, I want to mention that this has worked in the past on numerous experiments that were IPv6 based. Our process was exactly the same, except we copied GENI's IPv4 routes/addresses and inserted equivalent IPv6 routes/addresses. The latest experiment I ran where this worked on IPv6 was Tuesday.

Thanks for all of your help!
Jacob Chappell

Jacob Chappell

Aug 13, 2015, 12:04:47 PM
to geni-...@googlegroups.com
Another update for some of yesterday's questions.

Nick,

Here is the ARP table of dstr before h1 pings h2.

10.10.3.2 dev br-eth3 lladdr 66:9a:cf:0a:92:4e STALE
128.163.232.20 dev eth0 lladdr fe:ff:ff:ff:ff:ff STALE
172.16.0.1 dev eth0 lladdr fe:ff:ff:ff:ff:ff REACHABLE
10.10.4.1 dev br-eth4 lladdr 02:e2:49:b2:2b:68 STALE
10.10.8.1 dev br-eth1 lladdr 2a:df:ac:bf:f5:49 STALE
10.10.6.1 dev br-eth2 lladdr 32:c7:f5:8e:5c:45 STALE
172.17.253.254 dev eth0 lladdr fe:ff:ff:ff:ff:ff STALE
172.16.0.3 dev eth0 lladdr fe:ff:ff:ff:ff:ff STALE

The ARP table during a ping from h1 to h2 is the same. So the ping itself does not cause dstr's ARP table to update any further (which makes sense, since it already has all the entries it should need).

Sarah,

I tried creating this topology: h1 - srcr - dstr - h2. I ran the exact same configuration (minus one caveat I will mention below) and it works. The biggest difference here is, I suppose, that all the routes are link scoped (i.e. there are no "gateways", per se).

The caveat is my larger topology had an install script that I did not run on this simple linear topology. The script was in place for backwards compatibility with older versions of OVS. I don't see how it could have caused this problem (given that the problem only occurs at edge routers, and this script ran on ALL routers), but I think it is worth mentioning. I will write the contents of the script below.

## BEGIN SCRIPT ##
#!/bin/bash

modprobe openvswitch

ovsdb-server -v --remote=punix:/usr/local/var/run/openvswitch/db.sock \
--remote=db:Open_vSwitch,Open_vSwitch,manager_options \
--private-key=db:Open_vSwitch,SSL,private_key \
--certificate=db:Open_vSwitch,SSL,certificate \
--pidfile --detach --log-file
ovs-vsctl --no-wait init
ovs-vswitchd --pidfile --detach
## END SCRIPT ##

Jacob Chappell