Can't ping hosts using multiple egre-tunnels to the same destination


gonçalo Semedo

May 19, 2014, 1:50:01 PM
to geni-...@googlegroups.com
Hi all,

I have a WAN topology with OVS switches connected by EGRE tunnels.

With a topology like the attached image topo_2_links (a single path from Host1 to Host2 through 3 switches), Host1 can ping Host2.

With a topology like the attached image topo_3_links (two paths from Host1 to Host2, one through all 3 switches and one through only 2), Host1 can't ping Host2 if the algorithm chooses the 3-switch path.

Why does adding another EGRE tunnel, which creates a second path, make the other path stop working?

Thanks
Gonçalo

topo_2_links.png
topo_3_links.png

Sarah Edwards

May 20, 2014, 2:54:14 PM
to geni-...@googlegroups.com, Sarah Edwards
Hi Gonçalo,

I think we've reached the point where this would be much easier to debug if we could log into your slice.

I've created a project for this purpose and added you, me, Niky and Vic to it.

Could you:
 (1) Create a slice in the project "ExperimentDebug". (Niky, Vic and I will be added to that slice automatically.)
 (2) Reserve your topology using omni.  Make each of the three calls in quick succession.
        omni createsliver MYSLICE MYRSPEC -a sox-ig -r ExperimentDebug
        omni createsliver MYSLICE MYRSPEC -a max-ig -r ExperimentDebug
        omni createsliver MYSLICE MYRSPEC -a gatech-ig -r ExperimentDebug
Where MYSLICE is the name of your slice and MYRSPEC is the full path to the file containing your RSpec.
 (3) Send us instructions on how to start your controller and any steps to reproduce the error. 
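Step (2) can also be scripted so the three calls really do go out back to back. This is a minimal bash sketch and assumes omni is on the PATH. The function name reserve_all and the OMNI override are hypothetical conveniences, not omni features.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the three createsliver calls from step (2)
# back to back. OMNI defaults to "omni"; set OMNI=echo for a dry run
# that only prints the commands.
reserve_all() {
  local omni="${OMNI:-omni}"
  local slice="$1" rspec="$2" am
  if [ -z "$slice" ] || [ -z "$rspec" ]; then
    echo "usage: reserve_all MYSLICE MYRSPEC" >&2
    return 2
  fi
  for am in sox-ig max-ig gatech-ig; do
    # Stop at the first failure so a partially reserved topology is noticed.
    $omni createsliver "$slice" "$rspec" -a "$am" -r ExperimentDebug || return 1
  done
}
```

Run it as `reserve_all MYSLICE /full/path/to/MYRSPEC`. Prefixing `OMNI=echo` prints the three commands without contacting any aggregate. Stopping at the first failed call is a judgment call in this sketch; it avoids silently leaving one aggregate unreserved.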

Let us know if you have any problems.

Thanks,
Sarah

--
GENI Users is a community supported mailing list, so please help by responding to questions you know the answer to.
 
If this is your first time posting a question to this list, please review http://groups.geni.net/geni/wiki/GENIExperimenter/CommunityMailingList
---
You received this message because you are subscribed to the Google Groups "GENI Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geni-users+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

*******************************************************************************
Sarah Edwards
GENI Project Office

BBN Technologies
Cambridge, MA
phone:    (617) 873-2329
email:    sedw...@bbn.com





gonçalo Semedo

May 21, 2014, 8:16:16 AM
to geni-...@googlegroups.com
Thanks Sarah!

OK, I did what you asked.

I created a slice named egreDebug and reserved my topology using omni.

The controller is installed on the northwestern-ig node.

To start it, just run: sudo java -jar target/floodlight.jar -cf src/main/resources/floodlightdefault.properties

The console shows the path chosen by the algorithm, and every time the path includes all 3 switches, the ping fails.

Note that hosts are discovered using ARP, so sometimes you have to clear the hosts' ARP caches.
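Clearing the cache on each host could look like the sketch below. It assumes the hosts run Linux with iproute2 (`ip`). flush_arp and IP_CMD are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clear a host's ARP (neighbor) cache so the hosts
# are re-discovered via ARP. IP_CMD defaults to "sudo ip"; set
# IP_CMD=echo to print the command instead of running it.
flush_arp() {
  local ipcmd="${IP_CMD:-sudo ip}"
  if [ -n "$1" ]; then
    # Only the entries on one interface, e.g. flush_arp eth1
    $ipcmd neigh flush dev "$1"
  else
    # Every entry on every interface
    $ipcmd neigh flush all
  fi
}
```

Running `ip neigh show` before and after shows which entries were dropped.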

Thanks,
Gonçalo

Niky Riga

May 21, 2014, 11:28:32 AM
to geni-...@googlegroups.com
Hi Goncalo,

Unfortunately, the extra accounts were not created in your slice, which means that either you didn't use omni or you have an old version of omni (you can check with omni --version).

Unfortunately, you will have to delete and recreate your slice. You can either:
 1. update the omni version you are using (you will also have to submit your RSpec to northwestern-ig), or
 2. upload the public key I sent you earlier to your profile in the portal, and *then* do the reservation
     through Flack or an older omni.

Best,
Niky

gonçalo Semedo

May 21, 2014, 1:08:31 PM
to geni-...@googlegroups.com
Hi Niky,

I had an old version of omni.

I updated omni and reserved everything again, but now ping works when using the 3 switches and fails when using only 2 switches!

Try it now and let me know if it works.


Thanks,
Gonçalo


gonçalo Semedo

May 21, 2014, 1:09:56 PM
to geni-...@googlegroups.com
I created a new slice named egreDebug2


Sarah Edwards

May 21, 2014, 1:42:07 PM
to geni-...@googlegroups.com, Sarah Edwards
Ah. Sorry, we should have told you to reconfigure your omni after you upgraded per [1].

I added Niky, Vic and myself by following these instructions [2] to add our keys to the slice:
omni -V 3 poa urn:publicid:IDN+ch.geni.net:ExperimentDebug+slice+egreDebug2 geni_update_users -a max-ig -a sox-ig -a gatech-ig --useSliceMembers

The login information isn't available through the portal, but we can get it via readyToLogin:
readyToLogin.py urn:publicid:IDN+ch.geni.net:ExperimentDebug+slice+egreDebug2 --no-keys -a max-ig -a sox-ig -a gatech-ig

Using the output of this command, I was able to log in to Host-1.
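Both commands above share the slice URN, whose layout can be read off them: urn:publicid:IDN+ch.geni.net:PROJECT+slice+SLICE. The bash sketch below builds that URN from the project and slice names and reruns both commands. It assumes omni and readyToLogin.py are on the PATH and that the slice lives under the ch.geni.net authority, as it does here. slice_urn, add_slice_members, show_logins and the OMNI/READY overrides are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the two commands above. The URN layout is
# copied from them; a different slice authority would use another prefix.
# Set OMNI=echo or READY=echo for a dry run.
slice_urn() {
  # $1 = project name, $2 = slice name
  printf 'urn:publicid:IDN+ch.geni.net:%s+slice+%s\n' "$1" "$2"
}

add_slice_members() {
  local omni="${OMNI:-omni}" urn
  urn="$(slice_urn "$1" "$2")"
  # Push the keys of all slice members to the three aggregates.
  $omni -V 3 poa "$urn" geni_update_users \
    -a max-ig -a sox-ig -a gatech-ig --useSliceMembers
}

show_logins() {
  local ready="${READY:-readyToLogin.py}" urn
  urn="$(slice_urn "$1" "$2")"
  # Print login information for the slice's nodes.
  $ready "$urn" --no-keys -a max-ig -a sox-ig -a gatech-ig
}
```

For example, `add_slice_members ExperimentDebug egreDebug2` followed by `show_logins ExperimentDebug egreDebug2` repeats what was done for this slice.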

Sarah



[2] http://trac.gpolab.bbn.com/gcf/wiki/HowTo/UseCHAPIInOmni#poageni_update_users


gonçalo Semedo

May 21, 2014, 5:06:56 PM
to geni-...@googlegroups.com
Have you found anything that points to the problem?


Gonçalo

Sarah Edwards

May 21, 2014, 7:52:36 PM
to geni-...@googlegroups.com, Sarah Edwards
Hi Gonçalo,

Vic and I looked at this.

We did a couple of things.
 (1) We deleted the controller from each OVS switch and put each OVS switch in standalone mode. We were able to ping successfully between Host1 and Host2, which is good and implies your topology came up OK.
 (2) We then set the controller and put each OVS switch into secure mode.
       * When we started the controller, we were able to reliably ping along the long path (Host1 -> OVS-1 -> OVS-2 -> OVS-3 -> Host2).
       * When we stopped the controller, the pings continued.
       * We tried restarting the controller a number of times and saw the same behavior you did: it seemed happy to send traffic along the long path but not along the short path.
* After we played around for a while, we started seeing less consistent behavior, such as long periods where no pings got through, followed by a single ping or a group of 4 pings getting through.
* At one point the controller went nuts and kept printing this error over and over:
17:22:23.749 INFO [n.f.c.i.OFChannelHandler:Hashed wheel timer #2] Disconnected switch [? DPID[?]]
17:22:24.042 ERROR [n.f.c.i.OFChannelHandler:Hashed wheel timer #2] Disconnecting switch [? DPID[?]]: failed to complete handshake
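The two switch configurations in (1) and (2) can be reproduced with ovs-vsctl on each OVS node. In this sketch the bridge name (br0) and the controller target (tcp:CONTROLLER_IP:6633, with 6633 assumed as the controller's OpenFlow port) are placeholders. `sudo ovs-vsctl list-br` shows the real bridge names. ovs_standalone, ovs_secure and OVS_VSCTL are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the two OVS configurations described above.
# OVS_VSCTL defaults to "sudo ovs-vsctl"; set OVS_VSCTL=echo for a dry run.

# (1) No controller: in standalone mode OVS sets up flows itself,
#     acting like an ordinary MAC-learning switch.
ovs_standalone() {
  local vsctl="${OVS_VSCTL:-sudo ovs-vsctl}" br="$1"
  $vsctl del-controller "$br" || return 1
  $vsctl set-fail-mode "$br" standalone
}

# (2) Controller attached: in secure mode OVS does not set up flows on
#     its own, so forwarding depends on flows the controller installs.
ovs_secure() {
  local vsctl="${OVS_VSCTL:-sudo ovs-vsctl}" br="$1" ctrl="$2"
  $vsctl set-controller "$br" "$ctrl" || return 1
  $vsctl set-fail-mode "$br" secure
}
```

For example, run `ovs_standalone br0` on every switch for test (1), or `ovs_secure br0 tcp:CONTROLLER_IP:6633` for test (2). Flows a controller has already installed stay in the table until they time out. That may explain why pings continued after the controller was stopped.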

Can you tell us about the controller you are using?  Is it just a standard un-modified floodlight controller?

We suspect that this is an issue with the controller and that you should reach out to the controller community for advice.

Thanks,
Sarah & Vic

gonçalo Semedo

May 22, 2014, 7:53:11 AM
to geni-...@googlegroups.com, Sarah Edwards
Hi Sarah and Vic,

First of all, thanks a lot for the time you are spending trying to help me. 

The controller has a module I wrote, called random, that chooses at random the next port the packet will be sent out of. That is the only difference between my controller and a standard, unmodified Floodlight controller.

I don't know how standalone OVS works; are you sure that both paths were used? I don't know what could be wrong with the controller, since it works fine in Mininet.

Do you know of any major difference between Mininet and GENI that would make the algorithm work in Mininet but fail in GENI?

Thanks
Gonçalo

Sarah Edwards

May 22, 2014, 9:42:54 AM
to geni-...@googlegroups.com, Sarah Edwards
Hi Gonçalo,

What happens when you run the unmodified controller in your current multi-AM triangle topology in GENI? 

Likewise, what happens when you run the modified and unmodified controllers on the same topology but within a single aggregate on GENI? (That is, if you reserve the triangle topology all within a single aggregate.) 

There are two fundamental things that I'm concerned about here:
   * Did the controller modification introduce a bug? The first test above should point in the right direction.
   * In your current multi-aggregate topology, the nodes are physically far from each other and from the controller. I could imagine the delay introducing issues that your controller would need to handle. The second test above should help determine whether this is the issue.

Thanks,
Sarah

gonçalo Semedo

May 23, 2014, 5:27:13 PM
to geni-...@googlegroups.com, Sarah Edwards
Hi Sarah,

The default forwarding module of Floodlight works.

And my module isn't working in a single-AM topology, which is strange because I am pretty sure I tried it before.

I will try to fix the controller problem in a single-AM topology and then move on to a multi-AM one.

Thanks again for all your help
Gonçalo

Sarah Edwards

May 23, 2014, 5:38:04 PM
to geni-...@googlegroups.com, Sarah Edwards
Perfect!  Sounds like the right thing.

Good luck!

Sarah

gonçalo Semedo

May 25, 2014, 10:11:16 AM
to geni-...@googlegroups.com
Hi Sarah,

I found out that the unmodified Floodlight controller doesn't work on this topology either: it computes the shortest path, and since in this topology the shortest path is the one that doesn't work, ping always fails.

In a single-AM topology with LANs, the controller shows a weird topology: hosts connected to all switches, and those switches all in different domains.

In a single-AM topology with EGRE tunnels, the controller shows the correct topology, but I have the same problem as in the multi-AM topology: one of the paths doesn't work.

This leads me to conclude that the problem is not in my code.

So the problem is either in the topology or in the controller code.

What I find strange is that with tcpdump I see packets leaving the switch, but they never reach the destination.

What could be causing this?

I attached an image of a tcpdump capture.

Thanks

Gonçalo

tcpdump.png

gonçalo Semedo

May 25, 2014, 10:59:28 AM
to geni-...@googlegroups.com
I think I found what could be causing the problem.

"Reverse path filtering" [1]: it seems this kernel routing feature silently drops incoming packets if they arrive on an interface other than the one that would be used for outgoing packets to the source address.
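To check which interfaces this applies to, here is a bash sketch that prints each interface's rp_filter setting. According to the Linux kernel's ip-sysctl documentation, 0 means no source validation, 1 means strict mode (the behavior described above) and 2 means loose mode. The value actually used for an interface is the maximum of conf/all/rp_filter and conf/<interface>/rp_filter. The function name and the CONF_ROOT override (used only to try it against a scratch directory) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the configured and effective rp_filter value
# for every interface. The kernel uses max(conf/all, conf/<iface>), so an
# interface set to 0 is still filtered while conf/all/rp_filter is 1 or 2.
show_rp_filter() {
  local root="${CONF_ROOT:-/proc/sys/net/ipv4/conf}"
  local all dir iface val eff
  all="$(cat "$root/all/rp_filter")" || return 1
  for dir in "$root"/*/; do
    dir="${dir%/}"
    iface="${dir##*/}"
    # "all" is folded into every line; "default" only seeds new interfaces.
    case "$iface" in all|default) continue ;; esac
    val="$(cat "$dir/rp_filter")"
    eff=$(( val > all ? val : all ))
    echo "$iface rp_filter=$val effective=$eff"
  done
}
```

On an affected host, look for the tunnel-facing interfaces reporting `effective=1` or `effective=2`.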

I tried to disable this feature using sudo echo 0 > /proc/sys/net/ipv4/conf/eth1/rp_filter, but it says that I don't have permission.

Can you help me with that?

Thanks
Gonçalo Semedo

[1] http://serverfault.com/questions/359100/multiple-gre-tunnels-on-the-same-host-only-one-routes-incoming-packets


Leigh Stoller

May 25, 2014, 11:19:21 AM
to geni-...@googlegroups.com
> I tried to disable this feature using sudo echo 0 >
> /proc/sys/net/ipv4/conf/eth1/rp_filter, but it says that I don't have
> permission.

Hi. Typing "sudo shell redirection" into Google gives me this:

http://askubuntu.com/questions/230476/when-using-sudo-with-redirection-i-get-permission-denied
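The underlying problem is that in `sudo echo 0 > file`, the redirection is performed by the unprivileged shell before sudo runs, so only echo gets root. A common fix is to let a privileged process open the file, e.g. `echo 0 | sudo tee FILE`. The bash sketch below assumes eth1 is the interface the tunnelled traffic arrives on, as in the command above. Because the kernel uses the maximum of conf/all and the per-interface value, it clears conf/all too. disable_rp_filter, CONF_ROOT and SUDO are hypothetical names; the overrides exist only so the function can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set rp_filter to 0 for "all" and each interface given.
# tee runs under sudo, so the privileged process opens the file; the
# unprivileged shell only writes to the pipe. Writes under /proc/sys
# last until reboot.
disable_rp_filter() {
  local root="${CONF_ROOT:-/proc/sys/net/ipv4/conf}"
  local sudo="${SUDO-sudo}"
  local iface
  for iface in all "$@"; do
    echo 0 | $sudo tee "$root/$iface/rp_filter" > /dev/null || return 1
  done
}
```

For example, `disable_rp_filter eth1`. `sudo sysctl -w net.ipv4.conf.eth1.rp_filter=0` is an equivalent one-liner for a single setting, since sysctl itself runs as root.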

Hope this helps.

Leigh





gonçalo Semedo

May 25, 2014, 12:32:14 PM
to geni-...@googlegroups.com
Thanks, Leigh, that solved the permissions problem!

And I was right! Reverse path filtering was the problem! Now I can send packets using all paths!

Thank you everyone for your help!

Gonçalo



Sarah Edwards

May 27, 2014, 9:34:30 AM
to geni-...@googlegroups.com, Sarah Edwards
Fantastic! 
