OFSwitch13: Runtime assertion failure with message "Inconsistent L2 switching table" on loop topologies


Joshua Boley

Jun 19, 2018, 12:12:02 PM
to ns-3-users
Hi guys,

I'm getting an assertion failure whenever I try to run ns-3 with a loop topology, using the OFSwitch13 module to create the switches. From the cmdline:

[jboley@localhost ns-3.28]$ ./waf --run "scratch/ofswitch13-diamond"
Waf: Entering directory `/home/jboley/build/ns-3.28/build'
[ 909/1873] Compiling scratch/ofswitch13-diamond.cc
[1860/1873] Linking build/scratch/ofswitch13-diamond
Waf: Leaving directory `/home/jboley/build/ns-3.28/build'
Build commands will be stored in build/compile_commands.json
'build' finished successfully (2.730s)
PING  10.1.1.2 56(84) bytes of data.
assert failed. cond="itSrc->second == inPort", msg="Inconsistent L2 switching table", file=../src/ofswitch13/model/ofswitch13-learning-controller.cc, line=146
terminate called without an active exception
Command ['/home/jboley/build/ns-3.28/build/scratch/ofswitch13-diamond'] terminated with signal SIGIOT. Run it under a
debugger to get more information (./waf --run <program> --command-template="gdb --args %s <args>").

I've run with gdb but the stack trace goes so deep into the ns-3 code that I have no idea what I'm looking at. I've attached source code for a simple test case with all loops, etc. unrolled so you can see exactly how the configuration proceeds. I've also attached the output of a run of that code with --verbose set.

Just for yuks, I've also tried to run with an external controller through a TapBridge. In that particular case there are no assertion failures and the simulation runs to completion, but the OF switches never attempt to connect to the controller and the ping test naturally fails. That at least shows that the problem goes deeper than just the ofswitch13-learning-controller.cc code. I can post logs and traces for this case too, if desired.

I've made a targeted search both on the ns-3 documentation site and these forums but haven't hit on anything relevant. Did I miss something about ns-3 not working properly with loop topologies? Is this just a problem with OFSwitch13? Or am I not configuring the fabric or simulation settings correctly?

Our primary use cases involve loop topologies on OpenFlow 1.3 enabled networks, so this is make or break.

Thanks in advance,
Josh
ofswitch13-diamond.cc
verbose.log

Joshua Boley

Jun 19, 2018, 12:23:57 PM
to ns-3-users
Apologies, I attached a version of ofswitch13-diamond.cc that uses a modification I made to the OFSwitch13 module that allows me to explicitly set datapath IDs. I've attached the correct version here.

Thanks,
Josh
ofswitch13-diamond.cc

Joshua Boley

Jun 19, 2018, 6:20:52 PM
to ns-3-users
There must not be too many people using the OFSwitch13 module out there. Anyway, for those of you (apparently) few and far between who run into this problem: the assertion fires when the OFSwitch13 learning controller application sees the same source MAC address arrive at a switch on different ports, which is exactly what happens to flooded ARP packets in a loop. That much was fairly easy to diagnose after going through the learning controller source code and checking it against the verbose output. The good news is that it isn't a bug; the bad news is that it's a limitation of the logic implemented by the learning controller. I'll consider writing a smarter controller app, but this isn't really our target scenario.
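For anyone who wants to see the failure mode in isolation, here is a minimal sketch of a per-switch learning table. It is not the OFSwitch13 source: `L2Table`, `Learn`, `LearnStrict`, and `LearnResult` are hypothetical names, and the only thing taken from the real code is the idea behind the asserted condition (`itSrc->second == inPort`).

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical types for illustration; not the OFSwitch13 API.
using MacAddr = std::uint64_t; // 48-bit MAC address packed into an integer
using PortNo = std::uint32_t;

enum class LearnResult { Learned, Consistent, Inconsistent };

class L2Table
{
public:
  // Record that srcMac was seen on inPort. If the address is already bound
  // to a different port, report the conflict instead of overwriting it.
  LearnResult Learn (MacAddr srcMac, PortNo inPort)
  {
    auto it = m_table.find (srcMac);
    if (it == m_table.end ())
      {
        m_table.emplace (srcMac, inPort);
        return LearnResult::Learned;
      }
    return it->second == inPort ? LearnResult::Consistent
                                : LearnResult::Inconsistent;
  }

  // Strict variant: abort on a conflict, which is the behaviour the
  // assertion message in the original post points to.
  void LearnStrict (MacAddr srcMac, PortNo inPort)
  {
    LearnResult result = Learn (srcMac, inPort);
    assert (result != LearnResult::Inconsistent
            && "Inconsistent L2 switching table");
    (void) result; // silence unused-variable warnings when NDEBUG is set
  }

private:
  std::map<MacAddr, PortNo> m_table; // source MAC -> port it was learned on
};
```

In a diamond, the far switch receives the same flooded ARP request once via each branch, so the second call to `Learn` for that source MAC comes back `Inconsistent`.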

I'm still in the process of diagnosing the breakdown that occurs when an external controller is used.

Josh


Luciano Jerez Chaves

Jun 23, 2018, 6:45:32 PM
to ns-3-users
Hi Joshua,

As stated in the OFSwitch13 documentation, the Learning Controller only works with loop-free topologies. This is because the controller is based on the same implementation as the Bridge module, which has the same limitation. The proper way to fix it is to implement the spanning tree protocol for the network (as in real networks). But when thinking about SDN, the common strategy is to implement a "smart" controller application, since using the controller to configure the network the same way a standard switch would may not be an efficient solution. Because of that, this limitation was left as it is. Anyway, contributions are always welcome.
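As a rough illustration of the spanning-tree idea (not the distributed STP protocol itself, and not anything shipped with OFSwitch13), a controller with a global view of the topology could compute a loop-free subset of links with a breadth-first search and flood only over those. The names `Link` and `SpanningTreeLinks` are made up for this sketch, and it assumes switches are indexed from 0 and every neighbour index is valid.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// A link is stored as (lower switch index, higher switch index).
using Link = std::pair<std::size_t, std::size_t>;

// Breadth-first search from `root` over an adjacency list. Returns the links
// that form a spanning tree of the switches reachable from root; any link
// missing from the result is redundant and would be excluded from flooding.
std::set<Link>
SpanningTreeLinks (const std::vector<std::vector<std::size_t>> &adjacency,
                   std::size_t root)
{
  std::set<Link> tree;
  if (root >= adjacency.size ())
    {
      return tree; // unknown root: nothing to span
    }

  std::vector<bool> visited (adjacency.size (), false);
  std::queue<std::size_t> pending;
  visited[root] = true;
  pending.push (root);

  while (!pending.empty ())
    {
      std::size_t u = pending.front ();
      pending.pop ();
      for (std::size_t v : adjacency[u])
        {
          if (!visited[v])
            {
              visited[v] = true;
              tree.emplace (std::min (u, v), std::max (u, v));
              pending.push (v);
            }
        }
    }
  return tree;
}
```

For a four-switch diamond with links 0-1, 0-2, 1-3, and 2-3, starting from switch 0 keeps three links and leaves out 2-3, which removes the loop.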

Luciano.

Joshua Boley

Jun 23, 2018, 7:04:29 PM
to ns-3-...@googlegroups.com

Hi Luciano,

Thanks for the feedback. I must have skimmed over it by accident; I don’t recall reading that, but I’m sure you’re right. It’s a symptom of having my attention split a dozen different ways at once. Occupational hazard.

I haven’t had much time in the past few days to follow up on it (again with the split attention), but do you have any insight on the problems connecting to external controllers? At first it seemed like loop topologies were especially susceptible, but testing since then shows that the type of topology doesn’t matter. What’s odd is that I’m seeing “phantom” OFPT_PACKET_OUT messages in the OpenFlow trace while ARP requests/replies are still being exchanged, and then it all seems to just hang for the rest of the sim without the switches ever attempting to handshake with the controller.

I know the docs say to wait for a period of 3 to 5 minutes between runs, and that does seem to work more often than not. I’d still like to find a solution, because the wait prevents us from batching together a series of tests on scaling topologies.

Thanks,

Josh


Luciano Jerez Chaves

Jun 23, 2018, 7:17:19 PM
to ns-3-users
Hi,

To be honest, I've never used the external controller in depth. Some people have also reported various issues with it on the module discussion list. The actual switch datapath is provided by an external library, which has been tested by many users, but I can't guarantee that it is fully compliant with the standards. I've already found some small bugs there, and there may be others. So understanding what is actually happening in your scenario will require some deep debugging.

The time to wait between simulations relates only to reusing the port number for new sockets. The TCP state machine says that the socket must wait for 2 * MSL (by default, 4 minutes) before going to the CLOSED state. Maybe using a different controller port number every time the simulation restarts can help with this.
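A minimal sketch of that idea for batched runs, assuming the batch script passes a run index to both the simulation and the external controller so they agree on the port. `ControllerPortForRun`, `kBasePort`, and `kPortSpan` are hypothetical names, and how the resulting port is handed to OFSwitch13 or to the controller is deliberately left out.

```cpp
#include <cstdint>

// Hypothetical helper; not part of ns-3 or OFSwitch13.
constexpr std::uint16_t kBasePort = 6653; // IANA-assigned OpenFlow port
constexpr std::uint16_t kPortSpan = 64;   // assumed number of ports to cycle through

// Pick a controller TCP port for a given run of a batch, so that consecutive
// runs never reuse a port that may still be waiting out 2 * MSL.
std::uint16_t
ControllerPortForRun (std::uint32_t runIndex)
{
  return static_cast<std::uint16_t> (kBasePort + runIndex % kPortSpan);
}
```

The cycle wraps around after `kPortSpan` runs, so the span should be large enough that the oldest port has left the waiting state before it comes up again.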

Luciano.

Joshua Boley

Jun 25, 2018, 12:21:41 PM
to ns-3-...@googlegroups.com

Hi Luciano,

Ok, interesting. Thanks for the info. I’ll take this over to the module discussion list after I’ve done a bit more poking around. I have a couple of workarounds I’d like to try before diving into a deep debug session.
