Biggest Design Mistake


Jacob Uecker

Jul 10, 2015, 4:43:50 PM
to ccdegro...@googlegroups.com
Hi Everyone,

I thought that I’d try to spur some design discussion, so I’d like to ask a question of the group:

What has been your biggest network design mistake? And how did it happen (bad assumptions, bad information, poor choices, etc.)?



-Jacob



Jeremy Filliben

Jul 15, 2015, 11:09:30 AM
to ccdegro...@googlegroups.com
Jacob,

This is a great question. Answering it feels a bit like a confessional, which is therapeutic :) Most CCDE candidates who have attended my training have heard some form of this story, but it should be new to most followers of this group.

The design mistake that I most regret occurred in 2005, but the suffering didn't begin until 2006. Those who were involved in network design at that time will likely recall the push carriers were making to get enterprise customers off of Frame-Relay/ATM networks and onto their shiny new MPLS networks. The enterprise company I worked with at the time had a flat EIGRP network with two FR/ATM providers. We chose to migrate our Sprint FR/ATM network to L3VPN first. At the time Sprint was an all-Cisco shop, and they offered to use EIGRP as the PE-CE routing protocol to simplify the migration. My Network Operations team was not trained in BGP, so I accepted the offer. At the completion of this migration we had a Broadwing FR/ATM network and a Sprint L3VPN network, with a single EIGRP IGP. Life was good, but the choice of EIGRP as a PE-CE routing protocol had sown the seeds of future difficulty.
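For context, EIGRP as a PE-CE protocol is configured per-VRF on the carrier's PE routers. A minimal IOS-style sketch of that side of the arrangement, assuming entirely hypothetical VRF names, prefixes and AS numbers (not Sprint's actual values):

```
router eigrp 1
 address-family ipv4 vrf CUST-A autonomous-system 100
  ! AS 100 must match the customer's EIGRP AS or no adjacency forms
  network 10.1.1.0 0.0.0.255
  ! Remote sites arrive over the carrier's MP-BGP backbone and need
  ! a seed metric when redistributed back into EIGRP
  redistribute bgp 65000 metric 10000 100 255 1 1500
 exit-address-family
```

The attraction is that the CE routers keep running plain EIGRP with no BGP knowledge required; the cost, as the rest of the story shows, surfaces when a second carrier can't match the arrangement.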

The following year we began to migrate the Broadwing FR/ATM network to an MPLS-delivered WAN. Unfortunately, Broadwing was not a Cisco shop and could not offer EIGRP as a PE-CE routing protocol. This threatened to put us in the uncomfortable position of having two WAN routing protocols at each branch. Broadwing offered another solution: L2VPN (VPLS). I chose to accept it without having researched it well enough to understand the potential drawbacks. While VPLS allowed us to use EIGRP on both WANs, it was quite difficult to provide comparable metrics across the L2 and L3 VPNs, and I spent a lot of time working with EIGRP offset-lists to implement an acceptable routing policy. VPLS also had significant drawbacks regarding multicast (it behaves like a LAN switch without IGMP snooping) and QoS (at the time, Broadwing's VPLS PE equipment could only classify based on 802.1p).
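A hedged sketch of the offset-list approach mentioned above, with hypothetical prefixes, interface names and offset values rather than the ones actually used:

```
! Match the prefixes whose path preference needs adjusting
access-list 10 permit 10.20.0.0 0.0.255.255
!
router eigrp 100
 ! Inflate the composite metric of matching routes learned on the
 ! VPLS-facing interface so the path over the other WAN is preferred
 offset-list 10 in 50000 GigabitEthernet0/1
```

Every policy change of this kind means another access-list and another offset to tune per site, which is why it consumed so much time.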

Eventually I revisited my initial decision that BGP was too complicated for my NetOps team. We properly trained the team and migrated Sprint to BGP and Broadwing to L3VPN+BGP. This permitted us to implement the routing policy we wanted. The project convinced me to always choose the correct technology solution for the problem at hand, and then implement a training plan to get my NetOps team ready to support the new technology. This has the side benefit of investing time and money in the team, which makes them more valuable to my organization and to employers in general. It's a win/win.
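For illustration, a minimal CE-side configuration of the kind such a BGP PE-CE migration ends up with; the AS numbers, addresses and route-map name here are hypothetical:

```
router bgp 65010
 ! PE neighbor in the carrier's AS
 neighbor 192.0.2.1 remote-as 65000
 ! Advertise the branch prefix into the L3VPN
 network 10.10.0.0 mask 255.255.0.0
 ! Routing policy becomes straightforward: prefer this carrier by
 ! raising local-preference on the routes it sends us
 neighbor 192.0.2.1 route-map PREFER-CARRIER-A in
!
route-map PREFER-CARRIER-A permit 10
 set local-preference 200
```

Compared with per-interface offset-lists, expressing the policy once in BGP attributes is what made the desired routing policy achievable.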

Jeremy Filliben, CCDE #2009::3

malcybood

Jul 16, 2015, 7:27:13 PM
to ccdegro...@googlegroups.com
A design mistake I have seen on more than one occasion is a firewall being used as the default gateway for data centre subnets instead of a layer 3 switch, "just because it's there" and no other layer 3 functionality is available for whatever reason (procedural or technical).

This causes major administrative headaches for the network team: every time a new subnet (and in some cases a new host!) is added behind the firewall, the security policy needs to be amended, and this can be hugely complex when there are large traffic flows between several different subnets.

A specific example is where application servers located in different subnets, with their default gateways on the firewall, need to join one or more Active Directory domains. Another is where a telephony system needs to integrate with a contact centre, voicemail, an MS Exchange server, etc.

This type of design causes the following issues:
  • The amount of time it takes to prepare change templates for approval by change management is excessive.
  • The engineering team can end up working silly hours just to enable basic connectivity between servers, which demotivates staff whose time could be used in more constructive ways to develop the network!
  • Managing a firewall through a GUI doesn't tell the full story of how many access-list entries (ACEs) a rule generates.
    • One rule in the GUI can result in hundreds of ACEs in the CLI, e.g. permit multiple hosts to multiple hosts over multiple TCP/UDP ports.
    • Firewalls running in multi-context mode for a shared service split resources across the different contexts, so if one customer's rule set is huge it can exhaust the resources available to other customers. I've seen this happen, leaving no space for any more rules on a shared service!
    • This raises the design question of how you size that type of service and what network management tools should be included in the design to monitor utilisation.
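The rule-to-ACE expansion is simple multiplication; a quick sketch with made-up counts shows how one GUI rule balloons:

```python
# One GUI rule expands into one access-list entry (ACE) per
# source/destination/service combination. All counts are hypothetical.
sources = 12       # source hosts selected in the rule
destinations = 8   # destination hosts
services = 5       # TCP/UDP ports
aces = sources * destinations * services
print(aces)        # one GUI rule -> 480 ACEs in the CLI
```

A shared-service firewall with a fixed ACE budget per context can be exhausted by a handful of rules written this way.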
So what is the solution for this type of scenario, where every subnet uses the firewall as its L3 gateway?

Well in true network architecture style I guess the answer to that is "it depends" on the customer and the requirements.

It may be that some servers need to be segregated into different security zones for PCI or other regulatory compliance, but typically I find that servers can be arranged into a number of common groups where it is not an issue if they communicate with each other within their own group, but they should not be able to communicate with servers in another group.

The solution I've typically used is below:
  • Group the app servers into common groups which are allowed to communicate with each other.
  • Create an interface on the firewall per group, with a routed link (physical or logical) to a VRF-aware data centre switch.
  • The VRF-aware data centre switch is configured with a routed interface to the firewall on a /29 or /30, depending on whether inline appliances, HSRP, etc. are required.
  • A VRF is defined on the DC switch for each group, with the routed link placed in the appropriate group's VRF.
  • The gateways that were previously hosted on the firewall are removed and recreated on the DC switch using the same IP address and subnet mask.
  • This ensures the servers do not need to have their IP addresses changed!
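The steps above can be sketched in IOS-style configuration on the DC switch; the VRF name, VLANs and addresses here are all hypothetical:

```
vrf definition APP-GROUP-A
 address-family ipv4
!
! Server gateway moved off the firewall: same IP and mask as before,
! so the servers themselves are untouched
interface Vlan100
 vrf forwarding APP-GROUP-A
 ip address 10.1.100.1 255.255.255.0
!
! Routed link towards the group's interface on the firewall
interface Vlan900
 description Routed link to firewall for APP-GROUP-A
 vrf forwarding APP-GROUP-A
 ip address 192.168.90.2 255.255.255.252
!
! Per-VRF default route pointing at the firewall, so only
! inter-group traffic hits the rule base
ip route vrf APP-GROUP-A 0.0.0.0 0.0.0.0 192.168.90.1
```

Intra-group traffic is now routed locally in the VRF at wire speed, and the firewall only sees (and only needs rules for) traffic crossing group boundaries.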
In my experience the above approach brings the number of firewall interfaces down significantly and reduces and simplifies the rule base.

If the design were for a hosting environment with multiple customers on the same physical infrastructure, and overlapping IP addresses were present on the different customer networks, this could be addressed by implementing a firewall that can be virtualised, such as the Cisco ASA running multiple contexts, or a Fortinet running VDOMs, which are the equivalent of contexts.

Malcolm