Replacing the gorouters with an agent that dynamically updates F5 load balancers


Mike Heath

Nov 13, 2014, 7:40:32 PM
to vcap...@cloudfoundry.org

We currently use F5 load balancers for all of our load balancing needs. Lately we’ve been considering replacing the CF routers with an agent that dynamically updates our F5 load balancers. We’re fairly confident we can build something that replaces all the functionality of the router including emitting loggregator/metron data.


Our reasons for wanting to replace the routers are:

  • Having two layers of load-balancing is redundant.

    • Adds extra hops increasing latency

    • Adds another potential source of failure

  • We have some mission-critical applications whose owners are hesitant to move all of their application instances to Cloud Foundry. Currently they are operating in a hybrid mode where they have some instances off CF and some on. It is difficult to properly balance traffic across non-CF instances and CF instances because the F5 does not know how many instances are deployed on CF; the F5 only knows how many routers we’ve deployed.

  • The current gorouter has a lot of small quirks that are causing us problems such as:

    • Forcing a connection close with each request to the application

    • Breaking when a URL starts with “//”

    • Inserting a ‘Content-type’ header when one is not provided by the application

  • Manipulating the F5 directly provides a path for us to allow users to update their F5 configurations in a more DevOps friendly manner.

    • Allow the app to specify whether or not https should be required or optional

    • Possibly allow TCP traffic instead of requiring HTTP

    • Configure health checks, timeouts, etc.

    • Application specific “out of service” pages.

  • Removing the routers should also improve our ability to leverage our F5 global load balancer with Cloud Foundry applications.
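
To put rough numbers on the hybrid-mode bullet above, here is a minimal sketch. It assumes the F5 spreads requests evenly across its pool members and that each router spreads its share evenly across the CF instances; the instance counts and the function name `per_instance_share` are made up for illustration.

```python
from fractions import Fraction


def per_instance_share(off_cf_instances, routers, cf_instances):
    """Return (share per off-CF instance, share per CF instance) of total traffic.

    Assumes the F5 pool contains every off-CF instance plus every router,
    all with equal weight.
    """
    pool_members = off_cf_instances + routers
    per_member = Fraction(1, pool_members)
    # Each off-CF instance is a pool member in its own right.
    off_cf_share = per_member
    # All CF instances together only receive what the routers receive.
    cf_share = per_member * routers / cf_instances
    return off_cf_share, cf_share


off_cf, on_cf = per_instance_share(off_cf_instances=6, routers=2, cf_instances=10)
print(off_cf, on_cf)  # 1/8 1/40
```

Under these assumptions each off-CF instance handles five times the traffic of each CF instance, even though the F5 believes the pool is evenly balanced.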


Our biggest concern moving forward with this plan is that we may lose out on functionality that may be added to the routers. Is there anything on the router road map that would have a significant functional impact on what we plan to do? We realize we’ll have to support getting route updates from NATS today and from etcd with Diego. This is not a significant problem.
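
As a rough illustration of the route-update side of such an agent, here is a minimal sketch that turns register and unregister messages into per-route pool membership. The subjects `router.register` and `router.unregister` and the `host`, `port`, and `uris` fields follow the gorouter README as best understood here; treat them as assumptions, since that interface is private. `PoolTable`, `stale_after`, and the 120-second default are hypothetical, and the step that would push the result to an F5 is deliberately left out.

```python
import json
import time


class PoolTable:
    """Maps each route URI to the (host, port) backends registered for it."""

    def __init__(self, stale_after=120.0, clock=time.monotonic):
        self.stale_after = stale_after  # hypothetical pruning threshold, in seconds
        self.clock = clock
        self.pools = {}  # uri -> {(host, port): last_seen}

    def apply(self, subject, raw):
        """Apply one route message (assumed JSON with host, port, uris)."""
        msg = json.loads(raw)
        member = (msg["host"], msg["port"])
        for uri in msg.get("uris", []):
            uri = uri.lower()
            if subject == "router.register":
                # Re-registering refreshes the last-seen timestamp.
                self.pools.setdefault(uri, {})[member] = self.clock()
            elif subject == "router.unregister":
                members = self.pools.get(uri, {})
                members.pop(member, None)
                if not members:
                    self.pools.pop(uri, None)

    def prune(self):
        """Drop backends that have not re-registered within stale_after."""
        cutoff = self.clock() - self.stale_after
        for uri in list(self.pools):
            members = self.pools[uri]
            for member, seen in list(members.items()):
                if seen < cutoff:
                    del members[member]
            if not members:
                del self.pools[uri]

    def members(self, uri):
        """Return the current backends for a route, sorted for stable output."""
        return sorted(self.pools.get(uri.lower(), {}))
```

An agent built along these lines would call `apply` for every message it receives, call `prune` periodically, and then reconcile `members(uri)` against the matching load balancer pool.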


Is there another argument for not replacing the CF routers?


Assuming we move forward with this project, is there anyone else in the CF community that would be interested in collaborating with us on this effort?


-Mike


Dieu Cao

Nov 14, 2014, 11:05:17 AM
to vcap...@cloudfoundry.org
Hey Mike,

I'm reviewing this with the Runtime team and James Bayer.

-Dieu
CF Runtime PM

Simon Johansson

Nov 14, 2014, 12:49:07 PM
to vcap...@cloudfoundry.org
Hi Mike, currently the small quirks you describe are not affecting us too much, but we are planning to move to F5 instead of haproxy (proper HA and all that), so this would be interesting if we could make it slightly more general to support:

1. Updating a pool dynamically based on gorouter IPs from within CF rather than from the outside.
2. Your use case: updating the routing table from NATS/etcd to take over the gorouters' role completely.


Would love to help.


--
You received this message because you are subscribed to the Google Groups "Cloud Foundry Developers" group.
To view this discussion on the web visit https://groups.google.com/a/cloudfoundry.org/d/msgid/vcap-dev/0f745b18-9f90-4162-9695-1873c4f81dd6%40cloudfoundry.org.

To unsubscribe from this group and stop receiving emails from it, send an email to vcap-dev+u...@cloudfoundry.org.

James Bayer

Nov 14, 2014, 9:49:52 PM
to vcap...@cloudfoundry.org
mike,

here is a list of potential enhancements we've been compiling over time that people have asked for in the router [1].

diego makes use of a shared config service (etcd currently) and onsi has been talking about how the diego team may migrate to etcd (or whatever is playing that role) completely instead of relying on NATS for the last mile to the router.

the CF Runtime team definitely considers the NATS messages to be a private, inside-the-black-box interface that is currently subject to change without notice.

i'd prefer if we could find ways to address the issues you have enumerated before considering whether routing announcements could become a publicly supported API. the implication of routing announcements being a public api is that they are much more difficult to change as time goes on. i realize we can't eliminate the extra hop or the chance of failure, but hopefully we can mitigate them by showing a low latency per request and low risk of failure when routers are scaled across AZs.







--
Thank you,

James Bayer

Harry Zhang

Nov 15, 2014, 8:43:48 AM
to vcap...@cloudfoundry.org
We have many customers in China using F5 as their LB, and it works as well as HAProxy.
But if we replace Gorouter with F5, we may have to face new problems. For example, we would have to configure the customers' F5 hardware to make it work with CF.


On the other hand, all the quirks you mentioned also bother us a lot, and Gorouter is definitely the last component we want to customize. Features like TCP routes and multiple routable ports are the most common cases (1 out of 5 customers have TCP apps and almost every customer has Java RMI or jconsole requirements).

The pity is that those features are long-term goals for Gorouter, which means our own "portservice" has to handle them for some time yet.

So I must keep an eye on this post. :)

Mike Youngstrom

Nov 17, 2014, 2:36:14 PM
to vcap...@cloudfoundry.org
Thanks for compiling the document, James. Many of those features become trivial and more feature-rich when using a true load balancer like the F5 LTM.

The only item in that doc that I think would possibly be problematic in an F5 LTM is the "CF Route access authorization policy", though I think our organization would prefer not to use such a feature in favor of our enterprise SSO/OAuth server. The biggest concern here is if CF components like the Notification Service decide to take advantage of such a feature instead of doing OAuth themselves. There are other issues with a feature like this, but I'll put those comments in the doc since they are not specific to this discussion.

Another issue we are facing that is listed in the document is context-based routing. The complexity of our hack solution [1] is beginning to wear on our customers. Controlling the router would allow us to provide a simpler solution, but it would still be a hack compared to an official CF solution. Anyway, just another vote for that feature to hopefully get prioritized sooner rather than later.

We acknowledge that the Router messages and future etcd protocols are private and subject to change. I think we are willing to shoulder that concern for now. In the long term, once this area has stabilized, we'd hope that this space could mature to the level of a more public API, perhaps opening the door to official 3rd party load balancer implementations of the CF Router.

No matter how feature rich and stable the CF Router becomes in the future it will never shake the problems of increased request latency and being another component to complicate that critical path between the user and the app.  Fostering support for tighter integration with 3rd party load balancers that I'm sure many enterprises have already purchased and rely upon will always have those benefits over requiring a gateway of abstractions for requests to pass through.

Thanks,
Mike



Mike Youngstrom

Nov 17, 2014, 5:05:13 PM
to vcap...@cloudfoundry.org
I thought I'd quickly elaborate on one of the bullets Mike H. listed:

* Removing the routers should also improve our ability to leverage our F5 global load balancer with Cloud Foundry applications.

I don't know if you are familiar with the F5 GTM (Global Traffic Manager) features. It is essentially a DNS server that aids routing decisions across datacenters. We now have CF deployments in 2 datacenters and will be adding a 3rd in the near future, and our org already uses the F5 GTM for datacenter routing decisions. Essentially, this appliance works by making health decisions for an app at the datacenter level before resolving a host. I'm not very experienced with the internals of the GTM, but one of the most common ways for the GTM to make these app health decisions is by simply examining the F5 LTM (Local Traffic Manager) pool for the app in each datacenter. In the case of CF, that pool is a bunch of routers, which tells the GTM nothing about the health of the actual app. If we have an LTM pool representing the actual instances of a CF app, integration with a product like the GTM becomes much simpler.
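
The difference can be shown with a deliberately simplified model. This is not how the GTM is actually implemented; it only assumes that a datacenter is considered eligible when at least `min_up` members of the app's LTM pool are healthy. The function names, pool contents, and datacenter names are all hypothetical.

```python
def pool_is_up(members, min_up=1):
    """A pool counts as available when at least min_up members are healthy."""
    return sum(1 for healthy in members.values() if healthy) >= min_up


def eligible_datacenters(pools, min_up=1):
    """Return the datacenters a DNS answer could point at, in input order."""
    return [dc for dc, members in pools.items() if pool_is_up(members, min_up)]


# Scenario: the app has crashed in dc1, but the routers there are healthy.
router_pools = {
    "dc1": {"router-0": True, "router-1": True},
    "dc2": {"router-0": True, "router-1": True},
}
instance_pools = {
    "dc1": {"app-0": False, "app-1": False},
    "dc2": {"app-0": True, "app-1": True},
}

print(eligible_datacenters(router_pools))    # ['dc1', 'dc2']
print(eligible_datacenters(instance_pools))  # ['dc2']
```

With a pool of routers, dc1 still looks healthy and could receive traffic for an app that is down there; with a pool of actual app instances, only dc2 is eligible.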

There are several other features the F5 provides that we are currently not taking advantage of, or for which we have built complex workarounds, that we believe would be simpler with a better F5 CF Router integration.

Anyway, just thought I'd elaborate on one area where we believe that, regardless of the stability of the CF Router, there could be benefits in supporting third-party routers in an enterprise deployment.

Mike