Way to round-robin DNS results when making HTTP requests?


Nathan Smith

Jul 11, 2015, 7:21:07 PM
to golan...@googlegroups.com
Hi all, 

Problem: A third-party API that we consume at a high volume uses a DNS-based load-balancing strategy. Their devs have reported that our app tends to favor certain IPs, creating a load imbalance on their end. They have asked me to modify our app to resolve and balance across their available IP addresses.*

I've taken a look at accomplishing this in code and run into the following roadblocks: 

1. The net/http package does not appear to expose any of the DNS lookup* internals when making requests 
2. I can't do the lookups separately and make requests to raw IPs because the API requires the Host header to be set to the host cname (and Go overwrites the Host header internally)
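On the second roadblock, it may help to know that net/http ignores a "Host" entry in req.Header for client requests, but the Request.Host field does override the Host line that is written on the wire. The sketch below, in which the test server and the name api.example.com are illustrative assumptions, sends a request to a raw loopback address and has the server echo the Host it received:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// hostSeenByServer sends a GET to targetURL (which may point at a raw IP)
// with the Host line forced to hostName, and returns the response body.
func hostSeenByServer(targetURL, hostName string) (string, error) {
	req, err := http.NewRequest("GET", targetURL, nil)
	if err != nil {
		return "", err
	}
	// A "Host" key in req.Header is ignored for client requests;
	// the Host field is what net/http writes into the request.
	req.Host = hostName

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// demo starts a local test server that echoes the Host it saw, then
// calls it by IP address while claiming to be api.example.com
// (a hypothetical name used only for illustration).
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Host)
	}))
	defer srv.Close()
	return hostSeenByServer(srv.URL, "api.example.com")
}

func main() {
	seen, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("server saw Host:", seen)
}
```

This only covers plain HTTP. For HTTPS against a raw IP, certificate verification and SNI would still be based on the URL host unless the transport's tls.Config.ServerName is set, so a custom dial function is usually the less fragile route.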

So, I have two questions: 

1. Does anyone have a suggested (programmatic) solution, such as a way to randomize the ordering of the resolved IPs?
2. What do the Go maintainers think about exposing DNS resolution internals similar to how Transport details are exposed?

Thanks!
nate


* Yeah, I think this is all pretty stupid, but things are probably not going to change on their end.

Jakob Borg

Jul 11, 2015, 10:31:15 PM
to Nathan Smith, golan...@googlegroups.com

> On 12 jul 2015, at 04:56, Nathan Smith <nat...@neocortical.net> wrote:
>
> 2. I can't do the lookups separately and make requests to raw IPs because the API requires the Host header to be set to the host cname (and Go overwrites the Host header internally)

You may be able to do this by overriding the dialer on the http.Transport and doing your load balancing there, then performing requests against the CNAME as usual so that the Host header is correct.
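A minimal sketch of that idea, written against the current Transport.DialContext field rather than the Transport.Dial field available in 2015. The names randomIPDialContext and lookupFunc, the host api.example.test, and the fake resolver are assumptions made for illustration; in real use the lookup would be net.DefaultResolver.LookupHost.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"math/rand"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// lookupFunc has the same signature as (*net.Resolver).LookupHost.
type lookupFunc func(ctx context.Context, host string) ([]string, error)

// randomIPDialContext returns a dial function that resolves the host
// itself, picks one of the returned addresses at random, and dials it.
// The request URL keeps the hostname, so the Host header is unchanged.
func randomIPDialContext(lookup lookupFunc, d *net.Dialer) func(context.Context, string, string) (net.Conn, error) {
	return func(ctx context.Context, network, addr string) (net.Conn, error) {
		host, port, err := net.SplitHostPort(addr)
		if err != nil {
			return nil, err
		}
		ips, err := lookup(ctx, host)
		if err != nil {
			return nil, err
		}
		if len(ips) == 0 {
			return nil, fmt.Errorf("no addresses found for %s", host)
		}
		// math/rand is seeded automatically since Go 1.20.
		ip := ips[rand.Intn(len(ips))]
		return d.DialContext(ctx, network, net.JoinHostPort(ip, port))
	}
}

// demo points a made-up hostname at a local test server through a fake
// resolver and returns the hostname the server saw in the Host header.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Host)
	}))
	defer srv.Close()

	ip, port, err := net.SplitHostPort(srv.Listener.Addr().String())
	if err != nil {
		return "", err
	}
	fakeLookup := func(ctx context.Context, host string) ([]string, error) {
		return []string{ip}, nil // stand-in for net.DefaultResolver.LookupHost
	}
	client := &http.Client{Transport: &http.Transport{
		DialContext: randomIPDialContext(fakeLookup, &net.Dialer{Timeout: 5 * time.Second}),
	}}

	resp, err := client.Get("http://api.example.test:" + port + "/")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	host, _, err := net.SplitHostPort(string(body))
	return host, err
}

func main() {
	host, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("server saw Host:", host)
}
```

The random choice only happens when the transport opens a new connection. Requests that reuse a persistent connection never reach the dial function, so the spread across IPs also depends on the keep-alive settings.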

//jb

James Bardin

Jul 11, 2015, 10:38:09 PM
to golan...@googlegroups.com

You do have full control of DNS resolution when making requests. The Transport.Dial function is opaque, and you can use it to handle lookup however you want.

This also may be an issue with the local resolver, or the reusing of persistent connections. It would probably help if you logged what was going on to better troubleshoot the issue.

Nathan Smith

Jul 12, 2015, 1:17:40 AM
to golan...@googlegroups.com
Thanks for the replies! I've dug around in the net package and found the source of the issue I'm having. TL;DR: go 1.4 only tries a single IP address and 1.5 seems to favor the first in the returned list.

The root of the problem seems to be in net/ipsock.go in resolveInternetAddr(). I added logging to verify that multiple IPv4 addresses were being resolved (they were), but the last line of the func is `return firstFavoriteAddr(filter, ips, inetaddr)`.

firstFavoriteAddr() seems constructed to return either a single address or a pair consisting of one IPv4 address and one IPv6 address. Thus, as long as the local resolver returns addresses in the same order, the same address (or pair of addresses, if a mix of IPv4 and IPv6 is encountered) is returned.

I patched together a new version of firstFavoriteAddr() to verify that it solves my problem: 

@@ -69,8 +71,8 @@ func firstFavoriteAddr(filter func(IP) IP, ips []IP, inetaddr func(IP) netaddr)
                return firstSupportedAddr(filter, ips, inetaddr)
        }
        var (
-               ipv4, ipv6, swap bool
-               list             addrList
+               ip4Addrs, ip6Addrs addrList
+               result             addrList
        )
        for _, ip := range ips {
                // We'll take any IP address, but since the dialing
@@ -79,30 +81,27 @@ func firstFavoriteAddr(filter func(IP) IP, ips []IP, inetaddr func(IP) netaddr)
                // possible. This is especially relevant if localhost
                // resolves to [ipv6-localhost, ipv4-localhost]. Too
                // much code assumes localhost == ipv4-localhost.
-               if ip4 := ipv4only(ip); ip4 != nil && !ipv4 {
-                       list = append(list, inetaddr(ip4))
-                       ipv4 = true
-                       if ipv6 {
-                               swap = true
-                       }
-               } else if ip6 := ipv6only(ip); ip6 != nil && !ipv6 {
-                       list = append(list, inetaddr(ip6))
-                       ipv6 = true
-               }
-               if ipv4 && ipv6 {
-                       if swap {
-                               list[0], list[1] = list[1], list[0]
-                       }
-                       break
+               if ip4 := ipv4only(ip); ip4 != nil {
+                       ip4Addrs = append(ip4Addrs, inetaddr(ip4))
+               } else if ip6 := ipv6only(ip); ip6 != nil {
+                       ip6Addrs = append(ip6Addrs, inetaddr(ip6))
                }
        }
-       switch len(list) {
+
+       if len(ip4Addrs) > 0 {
+               result = append(result, ip4Addrs[rand.Intn(len(ip4Addrs))])
+       }
+       if len(ip6Addrs) > 0 {
+               result = append(result, ip6Addrs[rand.Intn(len(ip6Addrs))])
+       }
+
+       switch len(result) {
        case 0:
                return nil, errNoSuitableAddress
        case 1:
-               return list[0], nil
+               return result[0], nil
        default:
-               return list, nil
+               return result, nil
        }
 }



I'm not suggesting this as a code change, obviously. I just wanted to see if I had it right. My guess is that any code that tried to equalize the chosen IP A) would have to play nice with KeepAlive settings and B) shouldn't use rand as a dep.

This code seems to have changed quite a bit in 1.5beta. firstFavoriteAddr() has been replaced with filterAddrList(), which does return all IPv4 addresses, but the first in the list continues to be favored. I haven't dug into why yet.

I will look more into implementing my own Dialer to achieve my goal. However, I can see that I will be circumventing quite a bit of well-vetted code, which I'm not enthusiastic about. 

I guess my remaining question to anyone who has read this far down is: is the current behavior the best behavior? Should the net package do more to either A) spread requests across DNS lookup results and/or B) be more robust about trying multiple IPs in the event of a connection failure?
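For context on (B), the current net.Dial documentation says that when a host resolves to multiple addresses, Dial tries each in order until one succeeds. The sketch below shows, outside the standard library, what combining (A) and (B) could look like: shuffle the resolved addresses, then fall through to the next one on failure. The names dialShuffled and dialFunc are hypothetical, and the demo uses a simulated dialer with documentation-range addresses rather than real network connections.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"net"
)

// dialFunc has the same signature as (*net.Dialer).DialContext.
type dialFunc func(ctx context.Context, network, addr string) (net.Conn, error)

// dialShuffled tries the given IPs in random order and returns the first
// connection that succeeds. If every attempt fails, it returns the error
// from the last attempt.
func dialShuffled(ctx context.Context, dial dialFunc, network string, ips []string, port string) (net.Conn, error) {
	if len(ips) == 0 {
		return nil, errors.New("no addresses to dial")
	}
	var lastErr error
	for _, i := range rand.Perm(len(ips)) {
		conn, err := dial(ctx, network, net.JoinHostPort(ips[i], port))
		if err == nil {
			return conn, nil
		}
		lastErr = err
		if ctx.Err() != nil {
			break // the caller gave up; stop trying further addresses
		}
	}
	return nil, lastErr
}

// demo simulates one dead and one healthy address and returns the
// address that ended up connected.
func demo() (string, error) {
	var connected string
	fakeDial := func(ctx context.Context, network, addr string) (net.Conn, error) {
		if addr == "192.0.2.1:443" {
			return nil, errors.New("connection refused (simulated)")
		}
		client, server := net.Pipe() // in-memory stand-in for a TCP connection
		server.Close()
		connected = addr
		return client, nil
	}
	conn, err := dialShuffled(context.Background(), fakeDial, "tcp",
		[]string{"192.0.2.1", "192.0.2.2"}, "443")
	if err != nil {
		return "", err
	}
	conn.Close()
	return connected, nil
}

func main() {
	addr, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("connected to", addr)
}
```

In real use, dial would be the DialContext method of a net.Dialer with a per-attempt timeout, since a single unresponsive address could otherwise consume the whole deadline.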

Cheers, 
nate

Will Donnelly

Jul 13, 2015, 3:44:16 PM
to golan...@googlegroups.com
On Saturday, July 11, 2015 at 10:17:40 PM UTC-7, Nathan Smith wrote:
> go 1.4 only tries a single IP address and 1.5 seems to favor the first in the returned list.

I'm not sure if there's a standard as such, but https://en.wikipedia.org/wiki/Round-robin_DNS seems to suggest that this is typical behavior and the DNS server is expected to permute the addresses on their end if they don't want everybody connecting to whatever they return first.

Nathan Smith

Jul 14, 2015, 1:26:57 AM
to golan...@googlegroups.com
On Monday, July 13, 2015 at 12:44:16 PM UTC-7, Will Donnelly wrote:
> I'm not sure if there's a standard as such, but https://en.wikipedia.org/wiki/Round-robin_DNS seems to suggest that this is typical behavior and the DNS server is expected to permute the addresses on their end if they don't want everybody connecting to whatever they return first.

OK, cool, that makes sense. The DNS load-balancing product in question (Akamai GTM) does this pretty poorly. Oh well.

n

James Bardin

Jul 14, 2015, 10:53:39 AM
to Nathan Smith, golan...@googlegroups.com

On Tue, Jul 14, 2015 at 1:26 AM, Nathan Smith <nat...@neocortical.net> wrote:
> OK, cool, that makes sense. The DNS load-balancing product in question (Akamai GTM) does this pretty poorly. Oh well.

Are you sure that you're actually re-dialing that often? If most of the requests are going over a few persistent connections, then DNS doesn't come into play at all.
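One way to check this from inside the app, sketched here with the net/http/httptrace package (added to the standard library after this thread, in Go 1.7). The helper name getTraced and the local test server are assumptions for illustration; the idea is simply to log, per request, whether the connection was reused and which remote address it went to.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
)

// getTraced performs a GET and reports whether the request went over a
// reused (persistent) connection, plus the remote address of that connection.
func getTraced(client *http.Client, url string) (bool, string, error) {
	var reused bool
	var remote string
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) {
			reused = info.Reused
			remote = info.Conn.RemoteAddr().String()
		},
	}
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return false, "", err
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := client.Do(req)
	if err != nil {
		return false, "", err
	}
	defer resp.Body.Close()
	// Drain the body so the connection can go back to the idle pool.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return false, "", err
	}
	return reused, remote, nil
}

// demo makes two requests to a local test server with the same client
// and returns the "reused" flag observed for each.
func demo() (bool, bool, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	client := srv.Client()
	first, _, err := getTraced(client, srv.URL)
	if err != nil {
		return false, false, err
	}
	second, _, err := getTraced(client, srv.URL)
	return first, second, err
}

func main() {
	first, second, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("first request reused a connection:", first)
	fmt.Println("second request reused a connection:", second)
}
```

If most requests report a reused connection, the distribution across IPs is decided when connections are opened, not per request, and changing the resolution order alone will have little effect.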

Akamai actually does a very good job of this; but if DNS is a problem, then these API servers' entries might not be configured properly.

For example, if the TTL is very high, you're going to get the same cached result for that entire time, sending all requests to the same host. They could also be using location-specific mapping (geographic or CIDR) to route requests to nearer servers, inadvertently causing high-volume users to focus on a single host.

This is all beside the fact that round-robin DNS shouldn't be the sole method of load balancing for a high-volume system in the first place.

Nate Smith

Jul 14, 2015, 7:35:52 PM
to golan...@googlegroups.com
I added log statements at various points in the Go source to verify that lookups were happening, although it's possible I might have missed something and fooled myself. 

Running nslookup in a loop does show that Akamai is doing a reasonably good job of mixing things up (albeit with some static stretches of 10 sec or so). That makes me think that their TTL is set pretty low, but that there is still some caching somewhere in the chain.
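The same experiment can be run from inside a Go process, which also captures whatever caching sits between the app and the authoritative server. This is a rough sketch: firstAnswerCounts and rotatingFake are hypothetical helpers, and without a command-line argument the program uses a fake resolver so it runs offline. Passing a hostname switches it to net.LookupHost.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"sort"
	"time"
)

// firstAnswerCounts calls lookup n times, pausing between calls, and
// counts how often each address appears first in the result list.
func firstAnswerCounts(lookup func(string) ([]string, error), host string, n int, pause time.Duration) (map[string]int, error) {
	counts := make(map[string]int)
	for i := 0; i < n; i++ {
		addrs, err := lookup(host)
		if err != nil {
			return counts, err
		}
		if len(addrs) > 0 {
			counts[addrs[0]]++
		}
		if i < n-1 {
			time.Sleep(pause)
		}
	}
	return counts, nil
}

// rotatingFake returns a stand-in resolver that rotates its answer list
// by one position on every call, like an idealized round-robin server.
func rotatingFake(addrs []string) func(string) ([]string, error) {
	calls := 0
	return func(string) ([]string, error) {
		k := calls % len(addrs)
		calls++
		return append(append([]string{}, addrs[k:]...), addrs[:k]...), nil
	}
}

// demo samples the fake resolver four times.
func demo() (map[string]int, error) {
	fake := rotatingFake([]string{"192.0.2.1", "192.0.2.2"})
	return firstAnswerCounts(fake, "api.example.test", 4, 0)
}

func main() {
	host := "api.example.test"
	lookup := rotatingFake([]string{"192.0.2.1", "192.0.2.2"})
	pause := time.Duration(0)
	if len(os.Args) > 1 {
		// Real lookups: net.LookupHost has the signature expected here.
		host, lookup, pause = os.Args[1], net.LookupHost, time.Second
	}
	counts, err := firstAnswerCounts(lookup, host, 10, pause)
	if err != nil {
		fmt.Println("lookup error:", err)
		return
	}
	addrs := make([]string, 0, len(counts))
	for a := range counts {
		addrs = append(addrs, a)
	}
	sort.Strings(addrs)
	for _, a := range addrs {
		fmt.Printf("%s was first %d times\n", a, counts[a])
	}
}
```

A heavily skewed count here would point at resolver ordering or caching, while an even count alongside skewed traffic would point back at connection reuse.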

And yeah, DNS-based load balancing is a poor choice in this case. I gently pointed this out, but it's not going to change.

Jason Playne

Jul 15, 2015, 4:18:49 AM
to Nate Smith, golan...@googlegroups.com
TBH I would have pushed back to the provider of the 3rd party service you are consuming with a link to LVS (https://en.wikipedia.org/wiki/Linux_Virtual_Server). This will make their lives a lot easier!


Jean-François Bustarret

Jul 15, 2015, 10:57:15 AM
to golan...@googlegroups.com, nat...@neocortical.net
On Wednesday, July 15, 2015 at 10:18:49 UTC+2, Jason Playne wrote:
> TBH I would have pushed back to the provider of the 3rd party service you are consuming with a link to LVS (https://en.wikipedia.org/wiki/Linux_Virtual_Server). This will make their lives a lot easier!

I'm sure Akamai would laugh when receiving a link to LVS...

This does not really work for a distributed multi-datacenter platform (with DNS entries pointing to different locations).

JFB

James Bardin

Jul 15, 2015, 11:01:59 AM
to Jean-François Bustarret, golan...@googlegroups.com, Nate Smith
On Wed, Jul 15, 2015 at 4:44 AM, Jean-François Bustarret <jfb...@gmail.com> wrote:
> I'm sure Akamai would laugh when receiving a link to LVS...


I think he meant for the API provider, not Akamai.

Even still, I wouldn't presume that LVS would be a good solution (or even applicable) for a system I know nothing about. 

 

Nate Smith

Jul 15, 2015, 12:56:28 PM
to golan...@googlegroups.com
The third party in question isn't Akamai; they are just using Akamai's DNS-based load balancing product. I've definitely told them (diplomatically but firmly) that their API's clients shouldn't need to compensate for their bad design decisions. Believe me, this is one item in a long list of issues with these folks.

Jason Playne

Jul 15, 2015, 9:21:06 PM
to Nate Smith, golan...@googlegroups.com
I am pretty sure we all feel your pain. Been there and done that!
