gRPC client side RoundRobin loadbalancing w/ Consul DNS

Ram Kumar Rengaswamy

Jan 16, 2019, 8:01:33 PM
to grp...@googlegroups.com
Hello ... We are looking to set up client-side load balancing in gRPC (C++).
Our current plan is roughly the following:
1. Use Consul for service discovery, health checks, etc.
2. Expose the IP addresses behind a service to the gRPC client via the Consul DNS interface.
3. Configure the client to use simple round_robin load balancing. (All our servers have the same capacity, so we don't need anything more sophisticated.)
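
As a rough illustration of step 3, the fragment below shows one way a C++ client channel might be pointed at a Consul DNS name with the round_robin policy. It is only a sketch: the service name `backend.service.consul` and port `50051` are placeholders, and it assumes the host's resolver already forwards `.consul` queries to Consul's DNS interface. Consul A records carry no port, so the port is given explicitly in the target.

```cpp
#include <memory>

#include <grpcpp/grpcpp.h>

// Sketch only: the target name and port below are hypothetical placeholders.
std::shared_ptr<grpc::Channel> MakeRoundRobinChannel() {
  grpc::ChannelArguments args;
  // Ask the channel to spread calls across every address the resolver returns.
  args.SetLoadBalancingPolicyName("round_robin");
  // "dns:///" selects gRPC's DNS resolver for the name that follows.
  return grpc::CreateCustomChannel("dns:///backend.service.consul:50051",
                                   grpc::InsecureChannelCredentials(), args);
}
```

Insecure credentials are used here only to keep the fragment short.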

Before we embark on this path, it would be great if someone with gRPC production experience could answer a few questions.
Q1: We plan to use a low DNS TTL (say 30s) to force the clients to have the most up-to-date service discovery information. Do gRPC clients honor DNS TTL?
Q2: Is it possible for gRPC to resolve DNS via TCP instead of UDP? We could have a couple of hundred backends for a service.
Q3: Does gRPC do its own health checks and mark unhealthy connections?

Also, from experience, do folks think that this is a really bad idea and that we should instead use the grpclb policy and implement a look-aside load balancer?

Thanks,
-Ram

Carl Mastrangelo

Jan 17, 2019, 9:23:37 PM
to grpc.io
I know you asked for C++, but at least for Java we do not honor TTL (because the JVM doesn't surface it to us). If you implement your own NameResolver (not as hard as it sounds!) you can honor these TTLs.

I believe C++ uses the c-ares resolver, which IIRC can fall back to TCP lookups if the response is too large. Alas, I cannot answer in any more detail.

gRPC has the option to do health checks, but what I think you actually want are keep-alives. These are configurable on the channel and the server. If you can add more detail about the problem you are trying to avoid, I can give a better answer.

As for whether DNS is a really bad idea: not really. It has issues, but none of them are particularly damning. For example, when you add a new server, most clients won't find out about it until they poll again. gRPC is designed around a push-based name resolution model, with clients being told what servers they can talk to. DNS is adapted onto this model by periodically spawning a thread and notifying the client via the push interface.

The DNS support is pretty good in gRPC, to the point that implementing a custom DNS resolver is likely to cause more issues (what happens if the A lookups succeed but the AAAA lookups fail? What happens if there are lots of addresses for a single endpoint? And so on).

One last thing to consider: the load balancer in gRPC is independent of the name resolver. You could continue to use DNS (and do SRV lookups and such) and pass that info into your own custom client-side LB. This is what gRPCLB does, but you could customize your own copy to not depend on a gRPCLB server. There are lots of options here.

apo...@google.com

Jan 17, 2019, 11:00:33 PM
to grpc.io
I can add to a couple of questions.
Re: > "Do gRPC clients honor DNS TTL ?"

gRPC clients don't look at TTLs at all. In C++, a gRPC client channel will ask its DNS resolver to re-resolve when it determines that it has reached the "transient failure" state. The details of exactly when it reaches that state depend on the load balancing policy in use. With "round robin", it would be roughly when all individual connections in the list reach "transient failure", i.e. when the connections all break. Effectively, if backends are moving around and things break, the default client will re-resolve. But if you want the DNS resolution to stay up to date for other reasons, there's no polling built into the default DNS resolver.
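
The rule described above can be sketched as a small toy model. This is not gRPC's internal code: `ConnState`, `Aggregate`, and `ShouldReResolve` are hypothetical names, and the aggregation is only the rough approximation given in the text (re-resolve once every connection is in transient failure).

```cpp
#include <algorithm>
#include <vector>

// Hypothetical model of per-connection states; not gRPC's own types.
enum class ConnState { kIdle, kConnecting, kReady, kTransientFailure };

// Rough aggregate for a round_robin-style policy: READY if any connection is
// ready, TRANSIENT_FAILURE only when every connection has failed, otherwise
// CONNECTING. An empty list counts as failed (an assumption of this sketch).
ConnState Aggregate(const std::vector<ConnState>& conns) {
  const auto is = [](ConnState want) {
    return [want](ConnState s) { return s == want; };
  };
  if (std::any_of(conns.begin(), conns.end(), is(ConnState::kReady))) {
    return ConnState::kReady;
  }
  if (std::all_of(conns.begin(), conns.end(),
                  is(ConnState::kTransientFailure))) {
    return ConnState::kTransientFailure;
  }
  return ConnState::kConnecting;
}

// The channel asks the resolver for fresh addresses only in this case.
bool ShouldReResolve(const std::vector<ConnState>& conns) {
  return Aggregate(conns) == ConnState::kTransientFailure;
}
```

Under this model, a list with even one READY connection never triggers re-resolution, which is why backends added while the existing ones stay healthy go unnoticed.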

This could be done with a custom resolver, but in C++ the resolver API isn't currently public. I understand that making that API public is something in progress though.

Re: > "Q2: Is it possible for gRPC to resolve DNS via TCP instead of UDP ?"

If the DNS server sends back a large response (c-ares treats anything greater than 512 bytes as large), or if the response has the "truncated" bit set, then c-ares will re-send its query over TCP. I can confirm this for "ares", and I believe it is also the case for "native". (In C++ there are two DNS resolvers: "ares" (c-ares) and "native" (getaddrinfo). "native" is the default right now, but "ares" should become the default in the upcoming 1.19 release.)
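
To get a feel for when the 512-byte threshold is crossed, here is a back-of-the-envelope estimate of a DNS response's size. It is a sketch under stated assumptions: the answer contains only A records whose names use 2-byte compression pointers, there are no authority or additional records, and EDNS0 is not in use. The function names and the service name in the note below are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Sizes from the DNS wire format (RFC 1035).
constexpr std::size_t kHeaderBytes = 12;
constexpr std::size_t kUdpLimitBytes = 512;  // classic UDP limit without EDNS0
// One A record with a compressed name:
// name pointer(2) + type(2) + class(2) + ttl(4) + rdlength(2) + IPv4(4).
constexpr std::size_t kARecordBytes = 16;

// Question section: the encoded name (one length byte per label plus a final
// zero byte, i.e. name.size() + 2 for a name without a trailing dot),
// followed by QTYPE(2) and QCLASS(2).
std::size_t QuestionBytes(const std::string& name) {
  return name.size() + 2 + 4;
}

// Estimated size of a response carrying `num_backends` A records.
std::size_t ResponseBytes(const std::string& name, std::size_t num_backends) {
  return kHeaderBytes + QuestionBytes(name) + num_backends * kARecordBytes;
}

// True when the estimate exceeds the UDP limit, so a TCP retry is expected.
bool NeedsTcp(const std::string& name, std::size_t num_backends) {
  return ResponseBytes(name, num_backends) > kUdpLimitBytes;
}
```

For a placeholder name such as `my-service.service.consul`, this estimate allows about 29 A records in 512 bytes, so a service with a couple of hundred backends (roughly 3.2 KB by the same arithmetic) would need the TCP path.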

Ram Kumar Rengaswamy

Jan 18, 2019, 2:30:19 AM
to apo...@google.com, grpc.io
Hmm ... It's unfortunate that there is no way to force the C++ client to periodically refresh its list of IP addresses. That's a showstopper for us, as our backends scale up elastically and there is no way for the gRPC client to become aware of the new ones.

Q1: If we implement our own lookaside LB, could we configure the client to consult this LB for a fresh set of IP addresses periodically?
Q2: Can the lookaside LB be within the client process?


--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+u...@googlegroups.com.
To post to this group, send email to grp...@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/f8a69f99-5c08-4f33-8299-3f4922aed084%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mark D. Roth

Jan 18, 2019, 2:45:33 PM
to Ram Kumar Rengaswamy, Alexander Polcyn, grpc.io
The issue of forcing DNS to periodically re-resolve is being discussed in https://github.com/grpc/grpc/issues/12295, but there's no consensus yet that we actually want to implement that. That said, the issue does discuss a workaround that people have used successfully, which is to use the MaxConnectionAge feature on the server side to periodically force clients to reconnect. Whenever that happens, the client will re-resolve.

I think that in your situation, I would use the above workaround with the existing DNS resolver and the existing round_robin LB policy.  I don't think you need to write any custom plugins.
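
A minimal sketch of that workaround on a C++ server might look like the fragment below. The listening address, the 5-minute age, and the 30-second grace period are illustrative values only, and `BuildServer` is a hypothetical helper; the two channel arguments are the ones gRPC core defines for maximum connection age.

```cpp
#include <memory>

#include <grpcpp/grpcpp.h>

// Sketch only: address and durations are placeholder values.
std::unique_ptr<grpc::Server> BuildServer(grpc::Service* service) {
  grpc::ServerBuilder builder;
  builder.AddListeningPort("0.0.0.0:50051", grpc::InsecureServerCredentials());
  // Gracefully close each connection after roughly 5 minutes, so clients
  // reconnect and re-resolve DNS along the way.
  builder.AddChannelArgument(GRPC_ARG_MAX_CONNECTION_AGE_MS, 5 * 60 * 1000);
  // Give in-flight RPCs up to 30 seconds to finish before a forced close.
  builder.AddChannelArgument(GRPC_ARG_MAX_CONNECTION_AGE_GRACE_MS, 30 * 1000);
  builder.RegisterService(service);
  return builder.BuildAndStart();
}
```

The age effectively bounds how stale a client's address list can get, at the cost of periodically tearing down otherwise healthy connections.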




--
Mark D. Roth <ro...@google.com>
Software Engineer
Google, Inc.

Ram Kumar Rengaswamy

Jan 18, 2019, 3:15:54 PM
to Mark D. Roth, Alexander Polcyn, grpc.io
Thanks for the pointer to the discussion.
The MaxConnectionAge feature won't work for us, as we want to preserve very long-lived connections.
In the future, if the custom resolver API in C++ becomes public, that would be the way to go.
For now, we are going to implement a lookaside load balancer that proxies the DNS lookup, and switch to the grpclb scheme in the C++ client.