gRPC Python DNS Resolution

as...@brilliant.tech

Jan 21, 2019, 12:27:49 AM

to grpc.io

I'm trying to set up a simple Python gRPC client and server, where the client uses round-robin load balancing against a single DNS record that lists multiple server instances.

At first I can connect and issue queries fine, but when I re-deploy my servers I see behavior that I expected the client library to handle automatically. In my re-deploy, I first bring up new servers, point the DNS record at the IPs of the new servers, and then destroy the old servers. Everything works until I destroy the old servers, at which point I get a couple of UNAVAILABLE errors followed by DEADLINE_EXCEEDED errors until I kill the client. From what I understand, when the sub-channels go down (i.e. the server instances are killed), the channel should re-resolve the DNS record and attempt to reconnect to the new instances. Am I interpreting this incorrectly? Is there some channel and/or server option I need to set for this to work?

Sample client below:

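import time

import grpc
import test_pb2_grpc  # generated stubs for the Test service

channel = grpc.insecure_channel("localhost:10000", options=(("grpc.lb_policy_name", "round_robin"),))
# Block until the channel has connected to at least one backend.
fut = grpc.channel_ready_future(channel)
fut.result()
print("done waiting")
stub = test_pb2_grpc.TestStub(channel)
# `request` is the Test request message, built elsewhere.
while True:
    try:
        print(stub.Test(request, timeout=5))
    except grpc.RpcError as e:
        print("{} {}".format(time.time(), e))
    time.sleep(0.5)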

Srini Polavarapu

Jan 23, 2019, 2:29:35 PM

to grpc.io

That's right. With round-robin LB, the channel re-resolves DNS when a subchannel goes down. Depending on your DNS TTL, the OS may still be returning a cached DNS entry that contains the IP of a server that went down. Regardless, relying on DNS updates will not fully meet your LB requirements, because the gRPC client does not periodically re-resolve DNS. This means that when new backends are added, the gRPC client will not know about them. See this and this.
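A minimal sketch of settings that are often used to work around the lack of periodic re-resolution, offered as an assumption rather than as the fix for this thread. The option keys are gRPC core channel arguments; the values, the helper names (`client_options`, `server_options`, `make_channel`), and the target host name are hypothetical. The idea is that a server-side maximum connection age closes connections after a while, which makes the client reconnect and re-resolve DNS.

```python
# Sketch only: the option keys are gRPC core channel arguments; the values
# and the target name below are assumptions chosen for illustration.


def client_options(min_resolve_interval_ms=5000):
    """Channel arguments for a client that balances across all DNS results."""
    return [
        # Spread RPCs across every address the resolver returns.
        ("grpc.lb_policy_name", "round_robin"),
        # Lower bound on the time between two DNS re-resolutions.
        ("grpc.dns_min_time_between_resolutions_ms", min_resolve_interval_ms),
    ]


def server_options(max_age_ms=30000, grace_ms=5000):
    """Server arguments that make connections expire after a while.

    When a connection is closed for age, the client channel reconnects and
    re-resolves DNS, so it gets a chance to see newly added backends.
    """
    return [
        ("grpc.max_connection_age_ms", max_age_ms),
        # Extra time for in-flight RPCs to finish before the connection closes.
        ("grpc.max_connection_age_grace_ms", grace_ms),
    ]


def make_channel(target="dns:///my-service.example.com:10000"):
    """Create the client channel; the target host name is hypothetical."""
    import grpc  # imported lazily so the helpers above work without grpcio

    return grpc.insecure_channel(target, options=client_options())
```

On the server side, the list from `server_options()` would be passed as the `options` argument of `grpc.server(...)`. Whether this is enough depends on how quickly the DNS record itself is updated and on any caching between the client and the authoritative server.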

as...@brilliant.tech

Jan 24, 2019, 1:30:02 PM

to grpc.io

Sure, I understand that part. What I didn't understand was why I continued to get DEADLINE_EXCEEDED errors after the UNAVAILABLE errors. I would have thought that since the old instances were terminated, I wouldn't even be able to maintain a connection to them, so in theory the client would keep trying to reconnect and eventually open a connection to a new instance.

Srini Polavarapu

Jan 26, 2019, 1:36:43 AM

to grpc.io

From your description, it looks like you are destroying all the old servers and bringing up a completely new set of servers with new IPs. The gRPC client is still seeing the old IPs in the cached DNS entry, none of which are available. It will keep trying to connect to these unavailable IPs until the deadline is reached.
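One way to check whether stale addresses are still being served is to ask the OS resolver directly from the client machine. The sketch below uses only the standard library; the helper names `resolve_addresses` and `stale_addresses` are hypothetical. It shows what the OS resolver returns, which may differ from what gRPC's own resolver has cached.

```python
import socket


def resolve_addresses(host, port):
    """Return the sorted, de-duplicated IPs the OS resolver gives for host."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name could not be resolved at all.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


def stale_addresses(resolved, live):
    """Addresses still returned by DNS that are not in the known-live set."""
    return sorted(set(resolved) - set(live))
```

For example, calling `stale_addresses(resolve_addresses(host, port), new_ips)` right after the old servers are destroyed, with `new_ips` being the addresses of the new instances, would list any old IPs that the resolver is still handing out.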
