I'm a bit unclear on how name resolvers in gRPC interact with load balancing during rolling deploys. As far as I can tell, the most recently deployed server will end up with no traffic if we follow our current deploy strategy.
Our rolling deploy strategy, which we plan to adapt for gRPC, works as follows:
1. Build artifacts
2. De-register a server replica from its DNS name
3. Update the server and restart it
4. Re-register the server replica under its DNS name.
For the Java gRPC implementation, it looks like the name resolver does not refresh the list of IPs unless (1) there is an error in a previous resolve or (2) a server goes down. I believe the core implementation does the same thing, though I'm not familiar enough with C to really tell.
What I believe will happen during a rolling deploy is:
1. Before deploy: Client is talking to N nodes
2. A server is removed from DNS, nothing happens on the client
3. The server issues a GOAWAY frame to clients. The client removes the server from its list of connections, and resolves a new list of servers, finding any newly added servers
4. The server is restarted and added back to DNS; nothing happens on the client
5. Repeat for all other servers in the server set
6. After deploy: Client is talking to N-1 nodes and will never attempt to look for the last server to be restarted
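To make the sequence above concrete, here is a minimal toy simulation of what I think happens. It does not use any gRPC classes; the `RollingDeploySim` class, the `simulate` method, and the `server-N` names are all made up for illustration, and the only assumption baked in is the one stated above: the client re-resolves only when a server it is connected to goes away.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model only: plain sets stand in for DNS and for the client's
// connections. This is NOT how gRPC is implemented internally.
class RollingDeploySim {

    // Returns the servers the client is connected to after all n servers
    // have been rolled one at a time.
    static Set<String> simulate(int n) {
        Set<String> dns = new LinkedHashSet<>();
        for (int i = 1; i <= n; i++) {
            dns.add("server-" + i);
        }
        // Before deploy: the client resolved once and talks to all N nodes.
        Set<String> connected = new LinkedHashSet<>(dns);

        for (String server : new ArrayList<>(dns)) {
            // The server is removed from DNS; the client does not notice.
            dns.remove(server);

            // The server sends GOAWAY; the client drops it and re-resolves
            // (assumed to be the only trigger for a new resolution).
            connected.remove(server);
            connected.addAll(dns);

            // The server is restarted and re-registered; no re-resolution
            // happens, so the client does not see it yet.
            dns.add(server);
        }
        return connected;
    }

    public static void main(String[] args) {
        int n = 3;
        Set<String> connected = simulate(n);
        System.out.println(connected.size() + " of " + n + " servers: " + connected);
    }
}
```

Under this model, with three servers the client ends up connected to `server-1` and `server-2` only: `server-3` is rolled last, so nothing triggers a resolution after it comes back.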
Is my analysis correct? And if so, what is the recommended way to make sure the client ends up talking to all N servers after a rolling deploy?