Thanks for taking the time to reply.
> Is it possible that you're seeing cases where the resolver is returning a completely disjoint set of addresses from what it returned previously, which might cause a slight latency spike in some gRPC implementations as the policy tries to get connected to the new set of addresses? Or, alternatively, is it possible that there is only a very small overlap
> between the two sets, which might cause gRPC to send all traffic to a small set of endpoints for which it already has connections, while it tries to establish connections to all of the new endpoints?
Our tests were done in a controlled environment, so we were terminating or adding a specific number of instances each time. We were testing with 50 instances total and adding or terminating 5 at a time, so that's how much the set of addresses was changing.
> Are you seeing this latency spike measured on the client side? Do you see corresponding latency on the server side?
We ran further tests, and one of the causes definitely seems to be slightly higher latency on new server instances, so we're working on improving our warmup process there. This would explain the latency spikes on both the client and the server when adding new instances.
For instance removal, we implemented a graceful shutdown in our gRPC server, and we're seeing new connections receive GOAWAY, as expected. But on existing connections, for a couple of seconds after the server is terminated, we're seeing DEADLINE_EXCEEDED errors while attempting to send requests to the terminated instance. Does the client monitor the status of its connections, or is there any way for the server to let the client know that the connection is dead?
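If it helps clarify the question, here's a sketch of the kind of client-side keepalive configuration we could enable so the channel probes the transport with HTTP/2 PINGs and marks it dead sooner, rather than waiting for each RPC deadline to expire. The target string and interval values below are placeholders, not our actual production settings:

```java
import io.grpc.ManagedChannel;
import io.grpc.netty.NettyChannelBuilder;
import java.util.concurrent.TimeUnit;

public final class KeepAliveChannelFactory {
    // Hypothetical factory; "dns:///my-service:8443" is a placeholder target.
    public static ManagedChannel create() {
        return NettyChannelBuilder.forTarget("dns:///my-service:8443")
                // Send an HTTP/2 PING when the transport has been idle this long.
                .keepAliveTime(10, TimeUnit.SECONDS)
                // Treat the connection as dead if the PING ack doesn't arrive in time,
                // so the channel reconnects instead of letting RPCs hit their deadline.
                .keepAliveTimeout(2, TimeUnit.SECONDS)
                // Also ping while no RPCs are in flight.
                .keepAliveWithoutCalls(true)
                .build();
    }

    private KeepAliveChannelFactory() {}
}
```

If we go this route, I assume the server would also need `permitKeepAliveTime` / `permitKeepAliveWithoutCalls` on `NettyServerBuilder` relaxed to match, otherwise it would close the connection with ENHANCE_YOUR_CALM. Is this the recommended way to detect a dead transport faster, or is there a better mechanism?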
>Are there traffic spikes on individual endpoints when this happens?
We only have one endpoint.
> What language and version of gRPC are you using?
Java, version 1.70.0 with Netty.
Thanks