Round Robin client-side load balancing with custom service discovery


Mikael Morales

Apr 3, 2025, 10:11:25 AM
to grpc.io
Hey all,

We're trying to improve load balancing across multiple services handling a significant amount of traffic (up to 2M requests per second for the biggest ones), with each server handling roughly 5.5K requests per second.
Our services autoscale heavily throughout the day, and we need to discover new servers within seconds. For performance and cost efficiency, each client is aware of all of its servers, so traffic does not flow through an AWS load balancer.
So far, to discover new servers, we've relied on max connection age as recommended, which led to traffic imbalance that hurt both performance and cost.
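
For context, roughly what that setting looks like on grpc-java's Netty transport (values are illustrative, and MaxAgeServer/MyServiceImpl are placeholder names, not our actual code):

    import io.grpc.Server;
    import io.grpc.netty.NettyServerBuilder;
    import java.io.IOException;
    import java.util.concurrent.TimeUnit;

    public class MaxAgeServer {
      public static void main(String[] args) throws IOException, InterruptedException {
        Server server = NettyServerBuilder.forPort(8080)
            .addService(new MyServiceImpl())            // hypothetical service impl
            .maxConnectionAge(30, TimeUnit.SECONDS)     // force periodic reconnects
            .maxConnectionAgeGrace(5, TimeUnit.SECONDS) // let in-flight RPCs drain
            .build()
            .start();
        server.awaitTermination();
      }
    }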

To improve the current setup we decided to:
  • Implement our own custom name resolver that periodically polls the available servers (a rough sketch follows this list).
  • Leverage the round robin load balancing policy to evenly distribute the traffic.
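
A minimal sketch of the resolver idea against the grpc-java NameResolver API (PollingNameResolver and fetchBackendAddresses are placeholder names, not our production code; registering the resolver through a NameResolverProvider is omitted):

    import io.grpc.EquivalentAddressGroup;
    import io.grpc.NameResolver;
    import java.net.InetSocketAddress;
    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.stream.Collectors;

    final class PollingNameResolver extends NameResolver {
      private final ScheduledExecutorService scheduler =
          Executors.newSingleThreadScheduledExecutor();
      private Listener2 listener;

      @Override
      public String getServiceAuthority() {
        return "my-service"; // placeholder authority
      }

      @Override
      public void start(Listener2 listener) {
        this.listener = listener;
        // Poll discovery every few seconds so new servers are picked up quickly.
        scheduler.scheduleWithFixedDelay(this::resolve, 0, 5, TimeUnit.SECONDS);
      }

      private void resolve() {
        List<EquivalentAddressGroup> servers = fetchBackendAddresses().stream()
            .map(EquivalentAddressGroup::new)
            .collect(Collectors.toList());
        listener.onResult(ResolutionResult.newBuilder().setAddresses(servers).build());
      }

      // Stand-in for whatever discovery call returns the current server addresses.
      private List<InetSocketAddress> fetchBackendAddresses() {
        throw new UnsupportedOperationException("stubbed for this sketch");
      }

      @Override
      public void shutdown() {
        scheduler.shutdownNow();
      }
    }

The channel then opts into round robin explicitly, e.g.:

    ManagedChannel channel = ManagedChannelBuilder.forTarget("poll:///my-service")
        .defaultLoadBalancingPolicy("round_robin")
        .build();

(the "poll" scheme assumes a matching NameResolverProvider is registered).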
All of this is working well, but we are observing latency spikes whenever we autoscale (both in and out).
Our understanding is that round robin establishes a connection before using a sub-channel, so we don't understand the spike when new servers are discovered.
The spike when a server is removed is even more unclear to us.

We've tried many things but without luck, so any explanations or suggestions from your side would be really appreciated.

Thanks

Mark D. Roth

Apr 14, 2025, 8:31:30 PM
to Mikael Morales, grpc.io
Unfortunately, it's hard to diagnose the problem without a lot more details and ideally a way to reproduce it.  The LB policy does establish a connection before picking the subchannel, so I don't think that's the issue.

Is it possible that you're seeing cases where the resolver is returning a completely disjoint set of addresses from what it returned previously, which might cause a slight latency spike in some gRPC implementations as the policy tries to get connected to the new set of addresses?  Or, alternatively, is it possible that there is only a very small overlap between the two sets, which might cause gRPC to send all traffic to a small set of endpoints for which it already has connections, while it tries to establish connections to all of the new endpoints?

Are you seeing this latency spike measured on the client side?  Do you see corresponding latency on the server side?  Are there traffic spikes on individual endpoints when this happens?

What language and version of gRPC are you using?

--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/grpc-io/d2e2226d-eb15-4838-bc95-c1b1548f5d2en%40googlegroups.com.


--
Mark D. Roth <ro...@google.com>
Software Engineer
Google, Inc.

Mikael Morales

Apr 25, 2025, 6:26:09 AM
to grpc.io
Thanks for taking the time to reply.


> Is it possible that you're seeing cases where the resolver is returning a completely disjoint set of addresses from what it returned previously, which might cause a slight latency spike in some gRPC implementations as the policy tries to get connected to the new set of addresses? Or, alternatively, is it possible that there is only a very small overlap
> between the two sets, which might cause gRPC to send all traffic to a small set of endpoints for which it already has connections, while it tries to establish connections to all of the new endpoints?
Our tests were done in a controlled environment, so we were terminating or adding a specific number of instances each time. We were testing with 50 instances total and adding or terminating 5 at a time, so that's how much the set of addresses was changing.


> Are you seeing this latency spike measured on the client side?  Do you see corresponding latency on the server side? 
We did further tests, and one of the causes definitely seems to be slightly higher latency on new server instances, so we're working on improving our warmup process there. This would explain the latency spikes on both the client and the server when adding new instances.
When instances are removed: we implemented a graceful shutdown in our gRPC server, and we're seeing new connections receive GOAWAY, as expected. But on existing connections, for a couple of seconds after the server is terminated, we're seeing deadline-exceeded errors while attempting to send requests to the terminated instance. Is the status of the connections monitored by the client, or is there any way for the server to let the client know that the connection is dead?
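
For reference, our shutdown is along these lines (a simplified sketch, not our exact code; `server` is the running io.grpc.Server):

    // Server.shutdown() stops accepting new calls (the transport sends GOAWAY)
    // while letting in-flight RPCs complete; shutdownNow() is the fallback if
    // draining takes too long.
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      server.shutdown();
      try {
        if (!server.awaitTermination(30, TimeUnit.SECONDS)) {
          server.shutdownNow();
        }
      } catch (InterruptedException e) {
        server.shutdownNow();
        Thread.currentThread().interrupt();
      }
    }));

The grace window matters because RPCs already in flight when GOAWAY is sent are allowed to finish.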


> Are there traffic spikes on individual endpoints when this happens?
We only have one endpoint.


> What language and version of gRPC are you using?
Java, and we're using version 1.70.0 with Netty.

Thanks
