Load balancing long-lasting bidirectional streams

howar...@google.com
Oct 7, 2019, 6:54:06 PM
to grpc.io
We have a case with many clients and few servers, typically a 1000:1 ratio. The traffic is a single bidirectional stream per client.

The problem we are seeing is that when a new server comes up, it has no clients connected, since the existing clients maintain their connections to the other servers.

This is made worse by Kubernetes autoscaling: because the new server has zero load, it gets scaled back down, and we flip-flop between n and n+1 replicas. This graph shows the behavior pretty well: https://snapshot.raintank.io/dashboard/snapshot/SceOCrNpdOr4qmTUk1UHF20xMiNqGk6K?panelId=4&fullscreen&orgId=2

As a mitigation, we have the server close its connections every 30m. This is not great, because it takes at least 30 minutes to balance, and due to the autoscaling issue above it generally never gets a chance to work.
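
In case it's useful context: a minimal sketch of how this kind of forced connection rotation is typically configured on a Go gRPC server (assuming grpc-go; the 30m/5m values are just placeholders, not necessarily what we run). MaxConnectionAge makes the server send a GOAWAY once a connection reaches that age, so the client reconnects and re-resolves rather than us closing sockets by hand.

package main

import (
	"log"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}

	// MaxConnectionAge tells the server to send a GOAWAY once a connection
	// reaches roughly this age; the client then reconnects (and re-resolves),
	// which is what spreads long-lived streams across replicas over time.
	srv := grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
		MaxConnectionAge:      30 * time.Minute,
		MaxConnectionAgeGrace: 5 * time.Minute, // grace period for in-flight streams to finish
	}))

	// Service registration omitted; Serve blocks until shutdown.
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("serve: %v", err)
	}
}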


I am wondering if there are any best practices for handling this type of problem?

One possible idea we have is for the servers to share load information and shed load if they have more than their "fair share" of connections, but this is pretty complex.
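
To make that idea a bit more concrete, a rough sketch of the shedding logic. The connTracker type and the source of totalClients/replicas are hypothetical, not anything from gRPC; building that shared view of load is exactly the complex part.

package main

import "fmt"

// connTracker stands in for whatever bookkeeping the server keeps about its
// open streams; closeOne would cancel one stream so that client reconnects
// through the load balancer. Hypothetical, not a gRPC API.
type connTracker struct {
	open int
}

func (t *connTracker) closeOne() { t.open-- }

// shedExcessLoad drops connections above this replica's fair share.
// totalClients and replicas would have to come from some shared source
// (e.g. load reports the servers exchange with each other).
func shedExcessLoad(t *connTracker, totalClients, replicas int) {
	if replicas == 0 {
		return
	}
	fairShare := totalClients / replicas
	for t.open > fairShare {
		t.closeOne()
	}
}

func main() {
	// Example: this replica holds 300 of 1000 clients across 5 replicas,
	// so it sheds down to its fair share of 200.
	t := &connTracker{open: 300}
	shedExcessLoad(t, 1000, 5)
	fmt.Println("connections kept:", t.open)
}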

Mark D. Roth
Oct 16, 2019, 1:55:20 PM
to John Howard, grpc.io
Do your clients all start up at the same time?  If not, it's not clear to me why your setup wouldn't work.  If the clients' start times are randomly distributed, then if the server closes each connection after 30m, the connection close times should be just as randomly distributed as the client start times, which means that as soon as the new server comes up, clients should start trickling into it.  It may take 30m for the load to fully balance, but the new server should start getting new load immediately, and the load should increase slowly over that 30m period.
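
(Purely as an illustration of that argument, not something from the thread: a small simulation in which each client's forced-close time is uniform over the 30m window and each reconnecting client picks a replica uniformly at random. The new replica's load ramps roughly linearly toward its fair share over the 30 minutes.)

package main

import (
	"fmt"
	"math/rand"
)

func main() {
	const (
		clients   = 10000
		oldReps   = 9  // replicas that existed before the new one came up
		windowMin = 30 // forced connection close interval, in minutes
	)

	// For each client: the minute its current connection is force-closed
	// (uniform over the 30m window) and whether it happens to pick the new
	// replica when it reconnects (1 chance in oldReps+1).
	closeAt := make([]float64, clients)
	picksNew := make([]bool, clients)
	for i := range closeAt {
		closeAt[i] = rand.Float64() * windowMin
		picksNew[i] = rand.Intn(oldReps+1) == 0
	}

	for t := 5; t <= windowMin; t += 5 {
		onNew := 0
		for i := range closeAt {
			if closeAt[i] <= float64(t) && picksNew[i] {
				onNew++
			}
		}
		fmt.Printf("t=%2dm: new replica holds %4d clients (fair share ~%d)\n",
			t, onNew, clients/(oldReps+1))
	}
}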

I don't know anything about the Kubernetes autoscaling side of things, but maybe there are parameters you can tune there to give the new server more time to accumulate load before Kubernetes kills it?

In general, it's not clear to me that there's a better approach than the one you're already taking.  There's always a bit of tension between load balancing and streaming RPCs, because the whole point of a streaming RPC is that it doesn't go through load balancing for each individual message, which means that all of the messages go to the same backend.

I hope this information is helpful.

--
Mark D. Roth <ro...@google.com>
Software Engineer
Google, Inc.

howar...@google.com
Oct 16, 2019, 7:17:52 PM
to grpc.io
In general, the connection times are fairly randomly distributed, but not always. One case I have seen is an update of the server: we spin up a new replica, and it takes all of the connections. As a result, another replica spins up shortly after due to autoscaling, and that new replica takes no connections for 30m.

As far as tuning the autoscaling goes, that is a good point; I will look into that.