A58: Weighted Round Robin LB Policy


Yousuk Seung

Feb 6, 2023, 1:05:02 AM
to grpc.io
This is the discussion thread for A58: Weighted Round Robin LB Policy.


Please share your comments.

Tommy Ulfsparre

Feb 7, 2023, 9:27:18 AM
to grpc.io
Hey,

Looking forward to seeing this proposal implemented!

Are there cases where you wouldn't want to also include client-local observations (like in-flight requests) in the weight calculation?

How would WRR behave for a client that load balances over a set of endpoints where a subset has higher (network) latencies, i.e. latencies that aren't observable server side? Instead of choosing between least request and WRR, could we get the benefits of both?

Mark D. Roth

Feb 13, 2023, 1:50:09 PM
to Tommy Ulfsparre, grpc.io
This design does not actually use any info about in-flight requests or network latencies.  It weights backends purely by the CPU utilization and request rate reported by the endpoint.

It's certainly possible to write an LB policy that weights on in-flight requests or network latency, but that's not the goal of this particular policy.
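For concreteness, the weighting described above can be sketched roughly as follows (illustrative Go; this is not the exact gRFC A58 formula, which also covers details like error penalties and weight expiration, and all names here are hypothetical):

package main

import "fmt"

// endpointLoad holds the backend metrics an endpoint reports, e.g. via
// ORCA: application queries per second and CPU utilization in [0, 1].
// Hypothetical names for illustration.
type endpointLoad struct {
	qps         float64
	utilization float64
}

// weight captures the core idea: traffic served per unit of CPU.
// An endpoint that serves the same qps on less CPU gets a
// proportionally larger share of new RPCs.
func weight(l endpointLoad) float64 {
	if l.utilization <= 0 {
		return 0 // no usable signal yet; a real policy needs a fallback
	}
	return l.qps / l.utilization
}

func main() {
	a := endpointLoad{qps: 100, utilization: 0.5}  // weight 200
	b := endpointLoad{qps: 100, utilization: 0.25} // weight 400
	fmt.Println(weight(a), weight(b))              // b receives ~2x a's share
}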



Tommy Ulfsparre

Feb 13, 2023, 2:38:40 PM
to grpc.io
Hey Mark,

I read the proposal, and my question was not about basing the weight on in-flight requests or network latency alone; rather, are there cases where you wouldn't want to always include both? In other words, could the existing design be improved by factoring the in-flight request count into the final weight calculation, in addition to the server-reported CPU utilization and request rate? Does that make sense?

Mark D. Roth

Feb 13, 2023, 3:04:00 PM
to Tommy Ulfsparre, grpc.io
I don't think we'd want the client to do its own tracking of in-flight requests to each endpoint, because the endpoint may also be receiving requests from many other endpoints at the same time, and the client would not see those, so it could result in incorrect weights.  I think it's both more correct and simpler to do this based solely on the metrics reported by the endpoint.

Tommy Ulfsparre

Feb 13, 2023, 4:19:36 PM
to grpc.io

> I don't think we'd want the client to do its own tracking of in-flight requests to each endpoint, because the endpoint may also be receiving requests from many other endpoints at the same time, and the client would not see those, so it could result in incorrect weights

Wouldn't the client see those, since they're carried through server-side load reporting? In-flight requests would just be an added variable in the weight calculation function, which currently consists only of qps / CPU utilization. Or did I misunderstand what you meant here?

> I think it's both more correct and simpler to do this based solely on the metrics reported by the endpoint

The current design would not penalize endpoints with higher latency, or other things that can cause a client to perceive higher latency, like (stop-the-world) garbage collection or CPU throttling. If that is not the goal of this design, then opting for something simpler makes sense.

Mark D. Roth

Feb 13, 2023, 5:01:47 PM
to Tommy Ulfsparre, grpc.io
On Mon, Feb 13, 2023 at 1:19 PM Tommy Ulfsparre <to...@ulfsparre.se> wrote:

> I don't think we'd want the client to do its own tracking of in-flight requests to each endpoint, because the endpoint may also be receiving requests from many other endpoints at the same time, and the client would not see those, so it could result in incorrect weights

Wouldn't the client see those, since they're carried through server-side load reporting? In-flight requests would just be an added variable in the weight calculation function, which currently consists only of qps / CPU utilization. Or did I misunderstand what you meant here?

If you want the in-flight requests to be reported by the server, then I don't see why we'd need another metric here.  The server could simply choose to increment its qps metric at the start of each request rather than at the end of each request, so the existing qps metric would also include in-flight requests.  This design does not dictate how the server computes the values it reports.
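As a sketch of that server-side choice (the recorder type here is hypothetical, not a real gRPC API):

package main

import "sync/atomic"

// loadRecorder illustrates the choice described above: which counter
// feeds the reported qps metric determines whether in-flight requests
// are reflected in the endpoint's weight.
type loadRecorder struct {
	started  atomic.Uint64 // bumped at request start: includes in-flight work
	finished atomic.Uint64 // bumped at completion: excludes in-flight work
}

func (r *loadRecorder) onRequestStart() { r.started.Add(1) }
func (r *loadRecorder) onRequestDone()  { r.finished.Add(1) }

func main() {
	var r loadRecorder
	r.onRequestStart()
	// Deriving qps from r.started counts this still-running request;
	// deriving it from r.finished would not see it until completion.
	r.onRequestDone()
}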
 

> I think it's both more correct and simpler to do this based solely on the metrics reported by the endpoint

The current design would not penalize endpoints with higher latency, or other things that can cause a client to perceive higher latency, like (stop-the-world) garbage collection or CPU throttling. If that is not the goal of this design, then opting for something simpler makes sense.

The goal of this policy is to provide balanced CPU utilization over a set of backends that are otherwise equivalent -- i.e., request latency is expected to be driven primarily by CPU utilization, so balancing the CPU utilization will also balance the request latency.

I think the cases you mention here are fundamentally different, because they involve request latency being driven primarily by things other than CPU utilization.  I don't think those use-cases can be addressed by this design.  I suspect they are more likely to be addressed by something like a least_request policy.
 

Tommy Ulfsparre

Feb 13, 2023, 5:11:42 PM
to grpc.io
> If you want the in-flight requests to be reported by the server, then I don't see why we'd need another metric here

I see the confusion here. I don't want in-flight requests to be reported by the server. The client would keep track of its own in-flight requests.
The weight could then be calculated using the client-side in-flight request count plus the server-reported qps and CPU utilization.

> I think the cases you mention here are fundamentally different, because they involve request latency being driven primarily by things other than CPU utilization.  I don't think those use-cases can be addressed by this design.  I suspect they are more likely to be addressed by something like a least_request policy

Makes sense, thanks! 

Mark D. Roth

Feb 13, 2023, 5:16:36 PM
to Tommy Ulfsparre, grpc.io
On Mon, Feb 13, 2023 at 2:11 PM Tommy Ulfsparre <to...@ulfsparre.se> wrote:
> If you want the in-flight requests to be reported by the server, then I don't see why we'd need another metric here

I see the confusion here. I don't want in-flight requests to be reported by the server. The client would keep track of its own in-flight requests.
The weight could then be calculated using the client-side in-flight request count plus the server-reported qps and CPU utilization.

So that goes back to what I was saying earlier: I think this would result in incorrect weights, because each client will see only a fraction of the in-flight requests.  This would no longer be weighting the backends based on CPU utilization normalized by traffic.
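A toy example of that skew, with made-up numbers:

package main

import "fmt"

// Endpoints A and B report identical server-side metrics (qps=100,
// utilization=0.5) and both actually carry 10 in-flight requests,
// but this client originated only 5 of A's requests and none of B's.
func main() {
	serverWeight := 100.0 / 0.5 // 200 for both A and B

	localInFlight := map[string]float64{"A": 5, "B": 0} // this client's partial view
	for _, ep := range []string{"A", "B"} {
		biased := serverWeight / (localInFlight[ep] + 1)
		fmt.Printf("endpoint %s: biased weight %.0f\n", ep, biased) // A: 33, B: 200
	}
	// Biasing on the local view would route ~6x more traffic to B,
	// even though A and B are equally loaded overall.
}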
 

Tommy Ulfsparre

Feb 14, 2023, 3:45:17 AM
to grpc.io
> So that goes back to what I was saying earlier: I think this would result in incorrect weights, because each client will see only a fraction of the in-flight requests.  This would no longer be weighting the backends based on CPU utilization normalized by traffic.

Right, but as you said, that is a property that is not always desirable when request latency is not driven solely by CPU utilization, or when an endpoint is fast-failing (which is not solved by least request unless you account for errors)?

In a steady state, wouldn't weighting still be based on the server-reported load, with in-flight requests biasing the weighting only when there are outliers? This might be better explored with a custom LB policy, though.

Does it make sense to update the gRFC with the design goals of the WRR policy and the behavior one could expect, or is that captured elsewhere?


Mark D. Roth

Feb 14, 2023, 2:38:35 PM
to Tommy Ulfsparre, grpc.io, Yousuk Seung
On Tue, Feb 14, 2023 at 12:45 AM Tommy Ulfsparre <to...@ulfsparre.se> wrote:
> So that goes back to what I was saying earlier: I think this would result in incorrect weights, because each client will see only a fraction of the in-flight requests.  This would no longer be weighting the backends based on CPU utilization normalized by traffic.

Right, but as you said, that is a property that is not always desirable when request latency is not driven solely by CPU utilization, or when an endpoint is fast-failing (which is not solved by least request unless you account for errors)?

Yes, I agree that if your request latency is not dictated solely by CPU utilization, then this WRR design does not meet all of your needs.  But I think that's a different use-case than the one we're trying to address here.

In general, I think that it's not desirable for a single LB policy to use multiple criteria for load balancing, because it's very hard to automatically find the right balance between multiple criteria that works best for every workload.  Ultimately, I think that every unique workload has a slightly different ideal load balancing policy, so there are always trade-offs: when you make things better for one workload, you make things worse for another workload.  For example, let's say that we have a workload where there are many different types of RPCs, some of which will always inherently take longer to process but are not CPU bound, while others are shorter but more expensive in terms of CPU.  In that workload, if we took request latency into account, the load balancing would actually get worse, not better: whenever an endpoint happens to have processed a number of the long-but-not-CPU-bound RPCs, we would send it less traffic and thus waste its CPU, while at the same time there may be other endpoints whose CPUs are more heavily used but are getting more traffic.

In addition, even if we could craft an algorithm that magically did the optimal thing for every possible workload, I think the complexity of such a policy would make the code incredibly hard to understand and reason about.  It would also almost certainly add a significant amount of overhead to every RPC, which would reduce performance.

I think it's generally better to build simple, understandable policies for common cases.  Most users will generally be fine with choosing one of those common policies, and those who really want to fully optimize for their individual workload can write custom policies.  I don't think we can really do that for them.
 

In a steady state, wouldn't weighting still be based on the server-reported load, with in-flight requests biasing the weighting only when there are outliers? This might be better explored with a custom LB policy, though.

If your concern is outlier detection, I think that's better served via the mechanism described in gRFC A50.  Note that you can use that mechanism in conjunction with the WRR LB policy.
 

Does it make sense to update the gRFC with the design goals of the WRR policy and the behavior one could expect, or is that captured elsewhere?

Yes, I agree, we should explicitly state that we're really only trying to address the goal of balancing CPU utilization across endpoints, not taking into account other sources of request latency.

Yousuk, can you please add a comment about this in the gRFC?  Thanks!
 

Tommy Ulfsparre

Feb 21, 2023, 8:17:43 AM
to grpc.io

> In general, I think that it's not desirable for a single LB policy to use multiple criteria for load balancing, because it's very hard to automatically find the right balance between multiple criteria that works best for every workload.  Ultimately, I think that every unique workload has a slightly different ideal load balancing policy, so there are always trade-offs: when you make things better for one workload, you make things worse for another workload.  For example, let's say that we have a workload where there are many different types of RPCs, some of which will always inherently take longer to process but are not CPU bound, while others are shorter but more expensive in terms of CPU.  In that workload, if we took request latency into account, the load balancing would actually get worse, not better: whenever an endpoint happens to have processed a number of the long-but-not-CPU-bound RPCs, we would send it less traffic and thus waste its CPU, while at the same time there may be other endpoints whose CPUs are more heavily used but are getting more traffic.

> In addition, even if we could craft an algorithm that magically did the optimal thing for every possible workload, I think the complexity of such a policy would make the code incredibly hard to understand and reason about.  It would also almost certainly add a significant amount of overhead to every RPC, which would reduce performance.

> I think it's generally better to build simple, understandable policies for common cases.  Most users will generally be fine with choosing one of those common policies, and those who really want to fully optimize for their individual workload can write custom policies.  I don't think we can really do that for them.

I agree with that. 

My initial question was whether this wasn't the common case. I see it more as implementing Weighted Least Request just like Envoy does, meaning the weight function in this case would become: (rps / cpu_utilization) / (active_requests + 1)^active_request_bias. Setting active_request_bias to 0.0 would make it behave as Weighted Round Robin does currently. Not simple by any means, and as you said, it might be better off as a custom policy.
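A minimal sketch of that hybrid weight function (parameter names modeled on Envoy's active_request_bias; not part of gRFC A58):

package main

import (
	"fmt"
	"math"
)

// hybridWeight combines the client-side WRR weight (server-reported
// rps and CPU utilization) with a local in-flight penalty, following
// the shape of Envoy's weighted least request. Illustrative only.
func hybridWeight(rps, cpuUtil, activeRequests, activeRequestBias float64) float64 {
	base := rps / cpuUtil
	return base / math.Pow(activeRequests+1, activeRequestBias)
}

func main() {
	// With a bias of 0.0 the penalty term is (n+1)^0 == 1, reducing
	// this to plain client-side WRR, as noted above.
	fmt.Println(hybridWeight(100, 0.5, 7, 0.0)) // 200
	fmt.Println(hybridWeight(100, 0.5, 7, 1.0)) // 25
}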

Thanks! 
 

Mark D. Roth

Feb 21, 2023, 6:46:37 PM
to Tommy Ulfsparre, grpc.io
I don't think there are any remaining open questions here, but since you mentioned Envoy's least-request policy, I wanted to provide just a little more information.

First, note that there is a community-contributed design for Envoy's least-request LB policy in gRFC A48.  I believe this policy has actually been implemented in grpc-java, and if you're interested in this policy for another language, you're welcome to contribute the code.

It's worth calling out that the main difference between the WRR policy that we're designing here and the WRR policy that Envoy currently supports is that in Envoy's WRR policy, the endpoint weights are dictated by the control plane, whereas in the policy discussed in this design, the endpoint weights are calculated in the client based on the backend metric data returned by each endpoint.  That's why in xDS, this new policy is referred to as ClientSideWeightedRoundRobin, to differentiate it from Envoy's existing WRR.

The reason I mention that difference is that Envoy's least-request policy, like Envoy's existing WRR policy, gets the endpoint weights from the control plane.  I think if it were desirable to have a least-request policy that computes the endpoint weights from backend metric data, that would have to be another variant of the least-request policy, just like ClientSideWeightedRoundRobin is different from Envoy's existing WRR policy.

I'll also note that we did have a community contributor do some work toward a design for implementing an Envoy-style WRR policy in gRPC in https://github.com/grpc/proposal/pull/202, although that effort seems to have stalled.  If that's something you're interested in, please feel free to try to pick that up and drive it forward.

I hope this info is helpful.

Tommy Ulfsparre

Feb 22, 2023, 3:58:38 AM
to grpc.io
> I think if it were desirable to have a least-request policy that computes the endpoint weights from backend metric data, that would have to be another variant of the least-request policy, just like ClientSideWeightedRoundRobin is different from Envoy's existing WRR policy.

SGTM! Thanks for the info! 
