Yes, I agree that if your request latency is not dictated solely by CPU utilization, then this WRR design does not meet all of your needs. But I think that's a different use case than the one we're trying to address here.
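To keep the rest of this concrete, here's a minimal sketch of the general shape of utilization-based WRR weighting. The type and field names are mine, and the plain qps-over-utilization form is a deliberate simplification for illustration, not the exact formula from the design:

```go
package main

import "fmt"

// endpointLoad is a simplified stand-in for the per-endpoint load
// report the balancer receives from the backend.
type endpointLoad struct {
	name           string
	qps            float64 // queries per second handled recently
	cpuUtilization float64 // fraction of CPU in use, 0.0-1.0
}

// wrrWeight derives a WRR weight from CPU utilization alone:
// endpoints that handle more load per unit of CPU get more traffic.
// (Simplified for illustration; the real policy also has to handle
// stale or missing reports.)
func wrrWeight(ep endpointLoad) float64 {
	if ep.cpuUtilization <= 0 {
		return 0 // no usable signal yet
	}
	return ep.qps / ep.cpuUtilization
}

func main() {
	endpoints := []endpointLoad{
		{"a", 100, 0.50}, // 200 qps per unit of CPU
		{"b", 100, 0.25}, // 400 qps per unit of CPU: gets more traffic
	}
	for _, ep := range endpoints {
		fmt.Printf("%s: weight %.0f\n", ep.name, wrrWeight(ep))
	}
}
```

The point is just that the only signal feeding the weight is the backend's reported CPU load.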
In general, I don't think it's desirable for a single LB policy to use multiple criteria for load balancing, because it's very hard to automatically find the balance between criteria that works best for every workload. Ultimately, every unique workload has a slightly different ideal load-balancing policy, so there are always trade-offs: when you make things better for one workload, you make them worse for another.

For example, say we have a workload with many different types of RPCs: some inherently take longer to process but are not CPU-bound, while others are shorter but more expensive in terms of CPU. In that workload, taking request latency into account would actually make the load balancing worse, not better: whenever an endpoint happened to have processed a number of the long-but-not-CPU-bound RPCs, we would send it less traffic and thus waste its idle CPU, while other endpoints whose CPUs are more heavily loaded would keep receiving more traffic.
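To put rough numbers on that example: the two weight functions below are hypothetical stand-ins (inverse utilization vs. inverse latency), just to show how a latency-aware weight inverts the decision a CPU-based weight would make:

```go
package main

import "fmt"

// snapshot is a hypothetical per-endpoint view combining CPU and latency.
type snapshot struct {
	name           string
	cpuUtilization float64 // fraction of CPU in use
	meanLatencySec float64 // mean request latency in seconds
}

func main() {
	// a has been serving long-but-not-CPU-bound RPCs;
	// b has been serving short, CPU-heavy ones.
	endpoints := []snapshot{
		{"a", 0.20, 2.0},
		{"b", 0.90, 0.1},
	}

	// CPU-only weighting (inverse utilization) favors a,
	// which has plenty of idle CPU.
	cpuWeight := func(s snapshot) float64 { return 1 / s.cpuUtilization }

	// A latency-aware weighting (inverse latency) favors b instead,
	// wasting a's idle CPU while piling more work onto b.
	latencyWeight := func(s snapshot) float64 { return 1 / s.meanLatencySec }

	for _, s := range endpoints {
		fmt.Printf("%s: cpu-based %.1f, latency-based %.1f\n",
			s.name, cpuWeight(s), latencyWeight(s))
	}
}
```

Running this, the CPU-based weight favors `a` (5.0 vs. ~1.1), while the latency-based weight favors `b` (10.0 vs. 0.5), even though `a` is the endpoint with idle CPU to spare.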
In addition, even if we could craft an algorithm that magically did the optimal thing for every possible workload, I think the complexity of such a policy would make the code incredibly hard to understand and reason about. It would also almost certainly add significant per-RPC overhead, which would reduce performance.
I think it's generally better to build simple, understandable policies for common cases. Most users will be fine choosing one of those common policies, and those who really want to fully optimize for their individual workload can write custom policies; I don't think we can realistically do that degree of per-workload tuning for them.