Benchmark for evaluating new load balancing algorithm on Envoy?

Dimitris L

unread,

Jun 23, 2023, 10:39:53 PM6/23/23

to envoy-users

Hi,

TLDR: We would like to benchmark a simple variant of the Two-Choice load balancing algorithm on EnvoyProxy. Is there a recommended benchmark or a typical workload that you use?

We would like to evaluate the following simple variant of the Two-Choice algorithm for load balancing: For each request,
1. Allocate to the least loaded of two randomly chosen servers (i.e., like Two-Choice) with probability β.
2. Otherwise, allocate to a random server.

Somewhat surprisingly we recently demonstrated in theory and simulations (paper, slides, visualisation) that for suitably chosen β, this outperforms Two-Choice when there is outdated information (RTT is large, there are multiple allocators) or there is noise. The main reason for this is that it allocates to lesser loaded servers less aggressively than Two-Choice, so that it spreads the load more evenly (see this figure).

We would like to see whether this result also implies improvements in real-world load balancers, including EnvoyProxy. For the evaluation purposes, I have drafted a very simple implementation (see here) and was wondering whether you could give me some pointers for setting up and running a benchmark.

Kind regards,

Dimitris

P.S. I have reached out a few other open source load balancers.

Tony Allen

unread,

Jun 28, 2023, 6:27:44 PM6/28/23

to Dimitris L, envoy-users

Hi Dimitris,

Last time I looked into it, there wasn't a "standard benchmark" for the load balancer outside of some synthetic tests in the load balancer's unit tests. There do exist some actively developed load/performance testing tools like Nighthawk and some scripts used in blog posts, but it seems like what you're ideally looking for is some "real-world traffic" to run through modified Envoys. I'm not aware of anything that fits the bill.

That being said, if I understood your abstract correctly, you're approximating outdated information and long TTLs by using "batch sizes" of requests that go through the load balancer? If that's the case, there's a wrench in the gears when evaluating of the behavior of your algorithm with Envoy's usage of the P2C/2choice in its load balancer. An individual Envoy will derive the "weight" of a particular endpoint using the number of outstanding requests sent to those endpoints-- the idea is that more active requests means a more loaded endpoint (details here).

I don't see a way to get the conditions in which your new algorithm will shine since the endpoint weights are updated every time an endpoint is sampled from the host set. This may also be a problem for other load balancers, since it's common to perform this "weight scaling" with the number of open connections or even request latencies (least connections and least time in NGINX terms).

This isn't to say we don't try to combat herding behaviors in our 2choice/P2C implementation-- we reduce the aggression of the weight bias via a term in the formula we use to derive endpoint weights for our P2C implementation. It's just that those herding behaviors are a result of things like idle hosts being added to a very busy set of hosts, not the conditions you mention.

I'm not a researcher, so pardon any misunderstanding. Let me know if I can clarify anything.

--
You received this message because you are subscribed to the Google Groups "envoy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to envoy-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/envoy-users/5c42e882-40d5-4529-8323-2d15a7410be6n%40googlegroups.com.

Dimitris L

unread,

Jun 29, 2023, 10:12:47 PM6/29/23

to envoy-users

Hi Tony,

Thank you very much for the detailed reply. I will start by looking at the synthetic tests and see how the algorithms perform there.

Indeed the setting that we study assumes that we have homogeneous machines (all weights equal). For cases when looping through all servers is not an option, the (1+β)-process is a relavtively simple modification to Two-Choice that (i) avoids herding (although less profound than for Round-Robin), (ii) avoids draining (as it is called in the blog post you linked) and (iii) does not require looping through the entire array (similarly to Two-Choice).

But on a more high level, the insight is that you can avoid herding by mixing a load balancing strategy with the random one. For instance, for the weighted least requests, you could take

p_i = β * w_i/Σ_j w_j + (1-β) * 1/n

which is a slightly different way to make allocations less aggressive than the one you currently use.

> I'm not a researcher, so pardon any misunderstanding.

I have mostly looked the problem from a theoretical point of view, so pardon my misconceptions.*

> idle hosts being added to a very busy set of hosts

I see. Does this occur often?

Also, how is the load balancing weight determined? what is the typical number of servers that you deal with?

Many thanks,

Dimitris

*We are interested in developing theory that more accurately models real-world load balancing (and we would greatly appreciate the input of this community), but maybe it is a topic for a different thread(?)

Tony Allen

unread,

Jul 5, 2023, 4:40:42 PM7/5/23

to Dimitris L, envoy-users

Load balancing weights are set in the Envoy configuration when you define an endpoint. These weights are used by the weighted round-robin and weighted least-request load balancers.

In my experience, I've seen herding behavior caused by idle nodes being added to LB sets quite often. For example, auto-scaling groups will expand the number of nodes when load reaches a certain level, so the new nodes would be idle. This is common enough that in addition to the aggression coefficient mentioned in the previous email, we also have slow start mode.

The environments I've personally worked in have been all over the place with regards to how they've implemented load balancing. It's not always a straightforward set of nodes with weights assigned. For example, there is the subset load balancer which creates hierarchical subsets from metadata attached to each node, each with their own weight and potentially a heterogeneous mix of LB algorithms. Similarly, there's also the locality weighted LB and zone-aware routing, which can certainly complicate things. These are fairly common in scenarios where you want to balance traffic across different availability zones to build more resilient systems.

To top it all off, we also have the ability to create priority levels and retry predicates, which can all be used simultaneously! These all see use in various environments, but I wouldn't be able to tell you how common they are.

I can zoom in on anything that doesn't make sense, just let me know if I can clarify anything.

-Tony

To view this discussion on the web visit https://groups.google.com/d/msgid/envoy-users/1083eea8-8c5e-4793-b2dc-03504672b862n%40googlegroups.com.

Dimitris L

unread,

Jul 11, 2023, 3:01:03 PM7/11/23

to Tony Allen, envoy-users

Hi Tony,

Thank you again for the detailed reply! I will be working on this over the summer, and will definitely let you know if I have any questions,