Hi Tony,
Thank you very much for the detailed reply. I will start by looking at the synthetic tests and see how the algorithms perform there.
Indeed the setting that we study assumes that we have homogeneous machines (all weights equal). For cases when looping through all servers is not an option, the (1+β)-process is a relavtively simple modification to Two-Choice that (i) avoids herding (although less profound than for Round-Robin), (ii) avoids draining (as it is called in the blog post you linked) and (iii) does not require looping through the entire array (similarly to Two-Choice).
But on a more high level, the insight is that you can avoid herding by mixing a load balancing strategy with the random one. For instance, for the weighted least requests, you could take
p_i = β * w_i/Σ_j w_j + (1-β) * 1/n
which is a slightly different way to make allocations less aggressive than the one you currently use.
> I'm not a researcher, so pardon any misunderstanding.
I have mostly looked the problem from a theoretical point of view, so pardon my misconceptions.*
> idle hosts being added to a very busy set of hosts
I see. Does this occur often?
Also, how is the load balancing weight determined? what is the typical number of servers that you deal with?
Many thanks,
Dimitris
*We are interested in developing theory that more accurately models real-world load balancing (and we would greatly appreciate the input of this community), but maybe it is a topic for a different thread(?)