Hello,
I would like to contribute functionality similar to what a shuffle shard load balancer can provide.
To learn more about how shuffle sharding works, here is a good blog post, and AWS' reference implementation. TL;DR: shuffle sharding is an inexpensive way to isolate clients combinatorially.
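As a rough illustration of the idea (plain Python, not Envoy code; `shuffle_shard` is a made-up helper), a client's shard can be computed deterministically by seeding a PRNG with a hash of the client identifier, so every instance computes the same shard with no coordination:

```python
import hashlib
import random

def shuffle_shard(client_id: str, endpoints: list, shard_size: int) -> list:
    """Deterministically pick `shard_size` endpoints for a client.

    Hashing the client ID into a PRNG seed means the same client always
    maps to the same shard, while different clients get combinatorially
    distinct (mostly non-overlapping) shards.
    """
    seed = int.from_bytes(hashlib.sha256(client_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return sorted(rng.sample(sorted(endpoints), shard_size))

hosts = [f"host-{i}" for i in range(30)]
shard_a = shuffle_shard("client-a", hosts, 15)
shard_b = shuffle_shard("client-b", hosts, 15)
```

A noisy client only affects the hosts in its own shard, and any other client overlaps it on at most a few hosts.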
I believe this could be accomplished either by implementing a shuffle-shard LB, or by adding "replication" to the Maglev/ring-hash LBs. The rest of this message dives into shuffle-shard implementation details, but I would be happy to change directions to Maglev replication.
So far, I've gotten a basic implementation of shuffle sharding working in Envoy, but I've made some observations that point toward a more performant design. Questions follow!
I understand that Subset LoadBalancers have an LB per subset and use a trie to reduce lookup overhead. I started by implementing a shuffle-shard LB as a single LB that calls "update(..)" on every "chooseHost(..)": each chooseHost does additional math to calculate which hosts should be on or off in the LB, and passes the result to update(..). Given how much effort was put into making the subset LB performant, I'm afraid this initial implementation won't meet Envoy's standards.
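To make that concrete, here is a toy Python sketch of the single-LB approach (not Envoy code; `RoundRobinLB`, `ShuffleShardLB`, and their methods are hypothetical stand-ins for Envoy's LB interfaces). The inner LB is rebuilt with the shard's hosts before every pick, which is exactly the per-request overhead that worries me:

```python
class RoundRobinLB:
    """Stand-in for an inner LB policy."""
    def __init__(self):
        self._hosts = []
        self._index = 0

    def update(self, hosts):
        # Replace the active host set; note this resets round-robin state.
        self._hosts = list(hosts)
        self._index = 0

    def choose(self):
        host = self._hosts[self._index % len(self._hosts)]
        self._index += 1
        return host

class ShuffleShardLB:
    """Single LB that re-filters hosts on every chooseHost call."""
    def __init__(self, all_hosts, shard_fn):
        self._all_hosts = all_hosts
        self._shard_fn = shard_fn   # client_id -> subset of all_hosts
        self._inner = RoundRobinLB()

    def choose_host(self, client_id):
        # The per-request math: compute the shard, push it into the inner LB.
        self._inner.update(self._shard_fn(client_id))
        return self._inner.choose()
```

As a side effect, the sketch also shows the context-loss problem: calling update(..) resets the inner LB's round-robin state, so interleaved clients can repeatedly land on the first host of their shard.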
1. My gut instinct is that the combinatorial math would result in far too many LBs if there were an LB per shuffle shard. Back-of-the-napkin math for a full set of 30 endpoints, with each shuffle shard composed of 15 endpoints, gives C(30, 15) = 155,117,520 possible shards. Thoughts?
2. One weakness of having an LB per subset/shard is that the inner LB type loses context. For example, a round-robin LB will only round-robin within its subset, so the same endpoint could be hit N times in a row if there are N subsets. The more load balancers there are (see point 1), the less context each inner LB has.
3. I've implemented a cache to hold the top N shuffle shards, hoping it strikes a good balance between speed and memory. Is a cache necessary? Are there better ways to improve performance (e.g. pushing a filter into the base LB, so that there is only one "subset" that is constantly re-filtered)?
4. Any other thoughts? Is this a feature that would be accepted into Envoy?
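For what it's worth, the back-of-the-napkin count above checks out, and the top-N shard cache can be sketched as a bounded LRU (toy Python; `shard_for`, `HOSTS`, and the seeding scheme are illustrative, not Envoy code):

```python
import hashlib
import math
import random
from functools import lru_cache

HOSTS = tuple(f"host-{i}" for i in range(30))
SHARD_SIZE = 15

# Sanity check of the combinatorial count quoted above: 30 choose 15.
assert math.comb(30, 15) == 155_117_520

# Hypothetical bounded cache of recently used shards, so repeat clients
# skip the shard math entirely; maxsize caps memory regardless of how
# many distinct clients show up.
@lru_cache(maxsize=1024)
def shard_for(client_id: str) -> tuple:
    seed = int.from_bytes(hashlib.sha256(client_id.encode()).digest()[:8], "big")
    return tuple(sorted(random.Random(seed).sample(HOSTS, SHARD_SIZE)))
```

Since the shard for a given client never changes while the host set is stable, an LRU only has to be invalidated on host-set updates, not on a TTL.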
I haven't thought about Maglev replication much, but I believe it would require lifting Maglev to the level of a Subset LB, so that another LB algorithm could be applied to the few hosts Maglev chooses per hash.
Thanks,
Charlie