Hello,
I would like to contribute functionality similar to what a shuffle shard load balancer can provide.
To learn more about how shuffle sharding works, here is a good blog post, and AWS' reference implementation. TL;DR: shuffle sharding is an inexpensive way to isolate clients combinatorially.
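As a rough illustration of the idea (plain Python, not Envoy code; `shuffle_shard` is a made-up helper), a client's shard can be computed deterministically by seeding a PRNG with a hash of the client identifier, so every instance computes the same shard with no coordination:

```python
import hashlib
import random

def shuffle_shard(client_id: str, endpoints: list, shard_size: int) -> list:
    """Deterministically pick `shard_size` endpoints for a client.

    Hashing the client ID into a PRNG seed means the same client always
    maps to the same shard, while different clients get combinatorially
    distinct (mostly non-overlapping) shards.
    """
    seed = int.from_bytes(hashlib.sha256(client_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return sorted(rng.sample(sorted(endpoints), shard_size))

hosts = [f"host-{i}" for i in range(30)]
shard_a = shuffle_shard("client-a", hosts, 15)
shard_b = shuffle_shard("client-b", hosts, 15)
```

A noisy client only affects the hosts in its own shard, and any other client overlaps it on at most a few hosts.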
I believe this could be accomplished either by implementing a shuffle-shard LB, or by adding "replication" to the Maglev/ring-hash LBs. The rest of this message dives into shuffle-shard implementation details, but I would be happy to change directions to Maglev replication.
So far, I've gotten a basic implementation of shuffle sharding working in Envoy, but I've made some observations that point toward a more performant design. Questions follow!
I understand that Subset LoadBalancers have an LB per subset and use a trie to reduce lookup overhead. I started by implementing a shuffle-shard LB as a single LB that calls "update(..)" on every "chooseHost(..)": each chooseHost does additional math to calculate which hosts should be on or off in the LB, and passes the result to update(..). Given how much effort was put into making the subset LB performant, I'm afraid this initial implementation won't meet Envoy's standards.
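To make that concrete, here is a toy Python sketch of the single-LB approach (not Envoy code; `RoundRobinLB`, `ShuffleShardLB`, and their methods are hypothetical stand-ins for Envoy's LB interfaces). The inner LB is rebuilt with the shard's hosts before every pick, which is exactly the per-request overhead that worries me:

```python
class RoundRobinLB:
    """Stand-in for an inner LB policy."""
    def __init__(self):
        self._hosts = []
        self._index = 0

    def update(self, hosts):
        # Replace the active host set; note this resets round-robin state.
        self._hosts = list(hosts)
        self._index = 0

    def choose(self):
        host = self._hosts[self._index % len(self._hosts)]
        self._index += 1
        return host

class ShuffleShardLB:
    """Single LB that re-filters hosts on every chooseHost call."""
    def __init__(self, all_hosts, shard_fn):
        self._all_hosts = all_hosts
        self._shard_fn = shard_fn   # client_id -> subset of all_hosts
        self._inner = RoundRobinLB()

    def choose_host(self, client_id):
        # The per-request math: compute the shard, push it into the inner LB.
        self._inner.update(self._shard_fn(client_id))
        return self._inner.choose()
```

As a side effect, the sketch also shows the context-loss problem: calling update(..) resets the inner LB's round-robin state, so interleaved clients can repeatedly land on the first host of their shard.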
1. My gut instinct is that the combinatorial math would result in far too many LBs if there were an LB per shuffle shard. Back-of-the-napkin math for a full set of 30 endpoints, with each shuffle shard composed of 15 endpoints, gives C(30, 15) = 155,117,520 possible shards. Thoughts?
2. One weakness of having an LB per subset/shard is that the inner LB type loses context. For example, a round-robin LB will only round-robin within its subset, so the same endpoint could be hit N times in a row if there are N subsets. The more load balancers there are (see point 1), the less context each inner LB has.
3. I've implemented a cache to hold the top N shuffle shards, hoping it strikes a good balance between speed and memory. Is a cache necessary? Are there better ways to improve performance (e.g. pushing a filter into the base LB, so that there is only one "subset" that is constantly re-filtered)?
4. Any other thoughts? Is this a feature that would be accepted into Envoy?
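For what it's worth, the back-of-the-napkin count above checks out, and the top-N shard cache can be sketched as a bounded LRU (toy Python; `shard_for`, `HOSTS`, and the seeding scheme are illustrative, not Envoy code):

```python
import hashlib
import math
import random
from functools import lru_cache

HOSTS = tuple(f"host-{i}" for i in range(30))
SHARD_SIZE = 15

# Sanity check of the combinatorial count quoted above: 30 choose 15.
assert math.comb(30, 15) == 155_117_520

# Hypothetical bounded cache of recently used shards, so repeat clients
# skip the shard math entirely; maxsize caps memory regardless of how
# many distinct clients show up.
@lru_cache(maxsize=1024)
def shard_for(client_id: str) -> tuple:
    seed = int.from_bytes(hashlib.sha256(client_id.encode()).digest()[:8], "big")
    return tuple(sorted(random.Random(seed).sample(HOSTS, SHARD_SIZE)))
```

Since the shard for a given client never changes while the host set is stable, an LRU only has to be invalidated on host-set updates, not on a TTL.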
I haven't thought about Maglev replication much, but I believe it would require lifting Maglev to the level of a Subset LB, so that another LB algorithm could be applied to the few hosts Maglev chooses per hash.
Thanks,
Charlie