Konstantin Osipov
Jul 28, 2021, 3:03:49 PM
to raft...@googlegroups.com
Hello,
We've been experimenting with making Raft elections live with 10k+
servers.
We found that if the minimum and maximum election timeouts are fixed,
the election space becomes too crowded, no matter which timeout
distribution is used or whether exponential back-off is used to select
the next election timeout. It simply takes the cluster too long to
reach a timeout range wide enough that only one or two servers become
candidates on any given tick.
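A back-of-the-envelope estimate shows why a fixed range gets crowded.
Assume, for illustration only, that every server's timer was reset at
the same moment and that each server picks its timeout uniformly from
W distinct ticks. Then about N/W of the N servers fire on any given
tick. The function names below are hypothetical, and the uniform,
simultaneous-reset model is a simplification, not the setup we measured.

```python
def expected_candidates_per_tick(num_servers: int, range_ticks: int) -> float:
    """Expected number of servers whose election timeout fires on a given
    tick, assuming each picks uniformly among `range_ticks` ticks and all
    timers were reset at the same moment (a simplifying assumption)."""
    return num_servers / range_ticks


def prob_single_candidate_on_tick(num_servers: int, range_ticks: int) -> float:
    """Probability that exactly one server fires on a given tick under the
    same assumption: a binomial with p = 1 / range_ticks."""
    p = 1.0 / range_ticks
    return num_servers * p * (1 - p) ** (num_servers - 1)
```

With a hypothetical fixed range of 150 ticks, 10,000 servers give about
67 candidates per tick, while a range as wide as the configuration
(10,000 ticks) brings that down to about 1.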
We, however, found a couple of measures which worked well.
1) Pick the election timeout at random from the range
election_timeout_min ... election_timeout_min + size(current
configuration), rather than from a constant range.
2) When a server receives a RequestVote RPC, push its own election
timeout a couple of ticks into the future (without PreVote).
Has anyone been adapting Raft to work with a large number of nodes?
Direct-ping-based failure detection was also a hindrance, but
I wrote about that in my previous (apparently uninteresting) post
to this group.
--
Konstantin Osipov, Moscow, Russia