Sorry about my slow reply; I know some of you have been waiting for
the BGSFL to chime in.
1. Pipelining
Thanks Kelly for setting the record straight that Raft very much
allows pipelining of AppendEntries. As long as followers append to
their logs in log order (which is guaranteed by the AppendEntries
consistency check), this is safe. I've got a local branch of LogCabin
that does pipelining, and one day maybe I'll get to write code again
and finish it. The changes were small, but squeezing all the potential
performance out will take some time.
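To make the safety argument concrete, here is a minimal sketch in Python of the follower side. It is not LogCabin code; the Entry and Follower classes and the append_entries signature are made up for illustration. The point is that the consistency check rejects a pipelined request that arrives before the one it depends on, so the log can only grow in order.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    term: int
    data: str


@dataclass
class Follower:
    # log[0] holds the entry at Raft index 1 (hypothetical layout)
    log: list = field(default_factory=list)

    def append_entries(self, prev_index, prev_term, entries):
        """Return True if the entries were accepted, False otherwise."""
        # Consistency check: the follower must already hold the entry just
        # before the new ones, and its term must match.
        if prev_index > len(self.log):
            return False
        if prev_index > 0 and self.log[prev_index - 1].term != prev_term:
            return False
        for offset, entry in enumerate(entries):
            index = prev_index + 1 + offset
            if index <= len(self.log):
                if self.log[index - 1].term == entry.term:
                    continue  # already stored, e.g. a retransmission
                del self.log[index - 1:]  # drop the conflicting suffix
            self.log.append(entry)
        return True


# Two pipelined requests; the second one happens to arrive first.
f = Follower()
first = (0, 0, [Entry(1, "a"), Entry(1, "b")])
second = (2, 1, [Entry(1, "c")])
print(f.append_entries(*second))  # rejected: entry 2 is not there yet
print(f.append_entries(*first))   # accepted
print(f.append_entries(*second))  # accepted on retry
```

The leader just retries the rejected request, the same way it would handle any other failed consistency check.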
2. Head-of-line blocking leading to unwanted elections
The concern is that an AppendEntries request carrying a very large
log entry might not finish transferring before the follower's election
timeout fires. The follower then times out and starts an election, the
cluster loses its leader, and worse yet, this can repeat forever.
An upper limit on log entry size would help. For example, LogCabin
enforces a 1MB maximum size, and that much should transfer fast enough
on most datacenter networks to avoid timeouts. But let's suppose you
don't want an upper limit.
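For a rough sense of scale, here is a back-of-the-envelope calculation in Python. The 1 Gbit/s link speed and the 150 ms election timeout are assumptions picked for illustration, not LogCabin settings, and the estimate ignores latency and protocol overhead.

```python
def transfer_time_ms(size_bytes, link_bits_per_sec):
    """Time to push size_bytes over the link, ignoring latency and overhead."""
    return size_bytes * 8 / link_bits_per_sec * 1000


ONE_MB = 1024 * 1024
GIGABIT = 1_000_000_000        # assumed link speed, bits per second
ELECTION_TIMEOUT_MS = 150      # assumed election timeout

for size in (ONE_MB, 100 * ONE_MB):
    t = transfer_time_ms(size, GIGABIT)
    verdict = "fits within" if t < ELECTION_TIMEOUT_MS else "exceeds"
    print(f"{size // ONE_MB} MB: {t:.1f} ms, {verdict} the timeout")
```

Under those assumptions a 1 MB entry takes about 8.4 ms, while a 100 MB entry would take several times longer than the timeout.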
To address this problem in general, leaders need to be able to get
messages across independently of the size of log entries. Either an
AppendEntries request that is forced to contain no entries or a
separate heartbeat message type would work equally well.
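Here is a minimal sketch in Python of the first option, an AppendEntries request forced to carry no entries. The function names, the dictionary layout, and the force_empty flag are all hypothetical. The sketch also leaves out how the small message gets onto the wire without queuing behind a large transfer already in flight, which depends on the transport.

```python
def heartbeat_due(now_ms, last_sent_ms, interval_ms):
    """True once the follower has gone a full interval without a message."""
    return now_ms - last_sent_ms >= interval_ms


def build_append_entries(term, leader_id, prev_index, prev_term,
                         commit_index, pending, force_empty=False):
    """Build a request; force_empty drops the entries but keeps the rest."""
    entries = [] if force_empty else list(pending)
    return {
        "term": term,
        "leader_id": leader_id,
        "prev_log_index": prev_index,
        "prev_log_term": prev_term,
        "leader_commit": commit_index,
        "entries": entries,
    }


# A huge entry is waiting, but the follower is due to hear from the leader.
pending = ["x" * 10_000_000]
if heartbeat_due(now_ms=1000, last_sent_ms=900, interval_ms=50):
    msg = build_append_entries(3, "leader-1", 7, 3, 7, pending,
                               force_empty=True)
else:
    msg = build_append_entries(3, "leader-1", 7, 3, 7, pending)
print(len(msg["entries"]))
```

The empty request has the same fields as a normal one, so the follower can process it with the same code path.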
3. Separating heartbeats into their own RPC
I think separating heartbeats from AppendEntries into their own RPC is
an aesthetic question. On the one hand, heartbeats are a logically
different function from replicating entries. On the other hand,
heartbeats and AppendEntries share most of the same fields and
processing code.
I think it just depends on the implementation. In LogCabin,
separating heartbeats out would mostly result in a lot of code
duplication. But in a more dynamically typed language dealing with
JSON, for example, it might be easier to factor out the code common to
both RPCs, and the result might end up being clearer. I'll have to look
over hashicorp/raft sometime to see how it works there.
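As a rough illustration of the factoring I have in mind, here is a sketch in Python with JSON messages. It is not how LogCabin or hashicorp/raft does it; the field names, the state dictionary, and the timer_resets counter (a stand-in for resetting the election timer) are assumptions. Term handling is simplified, for example it skips clearing the vote when the term advances.

```python
import json


def check_leader(state, msg):
    """Shared by both handlers: reject stale terms, otherwise note the leader."""
    if msg["term"] < state["current_term"]:
        return False
    state["current_term"] = msg["term"]
    state["leader_id"] = msg["leader_id"]
    state["timer_resets"] += 1  # stand-in for resetting the election timer
    return True


def reply(state, success):
    return json.dumps({"term": state["current_term"], "success": success})


def handle_heartbeat(state, raw):
    return reply(state, check_leader(state, json.loads(raw)))


def handle_append_entries(state, raw):
    msg = json.loads(raw)
    if not check_leader(state, msg):
        return reply(state, False)
    log, prev = state["log"], msg["prev_log_index"]
    # Consistency check on the entry preceding the new ones.
    if prev > len(log) or (prev > 0 and
                           log[prev - 1]["term"] != msg["prev_log_term"]):
        return reply(state, False)
    for offset, entry in enumerate(msg["entries"]):
        index = prev + 1 + offset
        if index <= len(log):
            if log[index - 1]["term"] == entry["term"]:
                continue  # already stored
            del log[index - 1:]  # drop the conflicting suffix
        log.append(entry)
    return reply(state, True)


state = {"current_term": 2, "leader_id": None, "timer_resets": 0, "log": []}
print(handle_heartbeat(state, json.dumps({"term": 2, "leader_id": "s1"})))
print(handle_append_entries(state, json.dumps({
    "term": 2, "leader_id": "s1", "prev_log_index": 0, "prev_log_term": 0,
    "entries": [{"term": 2, "data": "a"}],
})))
```

Everything the two handlers share lives in check_leader and reply, so the heartbeat handler ends up as a one-liner.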
Just my two cents,
Diego