RPC timeout & retry


Arnout Kazemier

Nov 7, 2014, 7:09:26 AM
to raft...@googlegroups.com
Hello all,

I'm in the process of writing another Raft implementation for JavaScript so it can be used in the browser (using SharedWorkers, for example) and on Node.js. While I've already made great progress with the implementation, there are some parts of the paper that raise some questions. In section 5.1, Raft basics, it states the following:

Servers retry RPCs if they do not receive a response in a timely manner, and they issue RPCs in parallel for best performance.

This is the only part of the paper that states RPCs should be retried when they take longer than a specified timeout. But it doesn't state what an acceptable and sane default value is for these timeouts. In addition, it does not state what kind of retry algorithm should be used or what to do when we get no response other than retrying again. So I'm wondering what the general consensus is about this part of the paper. (ooh, see what I did there ;-))

- Arnout


Hugues Evrard

Nov 7, 2014, 8:46:36 AM
to raft...@googlegroups.com
On 11/07/14 13:09, Arnout Kazemier wrote:
>
> Servers retry RPCs if they do not receive a response in a timely
> manner, and they issue RPCs in parallel for best performance.
>
>
> This is the only part of the paper that states RPCs should be retried
> when they take longer than a specified timeout. But it doesn't state what
> an acceptable and sane default value is for these timeouts. In addition,
> it does not state what kind of retry algorithm should be used or what to
> do when we get no response other than retrying again.

In Raft, the RPCs are "request vote" and "append entry"; the timeouts are
the election and heartbeat ones.
In the context of Raft, an RPC mechanism is useful for knowing which
response comes from which request. If you don't know this, you can have
scenarios like:

- leader sends a heartbeat (i.e. an empty append entry request) to everyone
- everyone gets it and sends a response, which is buffered by the network
- a client sends a new entry to the leader
- leader sends an append entry request to everyone
- now the leader receives the answers to the heartbeats: it must not
consider them as answers to the append entry request.

Raft was designed to be usable on a transport layer that may lose or
duplicate messages, and which does not guarantee message ordering
between two processes. Therefore you can implement Raft on top of UDP,
for instance.

In order to differentiate the messages, there are several approaches
(non-exhaustive list):

- use an RPC mechanism that gives you the correspondence

- use message IDs in requests that are also sent back in responses (see
the sketch after this list)

- use the messages as described in the TLA+ specification of Raft,
available in Diego's thesis. Answers to append entry contain more than
just the replier's term and a boolean "success"; they let you know the
state of the replier when it sent the reply.

- rely on TCP to order messages between peers and maintain counters of
requests/answers; however, it's hard to handle crashes with this
approach.
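
As a rough sketch of the message-ID approach (TypeScript, since the
question is about a JavaScript implementation; the RpcChannel name and
the send() transport are made up, not taken from any particular library):

type RpcRequest = { id: number; term: number; body: unknown };
type RpcResponse = { id: number; term: number; body: unknown };

class RpcChannel {
  private nextId = 0;
  private pending = new Map<number, (res: RpcResponse) => void>();

  // send() is whatever unreliable transport you have (UDP, postMessage, ...).
  constructor(private send: (msg: RpcRequest) => void) {}

  // Attach a fresh ID to every outgoing request and remember its callback.
  request(term: number, body: unknown, onReply: (res: RpcResponse) => void): void {
    const id = this.nextId++;
    this.pending.set(id, onReply);
    this.send({ id, term, body });
  }

  // Match an incoming response to the request it answers; responses with
  // unknown IDs (stale or duplicated by the network) are simply dropped.
  onResponse(res: RpcResponse): void {
    const onReply = this.pending.get(res.id);
    if (!onReply) return;
    this.pending.delete(res.id);
    onReply(res);
  }
}

With this, the heartbeat scenario above is harmless: the buffered
heartbeat responses still carry their old IDs, so they can never be
mistaken for answers to the later append entry request.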

--
Hugues Evrard
INRIA / LIG - Team CONVECS

Diego Ongaro

Nov 13, 2014, 8:44:30 PM
to Arnout Kazemier, raft...@googlegroups.com, Hugues Evrard
On Fri, Nov 7, 2014 at 4:09 AM, Arnout Kazemier <in...@3rd-eden.com> wrote:
Hello all,

I'm in the process of writing another Raft implementation for JavaScript so it can be used in the browser (using SharedWorkers, for example) and on Node.js. While I've already made great progress with the implementation, there are some parts of the paper that raise some questions.

Cool. I added liferaft to the list of implementations on the Raft website.
 
In section 5.1, Raft basics, it states the following:

Servers retry RPCs if they do not receive a response in a timely manner, and they issue RPCs in parallel for best performance.

This is the only part of the paper that states RPCs should be retried when they take longer than a specified timeout. But it doesn't state what an acceptable and sane default value is for these timeouts. In addition, it does not state what kind of retry algorithm should be used or what to do when we get no response other than retrying again. So I'm wondering what the general consensus is about this part of the paper. (ooh, see what I did there ;-))

It really depends on the implementation and the deployment. Let's take each RPC individually:

For RequestVote, a timeout that's longer than the election timeout won't ever make sense, since by that point this server or some other server would have moved on to another term. If you're in a network with little packet loss, you might get away with never retrying RequestVote requests (then in the event of packet loss, you might have to wait for a new election).

For AppendEntries, the fear is that a dropped heartbeat would cause a follower to start an election. So you probably want an RPC timeout that's low enough to give you time to retry that heartbeat. Unless your network/transport doesn't drop packets very often, in which case you might be OK with an unnecessary leader change when a heartbeat gets lost.

At least that's the way I think about it.

And in any case, keep retrying periodically until the term/server state changes. Maybe with a little backoff if you want to get fancy, but don't do so much backoff that it goes above the election timeout.
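
A rough sketch of that retry loop (TypeScript, with made-up timing
constants; the real values depend on your deployment):

const ELECTION_TIMEOUT_MS = 300;  // assumed value, tune for your network
const BASE_RPC_TIMEOUT_MS = 50;   // assumed value

// sendOnce resolves true if a reply arrived within the RPC timeout;
// stillRelevant returns false once the term or the server's role changes.
async function sendWithRetry(
  sendOnce: () => Promise<boolean>,
  stillRelevant: () => boolean
): Promise<void> {
  let delay = BASE_RPC_TIMEOUT_MS;
  while (stillRelevant()) {
    if (await sendOnce()) return;
    // Back off a little, but stay well under the election timeout so a
    // lost heartbeat can still be retried before followers give up.
    delay = Math.min(delay * 2, ELECTION_TIMEOUT_MS / 2);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}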

-Diego

Vasileios Anagnostopoulos

Feb 9, 2015, 9:17:54 AM
to raft...@googlegroups.com, hugues...@inria.fr
Do you mean that RPCs model eventual delivery? (e.g. TCP with retries?)

UDP has no eventual delivery. I think this is not even possible with, say, 10 messages sent. So how can an RPC eventually be delivered, even with retries?

I think only TCP is usable here.

Hugues Evrard

Feb 9, 2015, 12:12:38 PM
to Vasileios Anagnostopoulos, raft...@googlegroups.com
Hi Vasileios,

On 02/09/15 15:17, Vasileios Anagnostopoulos wrote:
> Do you mean that RPCs model eventual delivery? (e.g. TCP with retries?)
>
> UDP has no eventual delivery. I think this is not even possible with
> e.g. 10 messages sent. So, how can RPC be able to eventually deliver
> even with retries?
>
> I think only TCP is usable here.

My previous mail is not very clear indeed, sorry for the confusion.

I just wanted to point out that Raft is designed to work on a network
that may drop or re-order messages. I am referring to the TLA+ version of
Raft [0], which slightly differs from the Raft paper (RPC messages have
a few more fields).

[0] https://github.com/ongardie/raft.tla

Best,
Hugues

Diego Ongaro

Feb 13, 2015, 1:30:53 PM
to Vasileios Anagnostopoulos, Hugues Evrard, raft...@googlegroups.com
Just to clarify a little more, Raft doesn't rely on TCP, though many
implementations do use it.

The way I think about it, TCP provides four things:

1. Ordering/duplicate elimination

Raft doesn't rely on the ordering guarantees or duplicate elimination
at the transport layer; its RPCs contain enough information to be
processed out of order safely. Also, most Raft implementations would
need to deal with TCP connections failing by connecting again, where
at least the duplicate elimination guarantees are lost.
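
For example, here's a sketch (not from any particular implementation;
the field names roughly follow the TLA+ spec) of why a leader can
process AppendEntries replies in any order: per-follower progress only
moves forward.

type AppendEntriesReply = {
  term: number;        // follower's current term
  success: boolean;
  matchIndex: number;  // highest log index the follower confirms (when success)
};

function handleAppendReply(
  currentTerm: number,
  matchIndexByPeer: Map<string, number>,
  peer: string,
  reply: AppendEntriesReply
): void {
  if (reply.term !== currentTerm) return;  // stale reply (a newer term is handled elsewhere)
  if (!reply.success) return;              // handled by the nextIndex back-off path
  const known = matchIndexByPeer.get(peer) ?? 0;
  // A late reply to an older request reports an equal or smaller matchIndex,
  // so taking the max makes re-ordering and duplication harmless.
  matchIndexByPeer.set(peer, Math.max(known, reply.matchIndex));
}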

2. Retransmission

TCP includes acknowledgements and the sender will retransmit lost
packets (up to the point that the connection is considered failed).
Again, most Raft implementations need to deal with TCP connections
failing by trying again, so the code to do retries is already going to
be there at the application level. And Raft's RPC replies serve as
acknowledgements already.

3. Large message assembly, flow control, and congestion control

If you're sending large amounts of data, it's a pain to have to
reimplement this stuff at the application layer. This is a big win for
using TCP sockets, but it's really just a convenience; there's nothing
fundamental about the network stack doing this instead of your
application.

4. Deployment advantages

Finally, from what I hear, UDP and other protocols aren't always
supported very well, especially in clouds and wide area networks. TCP
is probably just easier to deploy and operate.

So (3) and (4), the least fundamental of these from a theoretical point
of view, are the best reasons to use TCP, in my opinion.

-Diego