Raft implementation with UDP or TCP


Orion Naga

Nov 25, 2016, 3:48:37 AM
to raft-dev
Hi,
I tried Raft with UDP and found that under high load (many concurrent requests/responses), heartbeat messages can be lost unpredictably, causing the whole cluster to flip-flop as it elects another leader. Even with the PreVote modification (from here http://openlife.cc/blogs/2015/september/4-modifications-raft-consensus), the problem is that UDP packets tend to be lost more frequently as network load rises, so a node gets stuck in the pre-vote state when it does not receive any heartbeat.
Has anybody here experienced a similar situation? Do you think a Raft implementation over TCP is better in terms of sending heartbeats? Should AppendEntries with data still be carried over TCP? What do you think about this setup?
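The failure mode above comes from Raft's follower election timer: every heartbeat resets it, and if none arrives before it fires, the follower starts an election (or a pre-vote round, with PreVote). A minimal sketch of such a timer, assuming randomized timeouts in the 150 to 300 ms range the Raft paper gives as an example; the class and method names are hypothetical, and the current time is passed in explicitly rather than read from a clock:

```python
import random


class ElectionTimer:
    """Hypothetical follower-side election timer (illustrative sketch only).

    Each heartbeat (an empty AppendEntries) pushes the deadline forward.
    If no heartbeat arrives before the deadline, expired() turns True and
    the follower would start an election (or a pre-vote round).
    """

    def __init__(self, min_timeout=0.150, max_timeout=0.300, rng=None):
        self.min_timeout = min_timeout
        self.max_timeout = max_timeout
        self.rng = rng if rng is not None else random.Random()
        self.deadline = None

    def reset(self, now):
        # Randomize every timeout so followers rarely time out at the same moment.
        self.deadline = now + self.rng.uniform(self.min_timeout, self.max_timeout)

    def on_heartbeat(self, now):
        # A heartbeat that actually arrives restarts the countdown.
        self.reset(now)

    def expired(self, now):
        return self.deadline is not None and now >= self.deadline
```

Each lost heartbeat simply fails to call `on_heartbeat`, so a run of losses longer than the timeout is enough to trigger a (possibly needless) election.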

I read in this related topic that RPC over TCP seems to be preferable:
https://groups.google.com/forum/#!searchin/raft-dev/udp|sort:relevance/raft-dev/kn5vRAtmoSc/FuEn7WuAkCUJ
Thanks.
Orion.

Oren Eini (Ayende Rahien)

Nov 25, 2016, 4:20:58 AM
to raft...@googlegroups.com
The problem is the same with TCP and UDP, except that TCP will retransmit the packet.
You can set the heartbeat interval to a lower value, so that multiple heartbeats are sent within the timeout window, giving a higher chance that one of them arrives.
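A back-of-the-envelope version of this suggestion, assuming each heartbeat is lost independently with probability `loss_rate` (Orion's observation that losses cluster under load means real numbers will be worse). With a heartbeat every h ms and an election timeout of T ms, about T // h heartbeats fall into one window, and all of them are lost with probability `loss_rate ** (T // h)`. The function name is hypothetical; times are integer milliseconds to avoid floating-point surprises in the division.

```python
def missed_window_probability(loss_rate, heartbeat_interval_ms, election_timeout_ms):
    """Probability that every heartbeat in one election-timeout window is lost.

    Assumes independent losses, which is optimistic: congestion losses
    tend to come in bursts.
    """
    # Number of heartbeats the leader sends within one timeout window.
    heartbeats_per_window = election_timeout_ms // heartbeat_interval_ms
    return loss_rate ** heartbeats_per_window
```

For example, at 10% loss with a 300 ms timeout, a 100 ms heartbeat interval gives roughly 1e-3 per window, while a 50 ms interval gives roughly 1e-6.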


Archie Cobbs

Nov 25, 2016, 10:59:22 AM
to raft-dev
IMHO an important benefit of TCP is flow control, and therefore the ability to measure back pressure. E.g. if you have to send an InstallSnapshot and you just blast 50 MB of data via UDP packets, one of those packets is probably going to be dropped, especially since the recipient has to write the data to disk while the sender only has to read it from disk.

More generally, any time the recipient slows down, TCP will alert you to it: you simply watch the outgoing queue size for that peer. This prevents dropped data. Of course, TCP never silently drops data anyway, so even if you do nothing you'll get the same effect in the form of a blocked writer thread. So TCP forces you to deal with the issue one way or another :)

In short, it's wasteful to send data knowing the recipient can't handle it. TCP makes doing this impossible, whereas UDP makes doing it almost unavoidable - unless you implement your own flow control mechanism... in which case, why not just use TCP in the first place??
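One way to "watch the outgoing queue size for that peer" is a bounded per-peer queue in front of the TCP connection. The sketch below is a hypothetical Python illustration, not taken from any Raft library: a writer thread (not shown) would drain `outbox` onto the socket, so when the peer slows down, TCP's flow control stalls that writer, the queue fills, and `try_send` starts returning False.

```python
import queue


class PeerSender:
    """Hypothetical per-peer outbound queue with a fixed capacity.

    A separate writer thread (not shown) is assumed to take messages from
    `outbox` and write them to the peer's TCP socket. If the peer is slow,
    that write blocks, the queue fills, and back pressure shows up here.
    """

    def __init__(self, capacity=64):
        self.outbox = queue.Queue(maxsize=capacity)

    def backlog(self):
        # Approximate number of messages still waiting for this peer.
        return self.outbox.qsize()

    def try_send(self, message):
        """Enqueue without blocking; return False if the peer is saturated."""
        try:
            self.outbox.put_nowait(message)
            return True
        except queue.Full:
            return False
```

A leader could, for instance, pause sending further log entries or InstallSnapshot chunks to that peer while `try_send` fails, rather than buffering without bound.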

-Archie

Orion Naga

Nov 25, 2016, 6:41:00 PM
to raft-dev
Yes, TCP can handle the ACK situation; however, that degrades throughput because its overhead is bigger than UDP's, doesn't it?
Certainly InstallSnapshot should be implemented over TCP. I just want to open a discussion on whether we can implement heartbeats (AppendEntries that carry no log entries) over UDP to reduce the total network overhead.
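To make the proposal concrete: a heartbeat is an AppendEntries RPC with an empty entries list, so it fits easily into one datagram. The sketch below assumes a hypothetical fixed binary layout (five unsigned 64-bit integers) and omits the authentication and versioning a real protocol would need. Note that `send_heartbeat_udp` sends once and never retries, which is exactly the loss exposure discussed in this thread.

```python
import socket
import struct

# Hypothetical wire format for an empty AppendEntries (heartbeat):
# five unsigned 64-bit big-endian integers. The entries list of a real
# AppendEntries is omitted because it is always empty for a heartbeat.
FIELDS = ("term", "leader_id", "prev_log_index", "prev_log_term", "leader_commit")
HEARTBEAT = struct.Struct("!5Q")


def encode_heartbeat(**fields):
    """Pack the heartbeat fields into a single 40-byte datagram."""
    return HEARTBEAT.pack(*(fields[name] for name in FIELDS))


def decode_heartbeat(datagram):
    """Unpack a datagram produced by encode_heartbeat into a dict."""
    return dict(zip(FIELDS, HEARTBEAT.unpack(datagram)))


def send_heartbeat_udp(sock, addr, **fields):
    """Fire-and-forget send: if this datagram is dropped, nothing retries it."""
    sock.sendto(encode_heartbeat(**fields), addr)
```

A caller would create the socket with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and pass the follower's `(host, port)` tuple as `addr`.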

Archie Cobbs

Nov 26, 2016, 12:23:46 PM
to raft-dev
On Friday, November 25, 2016 at 5:41:00 PM UTC-6, Orion Naga wrote:
> Yes, TCP can handle ACK situation, however that degrades throughput, because its overhead is bigger than UDP one, doesn't it?
> Certainly InstallSnapshot should be implemented in TCP. I just open a discussion on whether we can implement Heartbeat(AppendEntries that carries no information) in UDP to reduce the total network overhead?

Yes, technically speaking the overhead is bigger... but by an amount that, in practice, is negligible compared to all the other ways overall behavior changes when switching between UDP and TCP.

To take a small example, if TCP's retransmit timeout is smaller than the heartbeat interval (likely on a fast network), you will have fewer missed heartbeats, and therefore fewer possible leadership changes, because TCP will automatically retransmit them for you.
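To see why the retransmit timeout matters, here is a small hypothetical calculation (not a model of any particular TCP stack): it counts how many copies of one segment go out before the election timeout, assuming every copy is lost, the first retransmission fires after `initial_rto_ms`, and the timeout doubles after each expiry as RFC 6298 specifies. Propagation delay is ignored.

```python
def transmissions_before_timeout(initial_rto_ms, election_timeout_ms):
    """Count sends of one segment (original plus retransmissions) that
    happen strictly before the election timeout, assuming each copy is
    lost and the retransmission timeout doubles after every expiry."""
    sends = 1              # original transmission at t = 0
    t = 0
    rto = initial_rto_ms
    while t + rto < election_timeout_ms:
        t += rto           # retransmission timer fires, segment is resent
        rto *= 2           # exponential backoff
        sends += 1
    return sends
```

With a 10 ms RTO and a 300 ms election timeout, five copies go out (at 0, 10, 30, 70 and 150 ms); with a 200 ms RTO only two do, and with a one-second RTO (the minimum RFC 6298 recommends) TCP does not retransmit at all within the window.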

And there are probably many more subtleties like this. For another example, TCP won't reorder data. Etc., etc.
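On the reordering point: UDP may deliver datagrams out of order or more than once. A common pattern, sketched here under the assumption of a hypothetical header in which each sender stamps its datagrams with `(term, seq)` and restarts `seq` at 0 in each new term, is to accept only datagrams strictly newer than the last one accepted from that sender. This is meant for stale heartbeats only; it does not replace Raft's own term and log-index checks.

```python
class StaleFilter:
    """Hypothetical filter for reordered or duplicated heartbeat datagrams.

    Each sender is assumed to stamp datagrams with (term, seq), where seq
    restarts at 0 in each new term. Only strictly newer pairs are accepted.
    """

    def __init__(self):
        self.latest = {}   # sender_id -> last accepted (term, seq)

    def accept(self, sender_id, term, seq):
        key = (term, seq)
        last = self.latest.get(sender_id)
        if last is not None and key <= last:
            return False   # older or duplicate datagram: discard it
        self.latest[sender_id] = key
        return True
```

Tuple comparison orders by term first, so a datagram from a newer term is accepted even though its `seq` has restarted.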

-Archie

Orion Naga

Nov 27, 2016, 10:12:16 PM
to raft-dev
Then I guess switching the broadcasting to TCP would be a good move. Thank you, Archie.

Philip Haynes

Dec 5, 2016, 5:11:26 PM
to raft-dev
Hi Orion,

We use UDP/multicast, but via the Aeron library, which provides reliable transport semantics.

I agree with Archie that flow control and back pressure are key.

If you don't have extreme performance requirements, I would switch to TCP to keep things vanilla; otherwise,
we have found UDP/multicast with Aeron to be an option.

Philip
