FAB design paper

Bhavin Thaker

unread,

Jun 9, 2015, 2:12:25 PM6/9/15

to raft...@googlegroups.com

Hi All,

Attached is a design paper on a simplistic consensus protocol, called, FAB (Fast Atomic Broadcast), that I experimented with a few months back. I have borrowed concepts from Raft and Paxos and made design changes to build a simple protocol with less corner cases and have a more reliable and easily understandable implementation of FAB. If you see any holes in the design, please let me know.

This is work done by me at Symantec and external publication of this paper has been cleared by the Symantec Legal team. I have a working implementation of FAB but cannot share the source code currently. Any input you may have on the design of FAB is most appreciated.

Thanks,

Bhavin Thaker.

FAB_Design_Paper_v12_23March2015.pdf

Oren Eini (Ayende Rahien)

unread,

Jun 9, 2015, 7:05:56 PM6/9/15

to Bhavin Thaker, raft...@googlegroups.com

Some notes.

This seems to assume very low level of messages. In particular, the notion that you must send a message to the entire cluster before you can process the next message is a very limiting factor.

A similar issue applies with regards to the timeout assumed in the paper (yes, just config, but implies intent).

Note that a due to the "wait until timeout or next message", you can have a message go out to the system, be committed on the leader ,but not committed on anyone else for a pretty long while.

That is the same for raft, but much shorter timeframes usually apply.

Dynamic adding to the log and GC are not compatible.

Assume that I have nodes S1, S2, S3, and I send 7 messages to them.

Because all were successfully delivered, I now GC them from all servers.

Now I add S4 to the cluster. But S4 can never actually work. It is missing the first 7 messages, so it can never actually work.

Hibernating Rhinos Ltd

Oren Eini l CEO l Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

--
You received this message because you are subscribed to the Google Groups "raft-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raft-dev+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bhavin Thaker

unread,

Jun 10, 2015, 7:54:55 PM6/10/15

to Oren Eini (Ayende Rahien), raft...@googlegroups.com

Hi Oren,

Thanks for your comments.

Yes, the use-case than FAB tries to address is a specific one which makes the consensus protocol simpler. Raft does address a more generic use-case. FAB works for the use-case where we want to have configuration updates be sent to a cluster of nodes and the configuration updates are dependent on each other.

For the comment about dynamic addition and Garbage Collection, the problem mentioned by you is solved by having the software layer using FAB take a snapshot of the configuration at the leader and provide that configuration when a node joins the cluster. I have not described those details in the paper, but it does actually work.

Thanks again for reading the paper so quickly and providing your insightful comments.

Regards,

Bhavin Thaker.

Diego Ongaro

unread,

Sep 26, 2015, 9:38:26 PM9/26/15

to Bhavin Thaker, raft...@googlegroups.com

Hey Bhavin,

I finally read through your paper (yeah, I know, it took me a while). I have a lot of comments that I think would help you improve the paper, a few questions, especially about membership changes, and a few items that would probably lead to interesting discussions. I'll say a few things here, but let's try to chat in real time.

It's clear that you put a lot of time and effort into polishing the paper, and I appreciate that. I think there may be interesting design points for "sequential Raft" (like FAB) and "register Raft" (full current state overwritten in every round). However, I'm still not sure that this paper in its current form meets its goals: is it significantly simpler than Raft, and can it be understood independently of Raft? I also wonder about some of the design choices (for example, the overloading of RequestVote).

Is it significantly simpler than Raft? I think FAB has to be a lot simpler than Raft to make it worthwhile. For example, if FAB has 90% of the complexity of Raft, but Raft has a much larger ecosystem (talks, papers, proofs, implementations, developers), I think we'd both agree that it's not worth it. On the other hand, if FAB has 50% of the complexity of Raft, then I think we'd both agree that FAB is a good choice for some systems. Actually, I'm really curious to hear the number in your gut right now. Mine is about 85% right now, but that may drop after getting some clarification from you (the unresolved ambiguities in my head increase perceived complexity).

Can it be understood independently of Raft? I'm actually not sure whether this was a goal of yours or not, but I think the paper is probably neither here nor there right now. If it IS a goal, I'm too tainted to say with confidence, but I think it's missing some important explanations (e.g., why do we want a log? What do we do with it? Why do we care about majorities? What about snapshots? Client interaction?). The way to work though these is to find fresh people and make them read it (we did that a ton with the Raft paper). If it's NOT a goal, then you might try to explain (some) things in terms of Raft more, building down from Raft rather than building up from scratch. Either way, this paper is crying out for a section that compares explicitly with Raft, assuming familiarity of Raft. What's different, how does that help understandability, how does correctness still hold, and what does FAB give up in terms of features/performance?

BTW, any progress on releasing the source code?

I'll send a follow-up privately to arrange a chat.

Best,

Diego

Bhavin Thaker

unread,

Sep 28, 2015, 10:17:28 PM9/28/15

to Diego Ongaro, raft...@googlegroups.com

Hi Diego,

I sincerely appreciate you spending time to read the FAB paper and sharing your comments.

FAB started out because I could borrow concepts from Raft and Paxos and simplify these protocols for my use-case of replicating configuration updates. This approach worked fine for the problem that I was working on and I found FAB simple enough to design and implement it in 2 months. The FAB paper was written in around 4 days with only formatting changes incorporated later. The design and implementation has survived internal testing so far but that does not necessarily prove its correctness. The use-case is narrow enough for its simplicity.

In terms of ease of understanding, I shared the FAB paper internally with many colleagues and they found it easy to understand. Many of them had never read the Paxos or Raft papers and still understood the design. I shared the FAB paper with a high-school student and he understood and appreciated the design. I have shared the FAB paper externally on researchgate.net for anybody who wants to use it. You can explore the stats though they could be interpreted in many ways.

http://www.researchgate.net/profile/Bhavin_Thaker/stats

I think Raft is a rich protocol that works in many scenarios and you have justifiably polished its design and implementation and also proved its correctness. I appreciate your work on it. FAB just targets a narrow use-case. Whether somebody find FAB useful or easy-to-understand is their viewpoint. As an example, with due respect to Mr. Leslie Lamport, many of my colleagues found the paper, "Paxos made simple", hard to understand, probably because of the mathematically-oriented presentation style of the paper, though Mr. Lamport may disagree with my colleagues.

http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf

Currently, I think that it may be difficult to open-source the FAB implementation since FAB is part of a closed-source system and uses its communication mechanisms. However, since I could design and implement FAB in 2 months, I am confident that somebody else can do it quicker/better. I will let you know if I can make progress on this aspect.

I would love to chat with you and learn more about your other comments.

Regards,

Bhavin Thaker.

Юрий Соколов

unread,

Sep 29, 2015, 3:33:35 PM9/29/15

to raft-dev

Did you read ZAB (Zookeeper Atomic Broadcast)?

Bhavin Thaker

unread,

Sep 30, 2015, 1:44:10 AM9/30/15

to raft...@googlegroups.com

See reference #6 in the FAB paper.

Bhavin Thaker.

On Sep 29, 2015 12:33 PM, "Юрий Соколов" <funny....@gmail.com> wrote:

Did you read ZAB (Zookeeper Atomic Broadcast)?

Reply all

Reply to author

Forward