raft "locations" concept proposal


Samo Pogačnik

Nov 22, 2016, 3:42:57 PM
to raft-dev
Hello,

While wondering whether client data could be propagated in a channelized way towards a specific group of Raft cluster nodes, I came up with the following. Such a group may consist of nodes behind any kind of higher-latency networking infrastructure (e.g. WAN lines, firewalls, routers, …), or simply of nodes that need to be treated as a separate administrative domain.

For the purpose of this proposal, the term “location” represents a group of cluster nodes (a Raft sub-cluster).

The concept behind this proposal is to extend the Raft Consensus Algorithm so that a Raft cluster is aware of the different locations its nodes belong to. Locations may be organised hierarchically, each forming a separate Raft sub-cluster. To establish the hierarchical relation, a subordinate location presents itself to its superior location exclusively via its currently elected leader, so the whole subordinate location counts as a single Raft node within the superior location. Any node of the subordinate location may become the sub-cluster leader and thereby also a follower within the superior location. A subordinate leader might even become the superior leader if all of the superior location's own nodes died.

Presumably the location leader maintains two separate logs: one for its follower role in the superior location and one for its leadership of the subordinate location.
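To make the structure concrete, here is a minimal Python sketch of the idea. The names `Location`, `LocationLeaderState`, `voting_members`, `superior_log` and `subordinate_log`, as well as the `Top-1`/`Top-2` nodes, are hypothetical and not part of Raft or of any defined protocol extension. The sketch only models membership and the two logs, not elections or replication.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Location:
    """A group of nodes forming one Raft sub-cluster (hypothetical model)."""
    name: str
    nodes: List[str]
    leader: Optional[str] = None  # currently elected leader, if any
    subordinates: List["Location"] = field(default_factory=list)

    def voting_members(self) -> List[str]:
        """Own nodes plus one representative (its leader) per subordinate location."""
        members = list(self.nodes)
        for sub in self.subordinates:
            if sub.leader is not None:  # a leaderless location is not represented
                members.append(sub.leader)
        return members


@dataclass
class LocationLeaderState:
    """Assumed per-leader state: one log per role the leader plays."""
    node_id: str
    superior_log: List[str] = field(default_factory=list)     # filled as a follower
    subordinate_log: List[str] = field(default_factory=list)  # replicated as a leader


europe = Location("Europe", ["Europe-1", "Europe-2", "Europe-3"], leader="Europe-2")
top = Location("Top", ["Top-1", "Top-2"], subordinates=[europe])
print(top.voting_members())
```

In this model the three Europe nodes appear in the superior location as the single member `Europe-2`, which is the "one location counts as one node" property described above.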

I am aware that the exact protocol extension is yet to be defined, that many potential issues are yet to be tackled, and that scenarios remain to be explored (e.g. whether the subordination roles of locations may switch dynamically in a specific situation, …).

regards, Samo

Oren Eini (Ayende Rahien)

Nov 23, 2016, 1:22:18 AM
to raft...@googlegroups.com
What does that actually give you?
Assume you have three data centers (Asia, Europe, US), with 3 nodes in each.

So you have two tiers of the cluster.

Top: (Asia, Europe, US)

2nd: (Asia-1, Asia-2, Asia-3, Europe-1, Europe-2, Europe-3, US-1, US-2, US-3)

Now, if the current leader is in Europe, we can commit by first committing in Europe (Europe-1, Europe-2) and then letting the US cluster know, so it commits there too (US-1, US-3).
You save the cost of waiting for a response from Asia, which is great, and compared with a flat 9-node cluster you need only 4 nodes for a commit instead of 5.

However, there is significant complexity here. When do you commit locally and when globally? How do you handle truncating the distributed log when you have multiple logs?
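The 4-versus-5 figure above can be checked with a few lines of Python. This is only a sketch under an assumed counting rule (a location counts once a majority of its nodes have acknowledged, and a majority of locations must count). The function names are made up for illustration.

```python
from typing import List


def majority(n: int) -> int:
    """Smallest number of members that forms a strict majority of n."""
    return n // 2 + 1


def flat_quorum(total_nodes: int) -> int:
    """Acks needed when all nodes form one ordinary Raft cluster."""
    return majority(total_nodes)


def two_tier_quorum(nodes_per_location: List[int]) -> int:
    """Fewest node acks needed under the assumed two-tier rule."""
    needed_locations = majority(len(nodes_per_location))
    # Pick the locations whose local majorities are cheapest to reach.
    cheapest = sorted(majority(n) for n in nodes_per_location)[:needed_locations]
    return sum(cheapest)


print(flat_quorum(9))              # one flat cluster of 9 nodes
print(two_tier_quorum([3, 3, 3]))  # three locations of 3 nodes each
```

With three locations of three nodes each, the two-tier rule needs two local majorities of two nodes, whereas the flat cluster needs a majority of nine.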




Samo Pogačnik

Nov 23, 2016, 1:48:18 PM
to raft-dev

Hi,

What "locations" actually give you is, generally speaking, flexibility.

There may be different node/cluster set-ups with different requirements on the quality of the data stored at each location. One could, for example, trade off commit speed to achieve global commits, and vice versa.

Speaking of your interesting scenario, I imagine it would be possible to achieve a safe write into a distributed data store by successfully reaching only two top-tier nodes. It all depends on how “location”-local commits are counted towards the consensus majority in the top-tier (primary) “location”. The minimum requirement for the commit to survive is not to take any additional “location” into account at all. Of course, data may reach such a “location” eventually, rather than before the client receives the commit acknowledgement.
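One possible way of counting local commits towards the top-tier majority is sketched below in Python. The rule itself is an assumption made for illustration (a location is locally committed once a majority of its nodes acknowledge, and the entry is globally committed once a majority of locations are locally committed). The helper names are hypothetical.

```python
from typing import Dict, Set


def majority(n: int) -> int:
    """Smallest number of members that forms a strict majority of n."""
    return n // 2 + 1


def locally_committed(members: Set[str], acks: Set[str]) -> bool:
    """True if a majority of this location's nodes acknowledged the entry."""
    return len(members & acks) >= majority(len(members))


def globally_committed(locations: Dict[str, Set[str]], acks: Set[str]) -> bool:
    """True if a majority of locations are locally committed (assumed rule)."""
    done = [name for name, members in locations.items()
            if locally_committed(members, acks)]
    return len(done) >= majority(len(locations))


locations = {
    "Asia": {"Asia-1", "Asia-2", "Asia-3"},
    "Europe": {"Europe-1", "Europe-2", "Europe-3"},
    "US": {"US-1", "US-2", "US-3"},
}
acks = {"Europe-1", "Europe-2", "US-1", "US-3"}
print(globally_committed(locations, acks))
```

Under this rule the entry counts as committed with Europe and US only, and Asia would receive it eventually. A different rule, such as requiring every location, would trade commit speed for stronger global guarantees.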

Going wild, there is a "window of opportunity" to set up two (maybe three) “locations” that switch their primary role based on the current count of live nodes at each “location”, or after clients are switched from accessing one “location” to another.

I am not sure how to assess the complexity of such “things”, which must be significant. However, IMHO “locations” do not break any fundamental concepts of Raft, including its goal of being understandable.

regards, Samo




Philip Haynes

Nov 23, 2016, 4:35:36 PM
to raft-dev
Hi Samo,

Conceptually, I like the notion of a recursive RAFT structure. Engineering-wise, I would have concerns: fundamentals like communication systems, timing and batching all change once you cross the boundary from LAN to WAN, and the differences are significant enough to make a recursive structure redundant.

That said, there could be use cases where DNS-esque transactions make sense. Do you have a view on what these might be?

Philip

Samo Pogačnik

Nov 23, 2016, 5:58:18 PM
to raft-dev
Hi Philip,

Honestly, I cannot provide any facts on the subject, but I think that in a heterogeneous network environment it might be "easier" to set optimal protocol parameters and to maintain data flows across LAN-to-WAN boundaries in a more channelized (one-to-one) manner.

regards, Samo


Samo Pogačnik

Nov 24, 2016, 10:35:51 AM
to raft-dev
Hi,

It might be a good idea to rename "locations" to "recursive domains", to reflect a "recursive RAFT structure" (thanks Philip).

Samo Pogačnik

Dec 4, 2016, 3:11:27 PM
to raft-dev
Hi,

I prepared some initial material about potential "Raft Recursive Domains" functionality (see: http://u2up.net/RaftRecursiveDomains-draft-20161204.pdf). Comments are most appreciated.

Thanks, Samo
