Checking RAFT Leader and influencing Election

Abubakar Siddique

unread,
May 19, 2015, 6:11:35 AM5/19/15
to onos-d...@onosproject.org
Hi all,

Is it possible to check who the current leader is (who won the latest election) in the RAFT cluster, and if so, how?

Also, is it possible to influence the RAFT leader election somehow? I.e., I want a specific machine running ONOS to be the leader.

Thanks
Abubakar

Madan Jampani

unread,
May 19, 2015, 2:52:22 PM5/19/15
to Abubakar Siddique, onos-d...@onosproject.org
From the karaf console you can run "partitions" to view the various Raft partitions. The output shows the nodes in each partition, the leader of each partition, and the current leadership term.
You'll find p0..pN partitions, where N is the number of nodes in the cluster.
The p0 partition encompasses all nodes in the cluster and is backed by an in-memory log. The p0 partition is used by applications that require strong consistency for coordination (e.g. leader election), not for durable data storage.
p1..pN, on the other hand, form the partitioned, durable, consistent data store and can be used to initialize a ConsistentMap. The size of each of the p1..pN partitions is N if N <= 2, and 3 if N > 2.
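To make the ConsistentMap point concrete, here is a minimal sketch of initializing one from an application, assuming the StorageService builder API of that release (the map name and serializer choice are illustrative):

    import org.onosproject.store.serializers.KryoNamespaces;
    import org.onosproject.store.service.ConsistentMap;
    import org.onosproject.store.service.Serializer;
    import org.onosproject.store.service.StorageService;

    public class MyAppStore {
        // In a real OSGi component this reference would be injected.
        private StorageService storageService;

        ConsistentMap<String, String> createMap() {
            // Keys are hashed across the durable p1..pN partitions,
            // so updates to keys in different partitions proceed in parallel.
            return storageService.<String, String>consistentMapBuilder()
                    .withName("my-app-map")  // hypothetical map name
                    .withSerializer(Serializer.using(KryoNamespaces.API))
                    .build();
        }
    }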

We currently do not have any capabilities to influence Raft leader election. I suppose one way to do that would be to set an artificially low heartbeat timeout for the node you want elected as leader. While this does not guarantee that the desired node will always be the leader, it does give it an advantage over the other nodes when it comes to leader election.
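As a toy illustration (not ONOS code) of why a lower timeout biases the election: the follower whose randomized timeout fires first becomes a candidate first, and in Raft the first candidate to request votes usually wins.

    import java.util.Random;

    public class ElectionTimeoutDemo {
        public static void main(String[] args) {
            Random rand = new Random();
            // Standard Raft draws a random election timeout from a fixed
            // window, e.g. 150-300 ms.
            int normalNode = 150 + rand.nextInt(150);
            // A node configured with an artificially low window (here a
            // hypothetical 50-100 ms) almost always times out first.
            int favoredNode = 50 + rand.nextInt(50);
            System.out.printf("normal node fires at %d ms, favored node at %d ms%n",
                    normalNode, favoredNode);
            // The favored node starts its candidacy first in every run here,
            // but in a real cluster this is only a bias, not a guarantee.
        }
    }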

Madan.

Abubakar Siddique

unread,
May 20, 2015, 4:15:43 AM5/20/15
to onos-d...@onosproject.org, guitarax...@gmail.com
Hi Madan,

Thanks for the answer, I'll surely test it out. This heartbeat, is it the one using port 2419, or is it specific to the RAFT TCP connection?

Also, if I want to change it, where should I do so?

Regards
Abubakar

Abubakar Siddique

unread,
May 20, 2015, 12:37:11 PM5/20/15
to onos-d...@onosproject.org
Just to add, my current output for the distributed system is this:

onos> partitions
----------------------------------------------------------
Name          Term          Members
----------------------------------------------------------
p0               1          tcp://192.168.56.101:7238 *
                            tcp://192.168.56.102:7238
----------------------------------------------------------
p1               1          tcp://192.168.56.101:7238
                            tcp://192.168.56.102:7238 *
----------------------------------------------------------
p2               1          tcp://192.168.56.101:7238
                            tcp://192.168.56.102:7238 *
----------------------------------------------------------

So if I interpret this correctly, 192.168.56.101 is the leader for p0, and p0 is only used for RAFT leader election, including the RAFT heartbeat, etc.?

And regarding multiple partitions: why are multiple partitions implemented? Does it mean that if, for example, there is a change in topology that is notified to the controller at 192.168.56.101, it then has to inform the leader of the data store at 192.168.56.102 in both p1 and p2, i.e. twice?

Regards
Abubakar

Madan Jampani

unread,
May 20, 2015, 2:09:12 PM5/20/15
to Abubakar Siddique, onos-d...@onosproject.org
Yes. The * next to the node indicates that it is the leader for that partition.

There are two notions of leadership. One is internal to Raft, and the other is a separate leadership concept that is exposed as a distributed coordination primitive in ONOS via the LeadershipService.
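For the application-level notion, a hedged sketch of how an app might contest and query leadership via the LeadershipService (the topic name is hypothetical):

    import org.onosproject.cluster.LeadershipService;
    import org.onosproject.cluster.NodeId;

    public class LeaderCheck {
        // Injected via the usual OSGi reference in a real component.
        private LeadershipService leadershipService;

        void electAndQuery() {
            // Enter the election for an application-defined topic.
            leadershipService.runForLeadership("my-app-topic");

            // Query the current leader for that topic. Note this is the
            // coordination-primitive leadership, not the internal Raft leader.
            NodeId leader = leadershipService.getLeader("my-app-topic");
            System.out.println("Current leader: " + leader);
        }
    }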

What the partitions output shows you are the leaders elected as part of the Raft protocol to coordinate log replication. Each partition has its own leader, and the leader is responsible for heartbeating to the followers in that partition. All updates submitted to a partition are serialized through the leader of that partition; that is what ensures strong consistency. Details of the Raft protocol can be found in this paper: https://ramcloud.stanford.edu/raft.pdf

Multiple partitions exist for scalability reasons. As I mentioned, updates within a partition are serialized. Updates in different partitions proceed independently of each other. We implemented a distributed, strongly consistent map abstraction by mapping keys (in the map) to different partitions, thereby "sharding" the key space. As the size of the cluster grows, one can simply scale by adding more partitions and/or redistributing existing partitions.
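A simple illustration of the sharding idea (this is not the actual ONOS hashing code, just a sketch of mapping keys onto the durable partitions p1..pN):

    public class PartitionRouting {
        static int partitionFor(String key, int numPartitions) {
            // Mask off the sign bit so the modulus is non-negative,
            // then map the hash onto partitions 1..N.
            return ((key.hashCode() & 0x7fffffff) % numPartitions) + 1;
        }

        public static void main(String[] args) {
            int n = 3;  // e.g. a 3-partition durable store
            for (String key : new String[] {"device:1", "link:7", "host:42"}) {
                System.out.printf("%s -> p%d%n", key, partitionFor(key, n));
            }
        }
    }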

Madan.

Abubakar Siddique

unread,
May 26, 2015, 12:25:54 PM5/26/15
to onos-d...@onosproject.org, guitarax...@gmail.com
Hi Madan,

Regarding the p0 partition: this is for any arbitrary app? I mean, p0 can be used for leader election by any app while maintaining strong consistency, and a consistent map is not sharded there?

For the others, p1..pN, I understood that at any time no more than 3 nodes in a partition are responsible for a shard of the consistent map. Does this mean that only 3 nodes hold a given shard of the consistent map? And if a controller outside the shard, say a member of p1, wants to access something from the shard in p2, does it contact the leader of p2, or is there some other methodology?

I applaud your patience in my queries :)

Abubakar

Madan Jampani

unread,
May 26, 2015, 12:34:03 PM5/26/15
to Abubakar Siddique, onos-d...@onosproject.org
You are right.

When building a ConsistentMap, one can use the withPartitionsDisabled() method on the builder, which puts all the data for that map in the p0 partition. In addition to not being sharded, data placed in this partition is also not persisted, i.e. a full cluster restart will wipe out all data in the p0 partition. The primary advantage of the p0 partition is that it spans the entire cluster, i.e. as long as a majority of the cluster is up and running, the p0 partition can take updates and serve queries.
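In code, that looks like the earlier builder sketch with one extra call (again, the map name is illustrative):

    // Places all of the map's data in the in-memory p0 partition:
    // cluster-wide availability, but wiped on a full cluster restart.
    ConsistentMap<String, String> coordMap =
            storageService.<String, String>consistentMapBuilder()
                    .withName("my-coordination-map")   // hypothetical name
                    .withSerializer(Serializer.using(KryoNamespaces.API))
                    .withPartitionsDisabled()          // keep everything in p0
                    .build();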

For p1..pN partitions, if a node that is not a member of the shard wants to access data in that shard, it merely forwards the request to the "leader" of that shard.

Madan.
