Leadership lost while committing log

AJ

Nov 17, 2024, 4:19:06 PM
to raft-dev
Hi, 

I have initialized a Raft node and am adding 2 peers one by one. The first peer is added successfully, but adding the second peer fails with the error "leadership lost while committing log".
 
It seems that after adding the first peer, the original leader (cluster-aus) loses leadership due to a re-election. I'm not sure if this is normal behavior. What is the correct way to add the two peers to form the cluster?
 
The exact same workflow works fine in a unit test. However, when running on a set of VM nodes, it hits the issues below. I'm not sure whether some timer values are influencing the protocol behavior (see the timer sketch after the log excerpt below).
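For reference, the join step is essentially the following sketch. This assumes hashicorp/raft, since the error string matches its raft.ErrLeadershipLost; addPeer is an illustrative wrapper, not the exact code.

package cluster

import (
	"fmt"
	"time"

	"github.com/hashicorp/raft"
)

// addPeer asks the current leader to add a new voting member and blocks
// until the configuration-change entry is committed or the attempt fails.
func addPeer(r *raft.Raft, id raft.ServerID, addr raft.ServerAddress) error {
	// AddVoter returns an IndexFuture; Error() waits for the commit.
	if err := r.AddVoter(id, addr, 0, 10*time.Second).Error(); err != nil {
		// raft.ErrLeadershipLost is the exact error in the logs below:
		// the leader stepped down before the entry was committed.
		return fmt.Errorf("add voter %s: %w", id, err)
	}
	return nil
}

// Called as: addPeer(r, "cluster-nyc", "192.168.133.170:30653"), then
// addPeer(r, "cluster-sjc", "192.168.133.178:30653").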

The leader node: cluster-aus
Peer 1 node: cluster-nyc
Peer 2 node: cluster-sjc

Below is the chronology of the events. 
  • time="2024-11-17 20:48:03.898223" level=info msg="#### raft node state transition ####" current_leader=cluster-aus new_state=Leader node_id=cluster-aus previousS_state=UnKnown 
  • time="2024-11-17 20:48:03.898277" level=info msg="adding peer node to the cluster" current_leader=cluster-aus peer_node_addr=192.168.133.170 peer_node_id=cluster-nyc peer_node_port=30653 
  • time="2024-11-17 20:48:03.898879" level=info msg="the node joined into the cluster" leader_node_id=cluster-aus node_id=cluster-aus peer_node_id=cluster-nyc 
<<< Perhaps not required, but I added a 1-second delay to let the first peer join the cluster >>>
  • time="2024-11-17 20:48:04.899394" level=info msg="adding peer node to the cluster" current_leader=cluster-aus peer_node_addr=192.168.133.178 peer_node_id=cluster-sjc peer_node_port=30653 
  • time="2024-11-17 20:48:08.900554" level=error msg="failed to join the node into the cluster" current_leader= error="failed to add the node to the raft cluster via leader cluster-aus, error leadership lost while committing log" node_id=cluster-aus peer_node_id=cluster-sjc 

AJ. 


Jason Aten

Aug 16, 2025, 4:13:22 PM
to raft-dev
Hi AJ,

On bootstrapping, Section 4.4, page 46 of the dissertation says:

"Instead, we recommend that the very first 
time a cluster is created, one server is initialized 
with a configuration entry as the first entry 
in its log. This configuration lists only that one 
server; it alone forms a majority of its configuration, 
so it can consider this configuration committed. 
Other servers from then on should be initialized 
with empty logs; they are added to the cluster 
and learn of the current configuration through 
the membership change mechanism."

If you have this bootstrap log entry in place for (only) the designated leader's log, and you have the pre-vote and sticky-leader optimizations, then you should definitely not be seeing premature leader churn, because the designated leader will always have the longer log as nodes join initially.
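In hashicorp/raft terms (assuming that library, given the error string), the dissertation's advice is the single-server bootstrap below. It runs on the designated leader only, with the other nodes started on empty logs; the address here is a placeholder:

package cluster

import (
	"log"

	"github.com/hashicorp/raft"
)

// bootstrapFirstNode writes the initial one-server configuration as the
// first entry in the designated leader's log. That single server is a
// majority of its own configuration, so the entry commits immediately.
func bootstrapFirstNode(r *raft.Raft) {
	cfg := raft.Configuration{
		Servers: []raft.Server{
			{ID: "cluster-aus", Address: "<cluster-aus-host>:30653"}, // placeholder address
		},
	}
	if err := r.BootstrapCluster(cfg).Error(); err != nil {
		log.Fatalf("bootstrap failed: %v", err)
	}
	// cluster-nyc and cluster-sjc are never bootstrapped; they start with
	// empty logs and join via AddVoter on the leader.
}

One pitfall worth ruling out: if more than one node calls BootstrapCluster, you end up with competing single-node clusters, which can show up as exactly this kind of leadership churn.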
