Lose data after ack to client?


H Wang

Jun 3, 2024, 1:51:18 AM
to raft-dev
For a cluster with 3 nodes, the leader gets an ack from itself and one follower, then acks to the client with success. Then the leader crashes (and does not restart). Of the remaining two nodes, since the last log entry was replicated to only one follower, will that entry be lost?

Oren Eini (Ayende Rahien)

Jun 3, 2024, 3:15:20 AM
to raft...@googlegroups.com
No, because only the follower with that entry can then be elected leader.
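
That is guaranteed by Raft's election restriction (section 5.4.1 of the paper): a node grants its vote only to a candidate whose log is at least as up-to-date as its own, so the follower holding the committed entry cannot lose the election to the follower that lacks it. A minimal sketch of that check in Go (names are illustrative, not from any particular implementation):

package main

import "fmt"

// logUpToDate reports whether a candidate's log (described by the term and
// index of its last entry) is at least as up-to-date as the voter's log.
// This is the comparison from section 5.4.1: compare the terms of the last
// entries first, then break ties on log length (last index).
func logUpToDate(candTerm, candIndex, voterTerm, voterIndex uint64) bool {
    if candTerm != voterTerm {
        return candTerm > voterTerm
    }
    return candIndex >= voterIndex
}

func main() {
    // The follower that has the committed entry (last entry: term 3, index 5)
    // refuses to vote for the follower missing it (last entry: term 3, index 4)...
    fmt.Println(logUpToDate(3, 4, 3, 5)) // false: vote denied
    // ...while the reverse direction succeeds, so only the follower that has
    // the entry can gather a quorum of votes and become leader.
    fmt.Println(logUpToDate(3, 5, 3, 4)) // true: vote granted
}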

On Mon, Jun 3, 2024 at 8:51 AM H Wang <howard.w...@gmail.com> wrote:
For a cluster with 3 nodes, the leader gets an ack from itself and one follower, then acks to the client with success. Then the leader crashes (and does not restart). Of the remaining two nodes, since the last log entry was replicated to only one follower, will that entry be lost?


H Wang

Jun 3, 2024, 3:58:09 PM
to raft-dev
Here is a very specific scenario: there are 3 nodes (S1, S2, S3), and the network is partitioned into P1 (containing S1 and S2) and P2 (containing S3). S1 crashes, and then a fresh new S1, which has forgotten its latest state, snapshot, WAL logs, etc., but has the same IP and configuration, starts in P2. S1 and S3 now have a quorum, and S3 becomes leader. The committed log entries, which are on S2 but were not replicated to S3 due to the network partition, will be lost.

This is quite an unusual situation (and may be outside the scope of Raft). I would appreciate any practical advice on how to address the durability issue.

Archie Cobbs

Jun 3, 2024, 4:04:34 PM
to raft...@googlegroups.com
On Mon, Jun 3, 2024 at 2:58 PM H Wang <howard.w...@gmail.com> wrote:
a fresh new S1, which has forgotten its latest state, snapshot, WAL logs, etc., but has the same IP and configuration, starts in P2. S1 and S3 now have a quorum.

What you described there is not considered Raft. Raft nodes are not allowed to "forget" yet stay members of their cluster.

If a Raft node forgets, it must forget any previous cluster membership as well and therefore would have to be added back manually in order to rejoin.

To properly implement this, include configuration changes in the log, and always derive your cluster membership state entirely from the log. Then if the log disappears, so does your configuration (automatically).

The configuration you derive from an empty log is always "I am not associated with any cluster".
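
A minimal sketch of that idea in Go (the types and names here are hypothetical, just to illustrate deriving membership from the log rather than from a separate config file):

package main

import "fmt"

// entryKind distinguishes normal commands from configuration-change entries.
type entryKind int

const (
    kindCommand entryKind = iota
    kindConfigChange
)

// logEntry is a simplified Raft log entry; a config-change entry carries the
// full member set that takes effect at that index.
type logEntry struct {
    term    uint64
    kind    entryKind
    members []string // set only when kind == kindConfigChange
}

// membershipFromLog derives the cluster membership purely from the log by
// taking the most recent configuration-change entry. An empty log (or one
// with no config entries) yields no membership at all.
func membershipFromLog(log []logEntry) ([]string, bool) {
    for i := len(log) - 1; i >= 0; i-- {
        if log[i].kind == kindConfigChange {
            return log[i].members, true
        }
    }
    return nil, false
}

func main() {
    // A node that wiped its disk restarts with an empty log, so it derives
    // no configuration and must be explicitly re-added before it can vote.
    if _, ok := membershipFromLog(nil); !ok {
        fmt.Println("not associated with any cluster")
    }
}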

-Archie
 
--
Archie L. Cobbs