Rules for Updating NextIndex[]

196 views
Skip to first unread message

Gurjant Kalsi

unread,
Apr 7, 2023, 5:20:06 PM4/7/23
to raft-dev
Hi Rafters,

I am a little confused about the rules for Updating NextIndex.

As far as I understand, NextIndex[F] tracks the next index in the Log that the Leader should send to Follower F. If the Leader and follower's logs have diverged, the Leader will continue to decrement NextIndex[F] and retry AppendEntries until the most recent common ancestor in their two logs is found.

I'm confused as to what the leader should do when AppendEntries finally succeeds. If the leader is only sending 1 entry at a time then it would seem reasonable to increment NextIndex[F] by 1 but the AppendEntries RPC allows the leader to send a bulk update. However the AppendEntriesRPC Reply doesn't seem to indicate the state of the follower's log when the AppendEntriesRPC succeeds. 

I don't see an obvious way for the Leader to appropriately update its NextIndex[F].

I hope this question was clear enough, any tips would be appreciated.

Regards,
Kalsi


jordan.h...@gmail.com

unread,
Apr 7, 2023, 5:41:10 PM4/7/23
to raft...@googlegroups.com
The leader should update the next index upon successful AppendEntries RPC, thus causing its next AppendEntries RPC to that follower to include the entries immediately following those in the successful request. When the follower acknowledges a successful AppendEntries RPC, that indicates it has appended all the entries in the request. Therefore, the leader’s nextIndex for that follower then becomes nextIndex + count(entries). That is, the leader should update the nextIndex to what it believes to be the followers last log index + 1, based on the successful AE RPC.

In practice, different implementations may want to track and update the nextIndex a bit differently due to optimizations like pipelining. There’s not really any risk to correctness in how the leader updates the nextIndex — since the log matching checks on the follower will always ensure consistency between the leader’s and followers’ logs anyways — so do what makes sense for your implementation. 

You also mentioned information not being included in the AppendEntries response. A lot of implementations actually do add the follower’s last log index to the response, and you should not shy away from doing so. This allows the leader to converge on the follower’s log more quickly — as opposed to decrementing one entry at a time — and the leader can use the lastLogIndex in the follower’s response to update the nextIndex for that follower.

Jordan Halterman

On Apr 7, 2023, at 2:20 PM, Gurjant Kalsi <gurjan...@gmail.com> wrote:

Hi Rafters,
--
You received this message because you are subscribed to the Google Groups "raft-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raft-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/raft-dev/5b50ddaf-9e37-43b8-b9e4-2f4a96359dc5n%40googlegroups.com.
Message has been deleted
Message has been deleted

John Ousterhout

unread,
Apr 14, 2023, 5:33:22 PM4/14/23
to raft...@googlegroups.com
One clarification on this. Updating nextIndex doesn't have to be perfect, since it will self-correct in future AppendEntries request, but information about which entries have been successfully appended is also used by the leader to update its commitIndex and this must absolutely be precise (or, at least it must never err on the side of thinking an entry has been replicated when it hasn't). Thus you probably need to track the received entries pretty carefully, and that same care might as well be applied to nextIndex.

-John-

On Wed, Apr 12, 2023 at 9:59 AM Gurjant Kalsi <gurjan...@gmail.com> wrote:
Thanks for the quick and helpful response. I'm not sure why my previous reply was deleted but I think you exactly answered my question.

Specifically I was concerned about a situation where the Leader sends two AppendEntries RPCs to Follower[F], one with E[0..n] entries and another with E[0..n+1] but only receives one response. Without any additional information in the AppendEntries response it seems like it'd be impossible for the Leader to know which of those two AppendEntries RPCs was successful. Ergo it'd also be impossible for the leader to update FollowerNextIndex[F] appropriately.
I guess what I was missing is that it technically doesn't matter and the algorithm will eventually converge since FollowerNextIndex[F] is intended as an estimate anyway.

Thanks for also suggesting that many implementations augment the AppendEntries response with the LastLogIndex and LastLogTerm. I think that actually simplifies the implementation a little bit and I agree that it seems that it would converge faster as well.

Thanks again for your prompt and helpful response!

Regards,
Kalsi

Reply all
Reply to author
Forward
0 new messages