Suppose the following:
- The leader L is about to send an AppendEntries containing the log entry with index N to follower F.
- L has already received confirmation from M - 1 nodes that log entry N has been successfully received and added to their logs (i.e., itself plus M - 2 other followers).
- L's current commitIndex is N - 1.
Then in the AppendEntries that L sends to F, it sets leaderCommit to N, instead of N - 1 as it would normally do.
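To make the proposal concrete, here is a minimal sketch of how the leader might pick leaderCommit for one outgoing AppendEntries. The function name and parameters are hypothetical and not taken from any Raft implementation; `majority` plays the role of M above. The sketch assumes that the entry belongs to the leader's current term, that `acks` counts only nodes that have durably stored the entry (the leader included, once its own write has completed), and that F is not yet among them.

```python
def optimistic_leader_commit(commit_index: int, entry_index: int,
                             acks: int, majority: int) -> int:
    """Pick the leaderCommit value for an AppendEntries carrying entry_index.

    Hypothetical helper for illustration only.
    acks: nodes (leader included) already known to hold entry_index durably;
          the target follower is assumed NOT to be counted yet.
    majority: quorum size (M in the scenario above).
    """
    # If the target follower accepting this entry would complete a majority,
    # the entry becomes committed at the moment the follower appends it,
    # so the follower can be told about it in the same message.
    if entry_index > commit_index and acks + 1 >= majority:
        return entry_index
    # Otherwise fall back to the usual behaviour: send the current commitIndex.
    return commit_index


# 3-node cluster (majority 2): the leader alone has persisted entry 5.
print(optimistic_leader_commit(commit_index=4, entry_index=5, acks=1, majority=2))
# 5-node cluster (majority 3): the leader alone is not enough.
print(optimistic_leader_commit(commit_index=4, entry_index=5, acks=1, majority=3))
```

The follower side is unchanged in this sketch: it would still advance its own commitIndex only after the AppendEntries consistency check passes and the entry has been appended.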
Hibernating Rhinos Ltd
Oren Eini l CEO l Mobile: + 972-52-548-6969
Office: +972-4-622-7811 l Fax: +972-153-4-622-7811
--
You received this message because you are subscribed to the Google Groups "raft-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raft-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
It _has_ to first write to the disk, and only then send it out.
I think the point is it *is* an optimization in a 2-3 node cluster, because the leader can write to disk and send all AppendEntries requests with the new commitIndex since it knows itself plus any follower is a majority.
> On Oct 7, 2016, at 1:53 PM, Юрий Соколов <funny....@gmail.com> wrote:
>
> I doubt it could ever be an optimization, because the optimized version sends AppendEntries to all followers (and writes to disk) simultaneously.
> So this optimization will do meaningful work only when sending to lagging replicas.
> If you don't have a non-lagging majority, then you already have serious trouble.
>
BTW, fsync is necessary in any mode, IMHO. If you think a consensus protocol is viable without fsync... Let's not discuss it here, I don't really want to hear that.
In normal mode (the majority is not lagging), 10.2.1 gives the leader a latency of "one network round + one write to disk", while the discussed optimization is still "two writes to disk + one network round". On a follower (even with a cluster of 3 and a majority of 2), 10.2.1 gives a latency of "3/2 network round + one write to disk", while the discussed optimization gives "two writes to disk + 1/2 network round".
For durable writes, the cost is around 300 us on Linux, around 120 us on Windows. Note that those are SSD drives; on HDD, you will see about 700 us (and that is without any contention).
I've said exactly that. I just measure in network round trips, so it came out as "2*Ta + 1/2*Tr" and "Ta + 3/2*Tr".
And you forget about leader latency: with 10.2.1 its latency is "Ta + 2*Tc", and with the discussed optimization it is "2*Ta + 2*Tc".
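For anyone who wants to plug in their own numbers, here is a small sketch of the arithmetic behind the formulas quoted in this thread. It reads Ta as the time of one durable write, Tr as one network round trip, and Tc as a one-way trip (so 2*Tc = Tr); that reading is an assumption based on how the formulas line up with the prose descriptions. The 300 us write time is the Linux SSD figure mentioned above; the 500 us round trip is a made-up value used only for illustration.

```python
def latencies(ta_us: float, tr_us: float) -> dict:
    """Latency formulas from the thread, in microseconds.

    ta_us: one durable write (Ta); tr_us: one network round trip (Tr = 2*Tc).
    The dictionary keys are hypothetical labels, not terms from the thread.
    """
    return {
        "leader_10_2_1": ta_us + tr_us,                # Ta + 2*Tc
        "leader_discussed": 2 * ta_us + tr_us,         # 2*Ta + 2*Tc
        "follower_10_2_1": ta_us + 1.5 * tr_us,        # Ta + 3/2*Tr
        "follower_discussed": 2 * ta_us + 0.5 * tr_us, # 2*Ta + 1/2*Tr
    }


# Ta = 300 us (Linux SSD figure from the thread), Tr = 500 us (assumed).
for name, value in latencies(300, 500).items():
    print(f"{name}: {value} us")
```

Taking the formulas at face value, the two follower expressions are equal when Tr = Ta, so the discussed variant only lowers follower latency when a round trip costs more than a durable write, and on the leader it always costs one extra Ta.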