commit index in copycat

Terry Tan

Feb 14, 2017, 9:37:59 AM

to raft-dev

Hi Jordan,

I have been confused about the commit index in Copycat. I found that you do not persist it to disk. Why? And if a server crashes, how does the crashed server recover its commit index?

jordan.h...@gmail.com

Feb 14, 2017, 12:33:12 PM

to raft...@googlegroups.com

The commit index does not need to be persisted to disk. Only the log, the current term, and the vote need to be stored on disk. If the entire cluster is taken down and then restarted (so the commitIndex is lost on all servers), the Raft protocol is designed such that any leader will have all committed entries and will simply recommit and reapply them to the state machine. The protocol also ensures that any new operations submitted after a cluster restart take effect after (in logical time) the leader has recommitted all earlier changes. Because the protocol is designed this way, storing the commit index gains nothing and only adds overhead.

This can only really cause issues if the state machine is persistent. But even if Copycat or other implementations were to persist the lastApplied index to try to prevent changes from being applied to a persistent state machine twice, there would still be a race wherein a persistent state machine could apply a command and the server could crash before persisting the lastApplied index, so the persistent state machine still has to ensure commands are idempotent.
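
One way a persistent state machine can get the idempotence described above is to store the index of the last applied entry in the same atomic write as the command's effect, so that a replayed entry is detected and skipped. The following is only a minimal sketch, assuming a SQLite-backed counter. `PersistentCounter`, the table layout, and the `apply(index, delta)` signature are made up for illustration and are not Copycat's API.

```python
import sqlite3


class PersistentCounter:
    """Hypothetical persistent state machine: one counter stored in SQLite."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        with self.db:
            # A single row holds both the value and the last applied log index.
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS state ("
                "id INTEGER PRIMARY KEY CHECK (id = 1), "
                "value INTEGER NOT NULL, "
                "last_applied INTEGER NOT NULL)"
            )
            self.db.execute("INSERT OR IGNORE INTO state VALUES (1, 0, 0)")

    def apply(self, index, delta):
        """Apply log entry `index` (add `delta`). Replaying an old index is a no-op."""
        with self.db:  # one transaction: the effect and the index commit together
            cur = self.db.execute(
                "UPDATE state SET value = value + ?, last_applied = ? "
                "WHERE id = 1 AND last_applied < ?",
                (delta, index, index),
            )
            return cur.rowcount == 1  # False means the entry was already applied

    def value(self):
        return self.db.execute("SELECT value FROM state WHERE id = 1").fetchone()[0]


if __name__ == "__main__":
    sm = PersistentCounter()
    sm.apply(1, 5)
    sm.apply(2, 3)
    # After a restart the leader recommits from the start of the log;
    # the replayed entries are skipped instead of being applied twice.
    sm.apply(1, 5)
    sm.apply(2, 3)
    print(sm.value())  # prints 8
```

Because the value and `last_applied` change in a single transaction, a crash can never leave the command applied without its index recorded, which is the race a separately persisted lastApplied would have.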

--
You received this message because you are subscribed to the Google Groups "raft-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raft-dev+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Юрий Соколов

Feb 14, 2017, 1:26:12 PM

to raft-dev

There is a benefit to persisting the committed index for non-idempotent operations with in-memory storage (faster start after a crash), when combined with snapshots and specific implementations of the change log (an append-only log file that records the logical Raft position).

I don't know how Copycat implements it. Copycat probably doesn't need this.

jordan.h...@gmail.com

Feb 14, 2017, 2:40:26 PM

to raft...@googlegroups.com

Sure, there's nothing wrong with persisting the commitIndex, but the cluster typically will converge on that index very quickly (a leader election and commitment of one entry from the new leader's term) so I'm not sure it would save much time in practice.
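
To illustrate why the commitIndex converges so quickly after a full restart, here is a rough sketch of how a Raft leader can recompute it from the replication progress it learns from followers. The function name and argument layout are hypothetical and not taken from Copycat. The rule itself follows the Raft paper: the leader advances commitIndex to an index stored on a majority only if the entry at that index is from its current term, and everything before it is then committed as well.

```python
def advance_commit_index(log_terms, match_index, current_term, commit_index):
    """Return the leader's new commitIndex.

    log_terms[i - 1] is the term of the log entry at 1-based index i.
    match_index holds, for every server including the leader itself,
    the highest log index known to be replicated on that server.
    """
    majority = len(match_index) // 2 + 1
    # Highest index that is stored on at least a majority of servers.
    candidate = sorted(match_index, reverse=True)[majority - 1]
    # Only an entry from the leader's current term may be committed by
    # counting replicas; earlier entries become committed along with it.
    if candidate > commit_index and log_terms[candidate - 1] == current_term:
        return candidate
    return commit_index


if __name__ == "__main__":
    # Assumed scenario: a 3-server cluster restarted, so commitIndex is 0.
    # The new leader (term 3) holds four old entries plus one entry it
    # appended in its own term.
    log_terms = [1, 1, 2, 2, 3]
    print(advance_commit_index(log_terms, [5, 0, 0], 3, 0))  # prints 0
    print(advance_commit_index(log_terms, [5, 4, 0], 3, 0))  # prints 0
    print(advance_commit_index(log_terms, [5, 5, 0], 3, 0))  # prints 5
```

In the last call a single follower acknowledging the term-3 entry is enough to move commitIndex from 0 to 5, which recommits the four older entries in one step.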

Юрий Соколов

Feb 14, 2017, 3:38:44 PM

to raft...@googlegroups.com

In practice, there could be more than 32 GB of data to load into memory, plus secondary indices to build.

If those entries only count as "committed" after the server has talked to the leader, there will be a huge delay.

jordan.h...@gmail.com

Feb 14, 2017, 3:47:25 PM

to raft...@googlegroups.com

But the delay is not really going to be longer than it takes to converge on a commit index. Assuming you start a cluster and, in parallel, start the election protocol and begin applying known commits, all you're really gaining is the amount of time it takes to elect a leader and commit one entry. You're only beginning to apply commits that much sooner. It's only a significant benefit if you don't count the application of known committed entries as part of the startup time... which it is.

Terry Tan

Feb 15, 2017, 1:57:47 AM

to raft-dev

Hi Jordan,

I have read your code, and it does work the way you described. The reason you persist lastVoteFor on disk is to find out which server is the leader when the whole cluster is started. But I am still a little confused about the restore process. Suppose we start the whole cluster: we first start a server that used to be a follower, and it finds that it is not the leader, so it does nothing. Then we start the ex-leader, which learns from disk that it used to be the leader and starts working. Is that correct? One more thing: even if it is correct, how does the leader know from which point it should start issuing AppendEntries calls?

Terry Tan

Feb 15, 2017, 2:09:03 AM

to raft-dev

I also found that you delete the log every time you start the server. Why? That loses all the operations, so how can the state machine be restored from the log?
