Side channel for raft snapshots

Viacheslav Biriukov

Oct 24, 2018, 6:17:04 PM
to etcd-dev
Hello, 

I'm quite confused about how to properly implement a side channel for Raft snapshots.
I want to implement pull logic, where a follower downloads a snapshot from the leader over a side channel (a non-Raft protocol).

So, as far as I understand (using raftexample names), it should be something like the following:

for {
    select {
    // ...
    case rd := <-rc.node.Ready():
        rc.wal.Save(rd.HardState, rd.Entries) // #1

        if !raft.IsEmptySnap(rd.Snapshot) {
            // retry until the snapshot has been downloaded over the side channel
            for {
                if err := rc.downloadSnapshotFromSideChannel(rd.Snapshot); err == nil { // #2
                    break
                }
            }

            rc.saveSnap(rd.Snapshot) // #3
            rc.raftStorage.ApplySnapshot(rd.Snapshot)
            rc.publishSnapshot(rd.Snapshot)
        }

        rc.raftStorage.Append(rd.Entries)
        rc.transport.Send(rd.Messages)
        if ok := rc.publishEntries(rc.entriesToApply(rd.CommittedEntries)); !ok {
            rc.stop()
            return
        }
        rc.maybeTriggerSnapshot()
        rc.node.Advance()
    }
}


So my question is:
  1. If my process is killed during the download stage (downloadSnapshotFromSideChannel, #2), is it OK to have saved the latest HardState and Entries at #1 without having saved the Snapshot at #3? I'm thinking about process restarts and replaying the WAL during them. Is it safe?
Thank you in advance! 

-- 
BR,
Viacheslav Birukov

Xiang Li

Oct 24, 2018, 7:49:48 PM
to v.v.bi...@gmail.com, etcd...@googlegroups.com
It is OK as long as you have not replied anything to the leader. The key is not to reply to the leader until the snapshot restore has finished.

Viacheslav Biriukov

Oct 30, 2018, 7:16:18 PM
to etcd-dev
Thank you for your answer!

But it looks like the WAL can be corrupted during the snapshot restore process. More info and steps to reproduce are in the issue: https://github.com/etcd-io/etcd/issues/10219
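One way such a corruption could arise, stated as an assumption rather than a diagnosis of the linked issue: suppose, as in etcd's raft package, that restoring a snapshot advances the follower's commit index to the snapshot's index, so the HardState delivered alongside rd.Snapshot already carries that index. If that HardState reaches the WAL at #1 and the process dies before #3, a restart finds a commit index pointing past every entry it can replay. The sketch below makes that condition explicit with a hypothetical diskState type and checkRecoverable helper; persisting the snapshot before the HardState that refers to it is one ordering that avoids it.

```go
package main

import "fmt"

// diskState is a hypothetical summary of what a node finds on disk at restart.
type diskState struct {
	SnapshotIndex uint64 // index of the last snapshot actually saved (0 if none)
	LastEntry     uint64 // index of the last entry replayed from the WAL
	Commit        uint64 // HardState.Commit as persisted in the WAL
}

// checkRecoverable returns an error when the persisted commit index points
// past everything that can be rebuilt from the saved snapshot plus WAL entries.
func checkRecoverable(s diskState) error {
	last := s.LastEntry
	if s.SnapshotIndex > last {
		last = s.SnapshotIndex
	}
	if s.Commit > last {
		return fmt.Errorf("commit index %d is beyond last recoverable index %d", s.Commit, last)
	}
	return nil
}

func main() {
	// Killed between #1 and #3: Commit was advanced to the incoming snapshot's
	// index (100 here), but the snapshot was never saved and the WAL ends at 40.
	fmt.Println(checkRecoverable(diskState{LastEntry: 40, Commit: 100}))
	// prints: commit index 100 is beyond last recoverable index 40

	// Same crash, but with the snapshot persisted before the HardState.
	fmt.Println(checkRecoverable(diskState{SnapshotIndex: 100, LastEntry: 40, Commit: 100}))
	// prints: <nil>
}
```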