Error recovering a failed machine in an etcd cluster: raft: tocommit(113) is out of range [lastInde

Vicky Singh

May 10, 2018, 5:46:53 PM

to CoreOS User

I have a 5-machine etcd cluster running version 3.3.5.

I lost the disk of one of the machines (infra1). I ran the commands listed below to take a snapshot from another machine and restore the state on the failed one, but etcd panics during startup with the following error. The failure can be reproduced by deleting the infra1.etcd directory.

2018-05-10 21:37:36.372723 I | rafthttp: established a TCP streaming connection with peer f94df198dfd1fae9 (stream Message writer)

2018-05-10 21:37:36.375263 I | rafthttp: peer e25bd2572f40d862 became active

2018-05-10 21:37:36.375272 I | rafthttp: established a TCP streaming connection with peer e25bd2572f40d862 (stream Message writer)

2018-05-10 21:37:36.376419 I | rafthttp: established a TCP streaming connection with peer f94df198dfd1fae9 (stream MsgApp v2 writer)

2018-05-10 21:37:36.376708 I | rafthttp: established a TCP streaming connection with peer e25bd2572f40d862 (stream MsgApp v2 writer)

2018-05-10 21:37:36.379499 I | rafthttp: peer 5085423e29a03b70 became active

2018-05-10 21:37:36.379508 I | rafthttp: established a TCP streaming connection with peer 5085423e29a03b70 (stream MsgApp v2 writer)

2018-05-10 21:37:36.379554 I | rafthttp: established a TCP streaming connection with peer 5085423e29a03b70 (stream Message writer)

2018-05-10 21:37:36.384789 I | rafthttp: established a TCP streaming connection with peer 5085423e29a03b70 (stream MsgApp v2 reader)

2018-05-10 21:37:36.384837 I | rafthttp: established a TCP streaming connection with peer 5085423e29a03b70 (stream Message reader)

2018-05-10 21:37:36.401149 I | etcdserver: ef4b0eeaaa716a7 initialzed peer connection; fast-forwarding 8 ticks (election ticks 10) with 3 active peer(s)

2018-05-10 21:37:36.409883 I | raft: ef4b0eeaaa716a7 [term: 1] received a MsgHeartbeat message with higher term from e25bd2572f40d862 [term: 23]

2018-05-10 21:37:36.409895 I | raft: ef4b0eeaaa716a7 became follower at term 23

2018-05-10 21:37:36.409905 C | raft: tocommit(113) is out of range [lastIndex(5)]. Was the raft log corrupted, truncated, or lost?

panic: tocommit(113) is out of range [lastIndex(5)]. Was the raft log corrupted, truncated, or lost?

goroutine 103 [running]:

github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc42013bba0, 0x1025c31, 0x5d, 0xc42185c620, 0x2, 0x2)

/tmp/etcd-release-3.3.5/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog/pkg_logger.go:75 +0x16d

github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.(*raftLog).commitTo(0xc4200e2150, 0x71)

/tmp/etcd-release-3.3.5/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/log.go:191 +0x15c

github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.(*raft).handleHeartbeat(0xc420226100, 0x8, 0xef4b0eeaaa716a7, 0xe25bd2572f40d862, 0x17, 0x0, 0x0, 0x0, 0x0, 0x0, ...)

/tmp/etcd-release-3.3.5/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/raft.go:1194 +0x54

github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.stepFollower(0xc420226100, 0x8, 0xef4b0eeaaa716a7, 0xe25bd2572f40d862, 0x17, 0x0, 0x0, 0x0, 0x0, 0x0, ...)

/tmp/etcd-release-3.3.5/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/raft.go:1140 +0x439

github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.(*raft).Step(0xc420226100, 0x8, 0xef4b0eeaaa716a7, 0xe25bd2572f40d862, 0x17, 0x0, 0x0, 0x0, 0x0, 0x0, ...)

/tmp/etcd-release-3.3.5/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/raft.go:868 +0x1465

github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.(*node).run(0xc4202ac360, 0xc420226100)

/tmp/etcd-release-3.3.5/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/node.go:323 +0x113e

created by github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.RestartNode

/tmp/etcd-release-3.3.5/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/node.go:223 +0x321
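To relate the stack trace to the log lines, here is a small sketch that converts the hex frame arguments to decimal. It assumes that the second argument of commitTo (0x71) is the commit index and the fifth argument of handleHeartbeat (0x17) is the message term, which matches the values in the log but is my reading of the trace, not something etcd prints explicitly. The comparison at the end is only a simplified model of the check that panics, not etcd's code.

```shell
#!/bin/sh
# Go prints stack-frame arguments in hex; convert the two of interest.
tocommit=$(printf '%d' 0x71)   # second argument of commitTo in the trace
term=$(printf '%d' 0x17)       # fifth argument of handleHeartbeat in the trace
last_index=5                   # lastIndex reported in the panic message

echo "tocommit=$tocommit term=$term lastIndex=$last_index"

# Simplified model of the follower-side check (an assumption, not etcd source):
# the heartbeat asks the follower to commit an index its local log does not contain.
if [ "$tocommit" -gt "$last_index" ]; then
    echo "out of range: tocommit($tocommit) > lastIndex($last_index)"
fi
```

So the restored member has a log ending at index 5, while the heartbeat from the leader at term 23 asks it to commit up to index 113.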

1. Command to get the snapshot: ETCDCTL_API=3 ./etcdctl --endpoints 10.0.0.1:2379 snapshot save snapshot.db

2. Command to restore the snapshot: sudo ETCDCTL_API=3 ./etcdctl snapshot --data-dir ./infra1.etcd restore ~/snapshot.db --name infra1 --initial-cluster infra0=http://10.0.0.1:2380,infra1=http://10.0.0.2:2380,infra2=http://10.0.0.3:2380,infra3=http://10.0.0.4:2380,infra4=http://10.0.0.5:2380 --initial-cluster-token etcd-cluster-1 --initial-advertise-peer-urls http://10.0.0.2:2380

3. Command to start the service:

etcd --name infra1 --initial-advertise-peer-urls http://10.0.0.2:2380 \
--listen-peer-urls http://10.0.0.2:2380 \
--listen-client-urls http://10.0.0.2:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.0.2:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://10.0.0.1:2380,infra1=http://10.0.0.2:2380,infra2=http://10.0.0.3:2380,infra3=http://10.0.0.4:2380,infra4=http://10.0.0.5:2380,infra5=http://96.17.173.111:2380 \
--initial-cluster-state existing
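Since the --initial-cluster value is long and repeated in several commands, a quick way to compare the list given to the restore command with the one given to etcd at startup is sketched below. The helper name missing_from is made up for illustration; the two strings are copied from the restore and start commands in this post.

```shell
#!/bin/sh
# --initial-cluster as passed to `etcdctl snapshot restore`.
restore_cluster="infra0=http://10.0.0.1:2380,infra1=http://10.0.0.2:2380,infra2=http://10.0.0.3:2380,infra3=http://10.0.0.4:2380,infra4=http://10.0.0.5:2380"
# --initial-cluster as passed to `etcd` when starting infra1.
start_cluster="infra0=http://10.0.0.1:2380,infra1=http://10.0.0.2:2380,infra2=http://10.0.0.3:2380,infra3=http://10.0.0.4:2380,infra4=http://10.0.0.5:2380,infra5=http://96.17.173.111:2380"

# Hypothetical helper: print every member of list $1 that is absent from list $2.
missing_from() {
    for member in $(printf '%s' "$1" | tr ',' ' '); do
        case ",$2," in
            *",$member,"*) ;;                  # present in both lists
            *) printf '%s\n' "$member" ;;      # only in the first list
        esac
    done
}

echo "only in start command:"
missing_from "$start_cluster" "$restore_cluster"
echo "only in restore command:"
missing_from "$restore_cluster" "$start_cluster"
```

With the values above, the start command lists one member (infra5) that the restore command does not. I do not know whether that difference is related to the panic.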

4. Commands used to start the service on each member of the cluster:

etcd --name infra0 --initial-advertise-peer-urls http://10.0.0.1:2380 \
--listen-peer-urls http://10.0.0.1:2380 \
--listen-client-urls http://10.0.0.1:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.0.1:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://10.0.0.1:2380,infra1=http://10.0.0.2:2380,infra2=http://10.0.0.3:2380,infra3=http://10.0.0.4:2380,infra4=http://10.0.0.5:2380,infra5=http://96.17.173.111:2380 \
--initial-cluster-state new

etcd --name infra1 --initial-advertise-peer-urls http://10.0.0.2:2380 \
--listen-peer-urls http://10.0.0.2:2380 \
--listen-client-urls http://10.0.0.2:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.0.2:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://10.0.0.1:2380,infra1=http://10.0.0.2:2380,infra2=http://10.0.0.3:2380,infra3=http://10.0.0.4:2380,infra4=http://10.0.0.5:2380,infra5=http://96.17.173.111:2380 \
--initial-cluster-state new

./etcd --name infra2 --initial-advertise-peer-urls http://10.0.0.3:2380 \
--listen-peer-urls http://10.0.0.3:2380 \
--listen-client-urls http://10.0.0.3:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.0.3:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://10.0.0.1:2380,infra1=http://10.0.0.2:2380,infra2=http://10.0.0.3:2380,infra3=http://10.0.0.4:2380,infra4=http://10.0.0.5:2380,infra5=http://96.17.173.111:2380 \
--initial-cluster-state new

./etcd --name infra3 --initial-advertise-peer-urls http://10.0.0.4:2380 \
--listen-peer-urls http://10.0.0.4:2380 \
--listen-client-urls http://10.0.0.4:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.0.4:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://10.0.0.1:2380,infra1=http://10.0.0.2:2380,infra2=http://10.0.0.3:2380,infra3=http://10.0.0.4:2380,infra4=http://10.0.0.5:2380,infra5=http://96.17.173.111:2380 \
--initial-cluster-state new

./etcd --name infra4 --initial-advertise-peer-urls http://10.0.0.5:2380 \
--listen-peer-urls http://10.0.0.5:2380 \
--listen-client-urls http://10.0.0.5:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.0.5:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://10.0.0.1:2380,infra1=http://10.0.0.2:2380,infra2=http://10.0.0.3:2380,infra3=http://10.0.0.4:2380,infra4=http://10.0.0.5:2380,infra5=http://96.17.173.111:2380 \
--initial-cluster-state new

Vicky
