If you don't mind, I'll use this thread for questions and issues?
I was finally able to make rqlite work on our platform, but I had to start a single node, wait for it to become leader and then start the 2 other nodes. I thought raft could handle multiple nodes coming up at the same time?
However, I'm unable to restart my nodes and return to a good state afterwards. I'm restarting them 1 by 1, but maybe too fast?
This is my rqlited command:
rqlited \
-disco-id $DISCO_ID \
-http-addr "
0.0.0.0:4001" -http-adv-addr "[$PRIVATE_NET_IP]:4001" \
-raft-addr "
0.0.0.0:4002" -raft-adv-addr "[$PRIVATE_NET_IP]:4002" \
-raft-timeout 5s -raft-election-timeout 10s \
-on-disk \
/data/db
(I upped the raft timeouts, but I don't think it helped)
I'm getting the following logs for a single node after all nodes have restarted:
2020-12-05T16:21:50.006Z [INFO] raft: entering follower state: follower="Node at [::]:4002 [Follower]" leader=
2020-12-05T16:21:55.668Z [INFO] raft: duplicate requestVote for same term: term=16
2020-12-05T16:21:55.671Z [WARN] raft: duplicate requestVote from: candidate=[::]:4002
2020-12-05T16:22:00.005Z [ERROR] raft: failed to appendEntries to: peer="{Voter [fdaa:0:33:a7b:85:0:3d3:2]:4002 [fdaa:0:33:a7b:85:0:3d3:2]:4002}" error="dial tcp [fdaa:0:33:a7b:85:0:3d3:2]:4002: i/o timeout"
2020-12-05T16:22:00.891Z [ERROR] raft: failed to heartbeat to: peer=[fdaa:0:33:a7b:85:0:3d3:2]:4002 error="dial tcp [fdaa:0:33:a7b:85:0:3d3:2]:4002: i/o timeout"
2020-12-05T16:22:10.022Z [ERROR] raft: failed to appendEntries to: peer="{Voter [fdaa:0:33:a7b:85:0:3d3:2]:4002 [fdaa:0:33:a7b:85:0:3d3:2]:4002}" error="dial tcp [fdaa:0:33:a7b:85:0:3d3:2]:4002: i/o timeout"
2020-12-05T16:22:11.849Z [ERROR] raft: failed to heartbeat to: peer=[fdaa:0:33:a7b:85:0:3d3:2]:4002 error="dial tcp [fdaa:0:33:a7b:85:0:3d3:2]:4002: i/o timeout"
[store] 2020/12/05 16:22:22 received request to join node at [fdaa:0:33:a7b:ab2:0:3d2:2]:4002
[store] 2020/12/05 16:22:23 received request to join node at [fdaa:0:33:a7b:85:0:3d3:2]:4002
[store] 2020/12/05 16:22:27 received request to join node at [fdaa:0:33:a7b:ab2:0:3d2:2]:4002
[store] 2020/12/05 16:22:28 received request to join node at [fdaa:0:33:a7b:85:0:3d3:2]:4002
[store] 2020/12/05 16:22:32 received request to join node at [fdaa:0:33:a7b:ab2:0:3d2:2]:4002
[store] 2020/12/05 16:22:34 received request to join node at [fdaa:0:33:a7b:85:0:3d3:2]:4002
[store] 2020/12/05 16:22:38 received request to join node at [fdaa:0:33:a7b:ab2:0:3d2:2]:4002
[store] 2020/12/05 16:22:39 received request to join node at [fdaa:0:33:a7b:85:0:3d3:2]:4002
[store] 2020/12/05 16:22:43 received request to join node at [fdaa:0:33:a7b:ab2:0:3d2:2]:4002
[store] 2020/12/05 16:22:44 received request to join node at [fdaa:0:33:a7b:85:0:3d3:2]:4002
[store] 2020/12/05 16:22:55 received request to join node at [fdaa:0:33:a7b:ab2:0:3d2:2]:4002
[store] 2020/12/05 16:22:56 received request to join node at [fdaa:0:33:a7b:85:0:3d3:2]:4002
[store] 2020/12/05 16:23:00 received request to join node at [fdaa:0:33:a7b:ab2:0:3d2:2]:4002
[store] 2020/12/05 16:23:01 received request to join node at [fdaa:0:33:a7b:85:0:3d3:2]:4002
[store] 2020/12/05 16:23:05 received request to join node at [fdaa:0:33:a7b:ab2:0:3d2:2]:4002
[store] 2020/12/05 16:23:07 received request to join node at [fdaa:0:33:a7b:85:0:3d3:2]:4002
[store] 2020/12/05 16:23:11 received request to join node at [fdaa:0:33:a7b:ab2:0:3d2:2]:4002
[store] 2020/12/05 16:23:12 received request to join node at [fdaa:0:33:a7b:85:0:3d3:2]:4002
[store] 2020/12/05 16:23:16 received request to join node at [fdaa:0:33:a7b:ab2:0:3d2:2]:4002
[store] 2020/12/05 16:23:17 received request to join node at [fdaa:0:33:a7b:85:0:3d3:2]:4002
2020-12-05T16:23:33.323Z [WARN] raft: heartbeat timeout reached, starting election: last-leader=[::]:4002
2020-12-05T16:23:33.326Z [INFO] raft: entering candidate state: node="Node at [::]:4002 [Candidate]" term=18
2020-12-05T16:23:33.330Z [INFO] raft: duplicate requestVote for same term: term=18
2020-12-05T16:23:33.332Z [WARN] raft: duplicate requestVote from: candidate=[::]:4002
2020-12-05T16:23:33.334Z [INFO] raft: election won: tally=2
2020-12-05T16:23:33.336Z [INFO] raft: entering leader state: leader="Node at [::]:4002 [Leader]"
2020-12-05T16:23:33.337Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:ab2:0:3d2:2]:4002
2020-12-05T16:23:33.339Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:85:0:3d3:2]:4002
2020-12-05T16:23:33.342Z [INFO] raft: entering follower state: follower="Node at [::]:4002 [Follower]" leader=
2020-12-05T16:23:40.730Z [WARN] raft: heartbeat timeout reached, starting election: last-leader=
2020-12-05T16:23:40.733Z [INFO] raft: entering candidate state: node="Node at [::]:4002 [Candidate]" term=19
2020-12-05T16:23:40.738Z [INFO] raft: duplicate requestVote for same term: term=19
2020-12-05T16:23:40.740Z [WARN] raft: duplicate requestVote from: candidate=[::]:4002
2020-12-05T16:23:40.742Z [INFO] raft: election won: tally=2
2020-12-05T16:23:40.743Z [INFO] raft: entering leader state: leader="Node at [::]:4002 [Leader]"
2020-12-05T16:23:40.744Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:ab2:0:3d2:2]:4002
2020-12-05T16:23:40.746Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:85:0:3d3:2]:4002
2020-12-05T16:23:40.751Z [INFO] raft: entering follower state: follower="Node at [::]:4002 [Follower]" leader=
2020-12-05T16:23:40.822Z [WARN] raft: appendEntries rejected, sending older logs: peer="{Voter [fdaa:0:33:a7b:85:0:3d3:2]:4002 [fdaa:0:33:a7b:85:0:3d3:2]:4002}" next=14
2020-12-05T16:23:50.692Z [WARN] raft: heartbeat timeout reached, starting election: last-leader=
2020-12-05T16:23:50.694Z [INFO] raft: entering candidate state: node="Node at [::]:4002 [Candidate]" term=20
2020-12-05T16:23:50.698Z [INFO] raft: duplicate requestVote for same term: term=20
2020-12-05T16:23:50.700Z [WARN] raft: duplicate requestVote from: candidate=[::]:4002
2020-12-05T16:23:50.701Z [INFO] raft: election won: tally=2
2020-12-05T16:23:50.702Z [INFO] raft: entering leader state: leader="Node at [::]:4002 [Leader]"
2020-12-05T16:23:50.704Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:ab2:0:3d2:2]:4002
2020-12-05T16:23:50.706Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:85:0:3d3:2]:4002
2020-12-05T16:23:50.709Z [INFO] raft: entering follower state: follower="Node at [::]:4002 [Follower]" leader=
2020-12-05T16:23:50.778Z [WARN] raft: appendEntries rejected, sending older logs: peer="{Voter [fdaa:0:33:a7b:85:0:3d3:2]:4002 [fdaa:0:33:a7b:85:0:3d3:2]:4002}" next=14
2020-12-05T16:23:56.298Z [WARN] raft: heartbeat timeout reached, starting election: last-leader=
2020-12-05T16:23:56.305Z [INFO] raft: entering candidate state: node="Node at [::]:4002 [Candidate]" term=21
2020-12-05T16:23:56.310Z [INFO] raft: duplicate requestVote for same term: term=21
2020-12-05T16:23:56.312Z [WARN] raft: duplicate requestVote from: candidate=[::]:4002
2020-12-05T16:23:56.313Z [INFO] raft: election won: tally=2
2020-12-05T16:23:56.314Z [INFO] raft: entering leader state: leader="Node at [::]:4002 [Leader]"
2020-12-05T16:23:56.315Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:ab2:0:3d2:2]:4002
2020-12-05T16:23:56.317Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:85:0:3d3:2]:4002
2020-12-05T16:23:56.319Z [INFO] raft: entering follower state: follower="Node at [::]:4002 [Follower]" leader=
2020-12-05T16:23:56.390Z [WARN] raft: appendEntries rejected, sending older logs: peer="{Voter [fdaa:0:33:a7b:85:0:3d3:2]:4002 [fdaa:0:33:a7b:85:0:3d3:2]:4002}" next=14
2020-12-05T16:24:05.151Z [WARN] raft: heartbeat timeout reached, starting election: last-leader=
2020-12-05T16:24:05.152Z [INFO] raft: entering candidate state: node="Node at [::]:4002 [Candidate]" term=22
2020-12-05T16:24:05.160Z [INFO] raft: duplicate requestVote for same term: term=22
2020-12-05T16:24:05.162Z [WARN] raft: duplicate requestVote from: candidate=[::]:4002
2020-12-05T16:24:05.164Z [INFO] raft: election won: tally=2
2020-12-05T16:24:05.165Z [INFO] raft: entering leader state: leader="Node at [::]:4002 [Leader]"
2020-12-05T16:24:05.166Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:ab2:0:3d2:2]:4002
2020-12-05T16:24:05.168Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:85:0:3d3:2]:4002
2020-12-05T16:24:05.171Z [INFO] raft: entering follower state: follower="Node at [::]:4002 [Follower]" leader=
2020-12-05T16:24:05.242Z [WARN] raft: appendEntries rejected, sending older logs: peer="{Voter [fdaa:0:33:a7b:85:0:3d3:2]:4002 [fdaa:0:33:a7b:85:0:3d3:2]:4002}" next=14
2020-12-05T16:24:12.606Z [WARN] raft: heartbeat timeout reached, starting election: last-leader=
2020-12-05T16:24:12.608Z [INFO] raft: entering candidate state: node="Node at [::]:4002 [Candidate]" term=23
2020-12-05T16:24:12.614Z [INFO] raft: duplicate requestVote for same term: term=23
2020-12-05T16:24:12.615Z [WARN] raft: duplicate requestVote from: candidate=[::]:4002
2020-12-05T16:24:12.617Z [INFO] raft: election won: tally=2
2020-12-05T16:24:12.618Z [INFO] raft: entering leader state: leader="Node at [::]:4002 [Leader]"
2020-12-05T16:24:12.619Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:ab2:0:3d2:2]:4002
2020-12-05T16:24:12.621Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:85:0:3d3:2]:4002
2020-12-05T16:24:12.624Z [INFO] raft: entering follower state: follower="Node at [::]:4002 [Follower]" leader=
2020-12-05T16:24:12.692Z [WARN] raft: appendEntries rejected, sending older logs: peer="{Voter [fdaa:0:33:a7b:85:0:3d3:2]:4002 [fdaa:0:33:a7b:85:0:3d3:2]:4002}" next=14
2020-12-05T16:24:19.933Z [WARN] raft: rejecting vote request since our last term is greater: candidate=[::]:4002 last-term=23 last-candidate-term=17
[store] 2020/12/05 16:24:25 received request to join node at [fdaa:0:33:a7b:ab2:0:3d2:2]:4002
2020-12-05T16:24:26.387Z [WARN] raft: previous log term mis-match: ours=18 remote=24
[store] 2020/12/05 16:24:30 received request to join node at [fdaa:0:33:a7b:ab2:0:3d2:2]:4002
[store] 2020/12/05 16:24:35 received request to join node at [fdaa:0:33:a7b:85:0:3d3:2]:4002
2020-12-05T16:24:35.907Z [WARN] raft: previous log term mis-match: ours=19 remote=25
[store] 2020/12/05 16:24:35 received request to join node at [fdaa:0:33:a7b:ab2:0:3d2:2]:4002
2020-12-05T16:24:38.173Z [INFO] raft: duplicate requestVote for same term: term=26
2020-12-05T16:24:38.176Z [WARN] raft: duplicate requestVote from: candidate=[::]:4002
2020-12-05T16:24:38.220Z [WARN] raft: clearing log suffix: from=14 to=19
[store] 2020/12/05 16:24:40 received request to join node at [fdaa:0:33:a7b:85:0:3d3:2]:4002
[store] 2020/12/05 16:24:41 received request to join node at [fdaa:0:33:a7b:ab2:0:3d2:2]:4002
[store] 2020/12/05 16:24:45 received request to join node at [fdaa:0:33:a7b:85:0:3d3:2]:4002
[store] 2020/12/05 16:24:46 received request to join node at [fdaa:0:33:a7b:ab2:0:3d2:2]:4002
[store] 2020/12/05 16:24:51 received request to join node at [fdaa:0:33:a7b:85:0:3d3:2]:4002
[store] 2020/12/05 16:24:57 received request to join node at [fdaa:0:33:a7b:ab2:0:3d2:2]:4002
2020-12-05T16:24:57.707Z [WARN] raft: heartbeat timeout reached, starting election: last-leader=[::]:4002
2020-12-05T16:24:57.711Z [INFO] raft: entering candidate state: node="Node at [::]:4002 [Candidate]" term=27
2020-12-05T16:24:57.717Z [INFO] raft: duplicate requestVote for same term: term=27
2020-12-05T16:24:57.719Z [WARN] raft: duplicate requestVote from: candidate=[::]:4002
2020-12-05T16:24:57.720Z [INFO] raft: election won: tally=2
2020-12-05T16:24:57.721Z [INFO] raft: entering leader state: leader="Node at [::]:4002 [Leader]"
2020-12-05T16:24:57.722Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:ab2:0:3d2:2]:4002
2020-12-05T16:24:57.724Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:85:0:3d3:2]:4002
2020-12-05T16:24:57.726Z [INFO] raft: entering follower state: follower="Node at [::]:4002 [Follower]" leader=
[store] 2020/12/05 16:25:01 received request to join node at [fdaa:0:33:a7b:85:0:3d3:2]:4002
[store] 2020/12/05 16:25:02 received request to join node at [fdaa:0:33:a7b:ab2:0:3d2:2]:4002
2020-12-05T16:25:03.130Z [WARN] raft: heartbeat timeout reached, starting election: last-leader=
2020-12-05T16:25:03.133Z [INFO] raft: entering candidate state: node="Node at [::]:4002 [Candidate]" term=28
2020-12-05T16:25:03.139Z [INFO] raft: duplicate requestVote for same term: term=28
2020-12-05T16:25:03.141Z [WARN] raft: duplicate requestVote from: candidate=[::]:4002
2020-12-05T16:25:03.142Z [INFO] raft: election won: tally=2
2020-12-05T16:25:03.143Z [INFO] raft: entering leader state: leader="Node at [::]:4002 [Leader]"
2020-12-05T16:25:03.144Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:ab2:0:3d2:2]:4002
2020-12-05T16:25:03.146Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:85:0:3d3:2]:4002
2020-12-05T16:25:03.148Z [INFO] raft: entering follower state: follower="Node at [::]:4002 [Follower]" leader=
2020-12-05T16:25:03.221Z [WARN] raft: appendEntries rejected, sending older logs: peer="{Voter [fdaa:0:33:a7b:85:0:3d3:2]:4002 [fdaa:0:33:a7b:85:0:3d3:2]:4002}" next=15
2020-12-05T16:25:09.179Z [WARN] raft: heartbeat timeout reached, starting election: last-leader=
2020-12-05T16:25:09.182Z [INFO] raft: entering candidate state: node="Node at [::]:4002 [Candidate]" term=29
2020-12-05T16:25:09.188Z [INFO] raft: duplicate requestVote for same term: term=29
2020-12-05T16:25:09.190Z [WARN] raft: duplicate requestVote from: candidate=[::]:4002
2020-12-05T16:25:09.191Z [ERROR] raft: failed to make requestVote RPC: target="{Voter [fdaa:0:33:a7b:85:0:3d3:2]:4002 [fdaa:0:33:a7b:85:0:3d3:2]:4002}" error=EOF
2020-12-05T16:25:09.194Z [INFO] raft: election won: tally=2
2020-12-05T16:25:09.195Z [INFO] raft: entering leader state: leader="Node at [::]:4002 [Leader]"
2020-12-05T16:25:09.196Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:ab2:0:3d2:2]:4002
2020-12-05T16:25:09.198Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:85:0:3d3:2]:4002
2020-12-05T16:25:09.200Z [ERROR] raft: failed to appendEntries to: peer="{Voter [fdaa:0:33:a7b:85:0:3d3:2]:4002 [fdaa:0:33:a7b:85:0:3d3:2]:4002}" error=EOF
2020-12-05T16:25:09.203Z [INFO] raft: entering follower state: follower="Node at [::]:4002 [Follower]" leader=
2020-12-05T16:25:15.045Z [WARN] raft: heartbeat timeout reached, starting election: last-leader=
2020-12-05T16:25:15.048Z [INFO] raft: entering candidate state: node="Node at [::]:4002 [Candidate]" term=30
2020-12-05T16:25:15.054Z [INFO] raft: duplicate requestVote for same term: term=30
2020-12-05T16:25:15.056Z [WARN] raft: duplicate requestVote from: candidate=[::]:4002
2020-12-05T16:25:15.057Z [INFO] raft: election won: tally=2
2020-12-05T16:25:15.058Z [INFO] raft: entering leader state: leader="Node at [::]:4002 [Leader]"
2020-12-05T16:25:15.060Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:ab2:0:3d2:2]:4002
2020-12-05T16:25:15.061Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:85:0:3d3:2]:4002
2020-12-05T16:25:15.064Z [INFO] raft: entering follower state: follower="Node at [::]:4002 [Follower]" leader=
2020-12-05T16:25:20.928Z [WARN] raft: heartbeat timeout reached, starting election: last-leader=
2020-12-05T16:25:20.931Z [INFO] raft: entering candidate state: node="Node at [::]:4002 [Candidate]" term=31
2020-12-05T16:25:20.936Z [INFO] raft: duplicate requestVote for same term: term=31
2020-12-05T16:25:20.937Z [WARN] raft: duplicate requestVote from: candidate=[::]:4002
2020-12-05T16:25:20.940Z [INFO] raft: election won: tally=2
2020-12-05T16:25:20.941Z [INFO] raft: entering leader state: leader="Node at [::]:4002 [Leader]"
2020-12-05T16:25:20.943Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:ab2:0:3d2:2]:4002
2020-12-05T16:25:20.945Z [INFO] raft: added peer, starting replication: peer=[fdaa:0:33:a7b:85:0:3d3:2]:4002
2020-12-05T16:25:20.947Z [INFO] raft: entering follower state: follower="Node at [::]:4002 [Follower]" leader=
2020-12-05T16:25:25.001Z [INFO] raft: duplicate requestVote for same term: term=31
2020-12-05T16:25:25.003Z [WARN] raft: duplicate requestVote from: candidate=[::]:4002
2020-12-05T16:25:25.027Z [WARN] raft: clearing log suffix: from=15 to=19
2020-12-05T16:25:25.057Z [ERROR] raft: failed to make requestVote RPC: target="{Voter [fdaa:0:33:a7b:85:0:3d3:2]:4002 [fdaa:0:33:a7b:85:0:3d3:2]:4002}" error="dial tcp [fdaa:0:33:a7b:85:0:3d3:2]:4002: i/o timeout"
2020-12-05T16:25:25.064Z [ERROR] raft: failed to appendEntries to: peer="{Voter [fdaa:0:33:a7b:85:0:3d3:2]:4002 [fdaa:0:33:a7b:85:0:3d3:2]:4002}" error="dial tcp [fdaa:0:33:a7b:85:0:3d3:2]:4002: i/o timeout"
It appears to be going in a loop, not being able to do its thing.
Here's lsof showing this node is listening on the right ports and has established connections:
rqlited 501 root 7u IPv6 542 0t0 TCP *:4002 (LISTEN)
rqlited 501 root 8u IPv6 860 0t0 TCP fly-local-6pn:47490->[fdaa:0:33:a7b:85:0:3d3:2]:4002 (ESTABLISHED)
rqlited 501 root 9u IPv6 882 0t0 TCP fly-local-6pn:4002->[fdaa:0:33:a7b:85:0:3d3:2]:42576 (ESTABLISHED)
rqlited 501 root 12u IPv6 863 0t0 TCP fly-local-6pn:47492->[fdaa:0:33:a7b:85:0:3d3:2]:4002 (ESTABLISHED)
rqlited 501 root 13u IPv6 596 0t0 TCP [::1]:59110->[::1]:4002 (ESTABLISHED)
rqlited 501 root 14u IPv6 597 0t0 TCP [::1]:4002->[::1]:59110 (ESTABLISHED)
rqlited 501 root 15u IPv6 562 0t0 TCP *:4001 (LISTEN)
rqlited 501 root 16u IPv6 884 0t0 TCP fly-local-6pn:4002->[fdaa:0:33:a7b:85:0:3d3:2]:42580 (ESTABLISHED)
rqlited 501 root 17u IPv6 866 0t0 TCP fly-local-6pn:4001->[fdaa:0:33:a7b:85:0:3d3:2]:40416 (ESTABLISHED)
rqlited 501 root 20u IPv6 872 0t0 TCP fly-local-6pn:4001->[fdaa:0:33:a7b:85:0:3d3:2]:40420 (ESTABLISHED)
rqlited 501 root 23u IPv6 876 0t0 TCP fly-local-6pn:4001->[fdaa:0:33:a7b:85:0:3d3:2]:40426 (ESTABLISHED)
Any ideas what's causing this?