Vault using Raft can't recover from downed leader node


GILESV FPV

Aug 26, 2019, 3:46:50 PM
to Vault
Hey everyone,

I've been playing around with the Raft integrated storage in Vault, and I can't get the cluster to recover from the leader node being killed off without manual intervention. My setup is as follows:

* Three AWS Fargate tasks in the same ECS cluster, each running Vault 1.2

* Two nodes have successfully joined the third, giving us this Raft configuration:

vault operator raft configuration -format=json

      "servers": [
        {
          "address": "ip-10-0-93-189.ec2.internal:8201",
          "leader": true,
          "node_id": "3620",
          "protocol_version": "3",
          "voter": true
        },
        {
          "address": "ip-10-0-70-60.ec2.internal:8201",
          "leader": false,
          "node_id": "20874",
          "protocol_version": "3",
          "voter": true
        },
        {
          "address": "ip-10-0-126-143.ec2.internal:8201",
          "leader": false,
          "node_id": "4411",
          "protocol_version": "3",
          "voter": true
        }
      ]


I want this setup to be fault tolerant, of course, so I've been testing what happens when one of the three nodes is stopped. When a standby node is killed, the ECS cluster creates a new Vault server and it automatically joins the existing cluster (thanks to a little scripting), which is perfect. However, whenever the leader node is stopped, the standby nodes log the following indefinitely and do not elect a new leader from the remaining two nodes. As a result, a replacement third node cannot join, because there is no leader.
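
For reference, the join step in my script boils down to something like this (the address is just an illustrative placeholder for the current leader's API endpoint, assuming the default API port 8200):

    # Run on the new node once its local Vault process is up.
    # The argument must point at the current leader's API address.
    vault operator raft join "http://ip-10-0-93-189.ec2.internal:8200"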

The warnings about TLS config have my attention as a possible culprit, since I have tls_disable = 1 set in the server config file.
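
For completeness, the relevant parts of my server config are shaped roughly like this (paths, hostnames, and the node_id are illustrative placeholders, not my exact values):

    storage "raft" {
      path    = "/vault/data"
      node_id = "node-a"
    }

    listener "tcp" {
      address         = "0.0.0.0:8200"
      cluster_address = "0.0.0.0:8201"
      tls_disable     = 1
    }

    api_addr     = "http://ip-10-0-93-189.ec2.internal:8200"
    cluster_addr = "https://ip-10-0-93-189.ec2.internal:8201"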

Logs from CloudWatch:

2019-08-26T19:31:01.599Z [WARN] storage.raft: Heartbeat timeout from "[::]:8201" reached, starting election
2019-08-26T19:31:01.599Z [INFO] storage.raft: Node at [::]:8201 [Candidate] entering Candidate state in term 3
2019-08-26T19:31:01.602Z [ERROR] storage.raft: Failed to make RequestVote RPC to {Voter 26824 ip-10-0-100-129.ec2.internal:8201}: dial tcp 10.0.100.129:8201: connect: connection refused
2019-08-26T19:31:01.689Z [INFO] storage.raft: Election won. Tally: 2
2019-08-26T19:31:01.689Z [INFO] storage.raft: Node at [::]:8201 [Leader] entering Leader state
2019-08-26T19:31:01.689Z [INFO] storage.raft: Added peer 26824, starting replication
2019-08-26T19:31:01.689Z [INFO] storage.raft: Added peer 25614, starting replication
2019-08-26T19:31:01.690Z [INFO] storage.raft: pipelining replication to peer {Voter 25614 ip-10-0-88-138.ec2.internal:8201}
2019-08-26T19:31:01.691Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 26824 ip-10-0-100-129.ec2.internal:8201}: dial tcp 10.0.100.129:8201: connect: connection refused
2019-08-26T19:31:01.696Z [INFO] core: acquired lock, enabling active operation
2019-08-26T19:31:01.702Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 26824 ip-10-0-100-129.ec2.internal:8201}: dial tcp 10.0.100.129:8201: connect: connection refused
2019-08-26T19:31:01.716Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 26824 ip-10-0-100-129.ec2.internal:8201}: dial tcp 10.0.100.129:8201: connect: connection refused
2019-08-26T19:31:01.735Z [INFO] core: post-unseal setup starting
2019-08-26T19:31:01.735Z [INFO] core: loaded wrapping token key
2019-08-26T19:31:01.735Z [INFO] core: successfully setup plugin catalog: plugin-directory=
2019-08-26T19:31:01.735Z [INFO] core: successfully mounted backend: type=system path=sys/
2019-08-26T19:31:01.736Z [INFO] core: successfully mounted backend: type=identity path=identity/
2019-08-26T19:31:01.736Z [INFO] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2019-08-26T19:31:01.737Z [INFO] core: successfully enabled credential backend: type=token path=token/
2019-08-26T19:31:01.737Z [INFO] core: restoring leases
2019-08-26T19:31:01.737Z [INFO] rollback: starting rollback manager
2019-08-26T19:31:01.737Z [INFO] identity: entities restored
2019-08-26T19:31:01.737Z [INFO] identity: groups restored
2019-08-26T19:31:01.737Z [INFO] expiration: lease restore complete
2019-08-26T19:31:01.738Z [INFO] core: post-unseal setup complete
2019-08-26T19:31:01.750Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 26824 ip-10-0-100-129.ec2.internal:8201}: dial tcp 10.0.100.129:8201: connect: connection refused
2019-08-26T19:31:01.867Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 26824 ip-10-0-100-129.ec2.internal:8201}: dial tcp 10.0.100.129:8201: connect: connection refused
2019-08-26T19:31:02.007Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 26824 ip-10-0-100-129.ec2.internal:8201}: dial tcp 10.0.100.129:8201: connect: connection refused
2019-08-26T19:31:02.230Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 26824 ip-10-0-100-129.ec2.internal:8201}: dial tcp 10.0.100.129:8201: connect: connection refused
2019-08-26T19:31:02.331Z [ERROR] storage.raft: Failed to heartbeat to ip-10-0-100-129.ec2.internal:8201: dial tcp 10.0.100.129:8201: connect: connection refused
2019-08-26T19:31:02.616Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 26824 ip-10-0-100-129.ec2.internal:8201}: dial tcp 10.0.100.129:8201: connect: connection refused
2019-08-26T19:31:03.066Z [ERROR] storage.raft: Failed to heartbeat to ip-10-0-100-129.ec2.internal:8201: dial tcp 10.0.100.129:8201: connect: connection refused
2019-08-26T19:31:03.349Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 26824 ip-10-0-100-129.ec2.internal:8201}: dial tcp 10.0.100.129:8201: connect: connection refused
2019-08-26T19:31:03.798Z [ERROR] storage.raft: Failed to heartbeat to ip-10-0-100-129.ec2.internal:8201: dial tcp 10.0.100.129:8201: connect: connection refused
2019-08-26T19:31:04.191Z [WARN] storage.raft: Failed to contact 26824 in 2.501957086s
2019-08-26T19:31:04.333Z [ERROR] storage.raft: Failed to heartbeat to ip-10-0-100-129.ec2.internal:8201: dial tcp 10.0.100.129:8201: connect: connection refused
2019-08-26T19:31:04.718Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 26824 ip-10-0-100-129.ec2.internal:8201}: dial tcp 10.0.100.129:8201: connect: connection refused
2019-08-26T19:31:04.878Z [ERROR] storage.raft: Failed to heartbeat to ip-10-0-100-129.ec2.internal:8201: dial tcp 10.0.100.129:8201: connect: connection refused
2019-08-26T19:31:05.769Z [ERROR] storage.raft: Failed to heartbeat to ip-10-0-100-129.ec2.internal:8201: dial tcp 10.0.100.129:8201: connect: connection refused
2019-08-26T19:31:06.673Z [WARN] storage.raft: Failed to contact 26824 in 4.984369858s
2019-08-26T19:31:09.154Z [WARN] storage.raft: Failed to contact 26824 in 7.464960807s
2019-08-26T19:31:16.598Z [ERROR] storage.raft: Failed to heartbeat to ip-10-0-100-129.ec2.internal:8201: dial tcp 10.0.100.129:8201: i/o timeout
2019-08-26T19:31:17.345Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 26824 ip-10-0-100-129.ec2.internal:8201}: dial tcp 10.0.100.129:8201: i/o timeout
2019-08-26T19:31:27.836Z [ERROR] storage.raft: Failed to heartbeat to ip-10-0-100-129.ec2.internal:8201: dial tcp 10.0.100.129:8201: i/o timeout
2019-08-26T19:31:32.524Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 26824 ip-10-0-100-129.ec2.internal:8201}: dial tcp 10.0.100.129:8201: i/o timeout
2019-08-26T19:31:39.312Z [ERROR] storage.raft: Failed to heartbeat to ip-10-0-100-129.ec2.internal:8201: dial tcp 10.0.100.129:8201: i/o timeout
2019-08-26T19:31:51.537Z [ERROR] storage.raft: Failed to heartbeat to ip-10-0-100-129.ec2.internal:8201: dial tcp 10.0.100.129:8201: i/o timeout
2019-08-26T19:31:52.765Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 26824 ip-10-0-100-129.ec2.internal:8201}: dial tcp 10.0.100.129:8201: i/o timeout
2019-08-26T19:32:04.651Z [ERROR] storage.raft: Failed to heartbeat to ip-10-0-100-129.ec2.internal:8201: dial tcp 10.0.100.129:8201: i/o timeout
2019-08-26T19:32:11.706Z [INFO] storage.raft: Updating configuration with AddStaging (15431, ip-10-0-124-88.ec2.internal:8201) to [{Suffrage:Voter ID:26824 Address:ip-10-0-100-129.ec2.internal:8201} {Suffrage:Voter ID:25614 Address:ip-10-0-88-138.ec2.internal:8201} {Suffrage:Voter ID:18097 Address:ip-10-0-67-248.ec2.internal:8201} {Suffrage:Voter ID:15431 Address:ip-10-0-124-88.ec2.internal:8201}
2019-08-26T19:32:11.711Z [INFO] storage.raft: Added peer 15431, starting replication
2019-08-26T19:32:11.714Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 15431 ip-10-0-124-88.ec2.internal:8201}: dial tcp 10.0.124.88:8201: connect: connection refused
2019-08-26T19:32:11.802Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 15431 ip-10-0-124-88.ec2.internal:8201}: dial tcp 10.0.124.88:8201: connect: connection refused
2019-08-26T19:32:11.899Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 15431 ip-10-0-124-88.ec2.internal:8201}: dial tcp 10.0.124.88:8201: connect: connection refused
2019-08-26T19:32:12.016Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 15431 ip-10-0-124-88.ec2.internal:8201}: dial tcp 10.0.124.88:8201: connect: connection refused
2019-08-26T19:32:12.129Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 15431 ip-10-0-124-88.ec2.internal:8201}: dial tcp 10.0.124.88:8201: connect: connection refused
2019-08-26T19:32:12.306Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 15431 ip-10-0-124-88.ec2.internal:8201}: dial tcp 10.0.124.88:8201: connect: connection refused
2019-08-26T19:32:12.534Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 15431 ip-10-0-124-88.ec2.internal:8201}: dial tcp 10.0.124.88:8201: connect: connection refused
2019-08-26T19:32:12.678Z [ERROR] storage.raft: Failed to heartbeat to ip-10-0-124-88.ec2.internal:8201: dial tcp 10.0.124.88:8201: connect: connection refused
2019-08-26T19:32:12.913Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 15431 ip-10-0-124-88.ec2.internal:8201}: dial tcp 10.0.124.88:8201: connect: connection refused
2019-08-26T19:32:13.086Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 26824 ip-10-0-100-129.ec2.internal:8201}: dial tcp 10.0.100.129:8201: i/o timeout
2019-08-26T19:32:13.349Z [ERROR] storage.raft: Failed to heartbeat to ip-10-0-124-88.ec2.internal:8201: dial tcp 10.0.124.88:8201: connect: connection refused
2019-08-26T19:32:13.630Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 15431 ip-10-0-124-88.ec2.internal:8201}: dial tcp 10.0.124.88:8201: connect: connection refused
2019-08-26T19:32:14.211Z [WARN] storage.raft: Failed to contact 15431 in 2.500247569s
2019-08-26T19:32:14.211Z [WARN] storage.raft: Failed to contact quorum of nodes, stepping down
2019-08-26T19:32:14.211Z [INFO] storage.raft: Node at [::]:8201 [Follower] entering Follower state (Leader: "")
2019-08-26T19:32:14.211Z [INFO] storage.raft: aborting pipeline replication to peer {Voter 25614 ip-10-0-88-138.ec2.internal:8201}
2019-08-26T19:32:14.212Z [WARN] core: leadership lost, stopping active operation
2019-08-26T19:32:14.212Z [INFO] core: pre-seal teardown starting
2019-08-26T19:32:14.276Z [ERROR] storage.raft: Failed to heartbeat to ip-10-0-124-88.ec2.internal:8201: dial tcp 10.0.124.88:8201: connect: connection refused
2019-08-26T19:32:14.712Z [INFO] rollback: stopping rollback manager
2019-08-26T19:32:14.712Z [INFO] core: pre-seal teardown complete
2019-08-26T19:32:14.713Z [ERROR] core: clearing leader advertisement failed: error="node is not the leader"
2019-08-26T19:32:14.713Z [ERROR] core: unlocking HA lock failed: error="node is not the leader"
2019-08-26T19:32:14.716Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2019-08-26T19:32:15.002Z [ERROR] storage.raft: Failed to AppendEntries to {Voter 15431 ip-10-0-124-88.ec2.internal:8201}: dial tcp 10.0.124.88:8201: connect: connection refused
2019-08-26T19:32:15.717Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2019-08-26T19:32:16.226Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2019-08-26T19:32:17.227Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2019-08-26T19:32:17.472Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2019-08-26T19:32:19.035Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2019-08-26T19:32:19.258Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing remote error: tls: internal error""
2019-08-26T19:32:19.258Z [ERROR] core: forward request error: error="error during forwarding RPC request"
2019-08-26T19:32:20.132Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2019-08-26T19:32:20.634Z [ERROR] storage.raft: Failed to heartbeat to ip-10-0-100-129.ec2.internal:8201: dial tcp 10.0.100.129:8201: i/o timeout
2019-08-26T19:32:20.637Z [WARN] storage.raft: Heartbeat timeout from "" reached, starting election
2019-08-26T19:32:20.637Z [INFO] storage.raft: Node at [::]:8201 [Candidate] entering Candidate state in term 4
2019-08-26T19:32:20.642Z [ERROR] storage.raft: Failed to make RequestVote RPC to {Voter 15431 ip-10-0-124-88.ec2.internal:8201}: dial tcp 10.0.124.88:8201: connect: connection refused
2019-08-26T19:32:21.363Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2019-08-26T19:32:24.292Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing remote error: tls: internal error""
2019-08-26T19:32:24.292Z [ERROR] core: forward request error: error="error during forwarding RPC request"
2019-08-26T19:32:24.594Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2019-08-26T19:32:26.098Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2019-08-26T19:32:29.326Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing remote error: tls: internal error""
2019-08-26T19:32:29.327Z [ERROR] core: forward request error: error="error during forwarding RPC request"
2019-08-26T19:32:30.246Z [WARN] storage.raft: Election timeout reached, restarting election
2019-08-26T19:32:30.246Z [INFO] storage.raft: Node at [::]:8201 [Candidate] entering Candidate state in term 5

However, if I manually run `vault operator raft remove-peer 3620`, the dead node is removed from the list, a new leader is elected, new nodes can join, and life is good.

Is this expected behavior, i.e. does Vault require manual intervention whenever the leader has an outage? Is this something the operator needs to handle, for example with a script that calls remove-peer once a node is determined to be dead? Are these error messages a giveaway that I configured something wrong, or is this something that will improve as Raft storage becomes more fully supported by HashiCorp?

Thanks so much for any input!

Michel Vocks

Sep 2, 2019, 8:39:08 AM
to Vault
Hi there!
 
> However, if I manually run `vault operator raft remove-peer 3620`, the dead node is removed from the list, a new leader is elected, new nodes can join, and life is good.
>
> Is this expected behavior, i.e. does Vault require manual intervention whenever the leader has an outage? Is this something the operator needs to handle, for example with a script that calls remove-peer once a node is determined to be dead? Are these error messages a giveaway that I configured something wrong, or is this something that will improve as Raft storage becomes more fully supported by HashiCorp?

I think the problem here is that quorum cannot be reached, and therefore a leader election cannot succeed.
From the log output you've provided, I can see that one node (10.0.100.129) is unreachable while a leader is still present and active. At some point an additional node goes down (10.0.124.88).
I think at this point the internal cluster consists of four voters, two of which are unreachable and two of which are healthy. The consensus protocol (Raft) can no longer reach quorum, which prevents the cluster from electing another leader.
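
To make the arithmetic concrete (this is the standard Raft majority rule, which is how I understand the integrated storage behaves):

    quorum = floor(voters / 2) + 1

    3 voters -> quorum = 2  (tolerates 1 failure)
    4 voters -> quorum = 3  (still tolerates only 1 failure)

In your logs there are four registered voters but only two are reachable, so 2 < 3 and no election can succeed until the dead peers are removed.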

It is generally recommended to remove peers (nodes) from the cluster, automatically or manually, once a node has been destroyed and you are sure it will not be used again, so that the cluster quorum stays intact.
I totally understand that this can be tricky to automate (especially in a cloud environment). A solution I have often seen is to have nodes always reuse the same IP address: if, for example, a node fails and needs to be replaced by a new instance, you can automatically assign the old IP address to the new instance. That lets the Raft protocol recognize the replacement as the same peer and prevents a potential quorum outage. A rough sketch of the remove-peer automation is below.
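
As a rough illustration only (the health check is a placeholder you would have to implement yourself, e.g. against your ECS task state, and the node ID is the one shown by `vault operator raft configuration`):

    #!/bin/sh
    # Hypothetical sketch: only remove a raft peer once you are certain the
    # underlying instance is permanently gone and will not come back.
    NODE_ID="$1"                                      # e.g. "3620"
    if my_node_is_permanently_dead "$NODE_ID"; then   # placeholder check, not a real command
      vault operator raft remove-peer "$NODE_ID"
    fi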

I can also highly recommend having a look at the Consul consensus documentation: https://www.consul.io/docs/internals/consensus.html
It is well written and has some really good explanations of the consensus protocol.

Cheers,
Michel 