I've been having an issue in production where Vault goes offline after a leader election: once the election completes, "Leader" points to one Vault server, but "Active" is a different server. In prod I killed 2 of the 3 Vault servers to get stability.
Since prod is a bit behind, I rebuilt the setup in Dev and was able to recreate the problem. The backend is Consul 0.7.2 with 3 nodes running in "server" mode, and Vault is version 0.7.0. Everything runs in containers on Docker 12.5.0. To mimic what I did in prod while trying to fix the issue, I started at Consul 0.6.4 and upgraded to 0.7.1, and likewise started at Vault 0.5.2 and upgraded to 0.7.0.
The scenario: everything is working, a leadership election occurs, and then Vault fails because of a "broken" election. To demonstrate, I start with a working cluster and then run "vault step-down" to force an election. Looking at GitHub I found a similar issue, but it was fixed some time ago, and I'm seeing this on very recent versions.
Any ideas?
Thanks,
Steve Dillon
steved@steve-mint ~/projects/gitlab/docker-dev5/docker-farm/dockerimages/vault $ ./vs.sh
Sealed: false
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0
High-Availability Enabled: true
Mode: standby
Sealed: false
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0
High-Availability Enabled: true
Mode: active
Sealed: false
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0
High-Availability Enabled: true
Mode: standby
steved@steve-mint ~/projects/gitlab/docker-dev5/docker-farm/dockerimages/vault $ date; vault step-down -address=https://10.105.23.201:8200/
Fri Apr 7 15:21:53 EDT 2017
steved@steve-mint ~/projects/gitlab/docker-dev5/docker-farm/dockerimages/vault $ ./vs.sh
Sealed: false
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0
High-Availability Enabled: true
Mode: standby
Sealed: false
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0
High-Availability Enabled: true
Mode: standby
#
# Server 10.105.43.61 is currently active and will respond to vault requests, but isn't the leader
Sealed: false
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0
High-Availability Enabled: true
Mode: active
steved@steve-mint ~/projects/gitlab/docker-dev5/docker-farm/dockerimages/vault $
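The `./vs.sh` output only shows each node's mode, not which address each node believes is the leader. A minimal Python sketch can put both views side by side and flag the active/leader mismatch described above. It assumes Vault's unauthenticated `GET /v1/sys/leader` endpoint, which returns `ha_enabled`, `is_self` and `leader_address`, is reachable on each node, and that each node advertises the same URL used to reach it. The `NODES` list and the `fetch_leader`, `normalize` and `find_mismatches` helpers are hypothetical.

```python
import json
import ssl
import urllib.request
from typing import Dict, List

# Hypothetical node list: the three Vault addresses that appear in this thread.
NODES = [
    "https://10.105.2.80:8200",
    "https://10.105.23.201:8200",
    "https://10.105.43.61:8200",
]


def fetch_leader(addr: str, insecure: bool = False) -> dict:
    """GET <addr>/v1/sys/leader (no token needed) and return the parsed JSON."""
    ctx = ssl.create_default_context()
    if insecure:
        # Only for self-signed dev certificates: disable verification.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    url = addr.rstrip("/") + "/v1/sys/leader"
    with urllib.request.urlopen(url, context=ctx, timeout=5) as resp:
        return json.load(resp)


def normalize(addr: str) -> str:
    # leader_address may carry a trailing slash; compare without it.
    return addr.rstrip("/")


def find_mismatches(reports: Dict[str, dict]) -> List[str]:
    """Return problems found in a {node_addr: sys/leader JSON} map (empty list = consistent)."""
    problems = []
    active = sorted(normalize(n) for n, r in reports.items() if r.get("is_self"))
    leaders = sorted({normalize(r.get("leader_address", "")) for r in reports.values()})
    if len(active) != 1:
        problems.append(f"expected 1 node with is_self=true, found {active}")
    if len(leaders) != 1:
        problems.append(f"nodes disagree on leader_address: {leaders}")
    if len(active) == 1 and len(leaders) == 1 and active[0] != leaders[0]:
        problems.append(f"active node {active[0]} is not the advertised leader {leaders[0]}")
    return problems
```

For example, `find_mismatches({n: fetch_leader(n) for n in NODES})` returns an empty list when exactly one node is active and every node reports that node as the leader. After a step-down like the one above, it should list the mismatch instead.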
Vault Logs 10.105.2.80: nothing logged after the test run
Vault Logs 10.105.23.201
2017/04/07 19:21:53.614271 [WARN ] core: stepping down from active operation to standby
2017/04/07 19:21:53.653388 [WARN ] physical/consul: Concurrent state change notify dropped
2017/04/07 19:21:53.653402 [INFO ] core: pre-seal teardown starting
2017/04/07 19:21:53.653411 [INFO ] core: stopping cluster listeners
2017/04/07 19:21:53.653419 [INFO ] core: shutting down forwarding rpc listeners
2017/04/07 19:21:53.653440 [INFO ] core: forwarding rpc listeners stopped
2017/04/07 19:21:53.931695 [INFO ] core: rpc listeners successfully shut down
2017/04/07 19:21:53.931722 [INFO ] core: cluster listeners successfully shut down
2017/04/07 19:21:53.931746 [INFO ] rollback: stopping rollback manager
2017/04/07 19:21:53.931815 [INFO ] core: pre-seal teardown complete
2017/04/07 19:21:53.960938 [INFO ] core: acquired lock, enabling active operation
2017/04/07 19:21:54.022306 [WARN ] physical/consul: Concurrent state change notify dropped
2017/04/07 19:21:54.022325 [INFO ] core: post-unseal setup starting
2017/04/07 19:21:54.023767 [INFO ] core: loaded wrapping token key
2017/04/07 19:21:54.026032 [INFO ] core: successfully mounted backend: type=generic path=secret/
2017/04/07 19:21:54.026125 [INFO ] core: successfully mounted backend: type=system path=sys/
2017/04/07 19:21:54.026144 [INFO ] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2017/04/07 19:21:54.026234 [INFO ] rollback: starting rollback manager
2017/04/07 19:21:54.031597 [INFO ] expiration: restoring leases
2017/04/07 19:21:54.035807 [INFO ] core: post-unseal setup complete
2017/04/07 19:21:54.035830 [INFO ] core/startClusterListener: starting listener: listener_address=0.0.0.0:8201
2017/04/07 19:21:54.035952 [INFO ] core/startClusterListener: serving cluster requests: cluster_listen_address=[::]:8201
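In the 10.105.23.201 log above, the node logs "stepping down" and then "acquired lock, enabling active operation" well under a second later. A small Python sketch can measure gaps like this across nodes and test runs. It assumes the timestamp/level/message layout shown in these Vault 0.7 logs. The `parse` and `seconds_between` helpers are hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Matches the Vault log layout seen above, e.g.
# 2017/04/07 19:21:53.614271 [WARN ] core: stepping down from active operation to standby
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6}) \[(\w+)\s*\] (.*)$")

Event = Tuple[datetime, str, str]  # (timestamp, level, message)


def parse(lines: Iterable[str]) -> List[Event]:
    """Return (timestamp, level, message) for every line matching LINE_RE; others are skipped."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S.%f")
            events.append((ts, m.group(2), m.group(3)))
    return events


def seconds_between(events: List[Event], start: str, end: str) -> Optional[float]:
    """Seconds from the first message containing `start` to the next message containing `end`."""
    t0 = None
    for ts, _level, msg in events:
        if t0 is None and start in msg:
            t0 = ts
        elif t0 is not None and end in msg:
            return (ts - t0).total_seconds()
    return None  # start or end marker never seen in order
```

Fed the 10.105.23.201 log above, `seconds_between(parse(lines), "stepping down", "acquired lock")` returns 0.346667, the time between the step-down and the same node taking the lock again.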
The Consul servers run in containers on the same cluster and share the same IP addresses as the Vault servers:
2017/04/07 19:18:50 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
2017/04/07 19:18:51 [INFO] agent: Synced check 'vault:10.105.43.61:8200:vault-sealed-check'
2017/04/07 19:19:50 [WARN] agent: Check 'vault:10.105.23.201:8200:vault-sealed-check' missed TTL, is now critical
2017/04/07 19:19:50 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
2017/04/07 19:19:51 [WARN] agent: Check 'vault:10.105.43.61:8200:vault-sealed-check' missed TTL, is now critical
2017/04/07 19:19:51 [INFO] agent: Synced check 'vault:10.105.43.61:8200:vault-sealed-check'
2017/04/07 19:20:00 [INFO] agent: Synced check 'vault:10.105.43.61:8200:vault-sealed-check'
2017/04/07 19:20:08 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
2017/04/07 19:20:49 [INFO] agent: Synced check 'vault:10.105.43.61:8200:vault-sealed-check'
2017/04/07 19:22:43 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
Consul Logs 10.105.23.201
2017/04/07 19:18:10 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
2017/04/07 19:20:10 [WARN] agent: Check 'vault:10.105.23.201:8200:vault-sealed-check' missed TTL, is now critical
2017/04/07 19:20:10 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
2017/04/07 19:21:54 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
2017/04/07 19:23:05 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
2017/04/07 19:24:05 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
2017/04/07 19:25:04 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
2017/04/07 19:26:02 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
2017/04/07 19:26:59 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
2017/04/07 19:27:57 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
2017/04/07 19:28:56 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
2017/04/07 19:29:54 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
Consul Logs 10.105.43.61
2017/04/07 19:20:08 [WARN] agent: Check 'vault:10.105.23.201:8200:vault-sealed-check' missed TTL, is now critical
2017/04/07 19:20:08 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
2017/04/07 19:20:09 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
2017/04/07 19:21:09 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
2017/04/07 19:21:54 [WARN] agent: Check 'vault:10.105.23.201:8200:vault-sealed-check' missed TTL, is now critical
2017/04/07 19:21:54 [INFO] agent: Synced check 'vault:10.105.23.201:8200:vault-sealed-check'
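The `vault-sealed-check` entries are Consul TTL checks, so each "missed TTL, is now critical" line means the check was not refreshed before its TTL expired. To line these up against the Vault timeline, a minimal Python sketch can collect the misses per check ID. It assumes the Consul agent log layout shown above. The `ttl_misses` helper is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches Consul agent lines such as:
# 2017/04/07 19:20:08 [WARN] agent: Check 'vault:10.105.23.201:8200:vault-sealed-check' missed TTL, is now critical
TTL_MISS_RE = re.compile(r"^(\S+ \S+) \[WARN\] agent: Check '([^']+)' missed TTL")


def ttl_misses(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each check ID to the timestamps at which Consul reported a missed TTL."""
    misses: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        m = TTL_MISS_RE.match(line.strip())
        if m:
            misses[m.group(2)].append(m.group(1))
    return dict(misses)
```

Applied to the 10.105.43.61 Consul log above, it reports two misses for the 10.105.23.201 check, at 19:20:08 and 19:21:54. The second falls within a second of the step-down.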