Greetings everyone,
Thank you for your time and assistance!
I have a system in place where we use Packer to build AMIs for our Nomad servers and clients, and then do a rolling rebuild of all the nodes. Autopilot is configured to clean up dead servers, but it does not appear to be working properly: after 24 hours, the old servers and clients are still visible in the cluster status. I am running Nomad 0.9.4 with Raft protocol 3.
I was under the impression that they would be cleaned up and no longer visible. After rebuilding all the nodes, should I force a GC or use the node purge API to clean them up?
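To make the two options concrete, here is a rough sketch of what I would run after a rebuild. It only builds the requests and sends nothing. The function names are mine, the node list is assumed to look like the `/v1/nodes` response (`ID` and `Status` fields), and the paths are the force-GC and node purge endpoints as I understand them from the HTTP API docs. The short IDs are placeholders for the full node IDs.

```python
def down_node_ids(nodes):
    """Return the IDs of nodes reported as 'down' in a /v1/nodes-style list."""
    return [n["ID"] for n in nodes if n.get("Status") == "down"]


def cleanup_requests(nodes, use_purge=True):
    """Build (method, path) pairs for the chosen cleanup option.

    Nothing is sent here; the caller would issue these against the
    Nomad HTTP API. Paths are assumptions based on the API docs.
    """
    if not use_purge:
        # Option 1: a single forced garbage collection.
        return [("PUT", "/v1/system/gc")]
    # Option 2: purge each down node individually.
    return [("POST", f"/v1/node/{node_id}/purge") for node_id in down_node_ids(nodes)]


if __name__ == "__main__":
    # Short IDs from the `nomad node status` output, standing in for full IDs.
    sample = [
        {"ID": "c7bb3ff8", "Status": "ready"},
        {"ID": "068d65ff", "Status": "down"},
        {"ID": "9e906e3d", "Status": "down"},
    ]
    for method, path in cleanup_requests(sample):
        print(method, path)
```

I don't know which of the two (if either) is the intended way to handle this, which is really what I am asking.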
The cluster is rebuilt as follows, all via the HTTP API:
Servers first:
1) pick a non-leader server
2) shut it down
3) wait for replacement node to come online
4) validate health
5) repeat for remaining non-leader servers
6) do the leader last
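The ordering in the server steps above can be sketched roughly like this. It is a simplified illustration rather than our actual tooling: `rebuild_order` is a made-up name, and the member dictionaries simply mirror the Name, Status, and Leader columns of `nomad server members`.

```python
def rebuild_order(members):
    """Return server names in rebuild order: alive followers first, leader last.

    Servers that are not 'alive' (for example 'left') are skipped because
    they have already been replaced.
    """
    alive = [m for m in members if m["Status"] == "alive"]
    followers = [m["Name"] for m in alive if not m["Leader"]]
    leaders = [m["Name"] for m in alive if m["Leader"]]
    return followers + leaders


if __name__ == "__main__":
    members = [
        {"Name": "nomad-server-10-2-18-202.us-east-1", "Status": "alive", "Leader": True},
        {"Name": "nomad-server-10-2-25-181.us-east-1", "Status": "alive", "Leader": False},
        {"Name": "nomad-server-10-2-38-163.us-east-1", "Status": "left", "Leader": False},
    ]
    for name in rebuild_order(members):
        print(name)
```

In practice the leader has to be looked up again before each step, since it can change while servers are being replaced.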
Then clients:
1) mark client as ineligible
2) drain node
3) wait for drain to complete
4) shutdown node
5) wait for new node to come online
6) repeat for remaining nodes
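For reference, client steps 1 to 3 correspond roughly to the request sequence below. This is a sketch and not our exact code: the function name and the 300-second default deadline are illustrative, and the paths and payload fields (`Eligibility`, `DrainSpec`, `Deadline` in nanoseconds, `IgnoreSystemJobs`) reflect my reading of the node API docs. Steps 4 and 5 (shutting down the instance and waiting for its replacement) happen outside Nomad.

```python
def client_replacement_requests(node_id, drain_deadline_s=300):
    """Build the (method, path, payload) sequence for retiring one client.

    Nothing is sent here. The final GET is meant to be polled until the
    node no longer reports an active drain.
    """
    deadline_ns = drain_deadline_s * 1_000_000_000  # API expects nanoseconds
    return [
        # 1) mark the client as ineligible for new placements
        ("POST", f"/v1/node/{node_id}/eligibility", {"Eligibility": "ineligible"}),
        # 2) start draining allocations off the node
        (
            "POST",
            f"/v1/node/{node_id}/drain",
            {"DrainSpec": {"Deadline": deadline_ns, "IgnoreSystemJobs": False}},
        ),
        # 3) poll the node until the drain has completed
        ("GET", f"/v1/node/{node_id}", None),
    ]


if __name__ == "__main__":
    # Short ID from `nomad node status`, standing in for the full node ID.
    for method, path, payload in client_replacement_requests("068d65ff"):
        print(method, path, payload)
```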
$ nomad -version
Nomad v0.9.4 (a81aa846a45fb8248551b12616287cb57c418cd6)
$ nomad server members
Name Address Port Status Leader Protocol Build Datacenter Region
nomad-server-10-2-18-202.us-east-1 10.2.18.202 4648 alive true 2 0.9.4 dc1 us-east-1
nomad-server-10-2-25-181.us-east-1 10.2.25.181 4648 alive false 2 0.9.4 dc1 us-east-1
nomad-server-10-2-38-163.us-east-1 10.2.38.163 4648 left false 2 0.9.4 dc1 us-east-1
nomad-server-10-2-41-128.us-east-1 10.2.41.128 4648 alive false 2 0.9.4 dc1 us-east-1
nomad-server-10-2-45-83.us-east-1 10.2.45.83 4648 left false 2 0.9.4 dc1 us-east-1
nomad-server-10-2-50-201.us-east-1 10.2.50.201 4648 alive false 2 0.9.4 dc1 us-east-1
nomad-server-10-2-61-45.us-east-1 10.2.61.45 4648 alive false 2 0.9.4 dc1 us-east-1
$ nomad node status
ID DC Name Class Drain Eligibility Status
c7bb3ff8 dc1 nomad-client-10-2-27-57 <none> false eligible ready
ff923b32 dc1 nomad-client-10-2-39-95 <none> false eligible ready
afb8793c dc1 nomad-client-10-2-53-5 <none> false eligible ready
068d65ff dc1 nomad-client-10-2-62-191 <none> false ineligible down
9e906e3d dc1 nomad-client-10-2-23-169 <none> false ineligible down
5b4b4a23 dc1 nomad-client-10-2-43-42 <none> false ineligible down
$ nomad operator autopilot get-config
CleanupDeadServers = true
LastContactThreshold = 200ms
MaxTrailingLogs = 250
ServerStabilizationTime = 10s
EnableRedundancyZones = false
DisableUpgradeMigration = false
EnableCustomUpgrades = false
$ nomad operator raft list-peers
Node ID Address State Voter RaftProtocol
nomad-server-10-2-25-181.us-east-1 3e12083e-b1d5-6580-dcfb-271cbbf61ca7 10.2.25.181:4647 follower true 3
nomad-server-10-2-18-202.us-east-1 3c41866a-8c48-300d-2b61-8988e0167b6c 10.2.18.202:4647 leader true 3
nomad-server-10-2-61-45.us-east-1 c8f8d6d2-d199-44e4-d0f8-ee700469fae5 10.2.61.45:4647 follower true 3
nomad-server-10-2-50-201.us-east-1 9cf0f469-3cba-e77f-d90c-ee60480d9214 10.2.50.201:4647 follower true 3
nomad-server-10-2-41-128.us-east-1 4e40dc46-e475-3576-fa84-68c40aa391f6 10.2.41.128:4647 follower true 3