Unable to add consul server to cluster

753 views
Skip to first unread message

ch...@iodinesoftware.com

unread,
May 14, 2018, 12:16:27 PM5/14/18
to Consul
We are in a peculiar situation in which we can't add a consul server to our cluster. We have a cluster of two which reports as healthy. Anytime we add a third (we've tried adding a few different times using different servers, all fail in the same way), we can't seem to get this to work.

consul02 is one of the existing healthy servers
consul05 is the new one (completely clean install)
both are running 1.0.6
they are on the same network segment
Both are VM's running RHEL on VMWare

On the existing leader, we see this in the logs:
[WARN] raft: AppendEntries to {Nonvoter f278b7e7-3524-788c-5854-67c84533d161 172.22.10.52:8300} rejected, sending older logs (next: 1)
[ERR] raft: Failed to get log at index 1: log not found


Is there a fix for this?

With two members, the autopilot health endpoint reports healthy
{
  "Healthy": true,
  "FailureTolerance": 0,
  "Servers": [
{
  "Healthy": true,
  "FailureTolerance": 0,
  "Servers": [
    {
      "ID": "0824f824-2cb5-f756-1210-9542f998bc7e",
      "Name": "consul02",
      "Address": "172.22.10.50:8300",
      "SerfStatus": "alive",
      "Version": "1.0.6",
      "Leader": true,
      "LastContact": "0s",
      "LastTerm": 55,
      "LastIndex": 86675246,
      "Healthy": true,
      "Voter": true,
      "StableSince": "2018-05-14T15:51:55Z"
    },
    {
      "ID": "81af471c-2521-4c28-fd37-4ecf24609f3f",
      "Name": "consul03",
      "Address": "172.22.10.51:8300",
      "SerfStatus": "alive",
      "Version": "1.0.6",
      "Leader": false,
      "LastContact": "54.605909ms",
      "LastTerm": 55,
      "LastIndex": 86675246,
      "Healthy": true,
      "Voter": true,
      "StableSince": "2018-05-14T15:57:48Z"
    }
]
}

With three nodes, it's unhealthy
{
  "Healthy": false,
  "FailureTolerance": 0,
  "Servers": [
    {
      "ID": "0824f824-2cb5-f756-1210-9542f998bc7e",
      "Name": "consul02",
      "Address": "172.22.10.50:8300",
      "SerfStatus": "alive",
      "Version": "1.0.6",
      "Leader": true,
      "LastContact": "0s",
      "LastTerm": 55,
      "LastIndex": 86674161,
      "Healthy": true,
      "Voter": true,
      "StableSince": "2018-05-14T15:51:55Z"
    },
    {
      "ID": "81af471c-2521-4c28-fd37-4ecf24609f3f",
      "Name": "consul03",
      "Address": "172.22.10.51:8300",
      "SerfStatus": "alive",
      "Version": "1.0.6",
      "Leader": false,
      "LastContact": "31.579345ms",
      "LastTerm": 55,
      "LastIndex": 86674161,
      "Healthy": true,
      "Voter": true,
      "StableSince": "2018-05-14T15:38:49Z"
    },
    {
      "ID": "f278b7e7-3524-788c-5854-67c84533d161",
      "Name": "consul05",
      "Address": "172.22.10.52:8300",
      "SerfStatus": "alive",
      "Version": "1.0.6",
      "Leader": false,
      "LastContact": "-1ns",
      "LastTerm": 0,
      "LastIndex": 0,
      "Healthy": false,
      "Voter": false,
      "StableSince": "0001-01-01T00:00:00Z"
    }
  ]
}


Here's the full consul logs
consul05
-- Logs begin at Mon 2018-05-14 01:24:49 EDT, end at Mon 2018-05-14 12:01:02 EDT. --
May 14 11:51:53 consul05 systemd[1]: Started consul agent.
May 14 11:51:53 consul05 systemd[1]: Starting consul agent...
May 14 11:51:53 consul05 consul[29692]: bootstrap_expect > 0: expecting 3 servers
May 14 11:51:53 consul05 consul[29692]: ==> Starting Consul agent...
May 14 11:51:53 consul05 consul[29692]: ==> Consul agent running!
May 14 11:51:53 consul05 consul[29692]: Version: 'v1.0.6'
May 14 11:51:53 consul05 consul[29692]: Node ID: 'f278b7e7-3524-788c-5854-67c84533d161'
May 14 11:51:53 consul05 consul[29692]: Node name: 'consul05'
May 14 11:51:53 consul05 consul[29692]: Datacenter: 'aus' (Segment: '<all>')
May 14 11:51:53 consul05 consul[29692]: Server: true (Bootstrap: false)
May 14 11:51:53 consul05 consul[29692]: Client Addr: [172.22.10.52] (HTTP: 8500, HTTPS: -1, DNS: 8600)
May 14 11:51:53 consul05 consul[29692]: Cluster Addr: 172.22.10.52 (LAN: 8301, WAN: 8302)
May 14 11:51:53 consul05 consul[29692]: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
May 14 11:51:53 consul05 consul[29692]: ==> Log data will now stream in as it occurs:
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] raft: Initial configuration (index=0): []
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] raft: Node at 172.22.10.52:8300 [Follower] entering Follower state (Leader: "")
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] serf: EventMemberJoin: consul05.aus 172.22.10.52
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] serf: EventMemberJoin: consul05 172.22.10.52
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] consul: Adding LAN server consul05 (Addr: tcp/172.22.10.52:8300) (DC: aus)
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] agent: Started DNS server 172.22.10.52:8600 (udp)
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] consul: Handled member-join event for server "consul05.aus" in area "wan"
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] agent: Started DNS server 172.22.10.52:8600 (tcp)
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] agent: Started HTTP server on 172.22.10.52:8500 (tcp)
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] agent: started state syncer
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce os scaleway softlayer
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] agent: Joining LAN cluster...
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] agent: (LAN) joining: [172.22.10.50 172.22.10.51 172.22.10.52]
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] serf: EventMemberJoin: consul02.aus 172.22.10.50
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [DEBUG] raft-net: 172.22.10.52:8300 accepted connection from: 172.22.10.50:47184
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] serf: EventMemberJoin: consul03.aus 172.22.10.51
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] consul: Handled member-join event for server "consul02.aus" in area "wan"
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] consul: Handled member-join event for server "consul03.aus" in area "wan"
... a bunch of EventMemberJoin for various client agents here
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] consul: Adding LAN server consul02 (Addr: tcp/172.22.10.50:8300) (DC: aus)
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] consul: Existing Raft peers reported by consul03, disabling bootstrap mode
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] consul: Adding LAN server consul03 (Addr: tcp/172.22.10.51:8300) (DC: aus)
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [WARN] raft: Failed to get previous log: 86673712 log not found (last: 0)
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] agent: (LAN) joined: 3 Err: <nil>
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] agent: Join LAN completed. Synced with 3 initial agents
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] snapshot: Creating new snapshot at /var/lib/consul/data/raft/snapshots/54-86670320-1526313113219.tmp
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [DEBUG] raft-net: 172.22.10.52:8300 accepted connection from: 172.22.10.50:47186
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [DEBUG] raft-net: 172.22.10.52:8300 accepted connection from: 172.22.10.50:47188
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [INFO] agent: Synced node info
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [DEBUG] raft-net: 172.22.10.52:8300 accepted connection from: 172.22.10.50:47190
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [DEBUG] raft-net: 172.22.10.52:8300 accepted connection from: 172.22.10.50:47194
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [DEBUG] raft-net: 172.22.10.52:8300 accepted connection from: 172.22.10.50:47196
May 14 11:51:53 consul05 consul[29692]: 2018/05/14 11:51:53 [ERR] raft-net: Failed to decode incoming command: read tcp 172.22.10.52:8300->172.22.10.50:47196: read: connection reset by peer
May 14 11:52:04 consul05 consul[29692]: 2018/05/14 11:52:04 [DEBUG] raft-net: 172.22.10.52:8300 accepted connection from: 172.22.10.50:47212
... repeat for many more incoming connections
May 14 11:52:11 consul05 consul[29692]: ==> Newer Consul version available: 1.1.0 (currently running: 1.0.6)
... repeat for many more incoming connections
May 14 11:56:02 consul05 consul[29692]: 2018/05/14 11:56:02 [ERR] yamux: keepalive failed: i/o deadline reached
May 14 11:56:02 consul05 consul[29692]: 2018/05/14 11:56:02 [ERR] yamux: Failed to read stream data: read tcp 172.22.10.52:8300->172.16.20.20:34057: use of closed network connection
May 14 11:56:02 consul05 consul[29692]: 2018/05/14 11:56:02 [WARN] yamux: failed to send go away: session shutdown
May 14 11:56:02 consul05 consul[29692]: 2018/05/14 11:56:02 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=172.16.20.20:34057
---

consul02
-- Logs begin at Fri 2018-04-13 23:20:56 CDT, end at Mon 2018-05-14 11:02:04 CDT. --
May 14 10:51:53 consul02 consul[11724]: 2018/05/14 10:51:53 [INFO] serf: EventMemberJoin: consul05 172.22.10.52
May 14 10:51:53 consul02 consul[11724]: 2018/05/14 10:51:53 [INFO] consul: Adding LAN server consul05 (Addr: tcp/172.22.10.52:8300) (DC: aus)
May 14 10:51:53 consul02 consul[11724]: 2018/05/14 10:51:53 [INFO] raft: Updating configuration with AddNonvoter (f278b7e7-3524-788c-5854-67c84533d161, 172.22.10.52:8300) to [{Suffrage:Voter ID:0824f824-2cb5-f756-1210-9542f998bc7e Address:172.22.10.50:8300} {Suffrage:Voter ID:81af471c-2521-4c28-fd37-4ecf24609f3f Address:172.22.10.51:8300} {Suffrage:Nonvoter ID:f278b7e7-3524-788c-5854-67c84533d161 Address:172.22.10.52:8300}]
May 14 10:51:53 consul02 consul[11724]: 2018/05/14 10:51:53 [INFO] raft: Added peer f278b7e7-3524-788c-5854-67c84533d161, starting replication
May 14 10:51:53 consul02 consul[11724]: 2018/05/14 10:51:53 [INFO] serf: EventMemberJoin: consul05.aus 172.22.10.52
May 14 10:51:53 consul02 consul[11724]: 2018/05/14 10:51:53 [INFO] consul: Handled member-join event for server "consul05.aus" in area "wan"
May 14 10:51:53 consul02 consul[11724]: 2018/05/14 10:51:53 [INFO] consul: member 'consul05' joined, marking health alive
May 14 10:51:53 consul02 consul[11724]: 2018/05/14 10:51:53 [WARN] raft: AppendEntries to {Nonvoter f278b7e7-3524-788c-5854-67c84533d161 172.22.10.52:8300} rejected, sending older logs (next: 1)
May 14 10:51:53 consul02 consul[11724]: 2018/05/14 10:51:53 [ERR] raft: Failed to get log at index 1: log not found
May 14 10:51:53 consul02 consul[11724]: 2018/05/14 10:51:53 [ERR] raft: Failed to install snapshot 54-86670320-1526312303504: readfrom tcp 172.22.10.50:47184->172.22.10.52:8300: write tcp 172.22.10.50:47184->172.22.10.52:8300: write: connection reset by peer
May 14 10:51:53 consul02 consul[11724]: 2018/05/14 10:51:53 [ERR] raft: Failed to send snapshot to {Nonvoter f278b7e7-3524-788c-5854-67c84533d161 172.22.10.52:8300}: readfrom tcp 172.22.10.50:47184->172.22.10.52:8300: write tcp 172.22.10.50:47184->172.22.10.52:8300: write: connection reset by peer
May 14 10:51:53 consul02 consul[11724]: 2018/05/14 10:51:53 [ERR] raft: Failed to get log at index 1: log not found
May 14 10:51:53 consul02 consul[11724]: 2018/05/14 10:51:53 [ERR] raft: Failed to install snapshot 54-86670320-1526312303504: readfrom tcp 172.22.10.50:47186->172.22.10.52:8300: write tcp 172.22.10.50:47186->172.22.10.52:8300: write: connection reset by peer
May 14 10:51:53 consul02 consul[11724]: 2018/05/14 10:51:53 [ERR] raft: Failed to send snapshot to {Nonvoter f278b7e7-3524-788c-5854-67c84533d161 172.22.10.52:8300}: readfrom tcp 172.22.10.50:47186->172.22.10.52:8300: write tcp 172.22.10.50:47186->172.22.10.52:8300: write: connection reset by peer
May 14 10:51:53 consul02 consul[11724]: 2018/05/14 10:51:53 [ERR] raft: Failed to get log at index 1: log not found
... the above repeats a few more times with different outbound ephemeral ports
May 14 10:51:54 consul02 consul[11724]: 2018/05/14 10:51:54 [ERR] raft: Failed to install snapshot 54-86670320-1526312303504: readfrom tcp 172.22.10.50:47200->172.22.10.52:8300: write tcp 172.22.10.50:47200->172.22.10.52:8300: write: connection reset by peer
May 14 10:51:54 consul02 consul[11724]: 2018/05/14 10:51:54 [ERR] raft: Failed to send snapshot to {Nonvoter f278b7e7-3524-788c-5854-67c84533d161 172.22.10.52:8300}: readfrom tcp 172.22.10.50:47200->172.22.10.52:8300: write tcp 172.22.10.50:47200->172.22.10.52:8300: write: connection reset by peer
May 14 10:51:54 consul02 consul[11724]: 2018/05/14 10:51:54 [WARN] consul: error getting server health from "consul05": context deadline exceeded
May 14 10:51:54 consul02 consul[11724]: 2018/05/14 10:51:54 [WARN] consul: error getting server health from "consul02": context deadline exceeded
May 14 10:51:54 consul02 consul[11724]: 2018/05/14 10:51:54 [ERR] raft: Failed to get log at index 1: log not found
May 14 10:51:54 consul02 consul[11724]: 2018/05/14 10:51:54 [ERR] raft: Failed to install snapshot 54-86670320-1526312303504: readfrom tcp 172.22.10.50:47202->172.22.10.52:8300: write tcp 172.22.10.50:47202->172.22.10.52:8300: write: connection reset by peer
May 14 10:51:54 consul02 consul[11724]: 2018/05/14 10:51:54 [ERR] raft: Failed to send snapshot to {Nonvoter f278b7e7-3524-788c-5854-67c84533d161 172.22.10.52:8300}: readfrom tcp 172.22.10.50:47202->172.22.10.52:8300: write tcp 172.22.10.50:47202->172.22.10.52:8300: write: connection reset by peer
May 14 10:51:55 consul02 consul[11724]: 2018/05/14 10:51:55 [WARN] consul: error getting server health from "consul05": last request still outstanding
May 14 10:51:56 consul02 consul[11724]: 2018/05/14 10:51:56 [ERR] raft: Failed to get log at index 1: log not found
---

Reply all
Reply to author
Forward
0 new messages