We're trying to configure Consul (version 0.9.2) to run in a high availability environment. We deploy a Consul cluster to a docker swarm (version 17.05.0-ce) of 3 manager nodes, where each node hosts a docker container running a Consul server. In our development environment, each swarm node is an Ubuntu-based (16.04.2 LTS) virtual machine running under KVM.
I test HA by manually killing (virsh destroy) the VM hosting the Consul leader and then manually restarting (virsh start) the dead VM. I expect a new leader to be elected from among the 2 surviving Consul servers and, once the dead VM is running again, a new Consul server instance to successfully re-join the cluster. I then repeat the cycle by killing the VM hosting the new leader. In practice, the cluster never survives more than a few repetitions.
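For reference, one iteration of the kill/restart cycle looks like the sketch below. This is a sketch only: the VM name (node3), the 10-second pause, and the DRY_RUN switch (added so the commands can be previewed without libvirt) are all my own conventions, not part of any tool.

```shell
#!/bin/sh
# Sketch of one iteration of the manual HA test described above.
# Set DRY_RUN=1 to print the virsh commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

test_cycle() {
  leader_vm="$1"                          # VM currently hosting the Consul leader
  run virsh destroy "$leader_vm"          # hard-kill the leader's VM
  [ "${DRY_RUN:-0}" = "1" ] || sleep 10   # give the survivors time to elect a new leader
  run virsh start "$leader_vm"            # restart; the server should re-join the cluster
}

DRY_RUN=1 test_cycle node3
```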
For a cluster of 3 Consul servers, I expect "consul info" executed on any server to report "raft:num_peers = 2". Instead I have observed values of 0, 1, and 3 for num_peers, and once the value drifts from 2 the cluster starts failing to recover. Consul is configured to run on its own subnet, 172.29.20.0/29. Here's a sample test run (dot notation indicates the IP address of the container that failed or restarted; e.g. .3 is shorthand for 172.29.20.3):
(1) Kill leader node3 (.3 failed) -> new leader node2; Restore node3 (.6 started) -> joined cluster
(2) Kill leader node2 (.5 failed) -> new leader node1; Restore node2 (.3 started, node1 LOST leadership to node3) -> joined cluster
Node1 LOST leadership to node3:
"[ERR] consul: failed to add raft peer: leadership lost while committing log"
"[ERR] consul: failed to reconcile member: {9d9dc35a40b1 172.29.20.3 8301 map[expect:3 id:b315b134-2efe-0946-4296-244a454fa9bc vsn:2 build:0.9.2:75ca2ca wan_join_port:8302 vsn_min:2 vsn_max:3 port:8300 dc:dc1 role:consul raft_vsn:2] alive 1 5 2 2 5 4}: leadership lost while committing log"
(3) Kill leader node3 (.6 failed) -> Leader election FAILED (raft:num_peers = 1 @node1; raft:num_peers = 0 @node2)
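The num_peers value I track above comes from the raft section of "consul info". A minimal sketch of extracting it follows; the sample text is hand-made stand-in output, since the real command needs a running agent:

```shell
#!/bin/sh
# Pull raft num_peers out of `consul info`-style output.
# `sample` stands in for the output of: consul info
sample='raft:
	applied_index = 10
	num_peers = 2
	state = Leader'

num_peers=$(printf '%s\n' "$sample" | awk -F' = ' '$1 ~ /num_peers/ {print $2}')
echo "num_peers=$num_peers"
```

On a live server the same pipeline would read from the agent directly, e.g. `consul info | awk -F' = ' '$1 ~ /num_peers/ {print $2}'`.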
We mount Consul's data-dir to a directory on the host VM because the Consul documentation appears to strongly encourage it:
-data-dir - This flag provides a data directory for the agent to store state. This is required for all agents. The directory should be durable across reboots. This is especially critical for agents that are running in server mode as they must be able to persist cluster state. Additionally, the directory must support the use of filesystem locking, meaning some types of mounted folders (e.g. VirtualBox shared folders) may not be suitable.
The following are the contents of the stackfile we use for Docker Swarm deployment. Do you see anything wrong with this configuration that would threaten HA operation? Thank you...
version: "3"
services:
  consul:
    image: consul:0.9.2
    # Deploy to all docker manager nodes
    deploy:
      mode: global
      placement:
        constraints:
          - node.role == manager
      restart_policy:
        condition: on-failure
    environment:
      CONSUL_LOCAL_CONFIG: "{disable_update_check: true}"
      CONSUL_BIND_INTERFACE: eth0
    entrypoint:
      - consul
      - agent
      - -server
      - -bootstrap-expect=3
      - -config-dir=/consul/config
      - -data-dir=/consul/data
      - -bind={{ GetInterfaceIP "eth0" }}
      - -client=0.0.0.0
      - -ui
      - -rejoin
      - -retry-join=172.29.20.2
      - -retry-join=172.29.20.3
      - -retry-join=172.29.20.4
      - -retry-join=172.29.20.5
      - -retry-join=172.29.20.6
      - -retry-join=172.29.20.7
    networks:
      - net
      - voltha-net
    ports:
      - "8300:8300"
      - "8400:8400"
      - "8500:8500"
      - "8600:8600/udp"
    volumes:
      - /consul/data:/consul/data
networks:
  net:
    driver: overlay
    driver_opts:
      encrypted: "true"
    ipam:
      driver: default
      config:
  voltha-net:
    external:
      name: voltha_net