Single node cluster fails to elect itself leader, even with bootstrap


Clay Bowen

Mar 28, 2016, 1:49:45 PM
to Consul
I'm transitioning off of a datacenter and have moved Consul to a different location, but due to some limitations I've had to leave one server behind in the old datacenter.  That server, however, refuses to act as a leader.

When I start, this is what I get:

[root@vault ~]# tail -f /var/log/consul
        Datacenter: 'devprodaws'
            Server: true (bootstrap: true)
       Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 53, RPC: 8400)
      Cluster Addr: 10.16.19.138 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2016/03/28 17:46:05 [ERR] agent: failed to sync remote state: No cluster leader
    2016/03/28 17:46:07 [WARN] raft: Heartbeat timeout reached, starting election
    2016/03/28 17:46:07 [ERR] consul: failed to reconcile member: {vault.<company>-external.com 10.16.19.138 8301 map[build:0.6.1dev:52ac5530 port:8300 bootstrap:1 role:consul dc:devprodaws vsn:2 vsn_min:1 vsn_max:3] alive 1 3 2 2 4 4}: No cluster leader
    2016/03/28 17:46:07 [ERR] consul: failed to reconcile: No cluster leader
    2016/03/28 17:46:08 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.

It continues to complain endlessly about having no leader.

Config is this:
[root@vault ~]# cat /opt/consul/consul.json
{
  "datacenter": "devprodAWS",
  "data_dir": "/opt/consul/data",
  "bind_addr" : "0.0.0.0",
  "client_addr" : "0.0.0.0",
  "log_level": "warn",
  "ui_dir": "/opt/consul/ui",
  "server": true,
  "bootstrap_expect": 1,
  "retry_max": 10,
  "retry_interval": "20s",
  "ports" : {
       "dns": 53,
       "http": 8500,
       "https": -1,
       "rpc": 8400,
       "serf_lan": 8301,
       "serf_wan": 8302,
       "server": 8300
  }
}


Clay Bowen

Mar 28, 2016, 2:39:57 PM
to Consul
Consul info:

[root@vault ~]# consul info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 26
build:
        prerelease = dev
        revision = 52ac5530
        version = 0.6.1
consul:
        bootstrap = true
        known_datacenters = 1
        leader = false
        server = true
raft:
        applied_index = 3506210
        commit_index = 3506210
        fsm_pending = 0
        last_contact = never
        last_log_index = 3506210
        last_log_term = 10944
        last_snapshot_index = 3500890
        last_snapshot_term = 10936
        num_peers = 0
        state = Follower
        term = 10944
runtime:
        arch = amd64
        cpu_count = 1
        goroutines = 59
        max_procs = 1
        os = linux
        version = go1.5.1
serf_lan:
        encrypted = false
        event_queue = 1
        event_time = 2
        failed = 0
        intent_queue = 1
        left = 0
        member_time = 2
        members = 1
        query_queue = 0
        query_time = 1
serf_wan:
        encrypted = false
        event_queue = 0
        event_time = 1
        failed = 0
        intent_queue = 0
        left = 0
        member_time = 1
        members = 1
        query_queue = 0
        query_time = 1

Sean Chittenden

Mar 29, 2016, 3:34:18 AM
to Consul
Hello Clay.  How did the other servers leave the cluster?  In the meantime, can you change your `"bootstrap_expect": 1` config option to just `"bootstrap": true` and see if that works for your single-node datacenter "cluster"?  -sc
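(For anyone following along, a minimal sketch of that change against the config posted above -- only the affected keys are shown; everything else stays as-is:)

```json
{
  "datacenter": "devprodAWS",
  "data_dir": "/opt/consul/data",
  "server": true,
  "bootstrap": true
}
```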

Clay Bowen

Mar 29, 2016, 11:53:58 AM
to Consul
Hey Sean.  Yeah, I tried it with just "bootstrap" and got nothing.  Since I'm using this Consul as the backend for Vault, I backed up the KV store (using consul-backup) from the migrated Consul.  I then renamed the "data" directory under Consul to a different name and restarted Consul, and it came up perfectly.  I then restored the KV backup.  I'm able to start Vault and unseal it, but now I'm getting:

[ERR] core: failed to acquire lock: Existing key does not match lock use

and Vault won't consider itself "active".

I'm still working on it, but if you have any ideas I'm receptive.
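(The dump-and-restore part of the workaround above can also be done with Consul's stock KV HTTP API instead of consul-backup; this sketch assumes a local agent on the default port, and the `/v1/kv/?recurse` endpoint with base64-encoded `Value` fields is per the documented KV API. A thing that's easy to get wrong when restoring by hand is the base64 decode, shown at the bottom:)

```python
import base64
import json
import urllib.request

CONSUL = "http://127.0.0.1:8500"  # assumed local agent address

def dump_kv():
    """Fetch every key via GET /v1/kv/?recurse; Values come back base64-encoded."""
    with urllib.request.urlopen(CONSUL + "/v1/kv/?recurse") as resp:
        return json.load(resp)

def restore_kv(entries):
    """PUT each key back; the API expects the raw value, so decode first."""
    for entry in entries:
        raw = base64.b64decode(entry["Value"] or "")  # Value may be null
        req = urllib.request.Request(
            CONSUL + "/v1/kv/" + entry["Key"], data=raw, method="PUT")
        urllib.request.urlopen(req)

# The decode step applied to a made-up sample entry:
sample = {"Key": "vault/core/lock", "Value": base64.b64encode(b"held").decode()}
print(base64.b64decode(sample["Value"]))  # -> b'held'
```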

Thanks,
Clay

Clay Bowen

Mar 29, 2016, 12:02:46 PM
to Consul
Fixed -- I removed the core:lock and core:leader keys in Consul (in the Vault KV) and restarted Vault.  It's working perfectly now.  Since I had a backup, it was easy to take a chance on removing what could have been important keys; if a problem occurred I could just restore.
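(For anyone hitting the same lock error on 0.6.x: the keys can be deleted through the KV HTTP API. The `vault/` prefix below is the Consul storage backend's default path and is an assumption here -- check your Vault storage config for the actual prefix, and dump the KV store first:)

```shell
# Back up the whole KV tree, then remove Vault's lock/leader keys.
curl -s "http://127.0.0.1:8500/v1/kv/?recurse" > kv-backup.json
curl -X DELETE "http://127.0.0.1:8500/v1/kv/vault/core/lock"
curl -X DELETE "http://127.0.0.1:8500/v1/kv/vault/core/leader?recurse"
# Then restart Vault and unseal it again.
```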

Thanks,
Clay