Re: [consul] Cluster leader removes itself and becomes a follower leaving cluster without a leader


James Phillips

Dec 16, 2015, 6:21:24 PM
to consu...@googlegroups.com
Hi Chris,

Are those other servers still listed in the peers.json file (see https://www.consul.io/docs/guides/outage.html for a description of that file)? If they are then Raft thinks there should be 3 servers so it's losing quorum. If you delete the entries for the failed servers and try again it should be able to remain as the leader. 
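For reference, peers.json is just a JSON array of the Raft addresses of the expected servers (it lives under raft/ in the Consul data-dir; edit it with the server stopped). Using the addresses from the log output quoted below as a sketch, not the actual file, trimming it from three entries down to the single surviving server would leave:

```json
["10.6.1.7:8300"]
```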

-- James

On Wed, Dec 16, 2015 at 3:09 PM, Chris White <cjw9...@gmail.com> wrote:
I have an issue where I have lost 2 out of my 3 Consul servers and need to transition my cluster temporarily down to a single Consul server, so I set the bootstrap_expect value to 1 and restarted the server. When Consul starts up I see the following in the logs and I cannot use the service, which I need because there are key-value pairs stored in the Raft DB that are used by other services in the system. Why is Consul removing itself from the leadership role, leaving the cluster without a leader?

==> WARNING: LAN keyring exists but -encrypt given, using keyring

==> WARNING: WAN keyring exists but -encrypt given, using keyring

==> WARNING: BootstrapExpect Mode is specified as 1; this is the same as Bootstrap mode.

==> WARNING: Bootstrap mode enabled! Do not enable unless necessary

==> Starting raft data migration...

==> Starting Consul agent...

==> Starting Consul agent RPC...

==> Consul agent running!

         Node name: 'chris-mnc5.ilabs.io'

        Datacenter: 'dc1'

            Server: true (bootstrap: true)

       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: -1, RPC: 8400)

      Cluster Addr: 10.6.1.7 (LAN: 8301, WAN: 8302)

    Gossip encrypt: true, RPC-TLS: true, TLS-Incoming: true

             Atlas: <disabled>


==> Log data will now stream in as it occurs:


    2015/12/16 14:11:13 [INFO] raft: Node at 10.6.1.7:8300 [Follower] entering Follower state

    2015/12/16 14:11:13 [INFO] serf: EventMemberJoin: chris-mnc5 10.6.1.7

    2015/12/16 14:11:13 [ERR] agent: failed to sync remote state: No cluster leader

    2015/12/16 14:11:13 [INFO] consul: adding server chris-mnc5 (Addr: 10.6.1.7:8300) (DC: dc1)


    2015/12/16 14:11:13 [INFO] serf: EventMemberJoin: chris-mnc5.dc1 10.6.1.7

    2015/12/16 14:11:13 [WARN] serf: Failed to re-join any previously known node

    2015/12/16 14:11:13 [INFO] consul: adding server chris-mnc5.dc1 (Addr: 10.6.1.7:8300) (DC: dc1)

    2015/12/16 14:11:13 [WARN] serf: Failed to re-join any previously known node

    2015/12/16 14:11:15 [WARN] raft: Heartbeat timeout reached, starting election

    2015/12/16 14:11:15 [INFO] raft: Node at 10.6.1.7:8300 [Candidate] entering Candidate state

    2015/12/16 14:11:15 [INFO] raft: Election won. Tally: 1

    2015/12/16 14:11:15 [INFO] raft: Node at 10.6.1.7:8300 [Leader] entering Leader state

    2015/12/16 14:11:15 [INFO] consul: cluster leadership acquired

    2015/12/16 14:11:15 [INFO] consul: New leader elected: chris-mnc5   

    2015/12/16 14:11:15 [INFO] raft: Disabling EnableSingleNode (bootstrap)

    2015/12/16 14:11:15 [INFO] raft: Added peer 10.6.31.17:8300, starting replication

    2015/12/16 14:11:15 [INFO] raft: Added peer 10.6.31.18:8300, starting replication

    2015/12/16 14:11:15 [ERR] raft: Failed to AppendEntries to 10.6.31.17:8300: dial tcp 10.6.31.17:8300: connection refused    

    2015/12/16 14:11:15 [ERR] raft: Failed to AppendEntries to 10.6.31.18:8300: dial tcp 10.6.31.18:8300: connection refused

    2015/12/16 14:11:15 [ERR] raft: Failed to AppendEntries to 10.6.31.18:8300: dial tcp 10.6.31.18:8300: connection refused

    2015/12/16 14:11:15 [ERR] raft: Failed to AppendEntries to 10.6.31.17:8300: dial tcp 10.6.31.17:8300: connection refused    

    2015/12/16 14:11:15 [ERR] raft: Failed to heartbeat to 10.6.31.18:8300: dial tcp 10.6.31.18:8300: connection refused

    2015/12/16 14:11:15 [ERR] raft: Failed to AppendEntries to 10.6.31.18:8300: dial tcp 10.6.31.18:8300: connection refused    

    2015/12/16 14:11:15 [ERR] raft: Failed to heartbeat to 10.6.31.17:8300: dial tcp 10.6.31.17:8300: connection refused


    2015/12/16 14:11:15 [ERR] raft: Failed to heartbeat to 10.6.31.18:8300: dial tcp 10.6.31.18:8300: connection refused

    2015/12/16 14:11:15 [ERR] raft: Failed to AppendEntries to 10.6.31.17:8300: dial tcp 10.6.31.17:8300: connection refused

    2015/12/16 14:11:15 [ERR] raft: Failed to AppendEntries to 10.6.31.18:8300: dial tcp 10.6.31.18:8300: connection refused

    2015/12/16 14:11:15 [ERR] raft: Failed to heartbeat to 10.6.31.17:8300: dial tcp 10.6.31.17:8300: connection refused

    2015/12/16 14:11:15 [ERR] raft: Failed to AppendEntries to 10.6.31.17:8300: dial tcp 10.6.31.17:8300: connection refused

    2015/12/16 14:11:15 [ERR] raft: Failed to heartbeat to 10.6.31.18:8300: dial tcp 10.6.31.18:8300: connection refused

    2015/12/16 14:11:15 [INFO] raft: Removed peer 10.6.31.17:8300, stopping replication (Index: 1240)

    2015/12/16 14:11:15 [INFO] raft: Removed peer 10.6.31.18:8300, stopping replication (Index: 1240)

    2015/12/16 14:11:15 [INFO] raft: Removed ourself, transitioning to follower


--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/consul/issues
IRC: #consul on Freenode
---
You received this message because you are subscribed to the Google Groups "Consul" group.
To unsubscribe from this group and stop receiving emails from it, send an email to consul-tool...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/consul-tool/b398846d-1796-4032-b1cb-eee0e2ab9dce%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Chris White

Dec 16, 2015, 6:32:13 PM
to Consul
No I removed them from peers.json before restarting the server.

James Phillips

Dec 16, 2015, 7:28:49 PM
to consu...@googlegroups.com
I think there are entries in the Raft log that the leader is replaying which add the failed servers back, pushing the quorum size back up. Unfortunately, the special "elect yourself" bootstrap only happens once at startup, so once the single node steps down I don't think it will step back up: the Raft configuration's DisableBootstrapAfterElect setting defaults to true, because in general re-electing yourself like that is super dangerous.

If you can roll your own build of Consul you could defeat this by setting that config item to false right after this line - https://github.com/hashicorp/consul/blob/fed14582145987a6f56f4e21c841f9bfd97d411f/consul/config.go#L301. That should cause it to step back up after it has removed the failed servers once it completes running through the entire Raft log. You'd definitely need to get rid of this before introducing your new servers.

Another option: if you take your server down, back up your Consul data-dir, and then remove the raft.db file from it, this will wipe out the Raft log and should allow the bootstrap to work without the leader stepping down. This will roll back your state to the last snapshot, though, so you would lose any data written since that snapshot (the snapshots are also in the data-dir, so you can see how recent they are by date).
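To make the ordering concrete, the steps above can be sketched as shell commands. This is a hedged sketch run against a mock data-dir layout; DATA_DIR is an assumption, so substitute whatever -data-dir your server is actually configured with, and skip the mock-setup lines on a real server.

```shell
# Assumption: stand-in for your server's configured -data-dir.
DATA_DIR=./consul-data

# Mock layout standing in for a real data-dir (skip on a real server).
mkdir -p "$DATA_DIR/raft/snapshots"
: > "$DATA_DIR/raft/raft.db"

# 1. Stop the Consul server first (stop the service or kill the agent).
# 2. Back up the whole data-dir before touching anything.
cp -a "$DATA_DIR" "${DATA_DIR}.bak"
# 3. Remove the Raft log; on restart, state rolls back to the last snapshot.
rm "$DATA_DIR/raft/raft.db"
# 4. Check how recent the snapshots are before restarting the server.
ls "$DATA_DIR/raft/snapshots"
```

The backup matters because step 3 is destructive: anything written after the last snapshot is gone once raft.db is removed.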

Sorry about this - Consul is wired to try to protect single servers from going rogue which makes going from a cluster of 3 to 1 very difficult.

Chris White

Dec 16, 2015, 7:30:56 PM
to consu...@googlegroups.com
Thank you this is very helpful.


Chris White

Dec 16, 2015, 11:26:01 PM
to Consul
How long do entries stay in the Raft database for servers that have been removed using 'force-leave'? The documentation states: "Consul periodically tries to reconnect to "failed" nodes in case it is a network partition. After some configured amount of time (by default 72 hours), Consul will reap "failed" nodes and stop trying to reconnect. The force-leave command can be used to transition the "failed" nodes to "left" nodes more quickly."


James Phillips

Dec 17, 2015, 12:40:46 AM
to consu...@googlegroups.com
Doing a force-leave will cause them to enter the left state, and they will be removed completely after 72 hours. I think in your case the problem isn't that, but that the Raft log contains events relating to those peers. Given enough time (by default, 8192 Raft write operations: server events, KV updates, catalog updates, and so on) Consul will take a new snapshot and then truncate the Raft log. This will clear them out and would allow the bootstrap to work as well, but it can take several hours for a snapshot to happen.



Chris White

Dec 17, 2015, 3:43:32 PM
to Consul
James,

Looks like I got around the problem by specifying bootstrap: true instead of bootstrap_expect: 1 in the Consul config. Now the server stays as leader.
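For reference, a minimal sketch of the server config that worked; data_dir here is a placeholder, and the only change from the failing setup is "bootstrap": true in place of "bootstrap_expect": 1:

```json
{
  "server": true,
  "bootstrap": true,
  "data_dir": "/var/consul"
}
```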

James Phillips

Dec 17, 2015, 4:27:24 PM
to consu...@googlegroups.com
Thanks for the follow-up, and sorry I didn't think of that. I thought those were equivalent, so I'll have to look at that code path again.