How to gracefully transfer leadership from one server to another?


Steve H

Mar 14, 2017, 10:02:39 AM
to Consul
Hi, 

In our 5-server cluster, we are observing downtime of the KV service on the remaining servers when we stop the agent on the server that currently holds the leadership role. The downtime appears to last for the duration of the leadership election among the remaining servers. This is only a couple of seconds, but it seems odd that there would be any downtime at all in a cluster with 5 (or 4 during a reboot) servers?

We're running 0.7.5 on Ubuntu 16.04.

# consul members
Node           Address           Status  Type    Build  Protocol  DC
STEVE-DESKTOP  10.2.101.15:8301  alive   client  0.7.5  2         DC1
NODE1          10.2.101.24:8301  alive   server  0.7.5  2         DC1
NODE2          10.2.101.8:8301   alive   server  0.7.5  2         DC1
NODE3          10.2.101.11:8301  alive   server  0.7.5  2         DC1
NODE4          10.2.101.12:8301  alive   server  0.7.5  2         DC1
NODE5          10.2.101.13:8301  alive   server  0.7.5  2         DC1


The config on each of the servers is as follows (with differing node names): 
{
    "data_dir": "/opt/consul",
    "datacenter": "DC1",
    "log_level": "INFO",
    "node_name": "NODEx",
    "performance": {
        "raft_multiplier": 1
    },
    "rejoin_after_leave": true,
    "server": true,
    "ui": true
}



To provide an example of the downtime we observe, NODE1 is currently the leader:
# consul info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 1
build:
        prerelease =
        revision = '21f2d5a
        version = 0.7.5
consul:
        bootstrap = false
        known_datacenters = 1
        leader = true
        leader_addr = 10.2.101.8:8300
        server = true
raft:
        applied_index = 131168
        commit_index = 131168
        fsm_pending = 0
        last_contact = 0
        last_log_index = 131168
        last_log_term = 129
        last_snapshot_index = 124469
        last_snapshot_term = 129
        latest_configuration = [{Suffrage:Voter ID:10.2.101.11:8300 Address:10.2.101.11:8300} {Suffrage:Voter ID:10.2.101.8:8300 Address:10.2.101.8:8300} {Suffrage:Voter ID:10.2.101.24:8300 Address:10.2.101.24:8300} {Suffrage:Voter ID:10.2.101.12:8300 Address:10.2.101.12:8300} {Suffrage:Voter ID:10.2.101.13:8300 Address:10.2.101.13:8300}]
        latest_configuration_index = 94583
        num_peers = 4
        protocol_version = 1
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Leader
        term = 129
runtime:
        arch = amd64
        cpu_count = 16
        goroutines = 100
        max_procs = 16
        os = linux
        version = go1.7.5
serf_lan:
        encrypted = false
        event_queue = 0
        event_time = 29
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 26
        members = 6
        query_queue = 0
        query_time = 1
serf_wan:
        encrypted = false
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 2
        members = 1
        query_queue = 0
        query_time = 1


On NODE2 we run a contrived continuous loop putting and getting a value in the KV store:
while true
do 
    consul kv get test/test
    consul kv put test/test "$(date)"
done

This outputs a continuous stream of something like this: 
...
Tue 14 Mar 13:44:56 GMT 2017
Success! Data written to: test/test
Tue 14 Mar 13:44:57 GMT 2017
Success! Data written to: test/test
...

If we then follow the instructions in "Stopping an Agent" on https://www.consul.io/docs/agent/basics.html and send 
kill -INT consul_pid
on NODE1 

and on NODE2 we get:
...
Tue 14 Mar 13:46:50 GMT 2017
Success! Data written to: test/test
Error querying Consul agent: Unexpected response code: 500
Error! Failed writing data: Unexpected response code: 500 (rpc error: failed to get conn: dial tcp 10.2.101.24:8300: getsockopt: connection refused)
Error querying Consul agent: Unexpected response code: 500
Error! Failed writing data: Unexpected response code: 500 (rpc error: failed to get conn: dial tcp 10.2.101.24:8300: getsockopt: connection refused)
Error querying Consul agent: Unexpected response code: 500
Error! Failed writing data: Unexpected response code: 500 (rpc error: failed to get conn: dial tcp 10.2.101.24:8300: getsockopt: connection refused)
Error querying Consul agent: Unexpected response code: 500
Error! Failed writing data: Unexpected response code: 500 (rpc error: failed to get conn: dial tcp 10.2.101.24:8300: getsockopt: connection refused)
Error querying Consul agent: Unexpected response code: 500
Success! Data written to: test/test
Tue 14 Mar 13:46:50 GMT 2017
Success! Data written to: test/test
Tue 14 Mar 13:46:53 GMT 2017
Success! Data written to: test/test
... 

If we just perform a straight reboot of the leader server the number of 500 errors is much higher, so it seems that systemd is probably sending the default SIGTERM and doing a less graceful shutdown.

So our questions are:
- How can we make the shutdown of the agent service that currently holds leadership more graceful?
- Is there a way to force an election without shutting down the current leader?
- Is there an alternate signal that will wait for the leadership election to complete before the current leader goes offline?
- Should we lower our expectations of the KV service's availability and account for downtime everywhere we use it?
- Is it possible to make the kv command retry a few times in the event of a 500 error?
- Have we totally missed something?

Thanks in advance for any assistance! 

Best Regards

Steve

Paul Archer

Mar 14, 2017, 9:24:36 PM
to Consul
I'm new to consul myself, but I think this is what you are looking for:
$ consul maint -help
Usage: consul maint [options]

  Places a node or service into maintenance mode. During maintenance mode,
  the node or service will be excluded from all queries through the DNS
  or API interfaces, effectively taking it out of the pool of available
  nodes. This is done by registering an additional critical health check.

  When enabling maintenance mode for a node or service, you may optionally
  specify a reason string. This string will appear in the "Notes" field
  of the critical health check which is registered against the node or
  service. If no reason is provided, a default value will be used.

  Maintenance mode is persistent, and will be restored in the event of an
  agent restart. It is therefore required to disable maintenance mode on
  a given node or service before it will be placed back into the pool.

  By default, we operate on the node as a whole. By specifying the
  "-service" argument, this behavior can be changed to enable or disable
  only a specific service.

  If no arguments are given, the agent's maintenance status will be shown.
  This will return blank if nothing is currently under maintenance.

Options:

  -enable                    Enable maintenance mode.
  -disable                   Disable maintenance mode.
  -reason=<string>           Text string describing the maintenance reason
  -service=<serviceID>       Control maintenance mode for a specific service ID
  -token=""                  ACL token to use. Defaults to that of agent.
  -http-addr=127.0.0.1:8500  HTTP address of the Consul agent.

Steve H

Mar 15, 2017, 7:15:56 AM
to Consul
Thanks Paul, 

We had looked at the "maint" option, but maybe we're still missing something. On the leader we run:

# consul maint -enable
Node maintenance is now enabled

However, this doesn't cause an election to be held, and leadership stays with the server even though it is in maintenance. If we then stop the agent process on the server, we get the same 500 errors until the leadership election has taken place. Are we using the wrong options for maint, or is there a second step to take after putting a node into maintenance?

Thanks & best regards

Steve

James Phillips

Mar 15, 2017, 11:01:25 AM
to consu...@googlegroups.com
Hi Steve,

Consul currently doesn't have a mechanism to gracefully transfer
leadership (and consul maint unfortunately doesn't help here). Having
the current leader leave the cluster will kick off an election, and
there will be a brief period without a leader while that transpires.

We do have retry logic in the RPC client that attempts to hide this
from callers, having them experience just a longer request time. Are
your failing KV writes happening against Consul agents in client mode,
or are they happening against Consul server agents? Since you have the
Raft multiplier set to 1, I'd expect that the 500 errors won't make it
to the clients very often, but that logic may not be present if you
are doing KV writes directly against a Consul server since it's a
slightly different code path.

In general, we do recommend that your app has some retry logic since
it may experience legit 500 errors for a while if a Consul server is
suddenly lost. If you'd like to open a Github issue, we could look at
adding a more graceful mechanism for planned operations that take out
a leader.
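
The client-side retry James recommends can be sketched as a small shell wrapper. This is a minimal sketch only: the `retry` helper name, the attempt limit, and the fixed 1-second backoff are illustrative choices, not Consul features.

```shell
#!/bin/sh
# Sketch: retry a command a few times before giving up, so brief
# leaderless windows (e.g. during an election) surface as latency
# rather than hard failures. Helper name and backoff are assumptions.
retry() {
  max=$1
  shift
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1            # still failing after max attempts: give up
    fi
    attempt=$((attempt + 1))
    sleep 1               # wait out the leadership election
  done
  return 0
}

# Against a live cluster, usage might look like:
#   retry 5 consul kv put test/test "$(date)"
```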

-- James

Steve H

Mar 15, 2017, 12:57:39 PM
to Consul
Thanks James, 

We see similar behaviour when using the RPC client, with the local agent in either server or client mode, during the leadership election. With my contrived tight loop of gets and puts we get output like this from the agent running in client mode:

(Where batch-1lb/10.2.101.24 is the current leader and the other names/IPs are of other servers.)
...
Success! Data written to: test/test
Wed, Mar 15, 2017  4:27:36 PM
Error! Failed writing data: Unexpected response code: 500 (rpc error: rpc error: stream closed)
Error querying Consul agent: Unexpected response code: 500
Error! Failed writing data: Unexpected response code: 500 (rpc error: failed to get conn: dial tcp 10.2.101.24:8300: connectex: No connection could be made because the target machine actively refused it.)
Error querying Consul agent: Unexpected response code: 500
Error! Failed writing data: Unexpected response code: 500 (rpc error: rpc error: failed to get conn: dial tcp 10.2.101.24:8300: getsockopt: connection refused)
Error querying Consul agent: Unexpected response code: 500
Success! Data written to: test/test
Wed, Mar 15, 2017  4:27:38 PM
...

In the logs for the agent we see this sort of thing :
...
    2017/03/15 16:27:36 [ERR] consul: RPC failed to server 10.2.101.13:8300: rpc error: rpc error: stream closed
    2017/03/15 16:27:36 [ERR] http: Request PUT /v1/kv/test/test, error: rpc error: rpc error: stream closed from=127.0.0.1:54101
    2017/03/15 16:27:36 [ERR] consul: RPC failed to server 10.2.101.12:8300: rpc error: rpc error: stream closed
    2017/03/15 16:27:36 [ERR] http: Request GET /v1/kv/test/test, error: rpc error: rpc error: stream closed from=127.0.0.1:54104
    2017/03/15 16:27:37 [ERR] consul: RPC failed to server 10.2.101.24:8300: rpc error: failed to get conn: dial tcp 10.2.101.24:8300: connectex: No connection could be made because the target machine actively refused it.
    2017/03/15 16:27:37 [ERR] http: Request PUT /v1/kv/test/test, error: rpc error: failed to get conn: dial tcp 10.2.101.24:8300: connectex: No connection could be made because the target machine actively refused it. from=127.0.0.1:54108
    2017/03/15 16:27:37 [ERR] consul: RPC failed to server 10.2.101.8:8300: rpc error: stream closed
    2017/03/15 16:27:37 [ERR] http: Request GET /v1/kv/test/test, error: rpc error: stream closed from=127.0.0.1:54113
    2017/03/15 16:27:38 [ERR] consul: RPC failed to server 10.2.101.11:8300: rpc error: rpc error: failed to get conn: dial tcp 10.2.101.24:8300: getsockopt: connection refused
    2017/03/15 16:27:38 [ERR] http: Request PUT /v1/kv/test/test, error: rpc error: rpc error: failed to get conn: dial tcp 10.2.101.24:8300: getsockopt: connection refused from=127.0.0.1:54116
    2017/03/15 16:27:38 [ERR] consul: RPC failed to server 10.2.101.13:8300: rpc error: rpc error: failed to get conn: dial tcp 10.2.101.24:8300: getsockopt: connection refused
    2017/03/15 16:27:38 [ERR] http: Request GET /v1/kv/test/test, error: rpc error: rpc error: failed to get conn: dial tcp 10.2.101.24:8300: getsockopt: connection refused from=127.0.0.1:54119
    2017/03/15 16:27:39 [INFO] consul: New leader elected: compute-3lb
    2017/03/15 16:27:40 [INFO] memberlist: Suspect batch-1lb has failed, no acks received
...

We seem to achieve the least downtime by issuing a leave command to the server agent instead of sending a SIGINT. This seems to result in only one 500 error and then a block until the leadership election is complete. I'm thinking that we'll change /lib/systemd/system/consul.service to have:

ExecStop=/usr/local/bin/consul leave

and make sure that "rejoin_after_leave" is always true in our setup. We'll also put retry logic in place in order to deal with both expected and unexpected downtime of the service. 
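
For anyone following along, a sketch of what that unit change could look like. Only the ExecStop line comes from the discussion above; the rest of the unit (paths, config dir, restart policy) is an assumed typical layout, not our actual service file.

```ini
# /lib/systemd/system/consul.service (sketch)
[Unit]
Description=Consul agent
After=network.target

[Service]
ExecStart=/usr/local/bin/consul agent -config-dir=/etc/consul.d
ExecStop=/usr/local/bin/consul leave
Restart=on-failure

[Install]
WantedBy=multi-user.target
```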

I'll write this all up in a github issue so that this info is there for others if needed and if there is scope to make things more graceful in future that would be grand. 

Thanks again and best regards

Steve

Jason W

Mar 22, 2017, 12:43:26 PM
to Consul
Steve,

Thank you for taking the time to set up a reproducible test. We too are new to Consul and were struggling with the same issue. We assumed that it was our ignorance of the best way to set this up, but your work confirms that we did the right thing and that this is indeed an issue.

Not being able to have the leader gracefully leave the cluster is a bit disappointing to me. As a core service, I would expect such a thing to be transparent to the clients. I'm not looking forward to adding retry code to every place in my infrastructure that interacts with Consul.

If you find anything else out I'd be interested in hearing about it. Thank you again for creating a test.

JW

Nick Wales

Mar 23, 2017, 12:44:44 PM
to Consul
Rather than adding retry code, you can add ?stale to the requests, which will allow any Consul server to respond, not just the leader.

If you're expecting a high volume of traffic to your KV store, this is also useful for balancing the load. The details are in here: https://www.consul.io/docs/agent/http/kv.html
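
A stale read over the HTTP API just appends the documented ?stale query parameter to the KV endpoint. A quick sketch (the address and key below are placeholders, and `CONSUL_HTTP_ADDR` is an assumed environment override):

```shell
#!/bin/sh
# Build a KV read URL with the ?stale query parameter, which lets any
# server answer the read rather than forwarding it to the leader.
# The address and key are placeholders for illustration.
consul_addr="${CONSUL_HTTP_ADDR:-http://127.0.0.1:8500}"
key="test/test"
url="$consul_addr/v1/kv/$key?stale"
echo "$url"
# Against a live cluster you would fetch it with:
#   curl -s "$url"
```

Note that stale reads can return slightly out-of-date values, which is the trade-off for staying available while there is no leader.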