Consul Cluster Leader Election

William Bengtson

Sep 23, 2016, 2:40:32 PM
to Consul
We are attempting a blue/green deployment of Consul clusters. Our scenario is this:

Spin up an initial Consul cluster of 3 servers that join together (blue). Spin up an additional set of 3 servers and join them to the existing cluster (green). The cluster now consists of 6 nodes, 3 blue and 3 green, and the Consul leader is among the blue nodes. Then delete all 3 blue nodes so that the green nodes are the only ones alive.

We spin up and delete the clusters with CloudFormation. All the nodes join and communicate successfully, but when we delete the blue stack in CloudFormation, things keep working until the leader is deleted, and then we get a 500 internal server error when trying to talk to the UI. Querying for the leader returns "".

Any thoughts on how to solve this?


Thanks,

Will

nss pradeep

Sep 23, 2016, 4:29:32 PM
to consu...@googlegroups.com
Hi Will,

As you mentioned, querying for the leader returns "", which means there is no leader. Without a leader, your queries will fail with a 500 internal server error unless you use the stale option when querying.

You can find details about Consul's consistency modes at this link: https://www.consul.io/docs/agent/http.html
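
As a rough illustration of the stale option, here is a minimal Python sketch that builds a catalog query with the "?stale" parameter. The address 127.0.0.1:8500 (a local agent on the default HTTP port) and the helper names are assumptions for the example, not something from this thread, and not every endpoint supports stale reads.

```python
import json
import urllib.request

# Assumption: a local Consul agent listening on the default HTTP port.
CONSUL_ADDR = "http://127.0.0.1:8500"


def catalog_nodes_url(base=CONSUL_ADDR, stale=False):
    """Build the URL for /v1/catalog/nodes, optionally in stale mode."""
    url = base.rstrip("/") + "/v1/catalog/nodes"
    # The consistency mode is selected with a bare query parameter.
    return url + "?stale" if stale else url


def list_nodes(stale=True, timeout=5):
    """Fetch the node catalog. Needs a reachable agent, so it is not called here."""
    with urllib.request.urlopen(catalog_nodes_url(stale=stale), timeout=timeout) as resp:
        return json.load(resp)


print(catalog_nodes_url(stale=True))
```

With stale=True any server may answer the read, so it can still succeed while the cluster has no leader, at the cost of possibly out-of-date data.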

I would start the debugging process as below:

1. Run the command "consul members" to check the status of the Consul servers. It is important to find out whether the servers in the deleted stack leave the cluster gracefully when you delete it. If they don't, you will see them in the "failed" state.

2. Run the command "consul monitor --log-level=debug" on any of the Consul servers to see the logs, which should point to possible reasons for there being no leader.

I hope this helps.

Thanks,
Pradeep NSS

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/consul/issues
IRC: #consul on Freenode
---
You received this message because you are subscribed to the Google Groups "Consul" group.
To unsubscribe from this group and stop receiving emails from it, send an email to consul-tool+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/consul-tool/8c15d58f-7005-44cf-8842-8dc5cae2144e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

James Phillips

Sep 23, 2016, 4:35:09 PM
to consu...@googlegroups.com
Hi William,

Are you terminating all three of the old servers at once? If you have
6 servers in the cluster then you can only handle the failure of two
at any time. The best thing in this situation is to walk the count
back down to 3 by making each of the old servers leave and then
shutting them down, one at a time. This will reduce the quorum
requirements as they depart, and leadership should transition to one
of the new servers by the end.
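
The quorum arithmetic behind this can be sketched in a few lines of Python. This assumes the usual majority rule (quorum is floor(n/2) + 1 of the servers currently in the peer set); the function names are made up for illustration.

```python
def quorum(servers):
    """A strict majority of the servers currently in the peer set."""
    return servers // 2 + 1


def failure_tolerance(servers):
    """How many servers can be lost while a majority still remains."""
    return servers - quorum(servers)


# Walk the server count down from 6 to 3, one graceful leave at a time.
for n in range(6, 2, -1):
    print(f"{n} servers: quorum {quorum(n)}, tolerates {failure_tolerance(n)} failure(s)")
```

With 6 servers the quorum is 4, so terminating 3 at once leaves only 3 voters and no leader can be elected. Each graceful leave shrinks the peer set, which is why removing the old servers one at a time keeps a majority available throughout.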

-- James

William Bengtson

Sep 23, 2016, 9:26:17 PM
to consu...@googlegroups.com
Thanks James!  That worked well.  I was able to have one member leave and then destroy my cluster so that only 2 nodes would leave at the same time!

-Will

James Phillips

Sep 23, 2016, 9:30:57 PM
to consu...@googlegroups.com
Sounds good. Please be sure that the two that were destroyed actually
left the cluster before they were shut down - if they are still part
of the quorum then things will look ok but you won't be able to lose
another server. You can usually issue a "consul force-leave <node
name>" on them from one of the remaining servers to kick them out. You
can hit https://www.consul.io/docs/agent/http/status.html#status_peers
to make sure everything looks as you expect (or
https://www.consul.io/docs/agent/http/operator.html#raft-configuration
on Consul 0.7 and later, which has more detailed info).
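
A minimal Python sketch of that check, assuming a local agent at 127.0.0.1:8500 and the /v1/status/leader and /v1/status/peers endpoints. The helper names and the idea of comparing against an expected list of server addresses are illustrative assumptions, not part of Consul.

```python
import json
import urllib.request

# Assumption: a local Consul agent listening on the default HTTP port.
CONSUL_ADDR = "http://127.0.0.1:8500"


def get_json(path, base=CONSUL_ADDR, timeout=5):
    """GET a Consul HTTP API path and decode the JSON body (needs a live agent)."""
    with urllib.request.urlopen(base + path, timeout=timeout) as resp:
        return json.load(resp)


def unexpected_peers(peers, expected):
    """Return peers still in the Raft peer set that are not in the expected set."""
    return sorted(set(peers) - set(expected))


def check_cluster(expected, base=CONSUL_ADDR):
    """Hypothetical helper: report the leader and any leftover old servers."""
    leader = get_json("/v1/status/leader", base)  # "" when there is no leader
    peers = get_json("/v1/status/peers", base)    # list of "ip:port" strings
    return {"leader": leader, "leftover_peers": unexpected_peers(peers, expected)}
```

Calling check_cluster with the addresses of the three new servers should return an empty leftover_peers list once the old servers have really left. Anything still listed there is a candidate for "consul force-leave".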

-- James

Martin Atkins

Sep 24, 2016, 1:15:55 AM
to Consul
I also make a habit of explicitly stopping each consul agent process after it has left, to prevent accidentally rejoining it either by human error or some traces left elsewhere of the old consul server addresses.

(We put our server addresses in DNS so our other hosts can find them and join on boot, but the DNS doesn't update atomically with the consul leave. Other hosts joining with stale server IP addresses can "helpfully" reconnect the old servers to the gossip pool and cause confusion and strife.)

William Bengtson

Sep 24, 2016, 2:52:55 PM
to consu...@googlegroups.com
Thanks for the info James!

