How can I safely "reset" a Consul cluster?


David Tinker

Mar 23, 2018, 4:23:18 AM
to Consul
Is it sufficient to:
- shut down all Consul instances
- nuke all data dirs: rm -rf /var/consul-data-dir/*
- start up all servers
- start up all clients
?

Our cluster has 5 servers and 12 clients.

In case anyone is wondering why I want to do this ...

We upgraded from 0.7.0 to 1.0.6, which was supposed to fix issues with de-registering services. Now de-registration only occasionally works: most of the time it appears to work but the service returns from the dead, and sometimes it fails with a 500 "unknown service" even when /v1/health/service/xxx lists the service instance. It also seems that servers cannot leave the cluster ("consul leave" looks like it works, but the server still shows up in "consul members" on other nodes).

We only use Consul for service discovery, and our services register themselves, so we don't have any data to worry about.

Werner Dijkerman

Mar 25, 2018, 2:20:20 PM
to Consul
Looks like a plan. How do you (de)register services in Consul? It could be that the problem you describe is not related to Consul itself, but to the tool/script registering the service.

On Friday, 23 March 2018 at 09:23:18 UTC+1, David Tinker wrote:

David Tinker

Mar 26, 2018, 1:24:07 AM
to consu...@googlegroups.com
We de-reg by a PUT to /v1/agent/service/deregister/:service_id. I have
tried this on client nodes, server nodes, the node the service was
registered on, etc., with all kinds of mixed results. I will have to
wait until we have one to re-reg to see if the reset sorted all this
out.
> --
> This mailing list is governed under the HashiCorp Community Guidelines -
> https://www.hashicorp.com/community-guidelines.html. Behavior in violation
> of those guidelines may result in your removal from this mailing list.
>
> GitHub Issues: https://github.com/hashicorp/consul/issues
> Community chat: https://gitter.im/hashicorp-consul/Lobby
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "Consul" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/consul-tool/r8OM_2gj6TU/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> consul-tool...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/consul-tool/be2b300a-7e65-41ba-a273-0f7d1a8efac4%40googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.

Preetha Appan

Mar 27, 2018, 11:16:01 AM
to Consul
Hi 
The /agent/service/deregister endpoint will only successfully remove services that the agent knows about. It works by removing the service from the agent's internal state and relying on anti-entropy syncing to remove it from the catalog. Syncing usually happens every few seconds, so the catalog should become consistent soon. Maybe you tried it on an agent that didn't have the service?
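Since the agent-level call only works on the agent that holds the service, one way to find that agent is to read the /v1/health/service/xxx response mentioned earlier in the thread and match on the service ID. The following is a minimal Python sketch, not an official client: the function names, the sample node names and the addresses are made up for illustration, and it assumes the owning agent's HTTP API is reachable on port 8500 at the node address (by default the agent API may only listen on 127.0.0.1, in which case the request has to be sent from that machine). It only builds the request and does not send anything.

```python
import urllib.parse
import urllib.request


def find_owning_nodes(health_entries, service_id):
    """Return (node_name, node_address) for every entry whose service ID matches.

    health_entries is the decoded JSON list from /v1/health/service/<name>.
    """
    owners = []
    for entry in health_entries:
        if entry["Service"]["ID"] == service_id:
            owners.append((entry["Node"]["Node"], entry["Node"]["Address"]))
    return owners


def build_agent_deregister(agent_address, service_id, port=8500, token=None):
    """Build (but do not send) the PUT for /v1/agent/service/deregister/:service_id."""
    # Keep ":" readable so IDs like "xxx_service_id:8141" match the curl form.
    quoted_id = urllib.parse.quote(service_id, safe=":")
    url = f"http://{agent_address}:{port}/v1/agent/service/deregister/{quoted_id}"
    # Passing the ACL token as a header is an alternative to ?token=xxx.
    headers = {"X-Consul-Token": token} if token else {}
    return urllib.request.Request(url, headers=headers, method="PUT")


# Hypothetical data shaped like a health endpoint response (values are made up).
SAMPLE_HEALTH = [
    {"Node": {"Node": "app-01", "Address": "10.0.0.7"},
     "Service": {"ID": "xxx_service_id:8141", "Service": "xxx"}},
    {"Node": {"Node": "app-02", "Address": "10.0.0.8"},
     "Service": {"ID": "xxx_service_id:8142", "Service": "xxx"}},
]
```

Calling find_owning_nodes(SAMPLE_HEALTH, "xxx_service_id:8141") yields the node to target, and the request from build_agent_deregister can then be sent with urllib.request.urlopen against that node.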

Anyway, the rest of the steps you mentioned to clear out state and start fresh are correct. Note that the client agents also store some info about their services and checks, so clear out the data directory on both clients and servers.
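The ordering of the reset (stop everything, wipe every data directory on servers and clients, start servers, then start clients) can be written down as a small plan builder. This is only a sketch under stated assumptions: the data directory is the one from the original question, the "systemctl stop/start consul" commands assume Consul runs as a systemd unit named consul (not stated in the thread), and reset_plan is a hypothetical helper that returns the steps without executing them.

```python
def reset_plan(servers, clients, data_dir="/var/consul-data-dir"):
    """Return the ordered (host, command) steps for a full cluster reset.

    Nothing is executed here; the caller decides how to run each command.
    """
    data_dir = data_dir.rstrip("/")
    if not data_dir.startswith("/"):
        # Refuses "", "/" and relative paths so the wipe cannot hit the root.
        raise ValueError("data_dir must be an absolute path other than /")

    nodes = list(servers) + list(clients)
    # Assumption: Consul is managed by a systemd unit called "consul".
    stop = [(host, "systemctl stop consul") for host in nodes]
    wipe = [(host, f"rm -rf {data_dir}/*") for host in nodes]
    start_servers = [(host, "systemctl start consul") for host in servers]
    start_clients = [(host, "systemctl start consul") for host in clients]
    return stop + wipe + start_servers + start_clients
```

Keeping the plan as data makes it easy to review the order before running anything destructive, and every agent is stopped before any data directory is removed.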

Hope this helps
Preetha


David Tinker

May 2, 2018, 4:14:09 AM
to consu...@googlegroups.com
I am still having issues de-registering services. It seems I have to
run the PUT on the node that the service originally registered with.
If I run it on other nodes, the call appears to succeed but the
service comes back.

# consul version
Consul v1.0.6
Protocol 2 spoken by default, understands 2 to 3 (agent will
automatically use protocol >2 when speaking to compatible agents)

$ curl -v -XPUT "http://127.0.0.1:8500/v1/agent/service/deregister/xxx_service_id:8141?token=xxx"
< HTTP/1.1 200 OK
< Content-Length: 0

Any ideas? This is a bit of a pain and will be a real issue if the
original machine is no longer available.
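For the case where the original machine is gone, Consul also exposes a catalog-level endpoint, PUT /v1/catalog/deregister, which takes a JSON body naming the node and, optionally, a single service ID on that node. The sketch below only builds such a request in Python; the function name and the node name are illustrative assumptions, and whether this resolves the behaviour described in this thread is not confirmed here. If the owning agent is still running and still has the service in its local state, anti-entropy syncing is expected to put the service back into the catalog, so this is aimed at nodes that no longer exist.

```python
import json
import urllib.request


def build_catalog_deregister(consul_address, node_name, service_id,
                             port=8500, token=None):
    """Build (but do not send) a PUT to /v1/catalog/deregister.

    node_name is the Consul node the service was registered on.
    """
    body = json.dumps({"Node": node_name, "ServiceID": service_id}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Header form of the ACL token, instead of the ?token=xxx query string.
        headers["X-Consul-Token"] = token
    url = f"http://{consul_address}:{port}/v1/catalog/deregister"
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")
```

A request built with build_catalog_deregister("127.0.0.1", "app-01", "xxx_service_id:8141") can be sent with urllib.request.urlopen from any machine that can reach a Consul agent; "app-01" is a placeholder node name.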