deregister services from a dead node

Webert de Souza Lima

Jul 29, 2016, 10:47:43 AM
to Consul
Greetings!

I have many small clusters with 3 Consul Masters and 3 Consul Agents each. The Consul service runs on the host machine, while all my services run in Docker containers.
All my containers start with a script that registers their services using the host's agent and deregisters them through the same agent right after the service dies.

That works very, very well. I can bring up and take down as many containers as I want, whenever I want, and all nodes in the cluster are able to discover services (workers) through DNS, using Consul.
Recently I had poor performance on a very important service because one node died (the machine froze completely) but DNS queries kept returning the services that had been registered through that machine's agent. I couldn't deregister those services: requests sent through other agents (using the HTTP API) had no effect.

I had to wait about 30 minutes until I had that server back online, deregister all the services using its own agent (which worked), and then start my containers again so they would be re-registered.

So, is there a way to deregister a service that was registered on an agent that is now dead? Thanks in advance.

Nick Wales

Jul 29, 2016, 4:21:26 PM
to Consul
I believe if you run the below it will remove the node and all its services:

consul force-leave <node_name> 

I've had similar issues, where removing from the catalog only worked temporarily.
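
For scripted cleanup, the same operation is also exposed over the agent HTTP API. The following is only a sketch: it assumes a recent Consul version in which the endpoint is PUT /v1/agent/force-leave/<node>, an agent listening on the default 127.0.0.1:8500, and a hypothetical node name. The helper only builds the request; it does not send it.

```python
import urllib.parse
import urllib.request


def force_leave_request(node_name, agent="http://127.0.0.1:8500"):
    """Build (but do not send) a force-leave request for a failed node.

    Assumption: the agent address is the Consul default; adjust as needed.
    """
    # Percent-encode the node name so it is safe to place in the URL path.
    path = "/v1/agent/force-leave/" + urllib.parse.quote(node_name, safe="")
    return urllib.request.Request(agent + path, method="PUT")


# "worker-3" is a hypothetical node name used for illustration.
request = force_leave_request("worker-3")
# To actually send it against a live agent:
#   urllib.request.urlopen(request, timeout=5)
```

The request has to go to an agent that is still alive, since the dead node's own agent cannot answer.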

James Phillips

Aug 17, 2016, 8:42:12 PM
to consu...@googlegroups.com
Hi,

Consul should take care of this for you if you were running the Consul agent on the node that died. There's a built-in health check that Consul performs under the hood that should be able to detect a frozen machine (it's called serfHealth). The serfHealth check is implicitly AND-ed with the service-level checks for that node, so if the node dies, Consul won't return any services on that node via DNS. If it was still serving these, I'd be interested in chasing down what happened there.

-- James
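
The AND-ing described above can be illustrated with a small sketch. It is not Consul's implementation. It assumes entries shaped like the output of /v1/health/service/<name> (a list of objects with a "Checks" array) and the default DNS behavior of dropping any instance that has at least one critical check. The node and check names in the sample data are hypothetical.

```python
def dns_visible(health_entries):
    """Keep only instances with no critical node-level or service-level check.

    Node-level checks such as serfHealth appear in the same "Checks" list as
    the service's own checks, so one critical serfHealth hides the instance.
    """
    return [
        entry
        for entry in health_entries
        if all(check.get("Status") != "critical" for check in entry.get("Checks", []))
    ]


# Hypothetical sample: node-b is frozen, so its serfHealth check is critical
# even though the service-level check was last seen passing.
SAMPLE = [
    {"Node": {"Node": "node-a"},
     "Checks": [{"CheckID": "serfHealth", "Status": "passing"},
                {"CheckID": "service:worker", "Status": "passing"}]},
    {"Node": {"Node": "node-b"},
     "Checks": [{"CheckID": "serfHealth", "Status": "critical"},
                {"CheckID": "service:worker", "Status": "passing"}]},
]

visible_nodes = [entry["Node"]["Node"] for entry in dns_visible(SAMPLE)]
```

If serfHealth were missing from an entry's check list, a dead node would pass this filter, which matches the symptom reported in this thread.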

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/consul/issues
IRC: #consul on Freenode
---
You received this message because you are subscribed to the Google Groups "Consul" group.
To unsubscribe from this group and stop receiving emails from it, send an email to consul-tool+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/consul-tool/54c61e10-2392-4209-bed3-e281c502900c%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Webert de Souza Lima

Sep 8, 2016, 2:41:02 PM
to Consul
Yes, it happens as described if the machine crashes (kernel panic). So I must manually deregister each service after the node is back up. Until then, I can't even deregister from other nodes.

James Phillips

Sep 20, 2016, 11:08:57 PM
to consu...@googlegroups.com
Is it possible you are using the /v1/catalog API to register services
vs. using the /v1/agent API on the agent where the service is running?
It sounds like things are getting registered against some other node
(or no node).
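
To make the distinction concrete, here is a sketch of the two request bodies. The field names follow the Consul HTTP API as I understand it; the service, node, and address values are hypothetical. The agent endpoint never names a node, because the service is attached to whichever agent receives the request, while the catalog endpoint takes the node explicitly, so a wrong value there registers the service against some other node.

```python
import json


def agent_register_body(service_id, name, port):
    """Body for PUT /v1/agent/service/register, sent to the local agent.

    No node is named: the service belongs to the receiving agent's node.
    """
    return {"ID": service_id, "Name": name, "Port": port}


def catalog_register_body(node, address, service_id, name, port):
    """Body for PUT /v1/catalog/register, sent to any agent or server.

    The node is stated explicitly, so it can differ from the sender's node.
    """
    return {
        "Node": node,
        "Address": address,
        "Service": {"ID": service_id, "Service": name, "Port": port},
    }


# Hypothetical values for illustration only.
agent_body = json.dumps(agent_register_body("worker-1", "worker", 8080))
catalog_body = json.dumps(
    catalog_register_body("node-a", "10.0.0.11", "worker-1", "worker", 8080)
)
```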

On Thu, Sep 8, 2016 at 5:42 AM, Webert de Souza Lima
<weber...@gmail.com> wrote:
> Unfortunately this is not happening as expected here. Whenever one of our
> nodes dies (freeze, kernel panic, forced shutdown, etc.), DNS queries
> on other nodes keep returning the dead node's services.
> And even when this node returns to life I have to manually deregister all of
> its services (using the HTTP API) and register them again.

Webert de Souza Lima

Sep 21, 2016, 12:22:56 PM
to Consul
Hi,

Not really. I'm using /v1/agent/service/register and /v1/agent/service/deregister.
Registration and deregistration are fine as long as they happen on the same node.

The problem arises only when that node dies unexpectedly, so I cannot deregister.

James Phillips

Sep 21, 2016, 12:38:08 PM
to consu...@googlegroups.com
The only thing I can think of is that the serfHealth check is getting
removed as one of the node-level checks somehow. Can you take a look
at the result of something like
https://demo.consul.io/v1/health/service/redis?pretty for one of your
services and see if the serfHealth check is there?

It's probably worth opening a GitHub issue with any repro info you can
provide so we can see what's going on here.
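
A quick way to run that check across every instance of a service is sketched below. It assumes the JSON returned by /v1/health/service/<name> has already been loaded into a list (for example with json.load on the HTTP response), and the node names in the sample are hypothetical.

```python
def nodes_missing_serf_health(health_entries):
    """Return the names of nodes whose check list lacks the serfHealth check."""
    missing = []
    for entry in health_entries:
        check_ids = {check.get("CheckID") for check in entry.get("Checks", [])}
        if "serfHealth" not in check_ids:
            # The node name sits under the "Node" object of each entry.
            missing.append(entry["Node"]["Node"])
    return missing


# Hypothetical sample: node-b has lost its node-level serfHealth check.
SAMPLE_HEALTH = [
    {"Node": {"Node": "node-a"},
     "Checks": [{"CheckID": "serfHealth", "Status": "passing"}]},
    {"Node": {"Node": "node-b"},
     "Checks": [{"CheckID": "service:redis", "Status": "passing"}]},
]

suspect_nodes = nodes_missing_serf_health(SAMPLE_HEALTH)
```

An empty result means serfHealth is attached everywhere, in which case the stale DNS answers need another explanation.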

Webert de Souza Lima

Sep 21, 2016, 12:48:55 PM
to consu...@googlegroups.com
Right now everything is fine and the checks are there, as you asked.
I'll try to reproduce this as soon as possible so I can check again at that point and open that issue.
Thanks for replying, I'll try to bring more info in a few days.
