Terraform consul_service gets removed automatically after adding


Ziv Meidav

Jan 30, 2019, 7:44:15 AM
to Consul
Hey all,

I have a bizarre issue with registering a new service using terraform.

I'm using the consul provider and trying to use the resource "consul_service" in order to create a new service.

However, any service registered with this resource gets automatically de-registered after 30-60 seconds.

However, if I use the deprecated resource consul_agent_service instead, it works and the service stays registered.

This issue reproduces on both my production cluster and dev consul agent.

Here are the logs:

    2019/01/30 14:37:20 [DEBUG] http: Request GET /v1/catalog/node/test-node?dc=dc1 (200.426µs) from=127.0.0.1:52233
    2019/01/30 14:37:20 [INFO] agent: Synced service "test-name-working"
    2019/01/30 14:37:20 [DEBUG] agent: Node info in sync
    2019/01/30 14:37:20 [DEBUG] http: Request PUT /v1/agent/service/register?dc=dc1 (406.648µs) from=127.0.0.1:52232
    2019/01/30 14:37:20 [DEBUG] agent: Service "test-name-working" in sync
    2019/01/30 14:37:20 [DEBUG] agent: Node info in sync
    2019/01/30 14:37:20 [DEBUG] agent: Service "test-name-working" in sync
    2019/01/30 14:37:20 [DEBUG] agent: Node info in sync
    2019/01/30 14:37:20 [DEBUG] http: Request GET /v1/agent/services?dc=dc1 (129.522µs) from=127.0.0.1:52232
    2019/01/30 14:37:20 [DEBUG] http: Request PUT /v1/catalog/register?dc=dc1 (594.779µs) from=127.0.0.1:52233
    2019/01/30 14:37:20 [DEBUG] http: Request GET /v1/catalog/service/test-name-doesnt-work?dc=dc1 (308.384µs) from=127.0.0.1:52232
    2019/01/30 14:37:20 [DEBUG] http: Request GET /v1/catalog/service/test-name-doesnt-work?dc=dc1 (165.924µs) from=127.0.0.1:52232
    2019/01/30 14:37:27 [DEBUG] consul: Skipping self join check for "test-node" since the cluster is too small
    2019/01/30 14:37:27 [INFO] consul: member 'test-node' joined, marking health alive
    2019/01/30 14:38:27 [DEBUG] consul: Skipping self join check for "test-node" since the cluster is too small
    2019/01/30 14:38:27 [INFO] consul: member 'test-node' joined, marking health alive
    2019/01/30 14:38:50 [DEBUG] manager: Rebalanced 1 servers, next active server is test-node.dc1 (Addr: tcp/127.0.0.1:8300) (DC: dc1)
    2019/01/30 14:39:08 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
    2019/01/30 14:39:08 [DEBUG] agent: Service "test-name-working" in sync
    2019/01/30 14:39:08 [INFO] agent: Deregistered service "test-name-doesnt-work"
    2019/01/30 14:39:08 [INFO] agent: Synced node info
    2019/01/30 14:39:09 [DEBUG] http: Request GET /v1/catalog/services (176.205µs) from=127.0.0.1:52265
    2019/01/30 14:39:27 [DEBUG] consul: Skipping self join check for "test-node" since the cluster is too small


Here is the terraform file:

# Configure the Consul provider
provider "consul" {
  address    = "localhost:8500"
  datacenter = "dc1"
}

resource "consul_agent_service" "add_fqdn_works" {
  address = "10.0.0.1"
  name    = "test-name-working"
  port    = 18000
  tags    = ["test"]
}

resource "consul_service" "add_fqdn_doesnt_work" {
  name    = "test-name-doesnt-work"
  node    = "test-node"
  port    = 18000
  tags    = ["TEST"]
  address = "10.0.0.1"
}

eedwa...@gmail.com

Feb 20, 2019, 1:33:37 PM
to Consul
According to docs:

>If the Consul agent is running on the node where this service is registered, it is not recommended to use this resource.

So, are you sure you mean to be registering a service in the catalog? Services for client agents generally come from config files present on that machine.

Ziv Meidav

Feb 20, 2019, 4:01:34 PM
to Consul
Unfortunately, the Terraform provider is going to deprecate service registration via the agent API.
Do you have any idea, then, why a service registered via the catalog would be removed automatically after 30-60 seconds?

R.B. Boyer

Feb 20, 2019, 5:10:03 PM
to consu...@googlegroups.com
Agents are the source of truth for which Services should be present in the Catalog for a given Node. If you register a Service into the Catalog directly and the Node you assign it to also happens to run an Agent you will run into issues. When the next anti-entropy sync is performed by that Agent it will deregister the Service you pushed into the catalog to correct what it assumes is an "incorrect" catalog entry associated with its Node.

When you register the Service via the Agent instead of going directly to the Catalog it allows the anti-entropy syncs from the Agent to continue to keep the Catalog updated.
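As a rough sketch of the two registration paths (these are the real Consul HTTP API endpoints; the service name, address, and port are made up for illustration):

```shell
# Register via the local agent: the agent owns the entry, and its
# anti-entropy sync keeps the catalog up to date.
curl -X PUT http://localhost:8500/v1/agent/service/register \
  -d '{"Name": "my-svc", "Address": "10.0.0.1", "Port": 18000}'

# Register directly into the catalog: if "test-node" is backed by a
# real agent, that agent's next anti-entropy sync will deregister
# this entry as an "incorrect" record for its node.
curl -X PUT http://localhost:8500/v1/catalog/register \
  -d '{"Node": "test-node", "Address": "10.0.0.1", "Service": {"Service": "my-svc", "Port": 18000}}'
```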

The Terraform provider should only be used to register external services into Consul. From https://www.terraform.io/docs/providers/consul/r/service.html (emphasis mine):

> A high-level resource for creating a Service in Consul in the Consul catalog. This is appropriate for registering external services and can be used to create services addressable by Consul that cannot be registered with a local agent.
>
> If the Consul agent is running on the node where this service is registered, **it is not recommended to use this resource**.

I hope that helps clarify what you are experiencing.


--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/consul/issues
Community chat: https://gitter.im/hashicorp-consul/Lobby
---
You received this message because you are subscribed to the Google Groups "Consul" group.
To unsubscribe from this group and stop receiving emails from it, send an email to consul-tool...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/consul-tool/db5762d1-888f-471a-ad1c-c982a1c4c5e3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ziv Meidav

Feb 21, 2019, 3:45:07 AM
to Consul
Hey,
I agree that this makes sense; however, what I observed is that my machine running Terraform added a service into the cluster via the catalog API.
There was no agent running on the Terraform server, and yet the service got removed by anti-entropy.

Since adding a service will only be possible via the catalog API (the Terraform provider), I'm trying to understand why this happens.

R.B. Boyer

Feb 21, 2019, 11:15:57 AM
to consu...@googlegroups.com
Can you clarify something for me? Is one of these roughly your current setup?
  • topology A:
    • machine1
      • runs terraform
    • machine2
      • runs consul server
    • machine3
      • runs consul client agent
      • runs YourService
      • "YourService" is registered into the catalog as existing on the node "machine3"
  • topology B:
    • machine1
      • runs terraform
      • runs YourService
      • "YourService" is registered into the catalog as existing on the node "machine1"
    • machine2
      • runs consul server

Ziv Meidav

Feb 21, 2019, 12:55:18 PM
to Consul
Actually it's this:
Machine 1: Terraform
Machines 2-4: Consul cluster
Machine 5: Consul agent with consul-template, which watches the registered services to generate an nginx config
Machine 6: The actual service; no Consul here

If we use the Consul agent to register, this works.
If we use the catalog (which will be the only option in some future version), the service gets added, then removed after 30-60 seconds.

R.B. Boyer

Feb 21, 2019, 2:47:24 PM
to consu...@googlegroups.com
Can you share the (possibly redacted) relevant portions of your terraform config using the consul_* resources?

Ziv Meidav

Feb 21, 2019, 2:54:57 PM
to Consul
It's the same as in the first message of this thread: the one that works is the agent-based resource, and the one that doesn't is consul_service via the catalog.

R.B. Boyer

Feb 21, 2019, 3:01:55 PM
to consu...@googlegroups.com
When I attempt to run your original example locally, it fails because there is no "consul_node" resource for "test-node". What registers "test-node" in your Consul cluster?

  + consul_agent_service.add_fqdn_works
      id:         <computed>
      address:    "10.0.0.1"
      name:       "test-name-working"
      port:       "18000"
      tags.#:     "1"
      tags.0:     "test"

  + consul_service.add_fqdn_doesnt_work
      id:         <computed>
      address:    "10.0.0.1"
      datacenter: <computed>
      name:       "test-name-doesnt-work"
      node:       "test-node"
      port:       "18000"
      service_id: <computed>
      tags.#:     "1"
      tags.0:     "TEST"


Plan: 2 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

consul_service.add_fqdn_doesnt_work: Creating...
  address:    "" => "10.0.0.1"
  datacenter: "" => "<computed>"
  name:       "" => "test-name-doesnt-work"
  node:       "" => "test-node"
  port:       "" => "18000"
  service_id: "" => "<computed>"
  tags.#:     "" => "1"
  tags.0:     "" => "TEST"
consul_agent_service.add_fqdn_works: Creating...
  address: "" => "10.0.0.1"
  name:    "" => "test-name-working"
  port:    "" => "18000"
  tags.#:  "" => "1"
  tags.0:  "" => "test"
consul_agent_service.add_fqdn_works: Creation complete after 0s (ID: test-name-working)

Error: Error applying plan:

1 error(s) occurred:

* consul_service.add_fqdn_doesnt_work: 1 error(s) occurred:

* consul_service.add_fqdn_doesnt_work: Node does not exist: 'test-node'


Ziv Meidav

Feb 21, 2019, 3:13:25 PM
to Consul
No one registers it explicitly.
It's simply the node name of one of the 3 Consul servers.
If you want to run it locally, you can run Consul in dev mode and use that node's name.

R.B. Boyer

Feb 21, 2019, 3:28:04 PM
to consu...@googlegroups.com
When you register external services into consul (directly into the catalog) the node name cannot be the node name of any consul agent already in the system.

That's why in the external services documentation example a synthetic node named "google" is used. A similar thing is done in the consul_service terraform example.

resource "consul_service" "google" {
  name    = "google"
  node    = "${consul_node.compute.name}"
  port    = 80
  tags    = ["tag0"]
}

resource "consul_node" "compute" {
  name    = "compute-google"
  address = "www.google.com"
}

Under the covers, consul_node doesn't actually create anything per se (via any specific Consul HTTP API calls), but it is used to be more explicit about telling Terraform "this node is artificial".

When you run "consul catalog nodes", which of the following does the output resemble?

####### A #######
$ consul catalog nodes
Node       ID        Address    DC
my-server  0f43f3e5  127.0.0.1  dc1
test-node  d820b7d1  10.0.0.1   dc1

####### B #######
$ consul catalog nodes
Node       ID        Address    DC
my-server  70c37375  127.0.0.1  dc1
test-node            10.0.0.1   dc1

If you are seeing (A), then "test-node" is actually backed by a real Consul agent which is still performing anti-entropy. To correct this, you'll have to ensure that the node the consul_service resource is configured to use is definitely not one created by an agent.
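Concretely, a fix for your original config would be to point consul_service at a synthetic node created with consul_node, along the lines of the example above. A sketch (the node name here is made up; it just must not collide with any real agent's node name):

```hcl
# A synthetic node that is not backed by any Consul agent, so no
# anti-entropy sync will ever "correct" the services registered under it.
resource "consul_node" "external" {
  name    = "external-test-node"
  address = "10.0.0.1"
}

resource "consul_service" "add_fqdn" {
  name = "test-name-doesnt-work"
  node = "${consul_node.external.name}"
  port = 18000
  tags = ["TEST"]
}
```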

Ziv Meidav

Feb 25, 2019, 2:47:54 AM
to Consul
Thanks for the help!
It looks like creating a "fake" node resolved the issue.