Issue with CNAME Records and the Auto-Registered "vault" Service in Consul

Rusty Ross

Dec 17, 2016, 9:01:06 PM
to Consul
I am seeing an issue in Consul 0.7.1 and Vault 0.6.3 (not necessarily exclusive to these versions) with the "vault" service that Vault auto-registers in Consul.

Specifically, if I do a DNS lookup on the generic "vault" service without the "active" or "standby" tag, I get back two CNAMEs:

$ dig @localhost -p 8600 vault.service.oregon.consul

[...]

;; ANSWER SECTION:
vault.service.oregon.consui. 0 IN CNAME vault-node-002.node.oregon.consul.
vault-node-002.node.oregon.consuli. 0 IN A 172.29.26.198
vault.service.oregon.consul. 0 IN CNAME vault-node-001.node.oregon.consuli.

I believe this (returning multiple CNAMEs for the same name) violates RFC 1034 Section 3.6.2. This is a problem (and the reason I even noticed it) because some downstream recursors (BIND in this case) will simply reject the response and return SERVFAIL when this happens.
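For anyone who wants to check their own responses for this condition, here is a minimal Python sketch. It is not part of Consul or BIND; the helper names `parse_answer` and `names_with_multiple_cnames` are made up for illustration, and the sample records are the ones above with the redaction typos normalized to ".consul.". It assumes each ANSWER line has the usual "name ttl class type rdata" layout that dig prints.

```python
from collections import defaultdict


def parse_answer(lines):
    """Parse 'name ttl class type rdata' lines from a dig ANSWER section."""
    records = []
    for line in lines:
        if line.startswith(";"):
            continue  # skip dig comment lines such as ';; ANSWER SECTION:'
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        name, rtype = parts[0], parts[3]
        rdata = " ".join(parts[4:])
        records.append((name.lower(), rtype.upper(), rdata))
    return records


def names_with_multiple_cnames(records):
    """Return owner names that carry more than one distinct CNAME target."""
    targets = defaultdict(set)
    for name, rtype, rdata in records:
        if rtype == "CNAME":
            targets[name].add(rdata.lower())
    return {name: sorted(t) for name, t in targets.items() if len(t) > 1}


# Sample answer modeled on the output above (hypothetical, typos normalized).
answer = [
    "vault.service.oregon.consul. 0 IN CNAME vault-node-002.node.oregon.consul.",
    "vault-node-002.node.oregon.consul. 0 IN A 172.29.26.198",
    "vault.service.oregon.consul. 0 IN CNAME vault-node-001.node.oregon.consul.",
]
violations = names_with_multiple_cnames(parse_answer(answer))
print(violations)
```

A non-empty result means the same owner name points at more than one canonical name, which is the case a strict recursor may refuse.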

Typically, I have always seen Consul simply return multiple A records for a service (no CNAMEs), and of course, this is entirely proper and RFC-compliant and works fine:

$ dig @localhost -p 8600 foo.service.oregon.consul

[...]

;; ANSWER SECTION:
foo.service.oregon.consul. 0 IN A 172.29.25.111
foo.service.oregon.consul. 0 IN A 172.29.25.82
foo.service.oregon.consul. 0 IN A 172.29.26.93

I'm not sure what is different about the Vault service registration, or why Consul is exhibiting what seems to me to be non-compliant behavior, at least as far as the RFC is concerned.

I initially raised this on the Vault Google Group and here is the URL to that thread, just for reference:


Best,
Rusty

James Phillips

Dec 17, 2016, 10:54:26 PM
to consu...@googlegroups.com
Hi Rusty,

One quick question:

;; ANSWER SECTION:
vault.service.oregon.consui. 0 IN CNAME vault-node-002.node.oregon.consul.
vault-node-002.node.oregon.consuli. 0 IN A 172.29.26.198
vault.service.oregon.consul. 0 IN CNAME vault-node-001.node.oregon.consuli.

^ One is ".consui" and the other is ".consul". Did you fix that up for
the email, or could that possibly be related to the weirdness?

-- James
> --
> This mailing list is governed under the HashiCorp Community Guidelines -
> https://www.hashicorp.com/community-guidelines.html. Behavior in violation
> of those guidelines may result in your removal from this mailing list.
>
> GitHub Issues: https://github.com/hashicorp/consul/issues
> IRC: #consul on Freenode
> ---
> You received this message because you are subscribed to the Google Groups
> "Consul" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to consul-tool...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/consul-tool/c323821b-e226-4c1b-8a58-23dd306d9d02%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Rusty Ross

Dec 17, 2016, 11:30:19 PM
to Consul
Ah, great eye, but that is actually just a small typo on this mailing list related to a minor redaction I did for purposes of posting on a public forum. So, not related to this issue. But again, nice catch!

Rusty

James Phillips

Dec 19, 2016, 8:40:07 PM
to consu...@googlegroups.com
Hi Rusty,

This is coming from here in Vault -
https://github.com/hashicorp/vault/blob/master/physical/consul.go#L612.

The redirect configuration in Vault is getting populated as Vault's
service address when it registers with Consul, and since that's likely
not an IP, Consul is returning a CNAME RR for it. I was able to get
Consul to do this completely independent of Vault by registering some
static services set with their addresses as non-IPs.
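
The behavior described above can be sketched roughly as follows. This is only an illustration in Python, not Consul's actual code (Consul is written in Go); the function name `record_type_for` is hypothetical, and the AAAA branch for IPv6 addresses is an assumption added for completeness.

```python
import ipaddress


def record_type_for(service_address):
    """Pick a DNS record type from a registered service address.

    Sketch of the idea only: an address that parses as an IP is answered
    directly, anything else is treated as a hostname and answered as a CNAME.
    """
    try:
        ip = ipaddress.ip_address(service_address)
    except ValueError:
        return "CNAME"  # not an IP, so it can only be handed back as an alias
    return "A" if ip.version == 4 else "AAAA"


print(record_type_for("172.29.26.198"))            # IP address
print(record_type_for("vault-node-001.internal"))  # hypothetical hostname
```

With two service instances registered under hostnames, each instance independently takes the CNAME branch, which is how a single response can end up with two CNAME records for one name.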

It does seem like we need a special case on the Consul side to avoid
responses with multiple CNAME records. Does it make sense to you that
your redirect config populates the Vault service record in this way?
I'm also trying to figure out if we should make a change on the Vault
side as well. I know newer versions of Vault have request forwarding
and may not suffer from this issue (you can talk to any of the Vault
servers, so you wouldn't configure a redirect).

-- James


Rusty Ross

Dec 20, 2016, 1:57:12 PM
to Consul
James,

Thanks for following up on this.

Your analysis does make sense, and does (mostly) align with what I can see specifically on this side. (Although I would have thought that in a 2-node Vault cluster, the redirect config would be to a single Vault node, and hence, one Vault CNAME.)

Anyway, as I see this, it really seems like there should be a change on both the Vault and Consul sides.

With regard to Vault, it seems that Vault should simply register all Vault nodes as the "vault" service in the typical manner so that Consul returns all of them as A records for the "vault.service.tld" endpoints. Since Vault is already (properly) using tagging for "active" and "standby", those endpoints (i.e., active.vault.service.tld and standby.vault.service.tld) are available if redirect-free behavior is desired by the consumer.

With regard to Consul, it seems that Consul should probably honor the RFC and not return multiple CNAMEs in cases where it currently would. It seems to me the behavior should be: never return more than one CNAME, though if the catalog contains more than one CNAME, then rotate the responses across multiple queries (i.e., DNS-based load balancing), much in the same way that A records are rotated across responses by Consul today.
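
To make the proposed behavior concrete, here is a rough Python sketch. It is not a patch for Consul; the `CnameRotator` class and the node names are hypothetical, and strict round-robin is just one possible policy (picking a random target per query would satisfy the same rule).

```python
import itertools


class CnameRotator:
    """Answer each query with exactly one CNAME, cycling through the targets."""

    def __init__(self, targets):
        if not targets:
            raise ValueError("at least one CNAME target is required")
        self._cycle = itertools.cycle(list(targets))

    def answer(self, name):
        # One record per response, so a name never carries two CNAMEs at once.
        return [(name, "CNAME", next(self._cycle))]


# Hypothetical catalog entries for the "vault" service.
rotator = CnameRotator([
    "vault-node-001.node.oregon.consul.",
    "vault-node-002.node.oregon.consul.",
])
for _ in range(3):
    print(rotator.answer("vault.service.oregon.consul."))
```

Each response stays within the one-CNAME-per-name rule, while successive queries still spread across all registered nodes.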

Any thoughts on this? Would it be worthwhile to open this as a GitHub issue in the Consul and Vault repos? Worthwhile to invite Jeff or Vishal to join in on the thread here?

(Incidentally, I believe there may also be a similar issue with Nomad's automatic registration of the "nomad" service in Consul, but I need to do a little more testing to confirm.)

Best,
Rusty

James Phillips

Jan 5, 2017, 7:24:39 PM
to consu...@googlegroups.com
Hi Rusty,

I'd need to do a little more digging but I think you are correct that
we should change both sides. The CNAME idea seems correct as well. If
you could open up issues and link to this thread I'd appreciate it!

-- James