Consul DNS recursion and zone forwarding by dnsmasq


AoiK

Aug 31, 2016, 9:03:30 PM
to Consul
Hi, I am using Consul v0.6.3 with dnsmasq on top of it to handle DNS zone forwarding.
Because a Consul cluster cannot manage multiple domains, and we have multiple datacenters and multiple environments, we use
dnsmasq zone forwarding to resolve names outside our Consul domain as well as names in the other environments' Consul domains.

Recently, I found that with zone forwarding alone, I cannot get an A record from a CNAME that is created in Consul as an external service with a domain name, instead of a static IP, in its address attribute.
To resolve the IP address behind this CNAME recursively, I had to enable the recursors option on the Consul servers.
So with both dnsmasq's zone forwarding and recursors enabled,
when I set up an external service like dev-consul.service.prod.local CNAME consul.service.dev.local (let's say I have 2 clusters: one for prod.local and one for dev.local), clients can get the IP of consul.service.dev.local from prod.local's DNS servers.
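
For reference, a minimal sketch of what registering such an external service could look like. It only builds the request body for Consul's catalog registration endpoint (PUT /v1/catalog/register); the node name and datacenter are hypothetical placeholders, and the helper name external_service_payload is made up for illustration.

```python
import json


def external_service_payload(node, service, target_hostname, datacenter):
    """Build a body for Consul's PUT /v1/catalog/register endpoint."""
    return {
        "Datacenter": datacenter,
        "Node": node,
        # Using a hostname here, instead of an IP, is what makes Consul
        # answer with a CNAME for the service name.
        "Address": target_hostname,
        "Service": {"Service": service},
    }


# Placeholder values: node name and datacenter are assumptions,
# not taken from the clusters described in this post.
payload = external_service_payload(
    node="dev-consul-external",
    service="dev-consul",
    target_hostname="consul.service.dev.local",
    datacenter="prod",
)
print(json.dumps(payload, indent=2))
```

The CNAME target then has to be resolved by something that can reach dev.local, which is where recursors and the dnsmasq forwarding come in.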

However, once I enable the recursors option, I get tons of log messages like the one below on the Consul servers that are set as "recursors".
Aug 31 23:35:48 <dev consul server> consul[26421]: dns: all resolvers failed for {xx.xx.xx.xx.in-addr.arpa. 12 1} from client <prod consul server>:10927 (udp)

Once these messages start, CPU usage on these Consul servers gets really high and eventually the VM dies.
In fact, the failing record is our NTP server's PTR record in the prod.local domain, and it should be resolved by the prod Consul server without any recursive query.
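
A small sketch for reading the braces in that log line, assuming they hold a DNS question printed as name, query type, and class. Under that assumption, 12 is the standard code for PTR and 1 is the class IN. The function name decode_question is hypothetical.

```python
import re

# Standard DNS numeric codes; only a few common ones are listed here.
QTYPES = {1: "A", 5: "CNAME", 12: "PTR", 28: "AAAA"}
QCLASSES = {1: "IN"}


def decode_question(log_line):
    """Extract '{name qtype qclass}' from a log line and name the codes."""
    match = re.search(r"\{(\S+) (\d+) (\d+)\}", log_line)
    if match is None:
        return None
    name = match.group(1)
    qtype = int(match.group(2))
    qclass = int(match.group(3))
    # Fall back to the raw number for codes missing from the tables above.
    return name, QTYPES.get(qtype, str(qtype)), QCLASSES.get(qclass, str(qclass))


line = (
    "dns: all resolvers failed for {xx.xx.xx.xx.in-addr.arpa. 12 1} "
    "from client <prod consul server>:10927 (udp)"
)
print(decode_question(line))
```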


root@ <prod consul server>:~# dig @127.0.0.1 -p 8600 xx.xx.xx.xx.in-addr.arpa.

; <<>> DiG 9.9.5-3ubuntu0.8-Ubuntu <<>> @127.0.0.1 -p 8600 xx.xx.xx.xx.in-addr.arpa.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33164
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available <<<<< ???

;; QUESTION SECTION:
;43.186.20.172.in-addr.arpa. IN A

;; ANSWER SECTION:
43.186.20.172.in-addr.arpa. 0 IN PTR vntp-003.node.prod.local.

;; Query time: 1 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Wed Aug 31 23:40:56 UTC 2016
;; MSG SIZE rcvd: 117

root@vconsul-001:~# dig @127.0.0.1 -p 53 xx.xx.xx.xx.in-addr.arpa.

; <<>> DiG 9.9.5-3ubuntu0.8-Ubuntu <<>> @127.0.0.1 -p 53 xx.xx.xx.xx.in-addr.arpa.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49771
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;43.186.20.172.in-addr.arpa. IN A

;; ANSWER SECTION:
xx.xx.xx.xx.in-addr.arpa. 0 IN PTR vntp-003.node.prod.local.

;; Query time: 2 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Aug 31 23:41:15 UTC 2016
;; MSG SIZE rcvd: 117

As shown above, both the 8600 (Consul) and 53 (dnsmasq) interfaces return the record, but somehow for 8600 the query also goes to the recursors.
Since the result can be retrieved, I wouldn't mind if this were harmless, but it actually consumes OS resources and makes the Consul servers really unstable, so I'd like to understand what exactly is going on here and how to stop it.

My Consul server has a resolv.conf like this:
nameserver 127.0.0.1

So requests go to its port 53, where dnsmasq is configured like this:
akadoya@vconsul-001:~$ cat /etc/dnsmasq.d/10-consul
server=/prod.local/127.0.0.1#8600   << for A
rev-server=<CIDR for prod IP ranges>,127.0.0.1#8600   << for PTR

## zone forwarding
server=/<dc>.prod.local/127.0.0.1#8600     << to answer request with dc code, it has this forwarding.
server=/stg.local/<stg consul server 1>#8600   << forwardings for other environment's consul clusters
server=/stg.local/<stg consul server 2>#8600
rev-server=<stg range>,<stg consul server 1>#8600
rev-server=<stg range>,<stg consul server 2>#8600
server=/dev.local/<dev consul server 1>#8600
rev-server=<dev range>,<dev consul server 1>#8600

server=/hoge.local/<other bind server>#53

Is something wrong with my settings?
In my understanding, xx.xx.xx.xx.in-addr.arpa. would hit the second line of the dnsmasq config, and the local Consul would answer the record without asking the upstream DNS servers. Does Consul not work like that?
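
A rough sketch of the lookup this expectation describes: turn the address into its in-addr.arpa name, then pick the upstream whose range contains it. The CIDRs are made-up placeholders (the real ranges are not shown in this post), and "the narrowest matching range wins" is an assumption about dnsmasq's behavior, not something verified against its source. The names REV_RULES, reverse_name, and pick_upstream are hypothetical.

```python
import ipaddress

# Placeholder ranges: the real prod/dev CIDRs are not shown in this post.
REV_RULES = [
    ("172.20.0.0/16", "127.0.0.1#8600"),             # prod range -> local Consul
    ("10.50.0.0/16", "<dev consul server 1>#8600"),  # dev range -> dev Consul
]


def reverse_name(ip):
    """Return the in-addr.arpa name that a PTR lookup for ip would query."""
    return ipaddress.ip_address(ip).reverse_pointer + "."


def pick_upstream(ip, rules=REV_RULES):
    """Return the upstream of the narrowest range containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [
        (ipaddress.ip_network(cidr), upstream)
        for cidr, upstream in rules
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # Assumption: the most specific (longest-prefix) rule is the one used.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(reverse_name("172.20.186.43"), "->", pick_upstream("172.20.186.43"))
```

If the lookup really works this way, the PTR query never needs to leave the prod Consul server, which is why the recursor traffic is surprising.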

I'd appreciate it if you could help me figure out whether this is the expected behavior or not.
Thanks,
Aoi

James Phillips

Sep 20, 2016, 7:56:03 PM
to consu...@googlegroups.com
Hi,

Do you see this only with external services? I have an idea about what
might be happening here but need to trace through the code a bit -
would you mind opening a GitHub issue so we can work this over there?

Thanks!

-- James

AoiK

Sep 21, 2016, 12:30:58 PM
to Consul
Hi, no. As far as I know, this recursion behavior only happens with PTR records. (It could be happening with any record type, but I don't know.)
I've actually raised this as an issue already, but there has been no reaction yet.

The reason I have configured Consul and dnsmasq like this is external services.
When I want to add a CNAME pointing to a name in an external domain, I need to use an external service with the name, instead of an IP address, in the address option.
And to resolve this CNAME to an IP, I needed to enable both the recursors option and dnsmasq's zone forwarding.

Let me know if you need more information. We can work on this further in the ticket above.

Thanks,
Aoi