Consul DNS caching - dnsmasq


Pablo Fischer

Jul 17, 2015, 7:00:00 PM
to consu...@googlegroups.com
Howdy,

We are planning to roll out dnsmasq, forwarding queries for the consul
domain to Consul and everything else to our existing resolver. All our
tests so far are OK.

However, now we are testing all those scary cases such as "well, what
happens if for some reason the consul servers crash?". For this I
changed the node TTL to 5s and the service TTL (*) to 10s (quite a lot,
but this is just to test things out). I can verify that I get the right
TTLs when I do a dig, both by hitting the local consul directly and
also the local dnsmasq.


.... however we are having problems testing a full consul outage. For
"10 seconds" (our service TTL) we are able to get the response, but
once that expires we get the famous "name not found". So the question
(and I know this is not really about dnsmasq, but I assume many folks
here use dnsmasq+consul):

Is there a way to make dnsmasq serve a cached response when it is not
able to resolve? Basically the "last one we got", even better if we can
give this its own TTL. Basically a "dead TTL".

Our config is fairly simple:

domain-needed
bogus-priv
resolv-file=/etc/resolv.conf.original
strict-order
interface=lo
expand-hosts
log-queries
conf-dir=/etc/dnsmasq.d

For consul, we use node_ttl = 5s, service_ttl (*) = 10s and allow stale DNS.
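For reference, those settings correspond to a dns_config block in the Consul agent configuration along these lines (a sketch matching the values above, not our exact file):

```json
{
  "dns_config": {
    "node_ttl": "5s",
    "service_ttl": {
      "*": "10s"
    },
    "allow_stale": true
  }
}
```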

Thanks!
--
Pablo

Darron Froese

Jul 17, 2015, 7:39:06 PM
to consu...@googlegroups.com
I never found a way to make dnsmasq hold onto DNS entries from a dead Consul.

As an alternative, we have been generating an additional hosts file using Consul Template, distributing that, and then having dnsmasq load it with:

--addn-hosts=/etc/hosts.consul --local-ttl=10

That works really well - most queries never even make it to Consul - they're served directly from dnsmasq with a TTL of 10 seconds.
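For anyone who prefers config files over command-line flags, the same two options can live in the dnsmasq configuration (the hosts-file path here mirrors the flag above):

```
addn-hosts=/etc/hosts.consul
local-ttl=10
```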

This way - if you have to actually stop Consul - the whole world doesn't end because you can't resolve DNS entries.

Maybe something like that can work for your scenario?

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.

GitHub Issues: https://github.com/hashicorp/consul/issues
IRC: #consul on Freenode
---
You received this message because you are subscribed to the Google Groups "Consul" group.
To unsubscribe from this group and stop receiving emails from it, send an email to consul-tool...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/consul-tool/CAJRPuozNi_wcPvgA%2B9dgWcEK1EmnEZ3ZPi5jFUWKhKqy3uFLqQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Brian Lalor

Jul 18, 2015, 6:37:34 AM
to consu...@googlegroups.com
That’s really interesting, Darron. What does your consul-template template look like?



— 
Brian Lalor

Darron Froese

Jul 18, 2015, 8:43:55 AM
to consu...@googlegroups.com
The template looks like this:


It looks for keys under /services in the KV tree to know what Consul services to query:


We wanted to avoid including some services we have - they're not useful in this case and would have added to config churn.

Consul Template iterates through those services and creates a host file with all the tags for each service. That host file ends up looking something like this:
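(The inline attachment isn't reproduced here. Based on the template, the generated entries follow the standard hosts-file layout - the addresses and service names below are made up for illustration:)

```
10.10.1.15  master.redis.service.consul
10.10.1.16  slave.redis.service.consul
10.10.1.15  redis.service.consul
10.10.1.16  redis.service.consul
```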


IMPORTANT NOTE: We have not been able to run this template on all nodes at our scale - it causes way too many Consul leadership transitions - so we run it on a single node, sort and uniq the entries, stuff it into the KV store and then have a watch that pulls it out and loads it:


The file was distributed to 500 nodes (at the time) within 1-2 seconds - the shasum lets us see whether what's being loaded is arriving intact. We also limit the number of times it can run to once every 30 seconds.
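A watch of that shape can be declared in the agent configuration; the key name and handler path below are hypothetical, not Darron's actual setup:

```json
{
  "watches": [
    {
      "type": "key",
      "key": "services/hosts-file",
      "handler": "/usr/local/bin/write-consul-hosts.sh"
    }
  ]
}
```

The handler script would pull the value out of the KV store, verify the shasum, write /etc/hosts.consul, and signal dnsmasq to reload.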

Before we had the limit in place, we had it run during a rolling restart of all the Consul agents - which meant it ran about 30 times a minute for 20 minutes: 20K into the KV store, distributed to 500 nodes in around 1-2 seconds, over and over and over. To Consul's credit, it didn't have a single leadership transition during that entire period, and it pushed out a TON of bandwidth - blue was bytes sent by the leader at the time:


This is how much bandwidth it normally puts out:


I was sweating a bit while I was adding the 30 second limit - but it ended up being no problem at all.

Ryan Breen

Jul 18, 2015, 9:20:52 AM
to consu...@googlegroups.com
Darron,

That is awesome!  Thanks for sharing.

Ryan

Pablo Fischer

Jul 18, 2015, 5:17:13 PM
to consu...@googlegroups.com
Yep. Very smart.

Darron, just to make sure I understood: you generate the contents of the hosts file on a single host and then publish it with the k/v system? I thought the k/v store had a limit on how much you can store in a single key. But maybe I'm wrong.

We will possibly do the same thing as you. We have 5 colos, and our big ones have a good hundred services and thousands of nodes, so generating this on every host would definitely be overkill.




--
Pablo

Brian Lalor

Jul 18, 2015, 5:25:15 PM
to consu...@googlegroups.com
I don't know if consul-template supports this, but can you solve the load problem by using stale reads?

-- 
Brian Lalor
bla...@bravo5.org

Darron Froese

Jul 19, 2015, 12:06:08 AM
to consu...@googlegroups.com
Pablo,

Yes - that's what we do - we generate it on a single system (one of the Consul servers - running under Consul lock).

The end result is only about 25K - so within the size limitations.

Darron Froese

Jul 19, 2015, 12:13:06 AM
to consu...@googlegroups.com
Brian,

Consul Template has a few options that apply to this:

"wait" - "The minimum(:maximum) to wait before rendering a new template to disk and triggering a command" - we use this in production so that the hosts file doesn't get built more than once every 30 seconds.

"max-stale" - "The maximum staleness of a query. If specified, Consul will distribute work among all servers instead of just the leader. The default value is 0 (none)." - we also use this in production on the one node. This seems to be the equivalent of Consul's "stale" consistency mode:


My test repo has max-stale but not wait - it seemed to be a reliable way to trigger the kinds of things we're seeing at 700 nodes with a much smaller cluster.
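Put together, an invocation using both options might look like this (the template and output paths are hypothetical):

```
consul-template \
  -wait 30s \
  -max-stale 5s \
  -template "/etc/consul-template/hosts.ctmpl:/etc/hosts.consul:service dnsmasq restart"
```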

Pablo Fischer

Jul 20, 2015, 2:28:57 PM
to consu...@googlegroups.com
Sounds great. One last question: when you generate the file, do you verify that it will fit into one key, or has it just been pure luck that it fits?

We have one service with almost two hundred nodes, and that's just one service, so I'm pretty sure we will exceed the size limit. I'm thinking we will need to split the file into separate files (one per service), and once we generate and update the k/v we will update another key that lists all the files we updated.



--
Pablo

Darron Froese

Jul 20, 2015, 2:34:33 PM
to consu...@googlegroups.com
Our file is only 25K in size at the moment.

Unless this has changed, the limit is 512K:

https://github.com/hashicorp/consul/issues/123

We're not doing any size checking at the moment - but yes - if you have that many nodes and services you may need to split and reassemble.

I had also toyed with the idea of compressing it before the insert - might be worth looking at that as well for your infrastructure.
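Compressing before the insert is straightforward; here is a minimal sketch (the helper names are mine, not from Darron's setup) that gzips the hosts file and base64-encodes it so the result can be stored as a single KV value:

```python
import base64
import gzip


def pack_hosts(text: str) -> str:
    """Gzip the hosts file and base64-encode it so it can be stored
    as a single Consul KV value well under the 512K limit."""
    return base64.b64encode(gzip.compress(text.encode("utf-8"))).decode("ascii")


def unpack_hosts(blob: str) -> str:
    """Reverse of pack_hosts: base64-decode and decompress back to text."""
    return gzip.decompress(base64.b64decode(blob)).decode("utf-8")


# Hosts files are highly repetitive, so they compress very well.
hosts = "10.10.1.15 redis.service.consul\n" * 500
packed = pack_hosts(hosts)
assert unpack_hosts(packed) == hosts
assert len(packed) < len(hosts)
```

The watch handler on each node would then decode and decompress before writing the file out for dnsmasq.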

ig...@encompasscorporation.com

Jul 21, 2015, 7:03:45 AM
to consu...@googlegroups.com, dar...@froese.org
Hi Darron,

First, I want to say this is really great. But looking at your hosts file, I assume you have Consul deployed in AWS, US region. Assuming also that you have the Consul servers spread across all AZs of the region, and that a disaster scenario in which you lose them all is highly unlikely (based on my experience of running production loads on AWS for more than 2 years), I wonder what your motivation was for going through all this trouble? Let's say in the case of AWS it is not Consul HA - is it the benefit of caching on the clients, reducing the DNS traffic to the servers?

If Consul HA is really what worries you, isn't it much easier to go with an auto scaling solution for the Consul server cluster than jumping through all these hoops with dnsmasq?

Again, I'm just curious, and this is under the assumption that you really are running in AWS, so please forgive me if my assumption is wrong.

Thanks,
Igor

Darron Froese

Jul 21, 2015, 2:29:50 PM
to ig...@encompasscorporation.com, consu...@googlegroups.com
Igor,

Yes - we're in AWS and we are spread across 3 AZs for some fault tolerance / high availability for the Consul servers.

The reason we added this procedure to generate a hosts file and add it to dnsmasq was to satisfy two requirements:

1. Our internal DNS based service discovery shouldn't stop working if the local Consul agent is dead and/or the Consul system is in the midst of a leadership transition.

2. With the volume of traffic and requests that we have already, reducing read pressure on Consul is desirable for us - dnsmasq has been able to help with that.

This solution satisfied that requirement and also allows us some flexibility to reduce Consul server pressure without killing our DNS based service discovery.

Make sense?

ig...@encompasscorporation.com

Jul 22, 2015, 8:21:11 PM
to Consul, dar...@froese.org
Yep, that makes sense. Although I wonder if using the "stale" consistency read mode would achieve the same thing you now have with dnsmasq. It is documented here https://www.consul.io/docs/agent/http.html and here https://www.consul.io/docs/internals/consensus.html#stale. It says:

This mode allows any server to service the read regardless of if it is the leader. This means reads can be arbitrarily stale but are generally within 50 milliseconds of the leader. The trade-off is very fast and scalable reads but with stale values. This mode allows reads without a leader meaning a cluster that is unavailable will still be able to respond.
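For comparison, a stale read is opted into per request on the HTTP API - for example (the service name here is illustrative):

```
curl "http://127.0.0.1:8500/v1/catalog/service/redis?stale"
```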

Also, maybe it is not critical, but aren't you introducing a SPOF with that single node creating the hosts file? What if that node dies?

Thanks,
Igor

Brian Lalor

Jul 22, 2015, 9:13:52 PM
to consu...@googlegroups.com, dar...@froese.org
Darron’s solution also allows for continuity if the consul *agent* dies on a host.

It’d be easy to remove the SPOF through the use of “consul lock”.



— 
Brian Lalor

ig...@encompasscorporation.com

Jul 22, 2015, 9:21:58 PM
to Consul, dar...@froese.org, bla...@bravo5.org
Brian, I'm not disputing the validity of Darron's solution - I'm sure it works. The point is: why go to the trouble of adding complexity and more stuff to maintain if the solution might be simple and already provided by the platform?

Brian Lalor

Jul 22, 2015, 10:15:50 PM
to ig...@encompasscorporation.com, Consul, dar...@froese.org
He’s designing a failsafe for the case where the platform is unavailable.  If the agent’s dead on a node, you’ve lost your ability to query *anything* from the cluster.
— 
Brian Lalor

ig...@encompasscorporation.com

Jul 22, 2015, 10:26:06 PM
to Consul, dar...@froese.org, bla...@bravo5.org
If the agent dies you can re-spawn it, easily done. Any other benefits?

coo...@yahoo-inc.com

Sep 16, 2015, 5:32:32 PM
to Consul, dar...@froese.org
Great info and so thoroughly explained!  Thanks, Darron!  You're a life saver.

I'm not sure if I am just new to consul-template, new to Go Templates, or if perhaps the consul-template syntax has changed but I had to update the template content as follows.  I'm using consul-template v0.10.0.  My modifications include:
  • using {{range services}} instead of {{range ls "services/"}}
  • using .Name instead of .Key
So the result is:

{{range services}}
{{range $tag, $services := service .Name | byTag}}
{{range $services}}{{.Address}} {{$tag}}.{{.Name}}.service.consul
{{end}}{{end}}
{{range service .Name}}{{.Address}} {{.Name}}.service.consul
{{end}}
{{end}}


Darron Froese

Sep 16, 2015, 5:53:32 PM
to coo...@yahoo-inc.com, Consul
Those changes make sense.

Our system doesn't grab ALL of the services available - just the ones we deemed important. Those we put in the KV store - which is why I used:

{{ range ls "services/" }}

It's just looking for the services we manually added to the KV store underneath the path "services/" - those services are also shown here:


Make a little more sense now?

coo...@yahoo-inc.com

Sep 16, 2015, 5:56:45 PM
to Consul, coo...@yahoo-inc.com, dar...@froese.org
Oh yeah!  Suddenly that makes sense.  You guys have a list of the services you want in a key named 'services'.  Bingo, makes perfect sense.  :)  Well I guess I've contributed a solution for peeps who want all services. :)

Grant Rodgers

Dec 9, 2015, 5:39:20 PM
to Consul, dar...@froese.org
Darron,

I noticed in https://gist.github.com/darron/7dcacaedf0793f3dc38c that your handler reads the key value itself, disregarding the input consul passes to the handler. Is there a reason for that? Is it a workaround for something?

Darron Froese

Dec 9, 2015, 6:09:42 PM
to Consul
Grant,

The script wasn't smart enough to read the watch payload - so we just read from the KV store.

That's honestly a REALLY naive way to do it - but it worked really well until we replaced it about a month ago.

The "new way" we're doing it has some protection for watches that fire a bit too eagerly:


It also makes sure that it doesn't actually run if nothing's going to change - but that's in the other binary that's actually doing the querying of Consul and writing the file.

I'm hoping to be able to talk more about that in January at Scale14x - but we are still testing and rolling it out as we tweak.
