Hi,
Let me preface this post by saying that we work in an environment with a high frequency of requests and a need for low latency. The Consul DNS API is used to find other services in our microservices architecture. We run a cluster of three t1.micro instances in AWS.
To reduce the latency of service lookups, the Consul clients are configured something like this:
"dns_config": {
  "allow_stale": true,
  "service_ttl": {
    "*": "10s"
  }
}
I measure latency as follows. First I run consul monitor -log-level debug, then I do a DNS lookup with dig:
dig @127.0.0.1 -p 8600 myservice.service.consul
The log looks like this:
[DEBUG] dns: request for {myservice.service.consul. 1 1} (1.618034ms)
Dig output looks like this:
;; ANSWER SECTION:
myservice.service.consul. 10 IN A 10.0.27.43
;; Query time: 2 msec
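To repeat this measurement many times from a script instead of dig, here is a minimal sketch assuming Python 3 and only the standard library. It hand-builds an RFC 1035 A query, sends it over UDP to the local agent (127.0.0.1:8600, as in the dig command above), and returns the round-trip time together with the TTL of the first answer record. The function names are hypothetical, and the parsing is deliberately basic (no handling of truncated replies or error rcodes).

```python
import random
import socket
import struct
import time
from typing import Optional, Tuple


def build_query(name: str, qtype: int = 1) -> Tuple[int, bytes]:
    """Build a minimal DNS query (qtype 1 = A, class IN) for `name`."""
    txid = random.randint(0, 0xFFFF)
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME is a sequence of length-prefixed labels terminated by a zero byte
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return txid, header + question


def skip_name(packet: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # 2-byte compression pointer ends the name
            return offset + 2
        offset += 1 + length


def first_answer_ttl(packet: bytes) -> Optional[int]:
    """Return the TTL of the first answer record, or None if there is none."""
    _, _, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", packet[:12])
    if ancount == 0:
        return None
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(packet, offset) + 4  # skip QTYPE and QCLASS
    offset = skip_name(packet, offset)  # owner name of the first answer
    _rtype, _rclass, ttl = struct.unpack("!HHI", packet[offset:offset + 8])
    return ttl


def timed_lookup(name: str, server: str = "127.0.0.1", port: int = 8600,
                 timeout: float = 2.0) -> Tuple[float, Optional[int]]:
    """Send one UDP query; return (elapsed milliseconds, first answer TTL)."""
    txid, query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(query, (server, port))
        response, _ = sock.recvfrom(4096)
        elapsed_ms = (time.perf_counter() - start) * 1000
    if struct.unpack("!H", response[:2])[0] != txid:
        raise ValueError("transaction ID mismatch")
    return elapsed_ms, first_answer_ttl(response)
```

Calling timed_lookup("myservice.service.consul") in a loop reports the same two quantities dig shows (query time and TTL), which makes it easy to collect many samples. The timing includes a little Python overhead, so absolute numbers are best compared against each other rather than against dig.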
I run the lookups repeatedly, but the latency stays mostly the same, which confuses me: after the first request, the DNS answer should be cached for the next ten seconds, so subsequent resolutions should be much faster. Dig shows me the expected TTL of ten seconds, so I assume I configured everything correctly. One interesting thing I observe is that the TTL always stays the same: for consecutive calls to the same DNS entry, dig always shows the same TTL. If I do the same thing with any other DNS server, I can see the TTL counting down and then restarting once it reaches zero.
It behaves like this:
;; ANSWER SECTION:
myservice.service.consul. 10 IN A 10.0.27.43
;; ANSWER SECTION:
myservice.service.consul. 10 IN A 10.0.27.43
;; ANSWER SECTION:
myservice.service.consul. 10 IN A 10.0.27.43
I would expect it to behave like this:
;; ANSWER SECTION:
myservice.service.consul. 10 IN A 10.0.27.43
;; ANSWER SECTION:
myservice.service.consul. 7 IN A 10.0.27.43
;; ANSWER SECTION:
myservice.service.consul. 3 IN A 10.0.27.43
;; ANSWER SECTION:
myservice.service.consul. 10 IN A 10.0.27.43
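The countdown in the expected output is what a caching resolver produces: it stores each answer with an absolute expiry time and, on a cache hit, reports only the remaining lifetime. An upstream that keeps no cache of its own (Consul's DNS interface appears to answer every query fresh and to use service_ttl only as the TTL it advertises to downstream caches) returns the full configured TTL every time. The toy model below is a sketch of that mechanism, not code from Consul, dnsmasq or Route 53; CachingResolver and its injectable clock are hypothetical names used for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Toy caching resolver in front of an upstream that always returns
    the full TTL. Hits report the remaining TTL, so it counts down."""

    def __init__(self, upstream: Callable[[str], Tuple[str, int]],
                 clock: Callable[[], float] = time.monotonic):
        self.upstream = upstream
        self.clock = clock
        # name -> (address, absolute expiry time on self.clock)
        self.entries: Dict[str, Tuple[str, float]] = {}
        self.upstream_calls = 0

    def resolve(self, name: str) -> Tuple[str, int]:
        now = self.clock()
        cached = self.entries.get(name)
        if cached is not None and cached[1] > now:
            address, expires_at = cached
            return address, int(expires_at - now)  # remaining TTL, rounded down
        # Miss or expired entry: ask the upstream and restart the countdown
        address, ttl = self.upstream(name)
        self.upstream_calls += 1
        self.entries[name] = (address, now + ttl)
        return address, ttl
```

With a fake upstream that always returns ("10.0.27.43", 10) and clock readings of 0, 3, 7 and 10.5 seconds, resolve() reports TTLs of 10, 7, 3 and 10, the same pattern as the expected output above, and only the first and last calls reach the upstream.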
We also mirror all DNS entries to a Route 53 private hosted zone as a failover in case something goes extremely wrong. If I run dig @10.0.0.2 myservice.service.consul, I see the following output:
First request:
;; ANSWER SECTION:
myservice.service.consul. 10 IN A 10.0.27.43
;; Query time: 3 msec
Second request:
;; ANSWER SECTION:
myservice.service.consul. 4 IN A 10.0.27.43
;; Query time: 0 msec
As you can see, I get the expected behaviour. It is just very confusing to me why the Route 53 lookup would be faster than Consul, since the request has to leave the machine.
In another experiment, I tried to cache DNS resolutions locally with dnsmasq to see if I could speed up the lookups. Surprisingly, dnsmasq refuses to cache any of the Consul DNS entries! I have the following dnsmasq config:
server=/consul/127.0.0.1#8600
I activated logging in dnsmasq and monitored the requests with tail -f /var/log/syslog:
dnsmasq[5659]: forwarded myservice.service.consul to 127.0.0.1
dnsmasq[5659]: reply myservice.service.consul is 10.0.27.43
After the second call i should see something like this, but it never shows up:
dnsmasq[5659]: cached myservice.service.consul
I even dumped the whole cache to check whether logging was just not showing it, but I can see that the entry is never cached. Latency also stays at 2 msec. The funny thing is that every other DNS entry that is not answered by the local Consul client (e.g. google.com) is cached properly by dnsmasq.
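To check cache behaviour over many lookups rather than by watching tail -f, a small helper can tally the dnsmasq log lines shown above per query name. This is a hypothetical sketch that assumes the syslog line format quoted in this post (forwarded, reply, cached); if repeated lookups of a name only ever produce forwarded and reply events, none of them were served from the dnsmasq cache.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Matches e.g. "dnsmasq[5659]: forwarded myservice.service.consul to 127.0.0.1";
# re.search also tolerates the timestamp/hostname prefix syslog adds.
LOG_LINE = re.compile(r"dnsmasq\[\d+\]: (forwarded|reply|cached) (\S+)")


def summarize(lines: Iterable[str], name: str) -> Dict[str, int]:
    """Count forwarded/reply/cached dnsmasq log events for one query name."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group(2) == name:
            counts[match.group(1)] += 1
    return dict(counts)
```

For example, summarize(open("/var/log/syslog"), "myservice.service.consul") returns the event counts for that name; comparing it with the counts for google.com makes the difference in caching visible at a glance.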
So my three questions are:
Best
Raphael