Cluster DNS: bottleneck with ~1000 outbound connections per second


Evan Jones

Oct 5, 2017, 1:47:07 PM
to Kubernetes user discussion and Q&A
TL;DR: Kubernetes dnsPolicy: ClusterFirst can become a bottleneck at a high rate of outbound connections. The problem appears to be that DNS queries fill the nf_conntrack table on the node running kube-dns, causing client applications' DNS lookups to fail. I resolved it by switching my application, which does not need cluster DNS, to dnsPolicy: Default, which gave much better performance.

It seems like this is probably a "known" problem (see issues below), but I can't tell: Is there a solution being worked on for this? 

Thanks!


Details:

We were running a load generator, and were surprised to find that the aggregate rate did not increase as we added more instances and nodes to our cluster (GKE 1.7.6-gke.1). Eventually the application started getting errors like "Name or service not known" at surprisingly low rates, like ~1000 requests/second. Switching the application to dnsPolicy: Default resolved the issue.
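
For anyone else who hits this, the fix is a one-line change in the pod spec (a minimal sketch; the pod name and image are placeholders):

    apiVersion: v1
    kind: Pod
    metadata:
      name: load-generator             # placeholder name
    spec:
      # "Default" inherits the node's /etc/resolv.conf instead of pointing
      # the pod at the cluster DNS service, so lookups bypass the kube-dns
      # service VIP and its conntrack/NAT path.
      dnsPolicy: Default
      containers:
      - name: app
        image: example/load-generator   # placeholder image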

I spent some time digging into this, and the problem is not the CPU utilization of kube-dns / dnsmasq itself. On my small cluster of ~10 n1-standard-1 instances, I can get about 80000 cached DNS queries/second. I *think* the issue is that when there are enough machines talking to this single DNS server, it fills the nf_conntrack table, causing packets to get dropped, which I believe ends up rate limiting the clients. dmesg on the node that is running kube-dns shows a constant stream of:

[1124553.016331] nf_conntrack: table full, dropping packet
[1124553.021680] nf_conntrack: table full, dropping packet
[1124553.027024] nf_conntrack: table full, dropping packet
[1124553.032807] nf_conntrack: table full, dropping packet
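
You can watch the table fill on that node with the standard netfilter sysctls:

    # Compare current entries to the limit; when count approaches max,
    # new flows are dropped and dmesg logs "table full, dropping packet".
    sysctl net.netfilter.nf_conntrack_count
    sysctl net.netfilter.nf_conntrack_max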

It seems to me that this is a bottleneck for Kubernetes clusters, since by default all queries are directed to a small number of machines, which will then fill the connection tracking tables.

Is there a planned solution to this bottleneck? I was very surprised that *DNS* would be my bottleneck on a Kubernetes cluster, and at shockingly low rates.


Related Github issues

The following Github issues may be related to this problem. They all have a bunch of discussion but no clear resolution:

Run dnsmasq on each node (mentions conntrack): https://github.com/kubernetes/kubernetes/issues/32749
kube-dns should be a daemonset / run on each node: https://github.com/kubernetes/kubernetes/issues/26707
dnsmasq intermittent connection refused: https://github.com/kubernetes/kubernetes/issues/45976
kube-aws seems to already run a local DNS resolver on each node: https://github.com/kubernetes-incubator/kube-aws/pull/792/

Tim Hockin

Oct 5, 2017, 1:54:08 PM
to Kubernetes user discussion and Q&A
We had a proposal to avoid conntrack for DNS, but no real movement on it.

We have flags to adjust the conntrack table size.

The kernel has params to adjust the timeouts, which users can tweak.
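
For example (illustrative values, not recommendations; the flags are kube-proxy's conntrack flags, the sysctls the standard netfilter ones):

    # Size the table via kube-proxy:
    kube-proxy --conntrack-max-per-core=131072 --conntrack-min=524288
    # Or tune the kernel directly:
    sysctl -w net.netfilter.nf_conntrack_max=524288
    sysctl -w net.netfilter.nf_conntrack_udp_timeout=10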

Sustained 1000 QPS of DNS seems artificial.

Evan Jones

Oct 5, 2017, 4:29:28 PM
to Kubernetes user discussion and Q&A
The sustained 1000 qps comes from an application making that many outbound connections. I agree that the application is very inefficient and shouldn't be doing a DNS lookup for every request it sends, but it's a Python program that uses urllib2.urlopen, so it creates a new connection each time. I suspect this isn't that unusual? This could be a server that hits an external service for every user request, for example. Given the activity on the GitHub issues I linked, it appears I'm not the only person to have run into this.

Thanks for the response though, since that answers my question: there are currently no plans to change how this works. Hopefully, if anyone else hits this, they'll find this email and solve it faster than I did.

Finally, the fact that dnsPolicy: Default is *not* the default is also surprising. It should probably be called dnsPolicy: Host or something instead.



Rodrigo Campos

Oct 5, 2017, 5:26:41 PM
to kubernet...@googlegroups.com
On Thu, Oct 05, 2017 at 04:29:21PM -0400, Evan Jones wrote:
> The sustained 1000 qps comes from an application making that many outbound
> connections. I agree that the application is very inefficient and shouldn't
> be doing a DNS lookup for every request it sends, but it's a Python program
> that uses urllib2.urlopen, so it creates a new connection each time. I
> suspect this isn't that unusual? This could be a server that hits an
> external service for every user request, for example. Given the activity on
> the GitHub issues I linked, it appears I'm not the only person to have run
> into this.

But is it always a different domain? If not, it can probably be cached by the DNS server (as long as the TTL allows) and, even if your app makes that many requests, they should be answered quite fast.

Evan Jones

Oct 5, 2017, 5:46:55 PM
to Kubernetes user discussion and Q&A
My script is always looking up the same domain, and I believe it is cached by dnsmasq. I think the limit is the kernel NAT connection tracking: each DNS query comes from a new ephemeral port, so it uses up all the NAT mappings on the node running kube-dns. This is why dnsPolicy: Default fixes the problem: it uses the host's DNS configuration, which avoids the NAT connection limits.

Details including the Python code and configs to reproduce it on a brand new GKE cluster are at the bottom of https://github.com/kubernetes/kubernetes/issues/45976

I did a separate test, using a Go DNS query generator, which was able to do 80000 DNS queries per second, so dnsmasq does not appear to be the limit.
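
The core of the load generator is essentially just this (a sketch with a placeholder URL, not the exact script from the issue):

    # Python 2: urllib2.urlopen opens a new TCP connection per call, and
    # each new connection triggers a fresh DNS lookup through cluster DNS.
    import urllib2

    URL = "http://example.com/"  # placeholder target

    while True:
        urllib2.urlopen(URL).read()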

Thanks!

Evan


Tim Hockin

Oct 5, 2017, 7:53:35 PM
to Kubernetes user discussion and Q&A
On Thu, Oct 5, 2017 at 1:29 PM, Evan Jones <evan....@bluecore.com> wrote:
> The sustained 1000 qps comes from an application making that many outbound
> connections. I agree that the application is very inefficient and shouldn't
> be doing a DNS lookup for every request it sends, but it's a Python program
> that uses urllib2.urlopen, so it creates a new connection each time. I
> suspect this isn't that unusual? This could be a server that hits an
> external service for every user request, for example. Given the activity on
> the GitHub issues I linked, it appears I'm not the only person to have run
> into this.

You're certainly not the ONLY one, but it's not that common. Regardless, the work to make this hurt less has not been done.

> Thanks for the response though, since that answers my question: there are
> currently no plans to change how this works. Hopefully, if anyone else hits
> this, they'll find this email and solve it faster than I did.

You can tweak the flags to mitigate, I hope?

> Finally, the fact that dnsPolicy: Default is *not* the default is also
> surprising. It should probably be called dnsPolicy: Host or something
> instead.

Yeah "Host" might have been better. I would take PRs to add Host and
let it mean the same as "Default" and deprecate (but not remove)
"Default".

Tim

duffie...@coreos.com

Oct 5, 2017, 7:53:38 PM
to Kubernetes user discussion and Q&A
This is a good read on the problem as well:
https://rsmitty.github.io/KubeDNS-Tweaks/

Basically, you can greatly reduce the number of calls by tweaking some kube-dns settings.
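
From memory (I haven't re-checked the article), the knob lives in the dnsmasq container of the kube-dns Deployment; a hypothetical fragment, with an illustrative value:

    - name: dnsmasq
      args:
        - --cache-size=10000   # stock manifests ship a much smaller cache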


Rodrigo Campos

Oct 5, 2017, 10:55:49 PM
to kubernet...@googlegroups.com
Ohh, sorry. My bad, just ignore my past email :-)

Matthias Rampke

Oct 9, 2017, 5:05:03 AM
to kubernet...@googlegroups.com
We encountered this issue too, and tried to counter it by lowering the UDP conntrack timeouts so that entries expire more quickly. However, at the time we found that the corresponding sysctls are not propagated into network namespaces, so we now patch the global defaults in our kernel build (patch attached). This works for us because we build a somewhat customized kernel anyway, but it is a bit heavy-handed.
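
Concretely, these are the knobs (illustrative values; on an unpatched kernel they only take effect in the namespace where you set them):

    # Shorten how long UDP conntrack entries linger:
    sysctl -w net.netfilter.nf_conntrack_udp_timeout=5
    sysctl -w net.netfilter.nf_conntrack_udp_timeout_stream=10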

/MR

1001-Lower-conntrack-timeouts-for-udp-tcp.txt

mi...@percy.io

Mar 20, 2018, 1:55:21 AM
to Kubernetes user discussion and Q&A

Evan,

This post was very helpful. We've hit this exact same issue in our Kubernetes cluster where we make a lot of outbound connections.

Did you find any downsides with setting "dnsPolicy: Default" and did you end up sticking with that as the solution?

Cheers,
Mike

Evan Jones

Mar 20, 2018, 7:44:27 AM
to Kubernetes user discussion and Q&A
The downside that I am aware of is that you don't get the Kubernetes DNS magic, where names automatically point to your services. For the particular use case where I ran into this, it worked perfectly!
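
For example (hypothetical in-pod session): with ClusterFirst, short service names resolve through the cluster search domains; with Default, the same lookup fails because the pod only sees the node's resolv.conf:

    nslookup kubernetes.default   # ClusterFirst: resolves via kube-dns
    nslookup kubernetes.default   # Default: NXDOMAIN, no cluster search domains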

I was also going to attempt to add an alias so we could eventually migrate to dnsPolicy: Host instead of the confusingly named Default, but it seemed challenging enough that I never got around to it.

Evan



Jimmi Dyson

Mar 20, 2018, 7:49:52 AM
to kubernet...@googlegroups.com
I remembered seeing this on Twitter last week (https://twitter.com/bboreham/status/973871688495652865):

"PSA: In #Kubernetes use absolute DNS names, not relative, where possible - put a dot at the end of the name. Cuts DNS lookups by 5x. I.e. instead of "http://example.com" put "http://example.com.""
