Kubernetes Multi-Node DNS

thomas rogers

Feb 24, 2017, 2:37:31 PM
to Kubernetes user discussion and Q&A
Hello,

I am using ansible to deploy Kubernetes to bare metal using the playbooks from https://github.com/kubernetes/contrib/tree/master/ansible, and am having problems with DNS resolution from pods that are not running on the master node.
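
For reference, the deploy step was roughly as follows (I'm writing the script path from memory of that repo's layout, so it may not match your checkout exactly):

git clone https://github.com/kubernetes/contrib.git
cd contrib/ansible
# the inventory file lists the masters, etcd, and nodes groups for the bare-metal hosts
./scripts/deploy-cluster.sh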

Here are some details on the setup:

Docker: Docker version 1.13.1, build 092cba3
Flanneld: Flanneld version 0.5.5
Kubernetes: Kubernetes v1.4.5
OS: Linux version 4.4.0-59-generic (buildd@lgw01-11) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #80-Ubuntu SMP Fri Jan 6 17:47:47 UTC 2017

Testing using the instructions from https://kubernetes.io/docs/admin/dns/ on a single node (the pod running on the master) yields:

Server:    10.254.0.10
Address 1: 10.254.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.254.0.1 kubernetes.default.svc.cluster.local

while testing on a different node yields:

Server:    10.254.0.10
Address 1: 10.254.0.10

nslookup: can't resolve 'kubernetes.default'
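
(For reference, the test from that page boils down to creating a busybox pod and running nslookup inside it; to compare nodes, I pin the pod to a node with spec.nodeName:)

# busybox.yaml, as in the linked DNS docs (add spec.nodeName to pin to a specific node)
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "3600"]

kubectl create -f busybox.yaml
kubectl exec busybox -- nslookup kubernetes.default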

I have also tried the vagrant scripts provided by the repository, using ubuntu16 with libvirt, and see behaviour consistent with that of the physical machines.

I should also note that I tested the vagrant script with centos7 (which installed Kubernetes 1.4.0), and that actually appeared to work; however, I wasn't able to adapt that configuration to the ubuntu16 case.

Some information from the VM setup:

Logs from kube-dns kubedns:

I0224 19:28:08.359739       1 server.go:94] Using https://10.254.0.1:443 for kubernetes master, kubernetes API: <nil>
I0224 19:28:08.365076       1 server.go:99] v1.5.0-alpha.0.1651+7dcae5edd84f06-dirty
I0224 19:28:08.365111       1 server.go:101] FLAG: --alsologtostderr="false"
I0224 19:28:08.365174       1 server.go:101] FLAG: --dns-port="10053"
I0224 19:28:08.365195       1 server.go:101] FLAG: --domain="cluster.local."
I0224 19:28:08.365217       1 server.go:101] FLAG: --federations=""
I0224 19:28:08.365224       1 server.go:101] FLAG: --healthz-port="8081"
I0224 19:28:08.365229       1 server.go:101] FLAG: --kube-master-url=""
I0224 19:28:08.365256       1 server.go:101] FLAG: --kubecfg-file=""
I0224 19:28:08.365262       1 server.go:101] FLAG: --log-backtrace-at=":0"
I0224 19:28:08.365269       1 server.go:101] FLAG: --log-dir=""
I0224 19:28:08.365275       1 server.go:101] FLAG: --log-flush-frequency="5s"
I0224 19:28:08.365281       1 server.go:101] FLAG: --logtostderr="true"
I0224 19:28:08.365285       1 server.go:101] FLAG: --stderrthreshold="2"
I0224 19:28:08.365290       1 server.go:101] FLAG: --v="0"
I0224 19:28:08.365294       1 server.go:101] FLAG: --version="false"
I0224 19:28:08.365299       1 server.go:101] FLAG: --vmodule=""
I0224 19:28:08.365389       1 server.go:138] Starting SkyDNS server. Listening on port:10053
I0224 19:28:08.365615       1 server.go:145] skydns: metrics enabled on : /metrics:
I0224 19:28:08.365684       1 dns.go:166] Waiting for service: default/kubernetes
I0224 19:28:08.366448       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0224 19:28:08.366517       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0224 19:28:38.366301       1 dns.go:172] Ignoring error while waiting for service default/kubernetes: Get https://10.254.0.1:443/api/v1/namespaces/default/services/kubernetes: dial tcp 10.254.0.1:443: i/o timeout. Sleeping 1s before retrying.
E0224 19:28:38.367639       1 reflector.go:214] pkg/dns/dns.go:154: Failed to list *api.Endpoints: Get https://10.254.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.254.0.1:443: i/o timeout
E0224 19:28:38.368457       1 reflector.go:214] pkg/dns/dns.go:155: Failed to list *api.Service: Get https://10.254.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.254.0.1:443: i/o timeout
I0224 19:29:09.367199       1 dns.go:172] Ignoring error while waiting for service default/kubernetes: Get https://10.254.0.1:443/api/v1/namespaces/default/services/kubernetes: dial tcp 10.254.0.1:443: i/o timeout. Sleeping 1s before retrying.
E0224 19:29:09.368291       1 reflector.go:214] pkg/dns/dns.go:154: Failed to list *api.Endpoints: Get https://10.254.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.254.0.1:443: i/o timeout
E0224 19:29:09.369085       1 reflector.go:214] pkg/dns/dns.go:155: Failed to list *api.Service: Get https://10.254.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.254.0.1:443: i/o timeout
I0224 19:29:14.031575       1 server.go:133] Received signal: terminated, will exit when the grace period ends
I0224 19:29:40.368220       1 dns.go:172] Ignoring error while waiting for service default/kubernetes: Get https://10.254.0.1:443/api/v1/namespaces/default/services/kubernetes: dial tcp 10.254.0.1:443: i/o timeout. Sleeping 1s before retrying.
E0224 19:29:40.369086       1 reflector.go:214] pkg/dns/dns.go:154: Failed to list *api.Endpoints: Get https://10.254.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.254.0.1:443: i/o timeout
E0224 19:29:40.369799       1 reflector.go:214] pkg/dns/dns.go:155: Failed to list *api.Service: Get https://10.254.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.254.0.1:443: i/o timeout


Logs from kube-dns dnsmasq:

dnsmasq[1]: started, version 2.76 cachesize 1000
dnsmasq[1]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
dnsmasq[1]: using nameserver 127.0.0.1#10053
dnsmasq[1]: read /etc/hosts - 7 addresses


Logs from kube-dns healthz:

2017/02/24 19:14:33 Healthz probe on /healthz-kubedns error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-02-24 19:14:33.130111978 +0000 UTC, error exit status 1
...
2017/02/24 19:30:03 Healthz probe on /healthz-dnsmasq error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2017-02-24 19:29:43.134054709 +0000 UTC, error exit status 1


Thanks,
Thomas

Matthias Rampke

Feb 24, 2017, 4:35:46 PM
to kubernet...@googlegroups.com
"DNS not working" is likely a symptom. First, (by `kubectl exec`ing around), verify that pods on multiple nodes can talk to each other, and that they can talk to the internet (`ping 8.8.8.8; host google.com 8.8.8.8`).

Another common problem is that the kube-dns pods inherit the DNS settings from the host. Check whether /etc/resolv.conf inside those pods is sensible (it should point at some DNS resolver, and that resolver should be reachable too).
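
For example (the pod name is a placeholder; look it up with `kubectl get pods -n kube-system`):

# resolv.conf as the kubedns container sees it
kubectl exec -n kube-system <kube-dns-pod> -c kubedns -- cat /etc/resolv.conf
# and the node's own resolv.conf, for comparison
cat /etc/resolv.conf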

/MR

thomas rogers

Feb 24, 2017, 4:56:33 PM
to Kubernetes user discussion and Q&A
Hi,

Thanks for your help. It looks like you are correct: the pods are unable to communicate with each other, even by IP address. I am currently investigating this.

So far, I have discovered problems with flanneld and etcd, and have made the following changes (sketched below):

/etc/sysconfig/flanneld -> changed to use the IP address of the master server instead of its hostname
/etc/systemd/system/etcd.service -> Type changed to "notify", Restart changed to "always", and NotifyAccess added with the value "all"
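
Roughly, that is (the variable name below is from flannel 0.5.x's sysconfig, and 192.0.2.10 stands in for the master's IP):

# /etc/sysconfig/flanneld (excerpt)
FLANNEL_ETCD="http://192.0.2.10:2379"   # previously pointed at the master's hostname

# /etc/systemd/system/etcd.service, [Service] section (excerpt)
Type=notify
Restart=always
NotifyAccess=all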

The services now appear to be working correctly, but still no luck with even pinging pods.
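
One thing I'm checking on each node is whether docker actually picked up the subnet flanneld leased (flanneld writes it to /run/flannel/subnet.env):

# the subnet and MTU flanneld leased for this host
cat /run/flannel/subnet.env   # FLANNEL_NETWORK / FLANNEL_SUBNET / FLANNEL_MTU
# docker0 should have an address inside FLANNEL_SUBNET, with a matching MTU
ip addr show docker0
# and the flannel device itself (flannel0 for the udp backend, flannel.1 for vxlan)
ip addr show flannel0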

Regards,
Thomas

thomas rogers

Feb 24, 2017, 5:20:16 PM
to Kubernetes user discussion and Q&A
Hi,

Just to add, I also tried pinging 8.8.8.8 and it seems to work.