Pod networking troubleshooting


adam....@container-solutions.com

Sep 5, 2017, 3:09:41 AM
to kubernetes-sig-network

Hey networking peeps, 

I have a 1.7.5 node which can run Pods, but they can't communicate with any other pods or services (except ones that run on the same node). kube-proxy is running and reporting that it's setting up all endpoints fine. I checked iptables and it had all kinds of crap in it created by Docker, but I removed all that and set Docker to not touch iptables again. My iptables looks exactly the same as on my other node, which is working. Where else should I look for clues? This is my first time having to dig this deep into k8s networking. All help is much appreciated!
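(For reference, this is roughly how I set Docker to leave iptables alone — a sketch, assuming the daemon.json approach:

# /etc/docker/daemon.json
{
  "iptables": false
}

followed by a systemctl restart docker.)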

Some info:
kubelet & kube-proxy version: 1.7.5
control plane versions: 1.7.4

------------------------------------------------------------ iptables -------------------------------------------------------------------------------------------
iptables --list -v
Chain INPUT (policy ACCEPT 25 packets, 11525 bytes)
 pkts bytes target     prot opt in     out     source               destination
 209K  171M KUBE-FIREWALL  all  --  any    any     anywhere             anywhere
 210K  172M KUBE-SERVICES  all  --  any    any     anywhere             anywhere             /* kubernetes service portals */

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 20 packets, 1937 bytes)
 pkts bytes target     prot opt in     out     source               destination
 196K   17M KUBE-FIREWALL  all  --  any    any     anywhere             anywhere
 196K   17M KUBE-SERVICES  all  --  any    any     anywhere             anywhere             /* kubernetes service portals */

Chain KUBE-FIREWALL (2 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DROP       all  --  any    any     anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-SERVICES (2 references)
 pkts bytes target     prot opt in     out     source               destination
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------- kube-proxy logs ---------------------------------------------------------------------
Sep 03 20:55:01 worker3 systemd[1]: Started Kubernetes Kube Proxy.
Sep 03 20:55:02 worker3 kube-proxy[3397]: W0903 20:55:02.132538    3397 server.go:190] WARNING: all flags other than --config, --write-config-to, and --cleanup-iptables are deprecated
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.166494    3397 server.go:478] Using iptables Proxier.
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.182674    3397 server.go:513] Tearing down userspace rules.
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.193290    3397 server.go:621] setting OOM scores is unsupported in this build
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.197228    3397 server.go:630] Running in resource-only container "/kube-proxy"
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.197616    3397 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.197656    3397 conntrack.go:52] Setting nf_conntrack_max to 131072
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.197690    3397 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.197711    3397 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.198500    3397 config.go:202] Starting service config controller
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.198512    3397 controller_utils.go:994] Waiting for caches to sync for service config controller
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.198583    3397 config.go:102] Starting endpoints config controller
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.198587    3397 controller_utils.go:994] Waiting for caches to sync for endpoints config controller
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.298806    3397 controller_utils.go:1001] Caches are synced for endpoints config controller
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.299398    3397 controller_utils.go:1001] Caches are synced for service config controller
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.299612    3397 proxier.go:320] Adding new service port "test/words-db:" at 10.32.0.54:27017/TCP
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.299787    3397 proxier.go:320] Adding new service port "test/backend:" at 10.32.0.91:80/TCP
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.299809    3397 proxier.go:320] Adding new service port "test/frontend:" at 10.32.0.213:80/TCP
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.299822    3397 proxier.go:320] Adding new service port "kube-system/kubernetes-dashboard:" at 10.32.0.42:80/TCP
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.299834    3397 proxier.go:320] Adding new service port "default/kubernetes:https" at 10.32.0.1:443/TCP
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.299855    3397 proxier.go:320] Adding new service port "kube-system/kube-dns:dns-tcp" at 10.32.0.10:53/TCP
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.299866    3397 proxier.go:320] Adding new service port "kube-system/kube-dns:dns" at 10.32.0.10:53/UDP
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.300042    3397 proxier.go:1013] Stale udp service kube-system/kube-dns:dns -> 10.32.0.10
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.312618    3397 proxier.go:1718] Opened local port "nodePort for test/words-db:" (:30677/tcp)
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.313035    3397 proxier.go:1718] Opened local port "nodePort for test/backend:" (:32174/tcp)
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.313285    3397 proxier.go:1718] Opened local port "nodePort for test/frontend:" (:30864/tcp)
Sep 03 20:55:02 worker3 kube-proxy[3397]: I0903 20:55:02.316319    3397 conntrack.go:36] Deleting connection tracking state for service IP 10.32.0.10

hzxuzhonghu

Sep 5, 2017, 3:18:45 AM
to kubernetes-sig-network, adam....@container-solutions.com

CNI is not working properly on your node. This has nothing to do with kube-proxy; it's a container network issue.
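To confirm, you can check the basics on the node (a sketch; paths assume the default locations kubenet uses):

ls /opt/cni/bin                                  # kubenet needs the bridge, lo, and host-local plugins here
journalctl -u kubelet | grep -iE 'cni|kubenet'   # look for network plugin errors from kubelet
ip addr show cbr0                                # kubenet's bridge should be up, with an address from the pod CIDR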
On Tuesday, September 5, 2017 at 3:09:41 PM UTC+8, adam....@container-solutions.com wrote:

Adam Sandor

Sep 5, 2017, 3:25:41 AM
to hzxuzhonghu, kubernetes-sig-network
I’m using the kubenet plugin with the required CNI plugins downloaded from here: https://github.com/containernetworking/plugins/releases/tag/v0.6.0

Any ideas where to look for errors?
--
Adam Sándor
Senior Engineer/Consultant - Container Solutions
@adamsand0r

Tim Hockin

Sep 5, 2017, 2:16:18 PM
to Adam Sandor, hzxuzhonghu, kubernetes-sig-network
What value did you set for hairpin mode in kubelet?
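You can see what is actually programmed on the bridge ports (a sketch, assuming kubenet's default bridge name cbr0):

for f in /sys/class/net/cbr0/brif/*/hairpin_mode; do echo "$f: $(cat "$f")"; done

With hairpin-veth those should all be 1; with promiscuous-bridge they stay 0 and the bridge itself is put into promiscuous mode instead.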


Adam Sandor

Sep 6, 2017, 4:17:40 AM
to Tim Hockin, hzxuzhonghu, kubernetes-sig-network
It's at the default. Here is my full kubelet configuration:

[Service]
WorkingDirectory=/var/lib/kubelet
ExecStart=/usr/bin/kubelet \
  --allow-privileged=true \
  --cluster-dns=10.32.0.10 \
  --cluster-domain=cluster.local \
  --container-runtime=docker \
  --network-plugin=kubenet \
  --serialize-image-pulls=false \
  --register-node=true \
  --tls-cert-file=/var/lib/kubelet/worker3.pem \
  --tls-private-key-file=/var/lib/kubelet/worker3-key.pem \
  --cert-dir=/var/lib/kubelet \
  --v=2 \
  --kubeconfig=/var/lib/kubelet/worker3.kubeconfig \
  --require-kubeconfig \
  --pod-cidr=10.200.3.0/24 \
  --enable-custom-metrics
Restart=on-failure
KillMode=process

Tim Hockin

Sep 7, 2017, 12:57:13 AM
to Adam Sandor, hzxuzhonghu, kubernetes-sig-network
try --hairpin-mode="promiscuous-bridge"
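i.e. in the unit file you posted, something like (a sketch):

ExecStart=/usr/bin/kubelet \
  --hairpin-mode=promiscuous-bridge \
  ...your existing flags...

then systemctl daemon-reload && systemctl restart kubelet.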

Adam Sandor

Sep 8, 2017, 7:19:21 AM
to Tim Hockin, hzxuzhonghu, kubernetes-sig-network
Thanks Tim, unfortunately it didn't solve it, but I seem to have some very weird stuff going on on this cluster.
I launched a pod on another node and that one could curl the service, but when I kept trying over and over, about 1 in 5 requests failed. I don't understand how this is possible, but as the cluster is hand-made (following KTHW) there are probably too many things I could have messed up. Probably best to just kill the whole thing and recreate it from scratch. I'm doing this to prep for the CNCF admin exam, btw.
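(The test was roughly this — a sketch, run from a shell inside the pod on the other node; 10.32.0.213 is the test/frontend ClusterIP from the kube-proxy logs above:

for i in $(seq 1 20); do curl -s -o /dev/null -m 2 -w "%{http_code}\n" http://10.32.0.213/; done
)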

Tim Hockin

Sep 8, 2017, 1:20:19 PM
to Adam Sandor, hzxuzhonghu, kubernetes-sig-network
Is that 1 in 5 on the same VM? That's a start for investigation.


Adam Sandor

Sep 9, 2017, 8:31:37 AM
to Tim Hockin, hzxuzhonghu, kubernetes-sig-network
Alright then :) I would love to solve this, but I have no clue where to go from here. Can you give me some things to try? I can also give you access to the cluster if you want to take a look.

Adam

Tim Hockin

Sep 10, 2017, 12:12:28 AM
to Adam Sandor, hzxuzhonghu, kubernetes-sig-network
I am, sadly, not going to have time to take you up on the debug
session, as fun as that sounds (seriously). I'm just too swamped with
stuff to do right now.

I would check for iptables DROP rules or policies that default to deny
(newer Docker versions do that).

I would use tcpdump to see what packets actually come through and
deduce where it might be failing.

You could try iptables TRACE, perhaps.
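Sketches of what I mean — the interface name and port are assumptions based on kubenet defaults and your port-80 services:

# watch traffic on the kubenet bridge while you curl a service
tcpdump -ni cbr0 tcp port 80

# trace packets through the iptables chains; output lands in the kernel log
iptables -t raw -A PREROUTING -p tcp --dport 80 -j TRACE
journalctl -kf | grep TRACE
# delete the rule again when you're done:
iptables -t raw -D PREROUTING -p tcp --dport 80 -j TRACE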



Adam Sandor

Sep 11, 2017, 1:03:47 PM
to Tim Hockin, hzxuzhonghu, kubernetes-sig-network
Sure Tim I can understand that. Thanks for the help so far!

Prateek Gogia

Sep 11, 2017, 2:00:47 PM
to kubernetes-sig-network
Hi Adam

One thing I see missing from your kubelet configuration is the --non-masquerade-cidr flag.
Kubelet needs to be run with this option so that traffic to IPs outside the cluster CIDR gets masqueraded. Refer here - kubenet:
  • Kubelet should also be run with the --non-masquerade-cidr=<clusterCidr> argument to ensure traffic to IPs outside this range will use IP masquerade.
Not sure if this is the cause, but it looks like this is a requirement and it's missing from your kubelet config.
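For example, in your kubelet unit (assuming your cluster CIDR is 10.200.0.0/16, going by the --pod-cidr=10.200.3.0/24 you posted):

  --non-masquerade-cidr=10.200.0.0/16 \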

--Prateek


Adam Sandor

Sep 15, 2017, 8:12:24 AM
to Prateek Gogia, kubernetes-sig-network
Thanks Prateek, unfortunately that didn't help either. Are you sure this one should be set, though? Kelsey doesn't mention it in KTHW. Maybe it's only supposed to be set with certain networking plugins?


--
Adam Sándor
Senior Engineer/Consultant - Container Solutions
@adamsand0r
