Flannel pods CrashLoopBackOff when using kubeadm to install a Kubernetes cluster on Ubuntu 16.04


Chen Li

Mar 2, 2017, 4:25:45 AM
to kubernetes-sig-cluster-lifecycle
Hi all,

I'm following the guide https://kubernetes.io/docs/getting-started-guides/kubeadm/ to install k8s on 3 Ubuntu 16.04 nodes.

On the master node, I ran:
a) kubeadm init --pod-network-cidr 10.244.0.0/16
b) kubectl create -f kube-flannel.yml

Then I ran the join command on the other 2 nodes.
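For completeness, the join command on each worker was of this form (the token and master address here are placeholders, not my real values):

kubeadm join --token <token> <master-ip>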

But the flannel pods on the other 2 nodes always go down after a short while.

NAMESPACE     NAME                                                         READY     STATUS    RESTARTS   AGE       IP                NODE
default       kube-flannel-ds-9cx5f                                        1/2       CrashLoopBackOff 7          15m       174.37.107.68     glb-shanghai-jason-performance-006
default       kube-flannel-ds-fwfl3                                        2/2       CrashLoopBackOff  7          14m       173.193.128.142   glb-shanghai-jason-performance-005
default       kube-flannel-ds-vnx2t                                        2/2       Running   0          15m       173.193.128.154   glb-shanghai-jason-performance-004
kube-system   dummy-2088944543-42419                                       1/1       Running   0          16m       173.193.128.154   glb-shanghai-jason-performance-004
kube-system   etcd-glb-shanghai-jason-performance-004                      1/1       Running   0          16m       173.193.128.154   glb-shanghai-jason-performance-004
kube-system   kube-apiserver-glb-shanghai-jason-performance-004            1/1       Running   0          16m       173.193.128.154   glb-shanghai-jason-performance-004
kube-system   kube-controller-manager-glb-shanghai-jason-performance-004   1/1       Running   0          16m       173.193.128.154   glb-shanghai-jason-performance-004
kube-system   kube-discovery-1769846148-jdhbh                              1/1       Running   0          16m       173.193.128.154   glb-shanghai-jason-performance-004
kube-system   kube-dns-2924299975-gf0mm                                    4/4       Running   0          16m       10.244.0.2        glb-shanghai-jason-performance-004
kube-system   kube-proxy-0rvnz                                             1/1       Running   0          16m       173.193.128.154   glb-shanghai-jason-performance-004
kube-system   kube-proxy-3bjxb                                             1/1       Running   0          15m       174.37.107.68     glb-shanghai-jason-performance-006
kube-system   kube-proxy-452dz                                             1/1       Running   0          14m       173.193.128.142   glb-shanghai-jason-performance-005
kube-system   kube-scheduler-glb-shanghai-jason-performance-004            1/1       Running   0          16m       173.193.128.154   glb-shanghai-jason-performance-004



Using the docker command to check the failing pod, I get:

docker logs 7589cb83f2a2
E0302 09:09:51.398785       1 main.go:127] Failed to create SubnetManager: error retrieving pod spec for 'default/kube-flannel-ds-9cx5f': Get https://10.96.0.1:443/api/v1/namespaces/default/pods/kube-flannel-ds-9cx5f: dial tcp 10.96.0.1:443: i/o timeout



10.96.0.1 is the kubernetes service cluster IP:

kubectl get svc
NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   10.96.0.1    <none>        443/TCP   17m
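If it helps, I believe the cluster IP is only a virtual address that kube-proxy translates to the real API server endpoint; that mapping can be checked with:

kubectl get endpoints kubernetes

which should show the master's address and port (173.193.128.154:6443 here).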


Can anyone help me here?

Thanks,
-chen


Pavel Moukhataev

Mar 2, 2017, 7:01:03 PM
to kubernetes-sig-cluster-lifecycle
It seems that you ran into the same issue I did: https://groups.google.com/forum/#!topic/kubernetes-sig-cluster-lifecycle/Ze_4kxxdeA4.
10.96.0.1 is on the service network, and flannel is trying to reach the API server through it. There are iptables rules that redirect this virtual address to the API server's real address.
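For example, on a node where the proxy is working, that redirect shows up in the nat table roughly like this (a sketch only; the hashed chain names differ per cluster, and 6443 assumes the kubeadm default API server port):

iptables-save -t nat | grep 'default/kubernetes'
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp --dport 443 -j KUBE-SVC-...
-A KUBE-SVC-... -j KUBE-SEP-...
-A KUBE-SEP-... -p tcp -j DNAT --to-destination <master-ip>:6443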

To make sure this is the problem, can you please attach the following:
1) the list of network interfaces you have (ip address)
2) the list of routes (ip route); you didn't change the interface, so the default one should be used in your case
3) the output of iptables-save
4) a check of whether the API server is reachable from within the master node (using its real address), from a worker node using that real address, and from a worker node using the 10.96.0.1 address, also trying the other network interfaces, e.g. curl --interface <iface> ... (see the example commands below)
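Something like the following, for example (a sketch; 6443 is the kubeadm default API server port, /version is just a convenient path, and -k only skips certificate verification for the connectivity test):

# from the master node and from a worker node, against the API server's real address
curl -k https://<master-ip>:6443/version

# from a worker node, against the service address, forcing a particular source interface
curl -k --interface <iface> https://10.96.0.1:443/version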


On Thursday, March 2, 2017 at 12:25:45 UTC+3, Chen Li wrote:

Chen Li

Mar 2, 2017, 8:40:06 PM
to kubernetes-sig-cluster-lifecycle
Here is the output from one of the 2 worker nodes:

1) list of network interfaces you have (ip address)


ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:25:90:f5:46:b2 brd ff:ff:ff:ff:ff:ff
    inet 10.6.27.89/26 brd 10.6.27.127 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::225:90ff:fef5:46b2/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:25:90:f5:46:b3 brd ff:ff:ff:ff:ff:ff
    inet 174.37.107.68/27 brd 174.37.107.95 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::225:90ff:fef5:46b3/64 scope link 
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:25:90:f5:46:b4 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:25:90:f5:46:b5 brd ff:ff:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 1e:12:58:46:14:b5 brd ff:ff:ff:ff:ff:ff
7: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:6c:9a:20:59 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever



2) list of routes (ip route)


ip route
default via 174.37.107.65 dev eth1 onlink 
10.0.0.0/8 via 10.6.27.65 dev eth0 
10.6.27.64/26 dev eth0  proto kernel  scope link  src 10.6.27.89 
161.26.0.0/16 via 10.6.27.65 dev eth0 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 linkdown 
174.37.107.64/27 dev eth1  proto kernel  scope link  src 174.37.107.68 



3) provide iptables-save


iptables-save
# Generated by iptables-save v1.6.0 on Thu Mar  2 19:37:16 2017
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:DOCKER - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-SEP-IT2ZTR26TO4XFPTO - [0:0]
:KUBE-SEP-NEA4B7MEWJDH6SEV - [0:0]
:KUBE-SEP-YIL6JZP7A3QYXJU2 - [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SEP-IT2ZTR26TO4XFPTO -s 10.244.0.2/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-IT2ZTR26TO4XFPTO -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.0.2:53
-A KUBE-SEP-NEA4B7MEWJDH6SEV -s 173.193.128.154/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-NEA4B7MEWJDH6SEV -p tcp -m comment --comment "default/kubernetes:https" -m recent --set --name KUBE-SEP-NEA4B7MEWJDH6SEV --mask 255.255.255.255 --rsource -m tcp -j DNAT --to-destination 173.193.128.154:6443
-A KUBE-SEP-YIL6JZP7A3QYXJU2 -s 10.244.0.2/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-YIL6JZP7A3QYXJU2 -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.0.2:53
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-SEP-IT2ZTR26TO4XFPTO
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-NEA4B7MEWJDH6SEV --mask 255.255.255.255 --rsource -j KUBE-SEP-NEA4B7MEWJDH6SEV
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-NEA4B7MEWJDH6SEV
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-YIL6JZP7A3QYXJU2
COMMIT
# Completed on Thu Mar  2 19:37:16 2017
# Generated by iptables-save v1.6.0 on Thu Mar  2 19:37:16 2017
*filter
:INPUT ACCEPT [8:540]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [19:4968]
:DOCKER - [0:0]
:DOCKER-ISOLATION - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-SERVICES - [0:0]
-A INPUT -j KUBE-FIREWALL
-A FORWARD -j DOCKER-ISOLATION
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A DOCKER-ISOLATION -j RETURN
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
COMMIT
# Completed on Thu Mar  2 19:37:16 2017





4) a check of whether the API server is reachable from within the master node (using its real address), from a worker node using that real address, and from a worker node using the 10.96.0.1 address, also trying the other network interfaces, e.g. curl --interface <iface> ...


curl: (7) Failed to connect to 10.36.249.226 port 443: Connection refused

curl: (7) Failed to connect to 173.193.128.154 port 443: Connection refused

===> The curl to 10.96.0.1 does not return for a very long time...

Chen Li

Mar 2, 2017, 8:42:52 PM
to kubernetes-sig-cluster-lifecycle
Tried the commands you posted in the other thread:



curl --interface eth1 https://10.96.0.1:443/
curl: (60) server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.


 curl --interface eth0 https://10.96.0.1:443/



Looks like eth1 can work (the certificate error at least means the request reached the API server), while the eth0 one just hangs.
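I guess the certificate warning can be skipped for this connectivity test by adding -k; if the path through eth1 really works, the API server should still answer with something (probably an Unauthorized error):

curl -k --interface eth1 https://10.96.0.1:443/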

How do I solve this?

Pavel Moukhataev

Mar 3, 2017, 6:19:24 AM
to kubernetes-sig-cluster-lifecycle
Yep, it seems you have the same trouble. You have 2 interfaces, eth0 and eth1, and flannel or the proxy set up iptables using eth1, because eth1 is the default interface on your host. But to send a request to an address like 10.36.249.226 (and to the 10.96.0.1 service IP, which also falls under 10.0.0.0/8), eth0 is used, because you have this route:

# ip route
10.0.0.0/8 via 10.6.27.65 dev eth0 

So you can either use another network for the services, like 192.168.x.x; in that case the default interface will be used to send data to it. Or you can just remove that route (10.0.0.0/8 via 10.6.27.65 dev eth0). I don't know why you have that network configuration, so I can't say exactly; maybe there is a VPN.
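Roughly, the two options would look like this (a sketch only; check that your kubeadm version supports --service-cidr, the CIDR value is just an example, and deleting the route may break access to other 10.x networks you rely on, so verify that first):

# option 1: re-create the cluster with a service network outside 10.0.0.0/8
kubeadm reset
kubeadm init --service-cidr 192.168.96.0/20 --pod-network-cidr 10.244.0.0/16

# option 2: on each node, delete the broad route so traffic to the service IP follows the default route via eth1
ip route del 10.0.0.0/8 via 10.6.27.65 dev eth0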


On Thursday, March 2, 2017 at 12:25:45 UTC+3, Chen Li wrote:

Vasista T

Apr 21, 2017, 7:11:23 AM
to kubernetes-sig-cluster-lifecycle
I'm having the same issue on a CentOS 7 VM.


My ip route output shows:

default via 192.168.170.1 dev ens160  proto static  metric 100
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1
192.168.170.0/23 dev ens160  proto kernel  scope link  src 192.168.170.152  metric 100

I can see ens160 instead of eth0 or eth1.

Is this an issue?