Getting unusual timeouts with LoadBalancer on Kubernetes 1.3.3 running on GCE


Christopher McKenzie

Jul 27, 2016, 2:15:33 PM
to gce-discussion, Vicken Simonian
Hi,
I am getting unusual timeouts with the 'LoadBalancer' service type on Kubernetes 1.3.3 running on GCE, and I don't know where to start troubleshooting.

My environment:
KUBERNETES_VERSION=1.3.3
KUBERNETES_PROVIDER=gce
KUBE_GCE_ZONE=us-central1-b
NODE_SIZE=n1-standard-4

Cluster created via the `cluster/kube-up.sh` script.

Here are some repro steps that illustrate what I'm seeing.
1. Create two simple nginx RCs and two LoadBalancer services.
2. Curl the first nginx's LoadBalancer IP.
3. Scale replicas for the second nginx RC.
4. Watch the curl command against the first nginx time out for 2 minutes.

--------------------
1. Create two simple nginx RCs and two LoadBalancer services.
---------------------
kubectl create -f - <<- EOF
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx-alpine
spec:
  replicas: 1
  selector:
    name: nginx-alpine
  template:
    metadata:
      labels:
        name: nginx-alpine
    spec:
      containers:
      - name: nginx-alpine
        image: rohan/nginx-alpine
        ports:
        - containerPort: 80
EOF
kubectl create -f - <<- EOF
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx-alpine2
spec:
  replicas: 1
  selector:
    name: nginx-alpine2
  template:
    metadata:
      labels:
        name: nginx-alpine2
    spec:
      containers:
      - name: nginx-alpine2
        image: rohan/nginx-alpine
        ports:
        - containerPort: 80
EOF
kubectl create -f - <<- EOF
apiVersion: v1
kind: Service
metadata:
  name: nginx-alpine
  labels:
    app: nginx-alpine
spec:
  type: LoadBalancer
  ports:
  - port: 80
    protocol: TCP
    name: http
  selector:
    name: nginx-alpine
EOF
kubectl create -f - <<- EOF
apiVersion: v1
kind: Service
metadata:
  name: nginx-alpine2
  labels:
    app: nginx-alpine2
spec:
  type: LoadBalancer
  ports:
  - port: 80
    protocol: TCP
    name: http
  selector:
    name: nginx-alpine2
EOF

------
2. Curl the first nginx's LoadBalancer IP 10 times/sec, using ts for timestamps.
------
while true; do /usr/bin/curl -k -I http://104.155.xxx.xxx | ts ; sleep 0.1; done
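
A possible variant of this loop, offered only as a sketch: without a connect timeout, a single curl attempt whose SYN goes unanswered may simply wait until the operating system gives up on the connection, so the loop cannot show when connectivity actually came back. Bounding each attempt makes the start and end of the stall easier to see. `probe_once` is a hypothetical helper name and the 5-second timeout is an arbitrary choice, not something from the original report.

```shell
# Probe a URL once with a bounded connect timeout.
# Prints "OK <http_code> <connect_seconds>" on success,
# or "FAIL curl_exit=<code>" if curl could not complete the request.
probe_once() {
  local url=$1 out rc
  out=$(curl -s -o /dev/null -I --connect-timeout 5 \
        -w '%{http_code} %{time_connect}' "$url")
  rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "OK $out"
  else
    echo "FAIL curl_exit=$rc"
  fi
}

# Usage (same loop shape as above, not run here):
#   while true; do probe_once http://104.155.xxx.xxx | ts; sleep 0.1; done
```

With this shape, each failed attempt returns after at most about 5 seconds, so the timestamps of the first and last FAIL lines bracket the stall more tightly.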

------
3. Scale replicas for the second nginx RC.
------
kubectl scale rc nginx-alpine2 --replicas 4

--------
4. Watch the curl command against the first nginx time out for 2 minutes.
-----

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0   612    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
Jul 26 21:09:50 HTTP/1.1 200 OK
Jul 26 21:09:50 Server: nginx/1.6.2
Jul 26 21:09:50 Date: Wed, 27 Jul 2016 04:09:50 GMT
Jul 26 21:09:50 Content-Type: text/html
Jul 26 21:09:50 Content-Length: 612
Jul 26 21:09:50 Last-Modified: Mon, 17 Nov 2014 14:48:17 GMT
Jul 26 21:09:50 Connection: keep-alive
Jul 26 21:09:50 ETag: "546a0ab1-264"
Jul 26 21:09:50 Accept-Ranges: bytes
Jul 26 21:09:50 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0
...
  0     0    0     0    0     0      0      0 --:--:--  0:02:04 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:02:05 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:02:06 --:--:--     0
curl: (7) Failed to connect to 104.155.142.86 port 80: Connection timed out
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0   612    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
Jul 26 21:11:58 HTTP/1.1 200 OK
Jul 26 21:11:58 Server: nginx/1.6.2
Jul 26 21:11:58 Date: Wed, 27 Jul 2016 04:11:58 GMT
Jul 26 21:11:58 Content-Type: text/html
Jul 26 21:11:58 Content-Length: 612
Jul 26 21:11:58 Last-Modified: Mon, 17 Nov 2014 14:48:17 GMT
Jul 26 21:11:58 Connection: keep-alive
Jul 26 21:11:58 ETag: "546a0ab1-264"
Jul 26 21:11:58 Accept-Ranges: bytes
Jul 26 21:11:58 
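
For reference, the gap between the last successful response before the stall and the first one after it can be worked out from the ts timestamps. A small sketch, assuming both timestamps fall on the same day; `to_seconds` and `gap_seconds` are hypothetical helper names:

```shell
# Convert an HH:MM:SS timestamp to seconds since midnight.
to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print the number of seconds between two same-day HH:MM:SS timestamps.
gap_seconds() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}
```

`gap_seconds 21:09:50 21:11:58` prints 128, which lines up with the 0:02:06 that curl reports as spent before the connect error.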

During the timeout, netstat says:
tcp        0      1 10.10.130.104:54848     104.155.xxx.xxx:80       SYN_SENT    17235/curl
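
SYN_SENT means the client has sent its SYN and is still waiting for a SYN-ACK, so the connection attempt is getting no reply at all rather than being refused. A rough sketch for counting such stuck connections to one remote endpoint, assuming the Linux `netstat -tnp` column layout shown above; `count_syn_sent` is a hypothetical helper name:

```shell
# Read netstat -tnp style lines on stdin and count connections to the
# given "address:port" (column 5) that are in the SYN_SENT state (column 6).
count_syn_sent() {
  awk -v remote="$1" '$5 == remote && $6 == "SYN_SENT" { n++ } END { print n + 0 }'
}

# Usage (not run here):
#   netstat -tnp | count_syn_sent 104.155.xxx.xxx:80
```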

Interestingly, a GCE Kubernetes cluster running 1.2.5 does not exhibit this timeout. 
Is this normal or am I doing something wrong?

I should also point out that communication with the service from inside the cluster via k8s DNS works perfectly.

Any help would be greatly appreciated.
Christopher McKenzie

Kamran (Google Cloud Support)

Jul 31, 2016, 7:51:39 PM
to gce-dis...@googlegroups.com, vic...@oblong.com

Hello Christopher,

I noticed that you have also posted this issue on the Google Containers discussion group. That is the right discussion group for Kubernetes and Google containers. I hope the issue gets resolved soon.

If you have any other questions, please let me know.

Sincerely,