Hi,
I am getting unusual timeouts with the 'LoadBalancer' service type on Kubernetes 1.3.3 running on GCE, and I don't know where to start troubleshooting.
My environment:
KUBERNETES_VERSION=1.3.3
KUBERNETES_PROVIDER=gce
KUBE_GCE_ZONE=us-central1-b
NODE_SIZE=n1-standard-4
Cluster created via the `cluster/kube-up.sh` script.
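Roughly how the cluster was brought up from the 1.3.3 release (just a sketch of the config above applied to the stock script):

export KUBERNETES_PROVIDER=gce
export KUBE_GCE_ZONE=us-central1-b
export NODE_SIZE=n1-standard-4
cluster/kube-up.sh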
Here are some repro steps to illustrate what I'm seeing.
1. Create two simple nginx RCs and two LoadBalancer services.
2. Curl the first nginx's LoadBalancer IP.
3. Scale replicas for the second nginx RC.
4. Watch the curl requests to the first nginx time out for roughly 2 minutes.
--------------------
1. Create two simple nginx RCs and two LoadBalancer services.
---------------------
kubectl create -f - <<- EOF
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx-alpine
spec:
  replicas: 1
  selector:
    name: nginx-alpine
  template:
    metadata:
      labels:
        name: nginx-alpine
    spec:
      containers:
      - name: nginx-alpine
        image: rohan/nginx-alpine
        ports:
        - containerPort: 80
EOF
kubectl create -f - <<- EOF
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx-alpine2
spec:
  replicas: 1
  selector:
    name: nginx-alpine2
  template:
    metadata:
      labels:
        name: nginx-alpine2
    spec:
      containers:
      - name: nginx-alpine2
        image: rohan/nginx-alpine
        ports:
        - containerPort: 80
EOF
kubectl create -f - <<- EOF
apiVersion: v1
kind: Service
metadata:
  name: nginx-alpine
  labels:
    app: nginx-alpine
spec:
  type: LoadBalancer
  ports:
  - port: 80
    protocol: TCP
    name: http
  selector:
    name: nginx-alpine
EOF
kubectl create -f - <<- EOF
apiVersion: v1
kind: Service
metadata:
  name: nginx-alpine2
  labels:
    app: nginx-alpine2
spec:
  type: LoadBalancer
  ports:
  - port: 80
    protocol: TCP
    name: http
  selector:
    name: nginx-alpine2
EOF
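After creating the services, I wait for both to get an external IP (104.155.142.86 below is the one assigned to nginx-alpine):

kubectl get svc nginx-alpine nginx-alpine2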
------
2. Curl the first nginx's LoadBalancer IP 10 times/sec, piping through `ts` for timestamps.
------
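The loop is roughly this (a sketch; `ts` is from moreutils, and the IP is the external IP of the nginx-alpine service):

EXTERNAL_IP=104.155.142.86
while true; do
  curl -D - -o /dev/null "http://${EXTERNAL_IP}/" | ts
  sleep 0.1
done

(`-D -` dumps the response headers to stdout, which is what gets timestamped; curl's progress meter goes to stderr, which is why those lines below are not timestamped.)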
------
3. Scale replicas for the second nginx RC.
------
kubectl scale rc nginx-alpine2 --replicas 4
--------
4. Watch the curl requests to the first nginx time out for roughly 2 minutes.
-----
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 612 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
Jul 26 21:09:50 HTTP/1.1 200 OK
Jul 26 21:09:50 Server: nginx/1.6.2
Jul 26 21:09:50 Date: Wed, 27 Jul 2016 04:09:50 GMT
Jul 26 21:09:50 Content-Type: text/html
Jul 26 21:09:50 Content-Length: 612
Jul 26 21:09:50 Last-Modified: Mon, 17 Nov 2014 14:48:17 GMT
Jul 26 21:09:50 Connection: keep-alive
Jul 26 21:09:50 ETag: "546a0ab1-264"
Jul 26 21:09:50 Accept-Ranges: bytes
Jul 26 21:09:50
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:03 --:--:-- 0
...
0 0 0 0 0 0 0 0 --:--:-- 0:02:04 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:02:05 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:02:06 --:--:-- 0
curl: (7) Failed to connect to 104.155.142.86 port 80: Connection timed out
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 612 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
Jul 26 21:11:58 HTTP/1.1 200 OK
Jul 26 21:11:58 Server: nginx/1.6.2
Jul 26 21:11:58 Date: Wed, 27 Jul 2016 04:11:58 GMT
Jul 26 21:11:58 Content-Type: text/html
Jul 26 21:11:58 Content-Length: 612
Jul 26 21:11:58 Last-Modified: Mon, 17 Nov 2014 14:48:17 GMT
Jul 26 21:11:58 Connection: keep-alive
Jul 26 21:11:58 ETag: "546a0ab1-264"
Jul 26 21:11:58 Accept-Ranges: bytes
Jul 26 21:11:58
During the timeout, netstat says:
Interestingly, a GCE Kubernetes cluster running 1.2.5 does not exhibit this timeout.
Is this normal or am I doing something wrong?
I should also point out that communication with the service from inside the cluster via k8s DNS works perfectly.
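For example, something like this succeeds every time (just a sketch; the pod name is a placeholder, and it assumes an image with wget):

kubectl exec -it <some-pod> -- wget -qO- http://nginx-alpine.default.svc.cluster.local/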
Any help would be greatly appreciated.
Christopher McKenzie