ClusterIP service not distributing requests evenly among pods in Google Kubernetes Engine

5,594 views

cristian...@gmail.com

Apr 13, 2018, 9:41:55 AM
to Kubernetes user discussion and Q&A

I have a ClusterIP service in my cluster with 4 pods behind it. I noticed that requests to the service are not evenly distributed among pods. After further reading I learned that the kube-proxy pod is responsible for setting up the iptables rules that forward requests to the pods. After logging into the kube-proxy pod and listing the nat table rules, this is what I got:

Chain KUBE-SVC-4F4JXO37LX4IKRUC (1 references)
target prot opt source destination
KUBE-SEP-6X4IVU3LDAAZJUPD all -- 0.0.0.0/0 0.0.0.0/0 /* default/btm-calculator: */ statistic mode random probability 0.25000000000
KUBE-SEP-TXRPWWIIUWW3MNFH all -- 0.0.0.0/0 0.0.0.0/0 /* default/btm-calculator: */ statistic mode random probability 0.33332999982
KUBE-SEP-HW6SF2LJM4S7X5ZN all -- 0.0.0.0/0 0.0.0.0/0 /* default/btm-calculator: */ statistic mode random probability 0.50000000000
KUBE-SEP-TTJKD52QZSH2OH4O all -- 0.0.0.0/0 0.0.0.0/0 /* default/btm-calculator: */

The comments suggest that the load is balanced according to iptables' "statistic mode random" matching, with what looks like an uneven probability distribution. Is this how it's supposed to work? Every piece of documentation I have read about load balancing by a ClusterIP service says it should be round robin, which is clearly not what is happening here.
Is there a way to make a ClusterIP service perform round-robin load balancing?

Thank you,
Cristian

Rodrigo Campos

Apr 13, 2018, 10:17:01 AM
to kubernet...@googlegroups.com
Why do you say they are obviously not evenly distributed? How are pods assigned to nodes?

And also, how exactly did you notice that they are not "evenly distributed"?
--
You received this message because you are subscribed to the Google Groups "Kubernetes user discussion and Q&A" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-use...@googlegroups.com.
To post to this group, send email to kubernet...@googlegroups.com.
Visit this group at https://groups.google.com/group/kubernetes-users.
For more options, visit https://groups.google.com/d/optout.

Cristian Cocheci

Apr 13, 2018, 10:27:15 AM
to kubernet...@googlegroups.com

I have only 1 node, with multiple processors and a lot of memory. I did this on purpose, to eliminate the "how are the pods distributed on nodes" variable.
I tail the application logs of the 4 pods at the same time; that's how I noticed the uneven distribution. I also include the pod's hostname in each response and print it in the pod that issues the requests. The troubling issue is that if I send a lot of requests in fast succession (in a loop), they ALL go to the same pod; there is no distribution at all.


Sunil Bhai

Apr 13, 2018, 10:30:15 AM
to kubernet...@googlegroups.com, cristian...@gmail.com


Sunil Bhai

Apr 13, 2018, 10:32:15 AM
to Kubernetes user discussion and Q&A

Cristian Cocheci

Apr 13, 2018, 10:34:19 AM
to Sunil Bhai, kubernet...@googlegroups.com

Thank you Sunil, but the LoadBalancer type is used for exposing the service externally, which I don't need. All I need is my service exposed inside the cluster.



Sunil Bhai

Apr 13, 2018, 10:37:33 AM
to Cristian Cocheci, kubernet...@googlegroups.com

Check this once:

http://clusterfrak.com/docker/labs/k8_clustering/

I will send out… load balancing within K8 (master & worker nodes).

Sent from Mail for Windows 10

Tim Hockin

Apr 13, 2018, 10:39:38 AM
to kubernet...@googlegroups.com, Sunil Bhai
The load is random, but the distribution should be approximately equal for non-trivial loads.  E.g. when we run tests for 1000 requests you can see it is close to equal.

How unequal is it?  Are you using session affinity?
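For what it's worth, the probabilities in that chain are conditional, not final shares: each rule is tried only if the rules before it did not match, so 0.25, then 1/3 of the remaining 75%, then 1/2 of the remaining 50%, then the final catch-all each work out to about 25%. A quick simulation (a sketch; the probabilities are copied from the chain in the first message) shows this:

```python
import random

# Conditional match probabilities from the KUBE-SVC chain: each rule fires
# with the given probability; otherwise evaluation falls through to the
# next rule. The last rule has no "statistic" clause and always matches.
RULE_PROBS = [0.25, 1 / 3, 0.5, 1.0]

def pick_endpoint() -> int:
    for i, p in enumerate(RULE_PROBS):
        if random.random() < p:
            return i
    return len(RULE_PROBS) - 1  # unreachable; the final rule always matches

N = 100_000
counts = [0] * len(RULE_PROBS)
for _ in range(N):
    counts[pick_endpoint()] += 1

for i, c in enumerate(counts):
    print(f"endpoint {i}: {c / N:.3f}")  # each share comes out near 0.25
```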


cristian...@gmail.com

Apr 13, 2018, 10:51:39 AM
to Kubernetes user discussion and Q&A
I am not using session affinity, and I am not sending a statistically significant number of requests. In my particular use case I only need to send 100 requests or fewer. I also have the problem I mentioned above: if I send 20 requests in a loop, they ALL go to the same pod. If I wait a while and send another group of 20 requests, they MIGHT go to a different pod, but they all go to the same pod (even if it is different from the first one). This is a big issue for me, since my requests are actually heavy calculations, and I was hoping to use this mechanism as a way of parallelizing my computations.

Sunil Bhai

Apr 13, 2018, 10:51:47 AM
to Tim Hockin, kubernet...@googlegroups.com

Quick info: I hope you have an odd number of nodes (1, 3, 5, etc.); this is the best practice.

Tim Hockin

Apr 13, 2018, 10:59:46 AM
to kubernet...@googlegroups.com
Without a statistically significant load, this is random. What you are seeing satisfies that definition.

The real reason is that round-robin is a lie. Each node in a cluster will do its own RR for any number of clients.
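A minimal sketch of that point (pod names are made up): even if every node kept a perfect per-node round-robin counter, the counters are independent, so the interleaved, cluster-wide order is not round robin:

```python
from itertools import cycle

ENDPOINTS = ["pod-a", "pod-b", "pod-c", "pod-d"]

# Each node's proxy keeps its own independent round-robin position;
# there is no cluster-wide shared counter.
node1 = cycle(ENDPOINTS)
node2 = cycle(ENDPOINTS)

# Eight requests that happen to alternate between the two nodes:
merged = [next(node1) if i % 2 == 0 else next(node2) for i in range(8)]
print(merged)
# → ['pod-a', 'pod-a', 'pod-b', 'pod-b', 'pod-c', 'pod-c', 'pod-d', 'pod-d']
# Each node did perfect round robin, yet the merged order is not RR.
```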

cristian...@gmail.com

Apr 13, 2018, 11:15:43 AM
to Kubernetes user discussion and Q&A

Thank you Tim.

Is there no way to set it to true RR? If not, and there is no other suggestion, I will have to do my own balancing.

cristian...@gmail.com

Apr 13, 2018, 11:40:19 AM
to Kubernetes user discussion and Q&A

Coincidentally (or not), while searching for answers about this, yesterday I watched your "Life of a Packet" presentation from Google Cloud Next '17. :-)

cristian...@gmail.com

Apr 13, 2018, 1:17:04 PM
to Kubernetes user discussion and Q&A

OK, I changed my pods to respond almost immediately so that I can test with a statistically significant number of requests (10,000), and I am still observing the same behavior: only 1 pod receives all 10k requests. Can anyone explain why this happens? I am including the service and deployment manifests below:

cpp-btm-calculator-svc.yaml:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: cpp-btm-calculator
  name: cpp-btm-calculator
spec:
  ports:
  - port: 3006
    protocol: TCP
    targetPort: 3006
  selector:
    app: cpp-btm-calculator
  sessionAffinity: None
  type: ClusterIP


cpp-btm-calculator-depl.yaml:


apiVersion: apps/v1beta1
kind: Deployment
metadata:
  labels:
    app: cpp-btm-calculator
  name: cpp-btm-calculator-depl
spec:
  replicas: 4
  selector:
    matchLabels:
      app: cpp-btm-calculator
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: cpp-btm-calculator
    spec:
      containers:
      - image: us.gcr.io/my-project/cpp-btm-calculator:v1.1.2
        name: cpp-btm-calculator
        imagePullPolicy: IfNotPresent
        resources: {}
        env:
        - name: PORT
          value: "3006"
        - name: LOG_CONFIG
          value: cpp-btm-calculator-logging.config

Rodrigo Campos

Apr 13, 2018, 1:19:32 PM
to kubernet...@googlegroups.com
And how are you running the requests? Against which IP and which port?

cristian...@gmail.com

Apr 13, 2018, 1:23:22 PM
to Kubernetes user discussion and Q&A

I am running them against the service's cluster IP address (through its name, i.e. "btm-calculator", which resolves to the cluster IP), on port 3006.

Tim Hockin

Apr 13, 2018, 2:09:48 PM
to kubernet...@googlegroups.com
What are you using for a client?  Is it by chance http and written in go?  Some client libraries, including Go's http, aggressively reuse connections.  

If you try with something like exec netcat, I bet you see different results.

BTW, one might argue that if you depend on RR, you will eventually be broken.  You would have to do that client side or in your own LB.
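A sketch of why connection reuse causes this (pod names are hypothetical): kube-proxy's iptables DNAT picks a backend per connection, not per request, so everything sent on a reused connection lands on one pod:

```python
import random

ENDPOINTS = ["pod-a", "pod-b", "pod-c", "pod-d"]

def new_connection() -> str:
    # The DNAT rule chooses a backend once, when the TCP connection is
    # established; every request sent on that connection hits the same pod.
    return random.choice(ENDPOINTS)

# A client that reuses one connection (Go's http.Client, or an HTTP/2-based
# gRPC channel) sends every request to a single pod:
conn = new_connection()
requests_on_reused_conn = [conn for _ in range(20)]
print(set(requests_on_reused_conn))  # one pod only

# A client that dials a fresh connection per request spreads the load:
fresh = [new_connection() for _ in range(10_000)]
shares = {e: fresh.count(e) / len(fresh) for e in ENDPOINTS}
print(shares)  # each share comes out close to 0.25
```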

cristian...@gmail.com

Apr 13, 2018, 2:18:29 PM
to Kubernetes user discussion and Q&A

I am using gRPC on both sides, both in C++. The client sends asynchronous requests. A new channel is created (and destroyed) with every request.

Thank you!

Daniel Smith

Apr 13, 2018, 2:25:38 PM
to kubernet...@googlegroups.com
I haven't checked, but I'd bet that the C++ gRPC library uses HTTP2, which seems to explicitly encourage connection reuse, which leads to this behavior. If you search around you may be able to find some options.

cristian...@gmail.com

Apr 13, 2018, 2:29:38 PM
to Kubernetes user discussion and Q&A

Thanks Daniel!