k8s networking / cluster size limits confusion


David Rosenstrauch

Aug 11, 2017, 10:36:40 AM
to Kubernetes users list
According to the docs, k8s can support systems of up to 150,000 pods.
(See https://kubernetes.io/docs/admin/cluster-large/) But given k8s'
networking model, I'm a bit puzzled about how that would work.

It seems like a typical setup is to assign a service-cluster-ip-range
with a /16 CIDR. (Say 10.254.0.0/16) However, I notice that my cluster
assigns a full /24 IP range to each pod that it creates. (E.g., pod1
gets 10.254.1.*, pod2 gets 10.254.2.*, etc.) Given this networking
setup, it would seem that Kubernetes would only be capable of launching
a maximum of 256 pods.

Am I misunderstanding how k8s works in this regard? Or does the
networking need to be configured differently to support more than
256 pods?

Thanks,

DR

Ben Kochie

Aug 11, 2017, 10:41:49 AM
to kubernet...@googlegroups.com
Kubernetes will give a /24 to each node, not to each pod. Each node will give one IP out of that /24 to each pod it runs. This default means you can have ~253 pods per node. This can of course be adjusted depending on the size of your pods and nodes.

This means that you can fully utilize the /16 for pods (minus the per-node network, gateway, and broadcast addresses).
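
Back-of-the-envelope, with those defaults (rough numbers):

  /16 pod range carved into /24s   ->  2^(24-16) = 256 node subnets
  ~253 usable pod IPs per node     ->  256 * 253 ~= 64,700 pod IPs in total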




David Rosenstrauch

Aug 11, 2017, 10:47:14 AM
to kubernet...@googlegroups.com
Ah. That makes a bit more sense.

Thanks!

DR


Matthias Rampke

Aug 11, 2017, 10:54:54 AM
to kubernet...@googlegroups.com
And yes, with the defaults you are limited to 256 nodes per cluster. If you're running that large a cluster, I suppose you can be expected to twiddle some flags :)
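
(For reference, the knobs I have in mind are roughly the following; flag names are from memory, so double-check them against your version's docs:)

  # kube-controller-manager: overall pod IP space, and the size of the
  # per-node subnet carved out of it (default /24)
  --cluster-cidr=10.240.0.0/12
  --node-cidr-mask-size=24

  # kubelet: cap on pods per node (default 110)
  --max-pods=110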

David Rosenstrauch

Aug 11, 2017, 11:05:17 AM
to kubernet...@googlegroups.com
Actually, that raises another question. The docs also specify that k8s
can support up to 5,000 nodes, but I'm not clear on how the networking
can support that.

So let's go back to that service-cluster-ip-range with the /16 CIDR.
That only supports a maximum of 256 nodes.

Now the maximum size for the service-cluster-ip-range appears to be /12
- e.g., --service-cluster-ip-range=10.240.0.0/12. (Beyond that you get a
"Specified service-cluster-ip-range is too large" error.) So the top 12
bits of the address are fixed, and with each node taking the lower 8
bits for the IP addresses of its pods, that leaves 12 remaining bits'
worth of unique per-node ranges. 12 bits = 4096 possible node subnets.
How then could anyone scale up to 5000 nodes?

DR

Matthias Rampke

Aug 11, 2017, 11:26:32 AM
to kubernet...@googlegroups.com
Oh, hold on. The service-cluster-ip-range is not for pod IPs at all. It's for the ClusterIPs of Services, so you can have up to ~64k Services in a cluster at the default setting. The range for pods is set by the --cluster-cidr flag on kube-controller-manager.
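
(Concretely, assuming a setup like the one described in this thread, the two ranges are configured on different components; the values here are just examples:)

  # kube-apiserver: virtual ClusterIPs handed out to Services
  --service-cluster-ip-range=10.128.0.0/16

  # kube-controller-manager: real pod IP space, carved into per-node subnets
  --cluster-cidr=10.240.0.0/12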

David Rosenstrauch

Aug 14, 2017, 12:03:43 PM
to kubernet...@googlegroups.com
Thanks for the feedback. I see I didn't quite understand k8s networking
properly (and had my cluster misconfigured as a result).

I now have it configured as:

--cluster-cidr=10.240.0.0/12
--service-cluster-ip-range=10.128.0.0/16

And I'm deducing that the /12 in the cluster-cidr is what would then
allow this cluster to go beyond 256 nodes.


One other point about the networking that I'm a little confused about
and would like to clarify: it seems that IPs in the cluster-cidr range
(i.e., service endpoints) are reachable from any host that is on the
flannel network, while IPs in the service-cluster-ip-range (i.e.,
services) are only reachable from the worker nodes in the cluster.

So, for example, I have a k8s setup with 4 machines: a master, 2 worker
nodes, and a "driver" machine. All 4 machines are on the flannel
network. I have an nginx service defined like so:

$ kubectl get svc nginx; kubectl get ep nginx
NAME      CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
nginx     10.128.105.78   <nodes>       80:30207/TCP   2d

NAME      ENDPOINTS                       AGE
nginx     10.240.14.5:80,10.240.27.2:80   2d


Now "curl 10.128.105.78" only succeeds on the 2 worker node machines,
while "curl 10.240.14.5" succeeds on all 4.

I'm guessing this is expected / makes sense, since 10.240.0.0/12
addresses are accessible to any machine on the flannel network, whereas
10.128.0.0/16 addresses can only be reached via iptables rules - i.e.,
only accessible on machines running kube-proxy, aka the worker nodes.
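
As a rough sanity check (assuming kube-proxy is in its default iptables mode; the chain name below is from memory), the ClusterIP only shows up in the NAT rules on machines that run kube-proxy:

  # on a worker node - the Service's ClusterIP appears in kube-proxy's rules
  $ sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.128.105.78

  # on the master / "driver" machine there's no kube-proxy, so no such rule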

Again, I guess this makes sense in retrospect. But I was a bit
surprised when I first saw this, as I had thought that services'
cluster IPs would be reachable from all machines. (Or at least from
the master too.)

Perhaps you could confirm that I'm understanding this all correctly.
(And have my cluster configured correctly?)

Thanks,

DR


Tim Hockin

Aug 14, 2017, 12:14:16 PM
to Kubernetes user discussion and Q&A
On Mon, Aug 14, 2017 at 9:03 AM, David Rosenstrauch <dar...@darose.net> wrote:
> Thanks for the feedback. I see I didn't quite understand k8s networking
> properly (and had my cluster misconfigured as a result).
>
> I now have it configured as:
>
> --cluster-cidr=10.240.0.0/12

/12 gives you room for ~4000 nodes at /24 each. (24 - 12 = 12, 2^12 = 4096)

> --service-cluster-ip-range=10.128.0.0/16

room for 65536 Services (2^16)

> And I'm deducing that the /12 in the cluster-cidr is what would then allow
> this cluster to go beyond 256 nodes.
>
>
> One other point about the networking I'm a little confused about that I'd
> like to clarify: it seems that IP's in the cluster-cidr range (i.e.,
> service endpoints) are reachable from any host that is on the flannel
> network, while IP's in the service-cluster-ip-range (i.e., services) are
> only reachable from the worker nodes in the cluster.

Generally correct.

> So, for example, I have a k8s setup with 4 machines: a master, 2 worker
> nodes, and a "driver" machine. All 4 machines are on the flannel network.
> I have a nginx service defined like so:
>
> $ kubectl get svc nginx; kubectl get ep nginx
> NAME      CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
> nginx     10.128.105.78   <nodes>       80:30207/TCP   2d
> NAME      ENDPOINTS                       AGE
> nginx     10.240.14.5:80,10.240.27.2:80   2d
>
>
> Now "curl 10.128.105.78" only succeeds on the 2 worker node machines, while
> "curl 10.240.14.5" succeeds on all 4.
>
> I'm guessing this is expected / makes sense, since 10.240.0.0/12 addresses
> are accessible to any machine on the flannel network, whereas 10.128.0.0/16
> addresses can only be reached via iptables rules - i.e., only accessible on
> machines running kube-proxy, aka the worker nodes.

Right. To get to Services you need to either route the Service range
to your VMs (and use them as gateways) or expose them via some other
form of traffic director (e.g. a load-balancer).

David Rosenstrauch

Aug 14, 2017, 1:56:13 PM
to kubernet...@googlegroups.com
On 2017-08-14 12:13 pm, 'Tim Hockin' via Kubernetes user discussion and
Q&A wrote:
> Right. To get to Services you need to either route the Service range
> to your VMs (and use them as gateways) or expose them via some other
> form of traffic director (e.g. a load-balancer).

Can you clarify what you mean by "route the Service range to your VMs"?
I'm familiar with the load balancer approach you mentioned - i.e., to
get outside machines to access your service you could set up a load
balancer that points to the NodePort of each machine that's running the
service. How would it work to route the service range?

Thanks,

DR

Tim Hockin

Aug 14, 2017, 2:37:00 PM
to Kubernetes user discussion and Q&A
Unfortunately, I cannot easily clarify; it depends on your
infrastructure. If you have an L2 domain, you should be able to set up
static routes on each machine or use proxy ARP. If you have L3
infrastructure, you can maybe use BGP or something else, or statically
manipulate the routing tables.
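
For instance (just a sketch; 192.168.1.11 stands in for one of your worker nodes' own IPs), on a machine outside the cluster something like:

  # send the whole Service range via a node that runs kube-proxy
  $ sudo ip route add 10.128.0.0/16 via 192.168.1.11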

E.g. in GCP you can establish a Route resource pointing to a VM, for
the service range. Set up multiple routes for ECMP-ish behavior and
high(er) availability. But since it is static you need to manage it
manually.
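
Something along these lines (names are placeholders; check the current gcloud docs for the exact flags):

  $ gcloud compute routes create k8s-services-via-worker-1 \
      --destination-range=10.128.0.0/16 \
      --next-hop-instance=worker-1 \
      --next-hop-instance-zone=us-central1-a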