iptables proxy performance numbers

Dan Williams

Aug 26, 2016, 6:48:45 PM
to kubernetes-sig-network
Tim,

Here are some concrete (and totally unscientific) numbers for 500
services with *zero endpoints* for only the 'iptables-restore' exec
time.
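
For reference, a rough shell-level stand-in for that measurement (bash, run
as root; it times only a restore of the nat table, where the service rules
live, and is not the exact in-proxy exec timing):

# grab the current nat table once, then time only the restore of it
RULES=$(iptables-save -t nat)
time iptables-restore -T nat <<< "$RULES"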

Kubernetes run with hack/local-up-cluster.sh.
OpenShift run as docker-in-docker (eg, master and nodes are run in
docker containers on my laptop).

Machine is a Thinkpad X1 Carbon with an Intel i7-5600U and SSD storage,
running Fedora 23 with a 4.6.6 kernel and iproute-4.1.1.

                restore-time    #rules/service
-------------------------------------------------------------
kube-proxy:     250ms - 350ms       3
openshift:      600ms - 800ms       8

The #rules/service difference is due to the OpenShift ingress
controller automatically adding LoadBalancer.IP and ExternalIPs to the
Service.  This causes the standard kube iptables proxier to add three
"external IP" rules and two "loadbalancer IP" rules, which make up the
difference.

Given the difference in # rules I think the openshift proxy times are
reasonable (250 * 3 = 750 which is within 600 - 800ms).  But the times
kinda suck for a lot of services...

Is this the info you were looking for?  Should I open an issue to track
this kind of thing?  What kind of numbers do you see and what kind of
machines are those numbers seen on?

One issue is that kube-proxy *and* kubelet+kubenet (via the HostPort
stuff) will both be running iptables-restore.  Even if you're not
running with kubenet, docker itself will be doing iptables-restore to
implement HostPorts, so you're running into iptables contention no
matter what you do.

Dan

Tim Hockin

Aug 26, 2016, 7:26:51 PM
to Dan Williams, kubernetes-sig-network
On Fri, Aug 26, 2016 at 3:48 PM, Dan Williams <dc...@redhat.com> wrote:
> Tim,
>
> Here are some concrete (and totally unscientific) numbers for 500
> services with *zero endpoints* for only the 'iptables-restore' exec
> time.

It would be interesting to graph this from 0 to 500 and see if we can
approximate a scaling factor.
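
A rough sketch of how one might collect that curve (hedged: the service
names, batch size, and sleep are arbitrary, and the timing needs root on
the node running kube-proxy):

# create 500 zero-endpoint services in batches of 100 and time a
# save|restore of the nat table after each batch
for i in $(seq 1 500); do
  kubectl create -f - <<EOF >/dev/null
apiVersion: v1
kind: Service
metadata:
  name: bench-svc-$i
spec:
  ports:
  - port: 80
EOF
  if [ $((i % 100)) -eq 0 ]; then
    sleep 15   # hand-wavy: give kube-proxy a chance to program the new rules
    echo -n "$i services: "
    { time sh -c 'iptables-save -t nat | iptables-restore -T nat'; } 2>&1 | grep real
  fi
done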

> Kubernetes run with hack/local-up-cluster.sh.
> OpenShift run as docker-in-docker (eg, master and nodes are run in
> docker containers on my laptop).
>
> Machine is a Thinkpad X1 Carbon with an Intel i7-5600U and SSD storage,
> running Fedora 23 with a 4.6.6 kernel and iproute-4.1.1.
>
>                 restore-time    #rules/service
> -------------------------------------------------------------
> kube-proxy:     250ms - 350ms       3
> openshift:      600ms - 800ms       8
>
> The #rules/service difference is due to the OpenShift ingress
> controller automatically adding LoadBalancer.IP and ExternalIPs to the
> Service. This causes the standard kube iptables proxier to add three
> "external IP" rules and two "loadbalancer IP" rules, which make up the
> difference.
>
> Given the difference in # rules I think the openshift proxy times are
> reasonable (250 * 3 = 750 which is within 600 - 800ms). But the times
> kinda suck for a lot of services...

First, this is a lot less than you made me think. I thought you were
talking multiple seconds. It is a sadly large number, but is there
anything to be done? Can we pre-parse something, short of batching
the operations?

> Is this the info you were looking for? Should I open an issue to track
> this kind of thing? What kind of numbers do you see and what kind of
> machines are those numbers seen on?

Worthwhile questions

> One issue is that kube-proxy *and* kubelet+kubenet (via the HostPort
> stuff) will both be running iptables-restore. Even if you're not
> running with kubenet, docker itself will be doing iptables-restore to
> implement HostPorts, so you're running into iptables contention no
> matter what you do.
>
> Dan

Tim Hockin

Aug 26, 2016, 8:06:46 PM
to Dan Williams, kubernetes-sig-network
I ran up 500 services with 0 endpoints on my cluster here, and I saw
it peak at around 90 ms to save/restore 2100 iptables lines.

My test was:

iptables-save | wc -l
iptables-save | iptables-restore

I don't know if that last part is triggering some optimization, maybe?
I accidentally deleted the data so I am re-running to 1000 services
now...
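
One hedged way to check whether the pipe is optimizing anything would be
to take the pipe out and restore from a file instead (the path is
arbitrary):

# save once to a file, then time only the restore
iptables-save > /tmp/iptables.rules
time iptables-restore < /tmp/iptables.rules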

This is a 2-core GCE VM on kernel 3.16

Prashanth B

Aug 26, 2016, 8:38:59 PM
to Tim Hockin, Dan Williams, kubernetes-sig-network
Fwiw, the times I saw it take longer were when we were limiting kube-proxy to 2 CPU shares because of a bug.


Dan Williams

Aug 29, 2016, 11:24:31 AM
to Tim Hockin, kubernetes-sig-network
I was previously timing the entire syncProxyRules() function, which
*does* take up to 1.7 seconds for OpenShift on my machine.  That's done
under the proxy mutex, so it's obviously blocking other proxy sync
operations at the same time, but the proxy should eventually arrive at
the right ruleset.  It's a bit worrisome that it can take the Go code
almost 1 second to generate the ruleset, though that does include the
iptables-save which will block on any other iptables operations.  Go's
scheduling model isn't particularly deterministic.

One reason it can take multiple seconds is that the proxy isn't the only
thing touching iptables.  If HostPorts are enabled, then either docker
or kubenet will also run iptables-restore for its DNAT/SNAT rules, and
that instance will race with and block kube-proxy.

Anyway, since I was trying to figure out where the bulk of the time was
spent, I decided to time as small a work unit as possible, and these are
the numbers for that.  Still not great, especially since kube-proxy seems
very aggressive about resyncing the rules (a 5s update interval, I think?).
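
If the resync frequency itself is a concern, kube-proxy does have a knob
for the periodic full resync; a hedged example invocation (the 30s value
is arbitrary, and flag availability depends on the kube-proxy version):

# --iptables-sync-period sets the period of the full iptables resync
kube-proxy --proxy-mode=iptables --iptables-sync-period=30s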

Dan

RC

Aug 29, 2016, 12:04:03 PM
to Dan Williams, Tim Hockin, kubernetes-sig-network
On Mon, Aug 29, 2016 at 11:24 AM, Dan Williams <dc...@redhat.com> wrote:
One reason it can take multiple seconds is that the proxy isn't the only
thing touching iptables.  If HostPorts are enabled, then either docker
or kubenet will also run iptables-restore for its DNAT/SNAT rules, and
that instance will race with and block kube-proxy.

I think most of the delay might actually be spent waiting, without even trying to grab the iptables lock. As I discovered when I fixed the kubelet to start using the lock, `-w2` means "try once per second, for up to two seconds", and we should really consider using `-W`, too. See my comment and links:



--
Rudi
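
To make those flags concrete, a hedged example (the timeout and poll
interval are arbitrary, and `-W` needs a newer iptables than some distros
ship):

# -w N : wait up to N seconds for the xtables lock instead of failing right away
# -W M : while waiting, poll every M microseconds rather than once per second
iptables -w 5 -W 100000 -t nat -S >/dev/null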

Tim Hockin

Aug 29, 2016, 3:45:47 PM
to RC, Dan Williams, kubernetes-sig-network
I'm open to any and all options to make this better. That said, I
simply cannot reproduce iptables itself being absurdly slow. With
1000 Services, I get about 4000 iptables rules and I can't make
`iptables-save | iptables-restore` take more than 160 ms (user+sys).

That said, it's clear that the proxy could batch a bunch of requests:
adding a single endpoint to all 1000 of those Services seems to call
iptables once for each individual Service, and is still running 30
seconds later.
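
To make the batching point concrete, a hedged sketch of the contrast (the
chain name and addresses are made up; run as root):

# create the example chain if it doesn't exist yet
iptables -w -t nat -N KUBE-EXAMPLE 2>/dev/null || true

# unbatched: one fork and one xtables lock acquisition per Service
for ip in 10.0.0.1 10.0.0.2 10.0.0.3; do
  iptables -w -t nat -A KUBE-EXAMPLE -d "$ip/32" -p tcp -m tcp --dport 80 -j ACCEPT
done

# batched: the same rules submitted as a single iptables-restore transaction
iptables-restore --noflush <<'EOF'
*nat
:KUBE-EXAMPLE - [0:0]
-A KUBE-EXAMPLE -d 10.0.0.1/32 -p tcp -m tcp --dport 80 -j ACCEPT
-A KUBE-EXAMPLE -d 10.0.0.2/32 -p tcp -m tcp --dport 80 -j ACCEPT
-A KUBE-EXAMPLE -d 10.0.0.3/32 -p tcp -m tcp --dport 80 -j ACCEPT
COMMIT
EOF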

Alex Pollitt

Aug 30, 2016, 5:05:35 AM
to Tim Hockin, RC, Dan Williams, kubernetes-sig-network
Calico makes extensive use of iptables and we've not encountered save/restore performing absurdly slow either, if that observation is worth anything.

Dan Williams

Aug 30, 2016, 1:41:03 PM
to Tim Hockin, RC, kubernetes-sig-network
On Mon, 2016-08-29 at 12:45 -0700, Tim Hockin wrote:
> I'm open to any and all options to make this better.  That said, I
> simply can not reproduce iptables itself being absurdly slow.  With
> 1000 Services, I get about 4000 iptables rules and I can't make
> `iptables-save | iptables-restore` take more than 160 ms (user+sys).

http://people.redhat.com/dcbw/kube-proxy.png

i7-5600U @ 2.6GHz: 700ms for 1000 services (2 cores / 4 threads, kernel 4.6.6)
i7-4790 @ 3.6GHz:  270ms for 1000 services (4 cores / 8 threads, kernel 4.6.7)

kube git master from Monday, built with Go 1.7, using hack/local-up-cluster.sh

Takeaways:

*** CPU speed makes a huge difference.  Core count doesn't seem to; I
disabled 2 cores (and their threads) on the 4790 and that didn't change
the results much.  I don't think disk has a big impact either: the 5600U
has a pretty fast Samsung SSD while the 4790 has 7200RPM spinning media,
plus iptables-restore barely touches the disk since its input comes from
stdin.

*** The 5600U (my laptop) was doing other stuff (like rendering this
chart), which varies the results quite a bit, unlike the clean line for
the 4790, which was doing nothing but kube.

*** firewalld running in the background makes no difference.

I'm really curious: what distribution, kernel, and iptables versions is
everyone using?

Dan

Tim Hockin

Sep 1, 2016, 3:28:48 AM
to Dan Williams, RC, kubernetes-sig-network
Debian Wheezy (pretty old).

Dan Williams

Sep 7, 2016, 5:32:06 PM
to Alex Pollitt, Tim Hockin, RC, kubernetes-sig-network
On Tue, 2016-08-30 at 09:05 +0000, Alex Pollitt wrote:
> Calico makes extensive use of iptables and we've not encountered
> save/restore performing absurdly slow either, if that observation is
> worth anything.

What kernel versions are you running on, and what kind of times have
you seen for iptables-restore when run by the proxy?  What kind of
machines are they?

Dan

Alex Pollitt

Sep 8, 2016, 11:42:46 AM
to Dan Williams, Tim Hockin, RC, kubernetes-sig-network, Shaun Crampton
IIRC it is typically something like 100ms to do a restore of ~10k rules.
We generally support back to CentOS 6.5 (I can't recall what kernel version that is).
The key thing with using iptables-restore is that you need to be very careful not to overwrite other iptables users' rules, so good structuring and naming of the chains you are updating, and using the non-flush mode, is recommended.
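
A hedged sketch of that pattern (the chain name and port are made up, and
whether a chain declared in the input is emptied first under --noflush
varies a bit by iptables version):

# one-time setup: a dedicated chain plus a single jump from the built-in chain
iptables -w -N MYAPP-INPUT 2>/dev/null || true
iptables -w -C INPUT -j MYAPP-INPUT 2>/dev/null || iptables -w -I INPUT -j MYAPP-INPUT

# ongoing updates: only chains named in the input are touched; --noflush
# leaves the rest of the filter table alone
iptables-restore --noflush <<'EOF'
*filter
:MYAPP-INPUT - [0:0]
-A MYAPP-INPUT -p tcp -m tcp --dport 8080 -j ACCEPT
COMMIT
EOF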