Costs and performance of informers


gmik...@gmail.com

Dec 1, 2017, 3:01:46 PM
to K8s API Machinery SIG
I am interested in using parts of Kubernetes to build things that are not Kubernetes.  The API machinery looks pretty generally useful.  Has anybody studied the costs and performance of informers (e.g., client-go/tools/cache.SharedIndexInformer)?  Things like the CPU, memory, and network bandwidth consumed at various tiers, and the latency from API op to invocation of event handler by informer?
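
For concreteness, here is a minimal sketch of the kind of usage I have in mind (standard client-go; the event handler and the kubeconfig path are just illustrative):

package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// The kubeconfig path is a placeholder.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// SharedIndexInformer obtained through the generated factory.
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	podInformer := factory.Core().V1().Pods().Informer()

	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*corev1.Pod)
			fmt.Println("add", pod.Namespace, pod.Name)
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			// This is where I would measure API-op-to-handler latency.
		},
		DeleteFunc: func(obj interface{}) {},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	cache.WaitForCacheSync(stop, podInformer.HasSynced)
	<-stop // run until killed (sketch)
}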

Thanks,
Mike

Daniel Smith

Dec 1, 2017, 4:21:28 PM
to gmik...@gmail.com, K8s API Machinery SIG
We have measured latency in the past and it was < 1s, but obviously it depends on the throughput.

Please keep us posted with your experiences. At least some people would like to see the API machinery grow into such use cases.


Clayton Coleman

Dec 1, 2017, 7:12:54 PM
to Daniel Smith, gmik...@gmail.com, K8s API Machinery SIG
I don't think I've seen a production issue at scale that was directly related to an informer. That's at 10-100k scales for objects driven by the cache. I can't say I've ever hit a hotspot in the informer either.

It's probably as well tuned as anything in Kube these days - there are probably some issues with massive numbers of registered notifiers, but we've seen up to 10-15 work fine. Remember, though, that slow readers will lead to unbounded memory growth (2-4 pointers plus the old object).

gmik...@gmail.com

Dec 3, 2017, 8:27:43 PM
to K8s API Machinery SIG
Clayton, I am not sure whether you were quantifying a rate or a population.

Do you guys know what I should expect in a system with 10^3 worker nodes and 3 clients on each node, where one of those clients per node (i.e., 10^3 clients in total) watches an object kind that has a population of about 10^5 and a change rate of about 10^1.5/sec, so that the total notification rate coming out of the apiservers is about 10^4.5/sec?

Thanks,
Mike

gmik...@gmail.com

Dec 3, 2017, 8:32:52 PM
to K8s API Machinery SIG

I should add that each client is a traditionally structured controller: the informer's event handler just dumps the object reference into a rate-limiting work queue. A rate-limiting work queue holds a number of references that is bounded by the population size (or maybe double that), right?
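
In other words, roughly this pattern (a sketch; the names are mine, and the queue is the standard client-go workqueue with its default rate limiter):

package controller

import (
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

type controller struct {
	informer cache.SharedIndexInformer
	queue    workqueue.RateLimitingInterface
}

func newController(informer cache.SharedIndexInformer) *controller {
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
				// Keys are de-duplicated while queued, so the queue holds at
				// most one entry per live object.
				queue.Add(key)
			}
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(newObj); err == nil {
				queue.Add(key)
			}
		},
		DeleteFunc: func(obj interface{}) {
			// The deletion-handling variant copes with tombstones.
			if key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
	})

	return &controller{informer: informer, queue: queue}
}

(The de-duplication on Add is why I expect the queue to stay bounded by roughly the population size.)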

Thanks,
Mike 

Clayton Coleman

Dec 3, 2017, 9:37:30 PM
to gmik...@gmail.com, K8s API Machinery SIG
On Sun, Dec 3, 2017 at 8:27 PM, <gmik...@gmail.com> wrote:
Clayton, I am not sure whether you were quantifying a rate or a population.

Do you guys know what I should expect in a system with 10^3 worker nodes and 3 clients on each node, where one of those clients per node (i.e., 10^3 clients in total) watches an object kind that has a population of about 10^5 and a change rate of about 10^1.5/sec, so that the total notification rate coming out of the apiservers is about 10^4.5/sec?

You said "informer". It sounds like you're talking about the whole API chain. The informer won't even show up in profiles at that scale (I've never seen it). If you want to use it for RESTful-*like* things:

1. use protobuf, or use json-iter or a specific extractor (encoding/json is pretty inefficient)
2. keep your object size small
3. make sure the watch cache is on for your resource type (if you are using watch caches)

We've had much higher change rates (100-200/s) on objects distributed to all nodes in production, for larger populations. The problem isn't informers; it's going to be your encoding and decoding chain. If you're using the API machinery for serialization, assume 3-4µs for encode and 5-7µs for decode in protobuf for objects the size of pods, and 1-2µs for conversion to internal and vice versa.
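
For 1, with client-go you can flip the wire format to protobuf on the rest.Config, something like this sketch (built-in types only; CRDs don't have protobuf codecs):

package client

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// protobufClientset returns a clientset that sends protobuf to the apiserver
// and accepts protobuf responses, falling back to JSON.
func protobufClientset(cfg *rest.Config) (*kubernetes.Clientset, error) {
	cfg = rest.CopyConfig(cfg)
	cfg.ContentType = "application/vnd.kubernetes.protobuf"
	cfg.AcceptContentTypes = "application/vnd.kubernetes.protobuf,application/json"
	return kubernetes.NewForConfig(cfg)
}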

The watch cache is probably the biggest win at 1e3 clients and above. If you reuse the infrastructure but can't use the watch cache for some reason, you'll probably have CPU issues on the masters. If you are memory-bound on the master for some reason and can't use the watch cache, you may need to come up with another mechanism.
 


mspr...@us.ibm.com

Dec 12, 2017, 11:14:26 AM
to K8s API Machinery SIG
Where can I learn about watch caches? I see that https://kubernetes.io/docs/reference/generated/kube-apiserver/ gives the basic syntax of some command-line arguments that control this, but clearly there is more to understand.

Thanks,
Mike

Daniel Smith

Dec 12, 2017, 12:51:43 PM
to Mike Spreitzer, K8s API Machinery SIG
"Watch caches" refer to a cache in kube-apiserver. See --watch-cache-sizes documentation: https://kubernetes.io/docs/reference/generated/kube-apiserver/

Frustratingly, the default is not documented.


mspr...@us.ibm.com

Dec 12, 2017, 4:37:56 PM
to K8s API Machinery SIG
That generated documentation neither explains the concept nor gives guidance on how to choose good settings. Those would still be lacking even if the formulae that produce the defaults were shown. I do not know what "100" means, nor how to decide whether that is a good number.

Clayton Coleman

Dec 14, 2017, 10:27:36 PM
to mspr...@us.ibm.com, K8s API Machinery SIG
Cache size is the number of watch events to cache.

The watch cache holds *all* resource objects in memory, plus "size" versions. If you have multiple versions of a resource, you will get one watch cache per version.

Disabling the watch cache on 1.7 or before can lead to etcd client-side hangs, due to a bad quota pool or to inefficiencies in etcd watch (on etcd 3.2).

Starting a watch will always read from the watch cache if it is enabled. A request with resourceVersion=0 will likewise be served from the cache when it is enabled (nodes and informers do this by default).
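
For example, a list like this is eligible to be answered from the watch cache rather than going to etcd (sketch; the clientset and namespace are illustrative):

package client

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func listFromCache(client kubernetes.Interface) error {
	// ResourceVersion "0" means "any version is fine", which lets the
	// apiserver answer from its watch cache instead of doing a quorum
	// read against etcd.
	pods, err := client.CoreV1().Pods("default").List(context.TODO(),
		metav1.ListOptions{ResourceVersion: "0"})
	if err != nil {
		return err
	}
	_ = pods.Items
	return nil
}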

At very high densities, disabling the watch cache on resources with a low number of clients or a low rate of change can substantially reduce CPU and memory use (for OpenShift at 15k namespaces on 200 nodes, with 2x the number of resource types as Kube, disabling the watch cache for the bulk of resources was a 66% reduction in memory use and a 50-60% reduction in master CPU).

In a perfect world, the watch cache is only necessary when you have high numbers of watchers and listers on a resource type, and could in theory be dynamically enabled when a threshold was reached so as to reduce load.  No plans to do that now.

Note that Kube resources are very memory-inefficient (we use lots of small maps for labels and annotations and resources, which are about 300-400 bytes each). A 1.7 controller manager with 15k namespaces, 18k pods, and 200k secrets was about 15 GB of heap in use, with 6.5 GB allocated for pods, 3.5 GB for secrets, and about 3 GB of hashmaps and strings in total.
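
If you want to eyeball that per-map overhead yourself, a rough sketch like this works (the exact number varies by Go version and architecture):

package main

import (
	"fmt"
	"runtime"
)

func heapAlloc() uint64 {
	var m runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

func main() {
	const n = 100000
	before := heapAlloc()

	// Simulate n objects that each carry a small labels map.
	labels := make([]map[string]string, n)
	for i := range labels {
		labels[i] = map[string]string{
			"app":               "example",
			"pod-template-hash": fmt.Sprint(i),
		}
	}

	after := heapAlloc()
	fmt.Printf("~%d bytes per 2-entry map\n", (after-before)/n)
	runtime.KeepAlive(labels)
}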