Cache size is the number of watch events to cache per resource type.
The watch cache holds *all* objects of a resource in memory, plus a rolling window of the last “size” watch events. If you have multiple versions of a resource, you get one watch cache per version.
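Concretely, these knobs are kube-apiserver flags (a sketch; exact defaults and the per-resource size format vary by release):

```sh
# Default number of watch events cached per resource type.
kube-apiserver --default-watch-cache-size=100 ...

# Per-resource overrides, expressed as resource#size pairs.
kube-apiserver --watch-cache-sizes=pods#1000,nodes#100 ...
```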
Disabling the watch cache on 1.7 or earlier can lead to client-side hangs in the etcd client, either from a bad quota pool or from inefficiencies in the etcd watch implementation (both observed on etcd 3.2).
If the cache is enabled, starting a watch will always read from the watch cache, and list requests made with resourceVersion=0 are served from the cache as well (nodes and informers do this by default).
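As a minimal sketch of what opting into cache-served reads looks like from a client (assuming a modern client-go API; the kubeconfig path is illustrative):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from a kubeconfig (path is illustrative).
	config, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/admin.kubeconfig")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// resourceVersion=0 means "any reasonably recent version is acceptable",
	// which lets the apiserver answer from its watch cache instead of doing
	// a quorum read against etcd.
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(),
		metav1.ListOptions{ResourceVersion: "0"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("listed %d pods\n", len(pods.Items))
}
```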
At very high densities, disabling the watch cache on resources with a low number of clients or a low rate of change can substantially reduce CPU and memory use (for OpenShift at 15k namespaces on 200 nodes, with 2x the number of resource types as Kube, disabling the watch cache for the bulk of resources was a 66% reduction in memory use and a 50-60% reduction in master CPU).
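On the command line, that tuning looks roughly like the following (a sketch; in releases of that era, setting a resource's cache size to 0 disabled its watch cache):

```sh
# Turn the watch cache off entirely; all reads and watches go to etcd.
kube-apiserver --watch-cache=false ...

# Or keep the default cache but disable it for rarely watched resources
# by setting their sizes to 0.
kube-apiserver --watch-cache-sizes=secrets#0,configmaps#0 ...
```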
In a perfect world, the watch cache is only necessary when you have high numbers of watchers and listers on a resource type, and it could in theory be enabled dynamically once a threshold was reached so as to reduce load. No plans to do that now.
Note that Kube resources are very memory inefficient: we use lots of small maps for labels, annotations, and resources, and those maps run about 300-400 bytes each. A 15k namespace 1.7 controller manager with 18k pods and 200k secrets was about 15gb of heap in use, with 6.5gb allocated for pods, 3.5gb for secrets, and about 3gb of hashmaps and strings total.
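To make the per-map overhead concrete, here is a small self-contained Go sketch (illustrative only; exact numbers vary by Go version and architecture) that measures the heap cost of many small label-style maps:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	const n = 100000 // number of objects, each carrying a small label map

	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	// Simulate the per-object label maps found on every Kube resource.
	maps := make([]map[string]string, n)
	for i := range maps {
		maps[i] = map[string]string{
			"app":  "demo",
			"tier": "backend",
		}
	}

	runtime.GC()
	runtime.ReadMemStats(&after)

	perMap := float64(after.HeapAlloc-before.HeapAlloc) / float64(n)
	fmt.Printf("~%.0f bytes per two-entry map\n", perMap)

	runtime.KeepAlive(maps)
}
```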