What to do about High memory usage in ListAndWatch->List?


Beni Cherniavsky-Paskin

Dec 7, 2020, 5:51:50 AM
to Operator Framework, Irit Goihman, Nimrod Shneor
Hi. We have an app using controller-runtime, watching several thousand custom resource objects in several collections. (We're aware this stresses etcd; these are deliberate scale tests.)

We've seen it run out of memory, and heap profiles taken just before that show most of the memory is consumed by ListAndWatch->List->ReadAll allocating huge []byte buffers (the biggest allocations are 128MB and 256MB): https://gist.github.com/cben/3f98f73c0ac99aa4c83bb64890eb15ec

(We don't have the stack that led to ListAndWatch but assume it's goroutine(s) launched by controller-runtime because that's the only part that does watching.)

1. This is bigger than the memory used by Unmarshal and is still "inuse", so
    it sounds like it's doing a single huge List call without chunking?
    Chunking should also be easier on apiserver/etcd.
    Is there something we can configure to use chunking in the
    initial List of ListAndWatch?

2. It looks like client-go does ReadAll() instead of deserializing directly off the wire.
    It does use the json-iterator library. Does anyone know if deserializing directly is doable?

3. We want to experiment with disabling caching.
    Our controllers don't need to see the full picture of all resources;
    reconciling them one by one is fine.
    When does controller-runtime actually hit the cache?
    Are there downsides to disabling caching we should worry about?
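
For reference, the chunking asked about in point 1 is the Kubernetes list pagination contract: the client sends a limit, the server returns at most that many items plus a continue token, and the client repeats the call with that token until it comes back empty. Below is a minimal stdlib-only sketch of that loop. listPage and listChunked are hypothetical stand-ins for illustration, not client-go functions, and the integer-offset token is an assumption (real continue tokens are opaque). Whether the reflector's initial List actually pages depends on the client-go version and on how the API server serves the request, so this shows the pattern rather than a configuration recipe.

```go
package main

import (
	"fmt"
	"strconv"
)

// listPage is a hypothetical stand-in for one paginated list request.
// It returns at most limit items starting at the offset encoded in
// continueToken, plus the token for the next page ("" when done).
func listPage(all []string, limit int, continueToken string) ([]string, string, error) {
	start := 0
	if continueToken != "" {
		n, err := strconv.Atoi(continueToken)
		if err != nil || n < 0 || n > len(all) {
			return nil, "", fmt.Errorf("invalid continue token %q", continueToken)
		}
		start = n
	}
	// A non-positive limit means "no chunking": return everything left.
	if limit <= 0 || start+limit >= len(all) {
		return all[start:], "", nil
	}
	end := start + limit
	return all[start:end], strconv.Itoa(end), nil
}

// listChunked walks every page and hands each one to visit, so the caller
// only ever holds one page at a time. It returns the number of requests made.
func listChunked(all []string, limit int, visit func(page []string)) (int, error) {
	requests := 0
	token := ""
	for {
		page, next, err := listPage(all, limit, token)
		if err != nil {
			return requests, err
		}
		requests++
		visit(page)
		if next == "" {
			return requests, nil
		}
		token = next
	}
}

func main() {
	objects := make([]string, 0, 2500)
	for i := 0; i < 2500; i++ {
		objects = append(objects, "cr-"+strconv.Itoa(i))
	}

	largest, total := 0, 0
	requests, err := listChunked(objects, 500, func(page []string) {
		if len(page) > largest {
			largest = len(page)
		}
		total += len(page)
	})
	if err != nil {
		panic(err)
	}
	// prints: requests=5 total=2500 largest page=500
	fmt.Printf("requests=%d total=%d largest page=%d\n", requests, total, largest)
}
```

The point of the pattern is that peak memory is bounded by the page size rather than by the size of the whole collection, at the cost of more round trips.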

Thanks for any advice!

Shawn Hurley

Jan 4, 2021, 10:51:23 AM
to Beni Cherniavsky-Paskin, Operator Framework, Irit Goihman, Nimrod Shneor

Hello,

  1. I think that what you might be looking for is: https://github.com/kubernetes-sigs/controller-runtime/issues/532

  2. I don’t believe that is the case. One idea for helping performance here: if you don’t care about every resource’s full spec and status, and instead only need the metadata (or just the metadata and status), you could fetch only that. There was/is discussion about making this an option in controller-runtime, last I heard (which was admittedly a while ago).

  3. The biggest issue that we usually see with disabling caching is the exponential increase in API Server traffic.

When does controller-runtime actually hit the cache?

controller-runtime sets up a cache for each resource type that is queried with the default client, so the cache is hit every time you make a mgr.GetClient().Get(xxxxx, xxxx, xxxx) call.

If you only query for a subset of resources (e.g. pods with label X) once per reconcile, you may want to use mgr.GetAPIReader(). From the docs[1]:

// GetAPIReader returns a reader that will be configured to use the API server.
// This should be used sparingly and only when the client does not fit your
// use case.

Please note that using this reader can cause significant traffic to the API Server and can cause other problems. Hopefully, in your case, you can use both the cache and the APIReader to get the best of both worlds?
If you still need to watch those resources but want to filter them, this is the issue that you should comment on: https://github.com/kubernetes-sigs/controller-runtime/issues/244
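
As a rough illustration of the trade-off described above, here is a toy stdlib-only model that counts requests for the two read strategies. Everything in it (apiServer, directReader, cachedReader, readN) is hypothetical and is not controller-runtime code. In particular, the real cache also keeps a watch open to stay current, which this sketch does not model.

```go
package main

import "fmt"

// apiServer is a hypothetical in-memory stand-in that counts requests.
type apiServer struct {
	objects  map[string]map[string]string // kind -> name -> payload
	requests int
}

func (s *apiServer) get(kind, name string) (string, bool) {
	s.requests++
	v, ok := s.objects[kind][name]
	return v, ok
}

func (s *apiServer) list(kind string) map[string]string {
	s.requests++
	out := make(map[string]string, len(s.objects[kind]))
	for k, v := range s.objects[kind] {
		out[k] = v
	}
	return out
}

// reader is the read interface shared by both strategies.
type reader interface {
	Get(kind, name string) (string, bool)
}

// directReader forwards every Get to the server: nothing is held in
// memory, but each read costs one request.
type directReader struct{ server *apiServer }

func (d directReader) Get(kind, name string) (string, bool) {
	return d.server.get(kind, name)
}

// cachedReader lists a kind the first time it is asked for and serves
// later reads of that kind from memory.
type cachedReader struct {
	server *apiServer
	byKind map[string]map[string]string
}

func (c *cachedReader) Get(kind, name string) (string, bool) {
	if c.byKind == nil {
		c.byKind = map[string]map[string]string{}
	}
	objs, ok := c.byKind[kind]
	if !ok {
		objs = c.server.list(kind) // one full List per kind, held in memory
		c.byKind[kind] = objs
	}
	v, found := objs[name]
	return v, found
}

// readN performs n reads of the same object through r.
func readN(r reader, n int) {
	for i := 0; i < n; i++ {
		r.Get("Widget", "a")
	}
}

func main() {
	srv := &apiServer{objects: map[string]map[string]string{
		"Widget": {"a": "1", "b": "2", "c": "3"},
	}}

	readN(directReader{server: srv}, 100)
	direct := srv.requests

	srv.requests = 0
	readN(&cachedReader{server: srv}, 100)
	cached := srv.requests

	// prints: direct=100 cached=1
	fmt.Printf("direct=%d cached=%d\n", direct, cached)
}
```

The cached strategy trades memory (a full copy of every queried kind) for far fewer requests, which is the tension between the original memory problem and the API Server traffic concern.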

Thanks,

Shawn Hurley
