--
You received this message because you are subscribed to the Google Groups "Operator Framework" group.
To unsubscribe from this group and stop receiving emails from it, send an email to operator-framew...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/operator-framework/CAD-Ua_hi9sxQ4MrGYGMMDA87iYfo3wXnwoX60dShjs_DivvS-Q%40mail.gmail.com.
> the object has been modified; please apply your changes to the latest version and try again
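That error is the Kubernetes API server's optimistic-concurrency conflict: the object's resourceVersion changed between your read and your write. The standard remedy is to re-fetch the latest version and re-apply your mutation (client-go packages this as retry.RetryOnConflict). Below is a minimal, self-contained sketch of the pattern; the fake store and its conflict error are stand-ins for the API server, not real client-go calls:

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("the object has been modified; please apply your changes to the latest version and try again")

// obj mimics a Kubernetes object carrying a resourceVersion.
type obj struct {
	resourceVersion int
	replicas        int
}

// fakeStore mimics the API server's optimistic concurrency check.
type fakeStore struct{ current obj }

func (s *fakeStore) get() obj { return s.current }

func (s *fakeStore) update(o obj) error {
	if o.resourceVersion != s.current.resourceVersion {
		return errConflict // stale read: caller must re-fetch and retry
	}
	o.resourceVersion++
	s.current = o
	return nil
}

// retryOnConflict mirrors the shape of client-go's retry.RetryOnConflict:
// re-fetch the latest object and re-apply the mutation until the write lands.
func retryOnConflict(s *fakeStore, mutate func(*obj), maxRetries int) error {
	for i := 0; i < maxRetries; i++ {
		latest := s.get()
		mutate(&latest)
		if err := s.update(latest); err == nil {
			return nil
		} else if !errors.Is(err, errConflict) {
			return err
		}
	}
	return fmt.Errorf("still conflicting after %d retries", maxRetries)
}

func main() {
	store := &fakeStore{current: obj{resourceVersion: 1, replicas: 1}}

	// Simulate a concurrent writer bumping the object first.
	stale := store.get()
	other := store.get()
	other.replicas = 5
	_ = store.update(other) // resourceVersion is now 2

	// A naive update from the stale copy fails with the conflict error ...
	stale.replicas = 3
	fmt.Println(store.update(stale) != nil) // true: conflict

	// ... but re-fetching and re-applying succeeds.
	err := retryOnConflict(store, func(o *obj) { o.replicas = 3 }, 5)
	fmt.Println(err == nil, store.get().replicas) // true 3
}
```

The key design point is that the retry closure re-applies the *intent* ("set replicas to 3") against whatever the latest object looks like, rather than re-submitting the stale copy.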
Hi @Lars,
Finally, I'd like to share that, IMHO, the best approach is to pursue DDD (Domain-Driven Design) principles. To illustrate, let's take the classic scenario where the end goal is an operator that manages an application and its database. One resource could represent the App, and another could represent the DB. By having one CRD to specify the App and another for the DB, we avoid hurting concepts such as encapsulation, the single-responsibility principle, and cohesion. Damaging these concepts can cause unexpected side effects, such as difficulty extending, reusing, or maintaining the code, to mention only a few. Additionally, I would not develop a single controller, such as an install.go, that reconciles everything.
Following the operator pattern, you create controllers (controller-runtime) with a reconcile function responsible for synchronizing resources until the cluster reaches the desired state. Not necessarily, but usually, the primary resources managed by these controllers come from your own APIs, which extend the Kubernetes API with CustomResourceDefinitions. The docs shared previously explain why to create APIs at all. Note that, as mentioned, the golden rule is to develop idempotent solutions. PS: IMHO, since reconcile() performs resource reconciliation, understanding the API concepts and pursuing these principles can result in more maintainable controllers and reconciliation implementations as well.
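The "golden rule" of idempotency can be sketched in a few lines: compare observed state to desired state and act only on the difference, so running the reconcile any number of times converges to the same result. The map below is a stand-in for the cluster, not a real client:

```go
package main

import "fmt"

type cluster map[string]int // resource name -> replicas

// reconcile drives the cluster toward the desired state and reports
// whether it had to change anything. Calling it repeatedly is safe:
// once the state matches, it becomes a no-op.
func reconcile(c cluster, name string, desiredReplicas int) (changed bool) {
	got, exists := c[name]
	if exists && got == desiredReplicas {
		return false // already at desired state: nothing to do
	}
	c[name] = desiredReplicas // create or update toward desired
	return true
}

func main() {
	c := cluster{}
	fmt.Println(reconcile(c, "app", 3)) // true: created
	fmt.Println(reconcile(c, "app", 3)) // false: no-op on repeat
	c["app"] = 1                        // simulate external drift
	fmt.Println(reconcile(c, "app", 3)) // true: corrected back to desired
}
```

This is the same shape controller-runtime encourages with helpers like CreateOrUpdate: the function never assumes anything about how many times it has run before.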
There's already a lot of good information in this thread, so I'll just add the ways that I tend to think about the problem:
TL;DR: it is application-specific, but the current tooling and abstractions available to you as an author mean that it’s likely simpler to do small amounts of work and exit early / requeue.
In the language of control theory, your controller is a closed-loop feedback system where your reconciler builds a model of the system before acting upon that model.
Any change to the cluster state is a potentially important change to that model. Most controllers today are written so that they need to re-construct their current model from a set of (cached) resources, and I haven’t seen any controllers written that can cancel a running reconciliation if a cluster event produces some state-significant change to that model. It will often be simpler to exit and reconstruct your model frequently to ensure you are not dealing with an outdated model.
This hints at tooling / libraries / patterns that should likely exist but don’t yet, such as tooling to keep a model up-to-date in memory (instead of keeping a set of cached objects from which the model can be constructed). But even in the absence of something like that, you might choose to rebuild your model more or less frequently for application-specific reasons. Maybe building your model is time or resource intensive, or maybe the effects of working from an outdated model are minimal.
Some of this even bleeds into simple, low-level decisions in a controller: if an `update` call fails for a resource, do you retry with an updated version of that one resource? Or do you need to recreate a larger model with that updated resource?
The other lens I find valuable is that of process scheduling.
When using typical Kubernetes controller machinery, you generally have one process handling one resource of a particular type at a time. Spending more time in your reconciliation can lead to a bad UX, where your managed resources may appear stuck, waiting for the controller to free up so that it can get to them.
Requeueing frequently is more or less yielding frequently from coroutines, which can lead to more concurrency / more apparent work being done.
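The yielding analogy can be shown with a toy scheduler: each reconcile call advances one resource by a single step and asks to be requeued until done, so one long task cannot monopolize the worker. The progress map and step counts below are invented for illustration; a real controller returns ctrl.Result{Requeue: true} to controller-runtime's workqueue instead of driving its own loop:

```go
package main

import "fmt"

type result struct{ requeue bool }

// progress tracks remaining work steps per resource, standing in for
// whatever long-running task the reconciler would otherwise block on.
type progress map[string]int

// reconcile does one small unit of work, then yields by requesting a requeue.
func reconcile(p progress, name string) result {
	if p[name] == 0 {
		return result{requeue: false} // desired state reached
	}
	p[name]-- // one small unit of work
	return result{requeue: true}
}

// run drains a queue, re-appending items that asked to be requeued; the
// resulting interleaving is what gives each resource timely attention.
func run(p progress, queue []string) (order []string) {
	for len(queue) > 0 {
		name := queue[0]
		queue = queue[1:]
		order = append(order, name)
		if reconcile(p, name).requeue {
			queue = append(queue, name)
		}
	}
	return order
}

func main() {
	p := progress{"big": 3, "small": 1}
	fmt.Println(run(p, []string{"big", "small"}))
	// The visits interleave: "small" finishes on its second visit
	// instead of waiting for all of "big"'s work to complete first.
}
```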
Hope that helps!
I realized after providing an answer that I said almost nothing about the loop styles in question :) Which is likely an indication of my "meh" :) Here are some specific thoughts.
1. Although possible and easy to understand, it seems immature to have a loop that does it all (which is generally how we introduce the idea). I would not be for a forever loop that queries the state of things and reacts to that.
2. Put another way, there needs to be a "work queue": a queue of work, which is a collection of changes in the system. At that point, do you have one queue per object type, or a generic queue of all the types? This is likely the first point of design, and it will drive some of the other decisions.
3. Another influence on the type of loop is: are you solving just your controller's interests, or building out something generic to be used by other controllers?
4. I would be against a thread per event or change (you don't know the depth of the queue/work).
5. You could have a loop per type of interest perhaps, or a thread pool per type, but that doesn't seem like a great generic solution.
6. Then you get into what to do with multiple changes on the same object within the same queue: do you care to track each change, or do you remove dups and work on the last change since the last loop entry? Generally, based on eventual consistency, you would favor the last change only.
7. You have to protect the loop: you can't be deadlocked on something. How do you handle timeouts? How do you handle errors?
8. You want to avoid long-running functions within the loop.
9. We are dealing with the control plane.
10. You need to be able to handle your controller crashing and restarting; the loop should make no assumptions regarding that.
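Point 6, coalescing duplicate changes to the same object, can be sketched as a queue keyed by object: enqueueing a key that is already pending just overwrites its entry instead of taking a second slot, which mirrors how client-go's workqueue deduplicates by key. The pod names and change strings below are invented for illustration:

```go
package main

import "fmt"

// workQueue keeps FIFO ordering of keys while coalescing repeated
// changes to the same key down to the latest one.
type workQueue struct {
	order   []string          // FIFO order of pending keys
	pending map[string]string // key -> latest observed change
}

func newWorkQueue() *workQueue {
	return &workQueue{pending: map[string]string{}}
}

func (q *workQueue) add(key, change string) {
	if _, queued := q.pending[key]; !queued {
		q.order = append(q.order, key) // first time: take a queue slot
	}
	q.pending[key] = change // always keep only the latest change
}

func (q *workQueue) get() (key, change string, ok bool) {
	if len(q.order) == 0 {
		return "", "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	change = q.pending[key]
	delete(q.pending, key)
	return key, change, true
}

func main() {
	q := newWorkQueue()
	q.add("pod-a", "scaled to 2")
	q.add("pod-b", "created")
	q.add("pod-a", "scaled to 5") // coalesces with the earlier pod-a entry
	for {
		key, change, ok := q.get()
		if !ok {
			break
		}
		fmt.Println(key, "=>", change)
	}
	// pod-a is processed once, with only its latest change, consistent
	// with the eventual-consistency argument above.
}
```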
Again, just one man's thoughts. Good luck,
Ken

On Jan 25, 2021, at 6:39 AM, Ken Sipe <ken...@gmail.com> wrote:

Hey Lars! Welcome to the community! Some thoughts here:
1. There are likely many code examples out there, with varying degrees of maturity (which, from the sounds of it, you recognize, and which is in part a driver of your questions).
2. There has been additional maturation in the Go Operator SDK AND in the Kubernetes community.
3. I would guess that on this mailing list there is going to be a bias influenced by Go and controller-runtime. These are the answers and references Camilla provided (thanks Camilla, solid stuff!).
4. Controller-runtime is a distillation of ideas, best practices, and the evolution of thought in the controller/operator space in Kubernetes. There is good reason to follow its patterns, as I don't know of another framework or language that is as leading-edge (it could be lack of awareness on my part, but it is also hard to imagine, as the team working on controller-runtime works closely with api-machinery etc. in the Kubernetes core).