Hi Ari,
Thanks for raising this topic. The WHY often gets lost in the high velocity of development in the ecosystem. It seems like your question is more "Why use the Operator pattern?" than "Why use the Operator SDK?". Both are equally valid and interesting though, so here are my 2 cents:
Why use the Operator pattern?
- standardized interaction via a Kubernetes-native API
- mature API versioning, migration and validation features in contemporary Kubernetes
- easy addition to existing Kubernetes primitives via Mutating Webhooks
- good client UX for custom extensions via validating Admission Webhooks (invalid input fails fast, at submission time)
- "free" storage via etcd
- fits GitOps workflows, since custom resources are declarative and can live in version control
- Kubernetes RBAC to limit who can use the service
- in general a very gentle learning curve for Operator end users: they can re-use all the control concepts they already know from existing Kubernetes types
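To make the "fail fast" point a bit more concrete, here is a minimal sketch in plain Go of the kind of check a validating webhook would run before an object is stored. The `DatabaseSpec` type, its fields and the `validate` function are hypothetical names made up for illustration; a real webhook would be wired up through controller-runtime rather than called directly.

```go
package main

import (
	"errors"
	"fmt"
)

// DatabaseSpec is a hypothetical custom resource spec, for illustration only.
type DatabaseSpec struct {
	Replicas  int
	StorageGB int
}

// validate mimics the check a validating admission webhook would perform:
// bad input is rejected when it is submitted, not later at runtime.
func validate(spec DatabaseSpec) error {
	if spec.Replicas < 1 {
		return errors.New("spec.replicas must be at least 1")
	}
	if spec.StorageGB < 1 {
		return errors.New("spec.storageGB must be at least 1")
	}
	return nil
}

func main() {
	// An invalid spec is rejected immediately with a readable message.
	if err := validate(DatabaseSpec{Replicas: 0, StorageGB: 10}); err != nil {
		fmt.Println("rejected:", err)
	}
	// A valid spec passes and would be persisted.
	if err := validate(DatabaseSpec{Replicas: 3, StorageGB: 10}); err == nil {
		fmt.Println("accepted")
	}
}
```

The user gets the error straight from the API call that submitted the object, the same way they would for a malformed built-in resource.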
Why use the Operator SDK?
- all the benefits of Kubebuilder (which is what the SDK uses for scaffolding), such as boilerplate code generation for types, deep copy, hooks, registration, etc.
- integrated testing, from unit tests through integration tests (envtest) to functional tests (kuttl)
- integrated packaging to distribute updates in a very controlled way (via a graph, using OLM)
- migration support across controller-runtime releases
- a lot of helpers to quickly run a local controller
I think the general advice of using the right tool for the right job still stands. Not everything is worth writing an Operator for. Application-aware procedures like configuring TLS or running application-consistent backups could, each on their own, also be done without a CRD. Initial cluster formation could also be done with initContainers. But it adds up quickly if you want to provide a good experience for users that you do not know at this point in time. And very soon you are shipping a lot of code alongside your application image that is for management and not for business logic.
With the Operator pattern you are providing a much more consistent user experience that is nowadays well understood. It's the same reason why Kubernetes itself, despite its complexity, is successful: all clusters may be set up differently but work the same. All user knowledge and learnings are transferable. You don't have to re-learn how you interact with them to deploy software, just because you are using Kubernetes distributions from entirely different providers on completely separate infrastructure.
Operators are not only handy extensions to the cluster that automate things like certificate management, policy enforcement, quota configuration, security auditing, etc. They also provide managed services for things like Databases, Message Queues, Service Meshes, basically any distributed system complex enough that it cannot be managed by the built-in Kubernetes controllers alone. This is where reconciliation in a loop to avoid drift, versioned APIs and repeatable UX become very important.
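For anyone less familiar with the reconciliation idea, here is a minimal sketch in plain Go, without any Kubernetes libraries. The `State` type and `reconcile` function are hypothetical and heavily simplified; a real controller built on controller-runtime is triggered by watch events and talks to the API server instead of mutating a local struct.

```go
package main

import "fmt"

// State is a hypothetical stand-in for both the desired state (declared in a
// custom resource) and the actual state (observed in the cluster).
type State struct {
	Replicas int
	Version  string
}

// reconcile compares desired and actual state and returns the corrective
// actions plus the resulting state. It is idempotent: on a converged state
// it returns no actions.
func reconcile(desired, actual State) ([]string, State) {
	var actions []string
	if actual.Version != desired.Version {
		actions = append(actions, fmt.Sprintf("upgrade %s -> %s", actual.Version, desired.Version))
		actual.Version = desired.Version
	}
	if actual.Replicas != desired.Replicas {
		actions = append(actions, fmt.Sprintf("scale %d -> %d", actual.Replicas, desired.Replicas))
		actual.Replicas = desired.Replicas
	}
	return actions, actual
}

func main() {
	desired := State{Replicas: 3, Version: "1.2"}
	actual := State{Replicas: 3, Version: "1.2"}

	// Simulate drift: someone scales the workload by hand.
	actual.Replicas = 1

	// A bounded loop stands in for the event-driven loop of a real controller.
	for i := 0; i < 3; i++ {
		actions, next := reconcile(desired, actual)
		if len(actions) == 0 {
			fmt.Println("in sync, nothing to do")
			break
		}
		fmt.Println("correcting drift:", actions)
		actual = next
	}
}
```

The important property is that the loop only ever looks at the difference between declared and observed state, so it corrects manual changes and partial failures the same way it handles a fresh install.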
If you are building an Operator it is very likely that some of the backing services and functionality you need are already provided by other Operators out there. This is why we created OperatorHub.io and donated the entire Operator Framework to the CNCF, so that the ecosystem continues to grow and it becomes even more seamless to re-use these Operators. This is not possible without some standardization of the interfaces and interaction patterns. There are certainly alternatives out there, but CRDs in general and the Operator pattern in particular have seen the broadest adoption and support so far.
A bit lengthy, but hopefully I could provide some additional angles. Code complexity might be higher initially, but that's what tools like Kubebuilder, KUDO, Operator SDK etc. are for.
/Daniel