Etcd operator and k8s


tra...@live.com

Nov 14, 2016, 3:53:29 PM
to CoreOS Dev

The etcd operator is an awesome addition for the stability of any service that relies on it, including k8s itself. If etcd is safe, the cluster is safe.


Since etcd can run outside kubernetes, the problem of keeping etcd safe is really an independent problem from kubernetes. The project I'm working on (Rook) depends on etcd and has a requirement to run in both a kubernetes environment and a standalone environment. We have started implementing what amounts to a very basic etcd operator that will manage the health of the etcd cluster, but want to replace it with your much more complete operator. We would benefit now and going forward from the etcd operator.


What would it take to factor out the management of etcd from the dependency on kubernetes? Looking at the code, it seems we could define an interface, or interfaces, that describe how the operator interacts with a generalized cluster, with methods such as "enumerate etcd members", "start instance", "stop instance", and the other operations that kubernetes currently takes care of. The etcd operator would then become a library that different types of clusters could use. In every environment where etcd runs, the cluster would benefit from a common implementation of monitoring etcd health, growing/shrinking the membership, backup/restore, and more.


This means that all references to kubernetes would be factored out to a new package. For the k8s scenario, the etcd-operator would be initialized with the kubernetes cluster implementation. In the Rook scenario, the etcd operator would be initialized with the Rook cluster implementation.
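To make the proposal concrete, here is a minimal sketch in Go of what such a split might look like. Every name in it (`Cluster`, `Operator`, `NewOperator`, `Reconcile`, `memoryCluster`) is hypothetical and does not exist in etcd-operator; the in-memory cluster only stands in for a real kubernetes or Rook implementation, and the reconcile loop is deliberately simplistic.

```go
package main

import (
	"errors"
	"fmt"
)

// Cluster is a hypothetical abstraction over the environment that hosts etcd
// (kubernetes, Rook, ...). It is an illustration, not an etcd-operator API.
type Cluster interface {
	// Members enumerates the etcd instances currently running.
	Members() ([]string, error)
	// StartInstance launches a new etcd instance with the given name.
	StartInstance(name string) error
	// StopInstance terminates the named etcd instance.
	StopInstance(name string) error
}

// Operator holds the environment-independent management logic.
type Operator struct {
	cluster Cluster
	size    int // desired number of etcd members
	next    int // counter used to name new members
}

// NewOperator wires the operator to whichever Cluster implementation is passed in.
func NewOperator(c Cluster, size int) *Operator {
	return &Operator{cluster: c, size: size}
}

// Reconcile moves the membership one step toward the desired size.
func (o *Operator) Reconcile() error {
	members, err := o.cluster.Members()
	if err != nil {
		return err
	}
	switch {
	case len(members) < o.size:
		o.next++
		return o.cluster.StartInstance(fmt.Sprintf("etcd-%04d", o.next))
	case len(members) > o.size:
		return o.cluster.StopInstance(members[len(members)-1])
	}
	return nil
}

// memoryCluster is a toy Cluster that only tracks member names in memory.
type memoryCluster struct{ members []string }

func (m *memoryCluster) Members() ([]string, error) {
	return append([]string(nil), m.members...), nil
}

func (m *memoryCluster) StartInstance(name string) error {
	for _, n := range m.members {
		if n == name {
			return errors.New("member already exists: " + name)
		}
	}
	m.members = append(m.members, name)
	return nil
}

func (m *memoryCluster) StopInstance(name string) error {
	for i, n := range m.members {
		if n == name {
			m.members = append(m.members[:i], m.members[i+1:]...)
			return nil
		}
	}
	return errors.New("no such member: " + name)
}

func main() {
	c := &memoryCluster{}
	op := NewOperator(c, 3)
	for i := 0; i < 3; i++ {
		if err := op.Reconcile(); err != nil {
			fmt.Println("reconcile failed:", err)
			return
		}
	}
	members, _ := c.Members()
	fmt.Println(members)
}
```

In this sketch the k8s scenario would pass a kubernetes-backed `Cluster` to `NewOperator` and the Rook scenario a Rook-backed one, while `Reconcile` itself stays unchanged.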


Any reason the operator couldn't run outside kubernetes given this abstraction? 


Another level of abstraction to consider is the operator pattern. In our clusters, we effectively have a Ceph operator that manages the distributed storage subsystems. Currently the etcd and Prometheus operators don't appear to share any common operator library. Is there a planned operator library, or is the k8s management all they are expected to have in common? Perhaps this abstraction would become obvious with the other refactoring suggested for etcd, but it might be different. Thoughts on this?


Thanks!

Travis Nielsen

https://github.com/rook/rook

Maurizio Vitale

Nov 14, 2016, 4:19:44 PM
to coreo...@googlegroups.com
Not sure what an 'operator' would be, but you have a lot of flexibility in how to handle etcd in k8s.

Sure, many startup scripts run etcd on the master, but it doesn't have to be that way. All that is required is that the control plane components can reach an etcd cluster. And as soon as you get into HA you probably want a separate etcd cluster. In my prototype k8s cluster (CoreOS on VirtualBox), I have 3 etcd clusters: one for CoreOS (for locksmithd, fleet and flannel), one for k8s on servers that are not used for anything else (e.g. not on masters or nodes) and (I believe) one used internally by kube-dns.

And you don't even need to bring up the k8s etcd cluster at the same time as the rest of k8s, although again this is what happens with most minikube/kube-up and similar scripts.

I'm not sure about the other things you want to do. If you have the right authorization you can talk to any of those etcd clusters and ask for health, membership etc.
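As a rough illustration of asking a cluster for health, the sketch below queries etcd's `/health` HTTP endpoint, which answers with a small JSON body such as `{"health":"true"}`. The endpoint address is an assumption (etcd's default client port on localhost), the function names `parseHealth` and `checkHealth` are made up for this example, and a cluster secured with TLS or authentication would need an appropriately configured client.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// healthResponse mirrors the JSON body returned by etcd's /health endpoint.
type healthResponse struct {
	Health string `json:"health"`
}

// parseHealth reports whether a /health response body says the member is healthy.
func parseHealth(body []byte) (bool, error) {
	var h healthResponse
	if err := json.Unmarshal(body, &h); err != nil {
		return false, err
	}
	return h.Health == "true", nil
}

// checkHealth fetches /health from one etcd client endpoint and parses the reply.
func checkHealth(client *http.Client, endpoint string) (bool, error) {
	resp, err := client.Get(endpoint + "/health")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	return parseHealth(body)
}

func main() {
	// Assumed endpoint: a local etcd member listening on the default client port.
	client := &http.Client{Timeout: 2 * time.Second}
	healthy, err := checkHealth(client, "http://127.0.0.1:2379")
	if err != nil {
		fmt.Println("health check failed:", err)
		return
	}
	fmt.Println("healthy:", healthy)
}
```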

But maybe I'm missing the point of your mail entirely and somebody else can chime in.

Travis Nielsen

Nov 14, 2016, 4:28:51 PM
to coreo...@googlegroups.com
Thanks for the response. I should clarify that I'm referring specifically to the etcd operator here: https://github.com/coreos/etcd-operator. It provides automated management of etcd running in a kubernetes cluster.

Rob Szumski

Nov 14, 2016, 4:29:38 PM
to coreo...@googlegroups.com
Travis is talking about this blog post, which explains “etcd operators”: https://coreos.com/blog/introducing-the-etcd-operator.html

Basically, it is a piece of software that encodes human operational knowledge. In this case, the operator works directly against the Kubernetes API as its baseline. Travis wants to remove that Kubernetes dependency. I don’t think we have any plans for that so far, but other members of the team would know more.

 - Rob

Travis Nielsen

Nov 14, 2016, 5:03:10 PM
to coreo...@googlegroups.com
I was assuming that there weren't plans yet to use it outside the main scenario for which it was created, but wanted to start the discussion early before starting a PR. It would be a non-trivial change to the operator.

