Dropping support for etcd2


Aaron Crickenberger

Jul 25, 2017, 9:24:01 PM
to Kubernetes developer/contributor discussion
The current "supported releases" policy [1] states that we support three releases at a time. This means when Kubernetes v1.8 is released, we will stop supporting Kubernetes v1.5.  Kubernetes v1.5 was the last release to default to etcd2 for new clusters [2], and Kubernetes v1.6 was the first release to default to etcd3 for new clusters [3].

By that line of reasoning alone, I would suggest that the Kubernetes project is about to drop support for etcd2.  How do you all feel about that?

I raised the issue during the most recent SIG Architecture meeting [4], and came away with:
- we lack a documented 3rd party version / support / deprecation policy
- we have not announced the deprecation of etcd2
- there is a checklist for deprecating etcd2 that has had no input nor progress [5]
- kops doesn't support etcd3 [6]
- the etcd-upgrade tests that run every 2h have no results for master, and are failing for release-1.5 [7]
- this is an issue that crosses multiple SIGs and should be escalated to the community (hi!)

What I propose:
- we announce that v1.8 is the last release that supports etcd2
- we continue to ignore etcd2 in v1.8 release jobs (it's not part of the set of all v1.7 [8] jobs)
- we continue to ignore etcd2 in release blocking jobs (it's not part of the blocking jobs for v1.6 [9] or v1.7 [10]) 
- we remove etcd2-specific code/functionality in v1.9
- regardless of the final set of steps to be taken, we decide on a SIG or spin up a WG to own this specific drop in support
- SIG architecture (or the eventual steering committee) drafts a 3rd party version / support / deprecation policy that we use as a model going forward

I'd like to get a few minutes at the upcoming community meeting to discuss what we would like to do about this, summarizing whatever feedback comes up on this thread.

Thoughts?


- aaron

Justin Santa Barbara

Jul 27, 2017, 1:08:13 PM
to Kubernetes developer/contributor discussion
When we added support for etcd3, there were significant issues, and so we decided not to deprecate etcd2 at that time.

I created a tracking issue for the items that block deprecating etcd2: https://github.com/kubernetes/kubernetes/issues/44156  As you say, there's no progress, but that doesn't mean we should just ignore it.  etcd2 should still be considered supported.  It should still be under testing, and should still be a release blocker IMO.

etcd2 -> etcd3 was not a minor version bump - it is effectively a different key-value store that happens to share the same name.  I don't think of this as a version change; rather, we added support for an additional storage backend.  I would like to see us add support for Consul, for example, but I do not think that doing so should mean we automatically stop supporting etcd3.  And if we choose to deprecate etcd3 at that time, we should follow our deprecation policy and do so at a pace that allows our users time to migrate.
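To illustrate what I mean by "an additional storage backend" - a rough sketch in Go (illustrative names only, not the actual apiserver storage interface; the constructor arguments are hypothetical stubs):

```go
package storage

import (
	"context"
	"errors"
)

// Illustrative sketch only - not the real apiserver storage.Interface.
// The point: the apiserver talks to an abstract key-value backend, and
// etcd2, etcd3 (or, hypothetically, Consul) are alternative implementations
// selected by configuration, not successive versions of one another.

// Event is a minimal watch event.
type Event struct {
	Key   string
	Value []byte
	Type  string // e.g. "ADDED", "MODIFIED", "DELETED"
}

// KV is a minimal key-value backend abstraction.
type KV interface {
	Get(ctx context.Context, key string) ([]byte, error)
	Create(ctx context.Context, key string, value []byte) error
	Delete(ctx context.Context, key string) error
	Watch(ctx context.Context, keyPrefix string) (<-chan Event, error)
}

// New picks a backend by name, in the spirit of the kube-apiserver
// --storage-backend flag; the constructors passed in are hypothetical stubs.
func New(backend string, newEtcd2, newEtcd3 func() (KV, error)) (KV, error) {
	switch backend {
	case "etcd2":
		return newEtcd2()
	case "etcd3":
		return newEtcd3()
	default:
		return nil, errors.New("unsupported storage backend: " + backend)
	}
}
```

The sketch is just to make the point that these are parallel backends rather than versions of one product.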

Justin

Clayton Coleman

Jul 27, 2017, 1:15:29 PM
to Aaron Crickenberger, Kubernetes developer/contributor discussion
I think it's time to start phasing out etcd2, and to ensure that we have the confidence to do so by tracking and identifying all the known issues.  We may very well need more soak time on etcd3.

I do think it's reasonable to stop supporting etcd2 in new versions, and to require that those users continue to run old clusters.  I don't think the effort of supporting both etcd2 and 3 is worthwhile going forward, nor is splitting our testing and feature set to validate both.


Daniel Smith

Jul 27, 2017, 1:34:30 PM
to Clayton Coleman, Aaron Crickenberger, Kubernetes developer/contributor discussion
I tend to agree w/ Clayton. I think we may need one more release first, though. I think we are still shaking bugs out of etcd3.

Also we should have a discussion about how we adopt minor versions of etcd3. 3.2 is out, but we are still rolling out 3.0.

Should we validate and recommend specific etcd releases for specific Kubernetes versions? Should we decouple as much as possible and leave the choice of minor version up to the cluster operator?




Wojciech Tyczynski

Jul 27, 2017, 1:43:34 PM
to Daniel Smith, Clayton Coleman, Aaron Crickenberger, Kubernetes developer/contributor discussion
On Thu, Jul 27, 2017 at 7:34 PM, 'Daniel Smith' via Kubernetes developer/contributor discussion <kuberne...@googlegroups.com> wrote:
I tend to agree w/ Clayton. I think we may need one more release first, though. I think we are still shaking bugs out of etcd3.

+1 for dropping support for etcd v2 at some point
+1 for delaying it by one release - we should give people enough time to adopt etcd3
 

Also we should have a discussion about how we adopt minor versions of etcd3. 3.2 is out, but we are still rolling out 3.0.

There is a PR in flight bumping head to 3.1.*: https://github.com/kubernetes/kubernetes/pull/49393

But I completely agree we need a more systematic process for this instead of ad-hoc bumps coming from different people (some of them accepted, some of them rejected).


Should we validate and recommend specific etcd releases for specific Kubernetes versions? Should we decouple as much as possible and leave the choice of minor version up to the cluster operator?

Complete decoupling will be a bit hard. We run test and release validation with a specific version of etcd, and those are the versions we have at least some confidence in. Giving complete freedom may result in a number of bug reports coming from combinations of etcd and k8s versions that we never even run together.

That said, I agree that decoupling has advantages. But we need a deeper discussion about whether we can actually make this happen.




Justin Santa Barbara

Jul 27, 2017, 3:11:15 PM
to Kubernetes developer/contributor discussion, dbs...@google.com, ccol...@redhat.com, spi...@gmail.com, Joe Beda
We discussed this in the contributor meeting - I'm sure Aaron can fill in more details:

* I volunteered to get etcd2 tests that are broken working again, so that they can be considered for release signal
* There are lots of users using etcd2, though it's unclear exactly how many - Azure & kops, likely self-install operators.  Not OpenShift / GKE users.
* I believe in general everyone agrees that we want to get off etcd2
* We don't seem to have an explicit policy for deprecating something like etcd (*)
* We have an issue containing a checklist for what should happen to deprecate etcd2 (proposed by me)
* We didn't have any volunteers to do that work to deprecate etcd2
* sig-apimachinery was nominated to (initially?) own the process of deprecation


(*) I didn't think anyone read down this far in the meeting, but Rule #7 in our deprecation policy seems to say 1 year after announcement of deprecation.   (Joe: I think you called attention to our deprecation policy, am I misinterpreting?)

Personally, I believe that deprecating etcd2 should mean taking on the work to e2e test the upgrades.  We probably should have tied the etcd2 stick to the etcd3 carrot - hopefully a lesson for next time :-)  I guess we could (artificially) tie the etcd 3.2 carrot to the etcd2 -> 3 stick.


Justin


Joe Beda

Jul 27, 2017, 3:18:07 PM
to Justin Santa Barbara, Kubernetes developer/contributor discussion, dbs...@google.com, ccol...@redhat.com, spi...@gmail.com
My comment in the meeting wrt the deprecation policy is that it is written around things that are user facing. Historically, the policy has applied to those parts of the system that are likely to impact end users, not operators. For example, changes to flags for the server components haven't been held to the same standards as the API or kubectl.

I'm not passing judgement here but rather calling this out. It probably makes sense to be more specific about what aspects of the system the deprecation policy applies to. If we do want to be more lax about operator-facing aspects (like the version of etcd or the versions of docker that are supported) we should identify that aspect of the system and be explicit.

In addition, the deprecation policy is a goal. If there is no one signed up to help maintain something that is expensive then we may in fact break the policy. In that case, I suggest that we at least be honest with users about it and call it out.

Not sure if/how this applies to etcd specifically; I'm rather thinking through this in general.

Joe

Justin Santa Barbara

Jul 27, 2017, 3:44:09 PM
to Joe Beda, Kubernetes developer/contributor discussion, Daniel Smith, Clayton Coleman, Aaron Crickenberger
Gotcha - thanks Joe.  There is this sentence which I think clarifies the policy: "This applies only to significant, user-visible behaviors which impact the correctness of applications running on Kubernetes or that impact the administration of Kubernetes clusters, and which are being removed entirely."  But I also don't feel I'm in a place to make a determination - this may need to be the steering committee's first order of business! 

I'm signed up to keep etcd2 running :-)  In practice that means getting our etcd2 e2e tests green, and ensuring that they are healthy enough to be blocking. 

I think we're looking for someone to volunteer to do the upgrade e2e/automation work so that we can deprecate etcd2, that is where our resources are lacking.

And I willingly concede that is very far from an ideal situation, which is why I think we need to tie future version bumps to the less fun work of ensuring a path for existing users.

Daniel Smith

Jul 27, 2017, 3:57:25 PM
to Justin Santa Barbara, Joe Beda, Kubernetes developer/contributor discussion, Clayton Coleman, Aaron Crickenberger
Sorry I couldn't make the meeting this morning--

On Thu, Jul 27, 2017 at 12:43 PM, Justin Santa Barbara <jus...@fathomdb.com> wrote:
Gotcha - thanks Joe.  There is this sentence which I think clarifies the policy: "This applies only to significant, user-visible behaviors which impact the correctness of applications running on Kubernetes or that impact the administration of Kubernetes clusters, and which are being removed entirely."  But I also don't feel I'm in a place to make a determination - this may need to be the steering committee's first order of business! 

I'm signed up to keep etcd2 running :-)  In practice that means getting our etcd2 e2e tests green, and ensuring that they are healthy enough to be blocking. 

I think we're looking for someone to volunteer to do the upgrade e2e/automation work so that we can deprecate etcd2, that is where our resources are lacking.

I am confused about this--haven't we shipped a container that does this upgrade already? Distributions of Kubernetes that don't use our default container will have to write something special for their own use (but this is not the last upgrade we will ever do, so writing that mechanism is not wasted effort).
 
On Thu, Jul 27, 2017 at 12:11 PM Justin Santa Barbara <jus...@fathomdb.com> wrote:
We discussed this in the contributor meeting - I'm sure Aaron can fill in more details:

* I volunteered to get etcd2 tests that are broken working again, so that they can be considered for release signal
* There are lots of users using etcd2, though it's unclear exactly how many - Azure & kops, likely self-install operators.  Not OpenShift / GKE users.

Not all, anyway :)
 
* I believe in general everyone agrees that we want to get off etcd2
* We don't seem to have an explicit policy for deprecating for something like etcd (*)
* We have an issue containing a checklist for what should happen to deprecate etcd2 (proposed by me)
* We didn't have any volunteers to do that work to deprecate etcd2

I think we can already say it is deprecated; that is a one-line documentation change. The question here is more when will it be removed, who will do the removal.
 
* sig-apimachinery was nominated to (initially?) own the process of deprecation


(*) I didn't think anyone read down this far in the meeting, but Rule #7 in our deprecation policy seems to say 1 year after announcement of deprecation.   (Joe: I think you called attention to our deprecation policy, am I misinterpreting?)

Without checking, I think that's intended for items (APIs) where users are expected to have a bunch of automation (clients) that requires a painful and lengthy rollout to upgrade. I can see it both ways for etcd but since (IMO) etcd should be an implementation detail (there should not be clients of it other than apiserver), I tend to think an entire year is maybe an excessive maintenance cost. Especially since etcd2 doesn't scale and will never pass tests in big enough clusters.


Justin Santa Barbara

Jul 27, 2017, 4:10:18 PM
to Kubernetes developer/contributor discussion, jus...@fathomdb.com, j...@heptio.com, ccol...@redhat.com, spi...@gmail.com
I think we're looking for someone to volunteer to do the upgrade e2e/automation work so that we can deprecate etcd2, that is where our resources are lacking.

I am confused about this--haven't we shipped a container that does this upgrade already? Distributions of Kubernetes that don't use our default container will have to write something special for their own use (but this is not the last upgrade we will ever do, so writing that mechanism is not wasted effort). 

The shipped container did not work for HA clusters (we discovered this very late in the 1.5 cycle, I believe, but shipped etcd3 anyway; this is why kops at least didn't adopt etcd3).  wojtekt has a PR to fix it; I haven't had the chance to test it.  I'd love to see that get under e2e testing though - this may turn out to actually not be that hard after all :-)

I tend to think an entire year is maybe an excessive maintenance cost. Especially since etcd2 doesn't scale and will never pass tests in big enough clusters.

I don't disagree that a year is a long time, but I'd say it's pretty short for a database migration.  Re-considering this time period may well be item #2 for the steering committee!

Re etcd2 scaling - I would like to see the metrics on the various options.  My intuition is that having the Node healthcheck not write to etcd except on state transitions would have a bigger performance impact, but I hope someone has a doc that proves me wrong (and that would be an API change of course) :-)

Justin
 

Daniel Smith

Jul 27, 2017, 4:20:52 PM
to Justin Santa Barbara, Kubernetes developer/contributor discussion, Joe Beda, Clayton Coleman, Aaron Crickenberger
On Thu, Jul 27, 2017 at 1:10 PM, Justin Santa Barbara <jus...@fathomdb.com> wrote:
I think we're looking for someone to volunteer to do the upgrade e2e/automation work so that we can deprecate etcd2, that is where our resources are lacking.

I am confused about this--haven't we shipped a container that does this upgrade already? Distributions of Kubernetes that don't use our default container will have to write something special for their own use (but this is not the last upgrade we will ever do, so writing that mechanism is not wasted effort). 

The shipped container did not work for HA clusters (we discovered this very late in the 1.5 cycle, I believe, but shipped etcd3 anyway; this is why kops at least didn't adopt etcd3).  wojtekt has a PR to fix it; I haven't had the chance to test it.  I'd love to see that get under e2e testing though - this may turn out to actually not be that hard after all :-)

I haven't checked recently-- do we test anything about HA configurations? We should probably start if not...
 
I tend to think an entire year is maybe an excessive maintenance cost. Especially since etcd2 doesn't scale and will never pass tests in big enough clusters.

I don't disagree that a year is a long time, but I'd say it's pretty short for a database migration.  Re-considering this time period may well be item #2 for the steering committee!

etcd seems to pop out a new version ~quarterly if not faster. If our migrations take more than one release we will always be far behind. Database migrations without schema changes shouldn't be a big deal, right? ;) ;)
 
Re etcd2 scaling - I would like to see the metrics on the various options.  My intuition is that having the Node healthcheck not write to etcd except on state transitions would have a bigger performance impact, but I hope someone has a doc that proves me wrong (and that would be an API change of course) :-)

That would be a major change, yes. Also it is really easy to break an etcd2 cluster (at least on a small machine) just by e.g. writing a bunch of big configmaps, or by accidentally making controllers fight.
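To make "a bunch of big configmaps" concrete, here's a hypothetical load sketch (current client-go signatures; the names, counts, and sizes are made up, and it should only ever be pointed at a throwaway test cluster):

```go
package main

import (
	"context"
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// Hypothetical load generator: write many large ConfigMaps to stress the
// storage backend behind the apiserver.
func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	payload := strings.Repeat("x", 500*1024) // ~500 KiB per ConfigMap value
	for i := 0; i < 200; i++ {
		cm := &corev1.ConfigMap{
			ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("load-test-%d", i)},
			Data:       map[string]string{"blob": payload},
		}
		if _, err := client.CoreV1().ConfigMaps("default").Create(context.TODO(), cm, metav1.CreateOptions{}); err != nil {
			fmt.Println("create failed:", err)
		}
	}
}
```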
 

Justin
 


Clayton Coleman

Jul 27, 2017, 7:28:19 PM
to Daniel Smith, Justin Santa Barbara, Kubernetes developer/contributor discussion, Joe Beda, Aaron Crickenberger


On Jul 27, 2017, at 4:20 PM, Daniel Smith <dbs...@google.com> wrote:



That would be a major change, yes. Also it is really easy to break an etcd2 cluster (at least on a small machine) just by e.g. writing a bunch of big configmaps, or by accidentally making controllers fight.

I can talk a bit about this - we're running million-key (~10k namespace, 20k pod and service) clusters on 1.5 and etcd2.  We are effectively completely bottlenecked on write and read traffic to etcd at this scale (roughly 1k req/s to etcd), and are constantly tuning write traffic down in order to stay below the cap.  Each new release adds new challenges.  Our plan for openshift is to require etcd3 migrations for 1.7, and no longer support etcd2 at all after 1.6.  In addition, while we have a lot of soak time on etcd2, most anomalous events end up uncovering new and exciting problems that impact the rest of the system.

Most of the fixes for cluster scale, performance, reliability, and high peak rate of change cannot be made on etcd2.  So from a pure investment perspective, Kube is going to be a healthier and better ecosystem moving to etcd3.  While I would never push users to upgrade faster than they are comfortable with, etcd2 isn't a pragmatic area to invest in.  

David Oppenheimer

Jul 27, 2017, 9:30:42 PM
to Justin Santa Barbara, Kubernetes developer/contributor discussion, Joe Beda, Clayton Coleman, Aaron Crickenberger
My intuition is that having the Node healthcheck not write to etcd except on state transitions would have a bigger performance impact,

We've talked about decoupling node liveness heartbeat from NodeStatus update (there is probably a Github issue for it) but as things stand today every node needs to update NodeStatus at least as often as the node failure timeout since we use the lastUpdated field there to detect node failure.
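To spell the mechanism out, a minimal sketch (plain Go with illustrative names and values - not the actual node controller code) of a heartbeat-age check like the one that forces every node to keep writing its status:

```go
package main

import (
	"fmt"
	"time"
)

// Simplified sketch - not the real node controller. Each kubelet heartbeats
// by updating its node's status; the controller treats a node as failed once
// the last recorded heartbeat is older than a grace period.

type nodeStatus struct {
	name          string
	lastHeartbeat time.Time // stands in for the lastHeartbeatTime / "lastUpdated" field
}

const nodeMonitorGracePeriod = 40 * time.Second // illustrative value

func unhealthyNodes(nodes []nodeStatus, now time.Time) []string {
	var failed []string
	for _, n := range nodes {
		if now.Sub(n.lastHeartbeat) > nodeMonitorGracePeriod {
			failed = append(failed, n.name)
		}
	}
	return failed
}

func main() {
	now := time.Now()
	nodes := []nodeStatus{
		{name: "node-a", lastHeartbeat: now.Add(-10 * time.Second)},
		{name: "node-b", lastHeartbeat: now.Add(-2 * time.Minute)},
	}
	fmt.Println(unhealthyNodes(nodes, now)) // prints [node-b]
}
```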

 
 


Clayton Coleman

Jul 27, 2017, 9:59:24 PM
to David Oppenheimer, Justin Santa Barbara, Kubernetes developer/contributor discussion, Joe Beda, Aaron Crickenberger


On Jul 27, 2017, at 9:30 PM, David Oppenheimer <davi...@google.com> wrote:

My intuition is that having the Node healthcheck not write to etcd except on state transitions would have a bigger performance impact,

We've talked about decoupling node liveness heartbeat from NodeStatus update (there is probably a Github issue for it) but as things stand today every node needs to update NodeStatus at least as often as the node failure timeout since we use the lastUpdated field there to detect node failure.

On dense clusters (anything with a pod density above 5-10/node) node patch is not a major workload issue.

Derek's events work reduced event patch traffic from 200/s from nodes to 5/s.

The endpoints controller writing resource versions from pods into the endpoints struct (so every pod status update causes an endpoint write) was another ~40 writes/s that are mostly wasted - someone is fixing that now.  Everything else was small compared to that.

Eric Paris

Jul 28, 2017, 12:05:13 AM
to Clayton Coleman, David Oppenheimer, Justin Santa Barbara, Kubernetes developer/contributor discussion, Joe Beda, Aaron Crickenberger
On Thu, 2017-07-27 at 21:59 -0400, Clayton Coleman wrote:
>
>
> On Jul 27, 2017, at 9:30 PM, David Oppenheimer <davi...@google.com>
> wrote:
>
> > > > > My intuition is that having the Node healthcheck not write to
> > > > > etcd except on state transitions would have a bigger
> > > > > performance impact,
> >
> > We've talked about decoupling node liveness heartbeat from
> > NodeStatus update (there is probably a Github issue for it) but as
> > things stand today every node needs to update NodeStatus at least
> > as often as the node failure timeout since we use the lastUpdated
> > field there to detect node failure.
>
> On dense clusters (anything with a pod density above 5-10/node) node
> patch is not a major workload issue.
>
> Derek's events work reduced event patch traffic from 200/s from nodes
> to 5/s.
>
> The endpoints controller writing resource versions from pods into the
> endpoints struct (so every pod status update causes an endpoint
> write) was another 40/s writes that is mostly wasted - someone fixing
> that now. Everything else was small compared to that.

I have to go with David on this one. It matters. A large cluster of 500 nodes (don't we claim to support 5,000 nodes?) updating its heartbeat every 10 seconds has a meaningful impact on the number of system writes. Agreed, 50 writes/second certainly wasn't the worst offender we've seen, but it's getting toward the top of my list of writers to shut up (it is currently about 15% of all etcd writes on one of my clusters).
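For reference, the back-of-the-envelope arithmetic (the interval and node counts are the ones mentioned in this thread, not measurements):

```go
package main

import "fmt"

// Back-of-the-envelope heartbeat write rate: one NodeStatus update per node
// per heartbeat interval (numbers taken from this thread, not measurements).
func main() {
	const heartbeatIntervalSeconds = 10.0
	for _, nodes := range []int{500, 5000} {
		fmt.Printf("%d nodes -> ~%.0f heartbeat writes/s\n", nodes, float64(nodes)/heartbeatIntervalSeconds)
	}
	// 500 nodes  -> ~50 heartbeat writes/s
	// 5000 nodes -> ~500 heartbeat writes/s
}
```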

-Eric

David Oppenheimer

Jul 28, 2017, 2:22:31 AM
to Eric Paris, Clayton Coleman, Justin Santa Barbara, Kubernetes developer/contributor discussion, Joe Beda, Aaron Crickenberger
I wasn't trying to say that we should or should not do it, just mentioning that we've considered it in the past. I found the Github issue; it's here.
 


Clayton Coleman

Jul 28, 2017, 9:58:37 AM
to Eric Paris, David Oppenheimer, Justin Santa Barbara, Kubernetes developer/contributor discussion, Joe Beda, Aaron Crickenberger
Yes, when you grow horizontally it needs to be proportionally increased. But we already have a simple knob for it, while we don't / didn't have knobs for the other worse problems.

Both large and dense clusters are best moved to etcd3 sooner rather than later, and our effort is best spent on ensuring users see the benefit of that move with minimal risk.


Clayton Coleman

Nov 16, 2017, 9:54:20 PM
to Eric Paris, David Oppenheimer, Justin Santa Barbara, Kubernetes developer/contributor discussion, Joe Beda, Aaron Crickenberger
Restarting this - I saw that kops is still on etcd 2.2 on Kubernetes 1.9.  There are features in beta in 1.9 that will not be supported on etcd2, so a cluster on 2.2 won't be fully featured.  I don't know of a compelling reason to stay on 2.2 at this point.  I'm not aware of any desire / intention in sig-api-machinery to continue to support the etcd2 code path, so I'd say there is rough consensus that etcd2 is practically unsupported, even if it happens to work.  Secret encryption and API chunking will not be supported on etcd2 (not possible), and we no longer perform scale testing on that side.

Can someone articulate a reason to stay on etcd2 that would justify the kops job in the queue being on 2.2?  

Justin Santa Barbara

Nov 16, 2017, 10:26:12 PM
to Clayton Coleman, Eric Paris, David Oppenheimer, Kubernetes developer/contributor discussion, Joe Beda, Aaron Crickenberger
We have yet to give users a safe upgrade path, so we have real world users using etcd2.

So my articulation: I don't think we should abandon our users without giving them a nice migration path.

Clayton Coleman

Nov 16, 2017, 10:31:01 PM
to Justin Santa Barbara, Eric Paris, David Oppenheimer, Kubernetes developer/contributor discussion, Joe Beda, Aaron Crickenberger
I'm not sure what you mean.  The project has already provided numerous install tools with core components for upgrade.  If kops as an installer is supporting a particular configuration that is problematic for upgrade (HA) then I would expect kops and its associated community members to make that change.  If any individual installer said they can't support a particular upgrade path, I don't think that's a gate to the rest of the project (especially when other install paths have moved past it).  By extension, when sig-node drops support for an older docker version, the installers that choose to install docker are also responsible for upgrading their particular configurations.

Justin Santa Barbara

Nov 16, 2017, 11:41:15 PM
to Clayton Coleman, Eric Paris, David Oppenheimer, Kubernetes developer/contributor discussion, Joe Beda, Aaron Crickenberger
I don't know of a good community upgrade solution... Can you link to the one you're endorsing, if you feel we have one?

Otherwise, does apimachinery want to own that?

Daniel Smith

Nov 17, 2017, 2:07:00 AM
to Justin Santa Barbara, Clayton Coleman, Eric Paris, David Oppenheimer, Kubernetes developer/contributor discussion, Joe Beda, Aaron Crickenberger
https://github.com/kubernetes/kubernetes/issues/43600 has been closed for months, I thought that was the blocker for kops? *we* use the technique / container mentioned there. It works.

I consider etcd2 deprecated, the clock should definitely start no later than 1.9... etcd3 has been the default for how long now? Almost an entire year, I think?

I consider rolling out an etcd 2 -> 3 upgrade the responsibility of distributions, not api machinery or other parts of "core kubernetes" (whatever that means) for the simple reason that it is incredibly environment dependent. We provided a container which does the upgrade, but we can't e.g. exhaustively test it in every environment, we don't know where a random Kubernetes install wants to store its backups, etc.

I am painfully aware of how much work it is to operate a Kubernetes distro, so please don't take this the wrong way, but doing updates is part of that. Rolling over GKE clusters was a 2-ish person, 6-ish month process; every reusable output of that process is available already, so hopefully it won't be that hard for anyone else.

If Kubernetes were fully self-hosted everywhere, we might (key word: might) be able to offer a process with more guarantees.

(Who's up for the idea of making LTS versions that don't support upgrades--instead, you have to start a new cluster and roll over all your apps?)



Justin Santa Barbara

Nov 17, 2017, 2:37:28 AM
to Daniel Smith, Clayton Coleman, Eric Paris, David Oppenheimer, Kubernetes developer/contributor discussion, Joe Beda, Aaron Crickenberger
I feel like maybe we should have a working group / SIG that does own etcd management.  Should it be sig-cluster-lifecycle?  The reasons you mentioned why it isn't a fit for sig-apimachinery seem to be exactly the things that sig-cluster-lifecycle deals with.

If we can get a group to take ownership of this, I think deprecating etcd2 (and etcd 3.0 and 3.1?) is probably a good idea for 1.9.  apimachinery & the new group should probably announce the deprecation formally (are the release notes the right place?)  It's time, but we should still do it right.

I think the agenda for "sig-etcd-n-to-n-plus-plus" is likely to be (1) "establish an etcd backup/recovery tool (or a few, with a consistent format)", (2) "establish a DR approach using that" and (3) "establish a live(ish) upgrade procedure, with the DR procedure as the fallback".

kops came out of figuring out how to upgrade AWS kube-up clusters, so I'm excited to help take this on :-)

Justin
aka "self elected nominee for leader of sig-etcd-n-to-n-plus-plus"
 


Clayton Coleman

Nov 17, 2017, 10:43:16 AM
to Daniel Smith, Justin Santa Barbara, Eric Paris, David Oppenheimer, Kubernetes developer/contributor discussion, Joe Beda, Aaron Crickenberger
On Fri, Nov 17, 2017 at 2:06 AM, Daniel Smith <dbs...@google.com> wrote:
https://github.com/kubernetes/kubernetes/issues/43600 has been closed for months, I thought that was the blocker for kops? *we* use the technique / container mentioned there. It works.

I consider etcd2 deprecated, the clock should definitely start no later than 1.9... etcd3 has been the default for how long now? Almost an entire year, I think?

I consider rolling out an etcd 2 -> 3 upgrade the responsibility of distributions, not api machinery or other parts of "core kubernetes" (whatever that means) for the simple reason that it is incredibly environment dependent. We provided a container which does the upgrade, but we can't e.g. exhaustively test it in every environment, we don't know where a random Kubernetes install wants to store its backups, etc.

I am painfully aware of how much work it is to operate a Kubernetes distro, so please don't take this the wrong way; but doing updates is part of that. Rolling over GKE clusters was a 2 ish person 6 ish month process; every reusable output of that process is available already, so hopefully it won't be that hard for anyone else.

I started going through all the distros / installer tools - most are reusing the one provided in the image (single node), a few don't have any upgrade process.

Here are the OpenShift playbooks for the etcd2->3 migration for an HA cluster.  Note that it takes down the other members because of known issues with etcd3 around events and TTLs, and another known issue that should have been fixed in the latest etcd 3.2 but that we bypassed anyway:


We made this upgrade mandatory between 1.6 and 1.7 (you are not able to install 1.7 without completing this upgrade on a 1.6 cluster) since the operational characteristics of 3 are better.
 

Aaron Crickenberger

Jan 31, 2018, 6:35:54 PM
to Kubernetes developer/contributor discussion
Who owns the decision to remove etcd2 code from the kubernetes codebase?  When would that happen? Are there any prerequisites to it happening?

- aaron

Daniel Smith

Jan 31, 2018, 7:24:11 PM
to Aaron Crickenberger, Kubernetes developer/contributor discussion
I think it falls under sig api machinery. I think we are just waiting for the end of the deprecation period at this point.


Justin Santa Barbara

Jan 31, 2018, 7:31:40 PM
to Daniel Smith, Aaron Crickenberger, Kubernetes developer/contributor discussion, Robert Bailey
I put an item on the agenda for sig-cluster-lifecycle for next Tuesday (Feb 6th), to demo/discuss https://github.com/kopeio/etcd-manager .  It allows for etcd management, backups, and upgrades without relying on kubernetes.  It avoids some of the potential circular dependencies of the operator approach, which have been a blocker for consensus up until now, so I'm hoping we can reach agreement on this approach - or at the very least on the backup structure on S3/GCS etc.

Justin




Daniel Smith

Jan 31, 2018, 8:04:57 PM
to Justin Santa Barbara, Aaron Crickenberger, Kubernetes developer/contributor discussion, Robert Bailey
Just to be super clear, there are two possible contexts in which we can talk about deprecation & removal.

One is the kube-apiserver code & binary, and I maintain that the only blocker is the deprecation window, which will expire sometime in Q4 this year. I therefore expect Kubernetes 1.13 and up will not support etcd2. (And I think this is two versions more than it should have been--because we forgot to start the deprecation clock ticking.)

The other context is migrating existing clusters and making sure no more etcd2 clusters are created. In my view it is up to the folks who created the clusters whether they want to support a migration or not. And how the migration happens is also up to them.

On Wed, Jan 31, 2018 at 4:30 PM, Justin Santa Barbara <jus...@fathomdb.com> wrote:
I put an item on the agenda for sig-cluster-lifecycle for next Tuesday (Feb 6th), to demo/discuss https://github.com/kopeio/etcd-manager .  It allows for etcd management and backups without relying on kubernetes, including upgrades.  It avoids some of the potential circular dependencies of the operator approach, which have been a blocker for consensus up until now, so I'm hoping we can reach agreement on this approach - or at the very least the backup structure on S3/GCS etc.

Justin


Aaron Crickenberger

Oct 1, 2018, 6:06:41 PM
to Kubernetes developer/contributor discussion
Now that we've entered the 1.13 cycle, it's time to follow through on dropping etcd2 support https://github.com/kubernetes/features/issues/622

I have tagged this as owned by SIG API Machinery, SIG Release, and SIG Testing.  If I have left out stakeholders please chime in on that issue.

- aaron