Feature request: Revert-only OWNERS in kubernetes/kubernetes?

21 views
Skip to first unread message

Maciej Borsz

unread,
Jun 10, 2019, 2:52:45 AM6/10/19
to kubernetes-...@googlegroups.com, gke-kubernetes-scalability, Wojciech Tyczynski
Hi all,

In sig-scalability we relatively often have to manually revert PRs affecting gce scalability.
So far, we used wojtekt's superpowers to approve such reverts, but he is a single person in our team that has such permissions (and e.g. this week he is on vacation).

So I wanted to ask to add more folks from our team to OWNERS in kubernetes/kubernetes repo. Given limited knowledge of the codebase, I was thinking that maybe giving us approval power only for reverts makes sense. It would be sufficient for our work and limit the risk.

Few questions:
1. What do you think?
2. (I'm not that familiar with github) Is it technically possible to distinguish revert PRs from other PRs?

Thanks,
Maciej


Stephen Augustus

unread,
Jun 10, 2019, 5:02:48 AM6/10/19
to Maciej Borsz, kubernetes-si...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-sig-release, gke-kubernetes-scalability, Wojciech Tyczynski
(+ SIG Scalability, SIG Testing)

AFAIK, that's not a thing we do today i.e., revert-only OWNERs. I've added SIG Testing to see if that's a thing they'd ever want to do. 
Committing and reverting code has similar potential impact, so I'd imagine we'd want to continue along the path of not weighting the two.

I'm a big +1 to adding more reviewers/approvers to scalability OWNERs, but that's a question/task for SIG Scalability and their subprojects[1], so I've added them here.

-- Stephen


--
You received this message because you are subscribed to the Google Groups "kubernetes-sig-release" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-re...@googlegroups.com.
To post to this group, send email to kubernetes-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-sig-release/CA%2BbcoB5T9nZqezFZ1vJc5y8BaeC%2BxV77yZbumtSM4L000cnqfQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Stephen Augustus

unread,
Jun 10, 2019, 5:07:39 AM6/10/19
to Maciej Borsz, kubernetes...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-sig-release, Wojciech Tyczynski
(take two; I didn't realize that kubernetes...@googlegroups.com was the current address for the SIG. Also, I don't believe external users have permissions to post to gke-kubernetes-scalability, so sending that list to bcc.)

Jordan Liggitt

unread,
Jun 10, 2019, 8:56:55 AM6/10/19
to Stephen Augustus, Maciej Borsz, kubernetes...@googlegroups.com, kubernetes-sig-testing, kubernetes-sig-release, Wojciech Tyczynski
Have there been examples of urgent reverts needed that did not find an approver in O(hours)? I tend to agree with Stephen that we wouldn't generally want to add approvers only for certain types of changes.

Maciej Borsz

unread,
Jun 10, 2019, 9:13:28 AM6/10/19
to Jordan Liggitt, Stephen Augustus, kubernetes...@googlegroups.com, kubernetes-sig-testing, kubernetes-sig-release, Wojciech Tyczynski
On Mon, Jun 10, 2019 at 2:56 PM Jordan Liggitt <lig...@google.com> wrote:
Have there been examples of urgent reverts needed that did not find an approver in O(hours)? I tend to agree with Stephen that we wouldn't generally want to add approvers only for certain types of changes.

https://github.com/kubernetes/kubernetes/pull/76190 is an example revert that we wanted to merge quickly.
If we had to revert similar PR e.g. this week (when Wojtek is on vacation), we would have a problem finding an approver for it.

I think that preferably I would like to have more than one person in the team that is able to approve such PRs, but if it's infeasible could you suggest some other European time zone approver we can ask for help in such cases?

Thanks,
Maciej

Stephen Augustus

unread,
Jun 10, 2019, 9:21:56 AM6/10/19
to Maciej Borsz, Jordan Liggitt, kubernetes...@googlegroups.com, kubernetes-sig-testing, kubernetes-sig-release, Wojciech Tyczynski
This goes back to the reviewers/approvers pools for SIG Scalability subprojects. The SIG needs to work on increasing it to more than just a few people for 1.16.

Maciej Borsz

unread,
Jun 12, 2019, 4:57:35 AM6/12/19
to Stephen Augustus, Jordan Liggitt, kubernetes...@googlegroups.com, kubernetes-sig-testing, kubernetes-sig-release, Wojciech Tyczynski
https://github.com/kubernetes/kubernetes/pull/78931 is another example -- as part of debugging regression in 1.15 we found the faulty PR and now we need we are waiting for review.

If revert-only OWNERS is infeasible, I think that just another (next to Wojtek) Europe time zone approver will help a lot.






On Tue, Jun 11, 2019 at 10:31 AM Maciej Borsz <macie...@google.com> wrote:


On Mon, Jun 10, 2019 at 3:21 PM Stephen Augustus <Ste...@agst.us> wrote:
This goes back to the reviewers/approvers pools for SIG Scalability subprojects. The SIG needs to work on increasing it to more than just a few people for 1.16.

Sorry, I don't understand. The 'sig scalability subprojects' page you mentioned earlier shows OWNERS only for sig-scalability-owned code (like kubemark's code). This is not an issue I meant.

The problem is not about reverting changes to sig-scalability-owned code. The issue is that sometimes we need to revert changes to kubernetes/kubernetes repo as we find that they are breaking scalability tests (and more importantly the scalability of the kubernetes overall).

The reason why we (sig-scalability) need to manually revert PRs is that presubmits covers only basic 100 node cluster while some issues can be detected only in 5000-node scale. We are running a daily tests 5000-node tests and when we see that it's failing, when we find the culprit we simply want to revert this as fast as possible to make sure there is no other regression.

My ask is only to make this process smoother in case when only person in sig-scalability with approval permissions (wojtekt) is unavailable for some reason. Revert-only OWNERS seemed to be a good idea to me, but we can figure out some other solution here if this is infeasible.

Davanum Srinivas

unread,
Jun 12, 2019, 6:04:45 AM6/12/19
to Maciej Borsz, Stephen Augustus, Jordan Liggitt, kubernetes...@googlegroups.com, kubernetes-sig-testing, kubernetes-sig-release, Wojciech Tyczynski
I am in US East TZ. I can volunteer for helping with approving reverts as well.

-- Dims

You received this message because you are subscribed to the Google Groups "kubernetes-sig-scale" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-s...@googlegroups.com.
To post to this group, send email to kubernetes...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-sig-scale/CA%2BbcoB5i8TiDK%3D9n%2BM7kH3RK6r7KQWOAWZRbgwpqBOxgtt-4%2BA%40mail.gmail.com.

For more options, visit https://groups.google.com/d/optout.


--
Davanum Srinivas :: https://twitter.com/dims

Matt Matejczyk

unread,
Jun 12, 2019, 6:10:52 AM6/12/19
to Maciej Borsz, Jordan Liggitt, Stephen Augustus, kubernetes-sig-release, kubernetes...@googlegroups.com, kubernetes-sig-testing, Wojciech Tyczynski, Andrzej Wasylkowski
Thanks Maciej, big +1 to having a revert only approval or finding another satisfactory solution.

I'd like to make a few things clear here:

1. In the SIG Scalability Charter it was agreed that the SIG Scalability has every right to rollback a PR that has been identified as a cause of performance/scalability regression. The charter was approved by the Kubernetes community. Quoting the relevant section:

We can rollback any merged PR if it has been identified as a cause of any performance/scalability SLOs regression (identified by the set of release blocking scalability/performance tests). The offending PR should only be merged again after proving to pass tests at scale.

2. Scalability tests are very expensive and take very long, e.g. a 5k scale test takes 10+ hours to run and costs a few thousand dollars. Not being able to revert PRs in a timely manner wastes a lot of resources. 
    Good example is PR/78931, mentioned by Maciek, where we've already identified a cause, but due to no approval rights we'll have to wait another day and waste another run.

3. Not being able to react in time can potentially block a release. There is only one week left till the 1.15 release date and there might be other regressions that we won't be able to identify and fix until the PR/78931 gets merged and another run of 5K scale test is triggered. Given the the tight schedule, we have only a few of scale test runs left and every one of them matters. We shouldn't waste them only because there is no one in our time-zone to approve the revert. 

4. Having a fast-track to revert changes that break CI/CD tests is a common practice in many companies (e.g. Google), I don't see a reason why we shouldn't have anything like that in Kubernetes.

Given that, I believe it's crucial for the SIG Scalability to have means to revert PRs when Wojtek is OOO. 
If there is no mechanism for revert-only approvals, cannot we just add a one or two guys from SIG Scalability in Europe as regular approves annotated by a comment that it's only for reverts and will be used only as described in the SIG Scalability Charter?

Let me know what you think. I really hope we can find a solution here.

Thanks,

Matt


You received this message because you are subscribed to the Google Groups "kubernetes-sig-scale" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-s...@googlegroups.com.
To post to this group, send email to kubernetes...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-sig-scale/CA%2BbcoB5i8TiDK%3D9n%2BM7kH3RK6r7KQWOAWZRbgwpqBOxgtt-4%2BA%40mail.gmail.com.

Stephen Augustus

unread,
Jun 12, 2019, 7:50:12 AM6/12/19
to Maciej Borsz, Davanum Srinivas, steering, kubernetes-sig-architecture, Jordan Liggitt, kubernetes...@googlegroups.com, kubernetes-sig-testing, kubernetes-sig-release, Wojciech Tyczynski
Dims -- thanks for volunteering!

Maciej / Matt--
Appreciate the additional context, especially around the charter i.e., you have the mandate to have that power, but not the permissions to exercise it.
I've created an issue[1] to discuss updating the top-level reviewers/approvers in k/k and assigned the current OWNERS.

The suggestion:
Adding SIG Arch and Steering as I couldn't find a policy on updating this file specifically (outside of --> [2]).
Happy to help ping people/make the OWNERS update, once we settle on something!

-- Stephen

[1] https://github.com/kubernetes/kubernetes/issues/78935
[2] https://github.com/kubernetes/community/blob/master/contributors/guide/owners.md#maintaining-owners-files

lig...@google.com

unread,
Jun 12, 2019, 7:56:53 AM6/12/19
to Matt Matejczyk, Maciej Borsz, Stephen Augustus, kubernetes-sig-release, kubernetes...@googlegroups.com, kubernetes-sig-testing, Wojciech Tyczynski, Andrzej Wasylkowski
The current PR is a good example of why we need careful review. The bump of klog in question not only caused the performance regression, it fixed a functional regression as well. Reverting it would reintroduce a release-blocking bug. 

lig...@google.com

unread,
Jun 12, 2019, 8:01:30 AM6/12/19
to Matt Matejczyk, Maciej Borsz, Stephen Augustus, kubernetes-sig-release, kubernetes...@googlegroups.com, kubernetes-sig-testing, Wojciech Tyczynski, Andrzej Wasylkowski
My apologies, got the klog bumps mixed up. https://github.com/kubernetes/kubernetes/pull/78216 fixed the functional issue. Revert in 78931 lgtm. 

Matt Matejczyk

unread,
Jun 12, 2019, 8:26:41 AM6/12/19
to lig...@google.com, Maciej Borsz, Stephen Augustus, kubernetes-sig-release, kubernetes...@googlegroups.com, kubernetes-sig-testing, Wojciech Tyczynski, Andrzej Wasylkowski
Thank you, Stephen! 
This is really great, let's continue the discussion about revert-only approvers in the issue. 

Jordan, thanks for approving the revert and unblocking us!

Best Regards,

Matt

Stephen Augustus

unread,
Jun 12, 2019, 8:36:39 AM6/12/19
to Matt Matejczyk, Jordan Liggitt, Maciej Borsz, kubernetes-sig-release, kubernetes...@googlegroups.com, kubernetes-sig-testing, Wojciech Tyczynski, Andrzej Wasylkowski
Anytime Matt!

Maciej Borsz

unread,
Jun 12, 2019, 10:53:19 AM6/12/19
to Stephen Augustus, Jordan Liggitt, kubernetes...@googlegroups.com, kubernetes-sig-testing, kubernetes-sig-release, Wojciech Tyczynski
On Mon, Jun 10, 2019 at 3:21 PM Stephen Augustus <Ste...@agst.us> wrote:
This goes back to the reviewers/approvers pools for SIG Scalability subprojects. The SIG needs to work on increasing it to more than just a few people for 1.16.

Sorry, I don't understand. The 'sig scalability subprojects' page you mentioned earlier shows OWNERS only for sig-scalability-owned code (like kubemark's code). This is not an issue I meant.

The problem is not about reverting changes to sig-scalability-owned code. The issue is that sometimes we need to revert changes to kubernetes/kubernetes repo as we find that they are breaking scalability tests (and more importantly the scalability of the kubernetes overall).

The reason why we (sig-scalability) need to manually revert PRs is that presubmits covers only basic 100 node cluster while some issues can be detected only in 5000-node scale. We are running a daily tests 5000-node tests and when we see that it's failing, when we find the culprit we simply want to revert this as fast as possible to make sure there is no other regression.

My ask is only to make this process smoother in case when only person in sig-scalability with approval permissions (wojtekt) is unavailable for some reason. Revert-only OWNERS seemed to be a good idea to me, but we can figure out some other solution here if this is infeasible.
 
On Mon, Jun 10, 2019, 09:05 Maciej Borsz <macie...@google.com> wrote:

Christoph Blecker

unread,
Jun 12, 2019, 1:31:22 PM6/12/19
to Stephen Augustus, Matt Matejczyk, Jordan Liggitt, Maciej Borsz, kubernetes-sig-release, kubernetes...@googlegroups.com, kubernetes-sig-testing, Wojciech Tyczynski, Andrzej Wasylkowski
Is there any case where waiting 12 hours would cause an undue burden on the project? I'm right now -1 on introducing a fast path on reverts.. as Jordan says, a revert can potentially cause more problems than the scalability issue we are trying to correct for. PRs need the same amount of attention and review if they're putting code in, or taking code out.

You received this message because you are subscribed to the Google Groups "kubernetes-sig-testing" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-te...@googlegroups.com.

To post to this group, send email to kubernetes-...@googlegroups.com.

Jordan Liggitt

unread,
Jun 12, 2019, 1:33:55 PM6/12/19
to Christoph Blecker, Stephen Augustus, Matt Matejczyk, Maciej Borsz, kubernetes-sig-release, kubernetes...@googlegroups.com, kubernetes-sig-testing, Wojciech Tyczynski, Andrzej Wasylkowski
I'd tend to agree. In the cases I'm aware of, urgent PRs (including reverts) needing top-level approval were able to get them in O(hours). That doesn't seem overly problematic. If we want to streamline reverts specifically, the revert bot being discussed in the spawned issue seems like a good path forward.

Davanum Srinivas

unread,
Jun 12, 2019, 1:36:18 PM6/12/19
to Jordan Liggitt, Christoph Blecker, Stephen Augustus, Matt Matejczyk, Maciej Borsz, kubernetes-sig-release, kubernetes...@googlegroups.com, kubernetes-sig-testing, Wojciech Tyczynski, Andrzej Wasylkowski
Makes sense. +1 to what Christoph and Jordan say about the O(hours) and new bot.

-- Dims


For more options, visit https://groups.google.com/d/optout.


--
Davanum Srinivas :: https://twitter.com/dims

Christoph Blecker

unread,
Jun 12, 2019, 1:54:47 PM6/12/19
to Davanum Srinivas, Jordan Liggitt, Stephen Augustus, Matt Matejczyk, Maciej Borsz, kubernetes-sig-release, kubernetes...@googlegroups.com, kubernetes-sig-testing, Wojciech Tyczynski, Andrzej Wasylkowski
What I don't understand is what a "revert bot" would do.. Open a revert PR? Or is it literally just to fast track a revert PR into the code base because it will bypass approval?

I'm reading through the thread in more detail, so I have a couple more thoughts:
- The 5k jobs run no matter what. A PR not being in doesn't cost more money.. it's money that would need to be spent either way. I think it's a misnomer to say we are "wasting" money.
- We could possibly provide a mechanism to hold on 5k jobs that we know are going to break to *save* money.
- If we have more code owners step up in child owner files, then we have less need for top level approvers.

Again, I just feel like a PR needs to be reviewed thoroughly by qualified approvers no matter what. In my opinion, don't think "reverts" should be treated any differently (by bots or otherwise).

If the problem is we need to broaden the approvers pool in general, then that's a separate discussion.
If we have PRs that are breaking scalability at the end of a release cycle, then we should be looking at why these PRs are coming in so late.

Tim Pepper

unread,
Jun 12, 2019, 5:49:39 PM6/12/19
to Christoph Blecker, Davanum Srinivas, Jordan Liggitt, Stephen Augustus, Matt Matejczyk, Maciej Borsz, kubernetes-sig-release, kubernetes...@googlegroups.com, kubernetes-sig-testing, Wojciech Tyczynski, Andrzej Wasylkowski

In some cases it is not been the PR coming in late, but the scalability regression data and debug analysis coming in late.  These are slow tests, the debug is complex, and there is a limited amount of folks contributing (https://docs.google.com/document/d/1hEpf25qifVWztaeZPFmjNiJvPo-5JX1z0LSvvVY5G2g/edit#heading=h.ukbaidczvy3r à one public meeting of the SIG this year???).

 

Either way though, whether risky late patches, slow analysis, or too small of a volunteer pool, I think special treatment to revert patch review is too reactive to the symptom versus addressing the broader issues and thus the wrong approach.  I’m a -1 on this proposal.

 

How can we shift left to get more meaningful collaboration earlier in the scalability domain?

 

 

-- 

Tim Pepper

Orchestration & Containers Lead

VMware Open Source Technology Center

Matt Matejczyk

unread,
Jun 13, 2019, 7:03:26 AM6/13/19
to Tim Pepper, Christoph Blecker, Davanum Srinivas, Jordan Liggitt, Stephen Augustus, Maciej Borsz, kubernetes-sig-release, kubernetes...@googlegroups.com, kubernetes-sig-testing, Wojciech Tyczynski, Andrzej Wasylkowski
Thanks everyone for participating in the discussion, I hope it will eventually end up in finding a solution.

I don't agree that it's a misnomer to say that we're "wasting" money.  In my opinion, we' re wasting not only money but also the limited scale test resources and our (sig-scalability) time. 
The revert PR mentioned in this discussion is a great example. Being so close to the release we needed to run the test manually yesterday to see if there are any other regressions.
These manual tests costed us extra money, our time, and blocked scale testing of other features (we have only 1 shared project for the manual tests). We wouldn't have to do it if we were able to revert the PR in time. 

Unfortunately, the O(hours) doesn't work well in our case because of the issues that were mentioned above. Being in Europe usually effectively means next working day and wasted another CI/CD run.  

I see comments suggesting improvements that in my opinion are orthogonal to this request, e.g. speeding up the scalability regression debugging. Please be aware that sig-scalability is actively working on that. 
The late debug analysis was mostly caused by the prow issues, which have been hitting us a lot in the last months. We've solved them this week by introducing our private Prow build/test cluster. 
We're also investing a lot in instrumentation of our scale tests which helps us debugging scalability issues faster. 
Our aim is to find culprits of the regressions within one working day since the regression is revealed in the scale tests, and, with the recent improvements, I think we're really close to being there. 

I also don't agree with the statements that reverts should be treated as any other PRs. When you submit a change that breaks something it's usually a good practice is to rollback it immediately and then investigate. From my experience, majority of rollbacks are safe.
Please also note that we put a lot of effort in debugging scalability issues and finding the culprit commits. When we revert things it's always backed by great deal of evidence and thorough evaluation. 
We don't plan to revert commits that we're not 100% sure that they are root cause of a regression. We don't plan to revert if we have any doubts that the revert may actually introduce another issues. 

I'm not insisting on any particular solution, but would really appreciate if we could come up with means for the scalability team to exercise rights that were agreed upon in the SIG Scalability Charter.

Thanks,

Matt


--
You received this message because you are subscribed to the Google Groups "kubernetes-sig-scale" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-s...@googlegroups.com.
To post to this group, send email to kubernetes...@googlegroups.com.

Tim Pepper

unread,
Jun 13, 2019, 12:09:22 PM6/13/19
to Matt Matejczyk, Christoph Blecker, Davanum Srinivas, Jordan Liggitt, Stephen Augustus, Maciej Borsz, kubernetes-sig-release, kubernetes...@googlegroups.com, kubernetes-sig-testing, Wojciech Tyczynski, Andrzej Wasylkowski

This very much depends on the nature of the thing you’re reverting.  If you’re reverting a bug fix, you are introducing a regression.  Since you have broken something should (or even just _could_ unwittingly?) your revert be itself reverted?  Some will answer ‘yes’ to this question.


This is not a silly contrived example either.  There are multiple existing examples of exactly this, especially in vendored dependencies, where one party bumps some code forward, another party hits an issue and reverts, another party hits the original issue and bumps the code forward similarly again.

 

How do you intend to fix prevent that if not though left shifting proactive communication and collaboration?

 

Block and revert is a largely passive action from the rest of the community’s perspective.  When you make a revert, you must proactively be immediately also opening an issue and owning the shepherding of that issue through a special workflow leading the various stakeholders to a place where their issue and your issue are both resolved.  If you’re not going to do a normal up front review and consensus driving flow, then you definitely must own that afterwards reactively.  You haven’t indicated how this is done today or if it is a requirement you feel you own.  Neither the SIG charter nor the https://github.com/kubernetes/community/blob/master/sig-scalability/block_merges.md document indicates that there is any such required follow on action led by and owned by SIG Scalability.

 

(NOTE: These situations are also compounded by reverts typically not having additional highly detailed info in the commit message or PR description AND much of our trivial bug fix and even critical vendored code commit/PR message history being opaque messages like “bump” or “pull in ver 1.2.3”.  This isn’t something I put on SIG Scalability, but beg all parties to improve.)

 

 

-- 

Tim Pepper

Orchestration & Containers Lead

VMware Open Source Technology Center

 

From: Matt Matejczyk <mmate...@google.com>
Date: Thursday, June 13, 2019 at 2:35 AM
To: Tim Pepper <tpe...@vmware.com>
Cc: Christoph Blecker <cble...@gmail.com>, Davanum Srinivas <dav...@gmail.com>, Jordan Liggitt <lig...@google.com>, Stephen Augustus <Ste...@agst.us>, Maciej Borsz <macie...@google.com>, kubernetes-sig-release <kubernetes-...@googlegroups.com>, "kubernetes...@googlegroups.com" <kubernetes...@googlegroups.com>, kubernetes-sig-testing <kubernetes-...@googlegroups.com>, Wojciech Tyczynski <woj...@google.com>, Andrzej Wasylkowski <wasyl...@google.com>
Subject: Re: [kubernetes-sig-testing] Re: Feature request: Revert-only OWNERS in kubernetes/kubernetes?

 

I also don't agree with the statements that reverts should be treated as any other PRs. When you submit a change that breaks something it's usually a good practice is to rollback it immediately and then investigate. From my experience, majority of rollbacks are safe.

Tim Hockin

unread,
Jun 14, 2019, 6:58:32 PM6/14/19
to Stephen Augustus, Maciej Borsz, Davanum Srinivas, steering, kubernetes-sig-architecture, Jordan Liggitt, kubernetes...@googlegroups.com, kubernetes-sig-testing, kubernetes-sig-release, Wojciech Tyczynski
Apparently my reply only went to sig-scale.

Hey all.  I read through the thread.  Here are my thoughts (as if you asked :)

SIG-Scalability is chartered with this authority, so they should be able to act on it.  I do not believe they have or would act irresponsibly, but like so many other things in our community, we'll be watching and adjust course if we need to.

I don't quite buy that revert PRs are exactly the same as other PRs.  In no way am I  advocating for carelessness or ignorance of context -- PRs should be reverted if and when it is "safe" to do so, and I trust sig-scalability to make that call, if they have to, without another top-level approver.  Now, if a PR fixes a release blocking bug but introduces a release blocking scalability issue - which state is better?  Both are release-blocking.  Context matters.

IMO a bot that approves sig-scale revert PRs is a good compromise that provides visibility and delegation.  It does not answer the "safety" question -- only a human can do that.

Until such time as a bot can be installed, I support granting top-level approval to a SMALL set of sig-scale people, with the caveat that it is for reverts and if they abuse it, there will be trouble.  Ideally it would be a couple people from different companies from different time-zones, and they would seek each other's approvals in a pinch.

Concretely, assume a PR that they want to roll back mid-day CET.  That's middle of night in US.  Person X from Google could make a PR and get approval from person Y at Red Hat (just as examples).  If they can not reach agreement that it is important, it has to wait until others can weigh in.

Let's give people the benefit of the doubt here - it's not like sig-scale is made up of a bunch of newcomers who don't understand the system.  They are tackling some of the hardest problems we have and deserve a modicum of trust.

> we wouldn't generally want to add approvers only for certain types of changes.

It's not without precedent.  We have at least one commented case in k/k/OWNERS.

> We could possibly provide a mechanism to hold on 5k jobs

Yes, but this is a mitigation, not a solution.  A Big Red Button on scale tests seems like a nice idea, especially as the bill shifts to CNCF.


Tim

You received this message because you are subscribed to the Google Groups "steering" group.
To unsubscribe from this group and stop receiving emails from it, send an email to steering+u...@kubernetes.io.
To post to this group, send email to stee...@kubernetes.io.
Visit this group at https://groups.google.com/a/kubernetes.io/group/steering/.
To view this discussion on the web visit https://groups.google.com/a/kubernetes.io/d/msgid/steering/CAFQm5yQ9UX-_ZsE1Q6bjnzwJ6vCoSCdc333CQ%2BXDG9DVaQP%2BOg%40mail.gmail.com.

Brendan Burns

unread,
Jun 17, 2019, 11:50:40 AM6/17/19
to Stephen Augustus, Tim Hockin, Maciej Borsz, Davanum Srinivas, steering, kubernetes-sig-architecture, Jordan Liggitt, kubernetes...@googlegroups.com, kubernetes-sig-testing, kubernetes-sig-release, Wojciech Tyczynski
I'm a little worried about the flow here. If we give sig-scalability the right to revert PRs we're going to make it very difficult for the original PR author to get the PR re-submitted.

Note, I'm not talking about obvious, smoking gun that PR needs performance improvements kind of PRs, but rather the subtle "we think it might be this PR" kind of PRs.

I know that in the past we have reverted PRs that we thought were the cause, only to find out that it wasn't the root cause. I worry that if we allow sig-scalability to be aggressive here, it will be challenging for the original author to "fix" their PR since they won't have the resources/experience to run the scale tests.

Ultimately, I trust the sig-scalability folks to be able to make the decisions here, but I want to make certain that they hold reverts to a _really_ high bar for proof before they enact a revert. Part of the value in having a neutral third-party observer for such reverts is that they can hear arguments from both sides before the revert is enacted.

Again, this is in the case of ambiguous PRs, not obvious smoking guns.

--brendan



From: 'Tim Hockin' via steering <stee...@kubernetes.io>
Sent: Friday, June 14, 2019 3:58 PM
To: Stephen Augustus
Cc: Maciej Borsz; Davanum Srinivas; steering; kubernetes-sig-architecture; Jordan Liggitt; kubernetes...@googlegroups.com; kubernetes-sig-testing; kubernetes-sig-release; Wojciech Tyczynski
Subject: Re: [k8s-steering] Re: Feature request: Revert-only OWNERS in kubernetes/kubernetes?
 

Tim Hockin

unread,
Jun 17, 2019, 12:28:30 PM6/17/19
to Brendan Burns, Stephen Augustus, Maciej Borsz, Davanum Srinivas, steering, kubernetes-sig-architecture, Jordan Liggitt, kubernetes...@googlegroups.com, kubernetes-sig-testing, kubernetes-sig-release, Wojciech Tyczynski
I totally agree, of course.  I assume that is the intention and I have no reason to think otherwise.  Great power, yadda yadda ...

Tim Hockin

unread,
Jun 18, 2019, 2:25:01 PM6/18/19
to Aaron Crickenberger, Brendan Burns, Stephen Augustus, Maciej Borsz, Davanum Srinivas, steering, kubernetes-sig-architecture, Jordan Liggitt, kubernetes...@googlegroups.com, kubernetes-sig-testing, kubernetes-sig-release, Wojciech Tyczynski
On Tue, Jun 18, 2019 at 9:31 AM Aaron Crickenberger <spi...@google.com> wrote:
>
> I have no binding say in this since I'm not in kubernetes/kubernetes/OWNERS, but I just want to put it out there, since I tried to state this before, and feel like it was echoed by others.
>
> Why is it our responsibility to make a shortcut for this one SIG, and not their responsibility to have trained someone up through the same reviewer and approver path that everyone else has to follow to accomplish what they need? If we're making exceptions for this SIG, what else are we going to make exceptions for?

The exception is that this is what they are chartered for. Drawing on
experience, the ability to allow relatively "fresh" changes to be
reverted by less-authorized users has been very useful inside Google
(where, admittedly, the monorepo helps).

> A thing sig-scalability can do, if they need to change what their jobs run off of in a timely manner so urgent that they cannot wait for an existing approver, is to run their jobs off of forks or branches under their control. Changes to what their jobs run off of are completely under their approval (https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes/sig-scalability/OWNERS).

This is not a bad idea.

> I don't think we're very good at the "trust but verify" part that compliments "great responsibility". To use the prior art Tim mentions as an example, https://github.com/kubernetes/community/pull/598 is closed, pointing to https://github.com/kubernetes/kubectl/pull/179, which is also closed and points nowhere. Do we still need someone in root OWNERS for this?
>
> - aaron
>> You received this message because you are subscribed to the Google Groups "kubernetes-sig-architecture" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-arch...@googlegroups.com.
>> To post to this group, send email to kubernetes-si...@googlegroups.com.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-sig-architecture/CAO_Rewa2OvDBFMpuMqozny0jYhW5HeWoFiZOhqm243hn6dQCKg%40mail.gmail.com.

Aaron Crickenberger

unread,
Jun 19, 2019, 4:30:25 PM6/19/19
to Tim Hockin, Brendan Burns, Stephen Augustus, Maciej Borsz, Davanum Srinivas, steering, kubernetes-sig-architecture, Jordan Liggitt, kubernetes...@googlegroups.com, kubernetes-sig-testing, kubernetes-sig-release, Wojciech Tyczynski
I have no binding say in this since I'm not in kubernetes/kubernetes/OWNERS, but I just want to put it out there, since I tried to state this before, and feel like it was echoed by others.

Why is it our responsibility to make a shortcut for this one SIG, and not their responsibility to have trained someone up through the same reviewer and approver path that everyone else has to follow to accomplish what they need?  If we're making exceptions for this SIG, what else are we going to make exceptions for?

A thing sig-scalability can do, if they need to change what their jobs run off of in a timely manner so urgent that they cannot wait for an existing approver, is to run their jobs off of forks or branches under their control.  Changes to what their jobs run off of are completely under their approval (https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes/sig-scalability/OWNERS).

I don't think we're very good at the "trust but verify" part that compliments "great responsibility".  To use the prior art Tim mentions as an example, https://github.com/kubernetes/community/pull/598 is closed, pointing to https://github.com/kubernetes/kubectl/pull/179, which is also closed and points nowhere.  Do we still need someone in root OWNERS for this?

- aaron

On Mon, Jun 17, 2019 at 9:28 AM 'Tim Hockin' via kubernetes-sig-architecture <kubernetes-si...@googlegroups.com> wrote:
You received this message because you are subscribed to the Google Groups "kubernetes-sig-architecture" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-arch...@googlegroups.com.
To post to this group, send email to kubernetes-si...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-sig-architecture/CAO_Rewa2OvDBFMpuMqozny0jYhW5HeWoFiZOhqm243hn6dQCKg%40mail.gmail.com.

Brendan Burns

unread,
Jun 19, 2019, 11:59:30 PM6/19/19
to Tim Hockin, Aaron Crickenberger, Stephen Augustus, Maciej Borsz, Davanum Srinivas, steering, kubernetes-sig-architecture, Jordan Liggitt, kubernetes...@googlegroups.com, kubernetes-sig-testing, kubernetes-sig-release, Wojciech Tyczynski
I think Aaron captures my concerns quite succinctly.

I think that the "prove it in your branch, then argue for revert" approach seems like a really good way to get the right options for people...

--brendan



From: Tim Hockin <tho...@google.com>
Sent: Tuesday, June 18, 2019 11:24 AM
To: Aaron Crickenberger
Cc: Brendan Burns; Stephen Augustus; Maciej Borsz; Davanum Srinivas; steering; kubernetes-sig-architecture; Jordan Liggitt; kubernetes...@googlegroups.com; kubernetes-sig-testing; kubernetes-sig-release; Wojciech Tyczynski

Subject: Re: [k8s-steering] Re: Feature request: Revert-only OWNERS in kubernetes/kubernetes?
On Tue, Jun 18, 2019 at 9:31 AM Aaron Crickenberger <spi...@google.com> wrote:
>
> I have no binding say in this since I'm not in kubernetes/kubernetes/OWNERS, but I just want to put it out there, since I tried to state this before, and feel like it was echoed by others.
>
> Why is it our responsibility to make a shortcut for this one SIG, and not their responsibility to have trained someone up through the same reviewer and approver path that everyone else has to follow to accomplish what they need?  If we're making exceptions for this SIG, what else are we going to make exceptions for?

The exception is that this is what they are chartered for.  Drawing on
experience, the ability to allow relatively "fresh" changes to be
reverted by less-authorized users has been very useful inside Google
(where, admittedly, the monorepo helps).

> A thing sig-scalability can do, if they need to change what their jobs run off of in a timely manner so urgent that they cannot wait for an existing approver, is to run their jobs off of forks or branches under their control.  Changes to what their jobs run off of are completely under their approval (https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fkubernetes%2Ftest-infra%2Fblob%2Fmaster%2Fconfig%2Fjobs%2Fkubernetes%2Fsig-scalability%2FOWNERS&amp;data=02%7C01%7Cbburns%40microsoft.com%7Cd138316ee10f43a39f6d08d6f41a46af%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636964791038440737&amp;sdata=Pn2Xvq9BNZV%2BWjxnJzYkG%2BpFuNZKYgKll0kW8w3rlGg%3D&amp;reserved=0).


This is not a bad idea.

> I don't think we're very good at the "trust but verify" part that compliments "great responsibility".  To use the prior art Tim mentions as an example, https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fkubernetes%2Fcommunity%2Fpull%2F598&amp;data=02%7C01%7Cbburns%40microsoft.com%7Cd138316ee10f43a39f6d08d6f41a46af%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636964791038440737&amp;sdata=hIZNnsnCojhtVOiDwtg3KBrU13aYopvRAdkZhQt3jBc%3D&amp;reserved=0 is closed, pointing to https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fkubernetes%2Fkubectl%2Fpull%2F179&amp;data=02%7C01%7Cbburns%40microsoft.com%7Cd138316ee10f43a39f6d08d6f41a46af%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636964791038440737&amp;sdata=cQNQL0DpUuuek04iN6%2BZSGBDQJKU4%2FM5BIX2DQiJoAo%3D&amp;reserved=0, which is also closed and points nowhere.  Do we still need someone in root OWNERS for this?

>>>
>>> On Wed, Jun 12, 2019 at 6:04 AM Davanum Srinivas <dav...@gmail.com> wrote:
>>>
>>> I am in US East TZ. I can volunteer for helping with approving reverts as well.
>>>
>>> -- Dims
>>>
>>> On Wed, Jun 12, 2019 at 4:39 AM 'Maciej Borsz' via kubernetes-sig-scale <kubernetes...@googlegroups.com> wrote:
>>>

>>>
>>> If revert-only OWNERS is infeasible, I think that just another (next to Wojtek) Europe time zone approver will help a lot.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jun 11, 2019 at 10:31 AM Maciej Borsz <macie...@google.com> wrote:
>>>
>>>
>>>
>>> On Mon, Jun 10, 2019 at 3:21 PM Stephen Augustus <Ste...@agst.us> wrote:
>>>
>>> This goes back to the reviewers/approvers pools for SIG Scalability subprojects. The SIG needs to work on increasing it to more than just a few people for 1.16.
>>>
>>> Sorry, I don't understand. The 'sig scalability subprojects' page you mentioned earlier shows OWNERS only for sig-scalability-owned code (like kubemark's code). This is not an issue I meant.
>>>
>>> The problem is not about reverting changes to sig-scalability-owned code. The issue is that sometimes we need to revert changes to kubernetes/kubernetes repo as we find that they are breaking scalability tests (and more importantly the scalability of the kubernetes overall).
>>>
>>> The reason why we (sig-scalability) need to manually revert PRs is that presubmits covers only basic 100 node cluster while some issues can be detected only in 5000-node scale. We are running a daily tests 5000-node tests and when we see that it's failing, when we find the culprit we simply want to revert this as fast as possible to make sure there is no other regression.
>>>
>>> My ask is only to make this process smoother in case when only person in sig-scalability with approval permissions (wojtekt) is unavailable for some reason. Revert-only OWNERS seemed to be a good idea to me, but we can figure out some other solution here if this is infeasible.
>>>
>>>
>>> On Mon, Jun 10, 2019, 09:05 Maciej Borsz <macie...@google.com> wrote:
>>>
>>>
>>>
>>> On Mon, Jun 10, 2019 at 2:56 PM Jordan Liggitt <lig...@google.com> wrote:
>>>
>>> Have there been examples of urgent reverts needed that did not find an approver in O(hours)? I tend to agree with Stephen that we wouldn't generally want to add approvers only for certain types of changes.
>>>

>>> If we had to revert similar PR e.g. this week (when Wojtek is on vacation), we would have a problem finding an approver for it.
>>>
>>> I think that preferably I would like to have more than one person in the team that is able to approve such PRs, but if it's infeasible could you suggest some other European time zone approver we can ask for help in such cases?
>>>
>>> Thanks,
>>> Maciej
>>>
>>>
>>>
>>> On Mon, Jun 10, 2019 at 5:07 AM Stephen Augustus <Ste...@agst.us> wrote:
>>>
>>> (take two; I didn't realize that kubernetes...@googlegroups.com was the current address for the SIG. Also, I don't believe external users have permissions to post to gke-kubernetes-scalability, so sending that list to bcc.)
>>>
>>> On Mon, Jun 10, 2019 at 5:02 AM Stephen Augustus <Ste...@agst.us> wrote:
>>>
>>> (+ SIG Scalability, SIG Testing)
>>>
>>> AFAIK, that's not a thing we do today i.e., revert-only OWNERs. I've added SIG Testing to see if that's a thing they'd ever want to do.
>>> Committing and reverting code has similar potential impact, so I'd imagine we'd want to continue along the path of not weighting the two.
>>>
>>> I'm a big +1 to adding more reviewers/approvers to scalability OWNERs, but that's a question/task for SIG Scalability and their subprojects[1], so I've added them here.
>>>
>>> -- Stephen
>>>

>>>
>>> On Mon, Jun 10, 2019 at 2:52 AM 'Maciej Borsz' via kubernetes-sig-release <kubernetes-...@googlegroups.com> wrote:
>>>
>>> Hi all,
>>>
>>> In sig-scalability we relatively often have to manually revert PRs affecting gce scalability.
>>> So far, we used wojtekt's superpowers to approve such reverts, but he is a single person in our team that has such permissions (and e.g. this week he is on vacation).
>>>
>>> So I wanted to ask to add more folks from our team to OWNERS in kubernetes/kubernetes repo. Given limited knowledge of the codebase, I was thinking that maybe giving us approval power only for reverts makes sense. It would be sufficient for our work and limit the risk.
>>>
>>> Few questions:
>>> 1. What do you think?
>>> 2. (I'm not that familiar with github) Is it technically possible to distinguish revert PRs from other PRs?
>>>
>>> Thanks,
>>> Maciej
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "kubernetes-sig-release" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-re...@googlegroups.com.
>>> To post to this group, send email to kubernetes-...@googlegroups.com.
>>> To view this discussion on the web visit https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgroups.google.com%2Fd%2Fmsgid%2Fkubernetes-sig-release%2FCA%252BbcoB5T9nZqezFZ1vJc5y8BaeC%252BxV77yZbumtSM4L000cnqfQ%2540mail.gmail.com&amp;data=02%7C01%7Cbburns%40microsoft.com%7Cd138316ee10f43a39f6d08d6f41a46af%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636964791038450732&amp;sdata=%2F%2BcCb6Lq6DH2rizVBezu0yn2Q74g6cILur%2FjCS5qngY%3D&amp;reserved=0.
>>> For more options, visit https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgroups.google.com%2Fd%2Foptout&amp;data=02%7C01%7Cbburns%40microsoft.com%7Cd138316ee10f43a39f6d08d6f41a46af%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636964791038450732&amp;sdata=%2Bp4GYIzt5viISKVxdqzXGFO2tIjVGkw%2FlaQlO4Y3XNo%3D&amp;reserved=0.

>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "kubernetes-sig-release" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-re...@googlegroups.com.
>>> To post to this group, send email to kubernetes-...@googlegroups.com.

>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "kubernetes-sig-scale" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-s...@googlegroups.com.
>>> To post to this group, send email to kubernetes...@googlegroups.com.
>>> To view this discussion on the web visit https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgroups.google.com%2Fd%2Fmsgid%2Fkubernetes-sig-scale%2FCA%252BbcoB5i8TiDK%253D9n%252BM7kH3RK6r7KQWOAWZRbgwpqBOxgtt-4%252BA%2540mail.gmail.com&amp;data=02%7C01%7Cbburns%40microsoft.com%7Cd138316ee10f43a39f6d08d6f41a46af%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636964791038450732&amp;sdata=pKj8yMDsmho8UK57%2Frtb6%2FR33avw2yLBPmfAJeHehLA%3D&amp;reserved=0.
>>> For more options, visit https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgroups.google.com%2Fd%2Foptout&amp;data=02%7C01%7Cbburns%40microsoft.com%7Cd138316ee10f43a39f6d08d6f41a46af%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636964791038450732&amp;sdata=%2Bp4GYIzt5viISKVxdqzXGFO2tIjVGkw%2FlaQlO4Y3XNo%3D&amp;reserved=0.
>>>
>>>
>>>
>>> --
>>> Davanum Srinivas :: https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftwitter.com%2Fdims&amp;data=02%7C01%7Cbburns%40microsoft.com%7Cd138316ee10f43a39f6d08d6f41a46af%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636964791038450732&amp;sdata=43QFzP8qeiDX%2FSLHf5%2B%2FQUerlP5ozZuwzhjIMgEL0Dc%3D&amp;reserved=0

>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "steering" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to steering+u...@kubernetes.io.
>>> To post to this group, send email to stee...@kubernetes.io.
>>> Visit this group at https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgroups.google.com%2Fa%2Fkubernetes.io%2Fgroup%2Fsteering%2F&amp;data=02%7C01%7Cbburns%40microsoft.com%7Cd138316ee10f43a39f6d08d6f41a46af%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636964791038450732&amp;sdata=v8Tqktvv%2Fn%2Bjb%2BEZyqNJ4bFyCpHa2F2vdwAyt3otlt0%3D&amp;reserved=0.
>>> To view this discussion on the web visit https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgroups.google.com%2Fa%2Fkubernetes.io%2Fd%2Fmsgid%2Fsteering%2FCAFQm5yQ9UX-_ZsE1Q6bjnzwJ6vCoSCdc333CQ%252BXDG9DVaQP%252BOg%2540mail.gmail.com&amp;data=02%7C01%7Cbburns%40microsoft.com%7Cd138316ee10f43a39f6d08d6f41a46af%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636964791038450732&amp;sdata=rXU8s0hfA3Hvk7eBpHzylgOTiCRdjtiSI5%2BevsOFH%2Fc%3D&amp;reserved=0.

>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "steering" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to steering+u...@kubernetes.io.

>>
>> --
>> You received this message because you are subscribed to the Google Groups "kubernetes-sig-architecture" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-arch...@googlegroups.com.
>> To post to this group, send email to kubernetes-si...@googlegroups.com.
>> To view this discussion on the web visit https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgroups.google.com%2Fd%2Fmsgid%2Fkubernetes-sig-architecture%2FCAO_Rewa2OvDBFMpuMqozny0jYhW5HeWoFiZOhqm243hn6dQCKg%2540mail.gmail.com&amp;data=02%7C01%7Cbburns%40microsoft.com%7Cd138316ee10f43a39f6d08d6f41a46af%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636964791038450732&amp;sdata=95WRe0m%2FanSyDC5Ih%2FcKlhLIn6YP%2FWIlpGebIq3Jiww%3D&amp;reserved=0.
>> For more options, visit https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgroups.google.com%2Fd%2Foptout&amp;data=02%7C01%7Cbburns%40microsoft.com%7Cd138316ee10f43a39f6d08d6f41a46af%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636964791038450732&amp;sdata=%2Bp4GYIzt5viISKVxdqzXGFO2tIjVGkw%2FlaQlO4Y3XNo%3D&amp;reserved=0.

Wojciech Tyczynski

unread,
Jun 25, 2019, 3:57:10 AM6/25/19
to Brendan Burns, Tim Hockin, Aaron Crickenberger, Stephen Augustus, Maciej Borsz, Davanum Srinivas, steering, kubernetes-sig-architecture, Jordan Liggitt, kubernetes...@googlegroups.com, kubernetes-sig-testing, kubernetes-sig-release
  Thank you all for the discussion and sorry for jumping so late (3 weeks of vacation are always good thing).

  I went through the whole conversation and I think that there were couple orthogonal things mentioned in this thread,
which I'm going to respond below.

1. The power to revert for SIG Scalability.
 This is part of the charter that was community-approved. If we want to have the discussion about it again, let's have it,
but until that's resolved we still should have this right.
 One aspect of it is what and when can we revert and whether there are any follow-ups needs for it. Those are all valid
points and I agree we should probably clarify it in more details in charter. We should look into clarifying it in Q3.

2. Scalability reviewers/approvers
 This is not something that solves the problem. The PRs that require reverting are not the ones for things owned by our SIG
but for things that are owned by other SIGs. Note that scalability can be broken by any single change in any part of the system.
So I don't think this discussion is relevant to this topic (and it's not relevant to people outside of SIG scalability).

3. Adding "revert approvers".
 While I personally don't share the concerns here that the powers may be misused (technically when I'm around we could
already revert anything except from api changes, with me being top-level approver), I can probably understand that they
may appear. I want to emphasize Tim`s point that if bug fix is introducing scalability regression, release is blocked
with the PR fixing it and without it. So it doesn't really matter for the release if we revert it or not.

Brendan - to your point about reverts, do you have any example on your mind when we reverted a PR to check if it helps?
The only examples that I'm aware of were actually consulted with authors of those PRs so those weren't really like that.
All the PRs approved by SIG Scalability should have proved impact - if there are other example, I would like to be aware of them.

4. Other mitigations of the problem.
 There were couple of ideas, but i want to point to one that we weren't thinking about before. Which is dedicated branch
that Aaron suggested. I think this is a really great idea that I think we should try to explore in a bit more details. And
potentially experiment with.

 And there is one more that I think noone has been thinking before, which I actually think we should seriously consider.
Which is slight modifications of our test schedule. Currently our tests start at 10am CET. So we often want to revert
something before then to not loose the run if we find something by then. That requires someone with approvals right
around that time. What if we would move that by couple hours so that tests would start somewhere around noon "US East Time"?
That would still mean that tests would finish before European morning (where scalability-related folks are based)
but gives us possibility for approver from at least US East Time (thank you Dims!) to approve before next pass start?

5. Other issues
 You may expect one more email in the context of scalability-related retrospective and AIs in the next day or so from me.


 So just to summarize, based on the above discussion, what I would suggest proceeding with is:
1. Adjusting schedule to mitigate the requirement for quick approval (see point 4 above)
2. Explore a bit deeper the idea of dedicated branch and figure out how exactly it would work 
 (I actually think it can really be done to mitigate all the issues, but it requires a bit more thinking)
3. Clarify couple details in the charter about reverting.

 Does that sounds reasonable?

 thanks
wojtek

Brendan Burns

unread,
Jun 25, 2019, 8:55:19 AM6/25/19
to Wojciech Tyczynski, Tim Hockin, Aaron Crickenberger, Stephen Augustus, Maciej Borsz, Davanum Srinivas, steering, kubernetes-sig-architecture, Jordan Liggitt, kubernetes...@googlegroups.com, kubernetes-sig-testing, kubernetes-sig-release
That sounds great to me. I will try to find examples, my concerns were based on my recollection (which could be old/dim)

And to be clear, none of this should be taken as an indication that we think people will do the wrong thing, just a question of baking different priorities.

From: Wojciech Tyczynski <woj...@google.com>
Sent: Tuesday, June 25, 2019 12:56:56 AM
To: Brendan Burns
Cc: Tim Hockin; Aaron Crickenberger; Stephen Augustus; Maciej Borsz; Davanum Srinivas; steering; kubernetes-sig-architecture; Jordan Liggitt; kubernetes...@googlegroups.com; kubernetes-sig-testing; kubernetes-sig-release

Davanum Srinivas

unread,
Jun 25, 2019, 1:46:26 PM6/25/19
to Brendan Burns, Wojciech Tyczynski, Tim Hockin, Aaron Crickenberger, Stephen Augustus, Maciej Borsz, steering, kubernetes-sig-architecture, Jordan Liggitt, kubernetes...@googlegroups.com, kubernetes-sig-testing, kubernetes-sig-release
I'd love to see them as well as i have been closely watching the scalability reverts and bisects especially during release crunch times.

thanks,
Dims
Reply all
Reply to author
Forward
0 new messages