Following an attempt to clean up the release tooling with shellcheck (https://github.com/kubernetes/release/issues/726), we found that we were unable to stage releases.
Multiple fixes were merged in an attempt to bring us to a usable state.
An unintended and unexpected side effect of this was a cascading failure of multiple release-blocking jobs. A few examples:
- https://github.com/kubernetes/kubernetes/issues/79652
- https://github.com/kubernetes/kubernetes/issues/79668
- https://github.com/kubernetes/kubernetes/issues/79669

Ultimately, we decided that the right course of action was to revert to a known good state in the repo (https://github.com/kubernetes/release/pull/814) to stop the bleeding.
This implies that, in our current state, it is inadvisable to make any changes to the tooling in this repo.
As such, I'm advising the following course of action (h/t to @nikhita, @liggitt, and @BenTheElder for being a sounding board):
- [ ] (https://github.com/kubernetes/test-infra/pull/13328) Add a blockade for files that have the potential to impact releasing and CI signal
(this will require repo admins to explicitly approve and override the blockade to merge changes to critical tooling)
- [ ] Examine and document exactly why these release-blocking jobs failed
(they are using **_something_** in k/release; we need to figure out what those somethings are)
- [ ] Tag the repo after executing a successful release of Kubernetes
(this locks in a known good state of k/release that doesn't need to be `master`)
- [ ] Refactor the release tooling, and the jobs that depend on it, to pull a tag of k/release instead of `master`
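As a rough illustration of the last item above, a job could resolve which k/release ref to check out, defaulting to a pinned tag rather than `master`. This is a minimal Go sketch under stated assumptions: `RELEASE_TOOL_REF` is a hypothetical environment variable, `v0.1.0` is a placeholder tag (no such tag exists until the repo is tagged after a successful release), and the jobs are assumed to fetch k/release with a plain `git clone`.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// releaseRepoURL is the upstream k/release repository.
const releaseRepoURL = "https://github.com/kubernetes/release"

// resolveRef returns the k/release ref a job should check out.
// RELEASE_TOOL_REF is a hypothetical variable name used for illustration only.
// When it is unset or blank, the job falls back to a pinned, known-good tag
// instead of tracking master.
func resolveRef(getenv func(string) string, pinnedTag string) string {
	if ref := strings.TrimSpace(getenv("RELEASE_TOOL_REF")); ref != "" {
		return ref
	}
	return pinnedTag
}

// cloneArgs builds a shallow `git clone` of a single ref; --branch accepts
// either a branch name or a tag.
func cloneArgs(ref, dir string) []string {
	return []string{"git", "clone", "--depth", "1", "--branch", ref, releaseRepoURL, dir}
}

func main() {
	// "v0.1.0" is a placeholder tag, not a real k/release tag.
	ref := resolveRef(os.Getenv, "v0.1.0")
	fmt.Println(strings.Join(cloneArgs(ref, "release"), " "))
}
```

Because `git clone --branch` accepts tags as well as branches, bumping the pinned tag becomes an explicit, reviewable change instead of an implicit dependency on whatever happens to be at the tip of `master`.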
At this point, we will have gotten to a place where we can safely make changes to k/release without impacting CI. We will then:
- [ ] Write tests around the specific pieces of the tooling that caused the job failures (maybe https://github.com/sstephenson/bats?)
- [ ] Set up a presubmit job that can emulate one of the existing jobs that broke recently
For longer term goals, we should seek to:
- [ ] Write go tooling (and tests!) to replace the shell libraries (`lib/{common,gitlib,releaselib}`) and call these new tools in the existing release tooling
(this allows us to get some immediate benefit of a more robust language w/o having to completely refactor)
- [ ] Full refactor of existing tools (shell --> go)
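To make the first long-term item more concrete, here is a minimal sketch of what a small, testable Go replacement for a shell helper might look like. `ParseReleaseTag` and `ReleaseVersion` are hypothetical names; they are not a port of any specific function in `lib/releaselib`, and the accepted tag shape (`vMAJOR.MINOR.PATCH` with an optional `-alpha.N`, `-beta.N`, or `-rc.N` suffix) is an assumption based on the usual Kubernetes tag format.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// ReleaseVersion is a hypothetical type for illustration; it does not mirror
// any existing function in lib/releaselib.
type ReleaseVersion struct {
	Major, Minor, Patch int
	PreRelease          string // "alpha", "beta", "rc", or "" for a GA release
	PreReleaseNum       int
}

// releaseTagRE matches tags such as "v1.15.0" or "v1.16.0-beta.1".
var releaseTagRE = regexp.MustCompile(`^v(\d+)\.(\d+)\.(\d+)(?:-(alpha|beta|rc)\.(\d+))?$`)

// ParseReleaseTag validates a release tag and splits it into its components.
func ParseReleaseTag(tag string) (ReleaseVersion, error) {
	m := releaseTagRE.FindStringSubmatch(tag)
	if m == nil {
		return ReleaseVersion{}, fmt.Errorf("invalid release tag %q", tag)
	}
	atoi := func(s string) int {
		// The regexp guarantees digits; out-of-range errors are ignored in this sketch.
		n, _ := strconv.Atoi(s)
		return n
	}
	v := ReleaseVersion{Major: atoi(m[1]), Minor: atoi(m[2]), Patch: atoi(m[3]), PreRelease: m[4]}
	if m[5] != "" {
		v.PreReleaseNum = atoi(m[5])
	}
	return v, nil
}

func main() {
	for _, tag := range []string{"v1.15.0", "v1.16.0-beta.1", "1.15.0"} {
		v, err := ParseReleaseTag(tag)
		fmt.Println(tag, v, err)
	}
}
```

A pure function like this can be covered by an ordinary table-driven `go test`, and the existing shell scripts could call a small CLI wrapper around it while the larger shell-to-Go refactor is still in progress.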
(Some historical references:
https://github.com/kubernetes/kubernetes/pull/28922,
https://github.com/kubernetes/kubernetes/issues/16529,
https://github.com/kubernetes/kubernetes/issues/15560,
https://github.com/kubernetes/kubernetes/issues/8686)
Please take this as an initial assessment of the situation and feel free to provide feedback. :)