tl;dr: At the end of this week, some jobs may flake more often. See the resources at [1] for how to identify and deal with flakes.
Early on in the project's history, we modified ginkgo to automatically retry a test case N times if it failed; thus was born the --ginkgo.flakeAttempts=2 flag passed around to many of our e2e jobs. This has allowed flakes to hide in some jobs, while causing merge-blocking or release-blocking pain in others.
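To make the hiding effect concrete, here is a minimal sketch of retry-until-pass semantics. It is not ginkgo's actual implementation; run_with_flake_attempts and make_scripted_test are hypothetical names made up for illustration, and the sketch assumes a test counts as passing if any one attempt passes.

```python
from typing import Callable, Iterable, Tuple


def run_with_flake_attempts(test: Callable[[], bool], flake_attempts: int) -> Tuple[bool, int]:
    """Run `test` up to `flake_attempts` times (hypothetical helper).

    Assumption: the test is reported as passing if ANY attempt passes.
    Returns (passed, attempts_used).
    """
    for attempt in range(1, flake_attempts + 1):
        if test():
            return True, attempt
    return False, flake_attempts


def make_scripted_test(outcomes: Iterable[bool]) -> Callable[[], bool]:
    """Build a fake test that returns the given outcomes in order (hypothetical helper)."""
    it = iter(outcomes)
    return lambda: next(it)


# A flaky test that fails once, then passes.
# With two attempts, the first failure is retried and the flake is hidden.
hidden = run_with_flake_attempts(make_scripted_test([False, True]), flake_attempts=2)

# With a single attempt (the flag removed), the same failure fails the job.
exposed = run_with_flake_attempts(make_scripted_test([False, True]), flake_attempts=1)

print(hidden)   # (True, 2)
print(exposed)  # (False, 1)
```

The same fail-then-pass test is reported green with two attempts and red with one, which is why removing the flag should surface flakes that were previously invisible.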
Over the last few months we've slowly removed the flag's usage from all release-blocking jobs. Now that 1.17 has been released, it's time for us to rip the bandaid off and remove it from all remaining jobs that use it. If a test fails, the job will fail.
This will allow us to use the relatively quiet period of the next month to gather more data on which flakes are really impacting the project. For more discussion and details, see the issue [2].
The impacted jobs are listed in the issue [3].
The PR to remove this flag [4] will be merged by Friday, December 13th, unless there are strong objections.
For those interested in helping out with flakes, or with questions about what to do with flakes, please see [1]. It's time for us as a community to figure out how to deal with flakes more effectively; this is a first step to help us ascertain the scope of what we're dealing with.
- aaron