[Action Required] Migrating CI Jobs to Community Infra / Default Cluster Turn-Down


Benjamin Elder

Feb 27, 2024, 5:42:18 PM
to dev, kubernetes-sig-testing, kubernetes-sig-k8s-infra, le...@kubernetes.io

Action Required: ALL CI jobs **MUST** be configured to use community build clusters, and not the default / `cluster: default`, by August 1st, 2024. After this date the old google.com Prow “build” cluster will be shut down so that we can prepare to migrate the control plane to community infrastructure early in the 1.32 release cycle, some time after the 1.31 thaw.


Additionally, any remaining non-K8s-Project CI jobs are already subject to removal and will be removed or migrated to other instances not funded by SIG K8s Infra in the immediate future. This does not affect any Kubernetes projects and is already nearly complete.


We’ve migrated the majority of jobs already (shoutout to Ricky Sadowski!) and will continue to do so for any jobs we reasonably can. Jobs that depend on resources / secrets for which the community cannot provide SIG K8s Infra provisioned replacements will not be migrated. See the F.A.Q. below for additional context.


Thanks for helping us achieve a more sustainable, openly operated, future for the project! - Benjamin Elder on behalf of the SIG K8s Infra leads




F.A.Q.


Where can I ask questions?


Please reach out to SIG K8s Infra or SIG Testing (slack, mailing lists) with any questions.


How do I know if my Job has moved?


Your job is still running in google.com if it does not specify the `cluster:` field in the spec or specifies `cluster: default`.


It has moved to community infra if it has one of:

`cluster: k8s-infra-prow-build`

`cluster: eks-prow-build-cluster`

`cluster: k8s-infra-prow-build-trusted`

`cluster: k8s-infra-kops-prow-build`


If you’re not sure, feel free to reach out to SIG Testing or SIG K8s Infra.
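As a rough illustration of the rule above, here is a minimal Python sketch that classifies a job by its `cluster:` value. It assumes each job has already been loaded into a plain dict (for example from the job config YAML); the function name `migration_status`, the status labels, and the sample job names are hypothetical and are not part of test-infra. Cluster names other than the ones listed in this FAQ are reported as "unknown" rather than guessed at.

```python
# Community build clusters named in this FAQ.
COMMUNITY_CLUSTERS = {
    "k8s-infra-prow-build",
    "eks-prow-build-cluster",
    "k8s-infra-prow-build-trusted",
    "k8s-infra-kops-prow-build",
}


def migration_status(job: dict) -> str:
    """Classify a job dict by its `cluster` value (hypothetical helper).

    Returns "legacy" when the field is missing, empty, or "default",
    "community" when it names one of the clusters listed in the FAQ,
    and "unknown" for any other value.
    """
    cluster = job.get("cluster")
    if not cluster or cluster == "default":
        return "legacy"
    if cluster in COMMUNITY_CLUSTERS:
        return "community"
    return "unknown"


if __name__ == "__main__":
    # Made-up jobs purely for demonstration.
    sample_jobs = [
        {"name": "example-job-a"},
        {"name": "example-job-b", "cluster": "default"},
        {"name": "example-job-c", "cluster": "k8s-infra-prow-build"},
    ]
    for job in sample_jobs:
        print(job["name"], migration_status(job))
```

This only restates the FAQ rule in code. For an authoritative report on real jobs, use the hack/cluster-migration tool in test-infra mentioned below.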


Will you move my job for me?


We’ve migrated the majority of jobs already (thanks again to Ricky Sadowski!) and will continue to work on this, but not all jobs are readily moved because some depend on non-community resources. If your job depends on a resource SIG K8s Infra cannot supply, you will need to work out alternatives. That includes secrets for accounts not owned by the Kubernetes project / CNCF (e.g. Azure). We have been reaching out to leads / representatives for impacted groups / vendors for O(years) and cannot continue to block migration.


You can check the hack/cluster-migration tool in test-infra for a report on job migration.


Jobs for e.g. Google repos that are not part of Kubernetes that have not already been migrated will be removed in the immediate future.


Why are we making this change?


SIG K8s Infra and SIG Testing have been working to migrate the project’s CI (“prow”) out of google.com GCP projects and into SIG K8s Infra managed accounts and resources for multiple years now. 


While prow.k8s.io has historically been run solely by a team at Google on google.com internally owned and funded GCP projects, and long predates SIG K8s Infra and community infra credits, the hope is that we can finish shifting the CI to community ownership by end of year. This will allow SIG K8s Infra to maintain access to the underlying infrastructure instead of only Google employees having direct access to critical project infrastructure. SIG K8s Infra managed resources have public budgets, configuration, access control, etc., as we have accomplished for other key infrastructure like registry.k8s.io.


Previously we’ve had to halt due to budget issues but we have a strong handle on that now and we’ve been making steady progress. At this point our biggest blocker is the remaining dependency on legacy CI resources (build clusters, their secrets, etc) that can only be managed by the existing control plane running in google.com. When this dependency is eliminated we can migrate the CI to run in a community cluster.

Today, SIG K8s Infra has replacement build clusters that are wholly community owned, with open configuration, budget, spend, etc. These clusters contain community-provisioned secrets and services that allow running E2E jobs on GCP, AWS and more. 


The CNCF has provided the project, through SIG K8s Infra, with resources from GCP, AWS, Equinix, Digital Ocean, Fastly … all of which are in accounts owned by the CNCF+K8s Infra and have funding commitments, and we continue to have more vendors come onboard (most recently Oracle).


Using resources provided through the CNCF gives us visibility into overuse and budget constraints, so that we avoid creating dependencies outsized versus funding sources. It also gives the project continuity in how things are provisioned and accessed, rather than depending on individuals at specific employers. Not doing this has repeatedly caused major problems for the project in the past and will be avoided going forward.


For more information on the CNCF credits program, see: https://www.cncf.io/credits/


Benjamin Elder

Feb 27, 2024, 5:47:03 PM
to dev, kubernetes-sig-testing, kubernetes-sig-k8s-infra, le...@kubernetes.io
An additional point of clarification:

Kubernetes release-blocking jobs are already required by policy to run on community infra, and both release-blocking and PR-blocking jobs run only on community clusters, which is enforced by a test-infra presubmit.

Kubernetes PR and Release blocking jobs in particular are 100% not impacted. They were the first to migrate.

Benjamin Elder

Jun 10, 2024, 3:52:16 PM
to dev, Benjamin Elder, kubernetes-sig-testing, kubernetes-sig-k8s-infra, le...@kubernetes.io
Reminder: All CI jobs must be migrated off of `cluster: default` by August 1st in order to unblock migrating the CI control plane to SIG K8s Infra.
Jobs that have not been migrated yet will be removed.

If your job is using:
- Digital Ocean: This is already CNCF funded and we're in touch with the CNCF to clear up this account and unblock migrating the jobs
- Azure: We're in touch with Microsoft and resources should be available soon
- vSphere: We're in touch with VMware and resources may eventually be available

In the meantime you may wish to consider using some of the other resources already available to the community instead (e.g. unit tests / integration tests / kind clusters, GCP, AWS, ...)

We've been helping where possible and most, but not all, CI jobs are migrated.

If you haven't yet: Please take a moment soon to check that CI jobs for your projects are migrated. If not, please help us prepare to shut down the default cluster.

Again, we're holding to this deadline because the CI migration _will_ be disruptive and we are aiming to carry it out early in the Kubernetes release cycle.
This migration is highly overdue and we don't intend to let it slip another release.

You can reach out to #sig-k8s-infra to discuss any questions; see also the FAQ earlier in this thread.

Thanks!
- Ben

Benjamin Elder

Jun 21, 2024, 3:06:55 PM
to dev, Benjamin Elder, kubernetes-sig-testing, kubernetes-sig-k8s-infra, le...@kubernetes.io
I'm periodically committing a snapshot of non-migrated jobs to a table here:

https://github.com/kubernetes/test-infra/blob/master/docs/job-migration-todo.md

Please take a look, thanks!

Benjamin Elder

Jul 23, 2024, 5:17:36 PM
to dev, kubernetes-sig-testing, kubernetes-sig-k8s-infra, le...@kubernetes.io
FYI: We have merged a test-infra change to restrict adding any new jobs that use the legacy clusters ahead of the August deadline.

We have made a ton of progress on these and job-migration-todo.md is dwindling, thanks to everyone who has pitched in here!

Thanks as well to everyone who joined in the migration plan review (announced to SIG K8s Infra and Testing) today in SIG Testing.
We're nearly ready to migrate the CI control plane and will update again soon with more specifics, still aiming for sometime shortly after 1.31 is released.