[Action Required] Migrating CI Jobs to Community Infra / Default Cluster Turn-Down


Benjamin Elder

Feb 27, 2024, 5:42:18 PM
to dev, kubernetes-sig-testing, kubernetes-sig-k8s-infra, le...@kubernetes.io

Action Required: ALL CI jobs **MUST** be configured to use community build clusters and not the default / `cluster: default` by August 1st, 2024. After this date the old google.com Prow “build” cluster will be shut down to prepare for migrating the control plane to community infrastructure early in the 1.32 release cycle, some time after the 1.31 thaw.


Additionally, any remaining non-K8s-Project CI jobs have already been subject to removal and will be removed or migrated to other instances not funded by SIG K8s Infra in the immediate future. This does not affect any Kubernetes projects and is already nearly complete.


We’ve migrated the majority of jobs already (shoutout to Ricky Sadowski!) and will continue to do so for any jobs we reasonably can. Jobs that depend on resources / secrets for which the community cannot provide SIG K8s Infra-provisioned replacements will not be migrated. See the F.A.Q. below for additional context.


Thanks for helping us achieve a more sustainable, openly operated future for the project! - Benjamin Elder on behalf of the SIG K8s Infra leads




F.A.Q.


Where can I ask questions?


Please reach out to SIG K8s Infra or SIG Testing (slack, mailing lists) with any questions.


How do I know if my Job has moved?


Your job is still running in google.com if it does not specify the `cluster:` field in the spec or specifies `cluster: default`.


It has moved to community infra if it has one of:

`cluster: k8s-infra-prow-build`

`cluster: eks-prow-build-cluster`

`cluster: k8s-infra-prow-build-trusted`

`cluster: k8s-infra-kops-prow-build`


If you’re not sure, feel free to reach out to SIG Testing or SIG K8s Infra.
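
As a rough illustration of the rule above, here is a minimal Python sketch that classifies a single job entry by its `cluster` value. It assumes the job config has already been parsed into a dict with an optional `cluster` key; the names `COMMUNITY_CLUSTERS` and `job_location` are hypothetical and are not part of test-infra.

```python
# Community-owned build clusters, as listed in this F.A.Q.
COMMUNITY_CLUSTERS = {
    "k8s-infra-prow-build",
    "eks-prow-build-cluster",
    "k8s-infra-prow-build-trusted",
    "k8s-infra-kops-prow-build",
}


def job_location(job: dict) -> str:
    """Classify a parsed job entry by its `cluster` field.

    Assumption: `job` is a dict parsed from the job config, with an
    optional "cluster" key. This helper is illustrative only.
    """
    cluster = job.get("cluster")
    # A missing/empty cluster or an explicit "default" means the job
    # still runs on the old google.com build cluster.
    if not cluster or cluster == "default":
        return "google.com"
    if cluster in COMMUNITY_CLUSTERS:
        return "community"
    # Any other value is not covered by this announcement.
    return "unknown"


if __name__ == "__main__":
    # Hypothetical job entries for demonstration.
    for job in (
        {"name": "example-job-a"},
        {"name": "example-job-b", "cluster": "default"},
        {"name": "example-job-c", "cluster": "k8s-infra-prow-build"},
    ):
        print(job["name"], "->", job_location(job))
```

Any job reported as `google.com` here would need to be moved before the deadline; a value reported as `unknown` is worth raising with SIG Testing or SIG K8s Infra.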


Will you move my job for me?


We’ve migrated the majority of jobs already (thanks again to Ricky Sadowski!) and will continue to work on this, but not all jobs can be readily moved because some depend on non-community resources. If your job depends on a resource SIG K8s Infra cannot supply, you will need to work out alternatives. That includes secrets for accounts not owned by the Kubernetes project / CNCF (e.g. Azure). We have been reaching out to leads / representatives for impacted groups / vendors for O(years) and cannot continue to block migration.


You can check the hack/cluster-migration tool in test-infra for a report on job migration.


Jobs for repos that are not part of Kubernetes (e.g. Google repos) and that have not already been migrated will be removed in the immediate future.


Why are we making this change?


SIG K8s Infra and SIG Testing have been working to migrate the project’s CI (“prow”) out of google.com GCP projects and into SIG K8s Infra managed accounts and resources for multiple years now. 


While prow.k8s.io has historically been run solely by a team at Google on internally owned and funded google.com GCP projects, and long predates SIG K8s Infra and community infra credits, the hope is that we can finish shifting the CI to community ownership by end of year. This will allow SIG K8s Infra to maintain access to the underlying infrastructure, instead of only Google employees having direct access to critical project infrastructure. SIG K8s Infra managed resources have public budgets, configuration, access control, etc., as we have accomplished for other key infrastructure like registry.k8s.io.


Previously we’ve had to halt due to budget issues but we have a strong handle on that now and we’ve been making steady progress. At this point our biggest blocker is the remaining dependency on legacy CI resources (build clusters, their secrets, etc) that can only be managed by the existing control plane running in google.com. When this dependency is eliminated we can migrate the CI to run in a community cluster.

Today, SIG K8s Infra has replacement build clusters that are wholly community owned, with open configuration, budget, spend, etc. These clusters contain community-provisioned secrets and services that allow running E2E jobs on GCP, AWS and more. 


The CNCF has provided the project, through SIG K8s Infra, with resources from GCP, AWS, Equinix, Digital Ocean, Fastly … all of which are in accounts owned by the CNCF+K8s Infra and have funding commitments, and more vendors continue to come onboard (most recently Oracle).


Using resources provided through the CNCF gives us visibility into overuse and budget constraints, so we avoid creating dependencies that outsize their funding sources. It also gives the project continuity in how things are provisioned and accessed, rather than depending on individuals at specific employers. Not doing this has repeatedly caused major problems for the project in the past and will be avoided going forward.


For more information on the CNCF credits program, see: https://www.cncf.io/credits/


Benjamin Elder

Feb 27, 2024, 5:47:03 PM
to dev, kubernetes-sig-testing, kubernetes-sig-k8s-infra, le...@kubernetes.io
An additional point of clarification:

Kubernetes release-blocking jobs are already required by policy to run on community infra, and both release- and PR-blocking jobs run only on community clusters, which is enforced by a test-infra presubmit.

Kubernetes PR and Release blocking jobs in particular are 100% not impacted. They were the first to migrate.