Container runtimes and Kubernetes


Sergey Kanzhelev

Oct 8, 2024, 3:48:51 AM
to kubernetes-sig-architecture, kubernetes-sig-node

Hi,


This is a follow-up to earlier discussions in SIG Node and to this comment: https://github.com/kubernetes/enhancements/pull/4895#discussion_r1791268767


The main question is whether we should keep promoting KEPs that depend on container runtime changes, and what a good k8s policy would be for validating CRI APIs in the real world. Before containerd 2.0, we always had support in at least CRI-O and containerd, which covered the majority of use cases and deployments. The question now is whether we can assume containerd 2.0 will ship before 1.32, and whether we should add a buffer for containerd 2.0 adoption after it is released. I will add this as a topic for the SIG Node meeting.


For the last couple of k8s releases, we have operated on the assumption that containerd 2.0 is right around the corner. However, it is still not released: it has 3 items officially tracked as release blocking and 13 open items in the milestone. Moreover, once it is released, we will not have good feedback from production environments while people are still adopting 2.0, which has many breaking changes.


I looked over a few recent KEPs to see where we are.


We have accumulated two KEPs in beta with only 2.0 support:


This release, the two KEPs targeting beta in 1.32 are:


Moreover, we have alpha KEPs that have gone unsupported on containerd for a long time:


And under the current containerd policy, any new KEP that needs a CRI API change will only be supported on containerd 2.0.


/Sergey


Davanum Srinivas

Oct 8, 2024, 6:57:04 AM
to Sergey Kanzhelev, kubernetes-sig-architecture, kubernetes-sig-node
Sergey,

Thanks for bringing this up! We are on RC5 for containerd 2.0 and the last thing that needs to land is the following:
https://github.com/containerd/errdefs/pull/21

containerd has a tradition of releasing during KubeCon, so unless something else pops up, I'd expect the same for 2.0. Regarding breaking changes, there is definitely an effort to document them, so any help there from folks here would be appreciated.

I think as long as k8s works with containerd 1.7.x (and the corresponding runc), degrading gracefully when the newer CRI API is not present, we should be OK. For future KEPs we can be more stringent during the review process itself. The alpha KEPs are definitely worrisome if no work is happening on the containerd side. Going forward, we should probably make promotion more conditional on work being done in both runtimes.

thanks,
Dims

--
You received this message because you are subscribed to the Google Groups "kubernetes-sig-architecture" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-arch...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-sig-architecture/CA%2Bsr0%2BAS%3DJau3hOxoz4OLWAxB8Mpc82N-k9RgFix4MhAMU%3DTDQ%40mail.gmail.com.


--
Davanum Srinivas :: https://twitter.com/dims

Kevin Hannon

Oct 8, 2024, 12:05:22 PM
to kubernetes-sig-architecture
Both Forensic checkpointing (KEP-2008) and Split Image Filesystem (KEP-4191) are also in beta. My understanding is that these features won't graduate to stable until containerd supports them. Forensic checkpointing is in good shape, as there is a PR up. Split image filesystem was marked as a containerd v2.1 feature.

Tim Hockin

Oct 8, 2024, 5:16:34 PM
to Kevin Hannon, kubernetes-sig-architecture
My take is that we (k8s) should not progress any CRI-dependent KEP
beyond Alpha, unless we have a majority of CRI implementations
supporting it (non-beta). It's tough because even if containerd 2.0
dropped today, it has ~zero adoption, so we would have an
on-by-default feature that almost nobody can possibly use, which seems
like a recipe for sadness and bug reports.

I'd advocate for any KEP which needs CRI changes to sit in Alpha until
containerd and CRI-O both support it, in GA releases, and those
releases have some level of adoption (not sure how to quantify).

Antonio Ojea

Oct 8, 2024, 5:32:16 PM
to Tim Hockin, Kevin Hannon, kubernetes-sig-architecture
> I think now that we have both the RuntimeHandlerFeatures and RuntimeFeatures in the CRI spec, we can decouple kubernetes features from runtime support. We can have features on by default in kube that aren't actually usable and the kubelet can gracefully detect and report that. We even could have example scheduler plugins that read these values from the node object and fail to schedule them.

That will make applications non-portable and create fragmentation. I
don't think it is a good idea to have features enabled by default that
do not work anywhere, and it is a bad idea to put the responsibility on
users to debug why things don't work.

IMHO Kubernetes should be a consistent platform where users deploy
things in YAML and things just work; the portability of applications
is key to the success of the project.
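The quoted idea (the kubelet reports per-handler runtime features on the Node object, and a scheduler plugin filters nodes on them) can be sketched roughly as below. This is a minimal sketch under stated assumptions: the `Node`, `RuntimeHandler` and `HandlerFeatures` types are hypothetical simplifications, only loosely modeled on the runtime-handler features reported in Node status, not the real `k8s.io/api` types, and user namespaces is just an example feature.

```go
package main

import "fmt"

// HandlerFeatures is a hypothetical, simplified stand-in for per-handler
// feature flags a node might report. A nil pointer means "not reported".
type HandlerFeatures struct {
	UserNamespaces          *bool
	RecursiveReadOnlyMounts *bool
}

// RuntimeHandler pairs a runtime handler name with its reported features.
type RuntimeHandler struct {
	Name     string
	Features HandlerFeatures
}

// Node is a hypothetical node view carrying only what this filter needs.
type Node struct {
	Name            string
	RuntimeHandlers []RuntimeHandler
}

// supportsUserNamespaces reports whether the node advertises user-namespace
// support for the given handler. Unreported support counts as "no".
func supportsUserNamespaces(n Node, handler string) bool {
	for _, h := range n.RuntimeHandlers {
		if h.Name == handler {
			return h.Features.UserNamespaces != nil && *h.Features.UserNamespaces
		}
	}
	return false
}

// filterNodes keeps the nodes that could run a pod requiring user namespaces
// with the given handler, as a scheduler filter plugin might.
func filterNodes(nodes []Node, handler string) []string {
	var fit []string
	for _, n := range nodes {
		if supportsUserNamespaces(n, handler) {
			fit = append(fit, n.Name)
		}
	}
	return fit
}

func main() {
	yes := true
	nodes := []Node{
		{Name: "node-new-runtime", RuntimeHandlers: []RuntimeHandler{
			{Name: "runc", Features: HandlerFeatures{UserNamespaces: &yes}},
		}},
		{Name: "node-old-runtime", RuntimeHandlers: []RuntimeHandler{
			{Name: "runc"}, // older runtime: feature not reported at all
		}},
	}
	fmt.Println(filterNodes(nodes, "runc")) // [node-new-runtime]
}
```

Such a filter only moves where the failure shows up: on a cluster with no capable nodes, a pod requesting the feature would simply stay unschedulable, which is the portability concern raised above.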

On Tue, 8 Oct 2024 at 23:16, 'Tim Hockin' via
kubernetes-sig-architecture

Paco Xu

Oct 8, 2024, 11:41:42 PM
to Antonio Ojea, Tim Hockin, Kevin Hannon, kubernetes-sig-architecture
> My take is that we (k8s) should not progress any CRI-dependent KEP
beyond Alpha, unless we have a majority of CRI implementations
supporting it (non-beta).  

In most cases, I prefer such a rule, unless the KEP includes a fallback mechanism that doesn't introduce security risks or losses, similar to https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4033-group-driver-detection-over-cri#kubelet. 
`If the runtime does not provide information about the cgroup driver, then kubelet will fall back to using its own configuration`



--
Anything is possible!

Antonio Ojea

Oct 11, 2024, 5:19:55 AM
to kubernetes-sig-architecture
Let me share an example of this problem: a user recently opened an issue in kind related to VolumeSource: OCI Artifact and/or Image (#4639).

Please read the issue, "ImageVolume support incomplete?" (kind#3745); it perfectly illustrates how bad the experience is and what fragmentation we would introduce in the ecosystem.

As we discussed in the other thread, "What if we eliminate the beta stage for new features?", and using Tim's categorization, the problem with this specific KEP is that it falls into category 2:

> 2) Extensions on existing APIs, which add one or more fields to one or more GA api group

So if the feature gate is beta and disabled by default, are those fields in the Pod.Spec considered alpha, beta, or GA?

I think it is important, from an architectural point of view, to reach consensus so that all SIGs apply these decisions consistently.

For features of this kind, which can cause fragmentation and make user-facing changes to APIs that are already GA, do we make them:

1. alpha until there is a reasonable adoption
2. beta disabled by default until there is reasonable adoption
3. ... 

I vote for 1, as it seems the safest.




Patrick Ohly

Oct 13, 2024, 1:37:25 PM
to Antonio Ojea, kubernetes-sig-architecture
Antonio Ojea <antonio.o...@gmail.com> writes:
> For these kind of features that can cause fragmentation and have user
> facing changes on APIs that are already GA do we make them:
>
> 1. alpha until there is a reasonable adoption
> 2. beta disabled by default until there is reasonable adoption
> 3. ...
>
> I vote for 1. as it seems the safest

And how is a feature supposed to obtain "reasonable adoption" when it's
alpha and no-one is supposed to depend on it? That limits its adoption
to experiments or users who absolutely (desperately?) need it, which
probably is not what is meant by "reasonable adoption" - a classic
catch-22.

We would need to distinguish between "alpha, and we really mean it and
will probably still make breaking API changes" and "alpha, stable,
complete and no changes planned, please start adopting it". The second
then might as well just be called beta.

--
Best Regards

Patrick Ohly
Cloud Software Architect

Tim Hockin

Oct 13, 2024, 6:19:30 PM
to Patrick Ohly, Antonio Ojea, kubernetes-sig-architecture
For things that manifest as APIs in k/k and implementations elsewhere,
we DEPEND on the implementations to provide feedback. I think
"adoption" here means "by implementors" not end users.

Tim Bannister

Oct 14, 2024, 7:10:19 AM
to kubernetes-sig-architecture
The IETF looks for at least two independent implementations before advancing an RFC to Internet Standard. The implementations can be open source or proprietary but shouldn't share a common codebase or heritage.
I think that's a good approach; we shouldn't specify which implementations those must be or how popular they are.

When a new or changed behavior depends on an external component, we shouldn't enable the new thing by default until there are two external implementations that have their support for it available in a public release.
We should also be cautious about moving to beta, but the crucial point is about when we change default behavior.

I like the idea of bumping the tooling default (e.g., what you get from kubeadm) a minor release ahead of changing the component default (e.g., how kube-apiserver behaves unless overridden). We should decide which of those two promotion points is gated on having at least n external implementations.
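The proposed gate (turn a default on only after n independent implementations ship support in a public release) could be written down as a small check. This is an illustrative sketch only: the `Implementation` type and its `Lineage` and `PublicRelease` fields are hypothetical bookkeeping, and n = 2 follows the IETF-style rule mentioned above. Which promotion point (tooling default or component default) the check guards is left open, as in the message.

```go
package main

import "fmt"

// Implementation describes one external implementation of a feature.
// All fields are hypothetical bookkeeping for this sketch.
type Implementation struct {
	Name          string
	Lineage       string // shared codebase/heritage; equal values are not independent
	PublicRelease bool   // support shipped in a public (non-prerelease) release
}

// independentReleased counts distinct lineages with publicly released
// support, so several builds from one codebase count only once.
func independentReleased(impls []Implementation) int {
	seen := map[string]bool{}
	for _, im := range impls {
		if im.PublicRelease {
			seen[im.Lineage] = true
		}
	}
	return len(seen)
}

// mayEnableByDefault applies the gate: flip the default only once at least
// n independent, publicly released implementations exist.
func mayEnableByDefault(impls []Implementation, n int) bool {
	return independentReleased(impls) >= n
}

func main() {
	impls := []Implementation{
		{Name: "runtime-a", Lineage: "a", PublicRelease: true},
		{Name: "runtime-a-fork", Lineage: "a", PublicRelease: true},
		{Name: "runtime-b", Lineage: "b", PublicRelease: false},
	}
	// Only one independent lineage has shipped, so the default stays off.
	fmt.Println(mayEnableByDefault(impls, 2)) // false
}
```

Counting lineages rather than names is what encodes "no common codebase / heritage": a fork of an already-counted implementation does not move the gate.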


Tim Bannister
Senior lead consultant
The Scale Factory

Sergey Kanzhelev

Jan 8, 2025, 2:54:08 PM
to kubernetes-sig-node, kubernetes-sig-architecture
Hi,

Picking up this thread before the 1.33 release cycle opens: here is the document we discussed yesterday at SIG Node, describing how we want to approach feature development that requires container runtime changes: https://docs.google.com/document/d/1y42XrUPrm-6DZby1RQjexYYoNn822IRR6igsOiy_62c/edit?tab=t.0

I also added it as an agenda topic for tomorrow's SIG Architecture meeting in case there is in-person feedback.

/Sergey


Sergey Kanzhelev

Jan 15, 2025, 5:20:52 PM
to kubernetes-sig-node, kubernetes-sig-architecture
Hi,


/Sergey