Re: [cncf-tag-runtime] Standardization of the OOM kill communication between container runtime and kubelet

Michał Woźniak

Oct 3, 2022, 11:41:22 AM
to mw21...@gmail.com, wg-b...@kubernetes.io, kubernete...@googlegroups.com, kubernete...@googlegroups.com, cncf-tag...@lists.cncf.io

On Mon, Oct 3, 2022 at 17:26, Michał Woźniak via lists.cncf.io <mw219725=gmai...@lists.cncf.io> wrote:

Hi folks, we would like to start a discussion on standardizing how OOM killer events are communicated between the container runtime and the kubelet.


The motivation comes from the KEP we are working on in Kubernetes: “Retriable and non-retriable Pod failures for Jobs“ (https://github.com/kubernetes/enhancements/issues/3329). In particular, for Beta, we are going to add a Pod condition “ResourceExhausted” to the Pod status whenever a pod’s container is killed by the OOM killer.


Currently, the leading CRI implementations (containerd and CRI-O) set the container’s reason field to OOMKilled.


However, there are two issues with the status quo:

  1. The communication between the container runtime and the kubelet is not standardized, leaving the reason field as an unrestricted CamelCase string (see CRI API: https://github.com/kubernetes/cri-api/blob/3af67d6e7a5160e066444dac2a62e6218d67066b/pkg/apis/runtime/v1/api.proto#L1147).

  2. There is no way to determine from within the kubelet whether the container was OOM killed because it exceeded its configured limits or because the system was running low on memory (see the sketch below).
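
To make the status quo concrete, here is a minimal Go sketch (not the actual kubelet code; the helper name wasOOMKilled is made up for illustration) of how a CRI client can detect an OOM kill today. It relies entirely on the containerd/CRI-O convention for the free-form reason field, and it cannot tell the two OOM causes apart:

package main

import (
	"fmt"

	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

// wasOOMKilled depends on the containerd/CRI-O convention of setting the
// free-form reason field to "OOMKilled"; nothing in the CRI guarantees this
// value, and the status carries no hint of why the kernel OOM killer fired
// (container limit exceeded vs. node memory pressure).
func wasOOMKilled(status *runtimeapi.ContainerStatus) bool {
	return status.GetState() == runtimeapi.ContainerState_CONTAINER_EXITED &&
		status.GetReason() == "OOMKilled"
}

func main() {
	// Example status as containerd/CRI-O report it today for an OOM-killed container.
	status := &runtimeapi.ContainerStatus{
		State:    runtimeapi.ContainerState_CONTAINER_EXITED,
		ExitCode: 137,
		Reason:   "OOMKilled",
	}
	fmt.Println(wasOOMKilled(status)) // true
}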


We suggest the following solutions (up for discussion):

For the first issue, we suggest extending the documentation of the CRI API field “reason” to say that it should be set to “OOMKilled” if the container is killed by the OOM killer. This way we would ensure (in a backwards-compatible way) that systems which recognize OOM kill events by observing a reason field equal to “OOMKilled” would not break in the future.


For the second issue we suggest one of the following:

  1. Extend the CRI API documentation of the message field so that implementations communicate the information via message in a standard way.

  2. Introduce a new dedicated field, such as “oom_reason”.

  3. Standardize “OOMKilled” as a prefix for OOM kill reasons. In this approach we would introduce a pair of new reasons: OOMKilledNamespaceMemoryExceeded and OOMKilledMemoryPressure. However, this might be risky: the current behavior of containerd and CRI-O is about five years old, so many systems may already depend on the field being exactly “OOMKilled” (see the sketch after this list).
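
To illustrate the compatibility risk of the third option, here is a small, hypothetical Go sketch; the suffixed reason constants are only the names proposed above, not values any runtime sets today. Consumers that compare the reason with exact equality would stop matching the new values, whereas prefix matching keeps working:

package main

import (
	"fmt"
	"strings"
)

// Reason values under option 3. Only "OOMKilled" is set by containerd and
// CRI-O today; the suffixed variants are the proposed additions.
const (
	reasonOOMKilled                = "OOMKilled"
	reasonOOMKilledNamespaceMemory = "OOMKilledNamespaceMemoryExceeded"
	reasonOOMKilledMemoryPressure  = "OOMKilledMemoryPressure"
)

// isOOMExactMatch mimics how existing systems are assumed to detect an OOM
// kill today: exact comparison. These consumers would miss the new reasons.
func isOOMExactMatch(reason string) bool {
	return reason == reasonOOMKilled
}

// isOOMPrefixMatch is what consumers would need under option 3: treating
// "OOMKilled" as a prefix rather than the full value.
func isOOMPrefixMatch(reason string) bool {
	return strings.HasPrefix(reason, reasonOOMKilled)
}

func main() {
	for _, r := range []string{reasonOOMKilled, reasonOOMKilledNamespaceMemory, reasonOOMKilledMemoryPressure} {
		fmt.Printf("%-35s exact=%-5v prefix=%v\n", r, isOOMExactMatch(r), isOOMPrefixMatch(r))
	}
}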


While both issues are related and important from the perspective of our work, we could also consider decoupling them: the first issue (standardization) has a higher priority and should amount to simply freezing the status quo, whereas fixing the second issue may require substantial work on the container runtime side to convey the information.


Please advise on how we should move forward with agreeing on a solution and proceeding with the implementation.

 
Thanks,
Michał

Peter Hunt

Oct 3, 2022, 12:13:25 PM
to Michał Woźniak, wg-b...@kubernetes.io, kubernete...@googlegroups.com, kubernete...@googlegroups.com, cncf-tag...@lists.cncf.io
Hey Michał,

CRI-O maintainer here. We probably should talk about this in either a sig-node call or a tag-runtime call. I'm open to either.

For initial thoughts: I definitely agree on standardizing the variable each implementation uses. I think we could use some discussion on the precise heuristic to determine which OOM reason is relevant. Also, I think talking about early oom killers (like oomd) and how they may help could be useful too.

Thanks
Peter
