Hi folks, we would like to start a discussion on standardizing how OOM killer events are communicated between the container runtime and the Kubelet.
The motivation comes from the KEP we are working on in Kubernetes: “Retriable and non-retriable Pod failures for Jobs” (https://github.com/kubernetes/enhancements/issues/3329). In particular, for Beta, we are going to add a Pod condition “ResourceExhausted” to the Pod status whenever a pod’s container is killed by the OOM killer.
Currently, the leading implementations of the CRI set the container’s reason field to OOMKilled:
Containerd (see: https://github.com/containerd/containerd/blob/23f66ece59654ea431700576b6020baffe1a4e49/pkg/cri/server/events.go#L344 and https://github.com/containerd/containerd/blob/36d0cfd0fddb3f2ca4301533a7e7dcf6853dc92c/pkg/cri/server/helpers.go#L62)
However, there are two issues with the status quo:
The communication between container runtime and Kubelet is not standardized, leaving the reason field an unrestricted CamelCase string (see CRI API: https://github.com/kubernetes/cri-api/blob/3af67d6e7a5160e066444dac2a62e6218d67066b/pkg/apis/runtime/v1/api.proto#L1147).
There is no way to determine from within the Kubelet if the container was OOM killed due to exceeding its configured limits or due to the system running low on memory.
We suggest the following solutions (up for discussion):
For the first issue, we suggest extending the documentation of the CRI API field “reason” to say that the field should be set to “OOMKilled” if the container is killed by the OOM killer. This would ensure, in a backwards-compatible way, that systems which recognize OOM kill events by checking whether the reason field equals “OOMKilled” do not break in the future.
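To make the first suggestion concrete, here is a minimal Go sketch of the kind of consumer-side check that the documentation change is meant to keep working. The containerStatus type is a hypothetical, trimmed-down stand-in for the CRI ContainerStatus message, and isOOMKilled is an illustrative helper, not existing Kubelet code. The non-OOM reason values in main are only sample inputs.

```go
package main

import "fmt"

// ReasonOOMKilled is the value the proposal would document for the CRI
// "reason" field. It matches what the runtimes set today.
const ReasonOOMKilled = "OOMKilled"

// containerStatus is a hypothetical stand-in for the CRI ContainerStatus
// message; only the two fields discussed in this thread are included.
type containerStatus struct {
	Reason  string
	Message string
}

// isOOMKilled reports whether the runtime marked the container as OOM killed.
// It relies on exact equality, which is the behavior the documentation change
// would freeze.
func isOOMKilled(s containerStatus) bool {
	return s.Reason == ReasonOOMKilled
}

func main() {
	// Sample inputs for illustration only.
	statuses := []containerStatus{
		{Reason: "OOMKilled"},
		{Reason: "Error"},
		{Reason: "Completed"},
	}
	for _, s := range statuses {
		fmt.Printf("%-10s oomKilled=%t\n", s.Reason, isOOMKilled(s))
	}
}
```

Any consumer written this way keeps working as long as runtimes continue to emit exactly “OOMKilled”, which is why documenting the value is backwards compatible.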
For the second issue, we suggest one of the following:
extend the CRI API documentation of the message field so that implementations communicate the information via message in a standard way;
introduce a new dedicated field, such as “oom_reason”;
standardize “OOMKilled” as a prefix for the OOM kill reasons. In this approach we would introduce a pair of new reasons: OOMKilledNamespaceMemoryExceeded and OOMKilledMemoryPressure. However, this might be risky, as the current implementations in containerd and CRI-O are 5 years old, so many systems may already depend on the field being equal to “OOMKilled”.
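As an illustration of the prefix option and its risk, here is a minimal Go sketch. The classify and legacyIsOOMKilled helpers and the oomCause type are hypothetical names introduced for this example. The sketch assumes that OOMKilledNamespaceMemoryExceeded corresponds to the container exceeding its configured limit and OOMKilledMemoryPressure to the system running low on memory.

```go
package main

import (
	"fmt"
	"strings"
)

// oomPrefix is the reason value runtimes set today; under the prefix option
// it would also be the common prefix of the new, more specific reasons.
const oomPrefix = "OOMKilled"

// oomCause is a hypothetical classification used only in this sketch.
type oomCause string

const (
	causeNotOOM         oomCause = "not OOM"
	causeUnspecified    oomCause = "OOM, cause unspecified"
	causeLimitExceeded  oomCause = "OOM, container limit exceeded"
	causeMemoryPressure oomCause = "OOM, system memory pressure"
)

// classify shows how a prefix-aware consumer could interpret the reason field.
// The mapping of the two new reasons to causes is an assumption of this sketch.
func classify(reason string) oomCause {
	switch {
	case reason == "OOMKilledNamespaceMemoryExceeded":
		return causeLimitExceeded
	case reason == "OOMKilledMemoryPressure":
		return causeMemoryPressure
	case strings.HasPrefix(reason, oomPrefix):
		return causeUnspecified
	default:
		return causeNotOOM
	}
}

// legacyIsOOMKilled mimics an existing consumer that compares for exact
// equality and therefore knows nothing about the new reasons.
func legacyIsOOMKilled(reason string) bool {
	return reason == oomPrefix
}

func main() {
	reasons := []string{
		"OOMKilled",
		"OOMKilledNamespaceMemoryExceeded",
		"OOMKilledMemoryPressure",
		"Error",
	}
	for _, r := range reasons {
		fmt.Printf("%-34s prefix-aware=%q legacy=%t\n", r, classify(r), legacyIsOOMKilled(r))
	}
}
```

The legacy check returns false for both new reasons, which is the compatibility risk mentioned above. A dedicated field such as “oom_reason” would avoid this, since the reason field could stay equal to “OOMKilled”.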
While both issues are related and important from the perspective of our work, we could also consider decoupling them: the first issue, standardization, has a higher priority and should just be about freezing the status quo. Fixing the second issue may involve substantial work on the side of the container runtime implementations to convey the information.
Please advise how we should proceed with the effort to agree on the solution and proceed with the implementation.

Thanks,
Michał