Exception Request for KEP-5823: Pod-Level Checkpoint/Restore


Radostin Stoyanov

Jun 17, 2026, 5:29:03 AM
to sig-...@kubernetes.io, releas...@kubernetes.io, sig-r...@kubernetes.io, kubernetes-...@googlegroups.com
Hi all,

We would like to request an exception for the Pod-Level Checkpoint/Restore KEP.

Enhancement name: Pod-Level Checkpoint/Restore
Enhancement status: Alpha
SIG: SIG-Node
Additional time needed (in calendar days, due end of day AoE): 3 days

Reason this enhancement is critical for this milestone: Pod-level checkpoint/restore provides a foundation for reducing the cold-start time of AI inference workloads and enables fault tolerance for long-running jobs (e.g., model training) via periodic checkpoints. For example, distributed inference frameworks like NVIDIA Dynamo [1, 2] are already building out-of-tree workarounds to support this functionality. Landing an off-by-default alpha in 1.37 establishes an in-tree mechanism that the ecosystem can converge on, instead of each project implementing its own incompatible out-of-tree solution.

Risks from adding code late: Low. This is an enhancement-freeze exception for the KEP. The implementation will be reviewed within the code-freeze window.

Risks from cutting enhancement: Because multiple projects are building out-of-tree solutions, delaying Kubernetes development adds risk of ecosystem divergence and increases the maintenance and migration burden for these users. It would also delay the roadmap of the entire working group and of the many other KEPs that will begin building on Pod-level Checkpoint/Restore, and it limits integration with existing and upcoming APIs (e.g., Dynamic Resource Allocation for GPU/device checkpointing).


Many thanks,
Radostin

Dawn Chen

Jun 17, 2026, 11:17:04 AM
to Radostin Stoyanov, sig-...@kubernetes.io, releas...@kubernetes.io, sig-r...@kubernetes.io, kubernetes-...@googlegroups.com
Approved the request from the SIG Node perspective.
