API review request: KEP-5823 Pod-level Checkpoint/Restore

Radostin Stoyanov

Apr 23, 2026, 11:52:53 AM
to kubernetes-a...@googlegroups.com, wg-checkpo...@kubernetes.io, Adrian Reber, andrey.ve...@gmail.com, peh...@redhat.com, lig...@google.com, timal...@gmail.com
Hello API reviewers,

At the Checkpoint/Restore Working Group we have been working on KEP-5823 [1], which introduces new APIs for Pod-level Checkpoint/Restore. After a meeting with SIG API Machinery on April 15, 2026, following a prior discussion on March 4, 2026, we were directed here to discuss our proposal. Based on the guidance from API Machinery, we define a PodCheckpoint as a separate namespace-scoped object modeled after VolumeSnapshot, with its own lifecycle. We have reached consensus on this direction. The remaining open question is the restore mechanism. We are currently evaluating the following options:
  • Option A: Using a "spec.restoreFrom" field on Pod that is handled by the kubelet during SyncPod, which calls "restorePodSandbox()" instead of "createPodSandbox()".
  • Option B: Using a PodRestore object that references a checkpoint, with the target Pod carrying a "restoreFrom" field as a reference. A restore controller verifies the checkpoint, creates a placeholder Pod, then calls the kubelet restore endpoint.
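To make the difference between the two options concrete, here is a minimal Go sketch of the two API shapes. The type names "CheckpointRef" and "PodSpecOptionA", the PodRestore field layout, and the helper "sandboxAction" are illustrative assumptions and are not taken from the KEP; only "restoreFrom", "PodRestore", "restorePodSandbox()" and "createPodSandbox()" come from the description above.

```go
package main

import "fmt"

// CheckpointRef names a PodCheckpoint in the same namespace.
// Hypothetical type: the KEP may define the reference differently.
type CheckpointRef struct {
	Name string
}

// PodSpecOptionA sketches Option A: the Pod spec itself carries the
// reference, and the kubelet acts on it during SyncPod.
type PodSpecOptionA struct {
	RestoreFrom *CheckpointRef // nil means a regular, non-restored Pod
}

// PodRestore sketches Option B: a separate namespaced object requests the
// restore, and a restore controller verifies the checkpoint before it
// creates the placeholder Pod. Field layout is an assumption.
type PodRestore struct {
	Name          string
	CheckpointRef CheckpointRef
}

// sandboxAction mirrors the Option A branch: restore the sandbox when
// restoreFrom is set, otherwise create it as usual. It only returns the
// name of the call that would be made.
func sandboxAction(spec PodSpecOptionA) string {
	if spec.RestoreFrom != nil {
		return "restorePodSandbox"
	}
	return "createPodSandbox"
}

func main() {
	fmt.Println(sandboxAction(PodSpecOptionA{}))
	fmt.Println(sandboxAction(PodSpecOptionA{RestoreFrom: &CheckpointRef{Name: "ckpt-1"}}))
}
```

In this sketch Option A needs no extra object, while Option B gives RBAC a distinct resource ("PodRestore") to grant or deny.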
The VolumeSnapshot precedent uses a "dataSource" field, which supports Option A. However, the original rationale for a separate object was to separate RBAC for "create pod" and "restore from checkpoint". We would like input on the following questions:
  1. Is it acceptable to add another case of a restoreFrom sub-field on Pod whose validation differs from the rest of the object? (i.e., choosing between Option A and Option B)
  2. Is there a precedent or recommended mechanism for restricting who can set a specific field in the Pod spec (admission webhook, ValidatingAdmissionPolicy, sub-resource)? Since checkpoints can contain sensitive data, we expect that users would want to use different permissions for "restoring from checkpoint" and "creating Pod".
  3. Should "restoreFrom" be immutable? An immutable field simplifies the semantics, but a mutable field preserves the option of in-place rollback later.
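As a rough illustration of questions 2 and 3, the following Go sketch shows what a field-level permission check at create time and an immutability check at update time could look like. Everything here is hypothetical: the simplified "Pod" type, the "canRestore" callback (a stand-in for a real authorization check such as one performed by an admission webhook), and the function names are assumptions for discussion, not proposed code.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a deliberately simplified stand-in for the real Pod type.
type Pod struct {
	Name        string
	RestoreFrom string // name of a PodCheckpoint; empty means no restore
}

// canRestore is a hypothetical authorization callback: it reports whether
// the user may restore from the named checkpoint.
type canRestore func(user, checkpoint string) bool

// admitCreate allows Pods without restoreFrom unconditionally and requires
// the separate "restore" permission when restoreFrom is set.
func admitCreate(user string, pod Pod, allowed canRestore) error {
	if pod.RestoreFrom == "" {
		return nil
	}
	if !allowed(user, pod.RestoreFrom) {
		return fmt.Errorf("user %q may not restore from checkpoint %q", user, pod.RestoreFrom)
	}
	return nil
}

var errImmutable = errors.New("restoreFrom is immutable")

// validateUpdate models the immutable variant of question 3: any change
// to restoreFrom after creation is rejected.
func validateUpdate(oldPod, newPod Pod) error {
	if oldPod.RestoreFrom != newPod.RestoreFrom {
		return errImmutable
	}
	return nil
}

func main() {
	// Toy policy: only "alice" holds the restore permission.
	onlyAlice := func(user, _ string) bool { return user == "alice" }

	pod := Pod{Name: "web", RestoreFrom: "ckpt-1"}
	fmt.Println(admitCreate("alice", pod, onlyAlice))
	fmt.Println(admitCreate("bob", pod, onlyAlice))
	fmt.Println(validateUpdate(pod, Pod{Name: "web", RestoreFrom: "ckpt-2"}))
}
```

A mutable "restoreFrom" would replace "validateUpdate" with a check that re-runs the permission test against the new checkpoint name.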
We are targeting 1.37 alpha, with code freeze at the end of May / early June. Given that timeline, we would like to request a slot on the agenda of the bi-weekly SIG Architecture meeting on April 30, 2026 to discuss this proposal. In the meantime, we are happy to answer any questions about the proposed APIs, either here on the list or on Slack [2].

[1] https://github.com/kubernetes/enhancements/pull/5851
[2] https://kubernetes.slack.com/messages/wg-checkpoint-restore

Many thanks,
Radostin