Blaine Gardner
Dec 18, 2025, 4:10:15 PM
to Mateusz Urbanek, Tim Hockin, Hemant Kumar, Hemant Kumar, sig-s...@kubernetes.io, xingy...@gmail.com, ms...@google.com, Blaine Gardner, Joel Speed
After reviews and feedback from Joel and Tim that pressed for more thought about what statuses (especially “Ready”) mean, and a quick meeting with Hemant, I have arrived at a final draft for our conditions. I’ll summarize it in this email.
There are many more details in this GH issue for anyone who wants to dig in or get more info about what I’ve written below:
* Detailed stakeholder needs identified in this comment: https://github.com/kubernetes-sigs/container-object-storage-interface/issues/207#issuecomment-3667263662
* The 4th draft corresponding to this email (and future discussion/iteration) occurs below the comment linked above
Hemant brought up a good point about having a clear status that represents readiness, even if it isn’t perfectly accurate to the backend state. He stressed that it's important for this status to be purely monotonic. From my understanding, in the PV/PVC world and Volume Snapshot world, once a resource has been successfully provisioned, it is counted as ready for users. A PV can transition from there to a failed state, but that failed state is final; there is no transitioning back to provisioned.
Hemant brought up cloud platform per-resource billing as a case where this monotonicity is critical. An unprovisioned resource is not billable. A provisioned resource is billable. A failed resource is no longer billable. But the resource should never flap between billable and unbillable states because this would make billing calculations substantially more difficult.
Because the “Ready” condition can be problematic, I’ve proposed that we use a “Provisioned” condition for this instead. Provisioned would be “Unknown” initially. It would transition to “True” after a successful, valid RPC provision. It can transition to “False” if COSI is able to confidently determine that provisioning is impossible or if a resource is definitively lost forever. Once “False”, this is final.
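To make the allowed transitions concrete, here is a minimal Go sketch of the monotonic rule as I understand it. It is not COSI code: the `ConditionStatus` type and the `NextProvisioned` helper are hypothetical names made up for illustration, and a real controller would presumably work with the standard Kubernetes condition types instead.

```go
package main

import "fmt"

// ConditionStatus mirrors the three values a Kubernetes condition can take.
// It is redefined here only to keep the sketch self-contained.
type ConditionStatus string

const (
	StatusUnknown ConditionStatus = "Unknown"
	StatusTrue    ConditionStatus = "True"
	StatusFalse   ConditionStatus = "False"
)

// NextProvisioned is a hypothetical helper. It returns the status to store
// for the "Provisioned" condition, rejecting any transition that would break
// monotonicity. Allowed moves: Unknown -> True, Unknown -> False,
// True -> False. "False" is final.
func NextProvisioned(current, desired ConditionStatus) (ConditionStatus, error) {
	if current == desired {
		return current, nil // no-op updates are always fine
	}
	switch current {
	case StatusUnknown:
		if desired == StatusTrue || desired == StatusFalse {
			return desired, nil
		}
	case StatusTrue:
		if desired == StatusFalse {
			return desired, nil
		}
	case StatusFalse:
		// Final state: nothing may follow it.
	}
	return current, fmt.Errorf("Provisioned cannot move from %s to %s", current, desired)
}

func main() {
	status := StatusUnknown
	// Walk through a provision, a permanent loss, and an attempted recovery.
	for _, desired := range []ConditionStatus{StatusTrue, StatusFalse, StatusTrue} {
		next, err := NextProvisioned(status, desired)
		fmt.Printf("%s -> %s: stored=%s err=%v\n", status, desired, next, err)
		status = next
	}
}
```

Under this rule a billing system only ever sees a resource become billable once and stop being billable once, which is the property Hemant was asking for.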
I also identified 2 other conditions that are important for helping end users and administrators troubleshoot issues, especially issues that come up after initial provisioning (day 2). These don’t strongly correlate to readiness in a systemic way, and they are not monotonic.
* “ProvisionFailed” (negative polarity) reports issues with RPC provisioning calls. This could be as minor as a temporary DNS outage or as bad as a total backend system loss. Storing this status is important for debugging.
* “ResourcesValidated” (positive polarity) reports when the resource and any referenced resources are in a good state where COSI is able to make RPC calls. Referenced resources are protected, but if they are forcefully removed, this will help debug why.
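As a rough illustration of how the two polarities might be read by tooling, here is a small Go sketch. The `Condition` struct, the `SetCondition` and `Problems` helpers, and the `BackendUnreachable` reason are all hypothetical and exist only for this example; only the condition type names come from the draft.

```go
package main

import "fmt"

// Condition is a trimmed-down stand-in for a Kubernetes condition,
// redefined here only to keep the sketch self-contained.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
	Reason string
}

// negativePolarity lists condition types where "True" signals a problem.
// Every other type is treated as positive polarity ("False" is the problem).
var negativePolarity = map[string]bool{"ProvisionFailed": true}

// SetCondition replaces the condition with the same Type, or appends it.
// Unlike "Provisioned", these conditions may flip back and forth freely.
func SetCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

// Problems returns the types of all conditions currently reporting trouble,
// taking polarity into account. "Unknown" is not counted as a problem.
func Problems(conds []Condition) []string {
	var out []string
	for _, c := range conds {
		bad := "False"
		if negativePolarity[c.Type] {
			bad = "True"
		}
		if c.Status == bad {
			out = append(out, c.Type)
		}
	}
	return out
}

func main() {
	conds := []Condition{
		{Type: "Provisioned", Status: "True"},
		{Type: "ResourcesValidated", Status: "True"},
		{Type: "ProvisionFailed", Status: "False"},
	}
	fmt.Println("healthy:", Problems(conds))

	// Day 2: a temporary outage makes an RPC fail. Provisioned stays True.
	conds = SetCondition(conds, Condition{Type: "ProvisionFailed", Status: "True", Reason: "BackendUnreachable"})
	fmt.Println("during outage:", Problems(conds))

	// The outage clears and the condition flips back.
	conds = SetCondition(conds, Condition{Type: "ProvisionFailed", Status: "False"})
	fmt.Println("recovered:", Problems(conds))
}
```

The point of the sketch is that "ProvisionFailed" and "ResourcesValidated" can change as often as needed for troubleshooting while "Provisioned" is left alone.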
Any minor issues that come up during reconciliation (like kube API throttling or resource update conflicts) are not important to represent on conditions. That is my understanding, at least. They will be emitted as Events (and obviously in logs) to be helpful.
Hemant recommended representing a first-class “Provisioned” status as an enum in the status. After thinking on that more, this sounds similar to “Phase” which is recommended against in the Kube API Conventions. I think it may be best to plan to omit this, but if anyone recommends otherwise, please chime in.
I’ll begin to make KEP changes based on this (final?) draft. If this plan sounds like it has issues, please chime in here or in the linked issue.
Thanks again everyone for the input, and happy holidays, happy New Year, and happy PTO!
Blaine