COSI API review - Status `Conditions` discussion

Blaine Gardner

Dec 11, 2025, 3:41:31 PM
to sig-s...@kubernetes.io, xingy...@gmail.com, ms...@google.com, tho...@google.com, Blaine Gardner, mateusz.u...@gmail.com, Joel Speed
Hi all!

The sig-storage Container Object Storage Interface project is getting an API review from Joel Speed who does broader k/k API reviews.

Here is COSI’s v1alpha2 KEP PR reference, for anyone who wants to look for more context: https://github.com/kubernetes/enhancements/pull/4599

One of the things Joel has suggested is for us to use ‘Conditions’ in our status instead of the ‘ReadyToUse’ boolean and the timestamped error. However, I also understand that Conditions have been avoided in other sig-storage APIs in recent memory, so I’d like to open more discussion around this.

COSI has often used Volume Snapshotter as a reference when developing our API, which is where this boolean and this error originate. Xing reported that during development of the Volume Snapshotter API, there were discussions that included Michelle Au and Tim Hockin (CC’ed here) where Conditions were specifically avoided. If Michelle, Tim, or anyone has a memory or notes from those past discussions, it would be helpful for this conversation. Xing has noted that one point from the prior conversation was that Conditions should not be used to implement a state machine.

My goal is to arrive at a consensus in sig-storage about whether or not it is a continued recommendation to avoid status Conditions in storage APIs, along with reasoning to support this, since it is the opposite of k/k’s recommendation.

Thank you all for your time and input on this topic!
Blaine

-----

I’ll separate my own thoughts on this topic here:

Based on my interactions with Joel, I believe it is in the better interest of COSI to take Joel’s feedback and include conditions in our APIs. The boolean would become a “Ready” condition, and the error would become a to-be-named error condition. 

For errors, I believe that Conditions will be helpful for COSI to distinguish between controller errors and sidecar errors when debugging BucketAccess, which is passed from one to the other during its lifetime.
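As a rough illustration of that mapping (a sketch only, not part of the KEP): the `Condition` struct below is a local stand-in that mirrors the commonly used fields of `metav1.Condition`, and the condition type `ReconcileFailed`, the reason strings, and the helper `conditionsFromLegacy` are all hypothetical names chosen for this example. The error's source component is carried in the `Reason`, which is one possible way to tell controller errors from sidecar errors.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a local stand-in mirroring the commonly used fields of
// metav1.Condition; a real API would use the apimachinery type instead.
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// conditionsFromLegacy (hypothetical helper) maps the current status shape,
// a ReadyToUse boolean plus an error message, onto two conditions.
// errSource names the component that reported the error, for example
// "Controller" or "Sidecar"; an empty errMessage means no error.
func conditionsFromLegacy(readyToUse bool, errSource, errMessage string) []Condition {
	now := time.Now()

	ready := Condition{Type: "Ready", Status: "False", Reason: "NotProvisioned", LastTransitionTime: now}
	if readyToUse {
		ready.Status, ready.Reason = "True", "Provisioned"
	}

	// "ReconcileFailed" is a placeholder for the to-be-named error condition.
	failed := Condition{Type: "ReconcileFailed", Status: "False", Reason: "NoError", LastTransitionTime: now}
	if errMessage != "" {
		failed.Status = "True"
		failed.Reason = errSource + "Error" // e.g. "SidecarError"
		failed.Message = errMessage
	}

	return []Condition{ready, failed}
}

func main() {
	conds := conditionsFromLegacy(true, "Sidecar", "granting bucket access failed")
	for _, c := range conds {
		fmt.Printf("%s=%s reason=%s message=%q\n", c.Type, c.Status, c.Reason, c.Message)
	}
}
```

With this shape, someone debugging a BucketAccess could read the `Reason` to see which component last reported a failure, without parsing the message text.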

For the “Ready” condition, I believe Conditions will be more helpful for understanding the system state. Something we have seen is that some object storage backends rotate bucket keys automatically with some frequency. A COSI reconcile failure after such a rotation would merit transitioning from ready to non-ready even after initial provisioning is successful. It might be beneficial for COSI to distinguish between initial provisioning success versus follow-up success. We have also observed that self-hosted object stores can change endpoints over their lifetimes. For example, an admin may update an HTTP store to use TLS, which requires updating the user-facing view to a new HTTPS endpoint. If COSI were to encounter an error during the reconcile that provides this update, it would be beneficial for admins to be made aware.

As a note against the “Ready” Condition, we will have to take care to clarify what “Ready” means, and we may want to be more specific than just “Ready”. It is straightforward for the COSI system to determine whether initial resource provisioning was successful or not. It is more complicated to accurately understand whether the resource is *currently* ready when considering the cases I mentioned above where authentication or location info is changing. It seems useful to me to separate the ideas of “this succeeded once” from “this failed recently and may (or may not) affect availability.”
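One possible way to express that separation, sketched under assumptions: two conditions with the hypothetical names `InitiallyProvisioned` and `Synced`, where a later reconcile failure flips only `Synced`. The `Condition` struct is again a local stand-in for `metav1.Condition`, and `setCondition` loosely imitates the behavior of apimachinery's `meta.SetStatusCondition` by keeping `LastTransitionTime` when the status value does not change. None of these names come from the COSI API.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a local stand-in mirroring the commonly used fields of
// metav1.Condition.
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// setCondition inserts or updates a condition by Type. The transition time
// is preserved when the Status value is unchanged.
func setCondition(conds []Condition, next Condition) []Condition {
	for i := range conds {
		if conds[i].Type != next.Type {
			continue
		}
		if conds[i].Status == next.Status {
			next.LastTransitionTime = conds[i].LastTransitionTime
		}
		conds[i] = next
		return conds
	}
	return append(conds, next)
}

// findStatus returns the Status of the named condition, or "Unknown" if the
// condition has not been set yet.
func findStatus(conds []Condition, condType string) string {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status
		}
	}
	return "Unknown"
}

// recordReconcile (hypothetical) records one reconcile outcome. An empty
// failure string means success. Once InitiallyProvisioned is True, later
// failures leave it untouched and only flip Synced.
func recordReconcile(conds []Condition, failure string) []Condition {
	now := time.Now()
	if failure == "" {
		conds = setCondition(conds, Condition{Type: "InitiallyProvisioned", Status: "True", Reason: "Provisioned", LastTransitionTime: now})
		return setCondition(conds, Condition{Type: "Synced", Status: "True", Reason: "ReconcileSucceeded", LastTransitionTime: now})
	}
	if findStatus(conds, "InitiallyProvisioned") == "Unknown" {
		conds = setCondition(conds, Condition{Type: "InitiallyProvisioned", Status: "False", Reason: "ProvisioningFailed", Message: failure, LastTransitionTime: now})
	}
	return setCondition(conds, Condition{Type: "Synced", Status: "False", Reason: "ReconcileFailed", Message: failure, LastTransitionTime: now})
}

func main() {
	var conds []Condition
	conds = recordReconcile(conds, "")                                        // initial provisioning succeeds
	conds = recordReconcile(conds, "credentials rejected after key rotation") // follow-up reconcile fails
	for _, c := range conds {
		fmt.Printf("%s=%s reason=%s\n", c.Type, c.Status, c.Reason)
	}
}
```

In this sketch, "this succeeded once" is answered by `InitiallyProvisioned` and "this failed recently" by `Synced`; whether a failed sync actually affects availability is still left for the reader of the status to judge.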

I think that the current API could support these cases, but I think it would be useful to rename the ‘ReadyToUse’ boolean to ‘InitialProvisionSuccessful’ or something similar. This would indicate that the resource exists, even when there are subsequent reconcile issues. The existing error type could provide some hint that a subsequent provision encountered an error that might (or might not) limit accessibility.

Any suggestions on how to adequately separate initial versus follow-up readiness would be very welcome.

Again, many thanks for your attention!
Blaine

Tim Hockin

Dec 11, 2025, 6:14:48 PM
to Blaine Gardner, sig-s...@kubernetes.io, xingy...@gmail.com, ms...@google.com, Blaine Gardner, mateusz.u...@gmail.com, Joel Speed
In general boolean fields are a terrible idea.  Conditions are better, though still imperfect in some ways.  I do not recall why we might have avoided Conditions, especially in favor of a naked bool.

"Ready" has been problematic over and over, in particular when we want to change or sharpen what it means, ESPECIALLY when multiple input signals add up to "being ready".  Gateway API invested in a family of conditions which are more precise.  Endpoints had to add new conditions which capture the more specific individual signals.

Tim