API: the `status` litmus test and RBAC


Tim Hockin

unread,
Feb 5, 2021, 5:03:28 PM2/5/21
to kubernetes-sig-architecture
Hi all,

We've always sort of had this litmus test for status fields "If I erased the status struct, would everything in it be reconstructed (or at least reconstructible) from observation?"

We don't always adhere to it, and sometimes we fudge it and say yes "in theory".  Over the course of many API reviews I have come to wonder if this is really useful.  TTBOMK, we don't ever wipe status, nor even have a mechanism to do so.  We don't split resources to store status in a lower-SLO etcd or anything like that.  It is, effectively, as durable as spec.

So what is the status stanza good for?  First and foremost it delineates intention from result, or maybe user-owned from controller-owned.  Secondly it is frequently an RBAC attachment point - users can set anything in spec, but not status, and controllers can set anything in status, but not spec.

Because of this, we can "trust" fields in status in a way that we can not trust spec.  This is evident in the recent `externalIPs` CVE - spec fields are trusted FAR more than they should be, but very congruent status fields (load-balancer IP) were deemed a non-issue because most users can't write to status.

As a counter-example, PersistentVolume and PersistentVolumeClaim logic could probably be much simpler if we had fields we could trust.  Ditto Service.
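(For concreteness, the RBAC split described above is expressible today with plain rbac/v1 rules. This is just a sketch; "widgets" and "example.com" are made-up names:)

```go
package rbacsplit

import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Users get the main resource, which excludes the /status subresource,
// so they can write spec but not status.
var userRole = rbacv1.Role{
	ObjectMeta: metav1.ObjectMeta{Name: "widget-editor"},
	Rules: []rbacv1.PolicyRule{{
		APIGroups: []string{"example.com"},
		Resources: []string{"widgets"}, // note: does NOT include widgets/status
		Verbs:     []string{"get", "list", "watch", "create", "update", "patch", "delete"},
	}},
}

// The controller gets read access plus the /status subresource,
// so it can write status but not spec.
var controllerRole = rbacv1.Role{
	ObjectMeta: metav1.ObjectMeta{Name: "widget-controller"},
	Rules: []rbacv1.PolicyRule{
		{
			APIGroups: []string{"example.com"},
			Resources: []string{"widgets"},
			Verbs:     []string{"get", "list", "watch"},
		},
		{
			APIGroups: []string{"example.com"},
			Resources: []string{"widgets/status"},
			Verbs:     []string{"get", "update", "patch"},
		},
	},
}
```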

So my proposal here is this - we should either:

1) Get rid of the "if it got erased" litmus test, embrace spec/status as a primary edge for splitting RBAC access to objects, and encourage a request-acknowledge pattern for APIs which allow (but don't require) the user to provide input that could be sensitive.  I'm not proposing to go back and change existing APIs for no good reason, but to look for places that are artificially leaning on spec when they could instead use status.

2) Emphasize the litmus test, add a new top-level stanza to represent trusted-spec and to split RBAC, and encourage a request-acknowledge pattern for APIs which allow (but don't require) the user to provide input that could be sensitive.
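To make the request-acknowledge pattern concrete, the API shape under option 1 would look something like this (hypothetical types and field names, purely illustrative):

```go
package requestack

// FooSpec is user-writable: a request, not a grant.
type FooSpec struct {
	// RequestedIP is optional input from the user; it may be denied.
	RequestedIP string `json:"requestedIP,omitempty"`
}

// FooStatus is controller-writable (via the /status subresource) and is
// what other components should trust.
type FooStatus struct {
	// AllocatedIP is set by the controller once the request has been
	// validated and granted; it may differ from (or ignore) RequestedIP.
	AllocatedIP string `json:"allocatedIP,omitempty"`
}

type Foo struct {
	Spec   FooSpec   `json:"spec,omitempty"`
	Status FooStatus `json:"status,omitempty"`
}
```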

Curious to hear your thoughts.


Tim

Derek Carr

unread,
Feb 5, 2021, 5:14:50 PM2/5/21
to Tim Hockin, kubernetes-sig-architecture
Our past attempts to support in place resource resizing would have benefited from this discussion.

I think, with this discussion in mind, we could have moved the ResourcesAllocated block in that KEP out of spec and under status; we went back and forth on whether that was the right thing to do, and on whether the kubelet should have done its own checkpointing.

In general, I agree that status is primarily used as an RBAC boundary.


Tim Hockin

unread,
Feb 5, 2021, 5:18:47 PM2/5/21
to Derek Carr, kubernetes-sig-architecture
Derek,

That KEP was very much on my mind as I wrote this (since the discussion is just NOW wrapping up), but not that alone.

Tim

Bowei Du

unread,
Feb 5, 2021, 5:18:49 PM2/5/21
to Tim Hockin, kubernetes-sig-architecture
+1 to anything that would solve this issue.

We've had a lot of discussion about where to store controller-owned state; the object itself is the most natural place, but there is no reasonable home for it there.

Is there any controller implementation or test that we know of that actually verifies the controller demonstrates this API-convention behavior?

Bowei


Daniel Smith

unread,
Feb 5, 2021, 5:21:40 PM2/5/21
to Bowei Du, Tim Hockin, kubernetes-sig-architecture
+1 to dropping this "rule". We already are doing so in the Job KEP: https://github.com/kubernetes/enhancements/pull/2308#discussion_r567809465

We never made use of the property and aren't going to, so it doesn't make sense to keep it as a hard rule rather than a guideline.

Tim Hockin

unread,
Feb 5, 2021, 5:22:10 PM2/5/21
to Bowei Du, kubernetes-sig-architecture
If we want to embrace the litmus test, we should probably have a way to wipe status on an object and see what happens. :)
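A test could approximate that today by writing an empty status back through the status subresource. A client-go sketch, using Service purely as an example resource:

```go
package wipestatus

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// wipeServiceStatus zeroes a Service's status via the /status
// subresource, so we can observe whether controllers reconstruct it.
func wipeServiceStatus(ctx context.Context, cs kubernetes.Interface, ns, name string) error {
	svc, err := cs.CoreV1().Services(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	svc.Status = corev1.ServiceStatus{}
	_, err = cs.CoreV1().Services(ns).UpdateStatus(ctx, svc, metav1.UpdateOptions{})
	return err
}
```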

John Belamaric

unread,
Feb 5, 2021, 5:39:19 PM2/5/21
to Tim Hockin, Bowei Du, kubernetes-sig-architecture
I have one concern with this, and it's around the discipline that actually following this rule forces on how controllers utilize the underlying systems they manage. I mentioned this here: https://github.com/kubernetes/enhancements/pull/2308#discussion_r568933053 but to say it a different way:

If status must be reconstructable, then the system controlled by the controller must hold sufficient state to reconstruct that status. This prevents any resources from being leaked. My worry about relaxing this rule is that it could lead to situations where the status data drifts from the "reality" of the underlying resources. Essentially, if we do this then people will stuff pointers to resources in the status, instead of, say, putting a label on the underlying resource in the cloud. This leads to failure modes where we lose track of things we created, without a way to re-discover them.

Now...that's the theory. Does this rule *actually* force folks to do that? I'm not sure.

John

Tim Allclair

unread,
Feb 5, 2021, 5:41:34 PM2/5/21
to Tim Hockin, Bowei Du, kubernetes-sig-architecture
How is the "status updates must go through the status subresources" invariant being enforced? If it's not centrally enforced in API machinery, I'm not confident enough that we get it right everywhere to depend on it for security (e.g. it looks like we clear status in a few `PrepareForUpdate` methods, but not in the node strategy). It's also very unintuitive that users should not be granted access to write status, and I'm sure they are being given that access in many cases (likely through wildcard RBAC rules).

I definitely like the idea of controller-owned fields protected by RBAC, but if that's the intent of status we have a lot of room for improvement.

Daniel Smith

unread,
Feb 5, 2021, 5:42:56 PM2/5/21
to Tim Allclair, Tim Hockin, Bowei Du, kubernetes-sig-architecture
It is not enforced at all.

(it is enforced that you can't touch status via the regular endpoint or spec via the status endpoint, but that's entirely a different thing)

Tim Allclair

unread,
Feb 5, 2021, 5:45:04 PM2/5/21
to Daniel Smith, Tim Hockin, Bowei Du, kubernetes-sig-architecture
it is enforced that you can't touch status via the regular endpoint ...

That's actually what I was asking about. How is that enforced? 

Bowei Du

unread,
Feb 5, 2021, 5:50:04 PM2/5/21
to John Belamaric, Tim Hockin, kubernetes-sig-architecture
In my experience, the current status of this rule is purely aspirational (hence my question on whether people try to enforce this at all) -- more honored in the breach than the observance.

The designs I've seen usually put persistent state into `status` as a first cut. Fighting that intuitive flow without hard enforcement is going to be tough.

Bowei

Daniel Smith

unread,
Feb 5, 2021, 5:53:10 PM2/5/21
to Tim Allclair, Tim Hockin, Bowei Du, kubernetes-sig-architecture
It's cleared in PrepareFor{Create|Update} methods.
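Roughly, the strategy pair looks like this (a sketch with made-up Widget types, not copied from any real registry package):

```go
package registrysketch

import (
	"context"

	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// Widget stands in for an internal API type; the stubs below exist only
// so this sketch satisfies runtime.Object.
type Widget struct {
	Spec   WidgetSpec
	Status WidgetStatus
}

type WidgetSpec struct{ Replicas int32 }
type WidgetStatus struct{ ReadyReplicas int32 }

func (w *Widget) GetObjectKind() schema.ObjectKind { return schema.EmptyObjectKind }
func (w *Widget) DeepCopyObject() runtime.Object   { c := *w; return &c }

type widgetStrategy struct{}
type widgetStatusStrategy struct{}

// Main endpoint: drop any status the client sent, keeping what's stored.
func (widgetStrategy) PrepareForUpdate(ctx context.Context, obj, old runtime.Object) {
	newW, oldW := obj.(*Widget), old.(*Widget)
	newW.Status = oldW.Status
}

// /status endpoint: the inverse, protecting spec from the status writer.
func (widgetStatusStrategy) PrepareForUpdate(ctx context.Context, obj, old runtime.Object) {
	newW, oldW := obj.(*Widget), old.(*Widget)
	newW.Spec = oldW.Spec
}
```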

FWIW I do not think spec vs status should be a security boundary, it was not intended initially to be so. The difference is "how I want it to be" vs "how it actually is".

If you want a strong security boundary, I recommend these interventions in this order:
* split concerns into separate objects
* make your own subresources for your particular audience
* webhook admission which uses the SSA fieldset concept to lock fields to users (? speculative ?)
* use spec/status but carefully audit roles / permissions at runtime

Eric Tune

unread,
Feb 5, 2021, 6:07:33 PM2/5/21
to Daniel Smith, Tim Allclair, Tim Hockin, Bowei Du, kubernetes-sig-architecture
+1 to removing this rule.  I don't see how we could ever forget status, really, as that path has never been exercised, so critical systems will surely have unexpected behavior if we did start to forget status.  It isn't clear how it would actually work to forget status and recreate it.
If the API server forgot it, and clients were able to get objects with empty status before the controller could recreate it, then the clients might think that an object is not ready, or that some other terminal condition has not been reached.  Pods might all get removed from endpoints spontaneously.  It would not be good.



Eric Tune

unread,
Feb 5, 2021, 6:09:18 PM2/5/21
to Daniel Smith, Tim Allclair, Tim Hockin, Bowei Du, kubernetes-sig-architecture
What might be useful is guidance on (a) indicating which fields of status are intended for users, and which are controller state, and (b) suggested rate and size limits for such controller state.  In the Job KEP mentioned above, we had these questions.

Tim Hockin

unread,
Feb 5, 2021, 6:14:26 PM2/5/21
to John Belamaric, Bowei Du, kubernetes-sig-architecture
John, awesome point.  Sadly, I don't think it is having the result you want, despite the intention.  I like the goal, and I am a bit torn (hence my option 2).  I am afraid option 2 feels impractical and we'd end up doing net more work and not realizing the vision anyway.


Tim Hockin

unread,
Feb 5, 2021, 6:18:29 PM2/5/21
to Daniel Smith, Tim Allclair, Bowei Du, kubernetes-sig-architecture
On Fri, Feb 5, 2021 at 2:53 PM Daniel Smith <dbs...@google.com> wrote:
It's cleared in PrepareFor{Create|Update} methods.

FWIW I do not think spec vs status should be a security boundary, it was not intended initially to be so. The difference is "how I want it to be" vs "how it actually is".

I think that horse has left the barn.  Most resources have a single controller or a small number of pretty well-trusted controllers.  As such spec/status has a natural mapping.  We have lots of such examples.

For those fields that might be more promiscuously updated (status.conditions!) we should probably have distinct sub-resources.

Clayton Coleman

unread,
Feb 5, 2021, 6:24:44 PM2/5/21
to Tim Hockin, Daniel Smith, Tim Allclair, Bowei Du, kubernetes-sig-architecture
On Fri, Feb 5, 2021 at 6:18 PM 'Tim Hockin' via kubernetes-sig-architecture <kubernetes-si...@googlegroups.com> wrote:


On Fri, Feb 5, 2021 at 2:53 PM Daniel Smith <dbs...@google.com> wrote:
It's cleared in PrepareFor{Create|Update} methods.

FWIW I do not think spec vs status should be a security boundary, it was not intended initially to be so. The difference is "how I want it to be" vs "how it actually is".

I think that horse has left the barn.  Most resources have a single controller or a small number of pretty well-trusted controllers.  As such spec/status has a natural mapping.  We have lots of such examples.

For those fields that might be more promiscuously updated (status.conditions!) we should probably have distinct sub-resources.

Should is pretty different than do, can, or must.

You SHOULD separate your desired state from actual state (spec / status).
You SHOULD ensure only the people who can update status are the ones who should be doing so.
You SHOULD have single writer where possible.
You SHOULD have reconstructable state.
You SHOULD let controllers set fields in spec when those are desired intent.
You SHOULD consider subresources for when your object needs to adapt to an interface (Scale is a good example of what Tim is describing above).

It's really hard for me to imagine that any of these can be MUST.  The value of SHOULD is where tools can infer behavior.  Is inferring that status can be wiped in apply useful?  Yes.  Is having consistent fields in status (conditions, observed generation) useful and consistent?  Yes.

But it's way past the time where we could enforce MUST on any of these and break users.  I definitely have examples of where it is useful to break those guidelines above, but they are the exceptions that prove the rule.
 

Daniel Smith

unread,
Feb 5, 2021, 6:24:57 PM2/5/21
to Tim Hockin, John Belamaric, Bowei Du, kubernetes-sig-architecture
IMO the "imagine status got dropped, could you reconstruct it" scenario was an intuition pump to help people understand if a given thing was spec or status, because it was kind of a foreign concept to a lot of people initially.

The part where it *actually* gets dropped doesn't actually make a ton of sense to me, and I'm not sure we were actually serious about that being a legitimate thing the system might do? Yes, you can do that with events, but it's very different. And yes, Brian has wanted to actually store Spec & Status separately for ages, but IIRC that's a contention / scalability thing; I don't believe there are gains to be made from making status less durable. The fact that we've never even bothered to try wiping status in a test to see what happens is evidence that we were never intending for the system to literally handle that.

Clayton Coleman

unread,
Feb 5, 2021, 6:37:01 PM2/5/21
to Daniel Smith, Tim Hockin, John Belamaric, Bowei Du, kubernetes-sig-architecture
On Fri, Feb 5, 2021 at 6:25 PM 'Daniel Smith' via kubernetes-sig-architecture <kubernetes-si...@googlegroups.com> wrote:
IMO the "imagine status got dropped, could you reconstruct it" scenario was an intuition pump to help people understand if a given thing was spec or status, because it was kind of a foreign concept to a lot of people initially.

The part where it *actually* gets dropped doesn't actually make a ton of sense to me, and I'm not sure we were actually serious about that being a legitimate thing the system might do? Yes, you can do that with events, but it's very different. And yes, Brian has wanted to actually store Spec & Status separately for ages, but IIRC that's a contention / scalability thing; I don't believe there are gains to be made from making status less durable. The fact that we've never even bothered to try wiping status in a test to see what happens is evidence that we were never intending for the system to literally handle that.

Pod status is the great example of this - we've never formally defined the semantics of what guarantees for execution we provide, but they're roughly:

1. No pod with the same UID runs on two nodes at the same time
2. No pod with the same name and namespace runs on two nodes ever
3. All containers are run "at least once", and once pod.spec.nodeName is set you must assume a process can be running (status can be delayed forever)
4. We do a pretty weak job of guaranteeing RestartNever containers - users are likely inferring it means "at most once" when it really means "if we observe a pod failure we won't try to restart it unless weird stuff happens" - so the real guarantee is "we'll restart until we observe at least one failure or a success AND it propagates to the apiserver AND back to the kubelet before something else bad happens" and it's up to you to handle that on disk.  We definitely promise not to report success unless we see an actual successful process exit, but we don't promise much else.
5. Once the pod is deleted by the kubelet there is definitely no process running (modulo serious bugs in the integrity of Linux)

I bet people would be *surprised* that we don't actually provide stronger guarantees than 3 or 4 because we try so hard to keep status up to date.

So the bigger risk to me would be whether we are doing a good enough job of clarifying our actual guarantees around status for each object, or the controller expectations.  I think it would be good to practice wiping status - so that we better clarify to users what guarantees they actually get, and have the critical discussions about what status means.  Does doing that test mean we should switch SHOULD to MUST?  I probably agree with Daniel's reasoning above.
 

Tim Hockin

unread,
Feb 5, 2021, 7:18:30 PM2/5/21
to Clayton Coleman, Daniel Smith, Tim Allclair, Bowei Du, kubernetes-sig-architecture
On Fri, Feb 5, 2021 at 3:24 PM Clayton Coleman <ccol...@redhat.com> wrote:


On Fri, Feb 5, 2021 at 6:18 PM 'Tim Hockin' via kubernetes-sig-architecture <kubernetes-si...@googlegroups.com> wrote:


On Fri, Feb 5, 2021 at 2:53 PM Daniel Smith <dbs...@google.com> wrote:
It's cleared in PrepareFor{Create|Update} methods.

FWIW I do not think spec vs status should be a security boundary, it was not intended initially to be so. The difference is "how I want it to be" vs "how it actually is".

I think that horse has left the barn.  Most resources have a single controller or a small number of pretty well-trusted controllers.  As such spec/status has a natural mapping.  We have lots of such examples.

For those fields that might be more promiscuously updated (status.conditions!) we should probably have distinct sub-resources.

Should is pretty different than do, can, or must.

If you are picking at my "should", what I really mean is that WE (the project) could/should define a common sub-resource for commonly status'ed things like conditions (especially since conditions are all uniform now, right? :).
 

You SHOULD separate your desired state from actual state (spec / status).

And most do
 
You SHOULD ensure only the people who can update status are the ones who should be doing so.

And mostly we do (the defaults anyway)
 
You SHOULD have single writer where possible.

And mostly we do
 
You SHOULD have reconstructable state.

We're spotty on the application of this guidance
 
You SHOULD let controllers set fields in spec when those are desired intent.

I'm going to argue with this one on 2 axes.

1) Don't trust controllers.  Providing finer granularity for controllers is a net win for security and for abstraction (hence /scale).  Controllers accessing spec should be a smell in many cases.

2) Don't trust users.  There are some spec-like things that we want to allow semi-trusted controllers to set that we do not want to allow users to set (or that users can request and controllers can ACK).  We don't really have a per-field ACL mechanism, and I think we don't really want one.  We already use RBAC and different resources to govern this.
 
You SHOULD consider subresources for when your object needs to adapt to an interface (Scale is a good example of what Tim is describing above).

Scale serves 2 purposes - security AND abstraction.  To be clear: I'm *not* proposing to make a status abstraction.  I *am* saying that we need a pattern for my point 2 (spec-like things set by semi-trusted actors). 

It's really hard for me to imagine that any of these can be MUST.  The value of SHOULD is where tools can infer behavior.  Is inferring that status can be wiped in apply useful?  Yes.  Is having consistent fields in status (conditions, observed generation) useful and consistent?  Yes.

But it's way past the time where we could enforce MUST on any of these and break users.  I definitely have examples of where it is useful to break those guidelines above, but they are the exceptions that prove the rule.

I can't tell if you are agreeing or not :)  If we think "status can be wiped" is a useful litmus test, then a) we need to solve the 'spec-like things" problem and b) we could/should do a few things to make that litmus test easier (e.g. actually provide an API to wipe status and ask people to use that when testing).  I expect huge explosions, though, as users and controllers lose state, load-balancers get de-provisioned, volumes get un-bound, replicas get re-launched etc.

Clayton

unread,
Feb 5, 2021, 8:00:55 PM2/5/21
to Tim Hockin, Clayton Coleman, Daniel Smith, Tim Allclair, Bowei Du, kubernetes-sig-architecture


I probably think most controllers should, that our controllers should test it, and that we should guide authors as to why (as others have noted upthread).  But I am also biased, because I maintain APIs where I disagreed with Brian, went ahead and created non-recreatable status, and the APIs are decent (spec = what I want, status = what the controller has found + history).  Apply works, users aren't too confused, and status can be recovered if necessary from backup or just the last copy (more easily than spec, because in practice immutability on status fields hasn't been necessary).

If we think "status can be wiped" is a useful litmus test, then a) we need to solve the 'spec-like things" problem and b) we could/should do a few things to make that litmus test easier (e.g. actually provide an API to wipe status and ask people to use that when testing)

We do have that API today.  I've written maybe three PRs to add subresource as a flag for the kubectl primitives (patch, edit, get), but keep getting distracted.  I know I personally clear controller state on status during normal operations / debugging / recovery maybe once a month.

… I expect huge explosions, though, as users and controllers lose state, load-balancers get de-provisioned, volumes get un-bound, replicas get re-launched etc.

I’m more of an advocate of doing this ourselves in testing to make sure our controllers are sound but not actually enforce this.  Reconstruction at runtime is disruptive to tools and why borrow trouble unless it makes users happy?  Reconstruction is an api author problem, not a user problem.

Tim Hockin

unread,
Feb 5, 2021, 8:34:40 PM2/5/21
to Clayton, Clayton Coleman, Daniel Smith, Tim Allclair, Bowei Du, kubernetes-sig-architecture
So what's your stance on spec-like things that need some non-user-writeable control? 

Clayton Coleman

unread,
Feb 5, 2021, 10:53:50 PM2/5/21
to Tim Hockin, Clayton, Daniel Smith, Tim Allclair, Bowei Du, kubernetes-sig-architecture
As someone who wrote several admission controllers to require specific rbac permissions (on virtual resources or virtual subresources) for mutation / setting / clearing fields - for pods.spec.nodeName requiring create on “binding”, for service externalIP requiring update on “service/unrestrictedAddresses”, and for changing ingress hostname post creation requiring a cluster scoped resource permission (to do a cheap creation ordered name reservation system for public hostnames) - I am even more biased.

RBAC and mutating admission controllers work reasonably well for selectively controlling small numbers of fields with broad ranges of valid levels of control (where two different cluster usages would disagree on how dangerous the field is).  I've always thought that defaulting of fields is the hard bit (clusterIP selection is so easy for end users, but so gross in code and doesn't allow any flexibility).
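For readers who haven't seen that pattern, the admission check boils down to something like the following: gate a change to a protected field on a SubjectAccessReview against a (sub)resource permission. The pods/binding case mirrors the nodeName example above; the function itself is an illustrative sketch, not an existing plugin.

```go
package fieldgate

import (
	"context"
	"fmt"

	authzv1 "k8s.io/api/authorization/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// allowNodeNameChange rejects a change to pod.spec.nodeName unless the
// requesting user would be allowed to "create" the pods/binding
// subresource, i.e. the field is gated on a separate RBAC permission.
func allowNodeNameChange(ctx context.Context, cs kubernetes.Interface, username string, oldPod, newPod *corev1.Pod) error {
	if oldPod.Spec.NodeName == newPod.Spec.NodeName {
		return nil // field untouched, nothing extra to check
	}
	sar := &authzv1.SubjectAccessReview{
		Spec: authzv1.SubjectAccessReviewSpec{
			User: username,
			ResourceAttributes: &authzv1.ResourceAttributes{
				Namespace:   newPod.Namespace,
				Verb:        "create",
				Resource:    "pods",
				Subresource: "binding",
				Name:        newPod.Name,
			},
		},
	}
	resp, err := cs.AuthorizationV1().SubjectAccessReviews().Create(ctx, sar, metav1.CreateOptions{})
	if err != nil {
		return err
	}
	if !resp.Status.Allowed {
		return fmt.Errorf("user %q may not set spec.nodeName (requires create on pods/binding)", username)
	}
	return nil
}
```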

Tim Hockin

unread,
Feb 6, 2021, 1:03:18 AM2/6/21
to Clayton Coleman, Clayton, Daniel Smith, Tim Allclair, Bowei Du, kubernetes-sig-architecture
I still can't tell if you are arguing for or against doing anything at all.

That you found RBAC to work well for this makes me think you support SOMETHING, but I interpret that as "but not status".

I am wary of lots of custom subresources.  Status as a subresource already exists and is in wide use.  Unless we make it super easy to define subresources in built-in and CRD APIs and to use them from kubectl.

I'm not against coming up with some convention or guidance for a new top-level stanza that grants access to more-trusted fields, but that does feel like the long way around.

I'd very much like to NOT make the same mistakes as Service and clusterIP 

Monis Khan

unread,
Feb 6, 2021, 7:55:01 AM2/6/21
to Tim Hockin, Clayton Coleman, Clayton, Daniel Smith, Tim Allclair, Bowei Du, kubernetes-sig-architecture
One thing that came to mind from reading Clayton's comment: the restrictions described are a combination of RBAC and admission control.

With CRDs, it is easy to do resource level RBAC (i.e. only this controller can update status) and field level schema requirements (i.e. regex validation).

I don't think CRDs allow creation of arbitrary subresources to leverage RBAC at that level - you need to use an aggregated API server.  I am also unaware of any field-level schema validation such as "changing this value requires this subject access review to pass" - you need to write a custom admission webhook.

Both of those are somewhat complicated to do, and add an extra availability dimension outside of the Kube API server.


jayp...@gmail.com

unread,
Feb 8, 2021, 9:49:42 AM2/8/21
to Tim Hockin, Clayton Coleman, Daniel Smith, Tim Allclair, Bowei Du, kubernetes-sig-architecture, onus.m...@gmail.com
On Fri, 2021-02-05 at 16:18 -0800, 'Tim Hockin' via kubernetes-sig-
architecture wrote:
> On Fri, Feb 5, 2021 at 3:24 PM Clayton Coleman <ccol...@redhat.com>
> wrote:
> > You SHOULD let controllers set fields in spec when those are
> > desired intent.
>
> I'm going to argue with this one on 2 axes.
>
> 1) Don't trust controllers.  Providing finer granularity for
> controllers is a net win for security and for abstraction (hence
> /scale).  Controllers accessing spec should be a smell in many cases.
>
> 2) Don't trust users.  There are some spec-like things that we want
> to allow semi-trusted controllers to set that we do not want to allow
> users to set (or that users can request and controllers can ACK).  We
> don't really have a per-field ACL mechanism, and I think we don't
> really want one.  We already use RBAC and different resources to
> govern this.

What is the guidance for controllers that interface with external APIs
that default values for certain desired state fields on the server
side?

A great example of this might be a controller that allows users to
create an S3 Bucket. [0]

When you create an S3 Bucket, the only field representing desired state
that you need to supply is the Bucket's Name. That's it. However, when
the Bucket is created on the S3 side, a number of attributes of the
Bucket are set to default values by the S3 service. Things like whether
or not the Bucket has an object lock enabled (defaults to false) or the
Bucket's access policy (defaults to private).

The way I currently have things set up is that I have fields like
Bucket.Spec.ObjectLockEnabledForBucket and Bucket.Spec.ACL that the
user may fill in their manifest and send over to Kubernetes. These
fields are optional, and I've been wondering how to handle the
"defaulting" behaviour in the controller. My first thought was to do a
read of the Bucket's attributes [1] and change the Spec field to match
whatever was defaulted by the server side. I keep the Status object for
fields that the user has no ability to change -- for example, a
Bucket's Location attribute.

Other systems, including Crossplane, take a different approach and keep
a fairly bright line between what is read from the backend service and
what the user supplied in their desired state. [2] Crossplane does not
let the controller modify the Spec.ForProvider struct and Crossplane
doesn't let the user modify the Status.AtProvider struct. In this way,
it doesn't matter what the server-side defaults are, since Crossplane
only ever sets the Status.AtProvider values and doesn't ever "merge"
those server-side defaults with the Spec.ForProvider values.

I'm actually a bit on the fence as to which approach is better, long-
term. I like the clean separation that Crossplane's strategy enforces,
but I like the less verbose and less duplicative (IMHO) user experience
that our S3 controller provides.

Hoping that y'all Kubernetes gurus might have some advice on this. :)

Best,
-jay

[0] Yes, I am actually dealing with this exact situation in
https://github.com/aws-controllers-k8s/s3-controller

[1] Unfortunately there are 19 separate S3 API commands for this.
Because, well, we can't have nice things.

[2] Crossplane uses a Spec.ForProvider and Status.AtProvider field to
isolate these things from each other. I'm cc'ing Muvaffak from
Crossplane to correct me if I'm wrong here...

David Eads

unread,
Feb 8, 2021, 10:25:34 AM2/8/21
to jayp...@gmail.com, Tim Hockin, Clayton Coleman, Daniel Smith, Tim Allclair, Bowei Du, kubernetes-sig-architecture, onus.m...@gmail.com
I like having status reflect the observable state of the world.  When status reflects the observable state of the world and is reconciled, it is impossible for status fields to end up in a state where they are corrupted due to some unexpected situation and are never corrected.  It's the difference between a count that is returned by actually counting the number of records and a count that is returned by adding every delta perfectly over time.  One reflects the state of the world, the other is your estimation of that state and if you ever mess up, it will never be correct again.

I think the spec/status division being based on spec=="I want it like X" and status=="It is actually Y" is more important than the ACL characteristics.  I think the ACL characteristics came about as we built systems on top of systems and wanted to react to the state of the world.  Users cannot (generally) be trusted to provide the actual state of the world, but we want our controllers to build on that actual state so they work reliably.

If we want to create a section of spec that is more trusted or a section of status that is less trusted, we can do that today using admission plugins.  Built-in resources could even do it using subresources (this is how we do the ACL separation of spec/status today).


Muvaffak Onuş

unread,
Feb 8, 2021, 10:27:02 AM2/8/21
to jayp...@gmail.com, Tim Hockin, Clayton Coleman, Daniel Smith, Tim Allclair, Bowei Du, kubernetes-sig-architecture, ne...@upbound.io
Resending after joining sig-architecture.
---

Hi Jay!

I believe there is precedent in Kubernetes for such situations; see late-initialization in the API conventions doc. In Crossplane, we strictly follow the API conventions doc, and hence have implemented late initialization in most of the controllers to handle defaulting from external APIs. So, we do allow the controller to make changes on `spec.forProvider`; however, it's done if and only if the field wasn't filled by the user, so that we are sure we don't override any desired state. You can find some examples here: RDS Instance, Elasticache. Some resources like S3 are a bit more complex in our case, since they comprise several different Put calls in provider-aws as opposed to one Create/Put call like others, but we still go with a best-effort approach there.

If you look at the helper functions, they return the observed value only if the spec value is empty; that is how the controller preserves the desired state. So, we trust the controller to make the right changes on the spec. But all this is written manually and some controllers miss this functionality, so you might have run into one of those. I hope that when this PR is merged, we'll be able to generate late-initialization for all resources automatically.
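For anyone unfamiliar with the pattern, the helpers amount to roughly this (a simplified sketch, not Crossplane's actual code):

```go
package lateinit

// LateInitializeString adopts a server-side default into spec only when
// the user left the field unset, so user-provided desired state is
// never overridden.
func LateInitializeString(fromSpec *string, observed string) *string {
	if fromSpec != nil {
		return fromSpec // the user set it; keep the desired state
	}
	if observed == "" {
		return nil // nothing to adopt yet
	}
	return &observed
}

// Illustrative use in a reconcile loop (field names are hypothetical):
//
//	bucket.Spec.ForProvider.ACL = LateInitializeString(bucket.Spec.ForProvider.ACL, observed.ACL)
```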

Best,
Muvaffak

Tim Hockin

unread,
Feb 19, 2021, 6:26:36 PM2/19/21
to David Eads, Jay Pipes, Clayton Coleman, Daniel Smith, Tim Allclair, Bowei Du, kubernetes-sig-architecture, onus.m...@gmail.com
Lost track of this thread.  I plan to write a KEP, but I wanted to respond to a couple comments.

On Mon, Feb 8, 2021 at 7:25 AM David Eads <de...@redhat.com> wrote:
I like having status reflect the observable state of the world.  When status reflects the observable state of the world and is reconciled, it is impossible for status fields to end up in a state where they are corrupted due to some unexpected situation and are never corrected.  It's the difference between a count that is returned by actually counting the number of records and a count that is returned by adding every delta perfectly over time.  One reflects the state of the world, the other is your estimation of that state and if you ever mess up, it will never be correct again.

I don't think that's the right analogy.  I am specifically talking about things that are not natively observable.  E.g. my Foo asked for Bar "X" and my request was granted.  I have argued in the past that this is Someone Else's Problem: why doesn't the Bar allocator keep track of which Bars it has allocated, and then periodically reconcile the fact that it allocated "X" to my Foo, via status?  Then status is still entirely observation.

That's a lot of complexity for very little gain (see PV/PVC bi-directional binding) and the simple fact is that we DON'T generally do that.  It doesn't really make the failure modes any better if the Foo and the Bar somehow become unbound. 
 
I'm arguing that we need to solve this general request/grant pattern.  Maybe spec/status is not the right split, but if not I think we should define what IS.

If we want to create a section of spec that is more trusted or a section of status that is less trusted, we can do that today using admission plugins.  Built-in resources could even do it using subresources (this is how we do the ACL separation of spec/status today).

How does an admission controller know that principal "da...@foobar.com" is allowed to write to `spec.normalField` but not `.controllerField` while `serviceAccount: bar-controller` is allowed to write `.controllerField` but not the rest of `spec`?  Isn't that just reinventing RBAC?  I am genuinely curious, because I shot down a design that wanted to do exactly this.
 

Tim Hockin

unread,
Feb 22, 2021, 5:40:40 PM2/22/21
to David Eads, Jay Pipes, Clayton Coleman, Daniel Smith, Tim Allclair, Bowei Du, kubernetes-sig-architecture, onus.m...@gmail.com

Tim Hockin

unread,
Jun 7, 2021, 7:56:38 PM6/7/21
to David Eads, Jay Pipes, Clayton Coleman, Daniel Smith, Tim Allclair, Bowei Du, kubernetes-sig-architecture, onus.m...@gmail.com
Hi all. I made some minor updates to this, but there has not been much
feedback so far. Would love to hear more thoughts.

https://github.com/kubernetes/enhancements/pull/2537