[RFC] Protecting users of kubectl delete


Eddie Zaneski

May 27, 2021, 3:35:23 PM
to kuberne...@googlegroups.com, kubernete...@googlegroups.com

Hi Kuberfriendos,


We wanted to start a discussion about mitigating some of the potential footguns in kubectl.


Over the years we've heard stories from users who accidentally deleted resources in their clusters. This trend seems to be rising lately as newer folks venture into the Kubernetes/DevOps/Infra world.


First some background.


When a namespace is deleted it also deletes all of the resources under it. The deletion runs without further confirmation, and can be devastating if accidentally run against the wrong namespace (e.g. thanks to hasty tab completion use).


```
kubectl delete namespace prod-backup
```


When all namespaces are deleted, essentially all resources are deleted. This deletion is trivial to do with the `--all` flag, and it also runs without further confirmation. It can effectively wipe out a whole cluster.


```
kubectl delete namespace --all
```


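The difference between `--all` and `--all-namespaces` can be confusing: `--all` selects every object of the given type in the target namespace (or, for a cluster-scoped type such as namespaces, every object of that type in the cluster), while `--all-namespaces` (`-A`) widens the command's scope to every namespace.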


There are certainly things cluster operators should be doing to help prevent this user error (like locking down permissions), but we'd like to explore what we, as maintainers, can do to help end users.


There are a few changes we'd like to propose to start discussion. We plan to introduce this as a KEP but wanted to gather early thoughts.


Change 1: Require confirmation when deleting with --all and --all-namespaces


Confirmation when deleting with `--all` and `--all-namespaces` is a long-requested feature, but we've historically considered it a breaking change and declined to implement it. Existing scripts would need to be modified or would break. While it is indeed breaking, we believe this change is necessary to protect users.


We propose moving towards requiring confirmation for deleting resources with `--all` and `--all-namespaces` over 3 releases (1 year). This gives us ample time to warn users and communicate the change through blogs and release notes.

  • Alpha

    • Introduce a flag like `--ask-for-confirmation | -i` that requires confirmation when deleting ANY resource. For example, the `rm` command, which deletes files on a machine, has this built in as `-i`. This provides a temporary safety mechanism that users can start using now.

    • Add a flag to enforce the current behavior and skip confirmation. `--force` is already used for removing stuck resources (see Change 3 below), so we may want to use `--auto-approve` (inspired by Terraform). `--ask-for-confirmation` will always take precedence and cause `--auto-approve` to be ignored. We can see this behavior with `rm -rfi`; from the `rm` man page:

      -i  Request confirmation before attempting to remove each file, regardless of the file's permissions, or whether or not the standard input device is a terminal. The -i option overrides any previous -f options.

    • Begin warning on stderr that, by version x.x.x, deleting with `--all` and `--all-namespaces` will require interactive confirmation or the `--auto-approve` flag.

    • Introduce a 10-second sleep when deleting with `--all` or `--all-namespaces`, giving the user a chance to react to the warning and interrupt the command before it proceeds.

  • Beta

    • Address user feedback from alpha.

  • GA

    • Deleting with `--all` or `--all-namespaces` now requires interactive confirmation as the default unless `--auto-approve` is passed.

    • Remove the 10-second deletion delay introduced in the alpha, and stop printing the deletion warning when interactive mode is disabled.
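A minimal shell sketch of the Change 1 rules, written as a wrapper that could be tried today. It assumes the proposed flag names `--ask-for-confirmation`/`-i` and `--auto-approve`, which the wrapper handles itself and strips before calling kubectl; `needs_confirmation` and `safe_delete` are hypothetical names, and forms such as `--all=true` are not handled.

```shell
# Hypothetical wrapper modeling the proposed Change 1 rules.

# Exit status 0 means the given `kubectl delete` arguments should prompt.
needs_confirmation() {
  ask=0 approve=0 bulk=0
  for arg in "$@"; do
    case "$arg" in
      -i|--ask-for-confirmation) ask=1 ;;
      --auto-approve) approve=1 ;;
      --all|--all-namespaces|-A) bulk=1 ;;
    esac
  done
  # As with `rm -rfi`, asking for confirmation overrides auto-approval.
  [ "$ask" -eq 1 ] && return 0
  # Bulk deletes prompt unless the caller explicitly opted out.
  [ "$bulk" -eq 1 ] && [ "$approve" -eq 0 ]
}

safe_delete() {
  if needs_confirmation "$@"; then
    printf 'About to run: kubectl delete %s\nType "yes" to continue: ' "$*" >&2
    read -r reply || reply=""
    if [ "$reply" != "yes" ]; then
      echo "Aborted." >&2
      return 1
    fi
  fi
  # Rebuild the argument list without the proposed flags, which the
  # wrapper has already acted on.
  for arg in "$@"; do
    shift
    case "$arg" in
      -i|--ask-for-confirmation|--auto-approve) ;;
      *) set -- "$@" "$arg" ;;
    esac
  done
  kubectl delete "$@"
}
```

For example, `safe_delete namespace --all` prompts, `safe_delete namespace --all --auto-approve` runs immediately, and adding `-i` to the latter prompts again.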


Change 2: Throw an error when --namespace is provided to cluster-scoped resource deletion


Since namespaces are cluster-scoped resources, using the `--namespace | -n` flag when deleting them should produce an error. The flag has no effect on cluster-scoped resources and confuses users: for example, `kubectl delete namespace --all -n dev` still deletes every namespace, not just `dev`. We believe this to be an implementation bug that should be fixed for cluster-scoped resources. Although this may break scripts that incorrectly include the flag in intentional mass-deletion operations, the inconvenience to those users of removing the misused flag must be weighed against the material harm this implementation mistake is currently causing to other users in production. This will follow a rollout similar to the one above.
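A rough sketch of a client-side pre-flight check for Change 2. The hard-coded `CLUSTER_SCOPED` list is an assumption and deliberately incomplete; a real check would consult API discovery (for example `kubectl api-resources --namespaced=false -o name`). `check_namespace_flag` is a hypothetical helper, and `TYPE/NAME` arguments are not handled.

```shell
# Small, hard-coded sample of cluster-scoped resource names and short names.
CLUSTER_SCOPED="namespace namespaces ns node nodes no persistentvolume persistentvolumes pv clusterrole clusterroles clusterrolebinding clusterrolebindings storageclass storageclasses sc"

# Usage: check_namespace_flag RESOURCE [ARGS...]
# Fails with an error if -n/--namespace accompanies a cluster-scoped type.
check_namespace_flag() {
  resource=$1
  shift
  scoped=0
  for r in $CLUSTER_SCOPED; do
    if [ "$r" = "$resource" ]; then scoped=1; fi
  done
  [ "$scoped" -eq 1 ] || return 0
  for arg in "$@"; do
    case "$arg" in
      -n|-n?*|--namespace|--namespace=*)
        echo "error: --namespace has no effect on cluster-scoped \"$resource\"" >&2
        return 1 ;;
    esac
  done
  return 0
}
```

Run before `kubectl delete`, this rejects a command like `kubectl delete namespace --all -n dev` before anything is deleted, while namespaced types such as `pods` pass through unchanged.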


Change 3: Rename related flags that commonly cause confusion


The `--all` flag should be renamed to `--all-instances`. This makes it entirely clear which "all" it refers to. This would follow a 3-release rollout as well, starting with the new flag and warning about deprecation.


The `--force` flag is also a frequent source of confusion, and users do not understand what exactly is being forced. Alongside the `--all` change (in the same releases), we should consider renaming `--force` to something like `--force-reference-removal`.
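A sketch of a rename that keeps the old spellings working. The proposed names `--all-instances` and `--force-reference-removal` are hypothetical (not real kubectl flags) and are mapped onto today's `--all` and `--force`; the old names only trigger a warning on stderr. `normalize_delete_flags` is an illustrative helper that prints the rewritten arguments one per line.

```shell
# Hypothetical shim: rewrite proposed flag names to today's kubectl flags,
# warning (on stderr) when the older spelling is used.
normalize_delete_flags() {
  for arg in "$@"; do
    shift
    case "$arg" in
      --all-instances) set -- "$@" --all ;;
      --force-reference-removal) set -- "$@" --force ;;
      --all)
        echo "warning: --all is deprecated; use --all-instances" >&2
        set -- "$@" --all ;;
      --force)
        echo "warning: --force is deprecated; use --force-reference-removal" >&2
        set -- "$@" --force ;;
      *) set -- "$@" "$arg" ;;
    esac
  done
  printf '%s\n' "$@"
}
```

The trade-off is that scripts written with either spelling keep working, at the cost of having two spellings in circulation.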


These are breaking changes that shouldn't be taken lightly. Scripts, docs, and applications will all need to be modified. Putting on our empathy hats, we believe that the benefits and protections to users are worth the hassle. We will do all we can to inform users of these impending changes and follow our standard guidelines for deprecating a flag.


Please see the following for examples of users requesting or running into this. This is a sample from a 5-minute search.


From GitHub:

From StackOverflow:

Eddie Zaneski - on behalf of SIG CLI

Tim Hockin

May 27, 2021, 3:47:41 PM
to Eddie Zaneski, Kubernetes developer/contributor discussion, kubernetes-sig-cli
On Thu, May 27, 2021 at 12:35 PM Eddie Zaneski <eddi...@gmail.com> wrote:



> Change 1: Require confirmation when deleting with --all and --all-namespaces
>
> Confirmation when deleting with `--all` and `--all-namespaces` is a long requested feature but we've historically determined this to be a breaking change and declined to implement. Existing scripts would require modification or break. While it is indeed breaking, we believe this change is necessary to protect users.
>
> We propose moving towards requiring confirmation for deleting resources with `--all` and `--all-namespaces` over 3 releases (1 year). This gives us ample time to warn users and communicate the change through blogs and release notes.


Can we start with a request for confirmation when the command is run interactively and a printed warning (and maybe the sleep). 
 

> Change 2: Throw an error when --namespace provided to cluster-scoped resource deletion
>
> Since namespaces are a cluster resource using the `--namespace | -n` flag when deleting them should error. This flag has no effect on cluster resources and confuses users. We believe this to be an implementation bug that should be fixed for cluster scoped resources. Although it is true that this may break scripts that are incorrectly including the flag on intentional mass deletion operations, the inconvenience to those users of removing the misused flag must be weighed against the material harm this implementation mistake is currently causing to other users in production. This will follow a similar rollout to above.


The "material harm" here feels very low and I am not convinced it rises to the level of breaking users. 
 

> Change 3: Rename related flags that commonly cause confusion
>
> The `--all` flag should be renamed to `--all-instances`. This makes it entirely clear which "all" it refers to. This would follow a 3-release rollout as well, starting with the new flag and warning about deprecation.


I think 3 releases is too aggressive to break users.  We know that it takes months or quarters for releases to propagate into providers' stable-channels.  In the meantime, docs and examples all over the internet will be wrong.

If we're to undertake any such change I think it needs to be more gradual.  Consider 6 to 9 releases instead.  Start by adding new forms and warning on use of the old forms.  Then add small sleeps to the deprecated forms.  Then make the sleeps longer and the warnings louder.  By the time it starts hurting people there will be ample information all over the internet about how to fix it.  Even then, the old commands will still work (even if slowly) for a long time.  And in fact, maybe we should leave it in that state permanently.  Don't break users, just annoy them.
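A sketch of the "don't break users, just annoy them" idea: the deprecated spelling always keeps working, but its warning is followed by a delay that grows with the number of releases since the new spelling shipped. The thresholds and sleep lengths are illustrative placeholders, not numbers from this thread, and `deprecation_delay`/`warn_and_stall` are hypothetical names.

```shell
# Seconds to stall a deprecated invocation, given how many releases ago the
# replacement was introduced. Placeholder numbers, for illustration only.
deprecation_delay() {
  if [ "$1" -lt 2 ]; then
    echo 0    # early releases: warning only
  elif [ "$1" -lt 4 ]; then
    echo 2    # then a small sleep
  else
    echo 10   # later: a longer sleep, but the command still runs
  fi
}

# Usage: warn_and_stall RELEASES_SINCE OLD_FORM
warn_and_stall() {
  delay=$(deprecation_delay "$1")
  echo "warning: $2 is deprecated; continuing in ${delay}s" >&2
  sleep "$delay"
}
```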
 
Tim

Brian Topping

May 27, 2021, 3:54:40 PM
to Eddie Zaneski, kuberne...@googlegroups.com, kubernete...@googlegroups.com
Please also consider this issue

There are good examples of solving this issue in Rook and Gardener. My personal preference from those projects is requiring an annotation to be placed on critical resources before any deletion workflow is allowed to start. If these annotation requirements could be defined declaratively, projects and users could create the constraints on installation. As well, the constraints could be removed if they became onerous in dev / test environments. 

Creating basic safeguards is not just about junior users: I have deleted massive amounts of infrastructure several times because I was in the wrong kubectl context. I can’t judge whether I am an idiot or not.

I have started giving windows with access to critical resources obnoxious, unmistakable backgrounds. Another idea in this genre is better kubectl support for PS1 (shell prompt) customization. Contexts and/or namespaces could contain API resources with PS1 sequences that are applied when a context is activated. Again, these would be easy to modify or remove when they aren’t wanted.
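One low-tech take on making the active context unmistakable, sketched as a shell prompt helper. `kubectl config current-context` is a real command; the `kube_ps1_segment` name and the rule that any context name containing "prod" gets a loud marker are assumptions to adapt.

```shell
# Prints a prompt segment naming the current kubectl context, with a loud
# marker when the (assumed) naming convention says it is production.
kube_ps1_segment() {
  ctx=$(kubectl config current-context 2>/dev/null) || ctx="none"
  case "$ctx" in
    *prod*) printf '[!! PROD k8s:%s !!]' "$ctx" ;;
    *)      printf '[k8s:%s]' "$ctx" ;;
  esac
}
```

In bash, `PS1='$(kube_ps1_segment) \w \$ '` re-runs the helper each time the prompt is drawn.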


Tim Hockin

May 27, 2021, 3:58:58 PM
to Brian Topping, Eddie Zaneski, Kubernetes developer/contributor discussion, kubernetes-sig-cli
default context is a good point.

I'd like a way to set my kubeconfig to not have defaults, and to REQUIRE me to specify --context or --cluster and --namespace.  I have absolutely flubbed this many times.
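Until kubeconfig can express "no defaults", a shell wrapper can approximate this by refusing to run unless the target is spelled out. `--context`, `--cluster`, `--namespace`/`-n`, and `--all-namespaces`/`-A` are real kubectl flags; the `kexplicit` wrapper and its rules are hypothetical, and it only inspects the command-line arguments, not kubeconfig.

```shell
# Hypothetical wrapper: run kubectl only if both the cluster/context and the
# namespace scope were given explicitly on the command line.
kexplicit() {
  has_target=0 has_scope=0
  for arg in "$@"; do
    case "$arg" in
      --context|--context=*|--cluster|--cluster=*) has_target=1 ;;
      -n|-n?*|--namespace|--namespace=*|-A|--all-namespaces) has_scope=1 ;;
    esac
  done
  if [ "$has_target" -eq 0 ] || [ "$has_scope" -eq 0 ]; then
    echo "kexplicit: pass --context (or --cluster) and --namespace (or -A)" >&2
    return 2
  fi
  kubectl "$@"
}
```

For cluster-scoped commands this forces a scope flag that kubectl ignores, which is one reason a built-in kubeconfig setting would be nicer.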

Jordan Liggitt

May 27, 2021, 3:59:43 PM
to Tim Hockin, Eddie Zaneski, Kubernetes developer/contributor discussion, kubernetes-sig-cli
I appreciate the desire to help protect users, but I agree with Tim that rollouts take way longer than you expect, and that the bar for breaking existing users that are successful is very high.

The project's deprecation periods are the minimum required. For the core options of the core commands of a tool like kubectl which is used as a building block, I don't think we should ever break compatibility if we can possibly avoid it.


On Thu, May 27, 2021 at 3:47 PM 'Tim Hockin' via Kubernetes developer/contributor discussion <kuberne...@googlegroups.com> wrote:
> On Thu, May 27, 2021 at 12:35 PM Eddie Zaneski <eddi...@gmail.com> wrote:
>>
>> Change 1: Require confirmation when deleting with --all and --all-namespaces
>>
>> Confirmation when deleting with `--all` and `--all-namespaces` is a long requested feature but we've historically determined this to be a breaking change and declined to implement. Existing scripts would require modification or break. While it is indeed breaking, we believe this change is necessary to protect users.
>>
>> We propose moving towards requiring confirmation for deleting resources with `--all` and `--all-namespaces` over 3 releases (1 year). This gives us ample time to warn users and communicate the change through blogs and release notes.
>
> Can we start with a request for confirmation when the command is run interactively and a printed warning (and maybe the sleep).

+1 for limiting behavior changes to interactive runs, and starting with warnings and maybe sleeps.
 
 

>> Change 2: Throw an error when --namespace provided to cluster-scoped resource deletion
>>
>> Since namespaces are a cluster resource using the `--namespace | -n` flag when deleting them should error. This flag has no effect on cluster resources and confuses users. We believe this to be an implementation bug that should be fixed for cluster scoped resources. Although it is true that this may break scripts that are incorrectly including the flag on intentional mass deletion operations, the inconvenience to those users of removing the misused flag must be weighed against the material harm this implementation mistake is currently causing to other users in production. This will follow a similar rollout to above.
>
> The "material harm" here feels very low and I am not convinced it rises to the level of breaking users.

Setting the namespace context of an invocation is equivalent to putting a default namespace in your kubeconfig file. I don't think we should break compatibility with this option. It is likely to disrupt tools that wrap kubectl and set common options on all kubectl invocations.

 
 

>> Change 3: Rename related flags that commonly cause confusion
>>
>> The `--all` flag should be renamed to `--all-instances`. This makes it entirely clear which "all" it refers to. This would follow a 3-release rollout as well, starting with the new flag and warning about deprecation.
>
> I think 3 releases is too aggressive to break users.  We know that it takes months or quarters for releases to propagate into providers' stable-channels.  In the meantime, docs and examples all over the internet will be wrong.
>
> If we're to undertake any such change I think it needs to be more gradual.  Consider 6 to 9 releases instead.  Start by adding new forms and warning on use of the old forms.  Then add small sleeps to the deprecated forms.  Then make the sleeps longer and the warnings louder.  By the time it starts hurting people there will be ample information all over the internet about how to fix it.  Even then, the old commands will still work (even if slowly) for a long time.  And in fact, maybe we should leave it in that state permanently.  Don't break users, just annoy them.

If we wanted to add parallel flag names controlling the same variables and hide the old flags, that could be ok, but we should never remove the old flags. Even adding parallel flags means the ecosystem gets fragmented between scripts written against the latest kubectl and ones written using previous flags.

Clayton Coleman

May 27, 2021, 4:07:02 PM
to Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
This is somewhat terrifying to me from a backward-compatibility perspective.  We have never changed important flags like this, and we have in fact explicitly stated we should not.  I might almost argue that if we were to do this, we'd create a new CLI that has different flags.
 



> These are breaking changes that shouldn't be taken lightly. Scripts, docs, and applications will all need to be modified. Putting on our empathy hats we believe that the benefits and protections to users are worth the hassle. We will do all we can to inform users of these impending changes and follow our standard guidelines for deprecating a flag.



Brendan Burns

May 27, 2021, 4:21:44 PM
to ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
I'd like to suggest an alternate approach that is more opt-in and is also backward compatible.

We can add an annotation ("k8s.io/confirm-delete: true") to a Pod and if that annotation is present, prompt for confirmation of the delete. We might also consider "k8s.io/lock" which actively blocks the delete.

We could also support those annotations at a namespace level if we wanted to.

This is similar to Management Locks that we introduced in Azure (https://docs.microsoft.com/en-us/rest/api/resources/managementlocks) for similar reasons to prevent accidental deletes and force an explicit action (remove the lock) for a delete to proceed.
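A client-side sketch of how kubectl, or a wrapper around it, might honor the proposed annotations before deleting. The keys `k8s.io/confirm-delete` and `k8s.io/lock` come from the proposal and are hypothetical: nothing in Kubernetes acts on them today, and a client-side check is only a hint that other API clients will not respect. `-o jsonpath=` with `\.` escaping of dots in keys is real kubectl syntax; `annotation` and `delete_policy` are illustrative names.

```shell
# Hypothetical client-side check for the proposed opt-in annotations.

# Usage: annotation KIND NAME KEY  (prints the annotation value, if any)
annotation() {
  # jsonpath needs the dots inside the annotation key escaped as "\."
  key=$(printf '%s' "$3" | sed 's/\./\\./g')
  kubectl get "$1" "$2" -o "jsonpath={.metadata.annotations.$key}"
}

# Usage: delete_policy KIND NAME  (prints block, confirm, or allow)
delete_policy() {
  if [ "$(annotation "$1" "$2" k8s.io/lock)" = "true" ]; then
    echo block
  elif [ "$(annotation "$1" "$2" k8s.io/confirm-delete)" = "true" ]; then
    echo confirm
  else
    echo allow
  fi
}
```

Enforcing the lock server-side (for example with an admission webhook) is what would turn it from a hint into a real guard.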

--brendan




Jordan Liggitt

May 27, 2021, 4:23:08 PM
to Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
I like the "opt into deletion protection" approach. That got discussed a long time ago (e.g. https://github.com/kubernetes/kubernetes/pull/17740#issuecomment-217461024), but didn't get turned into a proposal/implementation.

There's a variety of ways that could be done... server-side and enforced, client-side as a hint, etc.

Daniel Smith

May 27, 2021, 5:11:24 PM
to Jordan Liggitt, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
I'm in favor of server-side-enforced deletion protection; however, it's not clear how that would protect a single "locked" item in a namespace if someone deletes the entire namespace.

The last conversation about a deletion protection mechanism that comes to mind got bogged down in the question of what happens if multiple actors all want to lock an object: how do you know they have all unlocked it? I can imagine a mechanism like Finalizers (Brian suggested this: "liens"), but I'm not convinced the extra complexity (and the implied delay in agreeing on and building something) is worth it.

I think I disagree with all those who don't want to make kubectl safer for fear of breaking users, because I think there's probably some middle ground. For example: detect whether a TTY is present; if so, give warnings and make the user confirm destructive actions; otherwise, assume it's a script that has already been tested and just execute it.
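This middle ground could be prototyped in a few lines of shell. `[ -t 0 ]` is the standard test for "stdin is a terminal"; because some CI systems allocate a pseudo-terminal, this sketch additionally treats a non-empty `CI` environment variable (which many CI providers set) as non-interactive. That extra rule and the helper names are assumptions.

```shell
# Hypothetical helpers: prompt only when a human appears to be present.
# A non-empty CI variable is treated as "not interactive" (assumption).
is_interactive() {
  [ -z "${CI:-}" ] && [ -t 0 ] && [ -t 2 ]
}

# Usage: confirm_if_interactive MESSAGE
# Returns 0 to proceed, 1 to abort.
confirm_if_interactive() {
  is_interactive || return 0   # looks like a script: proceed unprompted
  printf '%s [y/N] ' "$1" >&2
  read -r answer || answer=""
  case "$answer" in
    y|Y|yes) return 0 ;;
    *) return 1 ;;
  esac
}
```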




Benjamin Elder

May 27, 2021, 5:20:04 PM
to Daniel Smith, Jordan Liggitt, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
This is perhaps veering a bit off topic, but FWIW detecting actual interactivity can be tricky. E.g., when operating in Travis CI you will detect a TTY, because it intentionally tries to get output from tools that matches developers' terminals.

https://github.com/kubernetes-sigs/kind/pull/1479/files
https://github.com/travis-ci/travis-ci/issues/8193
https://github.com/travis-ci/travis-ci/issues/1337

I wouldn't recommend the TTY detection route in particular.

Antonio Ojea

May 27, 2021, 6:07:57 PM
to Benjamin Elder, Daniel Smith, Jordan Liggitt, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli

raghvenders raghvenders

May 28, 2021, 12:12:56 AM
to Antonio Ojea, Benjamin Elder, Brendan Burns, Daniel Smith, Eddie Zaneski, Jordan Liggitt, ccoleman, kubernetes-dev, kubernetes-sig-cli
Is it worth considering the old-school approach of substituting archive for delete, so that deleted resources can be restored? Or could the same be achieved through a scheduled eviction process for deletes?
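A rough sketch of the archive-before-delete idea as a client-side wrapper. `kubectl get ... -o yaml` and `kubectl delete` are real; `archive_then_delete` and the `./k8s-archive` default directory are hypothetical. It is a partial safety net at best: exporting a namespace object does not export the objects inside it, and re-creating an object from the export may require stripping server-populated fields such as `resourceVersion` and `uid`.

```shell
# Hypothetical wrapper: save the object's YAML, then delete it.
# Usage: archive_then_delete KIND NAME [DIR]
archive_then_delete() {
  dir=${3:-./k8s-archive}
  mkdir -p "$dir" || return 1
  file="$dir/$1-$2-$(date +%Y%m%dT%H%M%S).yaml"
  # Refuse to delete if the export failed.
  kubectl get "$1" "$2" -o yaml >"$file" || return 1
  echo "archived to $file" >&2
  kubectl delete "$1" "$2"
}
```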

Regards,
Raghvender

abhishek....@gmail.com

May 28, 2021, 6:56:53 AM
to Kubernetes developer/contributor discussion
My +1 to this proposal.

As much as we care about giving utility to all users, it is also a basic need to provide some cover from accidental disasters. RBAC is a very wide topic, and I understand Kubernetes administrators have the responsibility to restrict access.
At the same time, there are cases where a cluster is very big, with many applications on it, and admin access to a namespace has to be given to different people to ease some work. In the end we are all human: a single 'delete' call with "--all" or "-A | --all-namespaces" is all it takes to bring down an otherwise running cluster.
I would say it is very possible for anyone to make such a mistake, but the price must not be the whole cluster going down.
That's the same reason "rm" on Linux has an "--interactive | -i" option: even experts sometimes make such mistakes.
I am totally in favor of having something like "--interactive | -i" or "--ask-for-confirmation" in place as an Alpha feature, with a warning at first, and then slowly graduating it to GA. That would give everyone a lot of time to change any breaking automation scripts.

Siyu Wang

May 28, 2021, 6:56:53 AM
to Eddie Zaneski, kuberne...@googlegroups.com, kubernete...@googlegroups.com
Hi, you may want to look at the OpenKruise project. The latest version, v0.9.0, provides a feature called Deletion Protection, which can protect not only namespaces from cascading deletion but also other resources such as workloads and CRDs.

Because the protection is enforced by a webhook, it also guards against deletion requests from kubectl or any other API client.



Rory McCune

May 28, 2021, 6:56:53 AM
to Kubernetes developer/contributor discussion
Hi All, 

Looking at this, and seeing that making changes to the operation of kubectl will take a while, would it make sense to start with some more guidance for cluster operators around least privilege RBAC designs and using things like impersonation to reduce the risk of mistakes being made?

If I relate this back to other setups like Windows domain administration, standard good practice is for administrators not to use their domain admin account for day-to-day administration, but to have a separate account for when destructive actions are needed. Then, of course, on Linux we have sudo.

If cluster operators made use of read-only accounts for standard troubleshooting and then had impersonation rights to an account with deletion rights, it may reduce the likelihood of accidents happening as an additional switch would need to be provided.
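This sudo-style pattern can be approximated with kubectl's real `--as` impersonation flag. The `ksudo` function and the `KUBE_SUDO_USER` variable are hypothetical, and the sketch assumes RBAC already grants the everyday identity the `impersonate` verb on that one account (and little else).

```shell
# Hypothetical "sudo for kubectl": run one command while impersonating a
# more privileged account named in KUBE_SUDO_USER.
ksudo() {
  if [ -z "${KUBE_SUDO_USER:-}" ]; then
    echo "ksudo: set KUBE_SUDO_USER to the account allowed to delete" >&2
    return 1
  fi
  echo "ksudo: running as $KUBE_SUDO_USER: kubectl $*" >&2
  kubectl --as="$KUBE_SUDO_USER" "$@"
}
```

Before relying on it, `kubectl auth can-i delete namespaces --as=deleter` (with your own account in place of the example `deleter`) shows whether the elevated identity can actually perform the delete.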

Kind Regards

Rory

Douglas Schilling Landgraf

May 28, 2021, 7:48:57 AM
to Jordan Liggitt, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
On Thu, May 27, 2021 at 4:23 PM 'Jordan Liggitt' via
kubernetes-sig-cli <kubernete...@googlegroups.com> wrote:
>
> I like the "opt into deletion protection" approach. That got discussed a long time ago (e.g. https://github.com/kubernetes/kubernetes/pull/17740#issuecomment-217461024), but didn't get turned into a proposal/implementation
>

+1 Recently I talked with a coworker looking for such a feature.

> There's a variety of ways that could be done... server-side and enforced, client-side as a hint, etc.
>
> On Thu, May 27, 2021 at 4:21 PM 'Brendan Burns' via Kubernetes developer/contributor discussion <kuberne...@googlegroups.com> wrote:
>>
>> I'd like to suggest an alternate approach that is more opt-in and is also backward compatible.
>>
>> We can add an annotation ("k8s.io/confirm-delete: true") to a Pod and if that annotation is present, prompt for confirmation of the delete. We might also consider "k8s.io/lock" which actively blocks the delete.

Annotation seems like a pretty straightforward approach IMO, if such a
feature were enabled in the cluster by the user.

Tim Hockin

May 28, 2021, 10:31:16 AM
to Douglas Schilling Landgraf, Jordan Liggitt, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
There are lots of good ideas here.  I look forward to a solution that takes the best parts of each of them :)

Zizon Qiu

May 28, 2021, 10:58:23 AM
to Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
On Fri, May 28, 2021 at 4:21 AM 'Brendan Burns' via Kubernetes developer/contributor discussion <kuberne...@googlegroups.com> wrote:
> I'd like to suggest an alternate approach that is more opt-in and is also backward compatible.
>
> We can add an annotation ("k8s.io/confirm-delete: true") to a Pod and if that annotation is present, prompt for confirmation of the delete. We might also consider "k8s.io/lock" which actively blocks the delete.
 
Or abuse the existing finalizer mechanism.  

Daniel Smith

May 28, 2021, 11:35:14 AM
to Zizon Qiu, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
Finalizers prevent a deletion from finishing, not from starting.

Tim Hockin

May 28, 2021, 12:14:40 PM
to Zizon Qiu, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
On Fri, May 28, 2021 at 7:58 AM Zizon Qiu <zzd...@gmail.com> wrote:
> On Fri, May 28, 2021 at 4:21 AM 'Brendan Burns' via Kubernetes developer/contributor discussion <kuberne...@googlegroups.com> wrote:
>> I'd like to suggest an alternate approach that is more opt-in and is also backward compatible.
>>
>> We can add an annotation ("k8s.io/confirm-delete: true") to a Pod and if that annotation is present, prompt for confirmation of the delete. We might also consider "k8s.io/lock" which actively blocks the delete.
>
> Or abuse the existing finalizer mechanism.

Finalizers are not "deletion inhibitors", just "deletion delayers".  Once you delete, the finalizer might stop it from happening YET, but it *is* going to happen.  I'd rather see a notion of opt-in delete-inhibit.  It is not clear to me what happens if I have a delete-inhibit on something inside a namespace and then try to delete the namespace: we don't have transactions, so we can't abort the whole thing.  It would be stuck in a weird partially-deleted state, and I expect that to be a never-ending series of bug reports.

 

Tim Hockin

May 28, 2021, 12:55:32 PM
to Zizon Qiu, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli


On Fri, May 28, 2021 at 9:21 AM Zizon Qiu <zzd...@gmail.com> wrote:
> I`m thinking of finalizers as some kind of reference counter, like smart pointers in C++ or something like that.
>
> Resources are deallocated when the counter turns down to zero(no more finalizer).
> And keeping alive whenever counter > 0(with any arbitrary finalizer).

That's correct, but there's a fundamental difference between "alive" and "waiting to die".  A delete operation moves an object irrevocably from "alive" to "waiting to die".  That is a visible "state" (the deletionTimestamp is set), and there's no way to come back from it.  Let's not abuse that to mean something else.
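To make the "alive" versus "waiting to die" distinction concrete, here is a sketch that reads the two real metadata fields involved, `deletionTimestamp` and `finalizers`, via `kubectl get -o jsonpath`. `object_state` is a hypothetical helper, the state labels are informal, and the two separate `kubectl get` calls are not atomic.

```shell
# Reports whether an object is alive or already waiting to die.
# Usage: object_state KIND NAME
object_state() {
  ts=$(kubectl get "$1" "$2" -o 'jsonpath={.metadata.deletionTimestamp}') || {
    echo "not found (or not readable)"
    return 1
  }
  fin=$(kubectl get "$1" "$2" -o 'jsonpath={.metadata.finalizers}')
  if [ -z "$ts" ]; then
    echo "alive"
  elif [ -n "$fin" ]; then
    # Deletion has started; finalizers only delay the end, not reverse it.
    echo "waiting to die (finalizers: $fin)"
  else
    echo "waiting to die"
  fi
}
```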

Tabitha Sable

May 28, 2021, 1:54:48 PM
to Rory McCune, Kubernetes developer/contributor discussion
I really love this suggestion, Rory. I've heard it come up in other contexts before and I think it's really smart.

WDYT about taking this idea to our friends at sig-security-docs?

Tabitha

Abhishek Tamrakar

May 29, 2021, 1:13:41 AM
to Tim Hockin, Zizon Qiu, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
The current deletion strategy is easy but very risky: without any gates, a single deletion could put the whole cluster at risk. This is where it needs some cover.
I would still prefer the client-side approach from the original proposal, because the decision to delete a given object or objects should remain under the end user's control, while still giving them the safest way to operate the cluster.



Zizon Qiu

May 29, 2021, 1:13:46 AM
to Tim Hockin, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
I'm thinking of finalizers as some kind of reference counter, like smart pointers in C++ or something like that.

Resources are deallocated when the counter drops to zero (no more finalizers),
and kept alive whenever the counter is > 0 (any finalizer present).

raghvenders raghvenders

May 29, 2021, 1:13:50 AM
to Zizon Qiu, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
Or Finalizing through consensus :)

raghvenders raghvenders

Jun 1, 2021, 11:25:25 AM
to Abhishek Tamrakar, Tim Hockin, Zizon Qiu, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
Since client-side changes could take around 6-9 releases, as Tim mentioned, and could be breaking changes, a server-side solution would be a reasonable and worthy option to consider and finalize.

Quickly summarizing the options discussed so far (server-side):
  • Annotation and Delete Prohibitors
  • Finalizers
  • RBAC and Domain accounts/sudo-like
Please add anything I missed, or correct me if something here is not really an option.

In parallel, continue with the proposed kubectl client-side changes, namely Change 1 (interactive), Change 2, and Change 3, on the targeted release timelines.


I would be curious to see what the process will look like: choosing one of the three options or combining them, then a WBS, stakeholder approvals, component changes, and release rollouts.

Regards,
Raghvender


Josh Berkus

Jun 1, 2021, 12:26:34 PM
to Tim Hockin, Eddie Zaneski, Kubernetes developer/contributor discussion, kubernetes-sig-cli
On 5/27/21 12:47 PM, 'Tim Hockin' via Kubernetes developer/contributor
discussion wrote:
>
> If we're to undertake any such change I think it needs to be more
> gradual.  Consider 6 to 9 releases instead.  Start by adding new forms
> and warning on use of the old forms.  Then add small sleeps to the
> deprecated forms.  Then make the sleeps longer and the warnings louder.
> By the time it starts hurting people there will be ample information all
> over the internet about how to fix it.  Even then, the old commands will
> still work (even if slowly) for a long time.  And in fact, maybe we
> should leave it in that state permanently.  Don't break users, just
> annoy them.

My experience is that taking more releases to roll out a breaking change
doesn't really make any difference ... users just ignore the change
until it goes GA, regardless.

Also consider that releases currently ship every 4 months, so 6 to 9 releases
means 2 to 3 years.

What I would rather see here is a switch that supports the old behavior
in the kubectl config. Then deprecate that over 3 releases or so. So:

Alpha: feature gate
Beta: feature gate, add config switch (on if not set)
GA: on by default, config switch (off if not set)
GA +3: drop config switch -- or not?

... although, now that I think about it, is it *ever* necessary to drop
the config switch? As a scriptwriter, I prefer things I can put into my
.kube config to new switches.

Also, of vital importance here is: how many current popular CI/CD
platforms rely on automated namespace deletion? If the answer is
"several" then that's gonna slow down rollout.

--
-- Josh Berkus
Kubernetes Community Architect
OSPO, OCTO

Tim Hockin

Jun 1, 2021, 12:52:52 PM
to raghvenders raghvenders, Abhishek Tamrakar, Zizon Qiu, Brendan Burns, ccoleman, Eddie Zaneski, kubernetes-dev, kubernetes-sig-cli
On Tue, Jun 1, 2021 at 8:20 AM raghvenders raghvenders
<raghv...@gmail.com> wrote:
>
> Since client-side changes would potentially go about 6-9 releases as mentioned by Tim and potentially breaking changes, a server-side solution would be a reasonable and worthy option to consider and finalize.

To be clear - the distinction isn't really client vs. server. It's
about breaking changes without users EXPLICITLY opting in. You REALLY
can't make something that used to work suddenly stop working, whether
that is client or server implemented.

On the contrary, client-side changes like "ask for confirmation" and
"print stuff in color" are easier because they can distinguish between
interactive and non-interactive execution.

Adding a confirmation to interactive commands should not require any
particular delays in rollout.
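Below is a minimal sketch of the interactive/non-interactive split Tim describes. It is written as a personal bash wrapper, not as a change to kubectl. `[ -t 0 ]` tests whether stdin is a terminal. A person at a prompt is therefore asked to confirm namespace, `--all`, and `--all-namespaces` deletes, while scripts and CI run unchanged. The function names and the list of "broad" arguments are assumptions made for illustration.

```bash
KUBECTL="${KUBECTL:-kubectl}"

# is_broad_delete ARGS...: succeeds when the delete arguments target a whole
# namespace or everything matched by --all / --all-namespaces.
is_broad_delete() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --all|--all=true|--all-namespaces|--all-namespaces=true|-A) return 0 ;;
      namespace|namespaces|ns|namespace/*|namespaces/*|ns/*) return 0 ;;
    esac
  done
  return 1
}

# guarded_delete ARGS...: asks for confirmation only when stdin is a terminal;
# non-interactive callers (stdin not a TTY) get plain `kubectl delete`.
guarded_delete() {
  if [ -t 0 ] && is_broad_delete "$@"; then
    local answer
    printf 'About to run: kubectl delete %s\nType "yes" to continue: ' "$*" >&2
    read -r answer || answer=""
    if [ "$answer" != "yes" ]; then
      echo "aborted" >&2
      return 1
    fi
  fi
  "$KUBECTL" delete "$@"
}
```

Running `guarded_delete namespace prod-backup` in a terminal asks for "yes". The same line inside a CI job, where stdin is not a TTY, goes straight through to `kubectl delete`.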

Eddie Zaneski

Jun 1, 2021, 4:58:02 PM
to kubernetes-dev, Abhishek Tamrakar, Zizon Qiu, Brendan Burns, ccoleman, raghvenders raghvenders, Tim Hockin, kubernetes-sig-cli
Thanks to everyone for the great thoughts and discussion so far!

There are some good ideas throughout this thread (please keep them coming) that could probably stand alone as KEPs. I believe anything opt-in/server-side is orthogonal to what we're currently trying to achieve.

I think the big takeaway so far is that the flag and error changes should be separated from the warning/delay/confirmation changes.

We're thinking in the context of an imperative CLI that takes user input and executes administrative actions. The problem is that users don't intend to delete the resources they accidentally delete, not that there are things that should never be deleted. It doesn't matter how many mistakes have to pile up to create a perfect storm; the bad outcome happens because we allow it without a confirmation gate.

With confirmation in place we significantly lower the chances of accidentally deleting everything in your cluster. This will most likely be the scope of our starting point.

If you want to join us for more we will be discussing during the SIG-CLI call tomorrow (Wednesday 9am PT).


Eddie Zaneski



On Sat, May 29, 2021 at 12:13 AM Abhishek Tamrakar <abhishek.tamrakar08@gmail.com> wrote:

The current deletion strategy is easy but very risky without any gates: a single deletion could put the whole cluster at risk, and this is where it needs some cover. I would still prefer the client-side approach from the original proposal, because the decision to delete a given object or objects should remain in the end user's control while still giving them the safest way to operate the cluster.

On Fri, May 28, 2021, 22:25 'Tim Hockin' via Kubernetes developer/contributor discussion <kubernetes-dev@googlegroups.com> wrote:

On Fri, May 28, 2021 at 9:21 AM Zizon Qiu <zzdtsv@gmail.com> wrote:

I'm thinking of finalizers as a kind of reference counter, like smart pointers in C++ or something similar.

Resources are deallocated when the counter drops to zero (no more finalizers), and kept alive whenever the counter is > 0 (any finalizer present).

That's correct, but there's a fundamental difference between "alive" and "waiting to die". A delete operation moves an object, irrevocably from "alive" to "waiting to die". That is a visible "state" (the deletionTimestamp is set) and there's no way to come back from it. Let's not abuse that to mean something else.

On Sat, May 29, 2021 at 12:14 AM Tim Hockin <thockin@google.com> wrote:

On Fri, May 28, 2021 at 7:58 AM Zizon Qiu <zzdtsv@gmail.com> wrote:

On Fri, May 28, 2021 at 4:21 AM 'Brendan Burns' via Kubernetes developer/contributor discussion <kubernetes-dev@googlegroups.com> wrote:

I'd like to suggest an alternate approach that is more opt-in and is also backward compatible.

We can add an annotation ("k8s.io/confirm-delete: true") to a Pod and if that annotation is present, prompt for confirmation of the delete. We might also consider "k8s.io/lock" which actively blocks the delete.

Or abuse the existing finalizer mechanism.

Finalizers are not "deletion inhibitors" just "deletion delayers". Once you delete, the finalizer might stop it from happening YET but it *is* going to happen. I'd rather see a notion of opt-in delete-inhibit. It is not clear to me what happens if I have a delete-inhibit on something inside a namespace and then try to delete the namespace - we don't have transactions, so we can't abort the whole thing - it would be stuck in a weird partially-deleted state and I expect that to be a never-ending series of bug reports.

We could also support those annotations at a namespace level if we wanted to.

This is similar to Management Locks that we introduced in Azure (https://docs.microsoft.com/en-us/rest/api/resources/managementlocks) for similar reasons to prevent accidental deletes and force an explicit action (remove the lock) for a delete to proceed.

--brendan

________________________________
From: kubernetes-dev@googlegroups.com <kubernetes-dev@googlegroups.com> on behalf of Clayton Coleman <ccoleman@redhat.com> Sent: Thursday, May 27, 2021 1:06 PM
To: Eddie Zaneski <eddiezane@gmail.com>
Cc: kubernetes-dev <kubernetes-dev@googlegroups.com>; kubernetes-sig-cli <kubernetes-sig-cli@googlegroups.com> Subject: [EXTERNAL] Re: [RFC] Protecting users of kubectl delete



Brian Topping

Jun 1, 2021, 5:48:40 PM
to Eddie Zaneski, kubernetes-dev, Abhishek Tamrakar, Zizon Qiu, Brendan Burns, ccoleman, raghvenders raghvenders, Tim Hockin, kubernetes-sig-cli
An especially dangerous situation is one where Ceph storage is managed by Rook. Rook itself is incredibly reliable, but hostStorage is used for the critical Placement Group (PG) maps on monitor nodes (stored in RocksDB). Loss of PG maps would result in loss of *all* PV data in the storage cluster! 

IMO this is more critical than loss of the API object store – assuming they are both backed up, restoring etcd and waiting for reconciliation involves several orders of magnitude less downtime than restoring TB/PB/EB of distributed storage. Some resilient application architectures are designed not to need backups, but cannot tolerate a complete storage failure. 

Raising this observation in case it’s worth considering hierarchical confirmation gates with something basic like reference counting. It should be *even harder* to delete PV storage providers, cluster providers or other items that have multiple dependencies.

Maybe this indicates a “deletion provider interface” for pluggable tools. Default no-op implementations echo existing behavior, advanced implementations might be installed with Helm, use LDAP for decision processing and automatically archive deleted content. Let the community build these implementations instead of trying to crystal ball the best semantics. This also pushes tooling responsibility out to deployers. 

$0.02...
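Brian's "deletion provider interface" could be approximated outside kubectl today with a hook directory, as in the bash sketch below. Everything in it is hypothetical: the directory path, the `provider_delete` name, and the convention that a hook exiting non-zero vetoes the delete. With no hooks installed it behaves exactly like plain `kubectl delete`, which mirrors the no-op default he describes.

```bash
KUBECTL="${KUBECTL:-kubectl}"
# Hypothetical plug-in directory; nothing in kubectl reads it.
DELETE_HOOKS_DIR="${DELETE_HOOKS_DIR:-$HOME/.kube/delete-hooks}"

# provider_delete ARGS...: runs every executable in DELETE_HOOKS_DIR with the
# delete arguments, in filename order. Any hook exiting non-zero vetoes the
# delete. With no hooks installed this is a pass-through to `kubectl delete`.
provider_delete() {
  local hook
  if [ -d "$DELETE_HOOKS_DIR" ]; then
    for hook in "$DELETE_HOOKS_DIR"/*; do
      # Skips non-executables and the literal pattern left by an empty glob.
      [ -x "$hook" ] || continue
      if ! "$hook" "$@"; then
        echo "delete vetoed by $(basename "$hook")" >&2
        return 1
      fi
    done
  fi
  "$KUBECTL" delete "$@"
}
```

A hook can be any executable. For example, a script that exits 1 whenever its arguments mention `cephcluster` or `storageclass` would make storage providers "even harder to delete". A numeric filename prefix controls the order in which hooks run.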


fillz...@gmail.com

Jun 2, 2021, 1:18:54 AM
to Kubernetes developer/contributor discussion
A server-side solution is reasonable. However, finalizers can only keep the resource being deleted in etcd; the resources that belong to it will still be deleted.
A webhook might be a better way to extend this.

OpenKruise, one of the CNCF sandbox projects, already provides protection against cascading deletion.



Fury kerry

Jun 2, 2021, 1:19:06 AM
to Eddie Zaneski, kubernetes-dev, Abhishek Tamrakar, Zizon Qiu, Brendan Burns, ccoleman, raghvenders raghvenders, Tim Hockin, kubernetes-sig-cli
Server-side deletion protection is already implemented in OpenKruise (https://openkruise.io/en-us/docs/deletion_protection.html), covering cascading deletion of both namespaces and workloads. 
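Here is a usage sketch for opting a namespace into that protection. It assumes the `policy.kruise.io/delete-protection` label described on the OpenKruise page linked above (values `Always` and `Cascading`), and an OpenKruise installation with that feature enabled. The helper name is hypothetical. Check the linked docs for the exact semantics in your OpenKruise version.

```bash
KUBECTL="${KUBECTL:-kubectl}"

# protect_namespace NAMESPACE MODE  (hypothetical helper)
# Assumes OpenKruise's deletion-protection label, per our reading of its docs:
#   Always    - reject deleting the namespace while the label is present
#   Cascading - reject deleting it while it still has active Pods
protect_namespace() {
  local ns="$1" mode="$2"
  case "$mode" in
    Always|Cascading) ;;
    *) echo "mode must be Always or Cascading" >&2; return 2 ;;
  esac
  "$KUBECTL" label namespace "$ns" "policy.kruise.io/delete-protection=$mode" --overwrite
}
```

According to its docs, OpenKruise enforces the check server-side through its webhook. That means it should apply to every client, including scripts and `kubectl delete namespace --all`. Removing the label is the explicit action that re-enables deletion.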


--
Zhen Zhang
Zhejiang University
Yuquan Campus
MSN:Fury_...@hotmail.com

Eddie Zaneski

Jun 2, 2021, 5:02:43 PM