Sandboxes API Follow-Up


Tim Allclair

May 25, 2018, 3:50:29 PM
to kubernetes-sig-node, Brandon Baker
On Tuesday we discussed a proposal for adding a first-class sandbox API to Kubernetes: proposal, notes. These discussions were very helpful, and I took away two action items.

The first was to try to nail down a more concrete definition of sandboxes. Doing so is challenging, and to some extent runs into the problem of proving non-existence. We took a stab at it and came up with this:

Sandboxing enforces that no single software vulnerability can lead to a compromise of the confidentiality, integrity, or availability of processes running outside the pod on the same host, or of data residing on the same host but not explicitly exposed to the pod.

I would love to hear feedback on this definition, and other ideas.

The second follow-up item was to explore in more depth what it would look like to expose the underlying sandbox runtime to the user. I wrote up a design sketch for how we might implement this: RuntimeClass sketch.
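
To give a flavor of the idea inline: the rough shape is a named RuntimeClass object that selects a runtime handler, which pods then reference by name. The group, version, and field names below are illustrative and still in flux, so don't treat this as the final API:

# Illustrative only - names and fields are not final.
apiVersion: node.k8s.io/v1alpha1
kind: RuntimeClass
metadata:
  name: sandboxed
spec:
  runtimeHandler: gvisor        # which CRI runtime handler nodes should use
---
apiVersion: v1
kind: Pod
metadata:
  name: untrusted
spec:
  runtimeClassName: sandboxed   # opt this pod into the sandboxed runtime
  containers:
  - name: app
    image: example.com/app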

Despite the extra complexity, RuntimeClass feels like it would be easier to build. I'm worried this is because we're dancing around the hard problems of defining sandboxes and pushing the complexity to the user.

I'm looking forward to discussing these topics more on Tuesday in our weekly meeting.

Cheers,
-- Tim Allclair

Jessie Frazelle

May 28, 2018, 1:16:04 AM
to Tim Allclair, Brandon Baker, kubernetes-sig-node
I like RuntimeClass better. Seems less complex imo. I also like how it is called runtime. Also seems like users just want to choose the executable that is used.

The fact of the matter is that despite the sandbox API being called Sandbox, you cannot guarantee that the runtimes actually implementing it do what they say they do. So someone could inevitably create some "super secure runtime" that is in fact wildly insecure, hook it into the sandbox API, and this all becomes a lie. Oh, and someone will.

I think runtime is what people will know anyways, and if they end up using an insecure one... well, that's their fault; we can't control it. But at least the name is not deceiving.

--


Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC  511E 18F3 685C 0022 BFF3
pgp.mit.edu

Jessie Frazelle

May 28, 2018, 10:17:35 AM
to Tim Allclair, Brandon Baker, kubernetes-sig-node
Actually, I guess conformance tests or something for runtimes could guarantee the sandbox, so either is good...

corr...@gmail.com

May 29, 2018, 3:58:47 PM
to kubernetes-sig-node
Following on from the discussion today, it seems like the main criticisms of the Sandbox boolean are:

- It's hard to define what a sandboxed runtime gives you in terms of security guarantees
- It's hard to test or assert that a sandbox is going to give you any particular security guarantees
- It's hard to express the tradeoffs simply enough that a user can intuitively make a choice

So maybe we shouldn't be thinking about it in terms of security guarantees or implementation details, but rather in policy terms. In that respect, sandboxing can be a policy convention that can be explicitly tested for. For example:

1 - A sandboxed pod has no access to the Kubernetes control plane
2 - A sandboxed pod cannot write any state to the host on which it's running that could be readable by other pods (or the host itself?)
     - Limit volume types. A sandboxed pod's persistent state should not be readable in the event of a privilege escalation to the host. Block devices / encryption could be used
     - Would have an impact on ephemeral storage such as log storage. Again, this could be overcome by encryption / block devices
3 - A sandboxed pod inherits no configuration or state from the host on which it runs
     - Malicious modification of a secret or configmap on the host would otherwise be picked up by pods. Secrets, configmaps, etc. should be explicitly copied in
4 - A privileged container within a sandboxed pod can only access the sandbox itself, not the host
5 - No local exec or attach
     - Rather than disabling exec and attach, there should be a way to only allow it via the Kubelet and prevent exec and attach on the CRI socket locally

This really only considers runtime and storage. It may not make sense to include conventions around network isolation, given all the different providers and options out there. However, a list like this is portable, avoids plumbing / implementation details, and can be explicitly tested for.
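
Purely to illustrate what "explicitly tested for" could mean, the list might be written down as a profile that a conformance suite checks against. Nothing like this exists today - every name and field below is invented:

# Hypothetical sandbox policy profile - invented for illustration,
# not an existing or proposed Kubernetes resource.
kind: SandboxPolicyProfile
metadata:
  name: sandbox-conventions-v1
rules:
  controlPlaneAccess: denied     # (1) no API server access from the pod
  hostVisibleState: none         # (2) pod state is opaque to the host and other pods
  hostInheritedConfig: none      # (3) secrets/configmaps are pushed in, never read from the host
  privilegedScope: sandboxOnly   # (4) privileged applies to the sandbox, not the host
  localExecAttach: kubeletOnly   # (5) exec/attach via the Kubelet, not the local CRI socket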

I guess a criticism of this - other than it being overly simplistic - is that you could apply these constraints to a regular container runtime - you wouldn't necessarily need gVisor or Kata. In that respect, maybe it's not enough of a guarantee of security for some. You could add something about the kernel not being shared if there were a consensus about that. On the flip side though, maybe that's a good thing. The benefits of introducing abstraction layers - whether x86 or system calls - are harder to assert or test for.

And that brings us to the question of allowing mixed sandbox types on a single node. There's an argument to be made that allowing un-sandboxed containers on a node presents a potential risk to the sandboxed ones - mostly through privilege escalation and subsequent control plane manipulation.

If there were a complete list such as the one above that could be agreed upon, it wouldn't necessarily negate the usefulness of being able to select a particular container runtime for reasons such as performance, hardware isolation, etc. But to me it seems so much simpler to keep that as a node-level attribute for now and use taints and tolerations for scheduling.
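
For example (the taint key, label, and node name are invented for illustration), sandbox-capable nodes could be tainted and labeled, and sandboxed pods would tolerate and select them:

# Illustrative only. Assumes the nodes were tainted and labeled, e.g.:
#   kubectl taint nodes node-1 runtime=sandboxed:NoSchedule
#   kubectl label nodes node-1 runtime=sandboxed
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-workload
spec:
  tolerations:
  - key: runtime
    operator: Equal
    value: sandboxed
    effect: NoSchedule
  nodeSelector:
    runtime: sandboxed
  containers:
  - name: app
    image: example.com/app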

Ben

Jessie Frazelle

May 29, 2018, 5:21:47 PM
to corr...@gmail.com, kubernetes-sig-node
Thanks for the notes! One question:

> 5 - No local exec or attach
>      - Rather than disabling exec and attach, there should be a way to only allow it via the Kubelet and prevent exec and attach on the CRI socket locally

Why this? In every threat model I can think of, if a malicious user has access to the node itself it's already game over, so why even prevent this?


Jessie Frazelle

May 29, 2018, 5:22:30 PM
to corr...@gmail.com, kubernetes-sig-node
Anyways, you can exec into a container with pure C :) You don't need the socket, so what is this even preventing?

Ben Corrie

May 29, 2018, 6:12:20 PM
to Jessie Frazelle, kubernetes-sig-node
Good point, Jessie. I was thinking more in terms of Kata when talking about exec, but the entry point to that is still a CRI socket.

Maybe we need to be clearer about the business value of sandboxing. Is it about keeping things in, out, or both? My assumption is that it's ideally both.

If we concede that we care about the possibility of a privilege escalation on the host - and I can see why we might - I'm not sure the consequences have to be all or nothing. Can we draw a distinction between affecting the pod's lifecycle and being able to access the pod's data or run something inside it? I might be missing something, but it seems like these aren't equally bad.

Ben


Jessie Frazelle

May 29, 2018, 6:17:38 PM
to Ben Corrie, kubernetes-sig-node
In my opinion, a sandbox is for keeping things in, although in real life sand rarely stays in the box.

(I'll see myself out)

I think for most threat models around Kubernetes this holds true. For a SaaS, etc., you really only care about keeping things in.

I guess a different model would be a use case like that of SGX, where you want to keep things out. Without encrypted memory/isolation from the host to the sandbox and all the features of things like SGX, I don't know how we can guarantee this.

Also, the main use case imo for SGX is when people don't trust the cloud... but that's not really a use case for k8s.

SGX containers are something I looked into, but that's a whole different thing...

Tim Allclair

May 29, 2018, 7:20:40 PM
to Jessica Frazelle, corr...@gmail.com, kubernetes-sig-node
Yes, I agree. Sandboxes are about keeping things in (see "isolation is directional"). IMO keeping things out is a very different problem. There may be use cases where you want both, but I think we should scope them differently. A basic example is that the Kubelet currently expects to be able to exec into containers in order to do liveness probes, debugging, and lifecycle management.
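
For instance, a completely ordinary exec liveness probe already depends on that path, so a blanket "no exec" rule would break workloads like this (image and command are just placeholders):

# Standard exec liveness probe: the Kubelet periodically execs this
# command in the container, which a hard "no exec" rule would break.
apiVersion: v1
kind: Pod
metadata:
  name: probe-example
spec:
  containers:
  - name: app
    image: example.com/app
    livenessProbe:
      exec:
        command: ["cat", "/tmp/healthy"]
      initialDelaySeconds: 5
      periodSeconds: 10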

> 1 - A sandboxed pod has no access to the Kubernetes control plane

Agreed, at least by default.
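
Part of that default is already expressible today: a pod that doesn't mount a service account token has no ambient credential for the API server (whether the pod can reach the API server over the network is a separate, policy-level question):

# Existing Kubernetes field: don't mount a service account token.
apiVersion: v1
kind: Pod
metadata:
  name: no-control-plane-creds
spec:
  automountServiceAccountToken: false
  containers:
  - name: app
    image: example.com/app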

> 2 - A sandboxed pod cannot write any state to the host on which it's running that could be readable by other pods (or the host itself?)

I'm not sure how feasible this is. Kubernetes is all about realizing the desired state of a declarative API - this means that you minimally need to know the state of the containers, and being able to run probes is an important part of that. I also think that if we restrict sandboxes to stateless workloads, we're severely limiting the use cases for the feature.

> 3 - A sandboxed pod inherits no configuration or state from the host on which it runs

What threats is this protecting against? I don't see how copying the data in is stronger than exposing it through a read-only interface. I'm also worried that this is too much of a departure from the Kubernetes API (e.g. downward API, kube-proxy, etc.)
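
To make the comparison concrete, the kind of read-only interface I have in mind is the existing secret volume mount, which the kubelet projects into the pod and keeps up to date (the secret name here is just a placeholder):

# Standard read-only secret mount, using existing Kubernetes fields.
apiVersion: v1
kind: Pod
metadata:
  name: readonly-config
spec:
  containers:
  - name: app
    image: example.com/app
    volumeMounts:
    - name: creds
      mountPath: /etc/creds
      readOnly: true
  volumes:
  - name: creds
    secret:
      secretName: app-creds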

> 4 - A privileged container within a sandboxed pod can only access the sandbox itself, not the host

Yes, this should be a hard requirement for privileged sandbox pods, otherwise it's meaningless.

> 5 - No local exec or attach

I agree with Jessie here. This is extremely difficult without breaking a lot of Kubernetes functionality and adding hardware support.

corr...@gmail.com

May 29, 2018, 9:16:56 PM
to kubernetes-sig-node
Fair enough - thanks for the RTFM reminder to re-read the original doc :)

When thinking about (2) and (3), I was contemplating the recent subpath vulnerability. There will be cases where it may not be possible to execute code on the host or connect to the control plane on the host, but where it is possible to read or write to the host's filesystem. The question then becomes - how serious are the consequences? In that case, failure of containment (the "in") potentially affects another pod on the host (the "out").

If state is inherited from the host's filesystem (3), then the ability to write to the right location on the host's filesystem is elevated to the equivalent of having control plane access. If configuration state is only pushed in by the control plane, this is then the explicit level of access an attacker would need.

In terms of (2), there's a clear distinction to be made between control plane state and user state. I don't see any issue with the control plane keeping whatever state it wants on the host, but should everything a sandboxed pod writes to its ephemeral filesystem be visible on the host filesystem? Seems like you make this point yourself in suggesting that state written to the host should be opaque. That's really what I was getting at - it wouldn't make sense to force sandboxed pods to be stateless.

In summary - you've clearly written and thought much more deeply about this stuff. We can debate the list and what makes sense to be in it - but in the general sense, do you think it's possible to adequately express what a sandbox should be in terms of relatively simple policy constraints?

Ben


Loc Nguyen

May 30, 2018, 5:39:42 PM
to kubernetes-sig-node
Tim, I think you can tell from my question in your doc that I don't particularly like the boolean. There may be runtimes that only support sandboxed pods, and the boolean forces these runtimes to also support native pods. I do like the RuntimeClass a lot better. Since you had StorageClass in mind when writing your proposal, have you looked into all the work that's currently being proposed in storage for topology-aware scheduling? If applied to RuntimeClass, it would allow heterogeneous nodes that support a combination of native-only, native/sandboxed, and sandboxed-only runtimes. It would be more complicated, but it is the most flexible solution and would allow runtime to follow the same future design as storage. The boolean feels like a quick and dirty flag that will become future tech debt.



Tim Allclair

Jun 7, 2018, 4:50:11 PM
to nl...@vmware.com, kubernetes-sig-node
Thanks, everyone, for the valuable feedback on this topic. In order to unblock progress in this area, I'd like to try to arrive at lazy consensus in the next SIG-node meeting (6/12). Once we have general consensus, I will rewrite the proposal as a KEP and go through the formal review & approval process. I'd like to have an alpha implementation in 1.12.

See you Tuesday!


Tim Allclair

Jun 12, 2018, 2:20:20 PM
to nl...@vmware.com, kubernetes-sig-node
For those who couldn't make it to today's meeting, the decision is to move forward with the RuntimeClass proposal. I will work on fleshing out the details in a KEP soon, and we can continue discussions on that proposal.

Thanks!

liz.x...@gmail.com

Jun 18, 2018, 7:38:12 AM
to kubernetes-sig-node
Just wanted to add my +1 that I like the RuntimeClass proposal better than the Sandbox boolean too (and also tagging myself into this thread as I'm interested!)

Tim Allclair

Jun 22, 2018, 3:35:07 PM
to liz.x...@gmail.com, kubernetes-sig-node
FYI - the initial RuntimeClass KEP is out for review: https://github.com/kubernetes/community/pull/2290

