CRI extension for emptyDir type volumes


Harshal Patil

Aug 1, 2018, 1:38:59 AM
to kubernetes-...@googlegroups.com, tall...@google.com, ms...@google.com, dy...@google.com, saa...@google.com, Pradipta Kumar
 
Hi,
 
Kubernetes emptyDir type volumes (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) share their lifespan with the pod they are provisioned for. These volumes are available as long as the pod is running. Today, the kubelet provisions the tmpfs-backed emptyDir type volumes on the host, and the containers of the respective pod simply bind mount that host path. One of the main uses of an emptyDir type volume is to facilitate communication between the containers of the same pod.
 
Lately, there have been some discussions (https://docs.google.com/document/d/1QQ5u1RBDLXWvC8K3pscTtTRThsOeBSts_imYEoRyw8A/edit#) and efforts (https://github.com/kata-containers/runtime) around hardening the 'pod' by sandboxing it. 
 
Given the tight coupling between an individual pod and its emptyDir type volume, it only makes sense, when sandboxing the pod, to provision this emptyDir volume inside the confines of the sandbox. This way the communication between the containers of the pod is subject to the same level of isolation or hardening as the containers themselves.
 
We implemented this idea in Kata: https://github.com/kata-containers/runtime/pull/307. The overall flow works as follows:
1. Create a sandbox for the pod (Kata uses a virtual machine as the sandbox).
2. Before launching the containers, detect the emptyDir type volumes from the container mounts (https://github.com/kata-containers/runtime/blob/master/cli/utils.go#L58).
3. Provision the tmpfs inside the VM and bind mount it into the pod's containers (which by now are running inside that VM).
 
This way, all the containers run inside the virtual machine and all the contents of the emptyDir type volume stay within the confines of the virtual machine. This can be even more helpful in the future with Intel's TME and MKTME, IBM Ultravisor, or AMD SEV. Since those technologies allow you to encrypt the memory pages used by a VM, the contents of the emptyDir volumes will also remain encrypted from the host's point of view.
 
By the time the pod is created, the kubelet has already provisioned the emptyDir type volume on the host. This volume is then bind mounted into the containers of the pod. If we want to introduce sandboxing for the pod, the components responsible for provisioning the pod (e.g. an OCI-compliant runtime) have no way to know whether a particular mount is backed by an emptyDir type volume. This is why our work in Kata depends on parsing the path of the storage (https://github.com/kata-containers/runtime/blob/master/cli/utils.go#L58), which is a less than desirable approach.
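
To make the path-parsing heuristic concrete, here is a minimal Go sketch. It is not the Kata code linked above. It assumes the kubelet's default host layout, .../volumes/kubernetes.io~empty-dir/<volume-name>, and the helper name looksLikeEmptyDir is hypothetical. It inspects only the path string, which is exactly why the approach is fragile.

```go
package main

import (
	"path"
	"strings"
)

// emptyDirPluginDir is the directory name the kubelet is assumed to use for
// emptyDir volumes under a pod's volumes directory.
const emptyDirPluginDir = "kubernetes.io~empty-dir"

// looksLikeEmptyDir (hypothetical helper) reports whether a bind-mount source
// has the shape .../kubernetes.io~empty-dir/<volume-name>. It only looks at
// the path string and does not check the filesystem type behind it.
func looksLikeEmptyDir(source string) bool {
	parts := strings.Split(path.Clean(source), "/")
	if len(parts) < 2 {
		return false
	}
	// The parent directory of the volume name identifies the volume plugin.
	return parts[len(parts)-2] == emptyDirPluginDir
}
```

Any change to the kubelet's directory layout would silently break a check like this, which is the motivation for passing the information explicitly.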
 
We think the right way to handle emptyDir type volumes in a sandboxed (or non-sandboxed) pod is to have the CRI handle their provisioning. This way, any CRI implementation that provisions sandboxed (or non-sandboxed) pods can also provision emptyDir type volumes within the proper confines of the pod (VM or non-VM).
 
We think emptyDir type volumes should fit into CRI in PodSandboxConfig,
 
e.g.
<snip1>
message PodSandboxConfig {
<snip2>
    repeated string empty_dir = 9;
</snip2>
}
</snip1>
 
I would be happy to hear what the community thinks about this.
 
Thanks,
Harshal Patil

Fox, Kevin M

Aug 1, 2018, 11:58:45 AM
to Harshal Patil, kubernetes-...@googlegroups.com, tall...@google.com, ms...@google.com, dy...@google.com, saa...@google.com, Pradipta Kumar
With my operator hat on, that does sound like it would perform better for VM-based containers, as well as work around some features (or lack thereof) in 9p. So that sounds good to me.

Could you create a PR at https://github.com/kubernetes/community so we can talk about the details there?

One other thought: maybe it should be an optional CRI feature? So if the CRI runtime wants to support emptyDirs itself, it can; otherwise the current driver still gets the request?

Thanks,
Kevin
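
The "optional CRI feature" idea above can be sketched in Go as a capability check with a fallback. Everything here is hypothetical: the EmptyDirProvisioner interface, the stub types, and the returned strings are illustrative only. In the real CRI the capability would have to be negotiated over gRPC; a Go type assertion merely stands in for that negotiation.

```go
package main

import "fmt"

// EmptyDirProvisioner is a hypothetical optional capability: anything that
// can set up an emptyDir volume for a pod and report where it lives.
type EmptyDirProvisioner interface {
	ProvisionEmptyDir(podUID, volume string) (string, error)
}

// hostProvisioner stands in for the current host-side driver (stub).
type hostProvisioner struct{}

func (hostProvisioner) ProvisionEmptyDir(podUID, volume string) (string, error) {
	return fmt.Sprintf("host:%s/%s", podUID, volume), nil
}

// sandboxRuntime is a hypothetical runtime that opts in to the capability.
type sandboxRuntime struct{}

func (sandboxRuntime) ProvisionEmptyDir(podUID, volume string) (string, error) {
	return fmt.Sprintf("sandbox:%s/%s", podUID, volume), nil
}

// plainRuntime is a hypothetical runtime that does not opt in.
type plainRuntime struct{}

// provisionEmptyDir delegates to the runtime when it implements the optional
// capability; otherwise the existing host-side driver gets the request.
func provisionEmptyDir(runtime interface{}, fallback EmptyDirProvisioner, podUID, volume string) (string, error) {
	if p, ok := runtime.(EmptyDirProvisioner); ok {
		return p.ProvisionEmptyDir(podUID, volume)
	}
	return fallback.ProvisionEmptyDir(podUID, volume)
}
```

With these stubs, a sandboxRuntime handles the request itself, while a plainRuntime leaves it to the host-side fallback.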



Tim Allclair

Aug 1, 2018, 2:01:49 PM
to Kevi...@pnnl.gov, Yu-Ju Hong, Harshal Patil, kubernetes-...@googlegroups.com, Michelle Au, David Zhu, Saad Ali, bpra...@in.ibm.com
+Yu-Ju to comment on the CRI aspects

Provisioning the emptydir volumes through the CRI implementation makes sense to me, especially since there isn't any lifecycle management of the volume beyond the pod lifecycle. A couple caveats we should consider:

- Volume cleanup (matters when not using an in-memory implementation)
- Resource accounting, both for scheduling (not sure if we have this today), and reporting storage metrics
- Code sharing (we should provide some library implementations)
- Volumes that delegate to emptyDir: off the top of my head, all atomicWriter volumes & gitRepo volumes. I think it makes sense to leave these types unchanged, i.e. don't delegate them to the CRI

All that said, I'm still worried about the kata-model of using in-memory volumes, especially since kata pods have a static shape, and volume resources are owned by the pod not containers.

Vishnu Kannan

Aug 1, 2018, 2:14:50 PM
to Tim Allclair, Kevi...@pnnl.gov, Yu-Ju Hong, harsha...@in.ibm.com, kubernetes-sig-storage, Michelle Au, David Zhu, Saad Ali, bpra...@in.ibm.com
Keep in mind that emptydir now also supports huge pages. 

Fox, Kevin M

Aug 1, 2018, 2:16:03 PM
to Tim Allclair, Yu-Ju Hong, Harshal Patil, kubernetes-...@googlegroups.com, Michelle Au, David Zhu, Saad Ali, bpra...@in.ibm.com
It probably should pass through the emptyDir type too, so the other side can do both disk emptydirs and mem emptydirs. Both are important.

Thanks,
Kevin
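
As an illustration of passing the emptyDir type through, the Go sketch below carries the medium alongside the volume name. The EmptyDirRequest struct and the medium-to-filesystem mapping are hypothetical. The medium strings are meant to mirror the emptyDir `medium` field of the Kubernetes API (empty for the default, "Memory", and "HugePages"), which is an assumption to check against the API reference.

```go
package main

import "fmt"

// Medium values assumed to mirror the emptyDir `medium` field in the
// Kubernetes API.
const (
	MediumDefault   = ""
	MediumMemory    = "Memory"
	MediumHugePages = "HugePages"
)

// EmptyDirRequest is a hypothetical message a runtime could receive in place
// of a bare volume name, so that it knows which backing to provision.
type EmptyDirRequest struct {
	Name   string
	Medium string
}

// backingFor maps the requested medium to the kind of storage a runtime might
// provision inside the sandbox. The mapping itself is illustrative.
func backingFor(req EmptyDirRequest) (string, error) {
	switch req.Medium {
	case MediumDefault:
		return "disk", nil
	case MediumMemory:
		return "tmpfs", nil
	case MediumHugePages:
		return "hugetlbfs", nil
	default:
		return "", fmt.Errorf("emptyDir %q: unsupported medium %q", req.Name, req.Medium)
	}
}
```

A plain `repeated string` field, as in the original proposal, would not be able to carry this distinction.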



Michelle Au

Aug 1, 2018, 4:26:34 PM
to Kevi...@pnnl.gov, Tim Allclair, Yu-Ju Hong, harsha...@in.ibm.com, kubernetes-sig-storage, David Zhu, Saad Ali, bpra...@in.ibm.com
Instead of having a specific extension for EmptyDir volumes, I would be interested in a more generic Volume interface. There has been some thought about potentially delegating all volume setup to runtimes in the future.

Fox, Kevin M

Aug 1, 2018, 4:34:56 PM
to Michelle Au, Tim Allclair, Yu-Ju Hong, harsha...@in.ibm.com, kubernetes-sig-storage, David Zhu, Saad Ali, bpra...@in.ibm.com
Are you suggesting that all of CSI belongs on the other side of CRI?

Thanks,
Kevin


Michelle Au

Aug 1, 2018, 5:00:36 PM
to Kevi...@pnnl.gov, Tim Allclair, Yu-Ju Hong, harsha...@in.ibm.com, kubernetes-sig-storage, David Zhu, Saad Ali, bpra...@in.ibm.com
For runtimes that want to provide a more secure environment, it may make sense to have a mechanism to physically isolate volumes from each other in order to reduce the impact of volume breakout vulnerabilities.  For example, the recent subpath volume vulnerability would not have been so impactful if the pod's volumes were isolated in a sandbox vs today's method of having all the volumes of all pods mounted in the same host namespace.

Fox, Kevin M

Aug 1, 2018, 7:14:09 PM
to Michelle Au, Tim Allclair, Yu-Ju Hong, harsha...@in.ibm.com, kubernetes-sig-storage, David Zhu, Saad Ali, bpra...@in.ibm.com
That makes sense.

It does complicate how CSI drivers are deployed though. At that point, rather than deploying the driver with a DaemonSet, you might want some way to inject the drivers as a sidecar into the pod? Anyway, probably not something to solve right this minute, but something to think about...

Thanks,
Kevin
