@nicorevin: Reiterating the mentions to trigger a notification:
@kubernetes/sig-storage-misc.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
—
You are receiving this because you are on a team that was mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
I don't follow why this is needed. emptyDir creates a new tmpfs "device" from nothing by doing "mount -t tmpfs tmpfs " and the only thing that then has access to that device is the pod. I think the advice about "noexec" etc. is about /tmp specifically, not tmpfs in general.
@wongma7 then it is docker who is wrong. Try docker run --tmpfs /lol smth
Good point, I dug up the issue that led to that: moby/moby#12143. I'm hardly qualified to talk about security issues so I'll defer to the reviewers :). The only thing I'll add for the reviewers to consider is that the docker options are easily overridable, whereas an emptyDir user will be stuck with noexec no matter what.
> the docker options are easily overridable, whereas an emptyDir user will be stuck with noexec no matter what.
This is what I was thinking about as well.
On the one hand, it would probably be good to let users mount tmpfs with the ability to run executables.
On the other hand, one container == one process. Kubernetes built on this principle by introducing pods for composite container applications, encouraging people to keep one process per container while still covering scenarios where helper tools are necessary.
Running a container with a read-only root filesystem and noexec writable mounts is just one of the steps in eliminating possible attack vectors and ensuring that only the stuff pre-baked into the container will be executed. Here is a bunch of examples.
Ok, to summarize:
emptyDir.mountOptions
Is there any action on this? We currently have this item in our risk register.
While almost all of our pod filesystems are read-only, we are required to mount an emptyDir temporary filesystem at /tmp so that .NET Core can operate correctly (it requires writing temporary files to disk), and we want to ensure that any potential attacker can not use this emptyDir volume as a way of staging executables of their choice.
Can someone explain the attack vector that this is trying to prevent?
It's part of a defense-in-depth strategy. Writing an executable to disk and running it is generally a lower bar for an attacker than convincing an application to allocate a block of memory, mark it as executable and jump to it.
So if I understand correctly, you want to prevent a scenario like:
So if kubelet mounted the emptydir as noexec, then it would avoid this scenario.
But do you even need emptydir to do this? Couldn't you do this also with the container writable layer?
We run all our pods with readOnlyRootFilesystem. The only thing writable is the emptyDir.
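For anyone who wants to find out where this applies in their own workloads, here is a minimal sketch, assuming the pod manifest has already been loaded into a plain dict (for example from `kubectl get pod -o json`). The function name and the sample pod are made up for illustration; only the standard `volumes`, `emptyDir`, `volumeMounts`, `readOnly` and `mountPath` fields are used.

```python
def writable_emptydir_mounts(pod):
    """Return (container name, mountPath) pairs for writable emptyDir mounts."""
    spec = pod.get("spec", {})
    # Names of all volumes in the pod that are backed by emptyDir.
    empty_dirs = {v["name"] for v in spec.get("volumes", []) if "emptyDir" in v}
    found = []
    for container in spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            # A volumeMount is writable unless readOnly is explicitly true.
            if mount["name"] in empty_dirs and not mount.get("readOnly", False):
                found.append((container["name"], mount["mountPath"]))
    return found


# Hypothetical pod: read-only root filesystem, one writable emptyDir at /tmp.
pod = {
    "spec": {
        "containers": [{
            "name": "app",
            "securityContext": {"readOnlyRootFilesystem": True},
            "volumeMounts": [
                {"name": "tmp-volume", "mountPath": "/tmp"},
                {"name": "config", "mountPath": "/etc/app", "readOnly": True},
            ],
        }],
        "volumes": [
            {"name": "tmp-volume", "emptyDir": {}},
            {"name": "config", "configMap": {"name": "app-config"}},
        ],
    }
}

print(writable_emptydir_mounts(pod))
```

Each pair it reports is a place where files can be written and, without noexec, executed.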
@msau42
We are willing to do the same: running a container in read-only mode with tmpfs mounts for some runtime-necessary stuff (/run or /tmp), and there is a general recommendation to mount tmpfs with the noexec,nosuid,nodev flags. If you run plain docker run --tmpfs /lol something you will see it in action.
@kubernetes/sig-auth-feature-requests
imo this feature doesn't have to take the form of emptydir.mountoptions. It doesn't even need to be an emptydir field, it could be a pod.securitypolicy field that says "all of this pod's emptydirs must be mounted noexec,nosuid,nodev", imo that would satisfy this use-case well enough without bringing in any of the complications of having psp validations parse all pods' volumes.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with an /lifecycle frozen comment.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
/lifecycle frozen
/remove-lifecycle stale
Do we know if there are any updates on this ?
We are struggling to get noexec removed from our emptyDir mounts. Is there any progress with this!?
I am going to get my hands on it next week. But given the release schedule, I would not expect it to ship until 1.14.
@nrvnrvn do you want some help with this? I think I know the key line to change once the options exist but am a bit unclear on what exactly needs to be done to open up a pipeline between say a YAML definition and here and have no idea what would be required for the pod security policy work. Happy to spend some cycles on this though...
I think I've got a change that might do what the OP was looking for, but haven't yet been able to do a proper e2e test.
Having gone through this, though, I'm wondering if this is really the right solution, or if the right solution is more general and involves adding mountOptions to VolumeMount.
Wanting to have noexec,nosuid,nodev (or other options) isn't just something that you would want on emptyDir: {medium: "Memory"} or emptyDir: {medium:"HugePages"}; you'd want it on emptyDir: {} as well, and likely on most volumes if you're security-conscious and not able to mount them read-only.
Briefly digging into that rabbit hole seems to lead quickly to a gap in the runtime API, where we can't pass arbitrary mount options to volumes, and even docker run doesn't support this.
Not sure if this is worth pursuing further, especially given the limitation to the memory-based emptyDir implementations. I suppose a person could rewrite the directory-based emptyDir so that it does a bind mount and applies the options...
Thoughts?
@glb but Docker does support this:
# docker volume create --driver local --opt type=tmpfs --opt device=tmpfs \
--opt o=size=100m,uid=1000,noexec,nosuid,nodev foo
# docker run -d -p 8000:80 -v foo:/usr/share/nginx/html --name bar --rm nginx
# mount | grep foo
tmpfs on /var/lib/docker/volumes/foo/_data type tmpfs (rw,nosuid,nodev,noexec,relatime,size=102400k,uid=1000)
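To check output like the line above without eyeballing it, here is a small sketch that parses one line of `mount` output and reports which of the three hardening flags are missing. It assumes the usual Linux `<source> on <target> type <fstype> (<options>)` layout; the function names are my own.

```python
REQUIRED = {"nosuid", "nodev", "noexec"}


def parse_mount_line(line):
    """Split '<src> on <target> type <fstype> (<opts>)' into its four parts."""
    head, _, opts = line.rstrip().rpartition(" (")
    source, _, rest = head.partition(" on ")
    target, _, fstype = rest.rpartition(" type ")
    return source, target, fstype, opts.rstrip(")").split(",")


def missing_flags(line):
    """Return the required flags the mount does not carry, sorted by name."""
    options = parse_mount_line(line)[3]
    # Drop '=value' suffixes so 'size=102400k' is compared as 'size'.
    present = {opt.split("=", 1)[0] for opt in options}
    return sorted(REQUIRED - present)


line = ("tmpfs on /var/lib/docker/volumes/foo/_data type tmpfs "
        "(rw,nosuid,nodev,noexec,relatime,size=102400k,uid=1000)")
print(missing_flags(line))
```

For the volume created above the list comes back empty, since all three flags are set.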
@adampl true, and in the branch I created to try this out, it seems pretty easy to extend emptyDir: { medium: "Memory" } to set the mount options. This is a rough equivalent to creating a tmpfs volume like you showed.
It's even equivalently easy to do the same for emptyDir: { medium: "HugePages"}.
However, for the default emptyDir which just uses a directory (see here and here), it's not so simple. Maybe I could hack something with a bind mount, and I'm tempted now to try that, but I keep getting dragged back to the feeling that this isn't the right solution. It doesn't seem right to be putting mount options on the volume spec; they should be on the volume mount, and you should be able to use them for any kind of volume, not just emptyDir.
@glb Yes, I agree, this should be more generic.
Because I couldn't leave it alone: today I learned that yes, you can bind-mount a directory to itself with options, so for example:
# mkdir foo
# cat > foo/x.sh <<EOHD
#!/bin/sh
echo hi
EOHD
# chmod 755 foo/x.sh
# foo/x.sh
hi
# mount -o bind foo foo
# mount -o remount,bind,noexec foo foo
# foo/x.sh
bash: foo/x.sh: Permission denied
# umount foo
# foo/x.sh
hi
so it would be possible to make emptyDir work with mountOptions for the default medium case where it mounts a directory instead of tmpfs / hugepages.
I still don't think that's the right solution here... the right answer feels like having mount options for all VolumeMounts.
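As a side note, a process can check whether a directory sits on a noexec mount (for instance after a bind remount like the one above) without parsing `mount` output. This is a minimal sketch using `os.statvfs`; the numeric fallbacks are the Linux values and are only an assumption for platforms where Python does not export the constants.

```python
import os
import tempfile

# Linux flag values; used only if the os module lacks the named constants.
ST_NOSUID = getattr(os, "ST_NOSUID", 2)
ST_NODEV = getattr(os, "ST_NODEV", 4)
ST_NOEXEC = getattr(os, "ST_NOEXEC", 8)


def mount_flags(path):
    """Report which hardening flags are set on the mount that contains path."""
    flag = os.statvfs(path).f_flag
    return {
        "nosuid": bool(flag & ST_NOSUID),
        "nodev": bool(flag & ST_NODEV),
        "noexec": bool(flag & ST_NOEXEC),
    }


# Inspect the temporary directory of the current environment.
print(mount_flags(tempfile.gettempdir()))
```

Run inside a container, this shows what the writable mount actually ended up with, whichever layer applied the options.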
Mount options in volumeMounts were shot down for security reasons. For example, there is some option that can halt the node, errors=panic?
@wongma7 yeah, that would be bad. maybe there needs to be some allow list, perhaps this is what was meant in earlier comments about PodSecurityPolicy...
Yep, I don't like the idea of ditching useful options for security reasons. The dangerous options should be disallowed by default, but possible to allow whichever options the cluster admin deems appropriate.
> Yep, I don't like the idea of ditching useful options for security reasons. The dangerous options should be disallowed by default, but possible to allow whichever options the cluster admin deems appropriate.
There was significant discussion around this when introducing mount options in kubernetes/community#321 (comment)
The resolution was that mount options would only be supported for PV/PVC volume sources, not inline pod volumes.
@liggitt I may have skimmed too quickly, but the result of the conversation you linked seems to mean that a) "mount options" are really on the PV and would therefore apply equally to any container that mounts them (I can't think of a reasonable real-life counterexample); b) setting options for emptyDir seems to be contradictory to this decision because it is defined inline.
Any thoughts on what the right answer could be for emptyDir then? It sounds like the idea of a list of allowed mount options in the PSP wouldn't fly for the reasons you described; having allowedEmptyDirMountOptions in the PSP seems extremely fine-grained. Would it be insane to have a hard-coded allow list in the plugin itself that's initially limited to a reasonable set?
I don't think it would be better to have Yet Another Boolean Option useMoreSecureMountOptions: true (name negotiable 🙂) on the emptyDir spec, as the next people to come along with even more nuanced understanding of what they want will add useEvenMoreSecureMountOptions rather than extending the allow list.
@liggitt First of all, I don't really understand why inline pod volumes are regarded as something separate from normal PVs. For example, why can't we have an emptyDir PV/PVC? See: #75378.
The potential to crash a node can be a problem only for a subset of Kubernetes clusters working on shared (cloud) infrastructure. That for me is not a sufficient reason to block other use cases.
/area security
Without following all the context around PV/PVC mountOptions, I'm generally +1 on the idea of adding these (noexec,nosuid,nodev) as options on per-container volume mounts, or as options in the securityContext. I think defaulting those to true might be a hard sell with Kubernetes' stance on backwards compatibility.
Those options are probably useful (especially 'exec'), but would it be possible to add a 'uid' volume mount parameter? For better security, I don't want anything to run as root in my container and I also want my root FS to be read-only. I just need a small subdirectory where I dynamically generate a configuration file before launching the daemon. This small volume does not have to be persisted, and emptyDir would fit perfectly if the tmpfs directory it provides could belong to a non-root user.
Using bare docker, I can use the '--tmpfs /dir:rw,uid=$UID' option. In kubernetes, I have no way to set the 'uid=' option. So, I cannot combine using a read-only root FS and running as non-root.
A more common case is the need for a writable /tmp. But enabling non-root users to write to /tmp requires a chmod, which must be executed as root. Same case: you need to start as root and su/sudo to start the daemon, which lowers security.
> Using bare docker, I can use the '--tmpfs /dir:rw,uid=$UID' option. In kubernetes, I have no way to set the 'uid=' option.
It sounds like a use case for the securityContext.fsGroup field (see kubectl explain Pod.spec.securityContext.fsGroup).
> A more common case is the need for a writable /tmp. But enabling non-root users to write to /tmp requires a chmod, which must be executed as root.
Could you elaborate a little more on the last sentence, please? I don't see where you would need chmod:
$ cat test-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: test-pod-
spec:
  containers:
  - image: ubuntu
    name: test-container
    command: [ "/bin/bash" ]
    args: [ "-c", "id; touch /tmp/test-file; ls -l /tmp" ]
    securityContext:
      runAsUser: 1000
      readOnlyRootFilesystem: true
    volumeMounts:
    - name: tmp-volume
      mountPath: /tmp
  restartPolicy: Never
  volumes:
  - name: tmp-volume
    emptyDir: {}
$ kubectl.exe create -f test-pod.yaml
pod/test-pod-77q5h created
$ kubectl.exe logs pod/test-pod-77q5h
uid=1000 gid=0(root) groups=0(root)
total 0
-rw-r--r-- 1 1000 root 0 Oct 7 11:57 test-file
You're right. I thought emptyDir would set the mount point to 755 permissions, but it is 777. So, no chmod needed.
Hi,
I installed "Velero" and this one tries to exec something on a volume that was by default mounted from an emptyDir but it fails because of permission.
I'm the only one having that, I just ran the commands listed in the how-to.
Note that a year ago I had problems with Jenkins too: I had to mount its volumes on the host volume, otherwise it was unable to run its custom Jenkins scripts.
I'm on a Kubernetes v1.15 cluster managed by kops on GCE... This is so strange that any help is welcome :) :)
On a kubeadm v1.18.15 cluster I created a pod with a read-only root filesystem and an emptyDir mounted at /tmp. I was able to exec into the container, create a script in /tmp, and execute it. It would be great if we could specify mountOptions for emptyDir volumes.
Hi Sorry,
It is a long thread and difficult to follow. Someone asked about the attack vector, so I can say that it is unsafe in circumstances where we are running web services that work on user-generated content (UGC). Say there is an attachment in your mail that I am processing for a virus scan: that executable can then execute in that tmpfs. I want to mount it with no execute permission.
Ash
Just to add an IMO valid use case which would easily break with noexec: we always set runAsUser to something other than root and set readOnlyRootFilesystem. And we use the single-file option for pyinstaller. pyinstaller creates standalone executables for Python programs which contain a stripped-down Python environment. The produced binary unpacks the Python environment to /tmp by default.
> On a kubeadm v1.18.15 cluster I created a pod with a read-only root filesystem and an emptyDir mounted at /tmp. I was able to exec into the container, create a script in /tmp, and execute it. It would be great if we could specify mountOptions for emptyDir volumes.
This is what I want to prevent in my clusters. We have some public content that runs non-root with readonly rootfs, and we want to use the 3 options mentioned by the OP on the writable emptyDir directory-backed mounts in the pods where that content is.
Sweeping dust off this issue.
CIS Benchmark for distribution-independent Linux has the following controls (and those controls propagate to other distribution-dependent benchmarks):
| Id | Title | Description |
|---|---|---|
| 1.1.3 | Ensure nodev option set on /tmp partition | The nodev mount option specifies that the filesystem cannot contain special devices. Rationale: Since the /tmp filesystem is not intended to support devices, set this option to ensure that users cannot attempt to create block or character special devices in /tmp. |
| 1.1.4 | Ensure nosuid option set on /tmp partition | The nosuid mount option specifies that the filesystem cannot contain setuid files. Rationale: Since the /tmp filesystem is only intended for temporary file storage, set this option to ensure that users cannot create setuid files in /tmp. |
| 1.1.5 | Ensure noexec option set on /tmp partition | The noexec mount option specifies that the filesystem cannot contain executable binaries. Rationale: Since the /tmp filesystem is only intended for temporary file storage, set this option to ensure that users cannot run executable binaries from /tmp. |
Although the aforementioned controls were created with a generic Linux OS in mind and cover the /tmp partition specifically, they are applicable to containers as well.
In the ideal scenario, one would like to ensure that only one executable runs inside the container, that it comes from the image's Entrypoint or Cmd (command or args in the Kubernetes container spec, respectively), and that there are no places left to download and execute code apart from those pre-defined in the Pod spec. Unfortunately, you cannot mandate that all mounted volumes have the noexec flag. Example use cases:
nosuid,noexec,nodev options is completely valid and will work flawlessly.
--tmpfs mounts where I had to remove the noexec flag to resolve the issue.
We could reconsider the decision to disallow mountOptions for inline volumes and re-use the github.com/moby/sys logic to validate mount options set on emptyDir volumes individually. That would address all emptyDir use cases while preserving the current behavior to maintain backwards compatibility for the time being.
I have read through kubernetes/community#321 and, as far as I understand, the decision not to let users define mountOptions for inline volumes comes from the risk of a node crash, as suggested by @wongma7.
Polluting securityContext with Yet Another Boolean Option is not an option, as @glb suggested: it would introduce yet another special case, and we are already in a position where inline volumes are treated differently from PV/PVC volumes; otherwise this issue and discussion would probably not exist.
Docker and nerdctl, however, allow passing mount options to tmpfs mounts (which can be considered an equivalent of emptyDir in particular and inline volumes in general). Docker does this by maintaining a list of valid mount options and does not accept silly things like errors=panic, thus protecting the host.
Example:
❯ docker run --rm --tmpfs /lol:errors=panic busybox
docker: Error response from daemon: Invalid tmpfs option ["errors" "panic"].
See
and
and nerdctl that re-uses the github.com/moby/sys logic and also resorts to nosuid,noexec,nodev default mount options for tmpfs:
Apart from docker and nerdctl I have not performed prior art analysis for other container runtimes and clients but I am curious and will check how podman handles this.
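To make the allow-list idea concrete, here is a rough sketch of that kind of validation. The set of allowed options below is purely illustrative and is not Docker's or moby/sys's actual list; the function name is also my own.

```python
# Illustrative allow list only; NOT the list Docker or moby/sys really uses.
ALLOWED_TMPFS_OPTIONS = {
    "rw", "ro", "nosuid", "nodev", "noexec", "size", "mode", "uid", "gid",
}


def rejected_options(option_string, allowed=ALLOWED_TMPFS_OPTIONS):
    """Return the options from a comma-separated string that are not allowed."""
    rejected = []
    for opt in filter(None, option_string.split(",")):
        # Compare on the key so 'size=100m' is matched as 'size'.
        key = opt.split("=", 1)[0]
        if key not in allowed:
            rejected.append(opt)
    return rejected


print(rejected_options("rw,noexec,nosuid,nodev,size=100m,uid=1000"))
print(rejected_options("errors=panic"))
```

With a check like this in front of the mount call, anything outside the list (such as errors=panic) is refused before it ever reaches the kernel, while the safe hardening flags pass through.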
I also desperately need to be able to set mountOptions on emptyDir mounts. I'd need to add the noexec and nosuid mount options. Has there been any progress on this issue lately?
Has there been any progress on this issue?
It was mentioned in this thread that mount options are only supported on PVs, not on inline pod volumes, because of security concerns with allowing users to set arbitrary mount options. But there are actually two different kinds of mount options in Linux: superblock mount options, and per-mount-point flags (see mount manpage). The per-mount-point flags in current Linux versions are: nosuid, nodev, noexec, noatime, nodiratime, relatime, strictatime (API-only), readonly, nosymfollow. Allowing users to set these flags does not cause security issues. Because these flags can be different for each mount point, it makes more sense conceptually to set these flags on the VolumeMount, which corresponds to a mount point, instead of on the volume itself. In fact, one of these flags can already be set on the VolumeMount: the readonly flag, through the ReadOnly boolean field.
I therefore propose to add a MountOptions: []string field to VolumeMount in the Kubernetes API. This field would only allow the per-mount-point flags. With that, you could specify the noexec and similar options on your emptyDir. Backwards compatibility is preserved because the flags are only applied when explicitly set. A benefit of this solution is that it applies to all types of volumes, not just emptyDir.
Implementing this feature would require adding the same mount options field to Mount in the CRI API. CRI server implementations like containerd would then forward this to the OCI runtime config, where the options field already exists. I currently don't have the capacity to write a fully thought out KEP and do the implementation work, but maybe someone else could pick this up from here.
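A rough sketch of what the proposal amounts to, assuming a hypothetical `mount_options` list coming from the VolumeMount: accept only the per-mount-point flags listed above and forward them into the `options` array of an OCI-runtime-style mount entry. The function name, the host path and the leading `rbind` option are illustrative assumptions, not what the kubelet or any CRI implementation actually emits.

```python
import json

# Per-mount-point flags exactly as listed in the proposal above.
PER_MOUNT_FLAGS = {
    "nosuid", "nodev", "noexec", "noatime", "nodiratime",
    "relatime", "strictatime", "readonly", "nosymfollow",
}


def oci_mount(source, destination, mount_options):
    """Build an OCI-style mount entry, refusing anything but per-mount flags."""
    bad = [opt for opt in mount_options if opt not in PER_MOUNT_FLAGS]
    if bad:
        raise ValueError(f"not a per-mount-point flag: {bad}")
    # The read-only flag is spelled 'ro' in mount option strings.
    options = ["ro" if opt == "readonly" else opt for opt in mount_options]
    return {
        "destination": destination,
        "type": "bind",
        "source": source,
        "options": ["rbind"] + options,
    }


# Hypothetical host directory backing an emptyDir volume.
entry = oci_mount("/path/on/host/tmp-volume", "/tmp", ["noexec", "nosuid", "nodev"])
print(json.dumps(entry, indent=2))
```

Superblock options such as errors=panic never get through, because they are simply not in the accepted set.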
See #119627