desktop sandboxes in the opencontainer initiative?

desktop sandboxes in the opencontainer initiative? Alexander Larsson 20/10/15 03:59
Hi,

I'm working on a project called xdg-app[1], which is a Linux desktop application sandboxing framework. It's not quite a container (I intentionally avoid that word to limit confusion), but it uses the same underlying technologies. That said, I do think some of the things that xdg-app does are interesting in an OCI context, and there are areas where xdg-app could use OCI things. So, yesterday I sat down and tried to make runc launch an xdg-app, in order to see what features are missing in the opencontainer spec.

First let me detail how xdg-app differs from typical containers.

xdg-app apps run in the very specific context of a user's graphical login session. This leads to certain expectations about the environment. For instance, apps are tied to a specific session rather than being machine-global, and one machine must be able to run multiple independent sessions. There is also an implicit assumption that certain services are accessible (X server, Wayland compositor, pulseaudio daemon, session DBus daemon). Also, the lifetime of all processes in the session is tied to it, and they will be force-killed at session exit.

The lifecycle of an xdg-app is (like any desktop app) much more ad-hoc than that of a container. Apps are started on a whim by the user (or by other apps) and they manage their own lifetime (i.e. they exit when the user tells them to quit), and nothing monitors or manages this (other than the kill at session exit).

There are standardized ways to find and start applications, such as .desktop files and DBus activation, but in general application launch boils down to a fork+exec by some other process already in the session. For this to work, the app needs to properly inherit everything about the session when it starts, such as file descriptors, environment, parent/child relationships, audit trails, uids, cgroups, etc.

And finally, everything runs completely unprivileged as the actual user running the session. There is no need for sudo, daemons running as root, setuid binaries, etc. In fact, the whole point is to have the app run with *less* permissions, and anything with elevated privileges is instead a risk that the app can get *more* permissions.

With that in mind, here is an xdg-app converted to a runc "container":

  https://github.com/alexlarsson/xdg-app-oci

If you pull that and run "./run.sh" in an X session, it will (the first time) download a Gimp build that I exported from xdg-app and launch it in an xdg-app-like environment using runc.

I tried to recreate the regular xdg-app environment in it, but there are a few things that are not the same:

It uses sudo to run runc, which then switches back to the real uid. This is pretty bad, because you can't expect the user to have sudo rights, or that sudo even works in all app launch cases. xdg-app works because it uses unprivileged user namespaces, and makes sure to only use operations that are allowed by them. This places some limitations on it; for instance, you can't create device nodes or mount a fresh sysfs, but those have unprivileged alternatives such as bind mounts from the host fs. As long as the config files use only these alternatives, it should be perfectly possible for runc to start a container as non-root.

Even though it runs as root, I was not able to use the user namespace support. When I tried to do the same mapping as in xdg-app (the user mapped to himself, everything else unmapped, setgroups denied, which is the only valid unprivileged mode), runc complained about root not being mapped. And when I added a root map, I got this:

  openat(AT_FDCWD, "/proc/6108/uid_map", O_RDWR) = 11
  write(11, "0 1000 1\n1000 1000 1\n\0", 22) = -1 EINVAL (Invalid argument)

Not sure what caused that, but I didn't look into it.
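The mapping mode described above (the user mapped to himself, everything else unmapped, setgroups denied) needs no privileges. Below is a minimal C sketch, assuming Linux 3.19 or later with unprivileged user namespaces enabled; `enter_self_mapped_userns` and `write_proc_file` are hypothetical names, not xdg-app's or runc's actual code. Two kernel rules shape it: each map must be written in a single write() call, and an unprivileged process may only write one line that maps its own effective ID, after first writing "deny" to setgroups.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write the whole string with a single write(),
 * because the kernel only accepts a map written in one go. */
static int write_proc_file(const char *path, const char *data)
{
    int fd = open(path, O_WRONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    size_t len = strlen(data);
    ssize_t n = write(fd, data, len);
    int saved = errno;
    close(fd);
    if (n != (ssize_t) len) {
        errno = (n < 0) ? saved : EIO;
        return -1;
    }
    return 0;
}

/* Hypothetical helper: enter a new user namespace in which the caller's
 * effective uid/gid map to themselves and every other ID is unmapped.
 * Returns 0 on success, -1 with errno set. */
int enter_self_mapped_userns(void)
{
    /* Read the IDs first: after unshare() they show up as the overflow ID. */
    uid_t uid = geteuid();
    gid_t gid = getegid();
    char map[64];

    if (unshare(CLONE_NEWUSER) < 0)
        return -1;

    /* An unprivileged writer must disable setgroups() before gid_map. */
    if (write_proc_file("/proc/self/setgroups", "deny") < 0)
        return -1;

    snprintf(map, sizeof map, "%u %u 1\n", (unsigned) uid, (unsigned) uid);
    if (write_proc_file("/proc/self/uid_map", map) < 0)
        return -1;

    snprintf(map, sizeof map, "%u %u 1\n", (unsigned) gid, (unsigned) gid);
    return write_proc_file("/proc/self/gid_map", map);
}
```

After a successful call the process keeps its own uid/gid but holds a full capability set inside the new namespace, which is what makes the later unprivileged mount operations possible.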

runc looks for an existing directory and writes stats to a global directory (/run/opencontainer/container) which breaks the fact that xdg-apps are not global. Is there a way to have it not do this?

xdg-app mounts / as a tmpfs inside the mount namespace and constructs its root there (linking in /usr as a read-only bind-mount). This means that no other app can access it, and that the kernel will automatically clean it up on app exit. I couldn't figure out any way to do this with runc, as you can't tell it to create files after things are mounted. It would be nice if config.json would let me list a bunch of directories/symlinks/files and have them created during container setup.
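The kind of declarative list asked for here could look like the following minimal C sketch, which creates directories and symlinks relative to a directory fd for the new root (for example, a freshly mounted tmpfs). The entry format, `struct root_entry`, and `create_root_entries` are hypothetical illustrations, not an existing config.json feature or xdg-app's actual code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical entry kinds; not an existing config.json field. */
enum entry_type { ENTRY_DIR, ENTRY_SYMLINK };

struct root_entry {
    enum entry_type type;
    const char *path;    /* relative to the new root, e.g. "usr" */
    const char *target;  /* symlink target (ENTRY_SYMLINK only) */
    mode_t mode;         /* permissions (ENTRY_DIR only), masked by umask */
};

/* Create the entries in order relative to root_fd. Parents must be
 * listed before their children. An already existing directory is
 * accepted; any other existing entry is an error.
 * Returns 0, or -1 with errno set. */
int create_root_entries(int root_fd, const struct root_entry *entries, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const struct root_entry *e = &entries[i];
        if (e->type == ENTRY_DIR) {
            if (mkdirat(root_fd, e->path, e->mode) == 0)
                continue;
            struct stat st;
            if (errno == EEXIST &&
                fstatat(root_fd, e->path, &st, AT_SYMLINK_NOFOLLOW) == 0 &&
                S_ISDIR(st.st_mode))
                continue;  /* directory already there: fine */
            return -1;
        }
        if (symlinkat(e->target, root_fd, e->path) < 0)
            return -1;
    }
    return 0;
}
```

A list such as `{ ENTRY_DIR, "usr", NULL, 0755 }, { ENTRY_SYMLINK, "lib", "usr/lib", 0 }` would give a root where `/usr` can then be bind-mounted and `/lib` points into it.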

xdg-app sets the PR_SET_NO_NEW_PRIVS flag on the process, which effectively neuters all kinds of setuid privilege escalations. There is no way to apply this in runc currently, but it would make a lot of sense to add it.
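Setting the flag is a single prctl() call (available since Linux 3.5). A minimal sketch; the wrapper names `set_no_new_privs` and `get_no_new_privs` are hypothetical:

```c
#define _GNU_SOURCE
#include <sys/prctl.h>

/* Set the no_new_privs bit: afterwards execve() no longer honours
 * setuid/setgid bits or file capabilities for this process and its
 * descendants. Returns 0 on success, -1 with errno set. */
int set_no_new_privs(void)
{
    return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Returns 1 if the bit is set, 0 if not, -1 on error. */
int get_no_new_privs(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

The bit is inherited across fork(), clone() and execve() and cannot be cleared once set, so setting it in the launcher just before exec'ing the app covers every process the app later starts.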

xdg-app has its own minimal pid1 that reaps children and does some other stuff. xdg-app can do this easily as it just forks (which runc can't). I think I can achieve the same in runc by mounting a static helper binary into the container, but it's kind of a pain.
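A minimal C sketch of such a reaping pid1, assuming it runs as the first process of a new PID namespace so that orphaned descendants get re-parented to it. `run_with_reaper` is a hypothetical name and this is not xdg-app-helper's actual code; it simply keeps calling wait() until no children remain and reports the main command's exit status.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork and exec the real command, then reap every exited child until
 * none remain. As pid 1 of a PID namespace, that includes orphaned
 * descendants re-parented to us; elsewhere only our own children.
 * Returns the command's exit status (128 + signal number if it was
 * killed, 127 if exec failed), or -1 if fork() failed. */
int run_with_reaper(char *const argv[])
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        execvp(argv[0], argv);
        _exit(127);  /* exec failed */
    }

    int result = 127;
    for (;;) {
        int status;
        pid_t pid = wait(&status);
        if (pid < 0) {
            if (errno == EINTR)
                continue;  /* interrupted by a signal: keep waiting */
            break;         /* ECHILD: nothing left to reap */
        }
        if (pid == child) {
            if (WIFEXITED(status))
                result = WEXITSTATUS(status);
            else if (WIFSIGNALED(status))
                result = 128 + WTERMSIG(status);
        }
        /* Any other pid was a zombie that is now reaped. */
    }
    return result;
}
```

Waiting until every child is gone is one design choice; a real pid1 might instead exit as soon as the main command does, letting the kernel kill the rest of the namespace.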

In case anyone is interested in how xdg-app sets up the container, you can take a look at the xdg-app-helper sources:

  http://cgit.freedesktop.org/xdg-app/xdg-app/tree/lib/xdg-app-helper.c

It is pretty self-contained (links to glibc and libseccomp only) and small (2542 lines of C).

I wonder if there is any interest in supporting these kinds of use cases under the opencontainer umbrella?

[1] References:
  https://wiki.gnome.org/Projects/SandboxedApps
  http://cgit.freedesktop.org/xdg-app/xdg-app/
  http://www.freedesktop.org/software/xdg-app/releases/

Re: desktop sandboxes in the opencontainer initiative? Jessica Frazelle 20/10/15 05:31
This is super cool!





To unsubscribe from this group and stop receiving emails from it, send an email to dev+uns...@opencontainers.org.
Re: desktop sandboxes in the opencontainer initiative? Colin Walters 20/10/15 06:19
On Tuesday, 20 October 2015 06:59:16 UTC-4, Alexander Larsson wrote:

xdg-app sets the PR_SET_NO_NEW_PRIVS flag on the process, which effectively neuters all kinds of setuid privilege escalations. There is no way to apply this in runc currently, but it would make a lot of sense to add it.

Small note: It is currently implied when using seccomp:

$ git describe --tags --always
v0.0.2-6-g33494b1
$ git grep -i prctl.*nonew
libcontainer/seccomp/context.go:        if err := prctl(prSetNoNewPrivileges, 1, 0, 0, 0); err != nil {
$

Re: desktop sandboxes in the opencontainer initiative? Alexander Larsson 20/10/15 06:55


On Tuesday, 20 October 2015 15:19:34 UTC+2, Colin Walters wrote:
On Tuesday, 20 October 2015 06:59:16 UTC-4, Alexander Larsson wrote:

xdg-app sets the PR_SET_NO_NEW_PRIVS flag on the process, which effectively neuters all kinds of setuid privilege escalations. There is no way to apply this in runc currently, but it would make a lot of sense to add it.

Small note: It is currently implied when using seccomp:


Oh, related to seccomp: how do I actually set the errno return value in something like:

 "seccomp": {
            "syscalls": [
                {
                    "name": "syslog",
                    "action": "SCMP_ACT_ERRNO"
                },
...


Re: desktop sandboxes in the opencontainer initiative? dwa...@redhat.com 20/10/15 07:14
We should look into allowing this to be specified without using seccomp.  I also asked Matt Heon to look into your other question with seccomp.
Re: desktop sandboxes in the opencontainer initiative? d...@walshclan.org 20/10/15 07:15
Have you looked into whether hooks could be used to add features that you need?
Re: desktop sandboxes in the opencontainer initiative? Matthew Heon 20/10/15 07:34
Hi,

At present, the return values for ACT_ERRNO and ACT_TRACE aren't configurable, and default to EPERM. Making them configurable as part of the spec shouldn't be that difficult if there's a demand for it.
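At the kernel level, libseccomp's `SCMP_ACT_ERRNO(x)` becomes a filter return of `SECCOMP_RET_ERRNO` with the errno in the low 16 bits (`SECCOMP_RET_DATA`), so making the value configurable is mostly plumbing. The raw-BPF sketch below shows this without libseccomp. `deny_syscall_with_errno` and `SKETCH_AUDIT_ARCH` are hypothetical names; only x86_64 and aarch64 are handled, and x32 syscall numbers are not filtered.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical macro: the audit architecture this sketch accepts. */
#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* constant for this architecture"
#endif

/* Make syscall `nr` fail with errno `err` (the value SCMP_ACT_ERRNO(err)
 * would carry) and allow every other syscall.
 * Returns 0 on success, -1 with errno set. */
int deny_syscall_with_errno(int nr, int err)
{
    struct sock_filter filter[] = {
        /* Kill the caller if it uses an unexpected syscall ABI. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* Matching syscall: ERRNO action with err in the low 16 bits. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int) nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | ((unsigned int) err & SECCOMP_RET_DATA)),
        /* Everything else is allowed. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short) (sizeof filter / sizeof filter[0]),
        .filter = filter,
    };

    /* Required before an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

With this, `deny_syscall_with_errno(SYS_syslog, ENOSYS)` makes syslog() fail with ENOSYS instead of EPERM, which the `"action": "SCMP_ACT_ERRNO"` entry in the question had no way to express at the time.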

Also, to Colin: runc trunk no longer sets No New Privileges by default (changed in the move to a libseccomp-based backend). This could also be made configurable (or just enabled by default, if we're confident it won't break anything).

Thanks,
Matthew Heon
Re: desktop sandboxes in the opencontainer initiative? Justin Cormack 20/10/15 07:40
On 20 October 2015 at 15:34, Matthew Heon <matthe...@gmail.com> wrote:
> Hi,
>
> At present, the return values for ACT_ERRNO and ACT_TRACE aren't configurable,
> and default to EPERM. Making them configurable as part of the spec shouldn't be
> that difficult if there's a demand for it.

This came up a few weeks ago, seems simple enough to add now there is demand.
(There is currently no support for ACT_TRACE at all).

Justin
Re: desktop sandboxes in the opencontainer initiative? W. Trevor King 20/10/15 10:13
On Tue, Oct 20, 2015 at 03:59:16AM -0700, Alexander Larsson wrote:
> xdg-app works because it uses unprivileged user namespaces, and
> makes sure to only use operations that are allowed by this. This
> places some limitations on it, for instance, you can't create device
> nodes, or mount a fresh sysfs, but those have unprivileged
> alternatives such as bind mounts from the host fs.

I've been poking at this as well [1], and you *can* mount a new sysfs
in an unprivileged user's new mount namespace if you also create a new
network namespace where you have CAP_SYS_ADMIN [2,3] and the host
sysfs is still visible:

  $ mkdir /tmp/sys
  $ unshare --user -r --mount mount -n -t sysfs none /tmp/sys
  mount: permission denied
  $ unshare --user -r --mount --net mount -n -t sysfs none /tmp/sys
  …success…

Of course, once you've created a new network namespace, you'll need
someone with CAP_NET_ADMIN to setup a veth connection with the host
network namespace [4], so bind-mounting is probably the best approach.

Cheers,
Trevor

[1]: https://github.com/wking/ccon
[2]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7dc5dbc879bd0779924b5132a48b731a0bc04a1e
[3]: http://thread.gmane.org/gmane.linux.file-systems/77413/focus=77414
[4]: https://github.com/wking/ccon/tree/master/examples/good/net-veth-root

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
Re: desktop sandboxes in the opencontainer initiative? W. Trevor King 20/10/15 10:15
On Tue, Oct 20, 2015 at 03:40:57PM +0100, Justin Cormack wrote:
> On 20 October 2015 at 15:34, Matthew Heon wrote:
> > At present, the return values for ACT_ERRNO and ACT_TRACE aren't
> > configurable, and default to EPERM. Making them configurable as
> > part of the spec shouldn't be that difficult if there's a demand
> > for it.
>
> This came up a few weeks ago, seems simple enough to add now there
> is demand.  (There is currently no support for ACT_TRACE at all).

The earlier thread is [1], if we want to keep discussion of that
aspect focused in one place.

Cheers,
Trevor

[1]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/rKrj2LcUyA8
     Subject: Seccomp actions with arguments
     Message-ID: <20150923215...@odin.tremily.us>
Re: desktop sandboxes in the opencontainer initiative? Mrunal Patel 20/10/15 14:34
I missed including the list so reposting my response.

Thanks,
Mrunal

Hi, Alexander,
Responses inline.



On Tue, Oct 20, 2015 at 3:59 AM, Alexander Larsson <alexande...@gmail.com> wrote:

Even though it runs as root, I was not able to use the user namespace support. When I tried to do the same mapping as in xdg-app (the user mapped to himself, everything else unmapped, setgroups denied, which is the only valid unprivileged mode), runc complained about root not being mapped. And when I added a root map, I got this:

  openat(AT_FDCWD, "/proc/6108/uid_map", O_RDWR) = 11
  write(11, "0 1000 1\n1000 1000 1\n\0", 22) = -1 EINVAL (Invalid argument)

Not sure what caused that, but I didn't look into it.
I can take a look into it.

runc looks for an existing directory and writes stats to a global directory (/run/opencontainer/container) which breaks the fact that xdg-apps are not global. Is there a way to have it not do this?
 
Hmm. Maybe we could consider making it optional. The idea behind the directory is for other tools such as cadvisor to have one place to look
at all the running containers in the system.
 

xdg-app mounts / as a tmpfs inside the mount namespace and constructs its root there (linking in /usr as a read-only bind-mount). This means that no other app can access it, and that the kernel will automatically clean it up on app exit. I couldn't figure out any way to do this with runc, as you can't tell it to create files after things are mounted. It would be nice if config.json would let me list a bunch of directories/symlinks/files and have them created during container setup.
I think using the pre-start hook will work for this.

xdg-app sets the PR_SET_NO_NEW_PRIVS flag on the process, which effectively neuters all kinds of setuid privilege escalations. There is no way to apply this in runc currently, but it would make a lot of sense to add it.
I think we can add this to runc. I believe it should be straightforward. 

xdg-app has its own minimal pid1 that reaps children and does some other stuff. xdg-app can do this easily as it just forks (which runc can't). I think I can achieve the same in runc by mounting a static helper binary into the container, but it's kind of a pain.
It should be possible to add this to runc as well. Michael, thoughts on having an optional reaper pid 1?
 

In case anyone is interested in how xdg-app sets up the container, you can take a look at the xdg-app-helper sources:

  http://cgit.freedesktop.org/xdg-app/xdg-app/tree/lib/xdg-app-helper.c
Thanks! I will take a look.
 




Re: desktop sandboxes in the opencontainer initiative? Mrunal Patel 20/10/15 14:36




  openat(AT_FDCWD, "/proc/6108/uid_map", O_RDWR) = 11
  write(11, "0 1000 1\n1000 1000 1\n\0", 22) = -1 EINVAL (Invalid argument)

Not sure what caused that, but I didn't look into it.
I can take a look into it.

I think writing to uid_map/gid_map is failing because the runtime.json is trying to map 1000 on host to both 0 and 1000 in the container.
Does it make sense to allocate a block of, say, 32k IDs starting at 1000? Or do you just need 1000 mapped to 0 for this to work?
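Mrunal's diagnosis matches the rule the kernel applies when a uid_map or gid_map is written: no two lines may overlap, either in the namespace-side range or in the parent-side range. The `0 1000 1` and `1000 1000 1` lines both claim host ID 1000, so the write fails with EINVAL. The C sketch below reproduces only that overlap check (plus rejecting a zero count), not the kernel's other limits; `struct id_extent` and `id_map_is_valid` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One line of /proc/<pid>/uid_map: "<inside> <outside> <count>". */
struct id_extent {
    uint32_t inside;   /* first ID inside the namespace */
    uint32_t outside;  /* first ID in the parent namespace */
    uint32_t count;    /* number of consecutive IDs */
};

/* Half-open ranges [a, a + na) and [b, b + nb); 64-bit arithmetic
 * avoids overflow near UINT32_MAX. */
static bool ranges_overlap(uint32_t a, uint32_t na, uint32_t b, uint32_t nb)
{
    uint64_t a_end = (uint64_t) a + na;
    uint64_t b_end = (uint64_t) b + nb;
    return a < b_end && b < a_end;
}

/* True if no two extents overlap on either side and none is empty. */
bool id_map_is_valid(const struct id_extent *map, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (map[i].count == 0)
            return false;
        for (size_t j = i + 1; j < n; j++) {
            if (ranges_overlap(map[i].inside, map[i].count,
                               map[j].inside, map[j].count) ||
                ranges_overlap(map[i].outside, map[i].count,
                               map[j].outside, map[j].count))
                return false;
        }
    }
    return true;
}
```

A multi-line map like the 32k block would pass this check, but writing it still requires CAP_SETUID (or CAP_SETGID for gid_map) in the parent namespace, typically via a setuid helper such as newuidmap, because an unprivileged process may only map its own ID.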
 
Thanks,
Mrunal

Re: desktop sandboxes in the opencontainer initiative? W. Trevor King 20/10/15 21:26
On Tue, Oct 20, 2015 at 02:34:03PM -0700, Mrunal Patel wrote:
> On Tue, Oct 20, 2015 at 3:59 AM, Alexander Larsson wrote:
> > runc looks for an existing directory and writes stats to a global
> > directory (/run/opencontainer/container) which breaks the fact
> > that xdg-apps are not global. Is there a way to have it not do
> > this?
>
> Hmm. Maybe we could consider make it optional. The idea behind the
> directory is for other tools such as cadvisor to have one place to
> look at all the running containers in the system.

Personally, I'd rather handle this sort of thing (on systems that want
it) with a pre-start hook.  If we need to have it built into the
runtime, I'd recommend at least making it a soft fail, with the
runtime warning and carrying on if there is any problem generating the
global listing.

> > xdg-app has its own minimal pid1 that reaps children and some
> > other stuff. xdg-app can do this easily as it just forks (which
> > runc can't). I think i can achieve the same in runc by mounting a
> > static helper binary into the container, but its kind of a pain.
>
> It should be possible to add this to runc as well. Michael, thoughts
> on having an optional reaper pid 1?

Is the pain “I don't want to pollute the container with this helper
binary” or is it “I don't want to have a static helper binary at all”?
I'd prefer handling this with an argument to config.json's process
that says “actually, look up this command in the host mount namespace”,
since it seems like a slippery slope that starts with a reaper process
and ends with systemd ;).

I'm not sure what the implementation for the “exec something read from
the host mount namespace” would look like.  Maybe something like:

1. Bind-mount the file into a tmpfs inside the container.
2. Open the container-side file to get a file descriptor.
3. Remove the container-side file to hide it from the rest of the
   container.
4. fexecve the file descriptor.

I've stubbed this out in ccon to show that fexecve works after the
file has been removed [1].

Cheers,
Trevor

[1]: https://github.com/wking/ccon/tree/fexecve
Re: desktop sandboxes in the opencontainer initiative? W. Trevor King 21/10/15 11:42
On Tue, Oct 20, 2015 at 09:24:44PM -0700, W. Trevor King wrote:
> I'd prefer handling this with an argument to config.json's process
> that says “actually, look up this command in the host mount namespace”,
> since it seems like a slippery slope that starts with a reaper process
> and ends with systemd ;).
> …
> I've stubbed this out in ccon to show that fexecve works after the
> file has been removed [1].

In an off-list mail, Alex pointed out that bind-mounting wasn't
required for the fexecve approach.  I've finished off ccon's
implementation with [1] and pushed that to my master if folks want to
try out the process.host semantics [2].  I've set up my pivot-root
example to use this to launch the host's BusyBox inside a container
that contains no binaries [3].

Cheers,
Trevor

[1]: https://github.com/wking/ccon/commit/58b007ab1d0dfcbe7f9b9abcd28a089eb8897724
[2]: https://github.com/wking/ccon#host
[3]: https://github.com/wking/ccon/tree/master/examples/good/pivot-root
Re: desktop sandboxes in the opencontainer initiative? Alexander Larsson 21/10/15 23:40
On Wed, Oct 21, 2015 at 8:40 PM, W. Trevor King <wk...@tremily.us> wrote:
> On Tue, Oct 20, 2015 at 09:24:44PM -0700, W. Trevor King wrote:
>> I'd prefer handling this with an argument to config.json's process
>> that says “actually, look up this command in the host mount namespace”,
>> since it seems like a slippery slope that starts with a reaper process
>> and ends with systemd ;).
>> …
>> I've stubbed this out in ccon to show that fexecve works after the
>> file has been removed [1].
>
> In an off-list mail, Alex pointed out that bind-mounting wasn't
> required for the fexecve approach.  I've finished off ccon's
> implementation with [1] and pushed that to my master if folks want to
> try out the process.host semantics [2].  I've set up my pivot-root
> example to use this to launch the host's BusyBox inside a container
> that contains no binaries [3].

You should probably open the executable with the O_PATH flag, as that
lowers the risk of leaking data to the container (i.e. if you somehow
get access to it you can't read it).
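The O_PATH suggestion can be shown in a few lines: an O_PATH descriptor identifies the file well enough to execute it later, but read() through it fails with EBADF. A minimal sketch assuming Linux 2.6.39 or later; `open_exec_handle` and `handle_is_unreadable` are hypothetical names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open an executable as a path-only handle. The fd can be passed to
 * fexecve()/execveat() but not used to read the file's contents.
 * O_CLOEXEC closes it on exec, not on clone(). */
int open_exec_handle(const char *path)
{
    return open(path, O_PATH | O_CLOEXEC);
}

/* Returns 1 if reading through fd is refused with EBADF, as it is for
 * O_PATH descriptors; 0 otherwise. Consumes a byte if the read succeeds. */
int handle_is_unreadable(int fd)
{
    char c;
    return read(fd, &c, 1) < 0 && errno == EBADF;
}
```

Because O_CLOEXEC only acts on exec, a pid1 created with clone() still inherits the handle and can pass it to the exec call, which then drops it.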
Re: desktop sandboxes in the opencontainer initiative? W. Trevor King 22/10/15 09:33
On Thu, Oct 22, 2015 at 08:40:40AM +0200, Alexander Larsson wrote:
> You should probably open the executable with the O_PATH flag, as
> that lowers the risk of leaking data to the container (i.e. if you
> somehow get access to it you can't read it).

Thanks, done with [1].  The file is opened O_CLOEXEC, but O_PATH
(vs. O_RDONLY) will make it easier to audit ccon to make sure it isn't
doing something nefarious with the file itself.

Cheers,
Trevor

[1]: https://github.com/wking/ccon/commit/3b3aff47b55f8ccef5eb72d06a9d1d36caf3792a
Re: desktop sandboxes in the opencontainer initiative? Alexander Larsson 22/10/15 10:19
On Thu, Oct 22, 2015 at 6:59 PM, W. Trevor King <wk...@tremily.us> wrote:
> Alex,
>
> Your mail was off-list, so I'll reply off-list too.  I'm happy to move
> this sub-thread back on-list if you like.

Adding it back.

> On Thu, Oct 22, 2015 at 08:28:49AM +0200, Alexander Larsson wrote:
>> On Wed, Oct 21, 2015 at 4:29 PM, W. Trevor King wrote:
>> > On Wed, Oct 21, 2015 at 08:52:33AM +0200, Alexander Larsson wrote:
>> >> I guess my main issue with this is how useful runc is if you need
>> >> to create a custom statically linked binary to use it. At that
>> >> point you may start to wonder, why don't I just write the entire
>> >> container spawning code myself in this static helper and skip
>> >> runc...
>> >
>> > It doesn't need meaningful init-process code built in to be
>> > useful.  So far, most people seem to be launching container-side
>> > binaries (static or not) as the main container process.  Your use
>> > case (launch a minimal init, and have other processes join the
>> > container later) is closer to Julz's use case which starts a dummy
>> > process and then execs the main workers into the container [1].
>> > The only difference is that your dummy process executable is
>> > supplied by the host, and not by the bundle.
>>
>> Oh, I don't exec into the container. I spawn the container with
>> clone() which gives me pid1 in the container. The pid1 then fork()s
>> and keeps the parent running a wait() loop while the child execs the
>> real command in the container.
>
> That doesn't seem like the final system will be much different from
> what you'd get execing into the bundle from a pre-start hook (except
> for child-reaping responsibilities when you don't create a new PID
> namespace).  Is there a reason you chose to go that way?

This seems the natural way to set up a container. runc having to exec
pid1 is generally just painful and caused by golang not being able to
fork().

>> A lot of people are not aware that you'll be leaking zombies like
>> crazy if you run a pid1 that doesn't reap children, so I'm not sure
>> your argument is correct. There is a very real risk of people not
>> being aware of this and causing bugs.
>
> Fair enough.  Documenting how this works is important, and linking to
> kernel docs for this would be even better (e.g. [1,2]).  But I expect
> some folks will want more involved init processes, so why not pick an
> approach that supports everyone (by punting ;).  For example, folks
> who don't create a new PID namespace and *do* exec their main worker
> with a pre-start hook won't need a child-reaping container process.

Well, clearly you don't *always* want a pid1 reaper. Another example
is when you want to run an actual init system (like systemd) inside
your container. However, having a reaper loop a single option away
seems very useful, and not particularly taxing.

>> I mean, I'm not talking about an init system that spawns services or
>> whatnot, just:
>>
>> while (1)
>>   wait (&status);
>
> True, and maybe this case is both common enough to deserve special
> treatment and innocuous enough to not hurt folks who aren't creating a
> new PID namespace.
>
>> >> You don't need to do the mounts stuff, just open the binary
>> >> O_PATH before you clone(), then inherit the fd to the container
>> >> and fexecve (or even better execveat) it there.
>> >
>> > How would execveat help here?
>>
>> execveat is a syscall that always works, fexecve is a userspace
>> implementation that uses /proc. Or maybe they made fexecve use
>> execveat now?
>
> It's still using /proc (at least with glibc 2.20).  open(2) motivates
> the *at functions with talk about a changing symlink in the directory
> path.  But it seems like it would race against changes to the
> directory itself.  For example:

> 1. You get a file descriptor for the directory holding your binary.
> 2. Someone else removes that binary.
> 3. You try to execveat the binary, but it no longer exists in the
>    directory.

If you have a fd to the actual executable itself you can run it with:
       execveat(fd, "", argv, envp, AT_EMPTY_PATH)
This gets rid of that race.
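The `execveat(fd, "", argv, envp, AT_EMPTY_PATH)` call can be wrapped as below. This is a sketch assuming Linux 3.19 or later (where execveat was added); it calls the raw syscall because older C libraries have no execveat() wrapper, and `exec_fd`/`run_fd` are hypothetical names.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Execute the file that fd refers to (e.g. an O_PATH handle opened on
 * the host before the container was set up). The empty path plus
 * AT_EMPTY_PATH means "fd itself", so no name lookup can race.
 * Returns only on failure (-1, errno set). */
int exec_fd(int fd, char *const argv[], char *const envp[])
{
    return (int) syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
}

/* Demo: run fd in a child and return its exit status (127 if the exec
 * failed), or -1 if fork/wait failed or the child was killed. */
int run_fd(int fd, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        exec_fd(fd, argv, environ);
        _exit(127);  /* only reached if execveat failed */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

One caveat from the execveat(2) man page: if the descriptor is close-on-exec and refers to a script rather than an ELF binary, the exec fails with ENOENT, because the interpreter can no longer open the script through /dev/fd.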

dirfd + filename is typically used to avoid races when you're
traversing a directory tree, rather than when you're just looking up a
particular pathname. execveat is less often used like this, but I
could imagine e.g. enumerating and starting all the hooks in a
directory. If you used a dirfd + execveat() you could guarantee that
you would not switch directory midway if someone renamed a parent
directory, or changed a symlink used in the pathname.
Re: desktop sandboxes in the opencontainer initiative? W. Trevor King 22/10/15 11:52
On Thu, Oct 22, 2015 at 07:19:23PM +0200, Alexander Larsson wrote:
> On Thu, Oct 22, 2015 at 6:59 PM, W. Trevor King wrote:
> > 1. You get a file descriptor for the directory holding your
> >    binary.
> > 2. Someone else removes that binary.
> > 3. You try to execveat the binary, but it no longer exists in the
> >    directory.
>
> If you have a fd to the actual executable itself you can run it with:
>        execveat(fd, "", argv, envp, AT_EMPTY_PATH)
> This gets rid of that race.

Love it.  Done with [1].

Thanks,
Trevor

[1]: https://github.com/wking/ccon/commit/27c04f60294d6316a50bb54f25d38d0b5d102efd