There are some system calls that affect the state of a single process, but which users might expect to affect the state of the whole application. I've been wondering if there's any way to deal with this, even if it's not transparent.
I realize I'm resurrecting a discussion that's been had before (in 2011, yes ...), but I've run into a situation where it would be nice to fix this with something other than the SysProcAttr approach we take now for, e.g., unshare().
It's common in C programs to do the following (shortened):
if (fork() == 0) { ... unshare(); ... }
or
if (fork() == 0) { ... put me into a container group ... } // as in pflask, etc.
These work fine because people tend to do them early in the game, in one process, before creating threads. That level of control is possible in C but not in Go IIUC.
What happens if you do the following in Go?
syscall.Unshare(...)
Mount("tmpfs", "/tmp", ...)
Create("/tmp/x", ...)
- one process does the unshare, which applies only to that process
- the process that does the unshare need not be the one that does the mount.
- the process that creates the file need not be either of the first two
- /tmp is private to one process
- but not to the other processes in our Go application
- and the shell might find that /tmp has changed when the application exits, because the unshare, mount and create happened in different processes in your Go application, and the last two might have run in the shell's namespace.
I expect the same thing would happen on Plan 9 for the same reasons (though it's not called unshare there ...)
As was discussed in #1954 back in 2011, the unshare at least won't go well. A single Go app has lots of processes, and the unshare applies to just one of them. You get to change a fraction of the state of the Go app, and the next line of code can run in a process you have not changed.
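A quick way to see the "lots of processes" point on Linux: the sketch below pins a few goroutines to OS threads with runtime.LockOSThread and records the kernel thread ID (syscall.Gettid) each one lands on. The function name distinctThreads is made up for illustration, and it deliberately does not call unshare, since that needs CAP_SYS_ADMIN. It only shows that one Go program is spread over several kernel tasks, each carrying its own per-task state.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"syscall"
)

// distinctThreads (hypothetical helper) starts n goroutines, locks each one
// to its OS thread, and returns the kernel thread IDs they ran on. A locked
// thread cannot run any other goroutine while its owner is blocked, so the
// IDs are all different: one Go program, several kernel tasks.
func distinctThreads(n int) []int {
	var ready sync.WaitGroup
	release := make(chan struct{})
	tids := make([]int, n)
	ready.Add(n)
	for i := 0; i < n; i++ {
		go func(i int) {
			runtime.LockOSThread()
			defer runtime.UnlockOSThread()
			tids[i] = syscall.Gettid()
			ready.Done()
			<-release // hold this thread until every goroutine has reported
		}(i)
	}
	ready.Wait() // all writes to tids happen before this returns
	close(release)
	return tids
}

func main() {
	tids := distinctThreads(4)
	fmt.Println(len(tids), "goroutines ran on thread IDs", tids)
}
```

An unlocked goroutine gets no such guarantee: the scheduler may move it to a different thread between two statements, which is how the unshare, mount and create above can end up on different threads.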
I realize we can now set clone flags etc. in exec.Cmd (through its SysProcAttr field), but you have to know a priori that you need them, and that's not always simple. Take the case of a shell. Most commands want to share the namespace with the shell (e.g. a mount command), but some might not (e.g. an unshare command). As things stand you have to know which commands will want to unshare and set the flag in their exec.Cmd. As far as I can tell, that means special-casing the names of those commands in the shell so you can mark them when you Run the cmd. And it will fail in weird ways if a user writes a command that calls syscall.Unshare(), because the shell won't know that command's name.
Possible ways to address this:
- Give up. Remove syscall.Unshare, because it's really not going to work the way people expect and it will only end badly.
- But just removing functions won't do it. Programs create containers via file I/O, after all ... how do I put all the procs of a Go application in one container?
- in programs that have to do an unshare, is there some point in init where we're still one process? But that's ugly.
- have a way to say "shrink the Go app to one process until I'm done with this block of code." Then change the one process; when it calls clone() again, the new processes will have the attributes we want. Is that doable? Does it already exist and I just did not know it?
- something better than all of these ...
If this is a solved problem, sorry for wasting your time :-) and if you can, let me know what the fix is.
thanks
ron
P.S. I realize that Docker and Rocket deal with the container question, and I hoped to get some ideas from them. Rocket outsources container setup to systemd in Stage 1. Docker seems to be moving toward letting systemd do container setup as well. That's not an option for me.