unshare and other system calls that affect one process but should affect the application


ron minnich

Feb 8, 2015, 3:48:41 PM
to golang-dev
The subject is too long, this note is too long, just ignore if you don't care about unshare ...

Go applications consist of many processes. I figure programmers tend to think of them as one process, however, no matter how many processes there are. Most system calls (read, write, ...) have a global effect.

There are some system calls that affect one process state, but which users might expect to affect application state. I've been wondering if there's any way to deal with the situation, even if it's not transparent.

I realize I'm resurrecting a discussion that's been had before (in 2011, yet ...) but I've run into a situation where it'd be nice to fix it by something other than the SysProcAttr approach we're taking now for, e.g., unshare().

It's common in C programs to do the following (shortened)
if (fork() == 0) { ... unshare(); ... }
or
if (fork() == 0) { ... put me into a container group  ... } // as in pflask, etc.

These work fine because people tend to do them early in the game, in one process, before creating threads. That level of control is possible in C but not in Go IIUC.

what happens if you do the following in Go?
syscall.Unshare(...)
Mount("tmpfs", "/tmp", ...)
Create("/tmp/x", ...)

- a process does the unshare which applies to that process
- the process that does the unshare need not be the one that does the mount.
- the process that creates the file need not be either of the first two
- /tmp is private to one process
- but not to the other processes in our Go application
- and the shell might find /tmp has changed when the application exits, because the unshare, mount and create happened in different processes in your Go application, and the last two might be in the namespace of the shell.
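
A partial workaround in current Go, sketched here as an assumption rather than anything proposed in this thread, is to pin a goroutine to one OS thread with runtime.LockOSThread so that the three calls above are at least issued by the same thread. The helper names onOneThread and privateTmp are made up for illustration, and the program needs Linux and root (CAP_SYS_ADMIN). It does not solve the larger problem: only the pinned thread sees the private /tmp, not the rest of the Go application.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"syscall"
)

// onOneThread runs fn on a goroutine pinned to a single OS thread, so every
// system call fn makes is issued by that same thread.
func onOneThread(fn func() error) error {
	errc := make(chan error, 1)
	go func() {
		// Deliberately never unlocked: see the note after the code.
		runtime.LockOSThread()
		errc <- fn()
	}()
	return <-errc
}

// privateTmp performs the unshare, mount and create sequence from the post.
// Hypothetical helper; requires root on Linux.
func privateTmp() error {
	if err := syscall.Unshare(syscall.CLONE_NEWNS); err != nil {
		return fmt.Errorf("unshare: %w", err)
	}
	// Keep later mounts from propagating back to the parent namespace.
	if err := syscall.Mount("none", "/", "", syscall.MS_REC|syscall.MS_PRIVATE, ""); err != nil {
		return fmt.Errorf("make / private: %w", err)
	}
	if err := syscall.Mount("tmpfs", "/tmp", "tmpfs", 0, ""); err != nil {
		return fmt.Errorf("mount tmpfs: %w", err)
	}
	f, err := os.Create("/tmp/x")
	if err != nil {
		return fmt.Errorf("create: %w", err)
	}
	return f.Close()
}

func main() {
	if err := onOneThread(privateTmp); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("created /tmp/x in a namespace private to one thread")
}
```

The goroutine exits while still locked on purpose: since Go 1.10 the runtime terminates such a thread instead of reusing it, so no other goroutine is later scheduled onto a thread whose namespace differs from the rest of the program.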

I expect the same thing would happen in Plan 9 for the same reasons (though it's not called unshare ...)

As was discussed in #1954 back in 2011, the unshare at least won't go well. In one Go app, there are lots of processes, and the unshare is for one process. You get to change a fraction of the state of the Go app, and the next line of code can run in a process you have not changed.

I realize we can now set clone flags etc. in exec.Cmd, but you have to know a priori to set them, and that's not always simple. Take the case of a shell. Most commands want to share the namespace with the shell (e.g. a mount command), but some might not (e.g. an unshare command). As things stand you have to know which commands will want to unshare, and mark the flag in the exec.Cmd. As far as I can tell, you have to special-case the names of those commands in the shell, so you can mark them when you Run the cmd. And it will fail in weird ways if a user writes a command that uses syscall.Unshare(), because the shell won't know that  command name.
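
A minimal sketch of that special-casing, assuming a shell that keeps a table of command names. The table needsPrivateNS and the helper command are hypothetical names; the only real API used is the Linux-specific Cloneflags field of syscall.SysProcAttr, and running a marked command still needs the privileges that CLONE_NEWNS requires.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// needsPrivateNS is a hypothetical table of command names the shell has to
// know about ahead of time. Any command missing from it shares the shell's
// namespace, which is exactly the weakness described above.
var needsPrivateNS = map[string]bool{
	"unshare": true,
	"pflask":  true,
}

// command builds an exec.Cmd and asks clone(2) for a new mount namespace
// only when the command name is listed in the table.
func command(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if needsPrivateNS[name] {
		cmd.SysProcAttr = &syscall.SysProcAttr{Cloneflags: syscall.CLONE_NEWNS}
	}
	return cmd
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: sh-sketch command [args...]")
		os.Exit(2)
	}
	if err := command(os.Args[1], os.Args[2:]...).Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```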

Possible ways to address this:
- Give up. Remove syscall.Unshare, because it's really not going to work the way people expect and it will only end badly
- But just removing functions won't do it. Programs create containers via file I/O, after all ... how do I put all the procs of a Go application in one container? 
- in programs that have to do an unshare, is there some point in init where we're still one process?  But that's ugly.
- have a way to say "shrink the Go app to one process until I'm done this block of code." Then change the one process; when it clones() again, the new processes will have the attributes we want. Is that doable? Does it already exist and I just did not know it?
- something better than all of these ...

If this is a solved problem, sorry for wasting your time :-) and if you can, let me know what the fix is. 

thanks

ron
p.s. I realize that docker and rocket deal with the container question, and I hoped to get some ideas from them. Rocket outsources container setup to systemd in Stage 1. Docker seems to be moving to letting systemd do container setup as well. That's not an option for me.

minux

Feb 8, 2015, 4:29:34 PM
to ron minnich, golang-dev
On Sun, Feb 8, 2015 at 3:48 PM, ron minnich <rmin...@gmail.com> wrote:
> There are some system calls that affect one process state, but which users might expect to affect application state. I've been wondering if there's any way to deal with the situation, even if it's not transparent.
Also worth considering is the setuid family of syscalls, which also only affects one process (thread).
That is issue #1435.

We probably need a way to execute some func on all OS threads.

> I expect the same thing would happen in Plan 9 for the same reasons (though it's not called unshare ...)
It does indeed affect Plan 9, as chdir only applies to the current proc.
This is issue #9428.

> - in programs that have to do an unshare, is there some point in init where we're still one process?  But that's ugly.
With the current runtime, I don't think there is a time where there is only one process (thread).
Even before the process gets to main.init, there are 4 threads already as I tested on today's tip. 
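
One way to repeat that measurement on Linux, sketched under the assumption that counting the entries of /proc/self/task is an acceptable proxy for the number of OS threads. The helper threadCount and the variable atInit are illustrative names, and the numbers printed will vary with the Go version and the machine.

```go
package main

import (
	"fmt"
	"os"
)

// threadCount reports how many OS threads (Linux tasks) the current process
// has, by listing /proc/self/task. Linux only.
func threadCount() (int, error) {
	entries, err := os.ReadDir("/proc/self/task")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

// atInit records the thread count seen from main.init.
var atInit int

func init() {
	atInit, _ = threadCount()
}

func main() {
	n, err := threadCount()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("threads in main.init: %d, in main.main: %d\n", atInit, n)
}
```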

> - have a way to say "shrink the Go app to one process until I'm done this block of code." Then change the one process; when it clones() again, the new processes will have the attributes we want. Is that doable? Does it already exist and I just did not know it?
I don't think that's possible in general.
 
> - something better than all of these ...

What about adding an internal mechanism to run a given func on all OS threads?
That is still not possible in general, consider for example, an OS thread is blocked
on a read syscall.

ron minnich

Feb 8, 2015, 5:04:49 PM
to minux, golang-dev
On Sun Feb 08 2015 at 1:29:31 PM minux <mi...@golang.org> wrote:

> What about adding an internal mechanism to run a given func on all OS threads?
> That is still not possible in general, consider for example, an OS thread is blocked
> on a read syscall.

I thought about that. It won't work correctly for unshare(), and I'm not sure about other things. 

If you run unshare() on each individual process in a Go application, you'll end up with N processes, each with its own private namespace. But what we want is a Go application with the same private namespace shared by all processes in the application.

It's why the only thing I could think of was the 'shrink to one process/modify process/grow again' idea. 

I think we're going to see more of this, as the Linux system call surface continues to grow. And as you point out it's an issue in Plan 9. 

thanks

ron

 

minux

Feb 8, 2015, 6:08:48 PM
to ron minnich, golang-dev
On Sun, Feb 8, 2015 at 5:04 PM, ron minnich <rmin...@gmail.com> wrote:
> It's why the only thing I could think of was the 'shrink to one process/modify process/grow again' idea.
OK, I remembered the trick I suggested to the docker guys regarding this problem.

If you can determine which namespaces to unshare at the start of your Go application,
you can use a cgo trick to do that:

/*
__attribute__((constructor)) void init() {
   // call unshare(2) here, use /proc/self/cmdline to access the cmdline arguments.
}
*/
import "C"

Unless you also link with external libraries that create threads at global constructors,
this should be guaranteed to execute in the single-thread environment.

ron minnich

Feb 8, 2015, 7:17:20 PM
to minux, golang-dev
On Sun Feb 08 2015 at 3:08:39 PM minux <mi...@golang.org> wrote:


> /*
> __attribute__((constructor)) void init() {
>    // call unshare(2) here, use /proc/self/cmdline to access the cmdline arguments.
> }
> */
> import "C"
>
> Unless you also link with external libraries that create threads at global constructors,
> this should be guaranteed to execute in the single-thread environment.



It's a nice trick. IIUC it does require having a C compiler available ... not an option in my case.

It would be pretty nice if we had a way to specify 'code which runs in the single thread init environment' to Go directly, but it sounds ugly and limited use, hence probably not practical. 

At this point I realize my circumstances are unique enough that I'll go with my workaround, namely, recognizing 'pflask' in the shell  (I ported it to Go) as another special case which requires some extra special treatment :-)

Thanks

ron 

minux

Feb 8, 2015, 7:48:48 PM
to ron minnich, golang-dev
On Sun, Feb 8, 2015 at 7:17 PM, ron minnich <rmin...@gmail.com> wrote:
> It's a nice trick. IIUC it does require having a C compiler available ... not an option in my case.
If you precompile the code as .syso file, then it only requires an external linker.

And because it only uses a very simple feature not supported by our cmd/ld,
we can also enhance cmd/ld and runtime to support merging .init_array and
call the function pointer at start up. Shouldn't be too hard. (I'm not sure it could
be merged though.) As long as the .syso file is standalone and only uses direct
syscalls, this is a viable approach.

If you want to write in Go, in theory it's also possible to call a Go func at the very
start of the runtime (just after TLS stack limits are set). But you must make sure
the Go program doesn't allocate or otherwise require the Go runtime.

> It would be pretty nice if we had a way to specify 'code which runs in the single thread init environment' to Go directly, but it sounds ugly and limited use, hence probably not practical.
It might be possible. For example, maybe we can delay starting the sysmon and
background GC threads until triggered or main.init has returned so that main.init
is always guaranteed to run in a single threaded environment.

File an issue for this request? I think whether it will be accepted depends on how
big the changes are.


> At this point I realize my circumstances are unique enough that I'll go with my workaround, namely, recognizing 'pflask' in the shell  (I ported it to Go) as another special case which requires some extra special treatment :-)
Another workaround is probably to always re-exec the current binary to set the
namespaces, but you will need to design the application this way to pass states
to child (via a pipe with encoding/gob?)
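
A rough sketch of that re-exec workaround, under several assumptions that are not from the thread: the State type, the REEXEC_CHILD environment marker, the use of /proc/self/exe to find the current binary, and passing the pipe as file descriptor 3 are all illustrative choices. It is Linux only, and creating the namespace needs the usual privileges.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"io"
	"os"
	"os/exec"
	"syscall"
)

// State is a hypothetical example of what the parent hands to the child.
type State struct {
	Args []string
	Dir  string
}

// encode serializes the state with encoding/gob.
func encode(s State) ([]byte, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(s)
	return buf.Bytes(), err
}

// decode is the inverse of encode.
func decode(b []byte) (State, error) {
	var s State
	err := gob.NewDecoder(bytes.NewReader(b)).Decode(&s)
	return s, err
}

// reexec starts the current binary again in a new mount namespace and sends
// it the state over a pipe that the child sees as file descriptor 3.
func reexec(s State) error {
	payload, err := encode(s)
	if err != nil {
		return err
	}
	r, w, err := os.Pipe()
	if err != nil {
		return err
	}
	cmd := exec.Command("/proc/self/exe", os.Args[1:]...)
	cmd.Env = append(os.Environ(), "REEXEC_CHILD=1") // hypothetical marker
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.ExtraFiles = []*os.File{r} // becomes fd 3 in the child
	cmd.SysProcAttr = &syscall.SysProcAttr{Cloneflags: syscall.CLONE_NEWNS}
	if err := cmd.Start(); err != nil {
		r.Close()
		w.Close()
		return err
	}
	r.Close() // the child holds its own copy of the read end
	_, writeErr := w.Write(payload)
	w.Close() // signals end of input to the child
	if err := cmd.Wait(); err != nil {
		return err
	}
	return writeErr
}

func main() {
	if os.Getenv("REEXEC_CHILD") == "1" {
		// Child: already in the new namespace; read the state from fd 3.
		b, err := io.ReadAll(os.NewFile(3, "state"))
		if err == nil {
			var s State
			if s, err = decode(b); err == nil {
				fmt.Printf("child got state: %+v\n", s)
				return
			}
		}
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if err := reexec(State{Args: os.Args[1:], Dir: "/tmp"}); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Because the namespace is requested at clone time, the child's Go runtime starts all of its threads inside it, which is the property the in-process approaches in this thread cannot provide.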

btw, is your pflask Go port publicly available? I used pflask in my environment
extensively, so a pure Go replacement is much appreciated.
