Entering Linux namespaces

Alexander Morozov

Aug 31, 2015, 1:41:37 PM
to golang-dev
Hello.

There is already some support for Linux namespaces, such as `SysProcAttr.Cloneflags` and `SysProcAttr.UidMappings`/`GidMappings`. But there is one more use case besides creating new namespaces: entering existing ones. The problem is that we can't call `setns` freely for user and mount namespaces, because it requires a single-threaded environment. Right now we rely on what is more or less undefined behavior of the Go runtime: https://github.com/opencontainers/runc/blob/master/libcontainer/nsenter/nsexec.c . That makes such code hard to write, and it could be broken by later runtime changes. What do you think about adding the ability to execute processes in existing namespaces? It would require one more field in `syscall.SysProcAttr`, something like an array of (`nsfd`, `nstype`) pairs, and a call to `RawSyscall(SYS_SETNS, nsfd, nstype, 0)` in `forkAndExecInChild` after the clone.
It would tremendously simplify writing advanced code with Linux namespaces (the code for entering and for creating namespaces would be almost the same). Let me know if this would be interesting for Go; I can try to implement it.

Thanks!

minux

Aug 31, 2015, 2:37:08 PM
to Alexander Morozov, Ian Lance Taylor, golang-dev

SGTM. Ian?

mpa...@redhat.com

Aug 31, 2015, 3:47:11 PM
to golang-dev
+1. This makes a lot of sense to have and further enhances Go's support for Linux namespaces.

Philip Hofer

Aug 31, 2015, 3:47:11 PM
to golang-dev
Seconded. I've run into this problem as well. (One work-around is to fork/exec a wrapper process that moves itself into the correct namespace before exec-ing the "real" child.)

Alexander Morozov

Sep 2, 2015, 6:30:22 PM
to golang-dev
I made a simple prototype: https://github.com/LK4D4/go/tree/setns
You can try it with this program:
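package main

// NOTE: this builds only against the prototype branch linked above;
// syscall.Namespace and SysProcAttr.Namespaces do not exist in upstream Go.

import (
        "fmt"
        "log"
        "os"
        "os/exec"
        "strconv"
        "syscall"
)

func main() {
        if len(os.Args) < 2 {
                log.Fatalf("usage: %s PID", os.Args[0])
        }
        pid, err := strconv.Atoi(os.Args[1])
        if err != nil {
                log.Fatal(err)
        }
        // Open the user namespace of the target process.
        file := fmt.Sprintf("/proc/%d/ns/user", pid)
        f, err := os.Open(file)
        if err != nil {
                log.Fatal(err)
        }
        // Keep the descriptor open until the child has been started.
        defer f.Close()

        cmd := exec.Command("sh")
        cmd.Stdout = os.Stdout
        cmd.Stderr = os.Stderr
        cmd.Stdin = os.Stdin
        cmd.SysProcAttr = &syscall.SysProcAttr{}
        // Ask the child to join the namespace between fork and exec.
        cmd.SysProcAttr.Namespaces = []syscall.Namespace{
                {
                        FD:   f.Fd(),
                        Type: syscall.CLONE_NEWUSER,
                },
        }
        if err := cmd.Run(); err != nil {
                log.Fatal(err)
        }
}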

But first you need to create a user namespace with `unshare -Ur` and get the PID of the process inside it with `echo $$`. Then just call the program with that PID as its first argument.
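One way to check whether the shell started by the prototype really landed in the target namespace is to compare the `/proc/<pid>/ns/user` links of the two processes. The following is only a sketch: the helper names `nsLinkPath` and `sameNamespace` are made up for illustration, and it assumes a Linux system with procfs mounted.

```go
package main

import (
	"fmt"
	"os"
)

// nsLinkPath returns the procfs path of one namespace of a process,
// e.g. /proc/1234/ns/user. kind is a namespace name such as "user" or "mnt".
// (Hypothetical helper, not part of any library.)
func nsLinkPath(pid int, kind string) string {
	return fmt.Sprintf("/proc/%d/ns/%s", pid, kind)
}

// sameNamespace reports whether two processes share the namespace of the
// given kind. Each link target reads like "user:[4026531837]"; equal
// targets mean the same namespace.
func sameNamespace(pidA, pidB int, kind string) (bool, error) {
	a, err := os.Readlink(nsLinkPath(pidA, kind))
	if err != nil {
		return false, err
	}
	b, err := os.Readlink(nsLinkPath(pidB, kind))
	if err != nil {
		return false, err
	}
	return a == b, nil
}

func main() {
	// Compares this process with its parent; substitute the PID printed
	// by `echo $$` to compare against the unshared shell instead.
	same, err := sameNamespace(os.Getpid(), os.Getppid(), "user")
	if err != nil {
		fmt.Println("cannot compare namespaces:", err)
		return
	}
	fmt.Println("same user namespace as parent:", same)
}
```

Reading the links of another user's process may need extra privileges, so the error path is worth keeping.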

Ian Lance Taylor

Sep 2, 2015, 7:52:00 PM
to Alexander Morozov, golang-dev
I don't want to be too much of a pain, but as far as I can see this
can be fully implemented using a helper program. The usual rule for
the syscall package is that we only add something that must be run
between fork and execve. It's not obvious to me why this is true for
setns.

Ian

Mrunal Patel

Sep 2, 2015, 8:01:18 PM
to Ian Lance Taylor, Alexander Morozov, golang-dev
From http://man7.org/linux/man-pages/man2/setns.2.html --
"A multithreaded process may not change user namespace with setns"

There are similar limitations on joining a mount namespace via setns as well.

Thanks,
Mrunal


Ian Lance Taylor

Sep 2, 2015, 8:24:52 PM
to Mrunal Patel, Alexander Morozov, golang-dev
Understood, but I don't see how it applies to what I suggested.

What we are talking about here is a setting to call setns after
calling fork but before calling execve. It seems to me that instead of
doing that, I can use execve to run a small helper program that will
call setns and then call execve for the program I really want to
start.

Ian

Alexander Morozov

Sep 2, 2015, 9:58:25 PM
to golang-dev
Sorry, I sent my message only to Ian. Duplicating it here:
Fair enough, but that won't be a Go program. Actually, all the namespace features of exec.Cmd can be replaced with the unshare utility, but that's sort of inconvenient.
I understand your concerns about bringing more complexity into that function, though.

Aram Hăvărneanu

Sep 3, 2015, 5:12:59 AM
to Ian Lance Taylor, Mrunal Patel, Alexander Morozov, golang-dev
On Thu, Sep 3, 2015 at 2:24 AM, Ian Lance Taylor <ia...@golang.org> wrote:
> I can use execve to run a small helper program that will
> call setns and then call execve for the program I really want to
> start.

But that intermediate program can't be written in Go, which is a pity,
and it imposes complexity on the programmer, who has to maintain
this special C program with its own rules for building outside the
normal Go rules.

--
Aram Hăvărneanu

Ian Lance Taylor

Sep 3, 2015, 12:14:28 PM
to Aram Hăvărneanu, Mrunal Patel, Alexander Morozov, golang-dev
I'm a bit surprised that there isn't a standard system program to do this.

Ian

Alexander Morozov

Sep 3, 2015, 12:23:07 PM
to golang-dev, ara...@mgk.ro, mpa...@redhat.com, lk4d...@gmail.com
There is the "almost standard" util-linux, which includes unshare and nsenter, and with those you can do anything you like with namespaces. I just thought that since we started adding support for namespaces (Cloneflags, Uid/GidMappings) in exec.Cmd, it made sense to finish it and have a complete set of features :)
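For comparison, the helper-program route with util-linux could look roughly like this from Go. It is a sketch, not a recommendation: `nsenterCommand` is a hypothetical helper, it assumes `nsenter` is on `PATH`, and it relies on the `--target`, `--user`, and `--mount` options of `nsenter`.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// nsenterCommand builds a command that runs argv inside the user and mount
// namespaces of the process with the given PID, delegating the setns calls
// to nsenter from util-linux. (Hypothetical helper for illustration.)
func nsenterCommand(pid int, argv ...string) *exec.Cmd {
	args := []string{"--target", strconv.Itoa(pid), "--user", "--mount", "--"}
	args = append(args, argv...)
	cmd := exec.Command("nsenter", args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: prog PID")
		return
	}
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		fmt.Println("bad PID:", err)
		return
	}
	// nsenter is single-threaded, so it can join the user namespace
	// before exec-ing the shell.
	if err := nsenterCommand(pid, "sh").Run(); err != nil {
		fmt.Println("nsenter failed:", err)
	}
}
```

The cost is the one raised in this thread: the program now depends on an external binary being installed.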

On Thursday, September 3, 2015 at 9:14:28 UTC-7, Ian Lance Taylor wrote:

Ian Lance Taylor

Sep 3, 2015, 12:29:39 PM
to Alexander Morozov, golang-dev, Aram Hăvărneanu, Mrunal Patel
On Thu, Sep 3, 2015 at 9:23 AM, Alexander Morozov <lk4d...@gmail.com> wrote:
> There is "almost standard" util-linux which includes unshare and nsenter,
> with which you can do anything you like about namespaces. Just thought that
> if we started to add support for namespaces(Cloneflags, Uid/GidMapping) in
> exec.Cmd, it made sense to finish it to have complete set of features :)

Clearly Cloneflags has to happen at fork time, and my understanding
was that Uid/GidMapping has to happen between fork and execve.

Ian

Alexander Morozov

Sep 3, 2015, 12:52:40 PM
to Ian Lance Taylor, golang-dev, Aram Hăvărneanu, Mrunal Patel
If we're talking about using additional programs, then exec.Command("unshare", "-muinpf", "-r", "command") will execute "command" in a set of new namespaces, just as with Cloneflags, and remap the user to "root". Yes, it lacks flexibility (that's one of the reasons why I want setns in Go), but it can replace Cloneflags and the mappings with some additions.
I think the main reason for Cloneflags was the single-threadedness of forkAndExecInChild (which is the only such place in Go apart from C hacks), same as for the proposed feature of entering a namespace. After-fork-before-exec is right for the mappings, though (not that it can't be done in a C program called by exec.Cmd, which would negate the whole point of having Cloneflags).
So all three features (Cloneflags, mappings, entering) could be done with a small auxiliary C program, and couldn't be done without either that or code in forkAndExecInChild.
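For reference, the two features that `exec.Cmd` already covers can be combined as below. This is a minimal sketch that uses the existing `Cloneflags`, `UidMappings`, and `GidMappings` fields of `syscall.SysProcAttr`; `rootMappedAttr` is a hypothetical helper name, the code builds only on Linux, and it needs a kernel that allows unprivileged user namespaces.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// rootMappedAttr returns process attributes that create a new user
// namespace and map the given host uid/gid to root (id 0) inside it.
// (Hypothetical helper for illustration.)
func rootMappedAttr(uid, gid int) *syscall.SysProcAttr {
	return &syscall.SysProcAttr{
		Cloneflags:  syscall.CLONE_NEWUSER,
		UidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: uid, Size: 1}},
		GidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: gid, Size: 1}},
	}
}

func main() {
	cmd := exec.Command("id")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	cmd.SysProcAttr = rootMappedAttr(os.Getuid(), os.Getgid())
	if err := cmd.Run(); err != nil {
		fmt.Println("run failed (user namespaces may be disabled):", err)
	}
}
```

Entering an existing namespace has no counterpart among these fields, which is the gap the proposal is about.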

minux

Sep 3, 2015, 3:03:39 PM
to Aram Hăvărneanu, Ian Lance Taylor, Mrunal Patel, Alexander Morozov, golang-dev
Actually they can be written in Go: just use the cgo constructor trick
to enter the namespace before the Go program starts.

Then you don't need to manage building C programs too. The drawback
is that you will need a C toolchain to build your Go program.

Alexander Morozov

Sep 3, 2015, 3:05:54 PM
to golang-dev, ara...@mgk.ro, ia...@golang.org, mpa...@redhat.com, lk4d...@gmail.com
Yup, that's what we do now (the link is in the top post). Another drawback is that this is an undocumented feature, and it looks really tricky.

On Thursday, September 3, 2015 at 12:03:39 UTC-7, minux wrote:

minux

Sep 3, 2015, 3:25:27 PM
to Alexander Morozov, golang-dev, Aram Hăvărneanu, Ian Lance Taylor, mpa...@redhat.com
On Thu, Sep 3, 2015 at 3:05 PM, Alexander Morozov <lk4d...@gmail.com> wrote:
> Yup, that's what we do now (the link is in the top post). Another drawback is that this is an undocumented feature, and it looks really tricky.

Why is it undocumented and tricky? I think everything it uses is pretty standard.

You can define constructors in C, and thus in cgo, and these constructors must
run before the Go runtime starts; otherwise any C++ program that relies on std::cout
would break. The only assumption is that when the constructor starts, the program
is still in single-threaded mode, but this is easy to guarantee because the only
code that is capable of starting new threads this early is other constructors. Just
make sure you don't link to any problematic dynamic libraries and you will be
OK.

ron minnich

Sep 3, 2015, 3:38:47 PM
to minux, Alexander Morozov, golang-dev, Aram Hăvărneanu, Ian Lance Taylor, mpa...@redhat.com
https://github.com/rminnich/u-root/blob/master/src/cmds/pflask/pflask.go#L271 shows an example of namespace unsharing without using the (admittedly pretty neat) constructor trick. The program sets the right clone flags, and then starts another copy of itself with a magic argument. So the program *is* the utility that it needs to unshare itself. I did it this way because I did not want to depend on anything else being there.
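The self re-exec pattern described above can be sketched as follows. This is not the pflask code: the marker `childArg`, the flag set `newNamespaces`, and the helper names are assumptions chosen for illustration, and the code builds only on Linux.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// childArg is a hypothetical marker; any value the parent and child agree
// on works.
const childArg = "__child__"

// newNamespaces lists the namespaces the re-executed copy starts in.
const newNamespaces = syscall.CLONE_NEWUTS | syscall.CLONE_NEWNS

// isChild reports whether this process is the re-executed copy.
func isChild(args []string) bool {
	return len(args) > 1 && args[1] == childArg
}

// reexecCommand builds a command that starts self again, in new
// namespaces, with the marker followed by the real command line.
func reexecCommand(self string, argv ...string) *exec.Cmd {
	cmd := exec.Command(self, append([]string{childArg}, argv...)...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{Cloneflags: newNamespaces}
	return cmd
}

func main() {
	if isChild(os.Args) {
		// Already inside the new namespaces: do any setup, then run
		// the payload.
		fmt.Println("child running in new namespaces:", os.Args[2:])
		return
	}
	// Creating these namespaces usually needs root (or an added
	// CLONE_NEWUSER).
	if err := reexecCommand("/proc/self/exe", os.Args[1:]...).Run(); err != nil {
		fmt.Println("re-exec failed:", err)
	}
}
```

This covers creating namespaces only. The child is an ordinary multithreaded Go program by the time its `main` runs, so it still cannot setns into an existing user namespace.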

Thanks to minux for contributing the last bits to make this fully unshare the namespace.

ron

--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alexander Morozov

Sep 3, 2015, 3:40:41 PM
to golang-dev, lk4d...@gmail.com, ara...@mgk.ro, ia...@golang.org, mpa...@redhat.com
I thought it was standard only for GCC. GCC is pretty "standard", so it's okay.
You're right, so all three features (creating, writing mappings, entering) can be implemented with an aux program or a cgo constructor (such a Go program will probably contain more C than Go code, though). Two of the three features (creating, writing mappings) can be used with `exec.Cmd` directly.

On Thursday, September 3, 2015 at 12:25:27 UTC-7, minux wrote:

minux

Sep 3, 2015, 3:51:34 PM
to Alexander Morozov, golang-dev
On Thu, Sep 3, 2015 at 3:40 PM, Alexander Morozov <lk4d...@gmail.com> wrote:
> I think it is standard only for GCC. GCC is pretty "standard", so it's okay.

The __attribute__((constructor)) extension is also supported by clang and icc, so all major
compilers support it. (It's a standard feature of ELF, so any production C
compiler for Linux has to provide a way to use it.)

Peter Waller

Sep 3, 2015, 5:20:00 PM
to golang-dev
Silly question - which I feel I almost know the answer to - but would love to hear an answer from those more experienced in the domain:

Why not provide a callback mechanism and allow the user to run arbitrary code between fork and exec?

This would enable one to use x/sys/unix or make Syscall()s yourself, if need be.

I'd hazard a guess that this amounts to a bazooka-sized footgun, so I could imagine some reluctance to provide such a thing. Maybe there is another reason. But it seems the alternative is that Go keeps falling behind the state of the art.

I guess we might hope that one day soon containers will be "done"; Go can complete SysProcAttr{} forever, safe in the knowledge there will never be any further additions? ;-)

Brad Fitzpatrick

Sep 3, 2015, 5:21:43 PM
to Peter Waller, golang-dev
Because running code between fork and exec in a multi-threaded program is one of the most difficult tasks in the world to do correctly. Almost nothing is safe.

Peter Waller

Sep 3, 2015, 5:38:45 PM
to Brad Fitzpatrick, golang-dev
That's fair enough. But is there another way to provide this to those who know what they're doing? Document it well? Call it .UnsafeCallbackIfYouUseThisYouGetToKeepTheBrokenPieces?

How about a way of specifying a slice of syscalls and parameters to execute? I'm grasping at straws here, maybe these solutions are too distasteful.

forkAndExecInChild is written in Go:

https://github.com/golang/go/blob/c788a8e05df9b75c53fac9d588d595329912c38b/src/syscall/exec_linux.go#L56

So why can I not write a short callback that runs in that context that runs a few syscalls?

Is it that given this power, people will do crazy things?

minux

Sep 3, 2015, 6:01:01 PM
to Peter Waller, Brad Fitzpatrick, golang-dev
On Thu, Sep 3, 2015 at 5:38 PM, Peter Waller <pe...@scraperwiki.com> wrote:
> That's fair enough. But is there another way to provide this to those who know what they're doing? Document it well? Call it .UnsafeCallbackIfYouUseThisYouGetToKeepTheBrokenPieces?

The problem is that we can't document it without limiting future runtime
enhancements.

The requirement will be evolving together with runtime changes.

Alexander Morozov

Sep 8, 2015, 1:26:00 AM
to golang-dev, mi...@golang.org, lk4d...@gmail.com, ara...@mgk.ro, ia...@golang.org, mpa...@redhat.com
Yeah, but here we're discussing the same approach for setns, not only unshare.

On Thursday, September 3, 2015 at 12:38:47 UTC-7, ron minnich wrote:

Alexander Morozov

Sep 16, 2015, 12:05:27 PM
to golang-dev
So, are we definite about not making such changes? To me it makes sense for the sake of consistency of the namespace features in Go. All of them are possible to implement through an aux program or a cgo constructor, but that complicates the code a lot :(

Aram Hăvărneanu

Sep 16, 2015, 12:15:40 PM
to Alexander Morozov, golang-dev
I don't understand the resistance on doing this. If it's possible in
straightforward C, it should be possible in straightforward Go
(without cgo), if anything simply for parity.

--
Aram Hăvărneanu

Ian Lance Taylor

Sep 16, 2015, 12:37:59 PM
to Aram Hăvărneanu, Alexander Morozov, golang-dev
On Wed, Sep 16, 2015 at 9:15 AM, Aram Hăvărneanu <ara...@mgk.ro> wrote:
> I don't understand the resistance on doing this. If it's possible in
> straightforward C, it should be possible in straightforward Go
> (without cgo), if anything simply for parity.

I think the resistance is straightforward. The syscall package is
frozen. We only change it when we must. We must change it for
operations that are required to occur between fork and exec. This
operation is not required to occur between fork and exec. Therefore
we should not implement it. (If we have made earlier changes that
were not required to occur between fork and exec, then we received
incorrect information and made a mistake.)

The question here is: is providing a way to call setns between fork
and exec so convenient that we should break the rules that we have
already set?

Ian

Alexander Morozov

Sep 16, 2015, 12:51:18 PM
to golang-dev, ara...@mgk.ro, lk4d...@gmail.com
You're right about the question, and I think only Go team members can answer it. For me it would be pretty convenient, because without it I'd need ~500 lines of C code in a cgo constructor and a bunch of environment variables to pass everything needed there, which sort of hurts the codebase in terms of readability.
But I'm definitely not the person who decides here :)

On Wednesday, September 16, 2015 at 9:37:59 UTC-7, Ian Lance Taylor wrote:

Eric Myhre

Sep 16, 2015, 2:44:49 PM
to golan...@googlegroups.com
On 09/16/2015 11:37 AM, Ian Lance Taylor wrote:
>
> I think the resistance is straightforward. The syscall package is
> frozen. We only change it when we must. We must change it for
> operations that are required to occur between fork and exec. This
> operation is not required to occur between fork and exec. Therefore
> we should not implement it. (If we have made earlier changes that
> were not required to occur between fork and exec, then we received
> incorrect information and made a mistake.)
>
> The question here is: is providing a way to call setns between fork
> and exec so convenient that we should break the rules that we have
> already set?
>
> Ian
>

Let me start out by saying that as a default stance, I highly favor the stability vote.

However, in this case? It really may be still worth considering.

Reading the current stdlib code around fork/exec races, I can palpably feel the head-in-hands frustration of the author. Imagine the same frustration someone would have trying to get filedescriptor closure right for a security project without that. Namespaces are now very much stuck there.

If it is somehow possible to write the desired namespaced process launch features as a library, sans self-execing or cgo, the "no, it's frozen" answer is much bolstered, since the answer to "freeze" situations is typically "make a library". While that's clearly fine for, say, better/different json... reading the current race comments in fork/exec code, it's not clear to me this is possible to do as a library, and others on this list suggesting callbacks from the sticky spot seem to be saying the same, so we have a rock and a hard place for sure. I hear the concern that blithely adding a callback now will stick us with supporting it in the future; following that thought a little further, are there ideas for (possibly far-)future changes that would indeed deprecate such a hack?

(Trying to do this in cgo or self-execing reminds me of darker days when the letters were JNI instead of CGO, and as a personal anecdote, the joy of a language that decided to be better at speaking OS primitives *without* c wrappers is part of why golang has become dear to me.)

/2c

Ian Lance Taylor

Sep 16, 2015, 4:16:09 PM
to Eric Myhre, golan...@googlegroups.com
On Wed, Sep 16, 2015 at 11:24 AM, Eric Myhre <ha...@exultant.us> wrote:
>
> If it is somehow possible to write the desired namespaced process launch features as a library, sans self-execing or cgo, the "no, it's frozen" answer is much bolstered, since the answer to "freeze" situations is typically "make a library". While that's clearly fine for, say, better/different json... reading the current race comments in fork/exec code, it's not clear to me this is possible to do as a library, and others on this list suggesting callbacks from the sticky spot seem to be saying the same, so we have a rock and a hard place for sure. I hear the concern that blithely adding a callback now will stick us with supporting it in the future; following that thought a little further, are there ideas for (possibly far-)future changes that would indeed deprecate such a hack?

A callback is a non-starter because there is practically nothing that
a callback can safely do. It can not allocate memory. It can not
split the stack. It can not enter the scheduler. Approximately the
only thing it can do is call RawSyscall.
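To make the last point concrete, here is what a single `RawSyscall` looks like, using getpid(2) as a harmless stand-in. It is only an illustration: `rawGetpid` is a made-up helper, and nothing here runs between fork and exec. The code inside `forkAndExecInChild` is restricted to calls of this shape.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// rawGetpid issues getpid(2) through RawSyscall, which traps into the
// kernel directly without telling the Go scheduler that a system call is
// in progress. (Hypothetical helper for illustration.)
func rawGetpid() (int, error) {
	pid, _, errno := syscall.RawSyscall(syscall.SYS_GETPID, 0, 0, 0)
	if errno != 0 {
		return 0, errno
	}
	return int(pid), nil
}

func main() {
	pid, err := rawGetpid()
	if err != nil {
		fmt.Println("getpid failed:", err)
		return
	}
	fmt.Println("raw getpid matches os.Getpid:", pid == os.Getpid())
}
```

Even the `fmt.Println` calls in `main` would be off limits in the child after fork, since they allocate and may grow the stack.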

Ian

demetri...@gmail.com

Sep 21, 2015, 1:10:28 AM
to golang-dev
In general, things like running between fork and exec are best handled by systems programming languages like C or Rust. Go is a bad fit for this use case as it requires too much runtime support. Rust is the way to go if you want to write safe code in such an environment.

builderk...@gmail.com

Jul 19, 2017, 12:51:34 PM
to golang-dev
I'm probably resurrecting a dead thread here, but I'm going to point this out anyway: the main issue is that LockOSThread() doesn't stop the thread from forking. Why don't we just fix it at this level? If a thread is pinned, don't allow it to fork.

The primary side effect is some added complexity at the start of LockOSThread(), to ensure that at least one unconstrained thread exists (i.e., always fork before locking the last unconstrained thread).

Then namespace-manipulating programs should work as expected.

Thoughts?  Am I missing something obvious?

Jessica Frazelle

Jul 19, 2017, 12:55:09 PM
to builderk...@gmail.com, golang-dev
You will want to check out https://github.com/golang/go/issues/20676
which is tracking patches for thread pinning as well as
https://github.com/golang/go/issues/20395 and
https://github.com/golang/go/issues/20458 for pairing LockOSThread and
UnlockOSThread calls. All these patches will make this possible and
clear up any confusion with LockOSThread and UnlockOSThread behavior.
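As a rough illustration of the thread-pinning pattern these issues concern, the sketch below runs a function on a dedicated OS thread that is thrown away afterwards. It assumes the behavior shipped in Go 1.10 and later, where a goroutine that exits while still locked takes its thread down with it; `onDedicatedThread` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime"
)

// onDedicatedThread runs fn on its own OS thread and never returns that
// thread to the scheduler's pool. (Hypothetical helper for illustration.)
func onDedicatedThread(fn func() error) error {
	done := make(chan error, 1)
	go func() {
		runtime.LockOSThread()
		// Deliberately no UnlockOSThread: in Go 1.10 and later, when a
		// goroutine exits while still locked, the runtime terminates
		// the thread instead of reusing it, so per-thread state changed
		// by fn cannot leak into other goroutines.
		done <- fn()
	}()
	return <-done
}

func main() {
	err := onDedicatedThread(func() error {
		// Per-thread changes (for example joining a network namespace)
		// would go here.
		return nil
	})
	fmt.Println("finished, err =", err)
}
```

This helps for namespaces that can be switched per thread. It does not lift the kernel rule quoted earlier in the thread that a multithreaded process may not change its user namespace with setns.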

--
Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu