[lxc] non-root, unpriv, network: "Failed to allocate new network namespace id" is irrelevant?

Иван Присяжный

Dec 15, 2021, 7:32:35 AM
to lxc-users
Hi all,

I am running unprivileged non-root containers (lxc 4.0.11:latest
head). Everything works well. But I see this message in a log:

   WARN start - start.c:lxc_spawn:1835 - Operation not permitted - Failed to allocate new network namespace id

This message looks scary but if you look at the code, it actually
tries to create a netlink namespace (NEWNSID):

    ret = lxc_netns_set_nsid(handler->nsfd[LXC_NS_NET]);
    if (ret < 0)
        SYSWARN("Failed to allocate new network namespace id");

That must be equivalent AFAIU to:

    $ ip netns add blah

But this operation probably requires having root permissions to make a
shared mount point, besides having the right permissions for the path:

    mkdir("/var/run/netns", 0755) = -1 EACCES (Permission denied)
    mount("", "/var/run/netns", 0x55f1735bd91f, MS_REC|MS_SHARED, NULL) = -1 EPERM (Operation not permitted)

For example:

    $ mount --make-shared /tmp/1
    mount: /tmp/1: must be superuser to use mount.

So it seems to me that it is impossible to create an ip netns if the
euid is non-root. Am I correct?

If I am correct about this, shouldn't we patch LXC so that it does not
call lxc_netns_set_nsid() when running unprivileged containers with
euid != 0?

Related issue: https://github.com/lxc/lxc/issues/4045

--
-- Regards,
-- Ivan

Christian Brauner

Dec 16, 2021, 11:25:23 AM
to Иван Присяжный, lxc-users
Allocating a network namespace id requires privileges in the owning user
namespace of the network namespace. All containers that don't drop
CAP_NET_ADMIN in their user namespace will be sufficiently privileged to
allocate a new network namespace id provided they also create a new
network namespace. So both privileged (without user namespaces) and
unprivileged (with user namespaces) containers are able to make use of
network namespace identifiers.

Additionally, it is not required to create any sort of mounts. Network
namespace id allocation is solely done through netlink.
The mounts you're looking at are created by the ip tool to persist
network namespaces. They are unrelated to network namespace ids.
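
To illustrate what "solely done through netlink" looks like in practice, here is a rough C sketch of an RTM_NEWNSID request. It is written under assumptions and is not LXC's actual code: the helper names (put_u32_attr, build_newnsid_request, send_newnsid_request) and the buffer size are made up for this example, while the constants come from the kernel's uapi headers. No mount is involved anywhere.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/net_namespace.h>

/* Hypothetical helper: append one 32-bit attribute to a netlink message.
 * Returns 0 on success, -1 if the buffer is too small. */
int put_u32_attr(struct nlmsghdr *hdr, size_t maxlen,
                 unsigned short type, uint32_t value)
{
    size_t len = RTA_LENGTH(sizeof(value));
    size_t offset = NLMSG_ALIGN(hdr->nlmsg_len);
    struct rtattr *rta;

    if (offset + RTA_ALIGN(len) > maxlen)
        return -1;

    rta = (struct rtattr *)((char *)hdr + offset);
    rta->rta_type = type;
    rta->rta_len = (unsigned short)len;
    memcpy(RTA_DATA(rta), &value, sizeof(value));
    hdr->nlmsg_len = (uint32_t)(offset + RTA_ALIGN(len));
    return 0;
}

/* Hypothetical helper: build an RTM_NEWNSID request for the network
 * namespace referred to by netns_fd. Returns the message length or -1. */
int build_newnsid_request(void *buf, size_t buflen, int netns_fd)
{
    struct nlmsghdr *hdr = buf;
    struct rtgenmsg *msg;

    if (buflen < NLMSG_LENGTH(sizeof(*msg)))
        return -1;

    memset(buf, 0, buflen);
    hdr->nlmsg_len = NLMSG_LENGTH(sizeof(*msg));
    hdr->nlmsg_type = RTM_NEWNSID;
    hdr->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
    msg = NLMSG_DATA(hdr);
    msg->rtgen_family = AF_UNSPEC;

    /* Which namespace gets the id: identified by a file descriptor. */
    if (put_u32_attr(hdr, buflen, NETNSA_FD, (uint32_t)netns_fd) < 0)
        return -1;
    /* -1 (NETNSA_NSID_NOT_ASSIGNED) asks the kernel to pick an id. */
    if (put_u32_attr(hdr, buflen, NETNSA_NSID,
                     (uint32_t)NETNSA_NSID_NOT_ASSIGNED) < 0)
        return -1;

    return (int)hdr->nlmsg_len;
}

/* Hypothetical helper: send the request over a NETLINK_ROUTE socket and
 * return the kernel's answer: 0 on success, a negative errno otherwise. */
int send_newnsid_request(int netns_fd)
{
    uint32_t buf[64]; /* 256 bytes, aligned for netlink headers */
    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK };
    struct nlmsghdr *reply = (struct nlmsghdr *)buf;
    ssize_t n;
    int len, sock, saved_errno;

    len = build_newnsid_request(buf, sizeof(buf), netns_fd);
    if (len < 0)
        return -EINVAL;

    sock = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
    if (sock < 0)
        return -errno;

    n = sendto(sock, buf, (size_t)len, 0,
               (struct sockaddr *)&kernel, sizeof(kernel));
    if (n >= 0)
        n = recv(sock, buf, sizeof(buf), 0);
    saved_errno = errno;
    close(sock);
    if (n < 0)
        return -saved_errno;

    /* With NLM_F_ACK the kernel always answers with an NLMSG_ERROR message. */
    if (n < (ssize_t)NLMSG_LENGTH(sizeof(struct nlmsgerr)) ||
        reply->nlmsg_type != NLMSG_ERROR)
        return -EIO;

    return ((struct nlmsgerr *)NLMSG_DATA(reply))->error;
}
```

Calling send_newnsid_request(fd) with an fd that refers to a network namespace (for example one opened from /proc/<pid>/ns/net) returns 0 when the kernel accepts the request and a negative errno such as -EPERM when it does not.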

So my bet is that you're dropping CAP_NET_ADMIN or something similar.

Иван Присяжный

Jan 25, 2022, 12:20:23 PM
to lxc-users, christia...@ubuntu.com, lxc-users, Иван Присяжный


On Thursday, December 16, 2021 at 18:25:23 UTC+2, christia...@ubuntu.com wrote:
You are probably right that CAP_NET_ADMIN is required, but I am not dropping it. Here is what my config looks like:

```
lxc.include = /usr/share/lxc/config/common.conf
lxc.include = /usr/share/lxc/config/userns.conf
lxc.arch = linux64

lxc.uts.name = test

lxc.idmap = u 0 100000 65535
lxc.idmap = g 0 100000 65535

lxc.rootfs.path = dir:/home/public/.local/share/lxc/test/rootfs
lxc.rootfs.options = ro,bind
```

I am, of course, running an unprivileged LXC container:

```
$ id
uid=1000
$ lxc-start test -l trace -o /dev/stderr
...
lxc-start test 20220125171454.950 INFO     start - start.c:do_start:1107 - Unshared CLONE_NEWNET
lxc-start test 20220125171454.950 NOTICE   utils - utils.c:lxc_drop_groups:1347 - Dropped supplimentary groups
lxc-start test 20220125171454.950 NOTICE   utils - utils.c:lxc_switch_uid_gid:1323 - Switched to gid 0
lxc-start test 20220125171454.950 NOTICE   utils - utils.c:lxc_switch_uid_gid:1332 - Switched to uid 0
lxc-start test 20220125171454.950 TRACE    sync - sync.c:lxc_sync_wake_parent:104 - Child waking parent with sequence configure
lxc-start test 20220125171454.950 TRACE    sync - sync.c:lxc_sync_wait_parent:110 - Child waiting for parent with sequence post-configure
lxc-start test 20220125171454.951 DEBUG    start - start.c:lxc_try_preserve_namespace:139 - Preserved net namespace via fd 7 and stashed path as net:/proc/53644/fd/7
lxc-start test 20220125171454.951 WARN     start - start.c:lxc_spawn:1835 - Operation not permitted - Failed to allocate new network namespace id
...
```

In the initial user namespace, as a regular user, I of course don't have any capabilities.

Arch Linux 5.15.2-arch1-1.
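
Rather than assuming which capabilities a process holds, one can read the CapEff mask from /proc/self/status and test the CAP_NET_ADMIN bit. The C sketch below is only illustrative: the function names (parse_cap_eff, has_cap, self_has_cap) are invented for this example, and it only looks at the effective set of the calling process in its own user namespace.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <linux/capability.h>

/* Hypothetical helper: extract the CapEff bitmask from text in the
 * /proc/<pid>/status format. Returns 0 on success, -1 if not found. */
int parse_cap_eff(const char *status_text, uint64_t *mask)
{
    const char *line = strstr(status_text, "CapEff:");

    if (!line)
        return -1;
    /* The value is printed as hexadecimal after a tab. */
    return sscanf(line, "CapEff: %" SCNx64, mask) == 1 ? 0 : -1;
}

/* Hypothetical helper: 1 if capability number cap is set in mask, else 0. */
int has_cap(uint64_t mask, int cap)
{
    return (int)((mask >> cap) & 1);
}

/* Hypothetical helper: 1 or 0 depending on whether the calling process has
 * cap in its effective set, -1 if /proc/self/status could not be parsed. */
int self_has_cap(int cap)
{
    char buf[8192];
    uint64_t mask;
    size_t n;
    FILE *f = fopen("/proc/self/status", "r");

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    if (parse_cap_eff(buf, &mask) < 0)
        return -1;
    return has_cap(mask, cap);
}
```

Run as a regular user in the initial user namespace, self_has_cap(CAP_NET_ADMIN) would be expected to return 0. Inside a freshly created user namespace where nothing has been dropped, it should return 1, but that capability only applies to resources owned by that user namespace.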

Serge E. Hallyn

Jan 26, 2022, 12:39:55 AM
to Иван Присяжный, lxc-users, christia...@ubuntu.com
Yeah - the git log points to https://github.com/lxc/lxd/issues/4831 as
the motivation for this code. I would assume that lxc-info etc. doesn't
make use of that stuff anyway, only lxd's "lxc list" does, so you're
not missing out on any perf improvements.

thanks,
-serge

Иван Присяжный

Jan 26, 2022, 6:01:23 AM
to lxc-users, Serge E. Hallyn, lxc-users, christia...@ubuntu.com, Иван Присяжный
I am just confused that it tries to allocate a netlink id and fails while having all the necessary caps:

lxc-start test 20220126104217.558 INFO     start - start.c:do_start:1107 - Unshared CLONE_NEWNET
...
lxc-start test 20220126104217.558 WARN     start - start.c:lxc_spawn:1835 - Operation not permitted - Failed to allocate new network namespace id

By having cloned USER and NEWNET and not dropping anything it must have CAP_NET_ADMIN.

Tracing also shows that it exits there:

```
$ sudo perf ftrace -a -v -G rtnetlink_rcv_msg --graph-opts depth=3,nosleep-time,noirqs

 1)               |  rtnetlink_rcv_msg() {
 1)   0.316 us    |    irq_enter_rcu();
 1)   0.098 us    |    idle_cpu();
 1)   0.088 us    |    irqentry_exit_cond_resched();
 1)               |    netlink_net_capable() {
 1)   0.207 us    |      ns_capable();
 1)   0.411 us    |    }
 1)   6.591 us    |  }

static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
                             struct netlink_ext_ack *extack)
{
 ...

        if (kind != 2 && !netlink_net_capable(skb, CAP_NET_ADMIN))
                return -EPERM;
...

$ sudo perf trace --call-graph=dwarf -a -e probe:rtnetlink_rcv_msg__return
     0.000 :158606/158606 probe:rtnetlink_rcv_msg__return(__probe_func: -1151945952, __probe_ret_ip: -1151464261, arg1: 4294967295) // return == -1
                                       kretprobe_trampoline ([kernel.kallsyms])

```

Иван Присяжный

Jan 26, 2022, 9:10:31 AM
to Christian Brauner, lxc-users, Serge E. Hallyn, christia...@ubuntu.com
On Wed, Jan 26, 2022 at 2:27 PM Christian Brauner <bra...@kernel.org> wrote:
>
> On Wed, Jan 26, 2022 at 03:01:23AM -0800, Иван Присяжный wrote:
> > I am just confused that it tries to allocate a netlink id and fails while
> > having all the necessary caps:
> >
> > lxc-start test 20220126104217.558 INFO start - start.c:do_start:1107 -
> > Unshared CLONE_NEWNET
> > ...
> > lxc-start test 20220126104217.558 WARN start - start.c:lxc_spawn:1835 -
> > Operation not permitted - Failed to allocate new network namespace id
> >
> > By having cloned USER and NEWNET and not dropping anything it must have
> > CAP_NET_ADMIN.
>
> Oh, I think I get it. This is on the receive side, right?
> So the receive seems to require you have CAP_NET_ADMIN in the owning
> user namespace of the network namespace the socket has been created in.
> Since the socket is created by the monitor process in the initial
> network namespace which is owned by the initial userns and you're an
> unprivileged user you don't have CAP_NET_ADMIN.

Yes, you are right! I thought that the netlink transaction was
initiated after unsharing the network namespace, but looking at the
code once again I see that it happens in the monitor. So that's the
answer. It also means that this piece of code does not work for
unprivileged LXC containers and confuses users with this WARN.
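
The reasoning above can be sketched as a toy model in C, loosely following the namespace walk in the kernel's cap_capable(). Everything named toy_* is invented for illustration, and the real netlink check has more pieces (for instance, it also considers the credentials that opened the socket), so this is a simplified sketch rather than the kernel's implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CAP_NET_ADMIN 12 /* same bit number as CAP_NET_ADMIN */

/* Toy user namespace: parent link, nesting level, and owning uid. */
struct toy_userns {
    struct toy_userns *parent;
    int level;
    unsigned int owner;
};

/* Toy credentials: user namespace, effective uid, effective capabilities. */
struct toy_cred {
    struct toy_userns *user_ns;
    unsigned int euid;
    uint64_t cap_effective;
};

/* Does cred hold cap with respect to the target user namespace? */
bool toy_ns_capable(const struct toy_cred *cred,
                    const struct toy_userns *target, int cap)
{
    for (const struct toy_userns *ns = target; ns; ns = ns->parent) {
        /* Same user namespace: the effective set decides. */
        if (ns == cred->user_ns)
            return (cred->cap_effective >> cap) & 1;

        /* Target is not below the caller's namespace: no privilege. */
        if (ns->level <= cred->user_ns->level)
            return false;

        /* The owner of a child user namespace has all caps in it. */
        if (ns->parent == cred->user_ns && ns->owner == cred->euid)
            return true;
    }
    return false;
}
```

Take an initial user namespace at level 0 and a container user namespace at level 1 owned by uid 1000. A monitor credential (euid 1000, no effective capabilities, still in the initial user namespace) fails the check against the initial user namespace, which owns the network namespace its socket was created in. A container process holding CAP_NET_ADMIN in the container user namespace passes the check against that namespace.
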
--
-- Ivan Prisyazhnyy