[lxc/lxc] a42951: execute: don't exec init, call it


Christian Brauner

Jul 1, 2021, 11:14:19 AM
to lxc-...@lists.linuxcontainers.org
Branch: refs/heads/stable-4.0
Home: https://github.com/lxc/lxc
Commit: a429519676d5af66843b28776ebef264487f95c7
https://github.com/lxc/lxc/commit/a429519676d5af66843b28776ebef264487f95c7
Author: Tycho Andersen <ty...@tycho.pizza>
Date: 2021-07-01 (Thu, 01 Jul 2021)

Changed paths:
M src/lxc/cmd/lxc_init.c
M src/lxc/conf.c
M src/lxc/execute.c
M src/lxc/initutils.c
M src/lxc/initutils.h
M src/lxc/start.h

Log Message:
-----------
execute: don't exec init, call it

Instead of having a statically linked init that we put on the host fs
somewhere via packaging, and then having to either bind-mount it in or
detect fexecve() functionality, let's just call it as a library function.
This way we don't have to do any of that.
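A minimal sketch of the "call it, don't exec it" idea. It assumes a hypothetical `stub_init()` that stands in for the init logic and is not LXC's actual entry point: the forked child runs the function directly and exits with its return value, so no init binary has to exist on the host filesystem.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the init logic that used to live only in
 * the separately shipped lxc-init binary. */
static int stub_init(int argc, char *argv[])
{
	(void)argv;
	/* A real init would spawn the payload and reap children here. */
	return argc > 1 ? EXIT_SUCCESS : EXIT_FAILURE;
}

/* Fork a child that runs the init code as a plain function call; no
 * binary on the host fs, no bind mount, no fexecve() probing.
 * Returns the child's exit status, or -1 on error. */
static int run_init_in_child(int argc, char *argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(stub_init(argc, argv)); /* _exit: don't flush parent's stdio */

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```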

This also fixes up a bunch of conditions from:

	if (quiet)
		fprintf(stderr, "log message");

to

	if (!quiet)
		fprintf(stderr, "log message");

:)

and it drops all the code for fexecve() detection and bind mounting our
init in, since we no longer need any of that.
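The corrected condition can be illustrated with a hypothetical `init_log()` helper (not an LXC function) that writes only when quiet mode is off:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical logging helper: print only when quiet mode is NOT
 * requested. Returns true if a message was written. */
static bool init_log(FILE *out, bool quiet, const char *msg)
{
	if (!quiet) {
		fprintf(out, "%s\n", msg);
		return true;
	}
	return false;
}
```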

A couple other thoughts:

* I left the lxc-init binary in since we ship it, so someone could be using
it outside of the internal uses.
* There are lots of unused arguments to lxc-init (including presumably
--quiet, since nobody noticed the above); those may be part of the API
though and so we don't want to drop them.

Signed-off-by: Tycho Andersen <ty...@tycho.pizza>


Commit: 91ee6c8bfe6197f0b1d932fdc5a6d5d15c986ec4
https://github.com/lxc/lxc/commit/91ee6c8bfe6197f0b1d932fdc5a6d5d15c986ec4
Author: Christian Brauner <christia...@ubuntu.com>
Date: 2021-07-01 (Thu, 01 Jul 2021)

Changed paths:
M src/lxc/initutils.c

Log Message:
-----------
initutils: use vfork() in lxc_container_init()

We can let the child finish calling exec before continuing in the
parent.
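A sketch of the vfork() pattern, using a hypothetical `spawn_and_wait()` helper rather than the code in initutils.c: the parent stays suspended until the child has exec'd or exited, and the child may only call an exec function or `_exit()`.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Spawn argv[0] with vfork(). When vfork() returns in the parent, the
 * child's exec attempt is already over. Returns the child's exit
 * status (127 if exec failed), or -1 on error. */
static int spawn_and_wait(char *const argv[])
{
	int status;
	pid_t pid = vfork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Only exec*() or _exit() are safe in a vfork() child. */
		execvp(argv[0], argv);
		_exit(127);
	}

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```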

Signed-off-by: Christian Brauner <christia...@ubuntu.com>


Commit: 0089d71762e3e2401edd869bb54944a719f01df1
https://github.com/lxc/lxc/commit/0089d71762e3e2401edd869bb54944a719f01df1
Author: Christian Brauner <christia...@ubuntu.com>
Date: 2021-07-01 (Thu, 01 Jul 2021)

Changed paths:
M src/lxc/network.c

Log Message:
-----------
network: log network devices while sending

Signed-off-by: Christian Brauner <christia...@ubuntu.com>


Commit: 0a9531960a1c9fe6d582b198f6e8698e76209643
https://github.com/lxc/lxc/commit/0a9531960a1c9fe6d582b198f6e8698e76209643
Author: Christian Brauner <christia...@ubuntu.com>
Date: 2021-07-01 (Thu, 01 Jul 2021)

Changed paths:
M src/lxc/conf.h
M src/lxc/initutils.c

Log Message:
-----------
execute: ensure parent is notified about child exec and close all unneeded fds

lxc_container_init() creates the container payload process as its child
so lxc_container_init() itself never really exits and thus the parent
isn't notified about the child exec'ing since the sync file descriptor
is never closed. Make sure it's closed to notify the parent about the
child's exec.

In addition we're currently leaking all file descriptors associated with
the handler into the stub init. Make sure that all file descriptors
other than stderr are closed.
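The notification relies on a descriptor that disappears when the child execs. A common way to get this is a close-on-exec pipe, sketched below with a hypothetical `spawn_notify()`; LXC's own sync protocol is not shown, and closing the remaining descriptors is omitted. If any process keeps a copy of the write end open, `read()` blocks instead of returning EOF.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Start argv[0] and report whether exec succeeded. The pipe is
 * O_CLOEXEC, so a successful exec closes the child's write end and the
 * parent reads EOF; on failure the child sends errno instead.
 * Returns 0 if the child exec'd, the exec errno, or -1 on error.
 * *pid_out receives the child's pid so the caller can reap it. */
static int spawn_notify(char *const argv[], pid_t *pid_out)
{
	int fds[2], err = 0;
	ssize_t n;
	pid_t pid;

	if (pipe2(fds, O_CLOEXEC) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		ssize_t w;

		close(fds[0]);
		execvp(argv[0], argv);
		err = errno; /* only reached if exec failed */
		w = write(fds[1], &err, sizeof(err));
		(void)w;
		_exit(127);
	}

	/* Drop the parent's copy of the write end, or read() never sees EOF. */
	close(fds[1]);
	do {
		n = read(fds[0], &err, sizeof(err));
	} while (n < 0 && errno == EINTR);
	close(fds[0]);

	*pid_out = pid;
	if (n == 0)
		return 0; /* EOF: the exec succeeded */
	return n == (ssize_t)sizeof(err) ? err : -1;
}
```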

Signed-off-by: Christian Brauner <christia...@ubuntu.com>


Commit: c73a232555fe5e033d53529d4a411db72b1e2301
https://github.com/lxc/lxc/commit/c73a232555fe5e033d53529d4a411db72b1e2301
Author: Simon Deziel <simon....@canonical.com>
Date: 2021-07-01 (Thu, 01 Jul 2021)

Changed paths:
M src/lxc/initutils.c

Log Message:
-----------
initutils: close dirfd in error path

Signed-off-by: Simon Deziel <simon....@canonical.com>
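This commit has no body, so the exact error path is not shown here. A typical leak of this kind involves `fdopendir()`, which takes ownership of the descriptor only on success. A hypothetical sketch (`count_dir_entries()` is illustrative, not an LXC function):

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>

/* Count the entries of a directory. fdopendir() owns dfd only when it
 * succeeds, so its error path must close dfd explicitly.
 * Returns the entry count, or -1 on error. */
static int count_dir_entries(const char *path)
{
	int n = 0;
	DIR *dir;
	int dfd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);

	if (dfd < 0)
		return -1;

	dir = fdopendir(dfd);
	if (!dir) {
		close(dfd); /* without this, dfd leaks on failure */
		return -1;
	}

	while (readdir(dir))
		n++;
	closedir(dir); /* also closes dfd */
	return n;
}
```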


Commit: e250f278bb5606933ad2801063b622ad48c7b1b3
https://github.com/lxc/lxc/commit/e250f278bb5606933ad2801063b622ad48c7b1b3
Author: Christian Brauner <christia...@ubuntu.com>
Date: 2021-07-01 (Thu, 01 Jul 2021)

Changed paths:
M src/lxc/conf.c

Log Message:
-----------
conf: improve read-only /sys with read-write /sys/devices/virtual/net

Some tools require /sys/devices/virtual/net to be read-write. At the
same time we want all other parts of /sys to be read-only. To do this we
created a layout where we had a read-only instance of sysfs mounted on
top of a read-write instance of sysfs:

`-/sys sysfs sysfs rw,nosuid,nodev,noexec,relatime
  `-/sys sysfs sysfs ro,nosuid,nodev,noexec,relatime
    |-/sys/devices/virtual/net sysfs sysfs rw,relatime
    | `-/sys/devices/virtual/net sysfs[/devices/virtual/net] sysfs rw,nosuid,nodev,noexec,relatime

This causes issues for systemd services that create a separate mount
namespace as they get confused about which mount options need to be
respected.

Simplify our mounting logic so we end up with a single read-only mount
of sysfs on /sys and a read-write bind-mount of /sys/devices/virtual/net:

├─/sys sysfs sysfs ro,nosuid,nodev,noexec,relatime
│ ├─/sys/devices/virtual/net sysfs[/devices/virtual/net] sysfs rw,nosuid,nodev,noexec,relatime
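One way to produce the new layout with plain mount(2) calls, sketched under the assumption that it runs as root in a private mount namespace. It is not necessarily the sequence `conf.c` uses. The `mount_fn` parameter and `dry_run_mount()` are hypothetical, added so the steps can be checked without privileges.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mount.h>

/* Same signature as mount(2), so a dry-run stand-in can be passed. */
typedef int (*mount_fn)(const char *, const char *, const char *,
			unsigned long, const void *);

/* Read-only /sys plus a writable net subtree from ONE sysfs instance.
 * Returns 0 on success, -1 if any step fails. */
static int setup_sys_mixed(mount_fn do_mount, const char *sys,
			   const char *net)
{
	const unsigned long base = MS_NOSUID | MS_NODEV | MS_NOEXEC;

	/* 1. A single read-write sysfs instance on /sys. */
	if (do_mount("sysfs", sys, "sysfs", base, NULL) < 0)
		return -1;

	/* 2. Bind-mount the net subtree onto itself, making it a separate
	 *    mount with its own per-mount read-write flag. */
	if (do_mount(net, net, NULL, MS_BIND, NULL) < 0)
		return -1;

	/* 3. Make only the /sys mount read-only. MS_REMOUNT | MS_BIND changes
	 *    per-mount flags rather than the shared sysfs superblock, so the
	 *    bind mount from step 2 stays writable. */
	if (do_mount(NULL, sys, NULL, MS_REMOUNT | MS_BIND | MS_RDONLY | base,
		     NULL) < 0)
		return -1;

	return 0;
}

/* Dry-run stand-in for mount(2): records the flags of each call. */
static unsigned long recorded_flags[8];
static int recorded_calls;

static int dry_run_mount(const char *src, const char *tgt,
			 const char *fstype, unsigned long flags,
			 const void *data)
{
	(void)src;
	(void)tgt;
	(void)fstype;
	(void)data;
	if (recorded_calls < 8)
		recorded_flags[recorded_calls] = flags;
	recorded_calls++;
	return 0;
}
```

With sufficient privileges, `mount` itself can be passed as `do_mount`.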

Link: systemd/systemd#20032
Signed-off-by: Christian Brauner <christia...@ubuntu.com>


Commit: d50378b422f60ee701fea3244d4b1300241bc012
https://github.com/lxc/lxc/commit/d50378b422f60ee701fea3244d4b1300241bc012
Author: Christian Brauner <christia...@ubuntu.com>
Date: 2021-07-01 (Thu, 01 Jul 2021)

Changed paths:
M .gitignore
M src/tests/Makefile.am
A src/tests/sys_mixed.c

Log Message:
-----------
tests: add tests for read-only /sys with read-write /sys/devices/virtual/net

Signed-off-by: Christian Brauner <christia...@ubuntu.com>


Commit: ff4b545f5e3454f03fdd1400a28dc85bb2cfd36d
https://github.com/lxc/lxc/commit/ff4b545f5e3454f03fdd1400a28dc85bb2cfd36d
Author: Christian Brauner <christia...@ubuntu.com>
Date: 2021-07-01 (Thu, 01 Jul 2021)

Changed paths:
M src/lxc/cgroups/cgfsng.c

Log Message:
-----------
cgroups: handle funky cgroup layouts

Old versions of Docker emulate a cgroup namespace by bind-mounting the
container's cgroup over the corresponding controller:

/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime master:11 - cgroup cgroup rw,xattr,name=systemd
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime master:15 - cgroup cgroup rw,net_cls,net_prio
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime master:16 - cgroup cgroup rw,cpu,cpuacct
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime master:17 - cgroup cgroup rw,memory
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime master:18 - cgroup cgroup rw,devices
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime master:19 - cgroup cgroup rw,hugetlb
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime master:20 - cgroup cgroup rw,perf_event
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime master:21 - cgroup cgroup rw,cpuset
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime master:22 - cgroup cgroup rw,blkio
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime master:23 - cgroup cgroup rw,pids
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7d4424e6_bb13_42f4_a47a_45a4828bf54d.slice/docker-d0b3604b67ac7930dd34ba3a796627e3e4717d12309e90a4afe3f38b6816ac98.scope /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime master:24 - cgroup cgroup rw,freezer

New versions of LXC always stash a file descriptor for the root of the
cgroup mount at /sys/fs/cgroup and then resolve the current cgroup
parsed from /proc/{1,self}/cgroup relative to that file descriptor. This
doesn't work when the caller's cgroup is mounted over the controllers.
Older versions of LXC simply counted such layouts as having no cgroups
available for delegation at all and moved on provided no cgroup limits
were requested. But mainline LXC would fail such layouts. While I would
argue that failing such layouts is the semantically clean approach, we
shouldn't regress users, so make mainline LXC treat such cgroup layouts
as having no cgroups available for delegation.
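A sketch of the fallback, assuming a hypothetical `cgroup_usable()` helper rather than the code in cgfsng.c: resolve the path from /proc/self/cgroup relative to the stashed mount fd and, if it does not exist below the mount (as in the bind-mounted layout above), report the controller as unavailable instead of failing.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* cgroup is the path parsed from /proc/self/cgroup, e.g. "/a/b".
 * Returns true if it resolves to a directory below mnt_fd. */
static bool cgroup_usable(int mnt_fd, const char *cgroup)
{
	int fd;

	/* Make the path relative to the mount fd. */
	while (*cgroup == '/')
		cgroup++;
	if (*cgroup == '\0')
		return true; /* the mount root itself */

	fd = openat(mnt_fd, cgroup, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (fd < 0)
		return false; /* e.g. cgroup bind-mounted over the controller */
	close(fd);
	return true;
}
```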

Fixes: #3890
Signed-off-by: Christian Brauner <christia...@ubuntu.com>


Commit: 49f1fbec1655c073df0e16d7bec18ca8817f3b04
https://github.com/lxc/lxc/commit/49f1fbec1655c073df0e16d7bec18ca8817f3b04
Author: Christian Brauner <christia...@ubuntu.com>
Date: 2021-07-01 (Thu, 01 Jul 2021)

Changed paths:
M src/lxc/terminal.c

Log Message:
-----------
terminal: ensure newlines are turned into newlines+carriage return for terminal output

Fixes: #3879
Signed-off-by: Christian Brauner <christia...@ubuntu.com>
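The mapping described here is what the `ONLCR` termios output flag does in the kernel. A software version as a hypothetical helper (the change in terminal.c may be implemented differently):

```c
#include <stddef.h>

/* Copy len bytes from in to out, turning every '\n' into "\r\n".
 * out must have room for up to 2 * len bytes.
 * Returns the number of bytes written. */
static size_t nl_to_crlf(const char *in, size_t len, char *out)
{
	size_t o = 0;

	for (size_t i = 0; i < len; i++) {
		if (in[i] == '\n')
			out[o++] = '\r';
		out[o++] = in[i];
	}
	return o;
}
```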


Commit: 01dd32bf95e45ae142353a08e079cd54726b38ef
https://github.com/lxc/lxc/commit/01dd32bf95e45ae142353a08e079cd54726b38ef
Author: Christian Brauner <christia...@ubuntu.com>
Date: 2021-07-01 (Thu, 01 Jul 2021)

Changed paths:
M src/lxc/cmd/lxc-checkconfig.in

Log Message:
-----------
cmd/lxc-checkconfig: list cgroup namespaces and rename confusing ns_cgroup entry

Link: https://discuss.linuxcontainers.org/t/cgroup-namespace-required-in-lxc-checkconfig-and-config-cgroup-ns
Signed-off-by: Christian Brauner <christia...@ubuntu.com>


Compare: https://github.com/lxc/lxc/compare/7cb9565c7fe6...01dd32bf95e4