
Bug#1026793: docker.io: --memory-swap not enforced on systemd cgroup driver


Saj Goonatilleke

Dec 21, 2022, 3:50:03 AM
Package: docker.io
Version: 20.10.5+dfsg1-1+deb11u1
Severity: normal

Hello,

https://docs.docker.com/config/containers/resource_constraints/#limit-a-containers-access-to-memory

--memory-swap has no effect.
--memory does take effect, but it is of little use without --memory-swap:
anonymous pages begin to overflow into swap once a workload approaches
its --memory limit. Instead of a quick OOM kill and workload restart,
the workload drags down whole-system performance with swap thrashing.
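
For reference, the behaviour I am after is the documented one: with
--memory-swap set equal to --memory, the container gets no swap at all and
is OOM-killed as soon as it hits its limit. A minimal sketch (container
name and image are arbitrary):

--- 8< ---
# 24 MiB of memory, zero swap: --memory-swap is the memory+swap TOTAL,
# so equal values should surface as memory.swap.max = 0 in the cgroup.
$ docker run -d --name memtest --memory 24m --memory-swap 24m debian sleep infinity
--- >8 ---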

Docker is using the systemd cgroup driver.
README.Debian includes a note about swapaccount,
but I don't think this caveat applies to cgroups v2 and linux 5.10.
As shown below, swap accounting does appear to work OK.

--- 8< ---
$ docker info
[...]
Server Version: 20.10.5+dfsg1
[...]
Cgroup Driver: systemd
Cgroup Version: 2
--- >8 ---
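
For what it's worth, a quick way to double-check the swapaccount caveat on
this host is something like the following (the glob assumes at least one
running container):

--- 8< ---
# Is swapaccount= set on the kernel command line at all?
$ grep -o 'swapaccount=[01]' /proc/cmdline || echo 'swapaccount not set'
# With working swap accounting under cgroups v2, each Docker scope should
# expose memory.swap.* files:
$ ls /sys/fs/cgroup/system.slice/docker-*.scope/memory.swap.current
--- >8 ---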

Here are the relevant bits of the Docker container configuration:

--- 8< ---
$ docker inspect container | jq '.[0].HostConfig.Memory'
25165824
$ docker inspect container | jq '.[0].HostConfig.MemorySwap'
25165824
--- >8 ---
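
Since HostConfig.MemorySwap is the combined memory+swap total, the swap
limit I expect in the cgroup is simply the difference of the two values,
here zero. A trivial check along the same lines as above:

--- 8< ---
# Expected memory.swap.max = MemorySwap - Memory
$ docker inspect container | jq '.[0].HostConfig.MemorySwap - .[0].HostConfig.Memory'
0
--- >8 ---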

The configuration is faithfully sent to containerd:

--- 8< ---
# ctr --namespace moby c info container-id | jq .Spec.linux.resources.memory
{
"limit": 25165824,
"swap": 25165824
}
--- >8 ---

The swap limit goes missing somewhere between containerd and systemd:

--- 8< ---
# systemctl show docker-container-id.scope | awk -F = '$1 ~ /Memory.*Max/'
MemoryMax=25165824
MemorySwapMax=infinity
--- >8 ---
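
To rule out systemd itself, a throwaway transient scope with the same
limits (unit name and duration are arbitrary) serves as a control; if
systemd behaves as documented, that scope's memory.swap.max reads 0:

--- 8< ---
# systemd-run --scope --unit=swaptest -p MemoryMax=24M -p MemorySwapMax=0 sleep 300 &
# cat /sys/fs/cgroup/system.slice/swaptest.scope/memory.swap.max
0
--- >8 ---

So whatever drops the limit appears to sit in the containerd/runc to
systemd translation, not in systemd's own property handling.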

From the cgroup:

--- 8< ---
# cat memory.swap.max
max

# cat memory.current memory.swap.current
22798336
218423296
--- >8 ---

I would expect memory.swap.max to read zero (swap - limit),
and likewise for memory.swap.current.

I tried to find the missing puzzle piece,
but there are many pieces in this puzzle.
(Is this a runc problem?)
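
If it is runc, one way to narrow it down might be to watch the
StartTransientUnit D-Bus call that reaches systemd when the scope is
created: start a capture, restart the affected container from another
shell, stop the capture with Ctrl-C, then count how often a MemorySwapMax
property appears. A rough sketch (tool choice and log path are mine):

--- 8< ---
# busctl monitor org.freedesktop.systemd1 > /tmp/systemd-bus.log
# grep -c MemorySwapMax /tmp/systemd-bus.log
--- >8 ---

A count of zero would suggest the property is never requested in the
first place.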

--- 8< ---
$ docker version
Client:
Version: 20.10.5+dfsg1
API version: 1.41
Go version: go1.15.15
Git commit: 55c4c88
Built: Sat Dec 4 10:53:03 2021
OS/Arch: linux/amd64
Context: default
Experimental: true

Server:
Engine:
Version: 20.10.5+dfsg1
API version: 1.41 (minimum version 1.12)
Go version: go1.15.15
Git commit: 363e9a8
Built: Sat Dec 4 10:53:03 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.13~ds1
GitCommit: 1.4.13~ds1-1~deb11u2
runc:
Version: 1.0.0~rc93+ds1
GitCommit: 1.0.0~rc93+ds1-5+deb11u2
docker-init:
Version: 0.19.0
GitCommit:
--- >8 ---

-- System Information:
Debian Release: 11.6
APT prefers stable-security
APT policy: (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-14-cloud-amd64 (SMP w/8 CPU threads)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages docker.io depends on:
ii adduser 3.118
ii containerd 1.4.13~ds1-1~deb11u2
ii init-system-helpers 1.60
ii iptables 1.8.7-1
ii libc6 2.31-13+deb11u5
ii libdevmapper1.02.1 2:1.02.175-2.1
ii libsystemd0 247.3-7+deb11u1
ii lsb-base 11.1.0
ii runc 1.0.0~rc93+ds1-5+deb11u2
ii tini 0.19.0-1

Versions of packages docker.io recommends:
pn apparmor <none>
ii ca-certificates 20210119
pn cgroupfs-mount <none>
ii git 1:2.30.2-1
pn needrestart <none>
ii xz-utils 5.2.5-2.1~deb11u1

Versions of packages docker.io suggests:
pn aufs-tools <none>
pn btrfs-progs <none>
pn debootstrap <none>
pn docker-doc <none>
ii e2fsprogs 1.46.2-2
pn rinse <none>
pn rootlesskit <none>
pn xfsprogs <none>
pn zfs-fuse | zfsutils-linux <none>

-- Configuration Files:
/etc/default/docker changed:
DOCKER_OPTS="--bip 172.17.0.1/16 --log-opt max-size=1m --log-opt max-file=2 --live-restore=true --raw-logs --insecure-registry=REDACTED:5000 --insecure-registry=REDACTED:5000"


-- no debconf information

Shengjing Zhu

Jan 1, 2023, 2:00:03 PM
Control: reassign runc 1.0.0~rc93+ds1-5+deb11u2
Thanks for the detailed report. The component sitting between containerd
and systemd is runc, so I'm reassigning the bug.

--
Shengjing Zhu

Shengjing Zhu

Jan 1, 2023, 2:30:03 PM
Hi,

On Wed, Dec 21, 2022 at 4:45 PM Saj Goonatilleke <s...@discourse.org> wrote:
> --- 8< ---
> # systemctl show docker-container-id.scope | awk -F = '$1 ~ /Memory.*Max/'
> MemoryMax=25165824
> MemorySwapMax=infinity
> --- >8 ---
>
> From the cgroup:
>
> --- 8< ---
> # cat memory.swap.max
> max
>
> # cat memory.current memory.swap.current
> 22798336
> 218423296
> --- >8 ---
>
> I would expect memory.swap.max to read zero (swap - limit),
> and likewise for memory.swap.current.

I can't reproduce it on unstable, with docker/20.10.21+dfsg1,
containerd/1.6.14~ds1, runc/1.1.4+ds1.

$ docker run --rm --memory 1G --memory-swap 1G -it ubuntu:18.04 bash
root@65cc55e0f5f8:/#

$ systemctl show docker-65cc55e0f5f82c23bed45a8451c552650d7eebee182db991ecd855454fafaab7.scope | grep MemorySwapMax=
MemorySwapMax=infinity

$ cat /sys/fs/cgroup/system.slice/docker-65cc55e0f5f82c23bed45a8451c552650d7eebee182db991ecd855454fafaab7.scope/memory.swap.max
0

The systemd property seems wrong, but the cgroup value is right. So I
think it's not a big deal.
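
One caveat, though (just a hypothesis, not something I have tested here):
since systemd still records MemorySwapMax=infinity, anything that makes it
re-apply its own properties to the scope's cgroup could reset the value the
runtime wrote. A quick experiment along those lines would be:

$ sudo systemctl daemon-reload
$ cat /sys/fs/cgroup/system.slice/docker-*.scope/memory.swap.max

If that flips from 0 back to "max", the stale systemd property matters
after all.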

I'll try to see if I can reproduce it on bullseye later.

--
Shengjing Zhu

Shengjing Zhu

Jan 1, 2023, 2:50:04 PM
Control: tag -1 unreproducible moreinfo
Can't reproduce it on bullseye either.

debian@cloudimg:~$ sudo docker run --rm -itd --memory 1G --memory-swap 1G debian bash
5ccda9e406f85546afd5ccc61cf277cf0ed8f50ec80472d2641ae26d0c13d21e

debian@cloudimg:~$ systemctl show docker-5ccda9e406f85546afd5ccc61cf277cf0ed8f50ec80472d2641ae26d0c13d21e.scope | grep MemorySw
MemorySwapMax=infinity

debian@cloudimg:~$ cat /sys/fs/cgroup/system.slice/docker-5ccda9e406f85546afd5ccc61cf277cf0ed8f50ec80472d2641ae26d0c13d21e.scope/memory.swap.max
0
debian@cloudimg:~$ cat /sys/fs/cgroup/system.slice/docker-5ccda9e406f85546afd5ccc61cf277cf0ed8f50ec80472d2641ae26d0c13d21e.scope/memory.max
1073741824

$ sudo docker version
Client:
Version: 20.10.5+dfsg1
API version: 1.41
Go version: go1.15.15
Git commit: 55c4c88
Built: Mon May 30 18:34:49 2022
OS/Arch: linux/amd64
Context: default
Experimental: true

Server:
Engine:
Version: 20.10.5+dfsg1
API version: 1.41 (minimum version 1.12)
Go version: go1.15.15
Git commit: 363e9a8
Built: Mon May 30 18:34:49 2022
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.13~ds1
GitCommit: 1.4.13~ds1-1~deb11u3
runc:
Version: 1.0.0~rc93+ds1
GitCommit: 1.0.0~rc93+ds1-5+deb11u2
docker-init:
Version: 0.19.0
GitCommit:

debian@cloudimg:~$ uname -a
Linux cloudimg 5.10.0-18-cloud-amd64 #1 SMP Debian 5.10.140-1 (2022-09-02) x86_64 GNU/Linux

--
Shengjing Zhu

Saj Goonatilleke

Jan 6, 2023, 3:00:04 PM
Hi Shengjing,

On 2 Jan 2023, at 6:44, Shengjing Zhu wrote:
> Can't reproduce it on bullseye as well.

You are right: something is missing from the original report.
Now I am unable to reproduce the problem on the same machine
that first exhibited it.

Either this is a case of user error on my part,
or the problem only strikes when some other variable -- so far unknown --
is also present.

I have reinstated our production experiment using the same config from before.
I surveyed one machine by hand and it looked OK. Data from the others
should trickle in over the coming days. If everything still looks OK,
I suppose we'll just put this one down to PEBKAC.
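
For the record, the survey is nothing fancier than comparing the configured
limits against the live cgroup values on each host, roughly like this
(paths assume the systemd cgroup driver on cgroups v2):

--- 8< ---
$ for id in $(docker ps --no-trunc -q); do
    echo "== $id"
    docker inspect "$id" | jq '.[0].HostConfig | {Memory, MemorySwap}'
    cat "/sys/fs/cgroup/system.slice/docker-$id.scope/memory.max" \
        "/sys/fs/cgroup/system.slice/docker-$id.scope/memory.swap.max"
  done
--- >8 ---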

Will make a note to follow up here next week.

Thank you!

Saj Goonatilleke

Jan 11, 2023, 11:10:04 AM
Hi Shengjing,

On 7 Jan 2023, at 6:48, Saj Goonatilleke wrote:
> You are right: something is missing from the original report.
> Now I am unable to reproduce the problem on the same machine
> that first exhibited it.
>
> I have reinstated our production experiment using the same config from before.
>
> Will make a note to follow up here next week.

I was unable to repro. Our production (bullseye) systems seem fine now.
Please feel free to close this bug.

On a hunch, I also experimented with restart policies (thinking that the
cgroup limits might disappear on an automatic restart); however, even that
worked OK after an OOM kill of the container's PID 1.
Very perplexing.
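
The poke was along these lines (image, name, and sizes are arbitrary;
tail /dev/zero is just a convenient memory hog that trips the OOM killer):

--- 8< ---
$ docker run -d --name oomtest --restart on-failure \
    --memory 24m --memory-swap 24m debian bash -c 'sleep 5; tail /dev/zero'
$ cat /sys/fs/cgroup/system.slice/docker-$(docker inspect -f '{{.Id}}' oomtest).scope/memory.swap.max
--- >8 ---

Even across the automatic restarts, memory.swap.max stayed put.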

My apologies for the noise.