
Linux 6.1.27, cgroup: Instruction fault 4 with systemd


Frank Scheiner

May 22, 2023, 6:00:04 AM
Dear all,

As already outlined on the debian-alpha mailing list ([1]), I get an
instruction fault 4 with Linux 6.1.27 (actually 6.1.0-9 on Debian) and
systemd on my DS25:

```
aboot: Linux/Alpha SRM bootloader version 1.0_pre20040408
aboot: switching to OSF/1 PALcode version 1.92
aboot: loading initrd (5376720 bytes/10502 blocks) at 0xfffffc00ffacc000
aboot: starting kernel network with arguments root=/dev/nfs
ip=:::::enP2p2s5:dhcp console=ttyS0,9600n8
[ 0.000000] Linux version 6.1.0-9-alpha-smp
(debian...@lists.debian.org) (gcc-12 (Debian 12.2.0-9) 12.2.0, GNU
ld (GNU Binutils for Debian) 2.40) #1 SMP Debian
6.1.27-1 (2023-05-08)
[ 0.000000] Booting GENERIC on Titan variation Granite using machine
vector PRIVATEER from SRM
[ 0.000000] Major Options: SMP MAGIC_SYSRQ
[ 0.000000] Command line: root=/dev/nfs ip=:::::enP2p2s5:dhcp
console=ttyS0,9600n8
[...]
Begin: Running /scripts/nfs-bottom ... done.
Begin: Running /scripts/init-bottom ... done.
[ 9.820307] systemd[1]: systemd 252.6-1 running in system mode (+PAM
+AUDIT +SELINUX +APPARMOR +IMA +SMACK -SECCOMP +GCRYPT -GNUTLS +OPENSSL
+ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP
+LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ
+ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT
default-hierarchy=unified)
[ 10.202143] systemd[1]: Detected architecture alpha.

Welcome to Debian GNU/Linux 12 (bookworm)!

[ 11.864251] systemd[1]: Queued start job for default target
graphical.target.
[ 11.958978] CPU 1
[ 11.958978] systemd(1): Instruction fault 4
[ 12.032220] pc = [<fffffc0005163bfc>] ra = [<fffffc0005163bf8>] ps
= 0000 Not tainted
[ 12.131829] pc is at 0xfffffc0005163bfc
[ 12.177728] ra is at 0xfffffc0005163bf8
[ 12.223626] v0 = 0000000000000000 t0 = 0000000000000023 t1 =
fffffc00066eb800
[ 12.310540] t2 = fffffc000512e680 t3 = 0000000000f00000 t4 =
0000000000000008
[ 12.398431] t5 = 0000000000000001 t6 = 0000000000000000 t7 =
fffffc0005160000
[ 12.486321] a0 = 0000000000000000 a1 = fffffc0005163bc0 a2 =
fffffc0005163bf8
[ 12.573235] a3 = 0000000000000001 a4 = 00000002c8cf86cc a5 =
0000000000000001
[ 12.661126] t8 = 0000000000000080 t9 = 0000000000000001 t10=
fffffc0002891148
[ 12.749016] t11= 0000000000000000 pv = fffffc00011d4a40 at =
5f19e10505e118bf
[ 12.835930] gp = fffffc0002871148 sp = 00000000440a695e
[ 12.899407] Disabling lock debugging due to kernel taint
[ 12.962883] Trace:
[ 12.987298] [<fffffc00011155d8>] cgroup_migrate_execute+0x338/0x600
[ 13.062493] [<fffffc0001115da8>] cgroup_update_dfl_csses+0x2c8/0x330
[ 13.138665] [<fffffc000111867c>] cgroup_subtree_control_write+0x56c/0x5e0
[ 13.219719] [<fffffc000110dc24>] cgroup_file_write+0xa4/0x1a0
[ 13.288079] [<fffffc0001379cd4>] kernfs_fop_write_iter+0x1a4/0x330
[ 13.362297] [<fffffc00012a06c0>] vfs_write+0x250/0x4c0
[ 13.423821] [<fffffc00012a0b1c>] ksys_write+0x8c/0x140
[ 13.485344] [<fffffc000101158c>] entSys+0xac/0xc0
[ 13.541985]
[ 13.559563] Code:
[ 13.559563] fffffc00
[ 13.582024] 00000000
[ 13.610344] 00000000
[ 13.638664] 05163bfc
[ 13.666985] fffffc00
[ 13.695305] 02871148
[ 13.723625] <fffffc00>
[ 13.751946] 00000000
[ 13.779289]
```

[1]: https://lists.debian.org/debian-alpha/2023/05/msg00007.html
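
The `symbol+offset/size` entries in the trace above can be resolved to
source lines with `scripts/faddr2line` from the kernel source tree, given
a `vmlinux` with debug info for exactly this build. A minimal sketch for
pulling those entries out of a saved console log follows; the function
name `trace_symbols` and the file names in the usage note are made up for
illustration.

```shell
# Print every "symbol+0xoff/0xsize" entry found in kernel trace lines such as
#   [ 12.987298] [<fffffc00011155d8>] cgroup_migrate_execute+0x338/0x600
# Reads the files given as arguments, or stdin if there are none.
# Lines without a symbol after the bracketed address (e.g. the pc/ra line)
# are skipped.
trace_symbols() {
    sed -n 's|.*\[<[0-9a-f]*>\] \([A-Za-z0-9_.]*+0x[0-9a-f]*/0x[0-9a-f]*\).*|\1|p' "$@"
}
```

For example, `trace_symbols console.log | xargs ./scripts/faddr2line vmlinux`,
assuming the log was saved as `console.log` and a matching `vmlinux` is at
hand.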

Testing a few other combinations shows that this already happens with
Linux 6.0.7 and with systemd 251.6-1 and 250.4-1.

When using sysvinit, the system boots fine and stays stable across a few
runs of `7z b` and `openssl speed -elapsed`.

It also does not happen when using Linux 5.3.0-3 from Debian with the
same systemd versions on the same machine.

****

Michael provided a first analysis in [2]; Adrian locates the problem in
the cgroup code.

[2]: https://lists.debian.org/debian-alpha/2023/05/msg00010.html

****

Maybe someone on linux-alpha has an idea what could be the reason?

Cheers,
Frank

John Paul Adrian Glaubitz

May 22, 2023, 6:00:04 AM
Hello Frank!

On Mon, 2023-05-22 at 11:34 +0200, Frank Scheiner wrote:
> Maybe someone on linux-alpha has an idea what could be the reason?

Try reproducing it with libcgroup to see if it's a systemd or a kernel bug:

> https://wiki.archlinux.org/title/cgroups#Examples

Adrian

--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

Frank Scheiner

Jun 19, 2023, 8:20:03 AM
Hi,

Let me add some additional data points:

After some testing on different machines and with different kernel types,
it looks like this problem is exclusive to MP kernels. It also occurs
when running an MP kernel on a single-processor machine (tested on an
AlphaServer 800 5/400 w/EV56).

Running an SP kernel does not trigger that problem.

I posted a diff between the -alpha-generic and -alpha-smp kernel
configurations at [1].

[1]: https://pastebin.com/AwZQjHD9

On 22.05.23 11:37, John Paul Adrian Glaubitz wrote:
> Hello Frank!
>
> On Mon, 2023-05-22 at 11:34 +0200, Frank Scheiner wrote:
>> Maybe someone on linux-alpha has an idea what could be the reason?
>
> Try reproducing it with libcgroup to see if it's a systemd or a kernel bug:
>
>> https://wiki.archlinux.org/title/cgroups#Examples

Took me a while to get back to this and actually get it working...

Following various examples and man pages (e.g. [2] and [3]), I did the
following to test cgroup functionality with System V init installed and
running instead of systemd:

```
root@ds25:~# uname -a
Linux ds25 6.3.0-1-alpha-smp #1 SMP Debian 6.3.7-1 (2023-06-12) alpha
GNU/Linux

root@ds25:~# mount
[...]
cgroup on /sys/fs/cgroup type tmpfs (rw,relatime,mode=755,inode64)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu)
[...]
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,relatime,rdma)
cgroup on /sys/fs/cgroup/misc type cgroup (rw,relatime,misc)

root@ds25:~# CGROUP=/sys/fs/cgroup

root@ds25:~# mkdir $CGROUP/red
root@ds25:~# mount -t cgroup -o cpuset red $CGROUP/red
root@ds25:~# mkdir -p $CGROUP/red/shells/bash
root@ds25:~# chown root:root $CGROUP/red/shells/bash/*
root@ds25:~# id johndoe
uid=1001(johndoe) gid=1001(johndoe) groups=1001(johndoe),100(users)
root@ds25:~# chown root:johndoe $CGROUP/red/shells/bash/tasks
root@ds25:~# echo $(cgget -n -v -r cpuset.mems /) > $CGROUP/red/shells/cpuset.mems
root@ds25:~# echo $(cgget -n -v -r cpuset.cpus /) > $CGROUP/red/shells/cpuset.cpus
root@ds25:~# echo 0 > $CGROUP/red/shells/bash/cpuset.mems
root@ds25:~# echo 0 > $CGROUP/red/shells/bash/cpuset.cpus

root@ds25:~# cat /proc/self/cgroup
13:misc:/
12:rdma:/
11:pids:/
10:net_prio:/
9:perf_event:/
8:net_cls:/
7:freezer:/
6:devices:/
5:memory:/
4:blkio:/
3:cpuacct:/
2:cpu:/
1:cpuset:/

root@ds25:~# echo $$
1496

root@ds25:~# cgexec -g cpuset:shells/bash bash

root@ds25:~# echo $$
1695

root@ds25:~# cat /proc/self/cgroup
13:misc:/
[...]
2:cpu:/
1:cpuset:/shells/bash
```

[2]: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/ch-using_control_groups

[3]: https://wiki.archlinux.org/title/cgroups#Examples

I then ran `7za b` in that shell, and although `7za` starts two threads
on the assumption that it has access to both CPUs, `htop` showed both of
them running on the first processor only. So it looks like at least this
part of the cgroup functionality is working with Linux 6.3.0-1 from
Debian when using System V init.
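
The `htop` observation can be cross-checked without a load test: the
kernel reports each task's allowed CPUs in the `Cpus_allowed_list` field
of `/proc/<pid>/status`. A small sketch, where the helper name
`cpus_allowed` and its optional second argument (an alternative proc
root, only there to make the helper testable) are assumptions for
illustration:

```shell
# Print the CPU list a task is allowed to run on.
#   $1: PID to inspect (defaults to the calling shell)
#   $2: proc root (defaults to /proc; override only for testing)
cpus_allowed() (
    pid="${1:-$$}"
    proc_root="${2:-/proc}"
    awk '/^Cpus_allowed_list:/ { print $2 }' "$proc_root/$pid/status"
)
```

Inside the shell started by `cgexec` above, `cpus_allowed` should print
`0` if the cpuset restriction is in effect; in the original login shell
it should list both CPUs.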

So it could be that this problem is only triggered by one or more
specific controllers. But I don't know exactly how to determine which
controllers are used for the target "graphical.target", which is where
the fault seems to happen (see [4] for more details):

```
[...]
[ 11.864251] systemd[1]: Queued start job for default target
graphical.target.
[ 11.958978] CPU 1
[ 11.958978] systemd(1): Instruction fault 4
[...]
```

[4]: https://lists.debian.org/debian-alpha/2023/05/msg00012.html
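
One possible gap in the test above: all hierarchies in the `mount` output
are of type `cgroup` (v1), while systemd reports
`default-hierarchy=unified` and the trace goes through
`cgroup_subtree_control_write`, which handles writes to the cgroup v2 file
`cgroup.subtree_control`. The following is only a sketch of how that path
might be exercised without systemd; the helper names `cg2_show` and
`cg2_enable` and the mount point in the usage note are hypothetical, and
on an affected MP kernel the write may well trigger the fault itself.

```shell
# Show which controllers a cgroup v2 directory offers and which of them
# are enabled for its children.
#   $1: cgroup v2 directory (defaults to /sys/fs/cgroup)
cg2_show() (
    dir="${1:-/sys/fs/cgroup}"
    printf 'available: %s\n' "$(cat "$dir/cgroup.controllers")"
    printf 'delegated: %s\n' "$(cat "$dir/cgroup.subtree_control")"
)

# Enable the named controllers for the children of a cgroup v2 directory.
# One write per controller, announced on stderr first, so that a fault can
# be attributed to the controller that was being enabled.
#   $1: cgroup v2 directory, remaining arguments: controller names
cg2_enable() (
    dir="$1"
    shift
    for ctrl in "$@"; do
        echo "enabling $ctrl" >&2
        echo "+$ctrl" > "$dir/cgroup.subtree_control" || exit 1
    done
)
```

A possible session: `mkdir /mnt/cgroup2 && mount -t cgroup2 none /mnt/cgroup2`,
then `cg2_show /mnt/cgroup2` and `cg2_enable /mnt/cgroup2 pids`. A
controller that is still attached to a v1 hierarchy does not appear in
`cgroup.controllers`, so the corresponding v1 mounts have to go first. On
a kernel that boots with systemd (the SP kernel, for instance),
`cg2_show` on `/sys/fs/cgroup` would show what systemd has enabled at the
root.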

Cheers,
Frank