
Bug#1016937: atop: autopkgtest regression on arm64 and armhf and times out on s390x


Paul Gevers

Aug 12, 2022, 9:10:03 AM
Hi,

[tl;dr: atop seems to hang on s390x]

On 12-08-2022 12:23, Marc Haber wrote:
> On Thu, Aug 11, 2022 at 10:51:32PM +0200, Paul Gevers wrote:
>> On 10-08-2022 12:03, Marc Haber wrote:
>>> Unfortunately, this bug report suffers from multiple cut&paste or
>>> template errors. The ci link points to the mercurial page for amd64, the
>>> text alternates between s390x, armhf, arm64 and amd64.
>>
>> There was only one that I'm aware of, the link to mercurial. But I
>> understand it if the text was a bit confusing.
>
> You said autopkgtest fails on amd64, which was never the case. Maybe
> amd64 and arm64 got confused.

What I *wanted* to convey is that arm64 and amd64 *failures* are in our
RC policy and all other *regressions* are RC too. I did mix that up.

>>> I tried the (dead simple) autopkgtest on the s390x and arm64 porterboxes
>>> and it succeeded in a second's time. I have sharpened the expression
>>> that counts the CPUs in lscpu's output and hope this will fix the issue.
>>
>> ooo, CPU count. Yes, some of those archs run on hosts with lots of CPUs.
>> armhf has 160, s390x has 10.
>
> I am testing locally on amd64 with a machine with 12 CPUs. The armhf
> tests succeed (see
> https://ci.debian.net/data/autopkgtest/testing/armhf/a/atop/24578667/log.gz).

Great, same on arm64. s390x still times out though.

> The complete test is:
> #!/bin/bash
>
> # atop reports number of CPU and two extra lines
> ATOPSOPINION="$(atop -P cpu 5 1 | grep -vE '^(RESET|SEP)' | wc -l)"
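
As a rough illustration of the counting step in the quoted line (not the actual test, of which only the first line is quoted here), the filter can be exercised without atop installed. The `count_cpu_records` helper and the sample records are made up for this sketch, and `grep -c` stands in for `grep ... | wc -l`.

```bash
#!/bin/bash
# Sketch only: count the records that remain once the RESET and SEP
# marker lines are filtered out, using the same pattern as the quoted test.
# count_cpu_records is a hypothetical helper, not part of atop or the test.
count_cpu_records() {
    grep -cvE '^(RESET|SEP)'
}

# Invented stand-in for `atop -P cpu 5 1` output (fields shortened).
sample='RESET
cpu host 0
cpu host 1
cpu host 2
SEP'

records="$(printf '%s\n' "$sample" | count_cpu_records)"
echo "records after filtering: $records"
```

On a real host the function would be fed from `atop -P cpu 5 1` instead of the sample text.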

When I run `atop` manually (on stable), it doesn't do anything...
root@ci-worker-s390x-01:~# atop
^C

I started up a clean unstable lxc container and installing atop takes
quite some time between:
Created symlink /etc/systemd/system/timers.target.wants/atop-rotate.timer -> /lib/systemd/system/atop-rotate.timer.
Created symlink /etc/systemd/system/multi-user.target.wants/atop.service -> /lib/systemd/system/atop.service.
Created symlink /etc/systemd/system/multi-user.target.wants/atopacct.service -> /lib/systemd/system/atopacct.service.
and
Could not execute systemctl: at /usr/bin/deb-systemd-invoke line 145.

running atop from unstable also hangs:
root@elbrus:~# atop
^C

> There is no loop, and nothing that could fail on a big number. In my
> understanding, this could run on a box with 2000 cores and still work.

Except, it doesn't. Seems like atop is seriously broken on s390x on the
hosts that we have.
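
Since the hang currently only shows up as the autopkgtest run timing out, one way to make it visible sooner would be to bound the call with coreutils' `timeout`, which exits with status 124 when the limit is hit. A minimal sketch, where `run_bounded` is a hypothetical helper and `sleep` stands in for the hanging atop:

```bash
#!/bin/bash
# Sketch only: run a command under a time limit and report what happened.
# run_bounded is a hypothetical helper; it relies on GNU coreutils' timeout,
# which returns exit status 124 when the command had to be stopped.
run_bounded() {
    local limit="$1"
    shift
    timeout "$limit" "$@" >/dev/null
    case $? in
        124) echo "timed out after ${limit}s" ;;
        0)   echo "ok" ;;
        *)   echo "failed" ;;
    esac
}

# Stand-ins: "sleep 5" plays the hanging atop, "true" a healthy run.
slow="$(run_bounded 1 sleep 5)"
fast="$(run_bounded 5 true)"
echo "slow: $slow"
echo "fast: $fast"
```

In the test this would be something like `run_bounded 30 atop -P cpu 5 1`; the 30 second limit is an arbitrary choice.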

> Also, the test does not time out on zelenka when manually invoked in a
> schroot (setting PATH to point to an executable atop is necessary, as it
> does not seem to be possible to install an arbitrary package that is not
> in the archive). Also, the test is successful if invoked after installing
> atop 2.7.1-2 from the archive.

Maybe we need to involve the s390x porters? I have put them in CC to
draw their attention already.

Paul

Paul Gevers

Aug 13, 2022, 4:20:04 PM
Hi,

On 13-08-2022 21:34, Marc Haber wrote:
>> running atop from unstable also hangs:
>> root@elbrus:~# atop
>> ^C
>
> On zelenka, running the atop binary just works fine. Installing atop
> 2.7.1-2 in a DD chroot on zelenka also works fine, and the binary is ok
> as well. However, the chroots don't start the services.

Progress.

Now, instead of killing it, I suspended it (^Z) and when I then bring
it back to the foreground, it works as expected.

root@ci-worker-s390x-01:~# atop
^Z
[1]+ Stopped atop
root@ci-worker-s390x-01:~# fg
atop
root@ci-worker-s390x-01:~#


Same with your command in the test:
root@ci-worker-s390x-01:~# atop -P cpu 5 1
^Z
[1]+ Stopped atop -P cpu 5 1
root@ci-worker-s390x-01:~# fg
atop -P cpu 5 1
RESET
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 0 12314475 57940088 197207 116525509 1229493 133423 982033 4278583 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 1 13096470 56792358 204646 118023945 1290960 133142 321874 3737087 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 2 12982530 56925413 209005 117993872 1288573 131703 322564 3746751 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 3 13465982 56697100 208873 117747350 1287548 131114 322660 3739777 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 4 13639265 56795653 213211 117476209 1276394 130964 321365 3747339 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 5 13326756 56460169 202500 118173964 1261805 129906 322232 3723116 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 6 12968736 56176871 207863 118788707 1265701 130806 329336 3732416 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 7 13026985 56068710 211225 118856524 1248204 130943 321583 3736213 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 8 14194105 56997563 204065 116748001 1264309 130682 320834 3740854 0 0 100 0 0
cpu ci-worker-s390x-01 1660421134 2022/08/13 16:05:34 1936023 100 9 13285438 56060337 205755 118583081 1279057 130206 323123 3733407 0 0 100 0 0
SEP

Anybody any clue?

Paul

Marc Haber

Aug 15, 2022, 3:40:03 PM
On Sat, Aug 13, 2022 at 10:08:26PM +0200, Paul Gevers wrote:
> On 13-08-2022 21:34, Marc Haber wrote:
> > > running atop from unstable also hangs:
> > > root@elbrus:~# atop
> > > ^C
> >
> > On zelenka, running the atop binary just works fine. Installing atop
> > 2.7.1-2 in a DD chroot on zelenka also works fine, and the binary is ok
> > as well. However, the chroots don't start the services.
>
> Progress.
>
> Now, instead of killing it, I suspended it (^Z) and when I then bring
> it back to the foreground, it works as expected.

The problem is that installing the package starts atopacct, which takes
a system-wide semaphore and then stalls. atop tries to take the same
semaphore and stalls as well.
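
The failure mode can be mimicked without atop at all. The following sketch uses a lock directory as a stand-in for the semaphore (it is not how atop or atopacct implement it): one party takes the lock and never releases it, so a second party that waits for it stalls. The names `lockdir` and `try_acquire` are invented for the illustration, and the wait is bounded so that the demo terminates.

```bash
#!/bin/bash
# Stand-in for the system-wide semaphore: mkdir is atomic, so whoever
# creates the directory first "holds" it. This is NOT atop's mechanism.
lockdir="$(mktemp -d)/sem"

# Try to take the lock, retrying a limited number of times.
try_acquire() {
    local tries="$1"
    while [ "$tries" -gt 0 ]; do
        mkdir "$lockdir" 2>/dev/null && return 0
        tries=$((tries - 1))
        sleep 0.1
    done
    return 1
}

# First party (playing atopacct) takes the lock and stalls while holding it.
mkdir "$lockdir"

# Second party (playing atop) now cannot get it.
if try_acquire 3; then while_held="acquired"; else while_held="stalled"; fi

# Once the holder lets go, the second party gets through.
rmdir "$lockdir"
if try_acquire 3; then after_release="acquired"; else after_release="stalled"; fi

echo "while held: $while_held, after release: $after_release"

# Clean up the temporary directory used by the demo.
rm -rf "$(dirname "$lockdir")"
```

The real atop waits without such a retry limit, which would match the hang seen on the CI worker.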

I didn't see that at first because I cannot install the built
package in a DD schroot on zelenka.

I filed this upstream, https://github.com/Atoptool/atop/issues/207

Greetings
Marc

--
-----------------------------------------------------------------------------
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany | lose things." Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421