Bug#994784: mpi4py breaks gyoto autopkgtest on i386: 1 process returned a non-zero exit code


Paul Gevers

Sep 20, 2021, 4:30:04 PM

Source: mpi4py, gyoto
Control: found -1 mpi4py/3.0.3-10
Control: found -1 gyoto/1.4.4-3
Severity: serious
Tags: sid bookworm
X-Debbugs-CC: debi...@lists.debian.org
User: debi...@lists.debian.org
Usertags: breaks needs-update

Dear maintainer(s),

With a recent upload of mpi4py the autopkgtest of gyoto fails in testing
when that autopkgtest is run with the binary packages of mpi4py from
unstable. It passes when run with only packages from testing. In tabular
form:

                     pass            fail
mpi4py               from testing    3.0.3-10
gyoto                from testing    1.4.4-3
versioned deps [0]   from testing    from unstable
all others           from testing    from testing

I copied some of the output at the bottom of this report.

Currently this regression is blocking the migration of mpi4py to testing
[1]. Due to the nature of this issue, I filed this bug report against
both packages. Can you please investigate the situation and reassign the
bug to the right package?

More information about this bug and the reason for filing it can be found on
https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation

Paul

[0] You can see what packages were added from the second line of the log
file quoted below. The migration software adds source packages from
unstable to the list if they are needed to install packages from
mpi4py/3.0.3-10, i.e. due to versioned dependencies or breaks/conflicts.
[1] https://qa.debian.org/excuses.php?package=mpi4py

https://ci.debian.net/data/autopkgtest/testing/i386/g/gyoto/15316145/log.gz

Reading parameter file:
/tmp/autopkgtest-lxc.8uv_qvhr/downtmp/build.nbg/src/doc/examples/example-startrace.xml
Copyright (c) 2011-2019 Frédéric Vincent, Thibaut Paumard,
Odele Straub and Frédéric Lamy.
GYOTO is distributed under the terms of the GPL v. 3 license.
We request that use of Gyoto in scientific publications be properly
acknowledged. Please cite:
GYOTO: a new general relativistic ray-tracing code,
F. H. Vincent, T. Paumard, E. Gourgoulhon & G. Perrin 2011,
Classical and Quantum Gravity 28, 225011 (2011) [arXiv:1109.4769]

j =    1/32
--------------------------------------------------------------------------
Child job 2 terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun.openmpi noticed that process rank 1 with PID 0 on node
ci-262-d8ad913b exited on signal 9 (Killed).
--------------------------------------------------------------------------


Thibaut Paumard

Sep 23, 2021, 8:00:03 AM

Control: reassign -1 src:openmpi
Control: found -1 openmpi/4.1.1-3
Control: tags -1 +unreproducible +help
Control: retitle -1 openmpi breaks gyoto autopkgtest on i386
Control: thanks

Hi,

I cannot reproduce this.

I have set up a testing-i386 chroot (on my amd64 laptop) and installed
openmpi from unstable (I also tried with mpi4py from unstable on top),
as well as their versioned dependencies not in testing:

libopenmpi-dev 4.1.1-5
libopenmpi3 4.1.1-5
openmpi-common 4.1.1-5
python3-mpi4py 3.0.3-10

I tried with gyoto from unstable (1.4.4-4) and from testing (1.4.4-3).

Note that this seems different from what the debci infrastructure does
(apparently recompiling instead of taking the binaries).

I then ran the test from this chroot (as sbuild user):
autopkgtest -B --test-name=gyoto-mpi -- null

The test PASSED each time.

I note that the test suite can take a long time to run and that the
final line in the failure log for this test reads:

mpirun.openmpi noticed that process rank 1 with PID 0 on node
ci-262-d8ad913b exited on signal 9 (Killed).

This happens after this test has been running for 1h and 55min.

Could it be that the process is killed because of a timeout in the test
environment?
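For what it's worth, "exited on signal 9 (Killed)" is exactly what a timeout (or OOM) kill looks like from the parent's side. A minimal, Gyoto-independent Python sketch of a watchdog that kills a child after a timeout (the 1-second timeout is purely illustrative):

```python
# Minimal sketch: a watchdog kills a child process after a timeout,
# and the parent then sees the child as terminated by signal 9,
# just like in the mpirun message quoted above.
import signal
import subprocess

child = subprocess.Popen(["sleep", "60"])
try:
    # Pretend 1 second is the test bed's timeout.
    child.wait(timeout=1)
except subprocess.TimeoutExpired:
    child.send_signal(signal.SIGKILL)
    child.wait()

# POSIX convention as exposed by Python: returncode is -N when the
# child was terminated by signal N, so SIGKILL shows up as -9.
print(child.returncode)  # -9
```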

On the one hand it would feel strange because, on the debci
infrastructure, the error always happens on the same file
(example-startrace), near the beginning of the process (row 1/32). On
the other hand I know this file is one of the longest to process (still
below a couple of minutes on my laptop).

Anyway, there's not much more I can do, except skip this test.

I'm reassigning to openmpi because, on the debci infrastructure, the
same failure occurs with openmpi/unstable also with mpi4py/testing.

Advice welcome.

Regards, Thibaut.


Paul Gevers

unread,
Sep 23, 2021, 9:50:04 AMSep 23
to
Hi Thibaut,

Thanks for investigating.

On 23-09-2021 13:44, Thibaut Paumard wrote:
> Note that this is apparently different from what the debci
> infrastructure does (apparently recompiling instead of taking the binaries).

We don't recompile unless "build-needed" is in the restrictions. We do
run in lxc instead of a chroot.

> I then ran the test from this chroot (as sbuild user):
> autopkgtest -B --test-name=gyoto-mpi -- null

You can see from the top of the log that we ran with:
--no-built-binaries '--setup-commands=echo '"'"'gyoto testing/i386'"'"'
> /var/tmp/debci.pkg 2>&1 || true' '--setup-commands=echo
'"'"'Acquire::Retries "10";'"'"' > /etc/apt/apt.conf.d/75retry 2>&1 ||
true' --user debci --apt-upgrade '--add-apt-source=deb
http://incoming.debian.org/debian-buildd buildd-unstable main contrib
non-free' --add-apt-release=unstable
--pin-packages=unstable=src:mpi4py,src:openmpi --output-dir
/tmp/tmp.irEl4X24YS/autopkgtest-incoming/testing/i386/g/gyoto/15316145
gyoto -- lxc --sudo --name ci-262-d8ad913b autopkgtest-testing-i386

> I note that the test suite can take a long time to run and that the
> final line in the failure log for this test reads:
>
> mpirun.openmpi noticed that process rank 1 with PID 0 on node
> ci-262-d8ad913b exited on signal 9 (Killed).
>
> This happens after this test has been running for 1h and 55min.
>
> Could it be that the process is killed because of a timeout in the test
> environment?

The autopkgtest timeout is at 2:47, so if anything this is a timeout
inside the test.

> Anyway, there's not much more I can do, except skip this test.

Can we get more logging? I can run something in the testbed if it helps
debugging the issue.

> I'm reassigning to openmpi because, on the debci infrastructure, the
> same failure occurs with openmpi/unstable also with mpi4py/testing.

Ack, I was informed out-of-band that the likely culprit was there.

Paul


Thibaut Paumard

Sep 24, 2021, 6:00:03 AM

Control: reassign -1 src:openmpi src:gyoto

Hi Paul,

I think I've found a workaround and am getting closer to finding the
cause. I've just uploaded a package (gyoto 1.4.4-5) with the workaround.
If you can then check that the test passes fine, I guess we will just
have to let this gyoto migrate together with openmpi.

Gyoto supports two models for running within MPI: one where you specify
how many processes to run with the -np argument of mpirun, and one where
-np is set to 1 and gyoto itself spawns more processes (the singleton
approach).

The test used the singleton approach. If I instead let mpirun itself
spawn the n processes, the test no longer fails.
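To illustrate the two launch styles (the file names are placeholders, and the exact gyoto option selecting the singleton mode is not shown in this thread):

```
# Approach 1: mpirun starts all n worker processes itself.
mpirun -np 4 gyoto input.xml output.fits

# Approach 2 (singleton): mpirun starts a single process, and gyoto
# spawns its own workers internally (e.g. via MPI_Comm_spawn).
mpirun -np 1 gyoto input.xml output.fits
```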

The code path is slightly different within gyoto between the two
approaches, so there could be a bug in gyoto, but it is puzzling that it
only affects one specific input file, only on one architecture, and only
with this new release of openmpi. And it still depends on the
environment: I don't get the failure if I let autopkgtest run the test
in my chroot, but I get it if I run the same commands manually in the
same chroot.

Best regards, Thibaut.

Paul Gevers

Sep 24, 2021, 3:50:03 PM

Control: reassign -1 src:openmpi,src:gyoto
Control: found -1 openmpi/4.1.1-3
Control: found -1 gyoto/1.4.4-4
Hi

On 24-09-2021 11:42, Thibaut Paumard wrote:
> Control: reassign -1 src:openmpi src:gyoto

This assigned the package to version src:gyoto ;)

> I think I've found a workaround and am getting closer to finding the
> cause. I've just uploaded a package (gyoto 1.4.4-5) with the workaround.
> If you can then check that the test passes fine, I guess we will just
> have to let this gyoto migrate together with openmpi.

Is the workaround inside the binary, or only (needed) in the test suite?
In other words, did openmpi *break* gyoto on i386 in some cases? If yes,
ideally openmpi is updated with a versioned Breaks on gyoto against the
right unfixed version. The migration software will then schedule the set
together and the migration will happen if everything's fine.
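For illustration, assuming 1.4.4-5 is the first fixed gyoto (as stated earlier in the thread), such a versioned Breaks in openmpi's debian/control would look roughly like:

```
Package: libopenmpi3
Breaks: gyoto (<< 1.4.4-5)
```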

> The code path is slightly different within gyoto between the two
> approaches so there could be a bug in gyoto, but it is puzzling that it
> only affects one specific input file, only on one architecture, and only
> with this new release of openmpi. And it still depends on the
> environment: I don't get the failure if I let autopkgtest run the test
> in my chroot, but I get it if I run the same commands manually in the
> same chroot.

Ouch.

Paul
