
Bug#1006788: bagel: autopkgtest failure with new mpich.


peter green

Mar 4, 2022, 6:40:02 PM
Package: bagel
Version: 1.2.2-3
Severity: serious
x-debbugs-cc: mp...@packages.debian.org

bagel's autopkgtest is failing on amd64 with mpich 4.0.1-1, hence
blocking its migration to testing and, in turn, the finalisation
of the slurm-wlm transition.

https://ci.debian.net/data/autopkgtest/testing/amd64/b/bagel/19726694/log.gz

> running test case 'he3_svp_asd-dmrg'... FAILED.

Michael Banck

Apr 3, 2022, 8:40:02 AM
For the record, the error in the .out file is:

| ERROR: EXCEPTION RAISED: dsyev/pdsyevd failed in Matrix


Michael

Debian Bug Tracking System

Aug 17, 2022, 4:30:03 PM
Processing control commands:

> severity -1 serious
Bug #1006788 [bagel] bagel: autopkgtest failure with new mpich.
Severity set to 'serious' from 'important'
> retitle -1 autopkgtest fails on hosts with lots of RAM/cores
Bug #1006788 [bagel] bagel: autopkgtest failure with new mpich.
Changed Bug title to 'autopkgtest fails on hosts with lots of RAM/cores' from 'bagel: autopkgtest failure with new mpich.'.

--
1006788: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006788
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems

Graham Inggs

Sep 14, 2022, 8:40:03 AM
It's probably worth trying to set
BAGEL_NUM_THREADS=4
or similar in the autopkgtest.

From: https://nubakery.org/quickstart/how_to_run_bagel.html
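
A minimal sketch of how that could look at the top of a test wrapper. This is an assumption about the shape of the script, not the packaged debian/tests/testsuite.sh; only the BAGEL_NUM_THREADS variable and the value 4 come from this thread. It keeps any value the caller already exported and falls back to 4 otherwise.

```shell
#!/bin/sh
# Hypothetical wrapper fragment: limit BAGEL's thread count for the test run.
# Use the caller's value if one is set, otherwise default to 4.
: "${BAGEL_NUM_THREADS:=4}"
export BAGEL_NUM_THREADS

echo "running tests with BAGEL_NUM_THREADS=${BAGEL_NUM_THREADS}"
# The real test driver would be invoked after this point.
```

Exporting the variable matters because the test driver runs BAGEL as a child process, which only sees exported variables.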

Michael Banck

Nov 27, 2022, 5:00:02 AM
Hi,

On Wed, Aug 17, 2022 at 10:25:38PM +0200, Paul Gevers wrote:
> Control: severity -1 serious
> Control: retitle -1 autopkgtest fails on hosts with lots of RAM/cores
>
> Hi,
>
> On Sun, 3 Apr 2022 19:42:42 +0200 Michael Banck <mba...@debian.org> wrote:
> > Hrm, it seems that test case passed now on the latest upload:
> > https://ci.debian.net/data/autopkgtest/unstable/amd64/b/bagel/20573831/log.gz
> >
> > |Get:14 http://deb.debian.org/debian unstable/main amd64 libmpich12 amd64 4.0.1-1 [4,924 kB]
> > [...]
> > |running test case 'he3_svp_asd-dmrg'... PASSED.
> >
> > So I'm a bit at a loss about what's going on here, perhaps that test
> > case really is just flakey.
>
> Yes, this test looks flaky (I came here because it was blocking glibc). The
> good news is however, it seems related to the host that runs the test. I.e.
> the test fails on our beefy amd64 host (ci-worker13) with 64 cores and 256GB
> RAM, but seems to pass on the others.
>
> The error on s390x is the same by the way (that has 10 cores and 32GB RAM).

I can reproduce this again on my developer (amd64) notebook.

If I downgrade mpich from 4.0.2 to 3.x, it passes fine:

|(unstable-amd64-sbuild)mba@curie:/tmp/autopkgtest.p02Sns/build.Osj/src$ dpkg -l | grep mpich
|ii libmpich12:amd64 3.4.1-5 amd64 Shared libraries for MPICH
|(unstable-amd64-sbuild)mba@curie:/tmp/autopkgtest.p02Sns/build.Osj/src$ ./debian/tests/testsuite.sh
|running test case 'he3_svp_asd-dmrg'... PASSED.
|All tests passed
|(unstable-amd64-sbuild)mba@curie:/tmp/autopkgtest.p02Sns/build.Osj/src$ dpkg -l | grep mpich
|ii libmpich12:amd64 4.0.2-2 amd64 Shared libraries for MPICH
|(unstable-amd64-sbuild)mba@curie:/tmp/autopkgtest.p02Sns/build.Osj/src$ ./debian/tests/testsuite.sh
|running test case 'he3_svp_asd-dmrg'... FAILED.
| * broadcast 0.00
| [... 40 identical '* broadcast 0.00' lines omitted ...]
| * broadcast 0.00
| * dmrg block 0.00
| >> ** .. 0.17
|
| ===== Starting sweeps =====
|
| o convergence threshold: 1.0000e-08
| iter state sweep average sweep range dE average
| ERROR: EXCEPTION RAISED: dsyev/pdsyevd failed in Matrix
|1 tests failed

If I set BAGEL_NUM_THREADS=4 as Graham suggests, it also passes, so I'll
upload that now:

|(unstable-amd64-sbuild)mba@curie:/tmp/autopkgtest.p02Sns/build.Osj/src$ BAGEL_NUM_THREADS=4 ./debian/tests/testsuite.sh
|running test case 'he3_svp_asd-dmrg'... PASSED.
|All tests passed
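
For hosts with fewer than 4 cores, a variant that caps the limit at the available core count might be worth considering. The following is only a sketch: the cap_threads helper is hypothetical and is not what the upload does, which simply fixes the value at 4.

```shell
#!/bin/sh
# cap_threads LIMIT: print the smaller of LIMIT and the number of available
# cores. Hypothetical helper, not part of the bagel packaging.
cap_threads() {
    limit=$1
    # nproc is from coreutils; assume a single core if it is unavailable.
    cores=$(nproc 2>/dev/null || echo 1)
    if [ "$cores" -lt "$limit" ]; then
        echo "$cores"
    else
        echo "$limit"
    fi
}

BAGEL_NUM_THREADS=$(cap_threads 4)
export BAGEL_NUM_THREADS
```

On the 64-core CI worker mentioned above this would still yield 4, so the behaviour there would match the hard-coded limit.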


Michael

Michael Banck

Nov 27, 2022, 5:00:02 AM
Control: tag -1 pending

Hello,

Bug #1006788 in bagel reported by you has been fixed in the
Git repository and is awaiting an upload. You can see the commit
message below and you can check the diff of the fix at:

https://salsa.debian.org/debichem-team/bagel/-/commit/0e7042680a40f677b6e645c36bd82e012c695a13

------------------------------------------------------------------------
* debian/tests/testsuite.sh: Limit testsuite to 4 threads (Closes: #1006788).
------------------------------------------------------------------------

(this message was generated automatically)
--
Greetings

https://bugs.debian.org/1006788

Debian Bug Tracking System

Nov 27, 2022, 5:00:02 AM
Processing control commands:

> tag -1 pending
Bug #1006788 [bagel] autopkgtest fails on hosts with lots of RAM/cores
Added tag(s) pending.

Debian Bug Tracking System

Nov 27, 2022, 5:30:04 AM
Your message dated Sun, 27 Nov 2022 10:19:30 +0000
with message-id <E1ozEl0-...@fasolo.debian.org>
and subject line Bug#1006788: fixed in bagel 1.2.2-5
has caused the Debian Bug report #1006788,
regarding autopkgtest fails on hosts with lots of RAM/cores
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)