
Bug#1003165: scikit-learn in unstable FTBFS on arm64, armel, armhf, i386, ppc64el and s390x


Neil Williams

Jan 5, 2022, 9:20:04 AM
Source: scikit-learn
Version: 0.23.2-5
Severity: serious
Tags: ftbfs
Justification: Fails to build from source
X-Debbugs-Cc: code...@debian.org

The new version of scikit-learn has not migrated to testing because it
has not built on all required architectures. This is now affecting other
packages as the version of scikit-learn in testing is too old to allow
reverse dependencies (e.g. opentsne) to build.

https://buildd.debian.org/status/package.php?p=scikit-learn

This error crops up in the in-build tests:

=================================== FAILURES ===================================
________ [doctest] sklearn.ensemble._weight_boosting.AdaBoostRegressor _________
1004 Examples
1005 --------
1006 >>> from sklearn.ensemble import AdaBoostRegressor
1007 >>> from sklearn.datasets import make_regression
1008 >>> X, y = make_regression(n_features=4, n_informative=2,
1009 ... random_state=0, shuffle=False)
1010 >>> regr = AdaBoostRegressor(random_state=0, n_estimators=100)
1011 >>> regr.fit(X, y)
1012 AdaBoostRegressor(n_estimators=100, random_state=0)
1013 >>> regr.predict([[0, 0, 0, 0]])
Expected:
array([4.7972...])
Got:
array([5.74049295])
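
The mismatch above can be checked outside the build. The sketch below assumes the doctest runs with doctest's ELLIPSIS option (suggested by the `...` in the expected output); the helper name `matches_expected` is made up for illustration, and 4.79722349 is an invented stand-in for a passing result, not a value from any log.

```python
import doctest

# doctest's own comparison logic, reused outside a test run.
_checker = doctest.OutputChecker()


def matches_expected(want: str, got: str) -> bool:
    """Return True if `got` satisfies `want` under the ELLIPSIS option."""
    # doctest compares output including the trailing newline.
    return _checker.check_output(want + "\n", got + "\n", doctest.ELLIPSIS)


expected = "array([4.7972...])"

# Hypothetical value that only differs after the fourth decimal place.
print(matches_expected(expected, "array([4.79722349])"))

# The value reported in the failing build.
print(matches_expected(expected, "array([5.74049295])"))
```

The ellipsis only absorbs trailing digits, so a result that differs in the leading digits, as here, is a real numerical difference rather than a rounding artefact of the comparison.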




-- System Information:
Debian Release: bookworm/sid
APT prefers unstable
APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 5.15.0-2-amd64 (SMP w/16 CPU threads)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_GB:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Paul Gevers

Jan 22, 2022, 9:10:05 AM
Source: scikit-learn
Followup-For: Bug #1003165

Hi,

I have uploaded the attached debdiff as NMU to DELAYED/2. Please let
me know if I should cancel the upload.

Paul
scikit-learn_1.0.1-1.1.debdiff

John Paul Adrian Glaubitz

Feb 16, 2022, 6:00:03 AM
Hello!

On 2/16/22 11:36, Graham Inggs wrote:
> Is anyone able to help with the bus error on armhf please?

Bus errors are normally easy to spot. Just run the code in question through
GDB and see where it crashes. Then look at the backtrace with the debug
symbols installed.
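
For crashes inside a Python extension module, one option alongside GDB is Python's standard `faulthandler` module, which dumps the Python traceback when the interpreter receives SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL. A minimal sketch follows; the helper name `run_with_faulthandler` is an illustrative assumption and does not come from the build logs.

```python
import subprocess
import sys


def run_with_faulthandler(code: str) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child interpreter with faulthandler enabled.

    If the child dies from a fatal signal such as SIGBUS, the Python-level
    traceback is written to its stderr before the process terminates.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Sanity check only: confirm the handler is active in the child.
    result = run_with_faulthandler(
        "import faulthandler; print(faulthandler.is_enabled())"
    )
    print(result.stdout.strip())
```

Passing a snippet that imports and calls the failing code would show which Python frame was active at the crash. That only narrows the search: the C-level backtrace still needs GDB with the debug symbols installed.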

Usually it's a result of bad pointer arithmetic, which should definitely be
fixed, as such operations usually violate the C/C++ standards.

I can have a quick look.

Adrian

--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer - glau...@debian.org
`. `' Freie Universitaet Berlin - glau...@physik.fu-berlin.de
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

John Paul Adrian Glaubitz

Feb 16, 2022, 6:10:04 AM
Hello!

On 2/16/22 11:57, John Paul Adrian Glaubitz wrote:
> On 2/16/22 11:36, Graham Inggs wrote:
>> Is anyone able to help with the bus error on armhf please?
>
> Bus errors are normally easy to spot. Just run the code in question through
> GDB and see where it crashes. Then look at the backtrace with the debug
> symbols installed.
>
> Usually it's a result of bad pointer arithmetic, which should definitely be
> fixed, as such operations usually violate the C/C++ standards.

So, I have skimmed over the build logs and one of the main issues is the use of
-march flags to enforce a certain baseline [1]:

powerpc64le-linux-gnu-gcc: error: unrecognized command-line option ‘-march=native’; did you mean ‘-mcpu=native’?

This is a policy violation and must be fixed in any case. Blacklisting architectures
is not enough in this case as forcing the baseline of the buildds can lead to code
that won't run on the user's machines.

Adrian

> [1] https://buildd.debian.org/status/fetch.php?pkg=scikit-learn&arch=ppc64el&ver=1.0.2-1&stamp=1644956229&raw=0

Christian Kastner

Feb 16, 2022, 7:30:02 AM
Hi,

On 2022-02-16 11:57, John Paul Adrian Glaubitz wrote:
> Hello!
>
> On 2/16/22 11:36, Graham Inggs wrote:
>> Is anyone able to help with the bus error on armhf please?
>
> Bus errors are normally easy to spot. Just run the code in question through
> GDB and see where it crashes. Then look at the backtrace with the debug
> symbols installed.
>
> Usually it's a result of bad pointer arithmetic, which should definitely be
> fixed, as such operations usually violate the C/C++ standards.
>
> I can have a quick look.

one of these errors has been reported in the past, and I already did
some analysis way back then:

https://github.com/scikit-learn/scikit-learn/issues/16443

Check the last comment. The relevant Cython code doesn't look wrong, so
I guess the problem is with the binary result produced during build, as
you point out.

Best,
Christian

Andreas Tille

Feb 17, 2022, 7:50:04 AM
Hi,

On Wed, Feb 16, 2022 at 12:09:23PM +0100, John Paul Adrian Glaubitz wrote:
>
> So, I have skimmed over the build logs and one of the main issues is the use of
> -march flags to enforce a certain baseline [1]:
>
> powerpc64le-linux-gnu-gcc: error: unrecognized command-line option ‘-march=native’; did you mean ‘-mcpu=native’?
>
> This is a policy violation and must be fixed in any case. Blacklisting architectures
> is not enough in this case as forcing the baseline of the buildds can lead to code
> that won't run on the user's machines.

I confirm this is a problem and the critical string can also be found in
the amd64 build log[2]:

...
running build_clib
customize UnixCCompiler
customize UnixCCompiler using build_clib
CCompilerOpt.cc_test_flags[1013] : testing flags (-march=native)
C compiler: x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC

creating /tmp/tmpk22ehoi2/usr
creating /tmp/tmpk22ehoi2/usr/lib
creating /tmp/tmpk22ehoi2/usr/lib/python3
creating /tmp/tmpk22ehoi2/usr/lib/python3/dist-packages
creating /tmp/tmpk22ehoi2/usr/lib/python3/dist-packages/numpy
creating /tmp/tmpk22ehoi2/usr/lib/python3/dist-packages/numpy/distutils
creating /tmp/tmpk22ehoi2/usr/lib/python3/dist-packages/numpy/distutils/checks
compile options: '-c'
extra options: '-march=native'
CCompilerOpt.cc_test_flags[1013] : testing flags (-O3)
C compiler: x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC
...


I admit I'm not sure at what point / what tool might inject this string and
I'm also not sure whether the option -march=native is really used in the
amd64 case.

Kind regards

Andreas.

> > [1] https://buildd.debian.org/status/fetch.php?pkg=scikit-learn&arch=ppc64el&ver=1.0.2-1&stamp=1644956229&raw=0

[2] https://buildd.debian.org/status/fetch.php?pkg=scikit-learn&arch=amd64&ver=1.0.2-1&stamp=1644952702&raw=0

--
http://fam-tille.de

Andreas Tille

Jul 20, 2022, 7:30:03 AM
Hi,

On Sat, Jul 16, 2022 at 05:58:33PM +0200, julien...@gmail.com wrote:
> > [1]:
> > https://buildd.debian.org/status/fetch.php?pkg=scikit-learn&arch=armel&ver=1.1.1-1&stamp=1653343638&raw=0
>
> I would open a bug report to upstream pointing to that log, and if the
> log isn't enough to pinpoint the issue, offer: 
> https://dsa.debian.org/doc/guest-account/

I have had extremely bad experiences with this: upstream went in circles
trying to get the guest account and finally gave up. There was also some
discussion in this bug log [1].

Rather than stalling progress in Debian on many other architectures because
we can't solve this on our own, or burning upstream's patience, I'd
alternatively consider dropping armel as a supported architecture.

Kind regards
Andreas.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1003165#37

--
http://fam-tille.de

Andreas Tille

Jul 27, 2022, 4:10:04 AM
Control: tags -1 unreproducible
Control: tags -1 moreinfo
Control: severity -1 important

Hi,

BTW, there is another bug in scikit-learn, but I can't reproduce it and
have set tags accordingly. Could someone else please give it a try?

Kind regards

Andreas.

On Wed, Jul 20, 2022 at 09:23:28PM +0200, Andreas Tille wrote:
> Hi Nilesh,
>
> On Wed, Jul 20, 2022 at 06:21:19PM +0530, Nilesh Patra wrote:
> > On 7/20/22 4:50 PM, Andreas Tille wrote:
> > > Before we stop progress in Debian for many other architectures since we
> > > can't solve this on our own or otherwise are burning patience of
> > > upstream I'd alternatively consider dropping armel as supported
> > > architecture.
> >
> > I am definitely +1 for this, however scikit-learn is a key package and dropping
> > it from armel would mean dropping several revdeps as well.
> > I am a bit unsure if that is fine or not.
>
> It's not fine at all and I would not be happy about it. However, the other
> side of a key package is that lots of packages have testing removal warnings
> on architectures which are widely used and we have real trouble because of
> this.
>
> Kind regards
>
> Andreas.
>
> --
> http://fam-tille.de
>
>

--
http://fam-tille.de

M. Zhou

Jul 27, 2022, 12:10:03 PM
The previous segfault on armel becomes Bus Error on armel and armhf.
I can build it on Power9, but it seems that the test fails on power8 (our buildd).

Andreas Tille

Jul 27, 2022, 4:40:04 PM
On Wed, Jul 27, 2022 at 08:57:09AM -0700, M. Zhou wrote:
> The previous segfault on armel becomes Bus Error on armel and armhf.

Which does not make it much better as long as no-one is investigating
the issue in a timely manner. So my suggestion to remove arm 32 bit
architectures (at least until the issue is fixed) remains.

> I can build it on Power9, but it seems that the test fails on power8 (our buildd).

Hmmm, can we talk to buildd admins about this?

Kind regards

Andreas.
--
http://fam-tille.de

Graham Inggs

Jul 28, 2022, 3:20:04 AM
Hi

On Wed, 27 Jul 2022 at 17:57, M. Zhou <lu...@debian.org> wrote:
> The previous segfault on armel becomes Bus Error on armel and armhf.
> I can build it on Power9, but it seems that the test fails on power8 (our buildd).

In #1003165, one of the arm porters wrote they are happy to look at
the bus errors, but the baseline issue should be fixed first.

> I have skimmed over the build logs and one of the main issues is the use of
> -march flags to enforce a certain baseline [1]:
>
> powerpc64le-linux-gnu-gcc: error: unrecognized command-line option ‘-march=native’; did you mean ‘-mcpu=native’?

This may be the cause of the test failures on power8.

Regards
Graham

Gard Spreemann

Jul 28, 2022, 4:50:04 AM
Hi,

> I admit I'm not sure at what point / what tool might inject this
> string and I'm also not sure whether the option -march=native is
> really used in the amd64 case.

From my (very limited!) understanding, this is just setuptools(?) trying
out various compiler options. The actual C compiler invocations look
more à la:

x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -g -fwrapv -O2 -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.10/sklearn/ensemble/_hist_gradient_boosting/_bitset.o -Lbuild/temp.linux-x86_64-3.10 -o /<<PKGBUILDDIR>>/.pybuild/cpython3_3.10/build/sklearn/ensemble/_hist_gradient_boosting/_bitset.cpython-310-x86_64-linux-gnu.so -fopenmp

Moreover, the build finishes with:

########### EXT COMPILER OPTIMIZATION ###########
Platform :
Architecture: x64
Compiler : gcc

CPU baseline :
Requested : 'min'
Enabled : SSE SSE2 SSE3
Flags : -msse -msse2 -msse3
Extra checks: none

CPU dispatch :
Requested : 'max -xop -fma4'
Enabled : SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD AVX512_KNL AVX512_KNM AVX512_SKX AVX512_CLX AVX512_CNL AVX512_ICL
Generated : none
CCompilerOpt.cache_flush[809] : write cache to path -> /<<PKGBUILDDIR>>/build/temp.linux-x86_64-3.10/ccompiler_opt_cache_ext.py

########### CLIB COMPILER OPTIMIZATION ###########
Platform :
Architecture: x64
Compiler : gcc

CPU baseline :
Requested : 'min'
Enabled : SSE SSE2 SSE3
Flags : -msse -msse2 -msse3
Extra checks: none

CPU dispatch :
Requested : 'max -xop -fma4'
Enabled : SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD AVX512_KNL AVX512_KNM AVX512_SKX AVX512_CLX AVX512_CNL AVX512_ICL
Generated : none


... and similarly on armel. I don't know the internal magic of these
tools at all, but it seems superficially plausible that the march=native
invocations are just instances of the compiler being probed.
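
One way to check this against a full build log is to sort every line mentioning the flag into probes, compiler errors and real compiler invocations. The sketch below is only an illustration: the probe markers are copied from the log excerpts quoted in this thread, while the function names and the regular expression for "a line that starts with a gcc-like command" are assumptions that may need adjusting for other logs.

```python
import re
from collections import Counter

FLAG = "-march=native"

# Markers seen on flag-probing lines in the quoted amd64 log.
PROBE_MARKERS = ("CCompilerOpt.cc_test_flags", "extra options:")

# Assumption: a real invocation starts with an optionally triplet-prefixed
# gcc/g++/cc command followed by whitespace.
INVOCATION_RE = re.compile(r"^[\w.+-]*(?:gcc|g\+\+|cc)\s")

# Compiler diagnostics look like "<compiler>: error: ...".
ERROR_RE = re.compile(r":\s*error:")


def classify(line):
    """Classify one log line; return None if it does not mention the flag."""
    if FLAG not in line:
        return None
    if any(marker in line for marker in PROBE_MARKERS):
        return "probe"
    if ERROR_RE.search(line):
        return "error"
    if INVOCATION_RE.match(line):
        return "invocation"
    return "other"


def summarize(lines):
    """Count how the flag is used across all lines of a build log."""
    return Counter(kind for kind in map(classify, lines) if kind is not None)
```

If `summarize` reports no "invocation" entries for a log, the flag only ever showed up while the compiler was being probed (or in the resulting error message), which would be consistent with the reading above.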


-- Gard


Andreas Tille

Jul 29, 2022, 12:20:03 AM
On Thu, Jul 28, 2022 at 08:15:45PM -0700, M. Zhou wrote:
> I have a long-term power 9 VM (not QEMU) as testbed.
> I'm trying to investigate the issues for release architectures,
> but this package is too slow to build with QEMU (multiple hours).
> (abel.debian.org is also extremely slow for scikit-learn)
> I've not yet given up, but the build speed means I cannot
> address this issue in timely manner.

I'd just like to repeat my point: if the package is so slow on release
architectures that we cannot manage to fix it, it is in the interest
of our users to stop supporting the problematic architectures in favour of
providing it for the architectures where the package is used in practice.

I have perfectly understood that we will lose several packages on those
architectures and that this is not a good step. But not having those
packages at all is even worse.

Nilesh Patra

Aug 13, 2022, 4:50:03 AM
On 8/13/22 13:34, Andreas Tille wrote:
> The drawback of this solution is that we will not get any warning for
> new *potentially more important* issues since all test failures will be
> ignored now. For me this is outweighed by the advantage that we can
> present upstream a full log of all issues in certain architectures and
> can open according issues. I admit I'm not really enthusiastic that
> upstream will care much about this - but at least we have the logs at
> hand and can do something in case someone wants to invest time into
> this.

Considering long-term maintenance, this does not seem nice, especially
keeping in mind that sklearn is a key package.
I think it is OK to do it _for the moment_ to allow the dust to settle a bit,
and for rm'ed packages to get to their destination once again,
but I'd suggest ``incrementally'' enabling the tests once everything is in place.

I agree that upstream is probably not very enthusiastic about fixing those, but
if we get fixes, we should keep propagating them.

In a nutshell, IMO the sklearn revision that enters bookworm _should_ have tests enabled, without
hacks and the tests that do not pass can be disabled (after all, it does not come from our end)

> I do not plan to close bugs #1003165 and #1008369 but I think it is
> appropriate to reduce its severity to important and thus enable the
> package and its dependencies to migrate to testing (I have not checked
> debci yet).

Sounds good, and thanks for caring for it.

> [1] https://salsa.debian.org/science-team/scikit-learn/-/blob/master/debian/rules#L227


--
Best,
Nilesh

Graham Inggs

Aug 25, 2022, 7:30:03 AM
Hi Adrian

On Wed, 16 Feb 2022 at 13:36, John Paul Adrian Glaubitz
<glau...@physik.fu-berlin.de> wrote:
> On 2/16/22 12:33, Christian Kastner wrote:
> >> Bus errors are normally easy to spot. Just run the code in question through
> >> GDB and see where it crashes. Then look at the backtrace with the debug
> >> symbols installed.
> >>
> >> Usually it's a result of bad pointer arithmetic, which should definitely be
> >> fixed, as such operations usually violate the C/C++ standards.
> >>
> >> I can have a quick look.
> >
> > one of these errors has been reported in the past, and I already did
> > some analysis way back then:
> >
> > https://github.com/scikit-learn/scikit-learn/issues/16443
> >
> > Check the last comment. The relevant Cython code doesn't look wrong, so
> > I guess the problem is with the binary result produced during build, as
> > you point out.
>
> I'm happy to look at this issue but first the baseline issue must be fixed
> as this is a Debian Policy violation.

It was pointed out by Gard Spreemann [1], but I notice now that
debian-arm was not in CC:

> it seems superficially plausible that the march=native
> invocations are just instances of the compiler being probed.

I have also had a look and cannot see that '-march=native' is used in
the actual builds on any of the architectures.

It would be very much appreciated if the arm porters could take a look
at this issue, as it still plagues the scikit-learn autopkgtests on
armhf [2], and currently prevents quite a number of packages from
being part of testing. It appears that armel [3] has the same error,
so hopefully one fix could resolve both.

Regards
Graham


[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1003165#86
[2] https://ci.debian.net/packages/s/scikit-learn/testing/armhf/
[3] https://ci.debian.net/packages/s/scikit-learn/testing/armel/