Really enable -fstack-clash-protection on armhf/armel?

Matthias Klose

Nov 23, 2023, 5:20:04 AM
Hi,

it looks like enabling this flag on armel/armhf is a little bit premature.

Apparently it's not completely supported upstream, and might cause
regressions, according to
https://bugzilla.redhat.com/show_bug.cgi?id=1522678

Is that a feature that the Debian ARM32 porters and the security team
really want to support actively, despite the missing upstream support?

In Ubuntu, people tracked down segfaults due to this change in at least
valgrind and gnutls, maybe more.

Thanks, Matthias

Guillem Jover

Nov 23, 2023, 7:40:04 PM
Hi!

On Thu, 2023-11-23 at 10:45:33 +0100, Matthias Klose wrote:
> it looks like enabling this flag on armel/armhf is a little bit premature.
>
> Apparently it's not completely supported upstream, and might cause
> regressions, according to
> https://bugzilla.redhat.com/show_bug.cgi?id=1522678

I note that this bug was closed in 2018-01, so the information therein
might not be the most up-to-date?

> Is that a feature that the Debian ARM32 porters and the security team really
> want to support actively, despite the missing upstream support?

According to https://bugs.debian.org/918914#73 there were no pending
toolchain issues related to this. And I think the security team mostly
deferred to the ports teams.

> In Ubuntu, people tracked down segfaults due to this change in at least
> valgrind and gnutls, maybe more.

If there's some missing support somewhere that might make this a
common thing instead of just affecting a handful of packages that
could simply disable the flags, and the Arm porters consider that
fixing that is not feasible in the short term, I guess it makes
sense to stop emitting the flag for the arm32 arches. In the end
I'd still defer to what the porters prefer, and I can easily revert
that change for arm32 and queue it for a next upload if desired.

Thanks,
Guillem

Emanuele Rocca

Nov 24, 2023, 1:20:04 AM
Hello!

On 2023-11-24 01:34, Guillem Jover wrote:
> According to https://bugs.debian.org/918914#73 there were no pending
> toolchain issues related to this.

That is correct. The GCC maintainers at Arm confirm that
stack-clash-protection is supported on 32 bit too.

In case there are any bugs, which is of course possible, please file
them and add debian-arm@ to X-Debbugs-CC.

So far I'm only aware of an issue with plplot, which turned out to be an
actual bug in the software that stack-clash-protection helped uncover:
https://bugs.debian.org/1055228#24

Matthias Klose

Nov 24, 2023, 5:10:05 AM
On 24.11.23 07:19, Emanuele Rocca wrote:
> Hello!
>
> On 2023-11-24 01:34, Guillem Jover wrote:
>> According to https://bugs.debian.org/918914#73 there were no pending
>> toolchain issues related to this.
>
> That is correct. The GCC maintainers at Arm confirm that
> stack-clash-protection is supported on 32 bit too.

yes, but it's a different implementation, that apparently breaks a few
more things than on the other architectures where it is enabled.
>
> In case there are any bugs, which is of course possible, please file
> them and add debian-arm@ to X-Debbugs-CC.

No, I will not do that. Sorry, but the task of the porters is NOT to
put this kind of work on the shoulders of others, but to do this
analysis themselves. You seem to rely on every other package maintainer
to figure out these issues on their own. Please don't do that.

Debian is the first distro to turn this on on armhf, but didn't do any
checks or test rebuilds before turning this on.

> So far I'm only aware of an issue with plplot, which turned out to be an
> actual bug in the software that stack-clash-protection helped uncover:
> https://bugs.debian.org/1055228#24

I have now filed
https://bugs.launchpad.net/ubuntu/+source/libselinux/+bug/2044506
to collect some information on what Ubuntu apparently hit.

A major problem will be valgrind no longer working, causing issues in the
test suites of other packages.

Also after rebuilding libxml2, libarchive, gnutls28, libselinux without
this flag on armhf, issues go away again. I'm not directly working on
these, so can't give more information.

Matthias

Florian Weimer

Nov 24, 2023, 5:20:04 AM
* Emanuele Rocca:

> Hello!
>
> On 2023-11-24 01:34, Guillem Jover wrote:
>> According to https://bugs.debian.org/918914#73 there were no pending
>> toolchain issues related to this.
>
> That is correct. The GCC maintainers at Arm confirm that
> stack-clash-protection is supported on 32 bit too.

Jeff Law, the original designer of -fstack-clash-protection,
disagrees:

| So to reiterate, this is precisely the kind of problem we avoid by
| having stack-clash specific prologues on the Red Hat Enterprise
| Linux architectures. We didn't do a 32bit ARM implementation and
| instead rely on the limited protections provided by the Ada
| -fstack-check bits.

<https://bugzilla.redhat.com/show_bug.cgi?id=1522678#c1>

And as far as I can see the code has not changed since then.

It's a bit unfortunate that GCC accepts the -fstack-clash-protection
flag even if target support is not really there.

Note that RISC-V has the same problem, but at least Jeff has mid-term
plans to fix that.

Adrien Nader

Nov 24, 2023, 6:30:05 AM
Hi,

Short introduction: I work at Canonical in the Foundations team and made
changes in gnutls which is one of the packages that first
encountered/caused issues which then started blocking various migrations
and changes.

On Fri, Nov 24, 2023, Matthias Klose wrote:
> On 24.11.23 07:19, Emanuele Rocca wrote:
> > Hello!
> >
> > On 2023-11-24 01:34, Guillem Jover wrote:
> > > According to https://bugs.debian.org/918914#73 there were no pending
> > > toolchain issues related to this.
> >
> > That is correct. The GCC maintainers at Arm confirm that
> > stack-clash-protection is supported on 32 bit too.
>
> yes, but it's a different implementation, that apparently breaks a few more
> things than on the other architectures where it is enabled.
> >
> > In case there are any bugs, which is of course possible, please file
> > them and add debian-arm@ to X-Debbugs-CC.
>
> No, I will not do that. Sorry, but the task of the porters is NOT to put
> this kind of work on the shoulders of others, but to do this analysis
> themselves. You seem to rely on every other package maintainer to figure out
> these issues on their own. Please don't do that.
>
> Debian is the first distro to turn this on on armhf, but didn't do any
> checks or test rebuilds before turning this on.
>
> > So far I'm only aware of an issue with plplot, which turned out to be an
> > actual bug in the software that stack-clash-protection helped uncover:
> > https://bugs.debian.org/1055228#24
>
> I filed now
> https://bugs.launchpad.net/ubuntu/+source/libselinux/+bug/2044506
> to collect some information what Ubuntu apparently hit.

Thanks. I put some details on
https://code.launchpad.net/~adrien-n/ubuntu/+source/dpkg/+git/dpkg/+merge/456181
and I'll expand the information on the bug but I need a couple hours
first. I expected the topic to be shorter somehow (it was late in the
day :) ).

> A major problem will be valgrind stopping to work, causing issues in the
> test suites of other packages.
>
> Also after rebuilding libxml2, libarchive, gnutls28, libselinux without this
> flag on armhf, issues go away again. I'm not directly working on these, so
> can't give more information.

I'm not opposed to investigating the issues but the number of failures
we'll get is still unknown, and their source and whether it would
actually be due to the use of valgrind aren't clear. In any case, the
failure under valgrind is 100% unexploitable. I want to look at that
plplot bug in order to understand how this helped find an actual bug
because what I've seen so far doesn't lend itself to quick analysis.

What I'm not convinced of is that packages should be uploaded in that
state. As far as I understand, it's possible to work on the libraries of
a single package at a time and a test rebuild followed by an (emulated)
autopkgtest should be enough; iterating maybe wouldn't be incredibly
fast but still probably much faster than iterating through the archive.
Moreover a local build is probably needed anyway because AFAICT there's
nothing to learn from the current test logs.

I'm not here to tell anyone how to run Debian and it's probably worth noting
that we're still early in the current Debian cycle while we're quite far
into the Ubuntu cycle for an LTS release (plus the holiday season). This
might lead to different solutions but in any case, the change and the
breadth and depth of its consequences were a surprise; this is recent
yet the problematic packages were really quickly piling on.

Reflecting a bit more on this: would the issues raised be always
similar? I mean, if we expect the same kind of issues in most packages
and the same solutions, we should make a guide for maintainers so they
can address this quickly. And if it's likely different every time, we
need to think about the maintainers' time and availability.

--
Adrien

Wookey

Nov 24, 2023, 7:40:05 PM
On 2023-11-24 01:34 +0100, Guillem Jover wrote:
> On Thu, 2023-11-23 at 10:45:33 +0100, Matthias Klose wrote:
> > it looks like enabling this flag on armel/armhf is a little bit premature.

> > In Ubuntu, people tracked down segfaults due to this change in at least
> > valgrind and gnutls, maybe more.
>
> If there's some missing support somewhere that might make this a
> common thing instead of just affecting a handful of packages that
> could simply disable the flags, and the Arm porters consider that
> fixing that is not feasible in the short term, I guess it makes
> sense to stop emitting the flag for the arm32 arches.

Assuming this problem only affects some packages they can have their
build flags adjusted in the short term. dpkg-buildflags makes this straightforward.

And we can investigate and fix in the longer term.

So I don't think we need to turn it off for the whole architecture
unless we find loads of stuff that is broken.
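
A minimal sketch of the per-package opt-out described above, assuming dpkg 1.22.0 or later, where the flag is understood to be controlled by the `stackclash` feature of the `hardening` area in DEB_BUILD_MAINT_OPTIONS. The helper name `maint_options` and the `ARM32_ARCHES` set are made up for illustration; the function only computes the option string and does not invoke dpkg-buildflags.

```python
# Assumption: the feature is spelled "stackclash" inside the "hardening" area,
# and features within one area are comma-separated (e.g. "hardening=+all,-pie").
ARM32_ARCHES = frozenset({"armhf", "armel"})


def maint_options(host_arch, existing=""):
    """Return a DEB_BUILD_MAINT_OPTIONS value with stackclash off on 32-bit Arm.

    On any other architecture the existing value is returned unchanged.
    """
    if host_arch not in ARM32_ARCHES:
        return existing

    parts = existing.split()
    for i, part in enumerate(parts):
        area, _, spec = part.partition("=")
        if area == "hardening":
            # Merge into the existing hardening entry instead of adding a
            # second one, dropping any explicit "+stackclash" request.
            features = [f for f in spec.split(",") if f and f != "+stackclash"]
            if "-stackclash" not in features:
                features.append("-stackclash")
            parts[i] = "hardening=" + ",".join(features)
            break
    else:
        # No hardening entry yet: add one that only disables stackclash.
        parts.append("hardening=-stackclash")
    return " ".join(parts)
```

In a real package the resulting string would normally be exported from debian/rules, conditional on the host architecture, rather than computed in Python; the sketch only shows the shape of the value.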

Are there any bug reports on how to reproduce the issues?

I just tried building gnutls28 both with and without
-fstack-clash-protection.
It passes one more test with -fstack-clash-protection enabled: dtls/dtls-resume.sh

          -fstack-clash-protection
          enabled   disabled
TOTAL:      501       501
PASS:       461       460
SKIP:        20        20
XFAIL:        0         0
FAIL:        20        21
XPASS:        0         0
ERROR:        0         0

So that's worthy of investigation, but suggests there is a problem
here which stack-clash-protection isn't making worse.

Some additional info from Richard Earnshaw:
---
Note that for valgrind, I suspect the problem is that it has not been
updated for the following, relatively recent, relaxation in the AAPCS:

6.2.1.3 Stack probing
In order to ensure stack integrity a process may emit stack probes
immediately prior to allocating additional stack space (moving SP from
SP_old to SP_new). Stack probes must be in the region of [SP_new, SP_old
- 1] and may be either read or write operations. The minimum interval
for stack probing is defined by the target platform but must be a
minimum of 4KBytes. No recoverable data can be saved below the currently
allocated stack region.

Prior to this addition (2018Q4) all accesses below SP were forbidden,
and I think that's what valgrind still implements.
---
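
To make the difference between the two rules concrete, here is a small model of them, written as a sketch rather than as a description of how valgrind or GCC actually work. The function names, the example addresses and the "one probe per interval" strategy in `probe_addresses` are assumptions for illustration; only the [SP_new, SP_old - 1] window and the 4 KiB minimum come from the AAPCS text quoted above.

```python
MIN_PROBE_INTERVAL = 4 * 1024  # AAPCS minimum; a platform may define a larger one


def access_allowed_old_rule(addr, sp):
    """Pre-2018Q4 rule: any access below the current SP is forbidden."""
    return addr >= sp


def probe_allowed(addr, sp_old, sp_new):
    """Relaxed rule: a probe may touch [sp_new, sp_old - 1] before SP moves."""
    return sp_new <= addr <= sp_old - 1


def probe_addresses(sp_old, sp_new, interval=MIN_PROBE_INTERVAL):
    """Yield one probe address per interval while the stack grows downwards.

    Hypothetical probing strategy, chosen only to produce example addresses.
    """
    addr = sp_old - interval
    while addr >= sp_new:
        yield addr
        addr -= interval


# Growing the stack by 12 KiB from 0x10000 down to 0xD000.
probes = list(probe_addresses(0x10000, 0xD000))
for addr in probes:
    # Every probe is legal under the relaxed rule, yet lies below the old SP,
    # so a checker that still applies the old rule would report it.
    print(hex(addr), probe_allowed(addr, 0x10000, 0xD000),
          access_allowed_old_rule(addr, 0x10000))
```

If valgrind does still apply the older rule, as suspected above, each such probe would look like an invalid access below SP even though the compiled code follows the current AAPCS.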

So that does sound like valgrind needs an update for this, and yes it
would have been better if that wasn't a surprise. My initial feeling
is that we should just fix that, rather than reverting at this stage,
but I understand what Adrien says about the Ubuntu cycle being at a
different point and this being a change that is causing trouble there.

Clearly only doing a rebuild for arm64 and assuming armhf would be
fine because the compiler team here said it would be was overly
optimistic. Sorry about that.

I see that this has already been reverted in Ubuntu dpkg, which seems
like the right thing to do there for the time being. For Debian we'll
keep an eye on it, do a belated rebuild to see how much of a problem
we really have, and then decide if we should revert it too until some
stuff is fixed. We should have a better idea of whether to go back or
forward fairly soon. I look forward to some more details on what
actually broke (other than valgrind) soon.

Wookey
--
Principal hats: Debian, Wookware, ARM
http://wookware.org/

Moritz Muehlenhoff

Nov 27, 2023, 12:10:04 PM
On Fri, Nov 24, 2023 at 01:34:21AM +0100, Guillem Jover wrote:
> > Is that a feature that the Debian ARM32 porters and the security team really
> > want to support actively, despite the missing upstream support?
>
> According to https://bugs.debian.org/918914#73 there were no pending
> toolchain issues related to this. And I think the security team mostly
> deferred to the ports teams.

Indeed. From our PoV anything beyond amd64 is fully at the discretion of the
respective porters to decide whether it makes sense or not.

Cheers,
Moritz

Emanuele Rocca

Nov 29, 2023, 3:30:05 PM
Hi Matthias,

On 2023-11-24 10:50, Matthias Klose wrote:
> On 24.11.23 07:19, Emanuele Rocca wrote:
> > In case there are any bugs, which is of course possible, please file
> > them and add debian-arm@ to X-Debbugs-CC.
>
> No, I will not do that. Sorry, but the task of the porters is NOT to put
> this kind of work on the shoulders of others, but to do this analysis
> themselves. You seem to rely on every other package maintainer to figure out
> these issues on their own. Please don't do that.

I'm sorry if that is the impression you got. What I was trying to say
is: if you know that something is broken, please let us know because we
are not aware of any issues.

> Debian is the first distro to turn this on on armhf, but didn't do any
> checks or test rebuilds before turning this on.

I have rebuilt and tested a few key packages myself (clearly not
valgrind, heh).

It's true that we should have done a full archive rebuild, you're right.
I asked for it in August and did not get any reply until October, when
the flag was already enabled (and we had no reports of breakage at that point):
https://lists.debian.org/debian-arm/2023/08/msg00024.html

Now I've asked Lucas to go ahead with an armhf rebuild so that hopefully
we'll get a clearer picture.

One day debusine will make all this smooth and easy. :-)
https://wiki.debian.org/DebianEvents/gb/2023/MiniDebConfCambridge/Zini

> I filed now
> https://bugs.launchpad.net/ubuntu/+source/libselinux/+bug/2044506
> to collect some information what Ubuntu apparently hit.
>
> A major problem will be valgrind stopping to work, causing issues in the
> test suites of other packages.

Thank you.

Emanuele Rocca

Nov 30, 2023, 12:00:06 PM
Hi,

On 2023-11-24 10:50, Matthias Klose wrote:
> A major problem will be valgrind stopping to work, causing issues in the
> test suites of other packages.
>
> Also after rebuilding libxml2, libarchive, gnutls28, libselinux without this
> flag on armhf, issues go away again.

FTR there is no issue in Debian with any of the above in my tests.
Also the packages don't seem to use valgrind at any point: not when
building, not in the autopkgtests.

Full build logs including autopkgtest output here:
https://people.debian.org/~ema/armhf-stack-clash-protection/

What exactly did not work in Ubuntu and how? Perhaps there are
additional jobs running valgrind in CI that may explain the failures?

Thanks,
Emanuele

Mate Kukri

Dec 3, 2023, 8:30:04 AM
Hello,

Another Canonicaler chiming in: I was also involved in debugging this problem in Ubuntu.

I believe the most obvious issues we were having were the gsasl tests indirectly triggered
by gnutls28, and the unrar-free tests triggered by libarchive. Both of these do include
valgrind use.

In addition to the flag being obviously incompatible with valgrind, it also caused issues
with gdb for me, and some segmentation faults in gsasl outside any debugging tools
(although I did not investigate these in much detail).

There are claims from upstream about the implementation on 32-bit Arm being questionable,
and no other distros seem to ship it. I believe enabling this before more upstream work
to fix these issues would be unwise. Breaking valgrind and gdb is already problematic
enough by itself, let alone any previously unknown issues discovered on entering uncharted
waters.

Mate Kukri

Julian Andres Klode

Jan 8, 2024, 5:50:04 AM
It's 1.5 months later, valgrind is still failing, and apt therefore
segfaults under valgrind. I am disabling the apt valgrind test on armhf in 2.7.8,
but this situation is somewhat untenable.

I have now cloned the bug to
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1060251

--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer i speak de, en

Emanuele Rocca

Feb 14, 2024, 3:40:04 AM
Hi,

On 2023-11-25 12:37, Wookey wrote:
> For debian we'll keep an eye on it, do a belated rebuild to see how
> much of a problem we really have, and then decide if we should revert
> it too until some stuff is fixed.

I now finally have some data to share. In total, out of the whole Debian
archive, 4 packages fail to build because of stackclash on armhf and 2
on armel. Additionally, 5 packages have failing autopkgtests.

The main issue really is the open valgrind bug on armhf when checking
programs built with stack-clash-protection:
https://bugs.debian.org/1061496
No problem on armel, given that valgrind is not supported at all there.

The procedure I followed to get the FTBFS data was to start from the
list of build failures kindly gathered by Lucas with his archive rebuild
last month (see http://qa-logs.debian.net/2024/01/11/). I rebuilt all
packages that failed, and it turned out that most had failed due to
transient issues at the time. Then, starting from the list of my failed
rebuilds, I performed another build, this time without stackclash.
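
The triage procedure just described can be sketched roughly as follows. This is not the tooling that was actually used; `triage` and the `build(package, stackclash=...)` callable are hypothetical names, with `build` standing in for whatever performs a real package build and reports success.

```python
def triage(archive_failures, build):
    """Classify packages that failed in the archive rebuild.

    `build(package, stackclash=bool)` is a hypothetical callable that returns
    True when the package builds successfully with the flag on or off.
    """
    transient, stackclash_related, unrelated = [], [], []
    for package in archive_failures:
        if build(package, stackclash=True):
            # Builds fine on retry: the original failure was transient.
            transient.append(package)
        elif build(package, stackclash=False):
            # Fails with the flag, builds without it: attributable to stackclash.
            stackclash_related.append(package)
        else:
            # Fails either way: some other cause.
            unrelated.append(package)
    return transient, stackclash_related, unrelated
```

Only the middle group counts towards the stackclash FTBFS numbers; the placeholder package names used to exercise the function would be replaced by the real failure list.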

Note that of the 4 armhf FTBFS, 2 are due to the fact that the build
process uses valgrind (#1061496). Additionally, the valgrind issue
caused autopkgtest failures in 5 packages: apt, libgd2, libgssglue,
libvorbis, and sndfile-tools.

The workaround I've been suggesting for the FTBFS is to disable
stackclash on armhf or armel for the few packages that fail to build.
For packages using valgrind in autopkgtest, I've been suggesting either
skipping the tests that fail or disabling stackclash, on armhf only of
course.

For all of the above, I have filed bugs with the usertag 32bit-stackclash:
https://bugs.debian.org/cgi-bin/pkgreport.cgi?users=debia...@lists.debian.org;tag=32bit-stackclash