armhf SIGILL, Illegal Instruction


Ash Hughes

Sep 29, 2021, 4:10:03 PM
Hi,

I've been getting some programs terminated with SIGILL today, and I'm
trying to find out if this is a package issue or if Debian (Bullseye) is
no longer compatible with my ARM machine. I first got an error with
onedrive, with gdb output:

#0  0xb6948ca8 in gc.impl.conservative.gc.Gcx.fullcollect(bool) ()
   from /usr/lib/arm-linux-gnueabihf/libdruntime-ldc-shared.so.94

which is "vldr    d18, [pc, #216]".

I then tried to run ldc2, and I got something similar:

Core was generated by `ldc2 -c --output-o -conf= -w -mattr=-neon -O3
-release -relocation-model=pic -d'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x0089e15c in
dmd.parse.Parser!(dmd.astcodegen.ASTCodegen).Parser.parsePrimaryExp() ()

which is also a vldr instruction ("vldr    d16, [r6, #80]  ; 0x50")

Finally, I compiled ldc2 myself, and running it I got:

#0  0xb4a6eabc in ?? () from /usr/lib/arm-linux-gnueabihf/libLLVM-11.so.1

also vldr ("vldr        d16, [sp, #8]")

It looks like the vldr instruction is being used in several LLVM
packages, in a way my CPU doesn't like. Here's my cpuinfo:

processor       : 0
model name      : ARMv7 Processor rev 1 (v7l)
BogoMIPS        : 37.39
Features        : half thumb fastmult vfp edsp thumbee vfpv3 vfpv3d16 tls idivt
CPU implementer : 0x56
CPU architecture: 7
CPU variant     : 0x1
CPU part        : 0x581
CPU revision    : 1

Hardware        : Marvell Armada 370/XP (Device Tree)
Revision        : 0000
Serial          : 0000000000000000

I don't have neon, although I think armhf doesn't require it, unless
this has changed for Bullseye? If neon isn't required for Debian armhf,
does this mean some LLVM related packages could be built differently to
improve compatibility?
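
A minimal sketch of how the Features line above could be checked programmatically. The function names and the summary strings are made up for illustration; the flag names "neon", "vfpd32" and "vfpv3d16" are the ones the ARM Linux kernel reports in /proc/cpuinfo. The sample text is the cpuinfo from this message.

```python
# Sketch: classify the FPU register file from /proc/cpuinfo text.
# parse_features() and fpu_summary() are hypothetical helper names.

SAMPLE_CPUINFO = """\
processor       : 0
model name      : ARMv7 Processor rev 1 (v7l)
Features        : half thumb fastmult vfp edsp thumbee vfpv3 vfpv3d16 tls idivt
CPU implementer : 0x56
"""


def parse_features(cpuinfo_text):
    """Return the set of flags on the first 'Features' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()


def fpu_summary(features):
    """Describe which double-precision registers the flags imply."""
    if "neon" in features or "vfpd32" in features:
        return "d0-d31 available"
    if "vfpv3d16" in features:
        return "only d0-d15 available"
    return "unknown"


print(fpu_summary(parse_features(SAMPLE_CPUINFO)))
```

On a real machine the same functions could be fed the contents of /proc/cpuinfo instead of the sample string.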

Thanks,

Ash

Jeffrey Walton

Sep 29, 2021, 4:30:02 PM
I think John Paul Adrian Glaubitz (with the help of others) on the
PowerPC mailing list determined that Autotools is the problem. Autotools
is using an M4 macro that is selecting the wrong platform or features.
It is new behavior.

Also see Bug #995223: libffi: SIGILL on powerpc and ppc64 systems
since libffi8, https://lists.debian.org/debian-powerpc/2021/09/msg00051.html.
In particular, from a followup at
https://lists.debian.org/debian-powerpc/2021/09/msg00077.html:

<QUOTE>
It turns out that m4/ax_gcc_archflag.m4 contains code to detect the
baseline of the host system and sets the GCC architecture accordingly.

Thus, a libffi compiled on a POWER8 machine will not work on a POWER5
machine as the compiler is emitting POWER8 instructions in this case.

Since the m4 script contains such a host environment detection for aarch64
as well [1], this bug can potentially affect arm64 which is a release
architecture.

We should therefore pass "--enable-portable-binary" in debian/rules.

[1] https://github.com/libffi/libffi/blob/master/m4/ax_gcc_archflag.m4#L209
</QUOTE>

This is also of interest:
https://lists.debian.org/debian-powerpc/2021/09/msg00048.html
There's a lot of back-and-forth, but it is where the problem is revealed.

I could be mistaken, so take it with a grain of salt.

Jeff

John Paul Adrian Glaubitz

Sep 29, 2021, 5:00:02 PM
Hi Jeffrey!

On 9/29/21 22:28, Jeffrey Walton wrote:
> I think John Paul Adrian Glaubitz (with the help of others) on the
> PowerPC mailing list determined that Autotools is the problem. Autotools
> is using an M4 macro that is selecting the wrong platform or features.
> It is new behavior.
>
> Also see Bug #995223: libffi: SIGILL on powerpc and ppc64 systems
> since libffi8, https://lists.debian.org/debian-powerpc/2021/09/msg00051.html.
> In particular, from a followup at
> https://lists.debian.org/debian-powerpc/2021/09/msg00077.html:

It looks like a different bug, as the SIGILL faults that Ash is seeing are not
occurring inside libffi.so.8. I think it's more likely an issue with LLVM in
this case, as can be seen from the backtrace.

But I would have to look into the details to figure out who the culprit is.

Adrian

--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer - glau...@debian.org
`. `' Freie Universitaet Berlin - glau...@physik.fu-berlin.de
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

peter green

Sep 29, 2021, 5:10:04 PM
As I understand it, there are two variants of "VFPv3": one with 32 double registers (d0 to d31) and one with only 16 double registers (d0 to d15).
GCC refers to the former as "vfpv3" and to the latter as "vfpv3-d16".

Debian is supposed to support vfpv3-d16, but because there is relatively little hardware out there that lacks the extra registers, bugs may take a while
to get noticed.

So IMO this is a bug in the compiler that is generating that code. What I'm not so sure about is whether selecting the correct compilation settings is the
responsibility of the frontend (ldc) or the backend (llvm).
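
To make the register-count point concrete, here is a rough sketch that checks whether a disassembled instruction names a double register above d15. The helper names are invented for illustration, and the three instructions are the faulting ones Ash quoted. It only looks at dN operands (it ignores qN registers, which alias pairs of d registers), so treat it as a quick sanity check rather than a real decoder.

```python
import re

# Matches operands such as d0, d16, d31 (not the "d" inside "vldr").
D_REG = re.compile(r"\bd(\d+)\b")


def max_d_register(instruction):
    """Return the highest dN index named in the instruction, or None."""
    indexes = [int(n) for n in D_REG.findall(instruction)]
    return max(indexes) if indexes else None


def fits_d16_fpu(instruction):
    """True if no double register above d15 is named."""
    highest = max_d_register(instruction)
    return highest is None or highest <= 15


# The three faulting instructions reported earlier in the thread.
FAULTING = [
    "vldr    d18, [pc, #216]",
    "vldr    d16, [r6, #80]",
    "vldr    d16, [sp, #8]",
]

for insn in FAULTING:
    print(insn, "->", "ok on d16" if fits_d16_fpu(insn) else "needs d16-d31")
```

All three reported instructions name d16 or d18, which is consistent with code being generated for the 32-register variant, although it does not by itself show which component chose that target.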

Jeffrey Walton

Sep 29, 2021, 6:50:03 PM
On Wed, Sep 29, 2021 at 5:05 PM peter green <plug...@p10link.net> wrote:
>
> As I understand it, there are two variants of "VFPv3": one with 32 double registers (d0 to d31) and one with only 16 double registers (d0 to d15).
> GCC refers to the former as "vfpv3" and to the latter as "vfpv3-d16".
>
> Debian is supposed to support vfpv3-d16, but because there is relatively little hardware out there that lacks the extra registers, bugs may take a while
> to get noticed.
>
> So IMO this is a bug in the compiler that is generating that code. What I'm not so sure about is whether selecting the correct compilation settings is the
> responsibility of the frontend (ldc) or the backend (llvm).

Shouldn't that show up in the build logs? You should see something like
'gcc -march=armv7-a -mfpu=vfpv3-d16 ...'. Also see
https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html

I'm used to building with -mfpu=neon, so I'm not too familiar with an
FPU that does not do NEON. But I seem to recall we needed something
similar for early Android devices.

(I have also never used ldc, so my [limited] knowledge may well be out of date.)

Jeff

peter green

Sep 29, 2021, 8:20:03 PM
On 29/09/2021 23:39, Jeffrey Walton wrote:
> On Wed, Sep 29, 2021 at 5:05 PM peter green <plug...@p10link.net> wrote:
>>
>> As I understand it, there are two variants of "VFPv3": one with 32 double registers (d0 to d31) and one with only 16 double registers (d0 to d15).
>> GCC refers to the former as "vfpv3" and to the latter as "vfpv3-d16".
>>
>> Debian is supposed to support vfpv3-d16, but because there is relatively little hardware out there that lacks the extra registers, bugs may take a while
>> to get noticed.
>>
>> So IMO this is a bug in the compiler that is generating that code. What I'm not so sure about is whether selecting the correct compilation settings is the
>> responsibility of the frontend (ldc) or the backend (llvm).
>
> Shouldn't that show up in the build logs?

It will only show up in build logs if the build process is overriding the built-in defaults of the compiler.

Normal practice in Debian is that when invoked without specific architecture flags compilers should generate
code that will run on the baseline CPU of the port. If they don't then that is a bug in the compiler.
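
As a rough illustration of that point, the sketch below scans a build log for explicit target flags. An empty result means the build did not override anything and the compiler's built-in defaults applied. The function name is hypothetical and the first sample line is an invented log entry; the second is the start of the ldc2 command line from Ash's core dump. Only GCC-style -march/-mfpu/-mfloat-abi and the LLVM-style -mattr are looked for.

```python
import re

# A flag must start a whitespace-separated token, e.g. " -mfpu=vfpv3-d16".
FLAG = re.compile(r"(?<!\S)-(march|mfpu|mfloat-abi|mattr)=(\S+)")


def target_flags(build_log):
    """Collect the last value seen for each target flag in the log text."""
    found = {}
    for name, value in FLAG.findall(build_log):
        found[name] = value
    return found


# Hypothetical log line with explicit overrides.
explicit = "gcc -march=armv7-a -mfpu=vfpv3-d16 -O2 -c foo.c"
# Command line taken from the core dump earlier in the thread.
ldc = "ldc2 -c --output-o -conf= -w -mattr=-neon -O3 -release"
# Hypothetical log line with no overrides: defaults apply.
plain = "gcc -O2 -c foo.c"

print(target_flags(explicit))
print(target_flags(ldc))
print(target_flags(plain))
```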