[LLVMdev] GCC/LLVM frame pointer incompatibility on ARM

Yury Gribov

Jul 16, 2014, 1:48:27 AM
to LLVM, Maxim Ostapenko, Konstantin Serebryany, Richard Earnshaw
Hi all,

As has been mentioned several times (*), LLVM and GCC set up the frame
pointer to point to different stack slots on ARM. GCC's fp points to the
stack slot holding lr while LLVM's fp points at the next slot.

Fp incompatibility complicates low-level system code, e.g. stack
unwinders, because it is impossible to robustly determine the location of
the caller's fp.

Is this incompatibility intentional/desired, or could we somehow unify
GCC and LLVM in this regard?

(*) Links to older discussions:
* http://comments.gmane.org/gmane.comp.compilers.llvm.devel/69514
* https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61771

--
Best regards,
Yury Gribov
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Joerg Sonnenberger

Jul 16, 2014, 3:13:38 AM
to llv...@cs.uiuc.edu
On Wed, Jul 16, 2014 at 09:45:05AM +0400, Yury Gribov wrote:
> Fp incompatibility complicates low-level system code e.g. stack
> unwinders because it is impossible to robustly determine location of
> caller's fp.

I don't understand this argument. The ARM EH / DWARF annotation is
supported by LLVM and encodes exactly the data required for robustly
unwinding the stack.

Joerg

Evgeniy Stepanov

Jul 16, 2014, 3:30:45 AM
to LLVM Developers Mailing List
On Wed, Jul 16, 2014 at 11:10 AM, Joerg Sonnenberger
<jo...@britannica.bec.de> wrote:
> On Wed, Jul 16, 2014 at 09:45:05AM +0400, Yury Gribov wrote:
>> Fp incompatibility complicates low-level system code e.g. stack
>> unwinders because it is impossible to robustly determine location of
>> caller's fp.
>
> I don't understand this argument. The ARM EH / DWARF annotation is
> supported by LLVM and encodes exactly the data required for robustly
> unwinding the stack.

Not fast enough for us.

Renato Golin

Jul 16, 2014, 3:44:17 AM
to Evgeniy Stepanov, LLVM Developers Mailing List
On 16 July 2014 08:27, Evgeniy Stepanov <eugeni....@gmail.com> wrote:
>> I don't understand this argument. The ARM EH / DWARF annotation is
>> supported by LLVM and encodes exactly the data required for robustly
>> unwinding the stack.

Plus, relying on a specific compiler's output can never be robust:
even if all known compilers do the same thing, some unknown one will
differ. Following EH directives is the only sure way of getting
things right. Robust or fast, pick one. :)


> Not fast enough for us.

I'm afraid you'll have to cope with different compilers' outputs. Even
if LLVM changes that, there will be others. You could say: "I only
care about GCC and LLVM", but you probably said before "I only care
about GCC", and it has proven problematic.

Another way would be to get all compilers to agree on a style, document
it, and follow it as if it were a "compiler standard". That doesn't
guarantee anything, but it at least provides a well-documented rationale,
with strong arguments, for why implementing A rather than B is optimal,
and might convince other compilers to abide by our decision.

We're going to discuss GCC + LLVM interactions at the GNU
Cauldron this Friday; I might add this topic to the list. I don't
have any particular preference, but people might, so I'd be keen on
hearing the arguments on both sides.

cheers,
--renato

Tim Northover

Jul 16, 2014, 3:47:40 AM
to Yury Gribov, Maxim Ostapenko, Konstantin Serebryany, Richard Earnshaw, LLVM
> As has been mentioned several times (*), LLVM and GCC setup frame pointer to
> point to different stack slots on ARM. GCC's fp points to stack slot holding
> lr while LLVM's fp points at the next slot.

This looks flipped from my tests. Both create an { fp, lr } struct;
GCC sets current fp to the address of lr in that struct; LLVM sets
current fp to the address of fp in that struct.
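
To make the two layouts concrete, here is a minimal C sketch. It assumes the two-word record is stored with the saved fp at the lower address and the saved lr directly above it; the names frame_record, fp_style and record_from_fp are made up for illustration and do not come from either compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the { fp, lr } record: saved fp at the lower address,
   saved lr (the return address) in the word right above it. */
typedef struct {
    uintptr_t saved_fp;
    uintptr_t saved_lr;
} frame_record;

/* Hypothetical tag for which word of the record the current fp points at. */
typedef enum {
    FP_AT_SAVED_FP, /* LLVM on ARM: fp holds the address of the saved fp */
    FP_AT_SAVED_LR  /* GCC on ARM: fp holds the address of the saved lr */
} fp_style;

/* Map a frame pointer value back to the start of its frame record. */
static const frame_record *record_from_fp(const void *fp, fp_style style)
{
    assert(fp != NULL);
    const unsigned char *p = fp;
    if (style == FP_AT_SAVED_LR)
        p -= offsetof(frame_record, saved_lr); /* step back over saved fp */
    return (const frame_record *)p;
}
```

With the first style the caller's fp is the word at fp and the return address is one word above it; with the second the return address is the word at fp and the caller's fp is one word below it. An unwinder that does not know which compiler produced a frame cannot tell which offset to apply.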

> Is this incompatibility intentional/desired or we could somehow unify GCC
> and LLVM in this regard?

What are the chances of getting GCC to change here? It's entirely a
bike-shedding argument, but there are a couple of reasons to prefer
LLVM's choice. It's most consistent with what *is* required in the
AArch64 ABI, and it means fp really points to the frame record, not
some random point half way through it.

Cheers.

Tim.

Renato Golin

Jul 16, 2014, 3:57:52 AM
to Tim Northover, LLVM, Maxim Ostapenko, Richard Earnshaw, Konstantin Serebryany
On 16 July 2014 08:45, Tim Northover <t.p.no...@gmail.com> wrote:
> What are the chances of getting GCC to change here? It's entirely a
> bike-shedding argument, but there are a couple of reasons to prefer
> LLVM's choice. It's most consistent with what *is* required in the
> AArch64 ABI, and it means fp really points to the frame record, not
> some random point half way through it.

I'm not an expert in x86_64 asm, but it seems that both AArch64 and
x86_64 GCC do the same:

x86_64:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp

AArch64:
stp x29, x30, [sp, -32]!
add x29, sp, 0

which would indicate that LLVM's implementation on ARM is the most
consistent. I'm guessing ARM GCC's implementation was not an accident,
but a long forgotten hack... :/

cheers,
--renato

Evgeniy Stepanov

Jul 16, 2014, 4:00:25 AM
to Tim Northover, LLVM, Maxim Ostapenko, Richard Earnshaw, Konstantin Serebryany
On Wed, Jul 16, 2014 at 11:45 AM, Tim Northover <t.p.no...@gmail.com> wrote:
>> As has been mentioned several times (*), LLVM and GCC setup frame pointer to
>> point to different stack slots on ARM. GCC's fp points to stack slot holding
>> lr while LLVM's fp points at the next slot.
>
> This looks flipped from my tests. Both create an { fp, lr } struct;
> GCC sets current fp to the address of lr in that struct; LLVM sets
> current fp to the address of fp in that struct.
>
>> Is this incompatibility intentional/desired or we could somehow unify GCC
>> and LLVM in this regard?
>
> What are the chances of getting GCC to change here? It's entirely a
> bike-shedding argument, but there are a couple of reasons to prefer
> LLVM's choice. It's most consistent with what *is* required in the
> AArch64 ABI, and it means fp really points to the frame record, not
> some random point half way through it.

It is also consistent with x86: we use exactly the same code to unwind
stack on both platforms.
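
For readers unfamiliar with this style of unwinding, the following is a rough sketch of a walker that works whenever fp points at a { caller's fp, return address } pair. It is not the sanitizer code; walk_frames and its parameters are hypothetical, and the stack bounds are assumed to be supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Follow a chain of { caller's fp, return address } records starting at fp.
   Stores up to max_pcs return addresses in pcs and returns how many were
   stored. The walk stops as soon as a record looks implausible. */
static size_t walk_frames(uintptr_t fp, uintptr_t stack_lo, uintptr_t stack_hi,
                          uintptr_t *pcs, size_t max_pcs)
{
    size_t n = 0;
    assert(pcs != NULL);
    while (n < max_pcs) {
        /* The whole two-word record must lie inside [stack_lo, stack_hi). */
        if (fp < stack_lo || fp >= stack_hi ||
            stack_hi - fp < 2 * sizeof(uintptr_t))
            break;
        if (fp % sizeof(uintptr_t) != 0)
            break;
        const uintptr_t *rec = (const uintptr_t *)fp;
        uintptr_t next_fp = rec[0];
        pcs[n++] = rec[1];
        /* Callers live at higher addresses; anything else ends the walk. */
        if (next_fp <= fp)
            break;
        fp = next_fp;
    }
    return n;
}
```

Each step costs two loads and a few comparisons, which is why this approach is so much cheaper than interpreting unwind tables. If fp instead points one word into the record, rec[0] is a return address rather than a stack address and the walk stops after a single bogus entry.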

Renato Golin

Jul 16, 2014, 4:09:18 AM
to Evgeniy Stepanov, Konstantin Serebryany, LLVM, Maxim Ostapenko, Richard Earnshaw
On 16 July 2014 08:57, Evgeniy Stepanov <eugeni....@gmail.com> wrote:
> It is also consistent with x86: we use exactly the same code to unwind
> stack on both platforms.

I'm checking with the ARM GCC folks if there's any reason behind this.
If it's just legacy, I'll propose a change on Friday (not holding my
breath, though).

cheers,
--renato

Yury Gribov

Jul 17, 2014, 7:52:32 AM
to Tim Northover, Maxim Ostapenko, Konstantin Serebryany, Richard Earnshaw, LLVM
On 07/16/2014 11:45 AM, Tim Northover wrote:
>> As has been mentioned several times (*), LLVM and GCC setup frame pointer to
>> point to different stack slots on ARM. GCC's fp points to stack slot holding
>> lr while LLVM's fp points at the next slot.
>
> This looks flipped from my tests. Both create an { fp, lr } struct;
> GCC sets current fp to the address of lr in that struct; LLVM sets
> current fp to the address of fp in that struct.

Right, I misread the assembly :(

>> Is this incompatibility intentional/desired or we could somehow unify GCC
>> and LLVM in this regard?
>
> What are the chances of getting GCC to change here?

Well, their logic is that as long as FP is not part of the ARM ABI they
can make an arbitrary choice, even if it complicates users' lives. I
really hope that Renato can persuade people that this is worth changing.

> It's entirely a
> bike-shedding argument, but there are a couple of reasons to prefer
> LLVM's choice. It's most consistent with what *is* required in the
> AArch64 ABI, and it means fp really points to the frame record, not
> some random point half way through it.

Yeah, I think everyone agrees on this.

-Y

Renato Golin

Jul 17, 2014, 9:13:36 AM
to Yury Gribov, LLVM, Maxim Ostapenko, Konstantin Serebryany, Richard Earnshaw
On 17 July 2014 12:49, Yury Gribov <y.gr...@samsung.com> wrote:
> Well, their logic is that as long as FP is not part of ARM ABI they can make
> arbitrary choice even if it complicates user's life. I really hope that Renato could persuade
> people that this is worth changing.

So, this is a lot more complicated than it seems and the choice was
not arbitrary.

The old APCS required the frame pointer to be pointing to LR in the
stack, and due to the number of problems that it created [1], AAPCS
said "we're having none of it". With that in mind, the GCC engineers
didn't change the FP logic when they implemented AAPCS. The AArch64
AAPCS had a better description of what to do with the FP, and since it
was a new target, both GCC and LLVM engineers decided to do like any
other target instead.

As you may imagine, changing how the FP behaves will have an impact
not just in GCC itself, but many other tools (known and unknown) that
rely on that behaviour. So, while it's undecided and the change is
*possible*, it would need a strong argument to start that change.
Being "like the others" is not strong enough, and I agree with that.
Moreover, the AAPCS can theoretically change again, and enforce yet
another standard, where we'd have to change it all over.

For those reasons, changing ARM GCC's prologue/epilogue is probably
not happening soon.

As you probably already know, the reason why the AAPCS retreated from
controlling the FP is exactly the same as we're discussing it here.
People use it to unwind the stack. On the other hand, eliminating the
prologue when no local logic requires it is pointless and can be a big
difference in performance on devices that are already restricted by
extreme power constraints, so to produce really optimal code for ARM
you have to be able to change that.

What the AAPCS did was just to put on paper what was already true:
don't trust the prologue.

I know it's not the answer we wanted to hear, but it's a damn good
one, and one that I accept as the least costly solution. Given that
LLVM is *also* not breaking the AAPCS, I don't think it'd be a good
idea to replicate GCC's behaviour in the prologue for ARM just for the
sake of fast stack unwinding, but other people are free to disagree.

cheers,
-renato

Tim Northover

Jul 17, 2014, 9:52:25 AM
to Renato Golin, LLVM, Maxim Ostapenko, Richard Earnshaw, Konstantin Serebryany
> I know it's not the answer we wanted to hear, but it's a damn good
> one,

It's an answer. I wouldn't go any further than that myself.

Tim.

Renato Golin

Jul 17, 2014, 10:09:12 AM
to Tim Northover, LLVM, Maxim Ostapenko, Richard Earnshaw, Konstantin Serebryany
On 17 July 2014 14:49, Tim Northover <t.p.no...@gmail.com> wrote:
> It's an answer. I wouldn't go any further than that myself.

Maybe I didn't explain my position right. GCC folks are *definitely*
willing to change IFF there is a formal proposal from ARM. They also
agree that this is as bad as anything else when it comes to guessing
undocumented behaviour (but the formal reason is APCS), and they
*also* understand the headaches other people have with the
differences.

But changing this now will have repercussions across the toolchain and
other tools that rely on it, only for ARM to decide a year later to do
something else entirely. It's not worth the headache.

LLVM has a greater freedom to move and deprecate things, they don't. I
find it hard to see how this could be different.

--renato

Yury Gribov

Jul 17, 2014, 10:16:26 AM
to Renato Golin, LLVM, Maxim Ostapenko, Konstantin Serebryany, Richard Earnshaw
On 07/17/2014 05:10 PM, Renato Golin wrote:
> As you may imagine, changing how the FP behaves will have an impact
> not just in GCC itself, but many other tools (known and unknown) that
> rely on that behaviour.

Note that these tools wouldn't work with Clang then.
And vice versa: tools that are developed in Clang (Asan) won't work with
GCC.

> On the other hand, eliminating the
> prologue when no local logic requires it is pointless

I think you meant "keeping prologue when no local logic requires it is
pointless" ?

> and can be a big
> difference in performance on devices that are already restricted by
> extreme power constraints, so to produce really optimal code for ARM
> you have to be able to change that.

It's the same for x64: if you need the ability to do fast unwinding,
you have to ask for it explicitly with -fno-omit-frame-pointer;
otherwise the compiler is free to reuse rbp for general computations.

-Y

Renato Golin

Jul 17, 2014, 10:56:26 AM
to Yury Gribov, LLVM, Maxim Ostapenko, Konstantin Serebryany, Richard Earnshaw
On 17 July 2014 15:13, Yury Gribov <y.gr...@samsung.com> wrote:
> Note that these tools wouldn't work with Clang then.
> And vice versa: tools that are developed in Clang (Asan) won't work with
> GCC.

That's the point. Breaking one to fix the other when there is no agreed
standard is not a good use of resources. Whenever there's an agreed
standard, we can all move to the same implementation.


> I think you meant "keeping prologue when no local logic requires it is
> pointless" ?

Yes, sorry.

I'll have to take that to a higher level, ie ARM, just like Jim was
doing with the assembly aliases in ARMCC's docs. It could take a
while...

--renato

Reid Kleckner

Jul 17, 2014, 4:11:03 PM
to Renato Golin, Konstantin Serebryany, Maxim Ostapenko, Richard Earnshaw, LLVM
Would they be willing to have a flag?  Would we be willing to have a flag?  Or should we conditionalize this on OS and say, on Linux, do the gcc thing, and on OS X, do the LLVM thing?

Renato Golin

Jul 18, 2014, 11:05:26 AM
to Reid Kleckner, Konstantin Serebryany, Maxim Ostapenko, Richard Earnshaw, LLVM
On 17 July 2014 21:08, Reid Kleckner <r...@google.com> wrote:
> Would they be willing to have a flag? Would we be willing to have a flag?

That's a good question. Anything we do would be easier than waiting for
them to do anything, so if we decide to go with a flag, it should be
us implementing it.


> Or should we conditionalize this on OS and say, on Linux, do the gcc thing,
> and on OS X, do the LLVM thing?

I think you agree with me that both solutions are ugly, but I'd rather
not make this the default behaviour anywhere, so that only those who need
it (sanitizers) turn it on with a flag.

cheers,
--renato

Jim Grosbach

Jul 18, 2014, 12:51:56 PM
to Renato Golin, Maxim Ostapenko, Konstantin Serebryany, LLVM, Richard Earnshaw
Having a different code path for prologue just for the sanitizers sounds pretty risky to me. That code is already strewn with conditional and modal stuff. Adding another variable to the permutations scares me. Is there really no alternative? Conditional code in the sanitizers that figure things out? LLDB and GDB have a very similar sort of problem for backtraces, including when debug info isn’t available. How do they solve it?
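
One conceivable form of such conditional code is a heuristic that inspects the words around fp and guesses which convention produced the frame. The sketch below is only an illustration under stated assumptions: the stack bounds are known, return addresses never fall inside the stack, and the names fp_guess, looks_like_caller_fp and guess_fp_style are invented here. It can be fooled, for example when the word below an LLVM-style record happens to hold a stack address while the saved fp does not look valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    GUESS_UNKNOWN,
    GUESS_FP_AT_SAVED_FP, /* LLVM-style: word at fp is the caller's fp */
    GUESS_FP_AT_SAVED_LR  /* GCC-style: word below fp is the caller's fp */
} fp_guess;

/* A caller's fp should be word-aligned and lie above the current frame,
   still inside the stack. */
static int looks_like_caller_fp(uintptr_t v, uintptr_t cur, uintptr_t stack_hi)
{
    return v > cur && v < stack_hi && v % sizeof(uintptr_t) == 0;
}

/* Guess which word of the frame record fp points at. */
static fp_guess guess_fp_style(const uintptr_t *fp, uintptr_t stack_lo,
                               uintptr_t stack_hi)
{
    uintptr_t cur = (uintptr_t)fp;
    assert(fp != NULL);
    /* Both fp[-1] and fp[0] must be readable stack words. */
    if (cur < stack_lo || cur - stack_lo < sizeof(uintptr_t) ||
        cur >= stack_hi || stack_hi - cur < sizeof(uintptr_t))
        return GUESS_UNKNOWN;
    if (looks_like_caller_fp(fp[0], cur, stack_hi))
        return GUESS_FP_AT_SAVED_FP;
    if (looks_like_caller_fp(fp[-1], cur, stack_hi))
        return GUESS_FP_AT_SAVED_LR;
    return GUESS_UNKNOWN;
}
```

The check adds a couple of loads per frame, so it stays far cheaper than table-based unwinding, but it remains a guess rather than something any ABI guarantees.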

-Jim

Renato Golin

Jul 18, 2014, 5:15:34 PM
to Jim Grosbach, Maxim Ostapenko, Konstantin Serebryany, LLVM, Richard Earnshaw
On 18 July 2014 17:49, Jim Grosbach <gros...@apple.com> wrote:
> Having a different code path for prologue just for the sanitizers sounds pretty risky to me. That code is already strewn with conditional and modal stuff. Adding another variable to the permutations scares me.

Same here.


> Is there really no alternative? Conditional code in the sanitizers that figure things out? LLDB and GDB have a very similar sort of problem for backtraces, including when debug info isn’t available. How do they solve it?

The alternative is to use the unwind tables that both GCC and LLVM
generate even on C code, and that the ABI tells us to use, but their
argument is that's too slow. I don't know LLDB, but GDB uses tables,
but also the hidden logic (for faster unwinding), so I guess that with
code produced by LLVM, it just uses the tables.

GDB has a lot of hidden context with GCC that only works because their
development roadmaps are tied together, and it's scarier than
that, but I don't know how they choose whether to use the magic or not.

Reid Kleckner

Jul 18, 2014, 8:05:08 PM
to Renato Golin, LLVM, Maxim Ostapenko, Konstantin Serebryany, Richard Earnshaw
On Fri, Jul 18, 2014 at 2:11 PM, Renato Golin <renato...@linaro.org> wrote:
> The alternative is to use the unwind tables that both GCC and LLVM
> generate even on C code, and that the ABI tells us to use, but their
> argument is that's too slow. I don't know LLDB, but GDB uses tables,
> but also the hidden logic (for faster unwinding), so I guess that with
> code produced by LLVM, it just uses the tables.

It's not just sanitizers that need to be able to get fast, accurate stack traces.  Consider sampling profilers that capture call stacks.  Using the unwind tables is disruptively slow to the process under profile.

Jim Grosbach

Jul 18, 2014, 8:09:22 PM
to Reid Kleckner, LLVM, Maxim Ostapenko, Konstantin Serebryany, Richard Earnshaw
Why not do the unwind table parsing after the fact? Especially for a profiler, there’s no reason to do that during the actual profile collection.

Reid Kleckner

Jul 18, 2014, 8:31:59 PM
to Jim Grosbach, LLVM, Maxim Ostapenko, Konstantin Serebryany, Richard Earnshaw
On Fri, Jul 18, 2014 at 4:59 PM, Jim Grosbach <gros...@apple.com> wrote:
>> It's not just sanitizers that need to be able to get fast, accurate stack traces. Consider sampling profilers that capture call stacks. Using the unwind tables is disruptively slow to the process under profile.
>
> Why not do the unwind table parsing after the fact? Especially for a profiler, there’s no reason to do that during the actual profile collection.

I'm not sure how that would work, without memcpy-ing the entire stack.  If you don't have frame pointers you can't walk upwards to find the return addresses to save, at least not without... looking at the unwind tables.  :)
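
To illustrate the cost difference: with frame pointers, a sampling profiler only has to store a handful of return addresses per sample and can resolve them to symbols later. A minimal sketch of such a sample log follows; the types, limits and the record_sample function are hypothetical and not taken from any real profiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MAX_DEPTH = 16, MAX_SAMPLES = 64 }; /* arbitrary illustrative limits */

/* One captured call stack: raw return addresses, not yet symbolized. */
typedef struct {
    size_t depth;
    uintptr_t pcs[MAX_DEPTH];
} sample;

typedef struct {
    size_t count;
    sample samples[MAX_SAMPLES];
} sample_log;

/* Store one sample. Returns 1 on success and 0 if the log is full.
   Stacks deeper than MAX_DEPTH are truncated. */
static int record_sample(sample_log *log, const uintptr_t *pcs, size_t depth)
{
    assert(log != NULL && pcs != NULL);
    if (log->count == MAX_SAMPLES)
        return 0;
    if (depth > MAX_DEPTH)
        depth = MAX_DEPTH;
    sample *s = &log->samples[log->count++];
    s->depth = depth;
    memcpy(s->pcs, pcs, depth * sizeof pcs[0]);
    return 1;
}
```

Each sample copies at most MAX_DEPTH words, whereas deferring the unwind itself would mean saving the raw stack contents so the tables could be applied afterwards.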

Jim Grosbach

Jul 18, 2014, 10:29:18 PM
to Reid Kleckner, LLVM, Maxim Ostapenko, Konstantin Serebryany, Richard Earnshaw
Sorry, brain fart. I was thinking about symbolication, which is usually the annoying problem on Darwin.

Sent from my iPad