if (MF->getFunction().getCallingConv() == CallingConv::GHC)
  return AArch64::GHCSP;
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
From past conversations, I was under the impression that GHC functions are really "stackless", they use CPS: they run, perform some computation, update the passed-in virtual stack registers, and then return with some sort of tail call, preserving and updating the set of registers which were passed in. Why does it matter if LLVM spills intermediate data to the architectural stack if it clears it all off on exit?
Regarding the feasibility of it, I think it's actually feasible. Depending on how radical the changes you need are, I would recommend subclassing the TargetFrameLowering classes and providing new implementations of emitPrologue, emitEpilogue, and getFrameIndexReference. The X86 emitPrologue codepath is already very, very complicated, and handles far too many special cases. Much of the complexity comes from CFI, and that wouldn't be a concern with an alternative stack.
On Wed, 2019-12-04 at 12:45 -0800, Reid Kleckner wrote:
> From past conversations, I was under the impression that GHC functions are really "stackless", they use CPS: they run, perform some computation, update the passed-in virtual stack registers, and then return with some sort of tail call, preserving and updating the set of registers which were passed in. Why does it matter if LLVM spills intermediate data to the architectural stack if it clears it all off on exit?

The problem is with LLVM's code generation: some values that could be rematerialized (jump table offsets, constants) are spilled to the architectural stack instead of the virtual stack. Rather than try to fix or work around this behavior, I am considering an extension to LLVM, in concert with GC statepoints, so that the code generator's "architectural" stack is the same as our "virtual" stack, by defining a different ABI, starting with a change to the stack pointer register.
> Regarding the feasibility of it, I think it's actually feasible. Depending on how radical the changes you need are, I would recommend subclassing the TargetFrameLowering classes and providing new implementations of emitPrologue, emitEpilogue, and getFrameIndexReference. The X86 emitPrologue codepath is already very, very complicated, and handles far too many special cases. Much of the complexity comes from CFI, and that wouldn't be a concern with an alternative stack.

My main concerns with the changes were around instruction selection being dependent on the ABI, and around the flexibility of frame layout (e.g., where register spills go relative to allocas) so that the garbage collector can still parse the stack. Do you think there would be any issues there?