Justification for save area for reg/stack split arguments (GCC)?


Alex Bradbury

Sep 5, 2017, 11:19:22 AM
to RISC-V SW Dev
Hi,

I've been spending time again looking at calling convention issues
recently, which in many cases involves looking at what GCC currently
does. I notice that when an argument is split between the stack
and registers, GCC has the callee allocate extra space at
positive offsets from the frame pointer to hold the part passed in a
register, so that it sits next to the part passed via the stack.

e.g. the following function will see the a7_s0 argument split between
an argument register and the stack when compiled for RV32 soft-float:

    void callee(int a0, int a1, int a2, int a3, int a4, int a5, int a6,
                double a7_s0, int64_t s1, float s2, struct large_struct s3) {
      ...
    }

When compiling a simple function like this, GCC might produce a stack
frame where sp = oldsp - 32 and fp = oldsp - 16. [fp+12] is used to
store a7, while [fp], [fp+4], and [fp+8] are unused. As pointed out in
this bug report, this approach can lead to misaligned memory accesses
in cases where you would prefer to avoid them:
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82106>. Is there a
justification for this save area beyond possibly avoiding an extra
store to reassemble a split argument? The justification is much clearer
for the vararg save area, which is required to ensure that varargs
passed in registers are stored on the stack contiguously with stack
arguments (i.e. ensuring the stdarg.h helpers can work). I couldn't see
a similar justification here, but thought it was worth double-checking
that I'm not missing something obvious.
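
To make the frame arithmetic concrete, here is a minimal C sketch of where the a7 slot lands. It assumes oldsp is 16-byte aligned on entry (as the RISC-V psABI requires) and that the upper half of a7_s0 sits in the first incoming stack slot at oldsp+0. The function and constant names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offsets taken from the example frame: fp = oldsp - 16, a7 stored at fp+12. */
enum { FP_OFFSET = 16, A7_SLOT = 12 };

/* Address at which the callee stores a7 (the low half of the split double). */
uint32_t a7_slot_addr(uint32_t oldsp) {
    assert(oldsp % 16 == 0);          /* assumed ABI stack alignment at entry */
    uint32_t fp = oldsp - FP_OFFSET;  /* fp = oldsp - 16 */
    return fp + A7_SLOT;              /* fp + 12 = oldsp - 4 */
}

/* True if a double starting at the a7 slot would be 8-byte aligned. */
bool double_is_aligned(uint32_t oldsp) {
    return a7_slot_addr(oldsp) % 8 == 0;
}
```

Under these assumptions the slot is always at oldsp - 4, directly below the stack-passed half, so the reassembled double is contiguous but only 4-byte aligned. For example, with oldsp = 0x1000 the slot address is 0xFFC.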

Best,

Alex

Andrew Waterman

Sep 5, 2017, 2:13:01 PM
to Alex Bradbury, RISC-V SW Dev
I think this is just how we happened to implement it. It can certainly be done differently, without affecting the ABI. We should fix GCC to align the save slot, and of course LLVM could take a different approach entirely.

Most likely the same issue would arise with RV64Q.


David Chisnall

Sep 6, 2017, 5:47:35 AM
to Alex Bradbury, RISC-V SW Dev
This sounds suspiciously like someone copied and pasted some MIPS o32 code in bringing up the RISC-V back end in gcc. o32 has exactly this problem (and, in fact, the ABI requires that some types be passed misaligned on the stack as a result).

Having a contiguous save area (i.e. not putting the stack pointer and return address spills in the middle) can be a noticeable win for C++ code, where it’s fairly common to pass some values in registers and require reassembly. We’ve recently been looking at C++ calling conventions in some detail and have noted that both x86 and ARM (32 and 64-bit variants) use some heuristics that seem not to be entirely sensible.

David

Alex Bradbury

Sep 6, 2017, 6:13:39 AM
to David Chisnall, RISC-V SW Dev
On 6 September 2017 at 10:47, David Chisnall
<David.C...@cl.cam.ac.uk> wrote:
> Having a contiguous save area (i.e. not putting the stack pointer and return address spills in the middle) can be a noticeable win for C++ code, where it’s fairly common to pass some values in registers and require reassembly. We’ve recently been looking at C++ calling conventions in some detail and have noted that both x86 and ARM (32 and 64-bit variants) use some heuristics that seem not to be entirely sensible.

Do you think this is likely to still be a win on RISC-V? It's only
scalars or aggregates where XLEN < len(arg) <= 2*XLEN that can be
split between registers and the stack. Therefore the worst possible
reassembly case involves two xlen-sized stores if there's no save area
(reassembling to a new stack slot) vs one xlen-sized store with a save
area (storing a7 to a potentially misaligned location in the save
area).
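
As a rough sketch of the no-save-area case, the following C function reassembles a double from the half that arrived in a7 and the half that arrived on the stack. The function name is hypothetical, and it assumes a 64-bit IEEE 754 double with the low-order word in the register, as for a split 2*XLEN scalar on RV32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "assumes 64-bit double");

/* Rebuild a split double: low 32 bits came in a7, high 32 bits on the stack. */
double reassemble_split_double(uint32_t lo_from_a7, uint32_t hi_from_stack) {
    /* On RV32 this combination would be two word-sized stores into a fresh,
       8-byte-aligned stack slot; here it is expressed portably as one value. */
    uint64_t bits = ((uint64_t)hi_from_stack << 32) | lo_from_a7;
    double d;
    memcpy(&d, &bits, sizeof d);  /* reinterpret the bits without aliasing UB */
    return d;
}
```

With a save area, only the a7 half needs storing, because the other half is already in memory next to the slot; the price is the potentially misaligned location discussed earlier in the thread.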

Alex

David Chisnall

Sep 6, 2017, 6:20:55 AM
to Alex Bradbury, RISC-V SW Dev
It’s been over a year since I looked at the RISC-V ABI in detail and so I don’t recall exactly the decisions taken there, but in general terms:

The big loss is when you’re passing a largeish struct that spans registers and the stack. For example:

    void foo(SomeBigClassType x)
    {
        x.someMethod();
    }

If the amount of SomeBigClassType on the stack is large, then you end up doing a memcpy of this region followed by stores of the in-register parts. This is a lot more expensive than the two alternatives: passing the whole thing on the stack, or reserving enough space on the stack to spill bits passed in registers. The latter alternative is often preferable because you often see code that only accesses one or two fields and so having those fields in registers can be a win.

David

Alex Bradbury

Sep 6, 2017, 6:31:11 AM
to David Chisnall, RISC-V SW Dev
On 6 September 2017 at 11:20, David Chisnall
Thanks for the clarification, I thought that might be the cause of
concern. We don't have that problem in the RISC-V ABI: in the
soft-float ABI, values > 2*XLEN are passed by reference. With
hard-float, fp+int, int+fp, or fp+fp structs > 2*XLEN may be passed in
an FPR+GPR or FPR+FPR pair, but are never split between registers and
the stack.
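
The soft-float rule can be sketched as a small C classifier. This is a simplification under stated assumptions: it covers only named (non-variadic) arguments in the integer calling convention, ignores the hard-float struct cases, and treats a by-reference argument as a single category without modelling where its pointer goes. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { IN_REGS, SPLIT_REG_STACK, ON_STACK, BY_REFERENCE } arg_passing;

/* size: argument size in bytes; xlen_bytes: 4 (RV32) or 8 (RV64);
   free_regs: how many of a0-a7 are still unassigned. */
arg_passing classify_int_arg(size_t size, size_t xlen_bytes, unsigned free_regs) {
    assert(xlen_bytes == 4 || xlen_bytes == 8);
    assert(free_regs <= 8);
    if (size > 2 * xlen_bytes)
        return BY_REFERENCE;                 /* never split */
    if (size <= xlen_bytes)
        return free_regs >= 1 ? IN_REGS : ON_STACK;
    /* XLEN < size <= 2*XLEN: the only case that can be split. */
    if (free_regs >= 2) return IN_REGS;
    if (free_regs == 1) return SPLIT_REG_STACK;
    return ON_STACK;
}
```

For the callee example at the top of the thread, a0 to a6 consume seven registers, so the 8-byte a7_s0 sees one free register on RV32 and classify_int_arg(8, 4, 1) yields SPLIT_REG_STACK.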

Best,

Alex