Towards a "golden model" of the RISC-V calling convention(s)

Alex Bradbury

Mar 14, 2017, 5:33:31 PM

to RISC-V SW Dev

Hi all,

I've been working through the currently documented RISC-V calling
convention <https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md>,
and as part of that felt it would be helpful to provide a simple
Python model of the calling convention as described. The hope is that
this can eventually become a "golden model" of the RISC-V calling
convention, and play a part in an ABI compliance suite, but of course
it's early days yet. I'm sharing this in the spirit of "release early,
release often" and would greatly appreciate anyone taking a look at
the implementation to see if you agree with my interpretation of the
currently specified calling convention rules.

See https://github.com/lowRISC/riscv-calling-conv-model for the
Python-implemented model. You'll note that most of the complexity
comes from handling the special rules for the floating point calling
convention (see
https://github.com/lowRISC/riscv-calling-conv-model/blob/master/rvcc.py#L277);
the integer convention is rather straightforward. I have yet to test
this extensively against the current gcc implementation.

A quick usage example is below. 'sNN' and 'aNN' refer to the calculated
size and alignment (in bits) of a type.

$ python3
Python 3.6.0 (default, Jan 16 2017, 12:12:55)
[GCC 6.3.1 20170109] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from rvcc import *
>>> m = RVMachine(xlen=32, flen=64)
>>> m.call([
... Int(32),
... Float(64),
... Struct(Int(8), Array(Float(32), 1)),
... Struct(Array(Int(8), 20)),
... Int(64),
... Int(64),
... Int(64)])
Args:
arg00: SInt32
arg01: Float64
arg02: Struct([SInt8, Pad24, Array(Float32*1, s32, a32)], s64, a32)
arg03: Struct([Array(SInt8*20, s160, a8)], s160, a8)
arg04: SInt64
arg05: SInt64
arg06: SInt64

GPRs:
GPR[a0]: arg00
GPR[a1]: arg02[0:7]
GPR[a2]: &arg03
GPR[a3]: arg04[0:31]
GPR[a4]: arg04[32:63]
GPR[a5]: arg05[0:31]
GPR[a6]: arg05[32:63]
GPR[a7]: arg06[0:31]

FPRs:
FPR[fa0]: arg01
FPR[fa1]: arg02[32:63]
FPR[fa2]: ?
FPR[fa3]: ?
FPR[fa4]: ?
FPR[fa5]: ?
FPR[fa6]: ?
FPR[fa7]: ?

Stack:
arg06[32:63]

Comments, feedback, ideas, bug reports, pull requests are all
obviously very welcome!
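
For readers wondering where the 's64, a32' and 'Pad24' annotations in the
usage example come from, here is a minimal sketch of the usual C-style
struct layout rule. It is not the code in rvcc.py; the names Ty, scalar,
array and struct_layout are made up for illustration, and sizes are in
bits as in the output above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ty:
    """Hypothetical type record: name, size in bits, alignment in bits."""
    name: str
    size: int
    align: int


def scalar(kind, bits):
    # Assumption: a scalar is naturally aligned to its own size.
    return Ty(f"{kind}{bits}", bits, bits)


def array(elem, count):
    # An array inherits the alignment of its element type.
    return Ty(f"Array({elem.name}*{count})", elem.size * count, elem.align)


def align_up(value, align):
    return (value + align - 1) // align * align


def struct_layout(members):
    """Return (fields including padding, total size, alignment)."""
    offset = 0
    max_align = 8  # at least byte alignment
    fields = []
    for m in members:
        padded = align_up(offset, m.align)
        if padded != offset:
            fields.append(f"Pad{padded - offset}")
        fields.append(m.name)
        offset = padded + m.size
        max_align = max(max_align, m.align)
    size = align_up(offset, max_align)  # tail padding, if any
    if size != offset:
        fields.append(f"Pad{size - offset}")
    return fields, size, max_align


if __name__ == "__main__":
    # Mirrors arg02 and arg03 from the example session.
    print(struct_layout([scalar("SInt", 8), array(scalar("Float", 32), 1)]))
    print(struct_layout([array(scalar("SInt", 8), 20)]))
```

Under these assumptions the first call reproduces the 64-bit size, 32-bit
alignment and 24 bits of padding reported for arg02, and the second the
160-bit size and 8-bit alignment reported for arg03.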

Best,

Alex

Andrew Waterman

Mar 14, 2017, 6:36:14 PM

to Alex Bradbury, RISC-V SW Dev

Great idea, Alex.

> --
> You received this message because you are subscribed to the Google Groups "RISC-V SW Dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sw-dev+un...@groups.riscv.org.
> To post to this group, send email to sw-...@groups.riscv.org.
> Visit this group at https://groups.google.com/a/groups.riscv.org/group/sw-dev/.
> To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/sw-dev/CA%2BwH295aj3BR0xFfcNF%2BtqWRu461CR6D4Oxk%3D-GB4QqQkzf30w%40mail.gmail.com.

Alex Bradbury

Sep 7, 2017, 5:19:58 AM

to RISC-V SW Dev

On 14 March 2017 at 21:33, Alex Bradbury <a...@asbradbury.org> wrote:
> Hi all,
>
> I've been working through the currently documented RISC-V calling
> convention <https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md>,
> and as part of that felt it would be helpful to provide a simple
> Python model of the calling convention as described. The hope is that
> this can eventually become a "golden model" of the RISC-V calling
> convention, and play a part in an ABI compliance suite, but of course
> it's early days yet. I'm sharing this in the spirit of "release early,
> release often" and would greatly appreciate anyone taking a look at
> the implementation to see if you agree with my interpretation of the
> currently specified calling convention rules.
>
> See https://github.com/lowRISC/riscv-calling-conv-model for the
> Python-implemented model.

I've made numerous updates and bug fixes. The model can now be used to
calculate the convention for return values, the vararg convention is
supported, and stack offsets are calculated. I've also added some
simple pytest-based tests.

https://github.com/lowRISC/riscv-calling-conv-model

I'll soon be adding support for compiler test case generation, which
should make the whole thing rather more directly useful.

There are a few ways you can help:
* Bug hunting - can you find any cases where incorrect register
assignments are generated?
* Help me create an in-browser version using Skulpt/Trinket/Transcrypt
<https://github.com/lowRISC/riscv-calling-conv-model/issues/3>. Sadly
the current codebase is incompatible with Trinket.io.
* Name suggestions! As the focus of this code will move towards test
case generation, a different name may be more appropriate. It's also
possible I'd want to support targets other than RISC-V. abicc,
abifuzz, abicheck, abitest are all taken by other projects...

Best,

Alex

Bruce Hoult

Sep 7, 2017, 10:01:31 AM

to Alex Bradbury, RISC-V SW Dev

Tangentially related:

Why is 16-byte stack alignment mandated for RV32?

It makes sense for RV64. Other 64 bit ISAs such as x86_64 and aarch64 also require 16 byte alignment. This is seldom a waste as it's only the return address plus one other register.

Other comparable 32 bit ISAs including ARM32 require 8 byte alignment (also two registers) at "public interfaces". (ARM allows 4 byte alignment within a function)

In the embedded world this can make a significant difference.

I'd even argue that 4 byte alignment would be ok if the system doesn't have the D extension.


Alex Bradbury

Sep 7, 2017, 10:20:30 AM

to Bruce Hoult, RISC-V SW Dev

On 7 September 2017 at 15:01, Bruce Hoult <br...@hoult.org> wrote:
> Tangentially related:
>
> Why is 16-byte stack alignment mandated for RV32?
>
> It makes sense for RV64. Other 64 bit ISAs such as x86_64 and aarch64 also
> require 16 byte alignment. This is seldom a waste as it's only the return
> address plus one other register.
>
> Other comparable 32 bit ISAs including ARM32 require 8 byte alignment (also
> two registers) at "public interfaces". (ARM allows 4 byte alignment within a
> function)
>
> In the embedded world this can make a significant difference.
>
> I'd even argue that 4 byte alignment would be ok if the system doesn't have
> the D extension.

There was some discussion of this here
https://github.com/riscv/riscv-elf-psabi-doc/issues/21. One motivation
was the C.ADDI16SP instruction.

Best,

Alex

Bruce Hoult

Sep 7, 2017, 10:59:15 AM

to Alex Bradbury, RISC-V SW Dev

A C.ADDI works for arbitrary frame sizes up to 16 bytes, and a single C.ADDI16SP plus a C.ADDI can be used together for arbitrary frame sizes up to about 512 bytes.

As Andrew Waterman points out, functions with large stack frames tend to have a lot of code -- they probably initialize each slot and have at least one instruction to read each slot. Adding a C.ADDI to the prologue and epilogue is a proportionally small overhead in both space and time.


Samuel Falvo II

Sep 7, 2017, 12:24:59 PM

to Bruce Hoult, Alex Bradbury, RISC-V SW Dev

On Thu, Sep 7, 2017 at 7:59 AM, Bruce Hoult <br...@hoult.org> wrote:
> A C.ADDI works for arbitrary frame sizes up to 16 bytes, and a single
> C.ADDI16SP plus a C.ADDI can be used together for arbitrary frame sizes up
> to about 512 bytes.

You might as well just use ADDI at that point (same code size, single
opcode, easier to decode, potentially faster, never slower).
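
The code-size comparison can be sketched as follows. This is only an
illustration, assuming the immediate ranges given in the RVC and base ISA
encodings as I read them (C.ADDI: nonzero 6-bit signed; C.ADDI16SP:
nonzero multiple of 16 in [-512, 496]; ADDI: 12-bit signed). The function
names are hypothetical.

```python
def fits_c_addi(delta):
    # C.ADDI takes a nonzero 6-bit signed immediate.
    return delta != 0 and -32 <= delta <= 31


def fits_c_addi16sp(delta):
    # C.ADDI16SP takes a nonzero 6-bit signed immediate scaled by 16.
    return delta != 0 and delta % 16 == 0 and -512 <= delta <= 496


def fits_addi(delta):
    # ADDI takes a 12-bit signed immediate.
    return -2048 <= delta <= 2047


def stack_adjust(delta):
    """Return (mnemonics, encoded bytes) for one instruction adjusting sp."""
    if fits_c_addi(delta):
        return ["c.addi"], 2
    if fits_c_addi16sp(delta):
        return ["c.addi16sp"], 2
    if fits_addi(delta):
        return ["addi"], 4
    raise ValueError("adjustment needs a longer sequence")


if __name__ == "__main__":
    for delta in (-8, -80, -68):
        print(delta, stack_adjust(delta))
```

A 68-byte frame needs the 4-byte ADDI here, which is the same size as a
C.ADDI16SP plus C.ADDI pair (2 + 2 bytes), while a frame rounded up to 80
bytes fits a single 2-byte C.ADDI16SP.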

Personally, in my compiler work (Oberon and Forth) I utterly ignore
the 16-byte alignment requirement, and stick with native register
width alignment. If I ever need to push a structure on a non-aligned
boundary for some reason, I'll pad the stack space only when
necessary; though, honestly, I've so far not had a need to do so yet,
so I suspect this will be required only very rarely.

--
Samuel A. Falvo II

Bruce Hoult

Sep 7, 2017, 1:14:12 PM

to Samuel Falvo II, Alex Bradbury, RISC-V SW Dev

On Thu, Sep 7, 2017 at 7:24 PM, Samuel Falvo II <sam....@gmail.com> wrote:
> On Thu, Sep 7, 2017 at 7:59 AM, Bruce Hoult <br...@hoult.org> wrote:
> > A C.ADDI works for arbitrary frame sizes up to 16 bytes, and a single
> > C.ADDI16SP plus a C.ADDI can be used together for arbitrary frame sizes up
> > to about 512 bytes.
>
> You might as well just use ADDI at that point (same code size, single
> opcode, easier to decode, potentially faster, never slower).

Yes, good point, and gives up to 2KB range.

I don't think many people would really care that a function that logically needs a 68 byte stack frame gets rounded up to 80. Where it really hurts is when the function logically only needs 4 or 8 bytes, especially if it's a recursive function. Those fit into a C.ADDI.
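
To put rough numbers on this, here is a small sketch of the rounding
arithmetic. The helper names and the recursion depth of 1000 are
assumptions chosen purely for illustration.

```python
def round_up(nbytes, align):
    """Round a logical frame size up to a multiple of the stack alignment."""
    return (nbytes + align - 1) // align * align


def recursion_stack_bytes(logical_frame, depth, align):
    """Total stack used by `depth` nested calls of one function."""
    return round_up(logical_frame, align) * depth


if __name__ == "__main__":
    for logical in (4, 8, 68):
        sizes = {align: round_up(logical, align) for align in (4, 8, 16)}
        print(f"logical {logical:3d} bytes -> {sizes}")
    # A recursive function with a 4-byte logical frame, 1000 levels deep.
    print(recursion_stack_bytes(4, 1000, 4), "bytes at 4-byte alignment")
    print(recursion_stack_bytes(4, 1000, 16), "bytes at 16-byte alignment")
```

The 68-byte frame grows by under 20% when rounded to 80, whereas the
4-byte frame quadruples.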

> Personally, in my compiler work (Oberon and Forth) I utterly ignore
> the 16-byte alignment requirement, and stick with native register
> width alignment.

The question is: what is trying to be achieved by this restriction? I don't know of any aim other than to prevent misaligned accesses.

An enforced alignment of larger than the largest data type supported by the processor is unnecessary to avoid this.

If you don't have double precision FP then 4 bytes is enough on a 32 bit system. 8 is enough on a 64 bit system even with FP.

SIMD vector registers might be bigger, but I note that NEON at least is happy with 8 byte alignment for 16 byte vector registers. Intel x86 SSE required aligned operands, but AVX doesn't -- it gives better performance if avx256 operands are 32-byte aligned (I assume 64-byte for avx512?), but doesn't require it.

I'd assume Krste's Cray-style vectors won't have large alignment requirements.

Stefan O'Rear

Sep 7, 2017, 2:12:57 PM

to Bruce Hoult, Samuel Falvo II, Alex Bradbury, RISC-V SW Dev

On Thu, Sep 7, 2017 at 10:14 AM, Bruce Hoult <br...@hoult.org> wrote:
> If you don't have double precision FP then 4 bytes is enough on a 32 bit
> system. 8 is enough on a 64 bit system even with FP.

16 was chosen because of the possible presence of the Q extension.

-s

Bruce Hoult

Sep 7, 2017, 2:18:41 PM

to Stefan O'Rear, Samuel Falvo II, Alex Bradbury, RISC-V SW Dev

I quote from the RISC-V spec manual, first para, section 12:

'The 128-bit or quad-precision binary floating-point instruction subset is named “Q”, and requires RV64IFD.'

So I ask again: why apply 16 byte alignment to RV32?


Andrew Waterman

Sep 7, 2017, 2:50:49 PM

to Bruce Hoult, Stefan O'Rear, Samuel Falvo II, Alex Bradbury, RISC-V SW Dev

RV32Q was a legal combination at the time we decided to have 16-byte
stack alignment for RV32. Sometime later, the ISA spec was amended to
make the statement you quoted.

On Thu, Sep 7, 2017 at 11:18 AM, Bruce Hoult <br...@hoult.org> wrote:
> I quote from the RISC-V spec manual, first para, section 12:
>
> 'The 128-bit or quad-precision binary floating-point instruction subset is
> named “Q”, and requires RV64IFD.'
>
> So I ask again: why apply 16 byte alignment to RV32?

Bruce Hoult

Sep 7, 2017, 3:36:28 PM

to Andrew Waterman, Stefan O'Rear, Samuel Falvo II, Alex Bradbury, RISC-V SW Dev

OK, so .. a concrete proposal:

There are at least four mutually-incompatible ABIs already, counting only floating point differences (M and C multiply this further of course):

1) RV32 soft float

2) RV32 hard float

3) RV64 soft float

4) RV64 hard float

(I suspect RV64 soft float might be comparatively rare, but let's assume someone in the embedded world wants it)

Libraries compiled for any one of these will never be mixed with code compiled for any of the others. They can safely have arbitrary differences.

Proposal:

1) hard float ABIs have alignment equal to twice the integer register size. This allows for D float on RV32 and Q float on RV64.

2) soft float ABIs have alignment equal to the integer register size.

I'd be willing to drop 2), but since those ABIs are non miscible with the others anyway, why not?

These are the minimum required alignments for the stack pointer, heap objects, and program data sections (code too?). A compiler or runtime system might choose to use larger alignment or sizes when it's more convenient. For example, an RV32 soft float compiler might use C.ADDI to implement 4, 8, and 12 byte stack frames, but for anything larger it might choose to round the frame size (but *not* the alignment) up to a multiple of 16 and use a C.ADDI16SP instruction to adjust the stack, instead of keeping 4-byte granularity and using the larger ADDI instruction.
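
The two numbered rules above can be written down as a tiny sketch. This
encodes only the proposal in this message, not what the psABI document
specifies, and the function name is hypothetical.

```python
def proposed_min_alignment(xlen, hard_float):
    """Minimum alignment in bytes under the proposal in this message.

    Rule 1: hard-float ABIs use twice the integer register size.
    Rule 2: soft-float ABIs use the integer register size.
    """
    reg_bytes = xlen // 8
    return 2 * reg_bytes if hard_float else reg_bytes


if __name__ == "__main__":
    for xlen in (32, 64):
        for hard_float in (False, True):
            abi = "hard float" if hard_float else "soft float"
            align = proposed_min_alignment(xlen, hard_float)
            print(f"RV{xlen} {abi}: {align} bytes")
```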

Non Proposal:

3) RV32IF and RV64IFD could have alignment equal to the integer register size. However this would require two more incompatible ABIs and sets of libraries for no great benefit. Machines with hardware float are unlikely to be suffering from extremely constrained RAM. It makes little sense for RV32IMAFC libraries to not be usable on an RV32IMAFDC machine.


Andrew Waterman

Sep 7, 2017, 3:59:26 PM

to Bruce Hoult, Stefan O'Rear, Samuel Falvo II, Alex Bradbury, RISC-V SW Dev

On Thu, Sep 7, 2017 at 12:36 PM, Bruce Hoult <br...@hoult.org> wrote:
> OK, so .. a concrete proposal:
>
> There are at least four mutually-incompatible ABIs already, counting only
> floating point differences (M and C multiply this further of course):
>
> 1) RV32 soft float
> 2) RV32 hard float
> 3) RV64 soft float
> 4) RV64 hard float
>
> (I suspect RV64 soft float might be comparatively rare, but let's assume
> someone in the embedded world wants it)
>
> Libraries compiled for any one of these will never be mixed with code
> compiled for any of the others. They can safely have arbitrary differences.
>
> Proposal:
>
> 1) hard float ABIs have alignment equal to twice the integer register size.
> This allows for D float on RV32 and Q float on RV64.

This proposal makes sense to me, since RV32Q is not a legal
combination. Bear in mind this change is not ABI-compatible. It's
quite possibly benign at this stage, but others would have to speak to
that.

>
> 2) soft float ABIs have alignment equal to the integer register size.
>
> I'd be willing to drop 2), but since those ABIs are non miscible with the
> others anyway, why not?

Option 2 is not great because you can use floating-point ISA
extensions with soft-float ABIs.

(Since RV32E isn't compatible with hardware floating-point, this
argument doesn't apply, hence its 4-byte stack alignment.)


David Chisnall

Sep 8, 2017, 3:22:10 AM

to Bruce Hoult, Alex Bradbury, RISC-V SW Dev

On 7 Sep 2017, at 15:01, Bruce Hoult <br...@hoult.org> wrote:
>
> In the embedded world this can make a significant difference.

The impact on the embedded world should be minimal with a decent toolchain. The ABI-defined calling convention is only required for functions that are exposed for interoperability. In application code, this means anything exposed by or to shared libraries but in embedded code this only means functions that are visible to assembly routines (often just the initial entry point and interrupt handlers). Anything else is free to use a different calling convention.

Both gcc and clang will do this for static functions (and C++ functions in anonymous namespaces) and for functions with hidden visibility when doing link-time optimisation. This is typically done by marking all functions that are not externally visible as using a ‘fast’ calling convention, where the compiler has no requirement to use a stable ABI (and is free to use a different ABI for different calls, which happens when the compiler supports inter-procedural register allocation). When optimising for size on an embedded platform, I’d expect the toolchain to perform link-time optimisation and to set the minimum stack alignment for the fast calling convention to something much smaller than the public alignment.

David

David Chisnall

Sep 8, 2017, 3:27:48 AM

to Andrew Waterman, Bruce Hoult, Stefan O'Rear, Samuel Falvo II, Alex Bradbury, RISC-V SW Dev

On 7 Sep 2017, at 20:59, Andrew Waterman <and...@sifive.com> wrote:
>
>> Proposal:
>>
>> 1) hard float ABIs have alignment equal to twice the integer register size.
>> This allows for D float on RV32 and Q float on RV64.
>
> This proposal makes sense to me, since RV32Q is not a legal
> combination. Bear in mind this change is not ABI-compatible. It's
> quite possibly benign at this stage, but others would have to speak to
> that.

I would strongly recommend against reducing the stack alignment. Experience from x86, ARM, and MIPS has shown us four things:

- ISAs *always* grow extensions that require stricter alignment (for performance if not correctness)

- Increasing the default alignment once there is a large body of extant code is practically impossible.

- Forcible stack realignment on every function using the newer data types is painful.

- Data types with stricter alignment always end up in buffers allocated on the stack by older code (especially true for things like autovectorisation) and special-casing this hurts performance.

This leads to all sorts of corner cases where you scream at the lack of foresight by the original ABI designers. If we reduce the minimum stack alignment now, we are going to suffer in a few years, for the sole benefit of saving a small amount of d-cache for application code and working around poorly designed embedded toolchains (which have no requirement to follow the ABI for the vast majority of functions).

David

Andrew Waterman

Sep 8, 2017, 1:00:03 PM

to David Chisnall, Alex Bradbury, Bruce Hoult, RISC-V SW Dev, Samuel Falvo II, Stefan O'Rear

Good points, David.

A compromise is to keep the ABI stack alignment, and add a stack-alignment flag, so embedded developers can reduce it below the ABI requirement.

Palmer Dabbelt

Sep 8, 2017, 3:10:13 PM

to Andrew Waterman, David.C...@cl.cam.ac.uk, Alex Bradbury, br...@hoult.org, sw-...@groups.riscv.org, sam....@gmail.com, sor...@gmail.com

I like this option best.

Richard W.M. Jones

Sep 9, 2017, 3:33:40 PM

to Samuel Falvo II, Bruce Hoult, Alex Bradbury, RISC-V SW Dev

OCaml on x86 also used to ignore the (semi-documented) 16 byte stack
alignment.

However it caused very strange and difficult to debug problems on
paths which called from OCaml -> C. GCC and CLANG -- which assume the
16 byte alignment -- would use some x86 instructions that require
alignment and that would segfault because the assumption was sometimes
incorrect. (eg: https://caml.inria.fr/mantis/view.php?id=5700#c10779
https://caml.inria.fr/mantis/view.php?id=6038)

Of course if your compiled code can never call C then you should be OK.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/

Bruce Hoult

Sep 9, 2017, 4:09:14 PM

to Richard W.M. Jones, Samuel Falvo II, Alex Bradbury, RISC-V SW Dev

Note that I'm talking about doing this only on CPUs that *don't* have any instructions requiring large alignment.

For example an RV32 machine without hardware FP or with hardware single precision but not double precision has no need of 8 byte alignment, let alone 16 byte.

I bow to the argument that even if the function call ABI is soft FP, hardware FP may be present on the CPU and used for calculations within the function, and loading and storing from structures or arrays in memory.

That's why the important consideration is the arch, not the abi, at least for functions that are not exported.


Cesar Eduardo Barros

Sep 9, 2017, 9:27:49 PM

to Andrew Waterman, David Chisnall, Alex Bradbury, Bruce Hoult, RISC-V SW Dev, Samuel Falvo II, Stefan O'Rear

On 08-09-2017 13:59, Andrew Waterman wrote:
> Good points, David.
>
> A compromise is to keep the ABI stack alignment, and add a
> stack-alignment flag, so embedded developers can reduce it below the ABI
> requirement.

The problem with adding a flag is that users might compile code with the
flag, which works *for now*... until some dynamic library gets upgraded,
and starts using new instructions which require the ABI-specified alignment.

This has recently happened with Fedora: the 32-bit x86 C library was
upgraded to a newer version, and this new version used aligned vector
instructions to the stack. That broke binary-only programs which were
compiled assuming a smaller alignment. The solution was to disable
modern vector instructions for the 32-bit x86 C library (which exists
mostly for compatibility, modern Fedora is 64-bit).

IMO, the ideal minimum stack alignment for RISC-V should be 2*XLEN,
since a future extension might want to implement double-wide LR/SC (or
double-wide CAS). For RV32, however, I'd argue that the ideal minimum
stack alignment should also be the same as RV64, since RV32G also has
64-bit registers (for floating-point). And as another potential
argument, a packed vector with 4 32-bit elements would also be naturally
aligned when using a 16-byte alignment. Therefore, it seems to me that
16-byte stack alignment is a sweet spot.

--
Cesar Eduardo Barros
ces...@cesarb.eti.br

Bruce Hoult

Sep 10, 2017, 4:36:25 AM

to Cesar Eduardo Barros, Andrew Waterman, David Chisnall, Alex Bradbury, RISC-V SW Dev, Samuel Falvo II, Stefan O'Rear

On Sun, Sep 10, 2017 at 4:27 AM, Cesar Eduardo Barros <ces...@cesarb.eti.br> wrote:

> On 08-09-2017 13:59, Andrew Waterman wrote:
> > Good points, David.
> >
> > A compromise is to keep the ABI stack alignment, and add a stack-alignment flag, so embedded developers can reduce it below the ABI requirement.
>
> The problem with adding a flag is that users might compile code with the flag, which works *for now*... until some dynamic library gets upgraded, and starts using new instructions which require the ABI-specified alignment.
>
> This has recently happened with Fedora: the 32-bit x86 C library was upgraded to a newer version, and this new version used aligned vector instructions to the stack. That broke binary-only programs which were compiled assuming a smaller alignment. The solution was to disable modern vector instructions for the 32-bit x86 C library (which exists mostly for compatibility, modern Fedora is 64-bit).

An alternative solution would be for such functions to re-align the stack to their needs. This only requires ANDing the SP with ~0xf and saving the old SP value and restoring it instead of assuming you know how much to add to restore it.

The affected functions are certainly uncommon, and probably quite heavy, so neither the static nor dynamic number of instructions executed should be much affected, as a percentage.
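
The realignment described above can be modelled in a few lines. This is a
sketch of the address arithmetic only, assuming a downward-growing stack;
the function names and example addresses are made up, and a real prologue
would keep the old SP in a register such as the frame pointer.

```python
def realign_prologue(sp, frame_size, align=16):
    """Allocate a frame and force `align`-byte alignment.

    Returns (new_sp, saved_sp). The saved value is needed because the
    amount removed by the AND is not known at compile time.
    """
    saved_sp = sp
    new_sp = (sp - frame_size) & ~(align - 1)
    return new_sp, saved_sp


def realign_epilogue(saved_sp):
    """Restore SP from the saved copy instead of adding a constant back."""
    return saved_sp


if __name__ == "__main__":
    # Caller only guarantees 4-byte alignment.
    new_sp, saved = realign_prologue(0x7FFC, 20)
    print(hex(new_sp), hex(saved))
    assert new_sp % 16 == 0
    assert realign_epilogue(saved) == 0x7FFC
```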

> IMO, the ideal minimum stack alignment for RISC-V should be 2*XLEN, since a future extension might want to implement double-wide LR/SC (or double-wide CAS). For RV32, however, I'd argue that the ideal minimum stack alignment should also be the same as RV64, since RV32G also has 64-bit registers (for floating-point). And as another potential argument, a packed vector with 4 32-bit elements would also be naturally aligned when using a 16-byte alignment. Therefore, it seems to me that 16-byte stack alignment is a sweet spot.

Neither possible future ISA extensions nor possible future library versions apply to statically linked programs shipped in ROM.

It's been said many times that there is an assumption any RISC-V system running Linux will be 64 bit and 32 bit is mostly for embedded use.

David Chisnall

Sep 10, 2017, 5:10:10 AM

to Bruce Hoult, Richard W.M. Jones, Samuel Falvo II, Alex Bradbury, RISC-V SW Dev

On 9 Sep 2017, at 21:09, Bruce Hoult <br...@hoult.org> wrote:
>
> Note that I'm talking about doing this only on CPUs that *don't* have any instructions requiring large alignment.

The problem with this argument is that very few binaries (outside of firmware and the purely embedded space where the compiler is free to disregard the ABI entirely) ever target a CPU, they target an ISA and they need to interoperate with code that targets a superset of that ISA.

The case of functions that allocate their own on-stack space is the easy one to fix. You can forcibly realign the stack (though, as I’ve said previously, this has nontrivial overhead) for these and they’ll work. The bigger problem comes from functions that take buffers from elsewhere. If the buffers are allocated on the stack then you either rely on the caller to align them adequately, or you have slow-path code to handle incorrect alignment. If your caller has an 8-byte stack alignment guarantee and your callee expects the on-stack buffer to be 16-byte aligned then it will work on average 50% of the time. This kind of thing is *incredibly* hard to debug[1] and causes all sorts of failures.
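
The "50% of the time" figure can be checked by counting. The sketch below
assumes buffer addresses are equally likely to fall on any multiple of
the caller's guaranteed alignment; the function name is hypothetical.

```python
def fraction_aligned(guaranteed, required, span=1024):
    """Fraction of `guaranteed`-aligned addresses that are also
    `required`-aligned, over a window of `span` bytes."""
    candidates = range(0, span, guaranteed)
    hits = sum(1 for addr in candidates if addr % required == 0)
    return hits / len(candidates)


if __name__ == "__main__":
    print(fraction_aligned(8, 16))   # caller guarantees 8, callee wants 16
    print(fraction_aligned(4, 16))   # caller guarantees only 4
    print(fraction_aligned(16, 16))  # guarantees match
```

With an 8-byte guarantee half of the possible addresses happen to be
16-byte aligned, and with a 4-byte guarantee only a quarter are, which is
why such failures tend to appear intermittently.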

TL;DR: Increasing the alignment requirements later is really, really hard.

The embedded argument continues to not make sense, because embedded code is completely free to use a different ABI per function call and a modern compiler doing link-time optimisation and interprocedural register allocation (both of which result in smaller binaries and less stack usage, so should be encouraged for embedded) will do precisely this.

Constraining an ABI that needs to be stable for a decade or more to work around limitations in specific embedded toolchains seems like a terrible idea.

David

[1] I speak from experience here, because we’ve moved from 16-byte to 32-byte stack alignment in the last few years. The number of times where the debugging environment happens to get 32-byte alignment, but someone else manages to get 16-byte alignment, is quite large.

Bruce Hoult

Sep 10, 2017, 5:19:21 AM

to David Chisnall, Richard W.M. Jones, Samuel Falvo II, Alex Bradbury, RISC-V SW Dev

On Sun, Sep 10, 2017 at 12:09 PM, David Chisnall <David.C...@cl.cam.ac.uk> wrote:

> On 9 Sep 2017, at 21:09, Bruce Hoult <br...@hoult.org> wrote:
>>
>> Note that I'm talking about doing this only on CPUs that *don't* have any instructions requiring large alignment.
>
> The problem with this argument is that very few binaries (outside of firmware and the purely embedded space where the compiler is free to disregard the ABI entirely) ever target a CPU, they target an ISA and they need to interoperate with code that targets a superset of that ISA

I'm talking about firmware and the embedded space.

gcc currently does not have any way to tell it to not use ABI alignment.

If I (for example) submit a patch to add a flag to tell it to use a smaller alignment then I'll likely be told "that violates the ABI in the architecture manual".

David Chisnall

Sep 10, 2017, 5:29:45 AM

to Bruce Hoult, Richard W.M. Jones, Samuel Falvo II, Alex Bradbury, RISC-V SW Dev

On other architectures, at least, gcc will happily violate the ABI for any static functions or functions with private linkage when performing link-time optimisation. The ABI is specifically for code that must interwork with other code over a long time; there is no requirement that it be used for purely internal calls (which, in much embedded code, means all function calls).

David

Richard W.M. Jones

Sep 10, 2017, 5:31:23 AM

to David Chisnall, Bruce Hoult, Samuel Falvo II, Alex Bradbury, RISC-V SW Dev

On Sun, Sep 10, 2017 at 10:09:59AM +0100, David Chisnall wrote:
> [1] I speak from experience here, because we’ve moved from 16-byte
> to stack 32-byte alignment in the last few years

Out of interest, which platform has a 32 byte alignment requirement?

> the number of
> times where the debugging environment happens to get 32-byte
> alignment, but someone else manages to get 16-byte alignment is
> quite large.

This sums up the problem I had tracking down that OCaml bug very nicely :-)

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com

virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top

David Chisnall

Sep 10, 2017, 5:40:04 AM

to Richard W.M. Jones, Bruce Hoult, Samuel Falvo II, Alex Bradbury, RISC-V SW Dev

On 10 Sep 2017, at 10:31, Richard W.M. Jones <rjo...@redhat.com> wrote:
>
> On Sun, Sep 10, 2017 at 10:09:59AM +0100, David Chisnall wrote:
>> [1] I speak from experience here, because we’ve moved from 16-byte
>> to stack 32-byte alignment in the last few years
>
> Out of interest, which platform has a 32 byte alignment requirement?

Our experimental research architecture (http://cheri-cpu.org) requires pointers to be 32-byte aligned and in our current implementation is a superset of MIPS, which has a 16-byte stack alignment.

MIPS is actually a really good example of this issue, because the o32 ABI, as a result of having fairly tight initial stack alignment requirements, actually mandates passing vector arguments with incorrect alignment, which requires that the caller and the callee both spill to the stack and then use integer (non-vector) loads and stores to access the argument save area, because the vector extension doesn’t allow vector loads and stores to the place that the ABI requires.

>
>> the number of
>> times where the debugging environment happens to get 32-byte
>> alignment, but someone else manages to get 16-byte alignment is
>> quite large.
>
> This sums up the problem I had tracking down that OCaml bug very nicely :-)

At least the OCaml case is relatively easy to fix, because you can forcibly realign the stack before calling C. Forth does this as well, with an internal ABI that has very weak alignment (Forth can use a lot of very small stack frames and so really doesn’t want to use the C ABI) and a little stub that realigns the stack before setting up the call frame for invoking C.
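
The realignment step itself is just a mask on the stack pointer. A minimal Python sketch follows, assuming a downward-growing stack and a power-of-two alignment; the function name and the example address are hypothetical.

```python
def realign_down(sp, align):
    """Round a stack pointer down to a power-of-two alignment.

    Rounding down is the safe direction when the stack grows
    towards lower addresses, since it only reserves extra space.
    """
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a power of two")
    return sp & ~(align - 1)

sp = 0x7FFFFFE4                 # hypothetical, only 4-byte aligned
aligned = realign_down(sp, 16)
print(hex(aligned), sp - aligned)  # new sp and bytes of padding
```

A real stub also has to remember the original stack pointer so that it can be restored after the C call returns.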

David

Bruce Hoult

Sep 10, 2017, 6:16:05 AM

to David Chisnall, Richard W.M. Jones, Samuel Falvo II, Alex Bradbury, RISC-V SW Dev

I don't see it. Please explain how to make this happen. Recursive function print_uint saves only ra and s0 on the stack, but allocates a 16 byte stack frame for them. gcc is riscv32-unknown-elf-gcc (GCC) 6.1.0

----------

#include <stdio.h>

__attribute__ ((noinline))
static void myputchar(char c){
    putchar(c);
}

static void print_uint(unsigned u) {
    unsigned d = u/10;
    char ch = u - d*10 + '0';
    if (d != 0) print_uint(d);
    myputchar(ch);
}

int main(){
    print_uint(12345);
    printf("\ndone\n");
}

----------

$ riscv32-unknown-elf-gcc foo.c -o foo -Os -march=rv32imc -flto

----------

000101c2 <print_uint>:
   101c2: 1141      addi sp,sp,-16
   101c4: c422      sw   s0,8(sp)
   101c6: 4429      li   s0,10
   101c8: 87aa      mv   a5,a0
   101ca: 02855533  divu a0,a0,s0
   101ce: 03078793  addi a5,a5,48
   101d2: c606      sw   ra,12(sp)
   101d4: 02850433  mul  s0,a0,s0
   101d8: 40878433  sub  s0,a5,s0
   101dc: 0ff47413  andi s0,s0,255
   101e0: c111      beqz a0,101e4 <print_uint+0x22>
   101e2: 37c5      jal  101c2 <print_uint>
   101e4: 8522      mv   a0,s0
   101e6: 40b2      lw   ra,12(sp)
   101e8: 4422      lw   s0,8(sp)
   101ea: 0141      addi sp,sp,16
   101ec: b75d      j    10192 <myputchar>

----------
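
The frame size in the listing is what you get from rounding the saved bytes up to the stack alignment. A small Python sketch of that arithmetic, where the 8-byte case is hypothetical and the helper name is made up:

```python
def frame_size(saved_bytes, stack_align):
    """Round the bytes a function must save up to the stack alignment."""
    return (saved_bytes + stack_align - 1) // stack_align * stack_align

saved = 2 * 4  # ra and s0, 4 bytes each on RV32

print(frame_size(saved, 16))  # with the 16-byte alignment seen above
print(frame_size(saved, 8))   # with a hypothetical 8-byte alignment
```

With 16-byte alignment half of each print_uint frame is padding; with 8-byte alignment there would be none.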

--
You received this message because you are subscribed to the Google Groups "RISC-V SW Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sw-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to sw-...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/sw-dev/.

To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/sw-dev/94B797DF-B0F6-4BBA-8630-7CC4C50EB735%40cl.cam.ac.uk.

David Chisnall

Sep 10, 2017, 6:47:00 AM

to Bruce Hoult, Richard W.M. Jones, Samuel Falvo II, Alex Bradbury, RISC-V SW Dev

On 10 Sep 2017, at 11:16, Bruce Hoult <br...@hoult.org> wrote:
>
> Recursive function print_uint saves only ra and s0 on the stack, but allocates a 16 byte stack frame for them. gcc is riscv32-unknown-elf-gcc (GCC) 6.1.0

How is the fast calling convention defined for RISC-V gcc? What is its stack alignment?

David

Bruce Hoult

Sep 10, 2017, 6:53:45 AM

to David Chisnall, Richard W.M. Jones, Samuel Falvo II, Alex Bradbury, RISC-V SW Dev

This is not an x86. Calls are already designed properly. Adding "__attribute__((fastcall))" results in:

foo.c:9:1: warning: 'fastcall' attribute directive ignored [-Wattributes]

