Seeking inputs for EABI design

Kito Cheng

Jan 18, 2022, 12:51:24 AM

to RISC-V SW Dev (sw-dev@groups.riscv.org)

Hi,

I am Kito from the RISC-V psABI group. We are currently discussing the
embedded ABI (EABI) and are seeking input on its design. Feel free
to tell us your thoughts and expectations for the EABI.

You can create an issue on the psABI spec repo
(https://github.com/riscv-non-isa/riscv-elf-psabi-doc) or just
reply to this mail :)

Here is the list we have collected from the community about the EABI.
Items in the list might *NOT* be included in the final EABI spec; please
let me know if any item is useful to you:

- Size of long double:
  - 128-bit floating point is generally unnecessary for embedded
    applications.
- Stack alignment:
  - 16-byte alignment is required because long double is 128 bits;
    however, this might be a waste for embedded applications.
- Use TP as a secondary GP to further reduce code size through linker
  relaxation.
- Pass 64-bit values in an odd-even register pair in RV32; the pair must
  start with an odd register.
  - Optimized for P-ext and Zdinx in RV32.
- No unified EABI: vendors can customize their own EABI, just as -march
  allows any valid combination.
  - Pros: RISC-V spirit, smörgåsbord on ABI.
  - Cons: Fragmentation, and hard to test all combinations.
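
To make the stack alignment item concrete, here is a minimal sketch in C of how frame padding depends on the required alignment. The function names, the 20-byte frame, and the 8-byte and 4-byte alternatives are illustrative assumptions, not values taken from any EABI draft.

```c
#include <assert.h>
#include <stddef.h>

/* Round a stack frame up to the next multiple of `align`.
   `align` must be a nonzero power of two. (Hypothetical helper.) */
size_t frame_size(size_t needed, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (needed + align - 1) & ~(align - 1);
}

/* Bytes of padding that the alignment rule adds to a frame. */
size_t frame_padding(size_t needed, size_t align)
{
    return frame_size(needed, align) - needed;
}
```

Under these assumptions, a function that needs 20 bytes of stack occupies 32 bytes with 16-byte alignment (12 bytes of padding), 24 bytes with 8-byte alignment, and exactly 20 bytes with 4-byte alignment.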

Jim Wilson

Jan 18, 2022, 1:37:00 AM

to Kito Cheng, RISC-V SW Dev (sw-dev@groups.riscv.org)

On Mon, Jan 17, 2022 at 9:51 PM Kito Cheng <kito....@sifive.com> wrote:

> I am Kito from the RISC-V psABI group, currently we are discussing the
> embedded ABI(EABI), and seeking inputs for the ABI design, feel free
> to tell us what your thoughts and expectations are for the EABI.

Krste made a proposal long ago.

https://github.com/riscv-non-isa/riscv-eabi-spec

It has some of the things you mentioned, but also reduces the number of arg registers so that the code can work on both rv32i and rv32e. Also, there was an attempt to reduce the number of registers that need to be saved on an interrupt, to provide faster interrupts.

Anders Berg of IAR made a counter proposal. There should be mentions of it on the tech-eabi list, but I'm not sure if it was explicitly written down. It was discussed in meetings a number of times. He thought that reducing the number of arg regs and saved regs was misguided. He argued for toolchain optimizations instead. But Anders' proposal assumes complete control of the programming environment, which is not true for LLVM or GCC, which need to work with multiple libraries and linkers. I don't think Anders' proposal could be easily implemented in LLVM or GCC. But it might be good for a true embedded compiler like IAR.

Anders also made an attempt to define a runtime ABI document, e.g. what we call libgcc in the GNU toolchain. This was based heavily on an ARM runtime ABI document, and would need extensive rewriting before we could use it. This is also on the tech-eabi mailing list. This for example defines a "float __rvabi_fadd(float, float)" function for adding two float values, which is what gcc calls __addsf3. GCC could easily be modified to use alternate names for these functions if we want to do that. It already does this for ARM targets. This is message #53 on the tech-eabi list, from Jan 6 2021.
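
As a rough illustration of the naming point, a runtime library could provide the function under the name quoted above with a plain forwarding definition. This is a sketch only; the body is an assumption and is not taken from the runtime ABI draft.

```c
/* Hypothetical forwarding definition for the name quoted from the
   runtime ABI draft. Names starting with a double underscore are
   reserved for the implementation, which is where a runtime library
   such as libgcc lives. */
float __rvabi_fadd(float a, float b)
{
    /* On a soft-float target GCC lowers this addition to a call to
       __addsf3; on a hard-float target it becomes an FP instruction. */
    return a + b;
}
```

Having the compiler emit the alternate name directly, as GCC already does for ARM targets, would avoid the extra call that a forwarder like this introduces.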

Anyways, I suggest contacting Anders Berg and including him in the discussion.

Whatever choices we make, I would suggest that there should be a trial implementation first, in either llvm or gcc, and some code size and performance benchmarking work done. I did some experiments with Krste's proposal, but I never had time to do a complete toolchain port (including libraries) so my results aren't convincing.

Jim

Kito Cheng

Jan 24, 2022, 9:06:21 AM

to Jim Wilson, RISC-V SW Dev (sw-dev@groups.riscv.org)

Hi Jim:

Thanks for your feedback!

> Krste made a proposal long ago.
> https://github.com/riscv-non-isa/riscv-eabi-spec
> It has some of the things you mentioned, but also reduces the number of arg registers so that the code can work on both rv32i and rv32e. Also, there was an attempt to reduce the number of registers that need to be saved on an interrupt, to provide faster interrupts.

Thanks for the reminder. I guess I should add those items to the
list, although I don't fully agree with some of the points :p

> Anders Berg of IAR made a counter proposal. There should be mentions of it on the tech-eabi list, but I'm not sure if it was explicitly written down. It was discussed in meetings a number of times. He thought that reducing the number of arg regs and saved regs was misguided. He argued for toolchain optimizations instead. But Anders' proposal makes assumptions about complete control of the programming environment which is not true for LLVM or GCC which need to work with multiple libraries and linkers. I don't think Anders' proposal could be easily implemented in LLVM or GCC. But it might be good for a true embedded compiler like IAR.

From my point of view, optimizing interrupt speed via the ABI seems
kind of weird, since that might punish non-interrupt routines.

> Anders also made an attempt to define a runtime ABI document, e.g. what we call libgcc in the GNU toolchain. This was based heavily on an ARM runtime ABI document, and would need extensive rewriting before we could use it. This is also on the tech-eabi mailing list. This for example defines a "float __rvabi_fadd(float, float)" function for adding two float values, which is what gcc calls __addsf3. GCC could easily be modified to use alternate names for these functions if we want to do that. It already does this for ARM targets. This is message #53 on the tech-eabi list, from Jan 6 2021.

Anders mentioned that on the last psABI call too. Honestly, I
hadn't noticed that it was an issue before, but I think it is more
than an EABI issue: we should have a runtime ABI in the psABI.

Bruce Hoult

Jan 24, 2022, 6:53:25 PM

to Kito Cheng, Jim Wilson, RISC-V SW Dev (sw-dev@groups.riscv.org)

On Tue, Jan 25, 2022 at 3:06 AM Kito Cheng <kito....@sifive.com> wrote:

> From my point of view, optimizing interrupt speed via ABI is kind of
> weird to me, since that might punish non-interrupt routines

Of course it does.

The standard ABI is designed to maximize the speed of main-line (non-interrupt) code. To the extent it succeeds in that, any deviation will result in slower code.

If I remember correctly, the Embench project found that RV32E using the simple truncated ABI (6 A, 2 S, 3 T vs 8 A, 12 S, 7 T) is about 1.3x slower than the standard ABI.

I suspect the EABI's proposed 4 A, 5 S, 2 T probably suffers about the same, but maybe a bit less as only having 2 S registers probably causes more spills than reducing A+T from 9 to 6. [1][2]

The purpose of RV32E is to save die space by reducing the register set by 512 bits, or 64 bytes. If the 1.3x code size increase is correct, and if registers are 2x more expensive than SRAM, then you have parity at 32 registers and 384 bytes of code in ITIM vs 16 registers and 512 bytes of code in ITIM. That makes RV32E probably very seldom worth it.
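
The parity figures above can be checked with a small C sketch. The function names are hypothetical, and the cost unit ("SRAM-byte equivalents") is an assumption made for illustration. Note that 384 bytes growing to 512 bytes is a factor of 4/3, so the "1.3x" in the text appears to be that ratio rounded.

```c
enum { BYTES_PER_REG = 4 };  /* RV32: each register is 32 bits */

/* Total cost in SRAM-byte equivalents: the register file, weighted by
   how much more expensive a register bit is than an SRAM bit, plus code. */
int area_cost(int num_regs, int code_bytes, int reg_cost_factor)
{
    return num_regs * BYTES_PER_REG * reg_cost_factor + code_bytes;
}

/* Code size (on the larger register file) at which dropping `regs_saved`
   registers breaks even, for a code growth of growth_num/growth_den.
   Derivation: code * (num/den - 1) == saved  =>  code == saved*den/(num-den). */
int break_even_code_bytes(int regs_saved, int reg_cost_factor,
                          int growth_num, int growth_den)
{
    int saved = regs_saved * BYTES_PER_REG * reg_cost_factor;
    return saved * growth_den / (growth_num - growth_den);
}
```

With a cost factor of 2, `area_cost(32, 384, 2)` and `area_cost(16, 512, 2)` both come to 640, and `break_even_code_bytes(16, 2, 4, 3)` gives 384. For programs larger than that, this model favors keeping all 32 registers.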

EABI is a bit different. The only purpose I can see for the proposed EABI is for users for whom interrupt latency is everything, and they have either just about no code at all running in the "main program" or else it is low priority background tasks. They are more than happy to give up 1.3x on main program execution speed (and code size) for the trade-off of getting 2 to 2.5 times quicker interrupt response.
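
A back-of-the-envelope C sketch of where a figure like "2 to 2.5 times" could come from, using the register counts quoted earlier in this message. It assumes interrupt entry cost scales with the number of A and T registers a handler must save before calling ordinary functions, ignoring ra and any fixed overhead. That is a simplification for illustration, not a measurement, and the type and function names are hypothetical.

```c
/* Register budget of a calling convention: argument (A), callee-saved (S)
   and temporary (T) registers, using the counts quoted in this thread. */
struct abi_regs {
    const char *name;
    int a, s, t;
};

const struct abi_regs STANDARD_ABI  = { "standard ABI",          8, 12, 7 };
const struct abi_regs RV32E_ABI     = { "truncated RV32E ABI",   6,  2, 3 };
const struct abi_regs PROPOSED_EABI = { "proposed EABI (RV32E)", 4,  5, 2 };

/* Registers a handler must save before it may call a normal function. */
int caller_saved(struct abi_regs r)
{
    return r.a + r.t;
}

/* How many times fewer registers `to` saves compared with `from`. */
double save_ratio(struct abi_regs from, struct abi_regs to)
{
    return (double)caller_saved(from) / (double)caller_saved(to);
}
```

Under this model the standard ABI saves 15 registers and the proposed EABI saves 6, a ratio of 2.5, which is consistent with the upper end of the estimate above.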

If your interrupt routines don't call any other functions then you're better off using the standard ABI and putting __attribute__((interrupt)) on the interrupt routines so they save only and exactly the registers they need.
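
A minimal sketch of such a leaf handler in C. The names `ISR`, `tick_count`, and `timer_isr` are hypothetical, and the empty fallback macro exists only so the sketch also compiles on a non-RISC-V host.

```c
/* Apply the interrupt attribute only when targeting RISC-V. */
#if defined(__riscv)
#define ISR __attribute__((interrupt))
#else
#define ISR
#endif

/* Shared with main-line code, hence volatile. */
volatile unsigned int tick_count;

/* Leaf handler: it calls no other functions, so the compiler saves only
   the few registers this body actually uses. Handlers with this
   attribute take no arguments and return void. */
ISR void timer_isr(void)
{
    tick_count++;
}
```

On a RISC-V target the handler is meant to be entered through the trap vector rather than called like an ordinary function, since it returns with a trap-return instruction.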

Presumably EABI users' interrupt handlers are at least somewhat complex, including calling other functions. That means they also will suffer the slowdown from having fewer A+T registers. At some point you'd be better off using the standard ABI and just saving all those 15 registers on interrupt entry (or on calling a non-__attribute__((interrupt)) function after determining "slow path" handling is needed).

No doubt there is a sweet spot between not calling any other functions and calling too many other functions where the proposed EABI (or something like it) is the best solution. It might be a very small sweet spot.

Regardless of the actual overall performance truth, there is a marketing advantage to being able to tell prospective customers (or their management) that RISC-V has an EABI with similar interrupt latency to the ABI on competitor CPUs. Even if in the end the project engineers don't actually choose to use it.

From this point of view the right number of A registers is clearly four, the same as the competition, ESPECIALLY as many people have designed software around four function arguments (no more, no fewer) being supported efficiently. The question then becomes whether two T registers is too many, given that the competitor only has to save one (r12). Having two T registers quite likely (I think) results in overall faster code -- but then so does just not using the EABI in the first place, in most cases.

I think for marketing reasons we should have the RISC-V EABI mimic the competitor ABI as closely as possible, and be available and supported by the tools, even if almost no-one should end up actually using it.

[1] RV32I using the proposed EABI of course has a large number of S registers.

[2] Of course this should be tested. I was ready and advocating to do this work in late 2018 but it was deemed to have insufficient priority compared to adding initial V support to Spike (and other later tasks).
