Register-based ABI benchmarks


Didier Spezia

Feb 3, 2022, 10:19:16 AM
to golang-nuts
We are using our own benchmark to evaluate the performance of different CPU models of cloud providers.

One thing we have realized is that the results of such a benchmark can be biased by the version of the Go compiler.

For instance, the register-based ABI has a measurable positive impact on performance, but it arrived in different Go versions depending on the CPU architecture. When we run different versions of Go against the same code base on recent Intel and ARM CPUs, we get: https://github.com/AmadeusITGroup/cpubench1A/issues/8

It is about +10% throughput for x86_64 (from Go 1.16.13 -> 1.17.6) and +17% for Aarch64 (from Go 1.17.6 -> 1.18beta1). Yay!
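
For readers who want to isolate the calling-convention effect from the rest of a workload, here is a minimal sketch of a call-overhead micro-benchmark. It is not the cpubench1A code; the names `sum6`, `benchCalls`, and `sink` are made up for illustration. The idea is to build and run the same file with each Go toolchain on each machine and compare the ns/op figures.

```go
package main

import (
	"fmt"
	"runtime"
	"testing"
)

// sink keeps the result alive so the loop is not optimized away.
var sink int

// sum6 is a hypothetical callee with six integer arguments.
// The noinline directive forces a real call, so argument passing
// (stack vs. registers) is actually exercised.
//
//go:noinline
func sum6(a, b, c, d, e, f int) int {
	return a + b + c + d + e + f
}

// benchCalls measures the cost of calling sum6 in a tight loop.
func benchCalls(b *testing.B) {
	acc := 0
	for i := 0; i < b.N; i++ {
		acc += sum6(i, 1, 2, 3, 4, 5)
	}
	sink = acc
}

func main() {
	// testing.Benchmark runs the function outside of "go test".
	r := testing.Benchmark(benchCalls)
	fmt.Printf("%s %s: %d ns/op\n", runtime.Version(), runtime.GOARCH, r.NsPerOp())
}
```

A micro-benchmark like this only shows the per-call cost; the gain on a real code base depends on how call-heavy it is.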

It seems Aarch64 benefits more from the register-based ABI than x86_64.
I don't really see why. Does anyone have a clue?
Thanks.

Best regards,
Didier.

Robert Engels

Feb 3, 2022, 10:34:58 AM
to Didier Spezia, golang-nuts
ARM CPUs usually have a lot more registers to pass values in.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/0dae635a-768a-4cf4-ae05-84e294ca8745n%40googlegroups.com.

Ian Lance Taylor

Feb 3, 2022, 3:24:27 PM
to Didier Spezia, golang-nuts
On Thu, Feb 3, 2022 at 7:21 AM Didier Spezia <didi...@gmail.com> wrote:
>
> It seems Aarch64 benefits more from the register-based ABI than x86_64.
> I don't really see why. Does anyone have a clue?

My view is that the x86 architecture has fewer registers and has had a
massive decades-long investment in performance, so stack operations
are highly optimized in hardware, including things like forwarding
values stored in the stack by the caller to the retrieval from the
stack by the callee without waiting even for the memory cache. The
ARM architecture has more registers and has historically focused more
on power savings than on raw performance, so it has less optimization
on stack handling and benefits more from a smarter compiler.

In my experience testing compiler optimizations can be frustrating on
x86 because the hardware is just so good. Almost every other
processor architecture shows bigger benefits from compiler
optimizations.

Ian

Robert Engels

Feb 3, 2022, 8:28:17 PM
to Ian Lance Taylor, Didier Spezia, golang-nuts
+1. Sometimes compiler optimizations even make things worse, if they change the code patterns the chip is tuned to expect.


Didier Spezia

Feb 4, 2022, 4:01:22 AM
to Robert Engels, Ian Lance Taylor, golang-nuts
Thank you - it makes sense.

I thought there were plenty of registers for parameters, even for x86_64.

But strings, slices, interfaces, etc. each use multiple registers, so it does not take
many parameters before they spill to the stack.
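
To make this concrete, here is a rough sketch that counts the machine words behind a string, a slice, and an interface, then applies a simplified model of register assignment. The register counts (9 integer registers on amd64, 16 on arm64) come from the Go internal ABI document, not from this thread. The helper names `words` and `spilled` and the example signature are hypothetical, and the model ignores floating-point registers and the other rules of the real assignment algorithm.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Integer argument registers in Go's internal ABI (assumption taken
// from the Go internal ABI document): 9 on amd64, 16 on arm64.
const (
	amd64IntRegs = 9
	arm64IntRegs = 16
)

// wordSize is the size of one machine word on the current platform.
const wordSize = unsafe.Sizeof(uintptr(0))

// words converts a size in bytes to a number of machine words.
func words(size uintptr) int {
	return int((size + wordSize - 1) / wordSize)
}

// spilled is a simplified model: an argument goes entirely to the
// stack if its words do not fit in the remaining integer registers.
// It returns how many arguments end up on the stack.
func spilled(argWords []int, regs int) int {
	used, onStack := 0, 0
	for _, w := range argWords {
		if used+w <= regs {
			used += w
		} else {
			onStack++
		}
	}
	return onStack
}

func main() {
	var s string
	var b []byte
	var e error
	sw := words(unsafe.Sizeof(s)) // pointer + length
	bw := words(unsafe.Sizeof(b)) // pointer + length + capacity
	ew := words(unsafe.Sizeof(e)) // type word + data word
	fmt.Printf("string=%d slice=%d interface=%d words\n", sw, bw, ew)

	// Hypothetical signature: func f(a, b string, c, d []byte, e error)
	args := []int{sw, sw, bw, bw, ew}
	fmt.Println("amd64 arguments on stack:", spilled(args, amd64IntRegs))
	fmt.Println("arm64 arguments on stack:", spilled(args, arm64IntRegs))
}
```

Under this model, five ordinary-looking parameters already exceed the amd64 integer registers while still fitting on arm64.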

Regards,
Didier.