About supporting new instructions of ARMv8.1


Wei....@arm.com

Oct 26, 2017, 6:31:05 AM
to golang-dev
As you know, the Arm architecture is evolving constantly and quickly. The current GOARCH="arm64" targets ARMv8.0, and many Arm server CPUs have begun to support ARMv8.1, which adds many new features.
A typical example is the new set of atomic instructions. Take the intrinsic atomic.AddInt64: with ARMv8.0, the following 4 instructions are needed to implement it:

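  LDAXR        (Rarg0), Rout         // load-acquire exclusive: Rout = *Rarg0
  ADD          Rarg1, Rout           // Rout += delta
  STLXR        Rout, (Rarg0), Rtmp   // store-release exclusive; Rtmp = 0 on success
  CBNZ         Rtmp, -3(PC)          // store failed: retry from LDAXR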
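To make the shape of that loop concrete, here is a minimal Go sketch of the same read-modify-retry pattern. It is an analogy only: atomic.LoadInt64 plus atomic.CompareAndSwapInt64 stand in for the exclusive load/store pair, and the helper name addRetryLoop is hypothetical, not what gc actually emits.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// addRetryLoop (hypothetical helper) mirrors the ARMv8.0 sequence: observe the
// current value, compute the sum, try to publish it, and start over if another
// writer got in first. CompareAndSwapInt64 stands in for LDAXR/STLXR.
func addRetryLoop(addr *int64, delta int64) int64 {
	for {
		old := atomic.LoadInt64(addr) // like LDAXR: observe the current value
		sum := old + delta            // like ADD
		if atomic.CompareAndSwapInt64(addr, old, sum) { // like a successful STLXR
			return sum // new value, matching atomic.AddInt64
		}
		// like CBNZ: the store lost a race, so retry from the load
	}
}

func main() {
	var counter int64
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				addRetryLoop(&counter, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(counter) // 8000
}
```

Under contention the loop may run several times per call, which is part of why a single atomic instruction scales better.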


With ARMv8.1, however, only 1 instruction is needed to implement it:

  LDADDA    (Rarg0), Rtmp, Rout
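One detail worth keeping in mind: LDADD-family instructions return the value that was in memory before the addition, while atomic.AddInt64 returns the new sum, so the new value is recovered with one ordinary (non-atomic) add. A minimal sketch of that relationship, where fetchAdd is a hypothetical helper built on atomic.AddInt64 purely for illustration:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// fetchAdd (hypothetical) models the fetch-and-add result an LDADD-style
// instruction produces: the value held in memory *before* the addition.
func fetchAdd(addr *int64, delta int64) (old int64) {
	return atomic.AddInt64(addr, delta) - delta
}

func main() {
	var x int64 = 10
	old := fetchAdd(&x, 5)
	// atomic.AddInt64 reports the new value, so a fetch-and-add based
	// lowering needs one extra plain add to produce it.
	newVal := old + 5
	fmt.Println(old, newVal, atomic.LoadInt64(&x)) // 10 15 15
}
```

The retry loop disappears either way; only the final register arithmetic differs.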


So we are considering supporting these new ARMv8.1 instructions for better performance and scalability. Broadly, there are 2 solutions:

Solution 1: add a variant, like the variants of GOARCH="arm". We can reuse GOARM (https://github.com/golang/go/wiki/GoArm) or add a new setting to specify the CPU variant at compile time.

Solution 2: dynamic feature detection, as has already been done for some optional ARMv8.0 features (such as CRC). The detection happens when the atomic package is imported.
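Solution 2 can be sketched in Go as an init function that probes the CPU once and installs one of two implementations behind a function variable. This is a sketch under assumptions: detectAtomics, addImpl, addLSE and addLLSC are hypothetical names, the probe is hard-coded so the example runs anywhere (real code might consult cpu.ARM64.HasATOMICS from golang.org/x/sys/cpu), and both bodies are portable stand-ins for the real instruction sequences.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// detectAtomics is a hypothetical placeholder for a run-time CPU feature
// probe; it returns false here so the sketch runs on any machine.
func detectAtomics() bool { return false }

// addImpl is chosen once, at package initialization; every call then goes
// through this function variable (an indirect call).
var addImpl func(addr *int64, delta int64) int64

// addLSE stands in for the single ARMv8.1 atomic instruction.
func addLSE(addr *int64, delta int64) int64 {
	return atomic.AddInt64(addr, delta)
}

// addLLSC stands in for the ARMv8.0 exclusive load/store retry loop.
func addLLSC(addr *int64, delta int64) int64 {
	for {
		old := atomic.LoadInt64(addr)
		if atomic.CompareAndSwapInt64(addr, old, old+delta) {
			return old + delta
		}
	}
}

func init() {
	// Runs when the package is initialized, analogous to detection at import time.
	if detectAtomics() {
		addImpl = addLSE
	} else {
		addImpl = addLLSC
	}
}

func main() {
	var n int64
	fmt.Println(addImpl(&n, 7)) // 7
}
```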


Obviously, solution 1 performs better, since the variant is selected statically at compile time and the compiler (gc) can inline the instruction into its caller. But it may cause other problems. For example, the OCI (Open Container Initiative) specifications refer to the architectures and variants supported by Go, so adding more arm64 variants may fragment container images.
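The static selection in solution 1 can be pictured as a branch on a compile-time constant: because the condition is known when the package is compiled, the compiler can fold it and drop the unused path. In this minimal sketch, hasLSE and addInt64 are hypothetical names; under a real variant scheme the constant would be fixed per build (for example by a GOARM-style setting), and atomic.AddInt64 and the CAS loop are stand-ins for the actual instruction sequences.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// hasLSE is a hypothetical compile-time switch, fixed per build variant.
// Since it is a constant, the branch below is resolved at compile time.
const hasLSE = true

func addInt64(addr *int64, delta int64) int64 {
	if hasLSE {
		// Stand-in for the single ARMv8.1 atomic instruction.
		return atomic.AddInt64(addr, delta)
	}
	// Stand-in for the ARMv8.0 exclusive load/store retry loop.
	for {
		old := atomic.LoadInt64(addr)
		if atomic.CompareAndSwapInt64(addr, old, old+delta) {
			return old + delta
		}
	}
}

func main() {
	var n int64
	fmt.Println(addInt64(&n, 2)) // 2
}
```

The cost of this approach is the one described above: a binary built with hasLSE = true only runs correctly on ARMv8.1 hardware, so each variant needs its own build.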

Solution 2 doesn't have the fragmentation problem, and users only need to build one binary for both ARMv8.0 and ARMv8.1 CPUs. But it can't achieve the best performance, since additional instructions (something like a C function pointer call) are needed to choose the right implementation at run time, and that overhead may be nontrivial. There could even be a performance regression on ARMv8.0 CPUs due to the extra overhead incurred by dynamic feature detection.
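Where that per-call overhead comes from can be seen in a variant of dynamic dispatch that tests a feature flag on every call instead of calling through a function pointer. The call stays direct, but each operation still pays a load and a conditional branch, and an ARMv8.0 CPU pays it and then runs the retry loop anyway. A minimal sketch, where hasAtomics and addChecked are hypothetical names and the flag is set by hand rather than by a real CPU probe:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// hasAtomics is a hypothetical flag that a real program would set once from a
// CPU feature probe at startup; here it is toggled by hand.
var hasAtomics = false

// addChecked tests the flag on every call: one load plus one branch of
// overhead, even on CPUs that will take the slow path.
func addChecked(addr *int64, delta int64) int64 {
	if hasAtomics {
		return atomic.AddInt64(addr, delta) // stand-in for the ARMv8.1 path
	}
	for { // stand-in for the ARMv8.0 retry loop
		old := atomic.LoadInt64(addr)
		if atomic.CompareAndSwapInt64(addr, old, old+delta) {
			return old + delta
		}
	}
}

func main() {
	var n int64
	addChecked(&n, 1) // slow path
	hasAtomics = true // simulate a CPU that reports the feature
	fmt.Println(addChecked(&n, 1)) // 2
}
```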


Is there any other solution that avoids fragmentation and still achieves the best performance? If not, which one is more suitable to implement?




David Crawshaw

Oct 26, 2017, 10:11:05 AM
to Wei....@arm.com, golang-dev
Do you know if ARMv8.1 is destined for distinct product lines? That
is, will consumer devices remain ARMv8.0, and servers be ARMv8.1, or
will consumer devices eventually be ARMv8.1?

If they are going to remain split, then both options are available, as
programmers will know at compile time which general category of chips
their binaries will run on. If, however, ARMv8.1 is a replacement for
8.0, then it will be much harder for programmers to target their
binaries, and your "solution 2", dynamic detection, has a significant
advantage.