Keith Randall submitted the change![Open in Gerrit]()
Change information
Commit message:
cmd/compile: use 128-bit arm64 vector ops for Move expansion
Update Move rewrite rules to use FMOVQload/store and FLDPQ/FSTPQ
for medium-sized copies (16-64 bytes). This generates fewer and
wider instructions than the previous approach using LDP/STP pairs.
Executable Base .text go1 Change
----------------------------------------------------
asm 2112308 2105732 -0.31%
cgo 1826132 1823172 -0.16%
compile 10474868 10460644 -0.14%
cover 1990036 1985748 -0.22%
fix 3234116 3226340 -0.24%
link 2702628 2695316 -0.27%
preprofile 947652 947028 -0.07%
vet 3140964 3133524 -0.24%
Performance effect on OrangePi 6 plus:
│ orig.out │ movq.out │
│ sec/op │ sec/op vs base │
CopyFat16 0.4711n ± 0% 0.3852n ± 0% -18.23% (p=0.000 n=10)
CopyFat17 0.7705n ± 0% 0.7705n ± 0% ~ (p=0.984 n=10)
CopyFat18 0.7703n ± 0% 0.7703n ± 0% ~ (p=0.771 n=10)
CopyFat19 0.7703n ± 0% 0.7703n ± 0% ~ (p=0.637 n=10)
CopyFat20 0.7703n ± 0% 0.7704n ± 0% ~ (p=0.103 n=10)
CopyFat21 0.7703n ± 0% 0.7708n ± 0% ~ (p=0.505 n=10)
CopyFat22 0.7704n ± 0% 0.7705n ± 0% ~ (p=0.589 n=10)
CopyFat23 0.7703n ± 0% 0.7703n ± 0% ~ (p=0.347 n=10)
CopyFat24 0.7704n ± 0% 0.7703n ± 0% ~ (p=0.383 n=10)
CopyFat25 0.8385n ± 0% 0.6589n ± 0% -21.41% (p=0.000 n=10)
CopyFat26 0.8386n ± 0% 0.6590n ± 0% -21.42% (p=0.000 n=10)
CopyFat27 0.8385n ± 0% 0.6590n ± 0% -21.41% (p=0.000 n=10)
CopyFat28 0.8386n ± 0% 0.6571n ± 0% -21.65% (p=0.000 n=10)
CopyFat29 0.8385n ± 0% 0.6590n ± 0% -21.41% (p=0.000 n=10)
CopyFat30 0.8387n ± 0% 0.6591n ± 0% -21.42% (p=0.000 n=10)
CopyFat31 0.8385n ± 0% 0.6589n ± 0% -21.42% (p=0.000 n=10)
CopyFat32 0.8318n ± 0% 0.4969n ± 0% -40.26% (p=0.000 n=10)
CopyFat33 1.1550n ± 0% 0.7705n ± 0% -33.29% (p=0.000 n=10)
CopyFat34 1.1560n ± 0% 0.7703n ± 0% -33.37% (p=0.000 n=10)
CopyFat35 1.1550n ± 0% 0.7705n ± 0% -33.29% (p=0.000 n=10)
CopyFat36 1.1550n ± 0% 0.7704n ± 0% -33.30% (p=0.000 n=10)
CopyFat37 1.1555n ± 0% 0.7704n ± 0% -33.33% (p=0.000 n=10)
CopyFat38 1.1550n ± 0% 0.7704n ± 0% -33.30% (p=0.000 n=10)
CopyFat39 1.1560n ± 0% 0.7703n ± 0% -33.36% (p=0.000 n=10)
CopyFat40 1.0020n ± 0% 0.7705n ± 0% -23.10% (p=0.000 n=10)
CopyFat41 1.2060n ± 0% 0.7703n ± 0% -36.12% (p=0.000 n=10)
CopyFat42 1.2060n ± 0% 0.7704n ± 0% -36.12% (p=0.000 n=10)
CopyFat43 1.2060n ± 0% 0.7705n ± 0% -36.11% (p=0.000 n=10)
CopyFat44 1.2060n ± 0% 0.7704n ± 0% -36.12% (p=0.000 n=10)
CopyFat45 1.2060n ± 0% 0.7704n ± 0% -36.12% (p=0.000 n=10)
CopyFat46 1.2060n ± 0% 0.7703n ± 0% -36.13% (p=0.000 n=10)
CopyFat47 1.2060n ± 0% 0.7703n ± 0% -36.12% (p=0.000 n=10)
CopyFat48 1.2060n ± 0% 0.7703n ± 0% -36.13% (p=0.000 n=10)
CopyFat49 1.3620n ± 0% 0.8622n ± 0% -36.70% (p=0.000 n=10)
CopyFat50 1.3620n ± 0% 0.8621n ± 0% -36.70% (p=0.000 n=10)
CopyFat51 1.3620n ± 0% 0.8622n ± 0% -36.70% (p=0.000 n=10)
CopyFat52 1.3620n ± 0% 0.8623n ± 0% -36.69% (p=0.000 n=10)
CopyFat53 1.3620n ± 0% 0.8621n ± 0% -36.70% (p=0.000 n=10)
CopyFat54 1.3620n ± 0% 0.8622n ± 0% -36.70% (p=0.000 n=10)
CopyFat55 1.3620n ± 0% 0.8620n ± 0% -36.71% (p=0.000 n=10)
CopyFat56 1.3120n ± 0% 0.8622n ± 0% -34.28% (p=0.000 n=10)
CopyFat57 1.5905n ± 0% 0.8621n ± 0% -45.80% (p=0.000 n=10)
CopyFat58 1.5830n ± 1% 0.8622n ± 0% -45.53% (p=0.000 n=10)
CopyFat59 1.5865n ± 1% 0.8621n ± 0% -45.66% (p=0.000 n=10)
CopyFat60 1.5720n ± 1% 0.8622n ± 0% -45.15% (p=0.000 n=10)
CopyFat61 1.5900n ± 1% 0.8621n ± 0% -45.78% (p=0.000 n=10)
CopyFat62 1.5890n ± 0% 0.8622n ± 0% -45.74% (p=0.000 n=10)
CopyFat63 1.5900n ± 1% 0.8620n ± 0% -45.78% (p=0.000 n=10)
CopyFat64 1.5440n ± 0% 0.8568n ± 0% -44.51% (p=0.000 n=10)
geomean 1.093n 0.7636n -30.13%
Kunpeng 920C:
goos: linux
goarch: arm64
pkg: runtime
│ orig.out │ movq.out │
│ sec/op │ sec/op vs base │
CopyFat16 0.4892n ± 1% 0.5072n ± 0% +3.68% (p=0.000 n=10)
CopyFat17 0.6394n ± 0% 0.4638n ± 0% -27.47% (p=0.000 n=10)
CopyFat18 0.6394n ± 0% 0.4638n ± 0% -27.46% (p=0.000 n=10)
CopyFat19 0.6395n ± 0% 0.4638n ± 0% -27.48% (p=0.000 n=10)
CopyFat20 0.6393n ± 0% 0.4638n ± 0% -27.45% (p=0.000 n=10)
CopyFat21 0.6394n ± 0% 0.4637n ± 0% -27.48% (p=0.000 n=10)
CopyFat22 0.6395n ± 0% 0.4638n ± 0% -27.47% (p=0.000 n=10)
CopyFat23 0.6395n ± 0% 0.4638n ± 0% -27.47% (p=0.000 n=10)
CopyFat24 0.6091n ± 0% 0.4639n ± 0% -23.84% (p=0.000 n=10)
CopyFat25 0.9109n ± 0% 0.4674n ± 0% -48.69% (p=0.000 n=10)
CopyFat26 0.9107n ± 0% 0.4674n ± 0% -48.68% (p=0.000 n=10)
CopyFat27 0.9108n ± 0% 0.4674n ± 0% -48.69% (p=0.000 n=10)
CopyFat28 0.9109n ± 0% 0.4674n ± 0% -48.69% (p=0.000 n=10)
CopyFat29 0.9110n ± 0% 0.4673n ± 0% -48.70% (p=0.000 n=10)
CopyFat30 0.9109n ± 0% 0.4673n ± 0% -48.70% (p=0.000 n=10)
CopyFat31 0.9110n ± 0% 0.4674n ± 0% -48.69% (p=0.000 n=10)
CopyFat32 0.6845n ± 0% 0.4845n ± 1% -29.21% (p=0.000 n=10)
CopyFat33 0.9130n ± 0% 0.9117n ± 0% -0.14% (p=0.000 n=10)
CopyFat34 0.9131n ± 0% 0.9118n ± 0% -0.14% (p=0.001 n=10)
CopyFat35 0.9131n ± 0% 0.9117n ± 0% -0.15% (p=0.001 n=10)
CopyFat36 0.9129n ± 0% 0.9117n ± 0% -0.14% (p=0.003 n=10)
CopyFat37 0.9129n ± 0% 0.9117n ± 0% -0.14% (p=0.000 n=10)
CopyFat38 0.9130n ± 0% 0.9118n ± 0% -0.14% (p=0.000 n=10)
CopyFat39 0.9131n ± 0% 0.9118n ± 0% -0.15% (p=0.000 n=10)
CopyFat40 0.9112n ± 0% 0.9118n ± 0% +0.07% (p=0.027 n=10)
CopyFat41 1.1390n ± 0% 0.9118n ± 0% -19.95% (p=0.000 n=10)
CopyFat42 1.1390n ± 0% 0.9118n ± 0% -19.95% (p=0.000 n=10)
CopyFat43 1.1390n ± 0% 0.9116n ± 0% -19.96% (p=0.000 n=10)
CopyFat44 1.1390n ± 0% 0.9119n ± 0% -19.94% (p=0.000 n=10)
CopyFat45 1.1390n ± 0% 0.9118n ± 0% -19.95% (p=0.000 n=10)
CopyFat46 1.1390n ± 0% 0.9118n ± 0% -19.95% (p=0.000 n=10)
CopyFat47 1.1390n ± 0% 0.9117n ± 0% -19.96% (p=0.000 n=10)
CopyFat48 0.9111n ± 0% 0.9116n ± 0% +0.06% (p=0.002 n=10)
CopyFat49 1.2160n ± 0% 0.9292n ± 0% -23.59% (p=0.000 n=10)
CopyFat50 1.2160n ± 0% 0.9302n ± 0% -23.50% (p=0.000 n=10)
CopyFat51 1.2160n ± 0% 0.9292n ± 0% -23.59% (p=0.000 n=10)
CopyFat52 1.2160n ± 0% 0.9302n ± 0% -23.50% (p=0.000 n=10)
CopyFat53 1.2160n ± 0% 0.9293n ± 0% -23.58% (p=0.000 n=10)
CopyFat54 1.2160n ± 0% 0.9302n ± 0% -23.50% (p=0.000 n=10)
CopyFat55 1.2160n ± 0% 0.9292n ± 0% -23.59% (p=0.000 n=10)
CopyFat56 1.1480n ± 0% 0.9303n ± 0% -18.96% (p=0.000 n=10)
CopyFat57 1.3690n ± 0% 0.9293n ± 0% -32.12% (p=0.000 n=10)
CopyFat58 1.3690n ± 0% 0.9303n ± 0% -32.05% (p=0.000 n=10)
CopyFat59 1.3690n ± 0% 0.9293n ± 0% -32.12% (p=0.000 n=10)
CopyFat60 1.3690n ± 0% 0.9303n ± 0% -32.05% (p=0.000 n=10)
CopyFat61 1.3690n ± 0% 0.9293n ± 0% -32.12% (p=0.000 n=10)
CopyFat62 1.3690n ± 0% 0.9303n ± 0% -32.05% (p=0.000 n=10)
CopyFat63 1.3690n ± 0% 0.9293n ± 0% -32.12% (p=0.000 n=10)
CopyFat64 1.1470n ± 0% 0.5742n ± 0% -49.94% (p=0.000 n=10)
geomean 0.9710n 0.7214n -25.70%
Change-Id: Iecfe52fde1d431a1e4503cd848813a67f3896512
Files:
- M src/cmd/compile/internal/ssa/_gen/ARM64.rules
- M src/cmd/compile/internal/ssa/rewriteARM64.go
Change size: L
Delta: 2 files changed, 80 insertions(+), 176 deletions(-)
Branch: refs/heads/master
Submit Requirements:
Code-Review: +1 by Cherry Mui, +2 by Keith Randall, +1 by Keith Randall
TryBots-Pass: LUCI-TryBot-Result+1 by Go LUCI
Open in Gerrit
Gerrit-MessageType: merged
Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: Iecfe52fde1d431a1e4503cd848813a67f3896512
Gerrit-Change-Number: 738261
Gerrit-PatchSet: 6