[go] cmd/compile: pair NEON loads/stores

4 views
Skip to first unread message

Arseny Samoylov (Gerrit)

unread,
Jun 8, 2026, 8:42:04 AM (2 days ago) Jun 8
to goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

Arseny Samoylov has uploaded the change for review

Commit message

cmd/compile: pair NEON loads/stores
Change-Id: Id12674ae56af4c3610f1e18a6d85d169a784dcca

Change diff

diff --git a/src/cmd/compile/internal/ssa/pair.go b/src/cmd/compile/internal/ssa/pair.go
index be524dc..0d76237 100644
--- a/src/cmd/compile/internal/ssa/pair.go
+++ b/src/cmd/compile/internal/ssa/pair.go
@@ -24,14 +24,14 @@
pairStores(f)
}

-type pairableLoadInfo struct {
+type pairInfo struct {
width int64 // width of one element in the pair, in bytes
pair Op
}

// All pairableLoad ops must take 2 arguments, a pointer and a memory.
// They must also take an offset in Aux/AuxInt.
-var pairableLoads = map[Op]pairableLoadInfo{
+var pairableLoads = map[Op]pairInfo{
OpARM64MOVDload: {8, OpARM64LDP},
OpARM64MOVWUload: {4, OpARM64LDPW},
OpARM64MOVWload: {4, OpARM64LDPSW},
@@ -39,21 +39,20 @@
// if we knew the upper bits of one of them weren't being used.
OpARM64FMOVDload: {8, OpARM64FLDPD},
OpARM64FMOVSload: {4, OpARM64FLDPS},
-}
-
-type pairableStoreInfo struct {
- width int64 // width of one element in the pair, in bytes
- pair Op
+ // NEON loads
+ OpARM64FMOVQload: {16, OpARM64FLDPQ},
}

// All pairableStore keys must take 3 arguments, a pointer, a value, and a memory.
// All pairableStore values must take 4 arguments, a pointer, 2 values, and a memory.
// They must also take an offset in Aux/AuxInt.
-var pairableStores = map[Op]pairableStoreInfo{
+var pairableStores = map[Op]pairInfo{
OpARM64MOVDstore: {8, OpARM64STP},
OpARM64MOVWstore: {4, OpARM64STPW},
OpARM64FMOVDstore: {8, OpARM64FSTPD},
OpARM64FMOVSstore: {4, OpARM64FSTPS},
+ // NEON stores
+ OpARM64FMOVQstore: {16, OpARM64FSTPQ},
}

// offsetOk returns true if a pair instruction should be used
@@ -115,6 +114,8 @@
if off >= -512 && off <= 504 && off%8 == 0 {
return true
}
+ // If offsetOk is re-enabled in the future,
+ // width==16 (FLDPQ/FSTPQ) will need support here.
}
return false
}
@@ -400,12 +401,14 @@
// storeWidth returns the width of store,
// or 0 if it is not a store this pass understands.
storeWidth := func(op Op) int64 {
+ if info, ok := pairableStores[op]; ok {
+ return info.width
+ }
+
+ // We don't pair these stores, but returning zero here
+ // would flush the memory chain.
var width int64
switch op {
- case OpARM64MOVDstore, OpARM64FMOVDstore:
- width = 8
- case OpARM64MOVWstore, OpARM64FMOVSstore:
- width = 4
case OpARM64MOVHstore:
width = 2
case OpARM64MOVBstore:
@@ -413,6 +416,7 @@
default:
width = 0
}
+
return width
}

@@ -457,12 +461,15 @@
if v.Uses != 1 && len(memChain) > 0 ||
len(memChain) > 0 && (v.Args[0] != memChain[0].Args[0] || v.Aux != memChain[0].Aux) ||
len(memChain) == limit {
- // If v has multiple uses and it is not the latest store in the chain,
+ // 1. If v has multiple uses and it is not the latest store in the chain,
// we cannot merge it with other store instructions.
- // If v has a different base pointer or Aux value from the current chain,
+ //
+ // 2. If v has a different base pointer or Aux value from the current chain,
// we need to flush memChain and start a new one with v.
- // If memChain length limit is exceeded, we also need to flush the chain
+ //
+ // 3. If memChain length limit is exceeded, we also need to flush the chain
// and start a new one with v.
+ //
// Only look back so far.
// This keeps us in O(n) territory, and it
// also prevents us from keeping values
diff --git a/test/codegen/memcombine_simd.go b/test/codegen/memcombine_simd.go
new file mode 100644
index 0000000..46dfc7d
--- /dev/null
+++ b/test/codegen/memcombine_simd.go
@@ -0,0 +1,22 @@
+// asmcheck
+//go:build goexperiment.simd
+
+// Copyright 2026 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package codegen
+
+import "simd/archsimd"
+
+// TODO: Move these tests to memcombine.go when GOEXPERIMENT=simd becomes the default
+func dwloadInt64x2(p *struct{ a, b archsimd.Int64x2 }) (archsimd.Int64x2, archsimd.Int64x2) {
+ // arm64:"FLDPQ "
+ return p.a, p.b
+}
+
+func dwstoreInt64x2(p *struct{ a, b archsimd.Int64x2 }, a, b archsimd.Int64x2) {
+ // arm64:`FSTPQ\s\(F[0-9]+, F[0-9]+\), \(R[0-9]+\)`
+ p.a = a
+ p.b = b
+}

Change information

Files:
  • M src/cmd/compile/internal/ssa/pair.go
  • A test/codegen/memcombine_simd.go
Change size: M
Delta: 2 files changed, 44 insertions(+), 15 deletions(-)
Open in Gerrit

Related details

Attention set is empty
Submit Requirements:
  • requirement is not satisfiedCode-Review
  • requirement satisfiedNo-Unresolved-Comments
  • requirement is not satisfiedReview-Enforcement
  • requirement is not satisfiedTryBots-Pass
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
Gerrit-MessageType: newchange
Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: Id12674ae56af4c3610f1e18a6d85d169a784dcca
Gerrit-Change-Number: 787301
Gerrit-PatchSet: 1
Gerrit-Owner: Arseny Samoylov <samoylo...@gmail.com>
unsatisfied_requirement
satisfied_requirement
open
diffy

Arseny Samoylov (Gerrit)

unread,
Jun 8, 2026, 8:50:49 AM (2 days ago) Jun 8
to goph...@pubsubhelper.golang.org, Keith Randall, Denis Melnikov, Mark Freeman, golang-co...@googlegroups.com
Attention needed from Denis Melnikov, Keith Randall and Mark Freeman

Arseny Samoylov added 1 comment

Patchset-level comments
File-level comment, Patchset 1 (Latest):
Arseny Samoylov . resolved

Hi! The CL 760100 was the last on pair pass, so I've added reviewers from it.

Open in Gerrit

Related details

Attention is currently required from:
  • Denis Melnikov
  • Keith Randall
  • Mark Freeman
Submit Requirements:
  • requirement is not satisfiedCode-Review
  • requirement satisfiedNo-Unresolved-Comments
  • requirement is not satisfiedReview-Enforcement
  • requirement is not satisfiedTryBots-Pass
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
Gerrit-MessageType: comment
Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: Id12674ae56af4c3610f1e18a6d85d169a784dcca
Gerrit-Change-Number: 787301
Gerrit-PatchSet: 1
Gerrit-Owner: Arseny Samoylov <samoylo...@gmail.com>
Gerrit-Reviewer: Denis Melnikov <melnikov.denis...@gmail.com>
Gerrit-Reviewer: Keith Randall <k...@golang.org>
Gerrit-Reviewer: Mark Freeman <markf...@google.com>
Gerrit-Attention: Keith Randall <k...@golang.org>
Gerrit-Attention: Denis Melnikov <melnikov.denis...@gmail.com>
Gerrit-Attention: Mark Freeman <markf...@google.com>
Gerrit-Comment-Date: Mon, 08 Jun 2026 12:50:41 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
unsatisfied_requirement
satisfied_requirement
open
diffy

Arseny Samoylov (Gerrit)

unread,
Jun 8, 2026, 9:25:57 AM (2 days ago) Jun 8
to goph...@pubsubhelper.golang.org, Keith Randall, Denis Melnikov, Mark Freeman, golang-co...@googlegroups.com
Attention needed from Denis Melnikov, Keith Randall and Mark Freeman

Arseny Samoylov added 1 comment

Patchset-level comments
Arseny Samoylov . resolved

Note: this CL exposed that current master fails `test/codegen` on `arm64` with `GOEXPERIMENT=simd` enabled. I’ve reported it here: https://github.com/golang/go/issues/79899

Gerrit-Comment-Date: Mon, 08 Jun 2026 13:25:49 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
unsatisfied_requirement
satisfied_requirement
open
diffy

Keith Randall (Gerrit)

unread,
Jun 9, 2026, 5:29:22 PM (12 hours ago) Jun 9
to Arseny Samoylov, goph...@pubsubhelper.golang.org, Keith Randall, Denis Melnikov, Mark Freeman, golang-co...@googlegroups.com
Attention needed from Arseny Samoylov, Denis Melnikov and Mark Freeman

Keith Randall voted and added 1 comment

Votes added by Keith Randall

Code-Review+2
Commit-Queue+1

1 comment

File src/cmd/compile/internal/ssa/pair.go
Line 117, Patchset 1 (Latest): // If offsetOk is re-enabled in the future,
Keith Randall . unresolved

I believe FMOVQ instructions encode the offset as (7-bit signed) * 16, so we should probably have a width==16 case with off >= -1024 && off <= 1008 && off%16 == 0.

(Although there is a reasonable chance we will only be aligned to 8, So maybe that would prevent this optimization a fair amount of time? Not sure.)

Open in Gerrit

Related details

Attention is currently required from:
  • Arseny Samoylov
  • Denis Melnikov
  • Mark Freeman
Submit Requirements:
  • requirement satisfiedCode-Review
  • requirement is not satisfiedNo-Unresolved-Comments
  • requirement is not satisfiedReview-Enforcement
  • requirement is not satisfiedTryBots-Pass
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
Gerrit-MessageType: comment
Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: Id12674ae56af4c3610f1e18a6d85d169a784dcca
Gerrit-Change-Number: 787301
Gerrit-PatchSet: 1
Gerrit-Owner: Arseny Samoylov <samoylo...@gmail.com>
Gerrit-Reviewer: Denis Melnikov <melnikov.denis...@gmail.com>
Gerrit-Reviewer: Keith Randall <k...@golang.org>
Gerrit-Reviewer: Mark Freeman <markf...@google.com>
Gerrit-Attention: Arseny Samoylov <samoylo...@gmail.com>
Gerrit-Attention: Denis Melnikov <melnikov.denis...@gmail.com>
Gerrit-Attention: Mark Freeman <markf...@google.com>
Gerrit-Comment-Date: Tue, 09 Jun 2026 21:29:17 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: Yes
satisfied_requirement
unsatisfied_requirement
open
diffy
Reply all
Reply to author
Forward
0 new messages