[go] runtime/,internal/runtime/maps: rewrite memhash{32,64} using simd/archsimd intrinsics

Arseny Samoylov (Gerrit)

Mar 10, 2026, 1:28:06 PM
to goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

Arseny Samoylov has uploaded the change for review

Commit message

runtime/,internal/runtime/maps: rewrite memhash{32,64} using simd/archsimd intrinsics

This is a test CL to show the performance impact of using simd/archsimd intrinsics for hashing.

goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
                                                  │ base-merged.stat │          opt-merged.stat           │
                                                  │      sec/op      │    sec/op     vs base              │
MapAccessHit/Key=int32/Elem=int32/len=6-4         21.57n ±  0%   21.33n ±  1%    -1.11% (p=0.000 n=40)
MapAccessHit/Key=int32/Elem=int32/len=64-4        25.52n ±  0%   37.98n ±  0%   +48.82% (p=0.000 n=40)
MapAccessHit/Key=int32/Elem=int32/len=65536-4     41.12n ±  0%   66.48n ±  0%   +61.69% (p=0.000 n=40)
MapAccessHit/Key=int64/Elem=int64/len=6-4         21.38n ±  0%   21.53n ±  0%    +0.70% (p=0.000 n=40)
MapAccessHit/Key=int64/Elem=int64/len=64-4        25.37n ±  0%   37.59n ±  0%   +48.18% (p=0.000 n=40)
MapAccessHit/Key=int64/Elem=int64/len=65536-4     44.46n ±  0%   73.81n ±  0%   +66.03% (p=0.000 n=40)
MapAccessHit/Key=int32/Elem=bigType/len=6-4       104.8n ±  0%   105.2n ±  0%    +0.33% (p=0.002 n=40)
MapAccessHit/Key=int32/Elem=bigType/len=64-4      167.6n ±  2%   167.8n ±  3%         ~ (p=0.373 n=40)
MapAccessHit/Key=int32/Elem=bigType/len=65536-4   602.8n ±  0%   601.8n ±  0%    -0.18% (p=0.000 n=40)
MapAccessHit/Key=int32/Elem=*int32/len=6-4        21.94n ±  0%   21.69n ±  1%    -1.14% (p=0.000 n=40)
MapAccessHit/Key=int32/Elem=*int32/len=64-4       26.14n ±  1%   38.07n ±  0%   +45.61% (p=0.000 n=40)
MapAccessHit/Key=int32/Elem=*int32/len=65536-4    45.91n ±  0%   74.02n ±  0%   +61.23% (p=0.000 n=40)
MapAccessMiss/Key=int32/Elem=int32/len=6-4        23.49n ±  0%   23.50n ±  0%    +0.04% (p=0.004 n=40)
MapAccessMiss/Key=int32/Elem=int32/len=64-4       25.09n ±  2%   35.12n ±  2%   +39.98% (p=0.000 n=40)
MapAccessMiss/Key=int32/Elem=int32/len=65536-4    37.64n ±  0%   58.34n ±  0%   +55.01% (p=0.000 n=40)
MapAccessMiss/Key=int64/Elem=int64/len=6-4        22.93n ±  1%   23.48n ±  0%    +2.40% (p=0.000 n=40)
MapAccessMiss/Key=int64/Elem=int64/len=64-4       25.44n ±  2%   34.76n ±  2%   +36.64% (p=0.000 n=40)
MapAccessMiss/Key=int64/Elem=int64/len=65536-4    38.33n ±  0%   59.58n ± 14%   +55.44% (p=0.000 n=40)
MapAccessMiss/Key=int32/Elem=bigType/len=6-4      152.0n ±  0%   158.8n ±  0%    +4.47% (p=0.000 n=40)
MapAccessMiss/Key=int32/Elem=bigType/len=64-4     156.7n ±  0%   163.4n ±  0%    +4.28% (p=0.000 n=40)
MapAccessMiss/Key=int32/Elem=bigType/len=65536-4  182.1n ±  0%   189.1n ±  0%    +3.84% (p=0.000 n=40)
MapAccessMiss/Key=int32/Elem=*int32/len=6-4       23.73n ±  0%   23.38n ±  1%    -1.50% (p=0.000 n=40)
MapAccessMiss/Key=int32/Elem=*int32/len=64-4      24.86n ±  2%   35.21n ±  1%   +41.65% (p=0.000 n=40)
MapAccessMiss/Key=int32/Elem=*int32/len=65536-4   38.11n ±  0%   59.30n ±  0%   +55.62% (p=0.000 n=40)
MapAccessZero/Key=int64-4                         2.708n ±  0%   2.708n ±  0%         ~ (p=0.107 n=40)
MapAccessZero/Key=int32-4                         2.708n ±  0%   2.708n ±  0%    -0.02% (p=0.019 n=40)
MapAccessEmpty/Key=int64-4                        3.096n ±  0%   3.095n ±  0%         ~ (p=0.409 n=40)
MapAccessEmpty/Key=int32-4                        3.096n ±  0%   3.095n ±  0%         ~ (p=0.932 n=40)
geomean                                           30.89n         37.02n         +19.85%
Change-Id: I7d48651c22eb61b887ffda08287945f419ebff3b

Change diff

diff --git a/src/internal/runtime/maps/export_simd_test.go b/src/internal/runtime/maps/export_simd_test.go
new file mode 100644
index 0000000..b7a77b5
--- /dev/null
+++ b/src/internal/runtime/maps/export_simd_test.go
@@ -0,0 +1,19 @@
+// Copyright 2026 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build goexperiment.simd
+
+package maps
+
+import (
+	"unsafe"
+)
+
+func Memhash32(p unsafe.Pointer, seed uintptr) uintptr {
+	return memhash32(p, seed)
+}
+
+func Memhash64(p unsafe.Pointer, seed uintptr) uintptr {
+	return memhash64(p, seed)
+}
diff --git a/src/internal/runtime/maps/memhash_nosimd.go b/src/internal/runtime/maps/memhash_nosimd.go
new file mode 100644
index 0000000..892a2d0
--- /dev/null
+++ b/src/internal/runtime/maps/memhash_nosimd.go
@@ -0,0 +1,19 @@
+// Copyright 2026 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !goexperiment.simd
+
+package maps
+
+import "unsafe"
+
+// Functions below pushed from runtime.
+//
+//go:noescape
+//go:linkname memhash32 runtime.memhash32
+func memhash32(p unsafe.Pointer, h uintptr) uintptr
+
+//go:noescape
+//go:linkname memhash64 runtime.memhash64
+func memhash64(p unsafe.Pointer, h uintptr) uintptr
diff --git a/src/internal/runtime/maps/memhash_simd.go b/src/internal/runtime/maps/memhash_simd.go
new file mode 100644
index 0000000..5d98006
--- /dev/null
+++ b/src/internal/runtime/maps/memhash_simd.go
@@ -0,0 +1,43 @@
+// Copyright 2026 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build goexperiment.simd
+
+package maps
+
+import (
+	"internal/goarch"
+	"simd/archsimd"
+	"unsafe"
+)
+
+// Pushed from runtime.
+const hashRandomBytes = goarch.PtrSize / 4 * 64
+
+//go:linkname aeskeysched runtime.aeskeysched
+var aeskeysched [hashRandomBytes]byte
+
+func memhash32(p unsafe.Pointer, seed uintptr) uintptr {
+	state := archsimd.LoadUint64x2(&[2]uint64{uint64(seed), uint64(*(*uint32)(p))})
+	hash := state.
+		AsUint8x16().
+		AESEncryptOneRound(archsimd.LoadUint32x4((*[4]uint32)(unsafe.Pointer(&aeskeysched[0])))).
+		AESEncryptOneRound(archsimd.LoadUint32x4((*[4]uint32)(unsafe.Pointer(&aeskeysched[16])))).
+		AESEncryptOneRound(archsimd.LoadUint32x4((*[4]uint32)(unsafe.Pointer(&aeskeysched[32])))).
+		AsUint64x2().
+		GetElem(0)
+	return uintptr(hash)
+}
+
+func memhash64(p unsafe.Pointer, seed uintptr) uintptr {
+	state := archsimd.LoadUint64x2(&[2]uint64{uint64(seed), *(*uint64)(p)})
+	hash := state.
+		AsUint8x16().
+		AESEncryptOneRound(archsimd.LoadUint32x4((*[4]uint32)(unsafe.Pointer(&aeskeysched[0])))).
+		AESEncryptOneRound(archsimd.LoadUint32x4((*[4]uint32)(unsafe.Pointer(&aeskeysched[16])))).
+		AESEncryptOneRound(archsimd.LoadUint32x4((*[4]uint32)(unsafe.Pointer(&aeskeysched[32])))).
+		AsUint64x2().
+		GetElem(0)
+	return uintptr(hash)
+}
diff --git a/src/internal/runtime/maps/memhash_simd_test.go b/src/internal/runtime/maps/memhash_simd_test.go
new file mode 100644
index 0000000..f53fd4e
--- /dev/null
+++ b/src/internal/runtime/maps/memhash_simd_test.go
@@ -0,0 +1,59 @@
+// Copyright 2026 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build goexperiment.simd
+
+package maps_test
+
+import (
+	"internal/runtime/maps"
+	"math/rand"
+	"testing"
+	"time"
+	"unsafe"
+)
+
+//go:linkname runtime_memhash32 runtime.memhash32
+func runtime_memhash32(p unsafe.Pointer, seed uintptr) uintptr
+
+//go:linkname runtime_memhash64 runtime.memhash64
+func runtime_memhash64(p unsafe.Pointer, seed uintptr) uintptr
+
+func TestMemhash32(t *testing.T) {
+	r := rand.New(rand.NewSource(time.Now().UnixNano()))
+
+	for i := 0; i < 1_000_000; i++ {
+		v := uint32(r.Uint32())
+		seed := uintptr(r.Uint64())
+
+		ref := runtime_memhash32(unsafe.Pointer(&v), seed)
+		got := maps.Memhash32(unsafe.Pointer(&v), seed)
+
+		if ref != got {
+			t.Fatalf(
+				"memhash32 mismatch\nvalue=%#x seed=%#x\nexpected=%#x got=%#x",
+				v, seed, ref, got,
+			)
+		}
+	}
+}
+
+func TestMemhash64(t *testing.T) {
+	r := rand.New(rand.NewSource(time.Now().UnixNano()))
+
+	for i := 0; i < 1_000_000; i++ {
+		v := uint64(r.Uint64())
+		seed := uintptr(r.Uint64())
+
+		ref := runtime_memhash64(unsafe.Pointer(&v), seed)
+		got := maps.Memhash64(unsafe.Pointer(&v), seed)
+
+		if ref != got {
+			t.Fatalf(
+				"memhash64 mismatch\nvalue=%#x seed=%#x\nexpected=%#x got=%#x",
+				v, seed, ref, got,
+			)
+		}
+	}
+}
diff --git a/src/internal/runtime/maps/runtime.go b/src/internal/runtime/maps/runtime.go
index 2c395d5..b6d2a0a 100644
--- a/src/internal/runtime/maps/runtime.go
+++ b/src/internal/runtime/maps/runtime.go
@@ -16,14 +16,6 @@
// Functions below pushed from runtime.
//
//go:noescape
-//go:linkname memhash32 runtime.memhash32
-func memhash32(p unsafe.Pointer, h uintptr) uintptr
-
-//go:noescape
-//go:linkname memhash64 runtime.memhash64
-func memhash64(p unsafe.Pointer, h uintptr) uintptr
-
-//go:noescape
//go:linkname strhash runtime.strhash
func strhash(a unsafe.Pointer, h uintptr) uintptr

diff --git a/src/runtime/alg.go b/src/runtime/alg.go
index 9b726b2..38f2a9d 100644
--- a/src/runtime/alg.go
+++ b/src/runtime/alg.go
@@ -382,6 +382,8 @@
const hashRandomBytes = goarch.PtrSize / 4 * 64

// used in asm_{386,amd64,arm64}.s to seed the hash function
+//
+//go:linkname aeskeysched
var aeskeysched [hashRandomBytes]byte

// used in hash{32,64}.go to seed the hash function

Change information

Files:
  • A src/internal/runtime/maps/export_simd_test.go
  • A src/internal/runtime/maps/memhash_nosimd.go
  • A src/internal/runtime/maps/memhash_simd.go
  • A src/internal/runtime/maps/memhash_simd_test.go
  • M src/internal/runtime/maps/runtime.go
  • M src/runtime/alg.go
Change size: M
Delta: 6 files changed, 142 insertions(+), 8 deletions(-)
Open in Gerrit

Related details

Attention set is empty
Submit Requirements:
  • Code-Review: not satisfied
  • No-Unresolved-Comments: satisfied
  • Review-Enforcement: not satisfied
  • TryBots-Pass: not satisfied
Gerrit-MessageType: newchange
Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7d48651c22eb61b887ffda08287945f419ebff3b
Gerrit-Change-Number: 753740
Gerrit-PatchSet: 1
Gerrit-Owner: Arseny Samoylov <samoylo...@gmail.com>

Arseny Samoylov (Gerrit)

Mar 10, 2026, 2:09:42 PM
to goph...@pubsubhelper.golang.org, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Attention needed from Austin Clements, Keith Randall and Michael Pratt

Arseny Samoylov added 1 comment

Patchset-level comments
File-level comment, Patchset 1 (Latest):
Arseny Samoylov . resolved

Hi! This is a test CL that re-implements the memhash functions so that they can be inlined (as discussed in #77892).

The benchmark results for MapAccess are... not great 😅. I tried analyzing them, but `pprof` isn't well suited for this, and from `perf` I see ~4% degradation*. This appears to be caused by preparing seed+key on the stack and then copying it to the `xmm0` register. The same ~4% degradation appears in `MapAccessMiss/Key=int32/Elem=bigType`, which seems more reasonable than the degradation in the other benchmarks. I will try to investigate this further.

\* Here is a [link](https://imgur.com/a/DrbsReZ) to a screenshot from my `perf` report. This was obtained with
```
# In the GOROOT
GOEXPERIMENT=simd GOFLAGS=-ldflags=randlayout=100 go test -a -c -o opt-rl-100.out runtime/
perf record taskset -c 44-47 ./opt-rl-100.out -test.run=^$ -test.bench="MapAccessHit/Key=int32/Elem=int32/len=64" -test.count=10
# Same for base
```
In the screenshot, the sample counts are roughly the same, with an extra ~2.4K samples on the `vmovdqu` from the stack. This corresponds roughly to the 4% degradation out of ~49k samples in the base profile.


----------------------------------------------------------------------------

Currently, I've run into a few issues:
1) Initially, I added the `archsimd.X86.AVXAES()` check and a fallback. However:
* With this check, the hashing functions aren't inlined.
* The fallback doesn't work on my machine... I even tried using the original hashing function with `useAeshash` disabled, but I couldn’t build the compiler (it failed due to a timeout). I’ll report this separately.

2) The compiler doesn't remove the redundant store/load of the key when the hashing function is inlined (we discussed it [here](https://github.com/golang/go/issues/77892#issuecomment-3985932435)).

3) The `state` (seed + key) is prepared on the stack instead of directly in an `xmm` register, unlike in the GOASM implementation.

4) There are extra moves to load `aeskeysched`. `VAESENC` can take a memory argument, as in the GOASM implementation, which avoids the extra moves.

For reference, here is a convenient [Godbolt link](https://godbolt.org/z/qe86Mnhc5) to see the GOASM.
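
To make point 3 concrete, here is a minimal plain-Go sketch of the 128-bit `state` layout that both versions feed into the AES rounds: the seed in the low 64 bits and the zero-extended 32-bit key in the high 64 bits (in the GOASM version this is `MOVQ BX, X0` followed by `PINSRD $2, (AX), X0`). The helper `packState` is a hypothetical name used only for illustration, and the sketch assumes a little-endian target such as amd64. It does not use `simd/archsimd` and performs no hashing.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// packState is a hypothetical helper that mirrors
// [2]uint64{uint64(seed), uint64(key)} from the CL as the 16 bytes
// that would sit in memory (and then in the xmm register) on amd64.
func packState(seed uint64, key uint32) [16]byte {
	var b [16]byte
	binary.LittleEndian.PutUint64(b[0:8], seed)         // lane 0: seed
	binary.LittleEndian.PutUint64(b[8:16], uint64(key)) // lane 1: zero-extended key
	return b
}

func main() {
	s := packState(0x1122334455667788, 0xdeadbeef)
	// The key occupies bytes 8..11, i.e. dword index 2, which is the
	// slot PINSRD $2 writes in the assembly version.
	fmt.Printf("% x\n", s[:])
}
```

The intrinsic version builds these 16 bytes in a stack temporary and then loads them, whereas the assembly version assembles the same value directly in `X0`.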

Arseny Samoylov (Gerrit)

Mar 10, 2026, 2:23:45 PM
to goph...@pubsubhelper.golang.org, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Attention needed from Austin Clements, Keith Randall and Michael Pratt

Arseny Samoylov added 1 comment

Patchset-level comments
Arseny Samoylov . resolved

Arseny Samoylov

Note: by fallback I mean the `memhash{32,64}Fallback` functions, the same ones used by the GOASM `memhash{32,64}` implementations.

Arseny Samoylov (Gerrit)

Mar 10, 2026, 2:28:32 PM
to goph...@pubsubhelper.golang.org, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Attention needed from Austin Clements, Keith Randall and Michael Pratt

Arseny Samoylov added 1 comment

Patchset-level comments
Arseny Samoylov . resolved

Arseny Samoylov

UPD: sorry for the broken GitHub link in point 2. Here is the [proper one](https://github.com/golang/go/issues/77892#issuecomment-3985932435).

Arseny Samoylov (Gerrit)

Mar 11, 2026, 4:42:06 AM
to goph...@pubsubhelper.golang.org, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Attention needed from Austin Clements, Keith Randall and Michael Pratt

Arseny Samoylov added 1 comment

Patchset-level comments
Arseny Samoylov . resolved

Here is the link to the issue I mentioned earlier: https://github.com/golang/go/issues/78073

Arseny Samoylov (Gerrit)

Mar 11, 2026, 1:44:31 PM
to goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com
Attention needed from Austin Clements, Keith Randall and Michael Pratt

Arseny Samoylov uploaded new patchset

Arseny Samoylov uploaded patch set #2 to this change.

Arseny Samoylov (Gerrit)

Mar 11, 2026, 1:57:40 PM
to goph...@pubsubhelper.golang.org, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Attention needed from Austin Clements, Keith Randall and Michael Pratt

Arseny Samoylov added 1 comment

Patchset-level comments
File-level comment, Patchset 2 (Latest):
Arseny Samoylov . resolved

The problem with `memhashFallback` is solved for the current GOASM `memhash` implementation (thanks to @k...@golang.org). However, it still doesn't work for my intrinsic implementation =(.

But I decided to add a fallback to the CL regardless. I tested the fallbacks by temporarily removing the `!` in the `archsimd.X86.AVXAES()` check.

I also moved the initialization of the `hashkey` variable and added comments to the fallbacks (the uninitialized `hashkey` was the issue for the GOASM `memhash`). If this CL gets abandoned, I guess I should open a separate one containing just these changes.

Note: with the fallbacks, the intrinsic implementation doesn't get inlined. I will update the benchmark results with the new version later.
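
For readers following along, the shape of the feature-gated dispatch described above can be sketched in plain Go as follows. This is only an illustration under stated assumptions: `useFast` is a hypothetical stand-in for the `archsimd.X86.AVXAES()` check, and `hashFast` / `hashFallback` are placeholder mixers, not the AES rounds or the real `memhash32Fallback`.

```go
package main

import "fmt"

// useFast is a hypothetical stand-in for the CPU feature check.
var useFast = true

// hashFast stands in for the intrinsic path. The mixing is arbitrary, not AES.
func hashFast(key uint32, seed uint64) uint64 {
	h := seed ^ uint64(key)
	h *= 0x9e3779b97f4a7c15
	return h ^ (h >> 32)
}

// hashFallback stands in for the portable fallback. Also arbitrary.
func hashFallback(key uint32, seed uint64) uint64 {
	h := seed + uint64(key)*0xff51afd7ed558ccd
	return h ^ (h >> 29)
}

// hash32 has the same shape as the patched memhash32: check the
// feature flag first, and take the fallback when it is not set.
func hash32(key uint32, seed uint64) uint64 {
	if !useFast {
		return hashFallback(key, seed)
	}
	return hashFast(key, seed)
}

func main() {
	fmt.Println(hash32(42, 7) == hashFast(42, 7)) // fast path taken
	useFast = false                               // same effect as flipping the `!` for testing
	fmt.Println(hash32(42, 7) == hashFallback(42, 7))
}
```

Whether a wrapper of this shape gets inlined can be checked by building with `-gcflags=-m`, which prints the compiler's inlining decisions.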

Arseny Samoylov (Gerrit)

Mar 13, 2026, 10:44:40 AM
to goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com
Attention needed from Austin Clements, Keith Randall and Michael Pratt

Arseny Samoylov uploaded new patchset

Arseny Samoylov uploaded patch set #3 to this change.

Arseny Samoylov (Gerrit)

Mar 13, 2026, 11:29:44 AM
to goph...@pubsubhelper.golang.org, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Attention needed from Austin Clements, Keith Randall and Michael Pratt

Arseny Samoylov added 1 comment

Patchset-level comments
File-level comment, Patchset 3 (Latest):
Arseny Samoylov . resolved

I updated the benchmark results for the version with fallbacks.

The significant degradation is still there.

For reference, here is the `pprof` output from running `MapAccessHit/Key=int32/Elem=int32/len=64`:

```
File: base-rl-100.out
Build ID: f93fb80547c492868810d23ef80e1a045fd04688
Type: cpu
Time: 2026-03-13 17:15:48 MSK
Duration: 12.34s, Total samples = 12.34s ( 100%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 5
Showing nodes accounting for 11.61s, 94.08% of 12.34s total
Dropped 30 nodes (cum <= 0.06s)
Showing top 5 nodes out of 11
flat flat% sum% cum cum%
3.83s 31.04% 31.04% 12.28s 99.51% runtime_test.benchmarkMapAccessHit[go.shape.int32,go.shape.int32]
3.27s 26.50% 57.54% 8.45s 68.48% runtime.mapaccess2_fast32
1.66s 13.45% 70.99% 1.66s 13.45% internal/runtime/maps.(*groupReference).key (inline)
1.57s 12.72% 83.71% 1.57s 12.72% runtime.memhash32
1.28s 10.37% 94.08% 1.28s 10.37% internal/runtime/maps.ctrlGroup.matchH2 (inline)
(pprof) list memhash32
Total: 12.34s
ROUTINE ======================== runtime.memhash32 in /home/asamoylov/go-upstream/src/runtime/asm_amd64.s
1.57s 1.57s (flat, cum) 12.72% of Total
. . 1647:TEXT runtime·memhash32<ABIInternal>(SB),NOSPLIT,$0-24
. . 1648: // AX = ptr to data
. . 1649: // BX = seed
230ms 230ms 1650: CMPB runtime·useAeshash(SB), $0
. . 1651: JEQ noaes
. . 1652: MOVQ BX, X0 // X0 = seed
160ms 160ms 1653: PINSRD $2, (AX), X0 // data
20ms 20ms 1654: AESENC runtime·aeskeysched+0(SB), X0
170ms 170ms 1655: AESENC runtime·aeskeysched+16(SB), X0
70ms 70ms 1656: AESENC runtime·aeskeysched+32(SB), X0
750ms 750ms 1657: MOVQ X0, AX // return X0
170ms 170ms 1658: RET
. . 1659:noaes:
. . 1660: JMP runtime·memhash32Fallback<ABIInternal>(SB)
. . 1661:
. . 1662:// func memhash64(p unsafe.Pointer, h uintptr) uintptr
. . 1663:// ABIInternal for performance.
```
```
File: opt-rl-100.out
Build ID: 658cf21f779792095026587af48c799ff2746e94
Type: cpu
Time: 2026-03-13 17:16:09 MSK
Duration: 12.46s, Total samples = 12.43s (99.80%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 12.34s, 99.28% of 12.43s total
Dropped 32 nodes (cum <= 0.06s)
Showing top 10 nodes out of 13
flat flat% sum% cum cum%
5.50s 44.25% 44.25% 5.62s 45.21% internal/runtime/maps.memhash32
2.38s 19.15% 63.40% 12.38s 99.60% runtime_test.benchmarkMapAccessHit[go.shape.int32,go.shape.int32]
1.96s 15.77% 79.16% 10s 80.45% runtime.mapaccess2_fast32
0.98s 7.88% 87.05% 0.98s 7.88% internal/runtime/maps.ctrlGroup.matchH2 (inline)
0.86s 6.92% 93.97% 0.86s 6.92% internal/runtime/maps.(*groupReference).key (inline)
0.24s 1.93% 95.90% 0.24s 1.93% internal/runtime/maps.(*groupsReference).group (inline)
0.13s 1.05% 96.94% 0.13s 1.05% internal/runtime/maps.h1 (inline)
0.12s 0.97% 97.91% 0.12s 0.97% simd/archsimd.X86Features.AVXAES (inline)
0.09s 0.72% 98.63% 0.09s 0.72% internal/runtime/maps.makeProbeSeq (inline)
0.08s 0.64% 99.28% 0.08s 0.64% internal/runtime/maps.(*Map).directoryIndex (inline)
(pprof) list memhash32
Total: 12.43s
ROUTINE ======================== internal/runtime/maps.memhash32 in /home/asamoylov/go-upstream/src/internal/runtime/maps/memhash_simd.go
5.50s 5.62s (flat, cum) 45.21% of Total
170ms 170ms 30:func memhash32(p unsafe.Pointer, seed uintptr) uintptr {
160ms 280ms 31: if !archsimd.X86.AVXAES() {
. . 32: return memhash32Fallback(p, seed)
. . 33: }
. . 34:
110ms 110ms 35: state := archsimd.LoadUint64x2(&[2]uint64{uint64(seed), uint64(*(*uint32)(p))})
. . 36: hash := state.
. . 37: AsUint8x16().
2.02s 2.02s 38: AESEncryptOneRound(archsimd.LoadUint32x4((*[4]uint32)(unsafe.Pointer(&aeskeysched[0])))).
1s 1s 39: AESEncryptOneRound(archsimd.LoadUint32x4((*[4]uint32)(unsafe.Pointer(&aeskeysched[16])))).
870ms 870ms 40: AESEncryptOneRound(archsimd.LoadUint32x4((*[4]uint32)(unsafe.Pointer(&aeskeysched[32])))).
. . 41: AsUint64x2().
950ms 950ms 42: GetElem(0)
220ms 220ms 43: return uintptr(hash)
. . 44:}
. . 45:
. . 46:func memhash64(p unsafe.Pointer, seed uintptr) uintptr {
. . 47: if !archsimd.X86.AVXAES() {
. . 48: return memhash64Fallback(p, seed)
```

Because the benchmarks use `b.N`, an adaptive number of iterations, it's hard to compare the profiles directly.
So I replaced it with a hardcoded number. This makes the profiles easier to compare:

```
File: base-fixed-it.out
Build ID: c44b7b5a810d4b439073f4a591d2bb12ef960b7c
Type: cpu
Time: 2026-03-13 18:23:49 MSK
Duration: 12.02s, Total samples = 11.99s (99.77%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 5
Showing nodes accounting for 11.09s, 92.49% of 11.99s total
Dropped 38 nodes (cum <= 0.06s)
Showing top 5 nodes out of 12
flat flat% sum% cum cum%
3.64s 30.36% 30.36% 11.94s 99.58% runtime_test.benchmarkMapAccessHit[go.shape.int32,go.shape.int32]
3.12s 26.02% 56.38% 8.30s 69.22% runtime.mapaccess2_fast32
1.60s 13.34% 69.72% 1.60s 13.34% internal/runtime/maps.(*groupReference).key (inline)
1.49s 12.43% 82.15% 1.49s 12.43% runtime.memhash32
1.24s 10.34% 92.49% 1.24s 10.34% internal/runtime/maps.ctrlGroup.matchH2 (inline)
(pprof) list memhash32
Total: 11.99s
ROUTINE ======================== runtime.memhash32 in /home/asamoylov/go-upstream/src/runtime/asm_amd64.s
1.49s 1.49s (flat, cum) 12.43% of Total
. . 1647:TEXT runtime·memhash32<ABIInternal>(SB),NOSPLIT,$0-24
. . 1648: // AX = ptr to data
. . 1649: // BX = seed
. . 1650: CMPB runtime·useAeshash(SB), $0
180ms 180ms 1651: JEQ noaes
. . 1652: MOVQ BX, X0 // X0 = seed
. . 1653: PINSRD $2, (AX), X0 // data
250ms 250ms 1654: AESENC runtime·aeskeysched+0(SB), X0
40ms 40ms 1655: AESENC runtime·aeskeysched+16(SB), X0
240ms 240ms 1656: AESENC runtime·aeskeysched+32(SB), X0
500ms 500ms 1657: MOVQ X0, AX // return X0
280ms 280ms 1658: RET
. . 1659:noaes:
. . 1660: JMP runtime·memhash32Fallback<ABIInternal>(SB)
. . 1661:
. . 1662:// func memhash64(p unsafe.Pointer, h uintptr) uintptr
. . 1663:// ABIInternal for performance.
```
```
File: opt-fixed-it.out
Build ID: 171f448834359efab787b16806f8166340faba32
Type: cpu
Time: 2026-03-13 18:24:22 MSK
Duration: 18.95s, Total samples = 18.90s (99.75%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 5
Showing nodes accounting for 17.79s, 94.13% of 18.90s total
Dropped 28 nodes (cum <= 0.09s)
Showing top 5 nodes out of 14
flat flat% sum% cum cum%
8.16s 43.17% 43.17% 8.34s 44.13% internal/runtime/maps.memhash32
3.98s 21.06% 64.23% 18.88s 99.89% runtime_test.benchmarkMapAccessHit[go.shape.int32,go.shape.int32]
2.72s 14.39% 78.62% 14.90s 78.84% runtime.mapaccess2_fast32
1.60s 8.47% 87.09% 1.60s 8.47% internal/runtime/maps.(*groupReference).key (inline)
1.33s 7.04% 94.13% 1.33s 7.04% internal/runtime/maps.ctrlGroup.matchH2 (inline)
(pprof) list memhash32
Total: 18.90s
ROUTINE ======================== internal/runtime/maps.memhash32 in /home/asamoylov/go-upstream/src/internal/runtime/maps/memhash_simd.go
8.16s 8.34s (flat, cum) 44.13% of Total
320ms 320ms 30:func memhash32(p unsafe.Pointer, seed uintptr) uintptr {
. 180ms 31: if !archsimd.X86.AVXAES() {
. . 32: return memhash32Fallback(p, seed)
. . 33: }
. . 34:
290ms 290ms 35: state := archsimd.LoadUint64x2(&[2]uint64{uint64(seed), uint64(*(*uint32)(p))})
. . 36: hash := state.
. . 37: AsUint8x16().
2.91s 2.91s 38: AESEncryptOneRound(archsimd.LoadUint32x4((*[4]uint32)(unsafe.Pointer(&aeskeysched[0])))).
1.48s 1.48s 39: AESEncryptOneRound(archsimd.LoadUint32x4((*[4]uint32)(unsafe.Pointer(&aeskeysched[16])))).
1.19s 1.19s 40: AESEncryptOneRound(archsimd.LoadUint32x4((*[4]uint32)(unsafe.Pointer(&aeskeysched[32])))).
. . 41: AsUint64x2().
1.46s 1.46s 42: GetElem(0)
510ms 510ms 43: return uintptr(hash)
. . 44:}
. . 45:
. . 46:func memhash64(p unsafe.Pointer, seed uintptr) uintptr {
. . 47: if !archsimd.X86.AVXAES() {
. . 48: return memhash64Fallback(p, seed)
```

From this we see that `memhash32` degrades by more than 5x (1.49s vs. 8.16s flat) =(.
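The 5x figure can be checked from the two `pprof` listings above. The following is a minimal sketch of that arithmetic, assuming both profiles cover the same number of benchmark iterations; the helper name `slowdown` is made up for illustration.

```go
package main

import "fmt"

// slowdown reports how many times slower opt is than base.
func slowdown(baseSec, optSec float64) float64 {
	return optSec / baseSec
}

func main() {
	// Flat times for memhash32 copied from the two pprof listings.
	const baseFlat = 1.49 // runtime.memhash32 (assembly)
	const optFlat = 8.16  // internal/runtime/maps.memhash32 (intrinsics)

	fmt.Printf("memhash32 slowdown: %.2fx\n", slowdown(baseFlat, optFlat))
}
```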

Open in Gerrit

Related details

Attention is currently required from:
  • Austin Clements
  • Keith Randall
  • Michael Pratt
Submit Requirements:
  • requirement is not satisfiedCode-Review
  • requirement satisfiedNo-Unresolved-Comments
  • requirement is not satisfiedReview-Enforcement
  • requirement is not satisfiedTryBots-Pass
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
Gerrit-MessageType: comment
Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7d48651c22eb61b887ffda08287945f419ebff3b
Gerrit-Change-Number: 753740
Gerrit-PatchSet: 3
Gerrit-Owner: Arseny Samoylov <samoylo...@gmail.com>
Gerrit-Reviewer: Austin Clements <aus...@google.com>
Gerrit-Reviewer: Keith Randall <k...@golang.org>
Gerrit-Reviewer: Michael Pratt <mpr...@google.com>
Gerrit-CC: Gopher Robot <go...@golang.org>
Gerrit-Attention: Keith Randall <k...@golang.org>
Gerrit-Attention: Michael Pratt <mpr...@google.com>
Gerrit-Attention: Austin Clements <aus...@google.com>
Gerrit-Comment-Date: Fri, 13 Mar 2026 15:29:36 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No

Michael Pratt (Gerrit)

Mar 13, 2026, 11:35:35 AM
to Arseny Samoylov, goph...@pubsubhelper.golang.org, Cherry Mui, Junyang Shao, David Chase, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Attention needed from Arseny Samoylov, Austin Clements and Keith Randall

Michael Pratt added 1 comment

Patchset-level comments
Michael Pratt . resolved

Hi Cherry, Junyang, David, FYI Arseny has an attempt to implement runtime.memhash32/64 for maps using simd/archsimd. See the CL comments for some of their analysis of the regression. Perhaps there is some improvement we can make in archsimd.


Arseny Samoylov (Gerrit)

Mar 13, 2026, 11:46:30 AM
to goph...@pubsubhelper.golang.org, Cherry Mui, Junyang Shao, David Chase, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Attention needed from Arseny Samoylov, Austin Clements and Keith Randall

Arseny Samoylov added 1 comment

Patchset-level comments
Arseny Samoylov . resolved

Here are the `perf` results for a fixed number of iterations: https://imgur.com/a/CsKrzDQ

PS: Excuse me for sending it like this. I tried copy-pasting the text like I did with `pprof` output, but it turned out to be a mess =(.


Cherry Mui (Gerrit)

Mar 13, 2026, 12:04:40 PM
to Arseny Samoylov, goph...@pubsubhelper.golang.org, Junyang Shao, David Chase, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Attention needed from Arseny Samoylov, Austin Clements and Keith Randall

Cherry Mui added 1 comment

Patchset-level comments
Cherry Mui

Thank you so much for doing the experiment and sharing the perf result! I spotted a few differences in the perf result:
1. Due to the fallback, the SIMD version has a stack bounds check. Maybe try marking it `//go:nosplit` and see if it makes any difference?
2. `archsimd.LoadUint64x2(&[2]uint64{uint64(seed), uint64(*(*uint32)(p))})` is compiled to preparing the vector on stack and doing a load. I wonder if `archsimd.Uint64x2{}.SetElem(0, uint64(seed)).SetElem(1, ...)` makes any difference. The compiler probably should choose a better code generation for that.
3. It doesn't combine the load with the AES instruction. We should do that. But I would expect the difference is not significant.
4. Besides the above, it seems the AVX version of the instructions are just slower than the SSE version? One experiment we could do is changing the assembly version to use VAESENC and see if it makes any difference.

If you'd like to dive more, feel free to try the changes mentioned above. Also feel free to not spend more time on this. This is already a great experiment. Thank you again!


Arseny Samoylov (Gerrit)

Mar 13, 2026, 12:28:15 PM
to goph...@pubsubhelper.golang.org, Cherry Mui, Junyang Shao, David Chase, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Attention needed from Austin Clements, Cherry Mui and Keith Randall

Arseny Samoylov added 1 comment

Patchset-level comments
Arseny Samoylov

> Due to the fallback, the SIMD version has a stack bounds check. Maybe try marking it `//go:nosplit` and see if it makes any difference?

I thought about adding `nosplit` but decided not to do so without a proper reason. Ideally this function should be inlined, so the check would go away automatically. I guess I will test the effect of `nosplit` once the intrinsic version reaches baseline performance.

> `archsimd.LoadUint64x2(&[2]uint64{uint64(seed), uint64(*(*uint32)(p))})` is compiled to preparing the vector on stack and doing a load. I wonder if `archsimd.Uint64x2{}.SetElem(0, uint64(seed)).SetElem(1, ...)` makes any difference. The compiler probably should choose a better code generation for that.

Looks like it helped (I can't yet make sense of where key+seed is loaded; I'll try to understand it later). Thanks! Here is a godbolt link for reference: https://godbolt.org/z/sM8q495jP

> It doesn't combine the load with the AES instruction. We should do that. But I would expect the difference is not significant.

Agreed, I also don't expect an improvement in performance, only in code size.

> Besides the above, it seems the AVX version of the instructions are just slower than the SSE version? One experiment we could do is changing the assembly version to use VAESENC and see if it makes any difference.

Thanks, this seems to be the reason behind the increased number of samples on AES. I will test this as you suggested.

Thank you for your feedback!


Michael Pratt (Gerrit)

Mar 13, 2026, 12:47:47 PM
to Arseny Samoylov, goph...@pubsubhelper.golang.org, Cherry Mui, Junyang Shao, David Chase, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Attention needed from Arseny Samoylov, Austin Clements, Cherry Mui and Keith Randall

Michael Pratt added 1 comment

Patchset-level comments
Michael Pratt

> Looks like it helped (I can't yet make sense of where key+seed is loaded; I'll try to understand it later). Thanks! Here is a godbolt link for reference: https://godbolt.org/z/sM8q495jP

I also spent a few minutes staring at this confused, because `seed` is in BX, but BX is literally not referenced!

The problem is that `SetElem` returns a new vector, so you need to assign the result (`state = state.SetElem(0, uint64(seed)).SetElem(1, *(*uint64)(p))`).

That generates:

```
VMOVQ BX, X0
MOVQ (AX), CX
VPINSRQ $1, CX, X0, X0
```

(This makes me wish we had a way to enforce not ignoring return values.)
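The pitfall can be reproduced without SIMD at all. The sketch below uses a hypothetical `vec2` type that only mimics a value-returning `SetElem`; it is not the `archsimd` type, and `buildWrong`/`buildRight` are illustrative names. Because the method has a value receiver and returns a modified copy, dropping the result leaves the original zero-valued.

```go
package main

import "fmt"

// vec2 is a hypothetical stand-in for a two-lane vector value.
// It is NOT an archsimd type; it only mimics value-returning setters.
type vec2 [2]uint64

// SetElem returns a copy of v with lane i replaced by x.
// The receiver is passed by value, so the caller's vector is unchanged.
func (v vec2) SetElem(i int, x uint64) vec2 {
	v[i] = x
	return v
}

// buildWrong discards the result of SetElem, so state stays all zeros.
func buildWrong(seed, key uint64) vec2 {
	var state vec2
	state.SetElem(0, seed).SetElem(1, key) // result ignored; compiles without complaint
	return state
}

// buildRight assigns the result back, as suggested in the comment above.
func buildRight(seed, key uint64) vec2 {
	var state vec2
	state = state.SetElem(0, seed).SetElem(1, key)
	return state
}

func main() {
	fmt.Println(buildWrong(7, 42)) // [0 0]
	fmt.Println(buildRight(7, 42)) // [7 42]
}
```

The compiler accepts both versions, which is why the mistake only showed up in the generated assembly (the seed register was never read).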


Arseny Samoylov (Gerrit)

Mar 16, 2026, 4:04:14 AM
to goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com
Attention needed from Arseny Samoylov, Austin Clements, Cherry Mui and Keith Randall

Arseny Samoylov uploaded new patchset

Arseny Samoylov uploaded patch set #4 to this change.

Arseny Samoylov (Gerrit)

Mar 16, 2026, 5:07:21 AM
to goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com
Attention needed from Arseny Samoylov, Austin Clements, Cherry Mui and Keith Randall

Arseny Samoylov uploaded new patchset

Arseny Samoylov uploaded patch set #5 to this change.

Arseny Samoylov (Gerrit)

Mar 16, 2026, 5:21:11 AM
to goph...@pubsubhelper.golang.org, Cherry Mui, Junyang Shao, David Chase, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Attention needed from Austin Clements, Cherry Mui and Keith Randall

Arseny Samoylov added 1 comment

Patchset-level comments
File-level comment, Patchset 5 (Latest):
Arseny Samoylov . resolved

By changing how the `state` vector is prepared, the abnormal degradation disappears. The results now look more reasonable.

Here they are (same as in commit message):
```

goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
                                                │  base.stat  │               opt.stat                │
                                                │   sec/op    │   sec/op     vs base                  │
MapAccessHit/Key=int32/Elem=int32/len=6-4         21.57n ± 0%   21.57n ± 0%          ~ (p=0.136 n=30)
MapAccessHit/Key=int32/Elem=int32/len=64-4        25.24n ± 1%   26.68n ± 1%     +5.71% (p=0.000 n=30)
MapAccessHit/Key=int32/Elem=int32/len=65536-4     40.01n ± 0%   42.30n ± 0%     +5.74% (p=0.000 n=30)
MapAccessHit/Key=int64/Elem=int64/len=6-4         21.39n ± 0%   21.40n ± 0%     +0.05% (p=0.045 n=30)
MapAccessHit/Key=int64/Elem=int64/len=64-4        25.38n ± 2%   26.69n ± 1%     +5.14% (p=0.000 n=30)
MapAccessHit/Key=int64/Elem=int64/len=65536-4     44.16n ± 0%   45.13n ± 0%     +2.22% (p=0.000 n=30)
MapAccessHit/Key=int32/Elem=bigType/len=6-4       104.3n ± 0%   105.0n ± 0%     +0.67% (p=0.000 n=30)
MapAccessHit/Key=int32/Elem=bigType/len=64-4      166.2n ± 5%   166.6n ± 3%          ~ (p=0.344 n=30)
MapAccessHit/Key=int32/Elem=bigType/len=65536-4   584.9n ± 0%   585.7n ± 0%     +0.13% (p=0.013 n=30)
MapAccessHit/Key=int32/Elem=*int32/len=6-4        21.93n ± 0%   21.86n ± 0%     -0.32% (p=0.001 n=30)
MapAccessHit/Key=int32/Elem=*int32/len=64-4       25.94n ± 1%   27.18n ± 0%     +4.80% (p=0.000 n=30)
MapAccessHit/Key=int32/Elem=*int32/len=65536-4    45.76n ± 0%   46.45n ± 0%     +1.50% (p=0.000 n=30)
MapAccessMiss/Key=int32/Elem=int32/len=6-4        23.48n ± 0%   23.48n ± 0%          ~ (p=0.447 n=30)
MapAccessMiss/Key=int32/Elem=int32/len=64-4       24.54n ± 5%   26.22n ± 1%     +6.82% (p=0.000 n=30)
MapAccessMiss/Key=int32/Elem=int32/len=65536-4    37.63n ± 0%   38.85n ± 0%     +3.24% (p=0.000 n=30)
MapAccessMiss/Key=int64/Elem=int64/len=6-4        22.98n ± 1%   22.96n ± 1%          ~ (p=0.480 n=30)
MapAccessMiss/Key=int64/Elem=int64/len=64-4       25.57n ± 4%   26.16n ± 3%     +2.31% (p=0.006 n=30)
MapAccessMiss/Key=int64/Elem=int64/len=65536-4    38.34n ± 0%   56.49n ± 45%   +47.35% (p=0.000 n=30)
MapAccessMiss/Key=int32/Elem=bigType/len=6-4      128.1n ± 0%   128.1n ± 0%     +0.04% (p=0.033 n=30)
MapAccessMiss/Key=int32/Elem=bigType/len=64-4     130.2n ± 0%   129.5n ± 0%     -0.50% (p=0.003 n=30)
MapAccessMiss/Key=int32/Elem=bigType/len=65536-4  150.8n ± 0%   150.8n ± 0%          ~ (p=0.401 n=30)
MapAccessMiss/Key=int32/Elem=*int32/len=6-4       23.74n ± 0%   23.74n ± 0%          ~ (p=0.621 n=30)
MapAccessMiss/Key=int32/Elem=*int32/len=64-4      24.79n ± 4%   25.64n ± 3%     +3.45% (p=0.006 n=30)
MapAccessMiss/Key=int32/Elem=*int32/len=65536-4   38.17n ± 0%   39.42n ± 0%     +3.27% (p=0.000 n=30)
MapAccessZero/Key=int64-4                         2.707n ± 0%   2.707n ± 0%          ~ (p=1.000 n=30)
MapAccessZero/Key=int32-4                         2.707n ± 0%   2.708n ± 0%          ~ (p=0.228 n=30)
MapAccessEmpty/Key=int64-4                        3.095n ± 0%   3.095n ± 0%          ~ (p=0.988 n=30)
MapAccessEmpty/Key=int32-4                        3.095n ± 0%   3.095n ± 0%          ~ (p=0.652 n=30)
geomean                                           30.17n        31.07n          +2.98%
```

This suggests that preparing state+key directly in the xmm register has a large effect compared to preparing it on the stack and then loading it into the xmm register.
It looks a little suspicious to me that this alone could explain the large degradations in the previous results (+40-60% on some benchmarks).
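For reference, the `geomean` row condenses the per-benchmark slowdowns into one number: when both columns cover the same rows, the ratio of the two geometric means equals the geometric mean of the per-row opt/base ratios. The sketch below illustrates this on three `len=64` rows from the table above; the helper name `geomean` is illustrative and this is not benchstat's own code.

```go
package main

import (
	"fmt"
	"math"
)

// geomean returns the geometric mean of xs, computed through logarithms
// to avoid overflow. xs must be non-empty and contain only positive values.
func geomean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += math.Log(x)
	}
	return math.Exp(sum / float64(len(xs)))
}

func main() {
	// opt/base sec/op ratios for three len=64 rows of the table above.
	ratios := []float64{
		26.68 / 25.24, // MapAccessHit/Key=int32/Elem=int32/len=64
		26.69 / 25.38, // MapAccessHit/Key=int64/Elem=int64/len=64
		26.22 / 24.54, // MapAccessMiss/Key=int32/Elem=int32/len=64
	}
	fmt.Printf("geomean slowdown over sample rows: %.4f\n", geomean(ratios))
}
```

A single outlier row with a wide interval (such as the `± 45%` entry) pulls the geometric mean up less than it would an arithmetic mean, but it still contributes.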


Arseny Samoylov (Gerrit)

Mar 16, 2026, 6:07:26 AM
to goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com
Attention needed from Austin Clements, Cherry Mui and Keith Randall

Arseny Samoylov uploaded new patchset

Arseny Samoylov uploaded patch set #6 to this change.

Arseny Samoylov (Gerrit)

Mar 16, 2026, 6:20:27 AM
to goph...@pubsubhelper.golang.org, Cherry Mui, Junyang Shao, David Chase, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Attention needed from Austin Clements, Cherry Mui and Keith Randall

Arseny Samoylov added 1 comment

Patchset-level comments
File-level comment, Patchset 6 (Latest):
Arseny Samoylov . resolved

The fallback issue is resolved by moving `memhash{32,64}` to `runtime/`.

I suspect the problem was that the GOASM and intrinsic implementations existed at the same time. When I manually changed the fallback condition in the intrinsic implementations, the two started to diverge: GOASM used AES while the intrinsic version used the fallback. So someone could get the wrong hash, as @mpr...@google.com suggested [here](https://github.com/golang/go/issues/78073#issuecomment-4040669648).

However, the intrinsic implementation can no longer be inlined. I suspect this is due to the use of `linkname`.
I guess this could be solved by moving hashing to `internal/runtime/...`.

Note: previously the intrinsic implementations were inlined when there were no fallbacks. Now they aren't inlined even when the fallback is commented out.
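To illustrate why two coexisting implementations that pick different paths are a correctness problem and not only a performance one, here is a toy sketch. Everything in it is hypothetical: `hashAES` and `hashFallback` are trivial stand-ins (not the runtime's functions), and `table` is a simplified bucket array, not the runtime map. A key inserted using one hash is not found when the lookup uses the other.

```go
package main

import "fmt"

const nbuckets = 8

// Toy stand-ins for the two code paths. They are deliberately trivial so
// the bucket indices are easy to follow by hand.
func hashAES(key, seed uint64) uint64      { return key ^ seed }
func hashFallback(key, seed uint64) uint64 { return (key ^ seed) + 1 }

type hashFn func(key, seed uint64) uint64

// table is a simplified hash table: one slice of keys per bucket.
type table struct {
	seed    uint64
	buckets [nbuckets][]uint64
}

func (t *table) insert(key uint64, h hashFn) {
	i := h(key, t.seed) % nbuckets
	t.buckets[i] = append(t.buckets[i], key)
}

func (t *table) contains(key uint64, h hashFn) bool {
	i := h(key, t.seed) % nbuckets
	for _, k := range t.buckets[i] {
		if k == key {
			return true
		}
	}
	return false
}

func main() {
	t := &table{seed: 7}
	t.insert(42, hashAES) // 42^7 = 45, bucket 5

	fmt.Println(t.contains(42, hashAES))      // true: same hash, bucket 5
	fmt.Println(t.contains(42, hashFallback)) // false: 46 maps to bucket 6
}
```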


Arseny Samoylov (Gerrit)

Mar 16, 2026, 8:20:44 AM
to goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com
Attention needed from Austin Clements, Cherry Mui and Keith Randall

Arseny Samoylov uploaded new patchset

Arseny Samoylov uploaded patch set #7 to this change.

Arseny Samoylov (Gerrit)

Mar 16, 2026, 8:22:11 AM
to goph...@pubsubhelper.golang.org, Cherry Mui, Junyang Shao, David Chase, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Attention needed from Austin Clements, Cherry Mui and Keith Randall

Arseny Samoylov added 1 comment

Patchset-level comments
File-level comment, Patchset 7 (Latest):
Arseny Samoylov . resolved

Here are the updated results after moving hashing to `runtime`. The only difference is that the fallback functions are inlined, so there is no stack resize check.

Here is a copy of the results (as in the commit message):

```
goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
                                                │  base.stat  │               opt.stat                │
                                                │   sec/op    │   sec/op     vs base                  │
MapAccessHit/Key=int32/Elem=int32/len=6-4         21.58n ± 0%   21.59n ± 0%     +0.05% (p=0.009 n=30)
MapAccessHit/Key=int32/Elem=int32/len=64-4        25.46n ± 0%   26.48n ± 2%     +4.01% (p=0.000 n=30)
MapAccessHit/Key=int32/Elem=int32/len=65536-4     40.91n ± 3%   41.96n ± 0%     +2.58% (p=0.000 n=30)
MapAccessHit/Key=int64/Elem=int64/len=6-4         21.39n ± 0%   21.39n ± 0%          ~ (p=0.842 n=30)
MapAccessHit/Key=int64/Elem=int64/len=64-4        25.32n ± 1%   25.61n ± 2%     +1.13% (p=0.003 n=30)
MapAccessHit/Key=int64/Elem=int64/len=65536-4     43.90n ± 0%   44.25n ± 1%     +0.82% (p=0.000 n=30)
MapAccessHit/Key=int32/Elem=bigType/len=6-4       104.0n ± 0%   105.2n ± 0%     +1.15% (p=0.000 n=30)
MapAccessHit/Key=int32/Elem=bigType/len=64-4      173.4n ± 3%   170.1n ± 3%          ~ (p=0.255 n=30)
MapAccessHit/Key=int32/Elem=bigType/len=65536-4   585.2n ± 0%   596.6n ± 0%     +1.95% (p=0.000 n=30)
MapAccessHit/Key=int32/Elem=*int32/len=6-4        21.92n ± 0%   21.88n ± 0%     -0.18% (p=0.004 n=30)
MapAccessHit/Key=int32/Elem=*int32/len=64-4       25.91n ± 1%   26.64n ± 1%     +2.82% (p=0.000 n=30)
MapAccessHit/Key=int32/Elem=*int32/len=65536-4    45.74n ± 0%   45.91n ± 0%          ~ (p=0.080 n=30)
MapAccessMiss/Key=int32/Elem=int32/len=6-4        23.48n ± 0%   23.49n ± 0%          ~ (p=0.184 n=30)
MapAccessMiss/Key=int32/Elem=int32/len=64-4       25.37n ± 3%   25.82n ± 3%          ~ (p=0.135 n=30)
MapAccessMiss/Key=int32/Elem=int32/len=65536-4    37.65n ± 0%   38.20n ± 0%     +1.47% (p=0.000 n=30)
MapAccessMiss/Key=int64/Elem=int64/len=6-4        22.96n ± 1%   22.92n ± 1%     -0.20% (p=0.004 n=30)
MapAccessMiss/Key=int64/Elem=int64/len=64-4       25.84n ± 3%   25.75n ± 2%          ~ (p=0.739 n=30)
MapAccessMiss/Key=int64/Elem=int64/len=65536-4    38.34n ± 0%   42.30n ± 55%   +10.33% (p=0.000 n=30)
MapAccessMiss/Key=int32/Elem=bigType/len=6-4      128.2n ± 4%   129.1n ± 3%     +0.66% (p=0.001 n=30)
MapAccessMiss/Key=int32/Elem=bigType/len=64-4     130.3n ± 3%   130.7n ± 3%          ~ (p=0.420 n=30)
MapAccessMiss/Key=int32/Elem=bigType/len=65536-4  150.7n ± 0%   151.1n ± 1%     +0.23% (p=0.001 n=30)
MapAccessMiss/Key=int32/Elem=*int32/len=6-4       23.73n ± 0%   23.73n ± 0%          ~ (p=0.221 n=30)
MapAccessMiss/Key=int32/Elem=*int32/len=64-4      25.23n ± 2%   25.73n ± 2%          ~ (p=0.130 n=30)
MapAccessMiss/Key=int32/Elem=*int32/len=65536-4   38.14n ± 0%   38.75n ± 0%     +1.61% (p=0.000 n=30)
MapAccessZero/Key=int64-4                         2.708n ± 0%   2.708n ± 0%          ~ (p=0.938 n=30)
MapAccessZero/Key=int32-4                         2.708n ± 0%   2.708n ± 0%          ~ (p=0.933 n=30)
MapAccessEmpty/Key=int64-4                        3.095n ± 0%   3.095n ± 0%          ~ (p=0.335 n=30)
MapAccessEmpty/Key=int32-4                        3.095n ± 0%   3.095n ± 0%          ~ (p=0.357 n=30)
geomean                                           30.30n        30.62n          +1.07%
```
Open in Gerrit

Related details

Attention is currently required from:
  • Austin Clements
  • Cherry Mui
  • Keith Randall
Submit Requirements:
  • requirement is not satisfiedCode-Review
  • requirement satisfiedNo-Unresolved-Comments
  • requirement is not satisfiedReview-Enforcement
  • requirement is not satisfiedTryBots-Pass
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
Gerrit-MessageType: comment
Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7d48651c22eb61b887ffda08287945f419ebff3b
Gerrit-Change-Number: 753740
Gerrit-PatchSet: 7
Gerrit-Owner: Arseny Samoylov <samoylo...@gmail.com>
Gerrit-Reviewer: Austin Clements <aus...@google.com>
Gerrit-Reviewer: Keith Randall <k...@golang.org>
Gerrit-Reviewer: Michael Pratt <mpr...@google.com>
Gerrit-CC: Cherry Mui <cher...@google.com>
Gerrit-CC: David Chase <drc...@google.com>
Gerrit-CC: Gopher Robot <go...@golang.org>
Gerrit-CC: Junyang Shao <shaoj...@google.com>
Gerrit-Attention: Keith Randall <k...@golang.org>
Gerrit-Attention: Cherry Mui <cher...@google.com>
Gerrit-Attention: Austin Clements <aus...@google.com>
Gerrit-Comment-Date: Mon, 16 Mar 2026 12:22:03 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No

Arseny Samoylov (Gerrit)

Mar 16, 2026, 8:47:53 AM
to goph...@pubsubhelper.golang.org, Cherry Mui, Junyang Shao, David Chase, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Attention needed from Austin Clements, Cherry Mui and Keith Randall

Arseny Samoylov added 1 comment

Patchset-level comments
Arseny Samoylov . resolved

Also, I noticed that the latest changes aren't disassembled properly by `go tool objdump`:

```
go tool objdump -s memhash32 a.out
TEXT runtime.memhash32(SB) /home/asamoylov/go-upstream/src/runtime/memhash32-64_simd_amd64.go
memhash32-64_simd_amd64.go:17 0x499500 90 NOPL
cpu.go:131 0x499501 803dfa85910000 CMPB internal/cpu.X86+66(SB), $0x0
cpu.go:131 0x499508 7444 JE 0x49954e
cpu.go:131 0x49950a 803def85910000 CMPB internal/cpu.X86+64(SB), $0x0
memhash32-64_simd_amd64.go:18 0x499511 743b JE 0x49954e
memhash32-64_simd_amd64.go:23 0x499513 c4e1f96e OUTSB DS:0(SI), DX
memhash32-64_simd_amd64.go:23 0x499517 c3 RET
memhash32-64_simd_amd64.go:23 0x499518 8b08 MOVL 0(AX), CX
memhash32-64_simd_amd64.go:23 0x49951a c4e3f922c1 ANDL CL, AL
memhash32-64_simd_amd64.go:23 0x49951f 01c5 ADDL AX, BP
memhash32-64_simd_amd64.go:27 0x499521 fa CLI
memhash32-64_simd_amd64.go:27 0x499522 6f OUTSD DS:0(SI), DX
memhash32-64_simd_amd64.go:27 0x499523 0df8819100 ORL $0x9181f8, AX
memhash32-64_simd_amd64.go:27 0x499528 c4e279dcc1 FADD F0, F1
memhash32-64_simd_amd64.go:28 0x49952d c5fa6f0dfb819100 VMOVDQU runtime.aeskeysched+16(SB), X1
memhash32-64_simd_amd64.go:28 0x499535 c4e279dcc1 FADD F0, F1
memhash32-64_simd_amd64.go:29 0x49953a c5fa6f0dfe819100 VMOVDQU runtime.aeskeysched+32(SB), X1
memhash32-64_simd_amd64.go:29 0x499542 c4e279dcc1 FADD F0, F1
memhash32-64_simd_amd64.go:31 0x499547 c4 ?
memhash32-64_simd_amd64.go:31 0x499548 e3f9 JRCXZ 0x499543
memhash32-64_simd_amd64.go:31 0x49954a 16 ?
memhash32-64_simd_amd64.go:31 0x49954b c000c3 ROLB $0xc3, 0(AX)
hash64.go:74 0x49954e 488b0d737d9100 MOVQ runtime.hashkey+8(SB), CX
alg.go:423 0x499555 8b00 MOVL 0(AX), AX
hash64.go:74 0x499557 4831c1 XORQ AX, CX
hash64.go:74 0x49955a 4831d8 XORQ BX, AX
hash64.go:74 0x49955d 4833055c7d9100 XORQ runtime.hashkey(SB), AX
hash64.go:83 0x499564 48f7e1 MULQ CX
hash64.go:84 0x499567 4831d0 XORQ DX, AX
hash64.go:84 0x49956a 4889c1 MOVQ AX, CX
hash64.go:83 0x49956d 48b84b127dc4274e8e1d MOVQ $0x1d8e4e27c47d124b, AX
hash64.go:83 0x499577 48f7e1 MULQ CX
hash64.go:84 0x49957a 4831d0 XORQ DX, AX
hash64.go:73 0x49957d 90 NOPL
hash64.go:88 0x49957e 90 NOPL
hash64.go:88 0x49957f 90 NOPL
memhash32-64_simd_amd64.go:19 0x499580 c3 RET
```

Here is the correct output from regular `objdump`:
```
objdump --disassemble=runtime.memhash32 a.out

a.out: file format elf64-x86-64


Disassembly of section .text:

0000000000499500 <runtime.memhash32>:
499500: 90 nop
499501: 80 3d fa 85 91 00 00 cmpb $0x0,0x9185fa(%rip) # db1b02 <internal/cpu.X86+0x42>
499508: 74 44 je 49954e <runtime.memhash32+0x4e>
49950a: 80 3d ef 85 91 00 00 cmpb $0x0,0x9185ef(%rip) # db1b00 <internal/cpu.X86+0x40>
499511: 74 3b je 49954e <runtime.memhash32+0x4e>
499513: c4 e1 f9 6e c3 vmovq %rbx,%xmm0
499518: 8b 08 mov (%rax),%ecx
49951a: c4 e3 f9 22 c1 01 vpinsrq $0x1,%rcx,%xmm0,%xmm0
499520: c5 fa 6f 0d f8 81 91 vmovdqu 0x9181f8(%rip),%xmm1 # db1720 <runtime.aeskeysched>
499527: 00
499528: c4 e2 79 dc c1 vaesenc %xmm1,%xmm0,%xmm0
49952d: c5 fa 6f 0d fb 81 91 vmovdqu 0x9181fb(%rip),%xmm1 # db1730 <runtime.aeskeysched+0x10>
499534: 00
499535: c4 e2 79 dc c1 vaesenc %xmm1,%xmm0,%xmm0
49953a: c5 fa 6f 0d fe 81 91 vmovdqu 0x9181fe(%rip),%xmm1 # db1740 <runtime.aeskeysched+0x20>
499541: 00
499542: c4 e2 79 dc c1 vaesenc %xmm1,%xmm0,%xmm0
499547: c4 e3 f9 16 c0 00 vpextrq $0x0,%xmm0,%rax
49954d: c3 retq
49954e: 48 8b 0d 73 7d 91 00 mov 0x917d73(%rip),%rcx # db12c8 <runtime.hashkey+0x8>
499555: 8b 00 mov (%rax),%eax
499557: 48 31 c1 xor %rax,%rcx
49955a: 48 31 d8 xor %rbx,%rax
49955d: 48 33 05 5c 7d 91 00 xor 0x917d5c(%rip),%rax # db12c0 <runtime.hashkey>
499564: 48 f7 e1 mul %rcx
499567: 48 31 d0 xor %rdx,%rax
49956a: 48 89 c1 mov %rax,%rcx
49956d: 48 b8 4b 12 7d c4 27 movabs $0x1d8e4e27c47d124b,%rax
499574: 4e 8e 1d
499577: 48 f7 e1 mul %rcx
49957a: 48 31 d0 xor %rdx,%rax
49957d: 90 nop
49957e: 90 nop
49957f: 90 nop
499580: c3 retq
```
Note: you can reproduce the same problem (but with the GOASM implementation) with this patch:
```
diff --git a/src/runtime/memhash32-64_nosimd_amd64.s b/src/runtime/memhash32-64_nosimd_amd64.s
index 82b00eb559..5ce94d7a64 100644
--- a/src/runtime/memhash32-64_nosimd_amd64.s
+++ b/src/runtime/memhash32-64_nosimd_amd64.s
@@ -15,9 +15,12 @@ TEXT runtime·memhash32<ABIInternal>(SB),NOSPLIT,$0-24
        JEQ     noaes
        MOVQ    BX, X0  // X0 = seed
        PINSRD  $2, (AX), X0    // data
-       AESENC  runtime·aeskeysched+0(SB), X0
-       AESENC  runtime·aeskeysched+16(SB), X0
-       AESENC  runtime·aeskeysched+32(SB), X0
+       VMOVQ   runtime·aeskeysched+0(SB), X1
+       VAESENC X1, X0, X0
+       VMOVQ   runtime·aeskeysched+16(SB), X1
+       VAESENC X1, X0, X0
+       VMOVQ   runtime·aeskeysched+32(SB), X1
+       VAESENC X1, X0, X0
        MOVQ    X0, AX  // return X0
        RET
 noaes:
@@ -32,9 +35,12 @@ TEXT runtime·memhash64<ABIInternal>(SB),NOSPLIT,$0-24
        JEQ     noaes
        MOVQ    BX, X0  // X0 = seed
        PINSRQ  $1, (AX), X0    // data
-       AESENC  runtime·aeskeysched+0(SB), X0
-       AESENC  runtime·aeskeysched+16(SB), X0
-       AESENC  runtime·aeskeysched+32(SB), X0
+       VMOVQ   runtime·aeskeysched+0(SB), X1
+       VAESENC X1, X0, X0
+       VMOVQ   runtime·aeskeysched+16(SB), X1
+       VAESENC X1, X0, X0
+       VMOVQ   runtime·aeskeysched+32(SB), X1
+       VAESENC X1, X0, X0
        MOVQ    X0, AX  // return X0
        RET
 noaes:
```
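
For reference, the VEX-encoded instructions on which the two listings disagree can be broken down by hand. The sketch below is only an illustration: `vex3` and `decodeVEX3` are hypothetical names, not part of `go tool objdump` or any Go package. It decodes just the fields of the three-byte `C4` prefix for byte sequences copied from the listings above, and it does not attempt to locate the disassembler bug.

```go
package main

import "fmt"

// vex3 holds the fields of a three-byte VEX prefix (C4 xx xx).
// Hypothetical helper type for illustration only.
type vex3 struct {
	Map  byte // opcode map: 1 = 0F, 2 = 0F38, 3 = 0F3A
	W    bool // operand-size / opcode-extension bit
	VVVV byte // extra source register (stored inverted in the encoding)
	L    bool // vector length: false = 128-bit, true = 256-bit
	PP   byte // implied legacy prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2
}

// decodeVEX3 extracts the prefix fields from the first three bytes of b.
// It reports false if b does not start with a three-byte VEX prefix.
func decodeVEX3(b []byte) (vex3, bool) {
	if len(b) < 3 || b[0] != 0xC4 {
		return vex3{}, false
	}
	return vex3{
		Map:  b[1] & 0x1F,
		W:    b[2]&0x80 != 0,
		VVVV: ^(b[2] >> 3) & 0x0F, // undo the inverted encoding
		L:    b[2]&0x04 != 0,
		PP:   b[2] & 0x03,
	}, true
}

func main() {
	// Byte sequences and mnemonics taken from the objdump output above.
	samples := [][]byte{
		{0xc4, 0xe1, 0xf9, 0x6e, 0xc3},       // vmovq %rbx,%xmm0
		{0xc4, 0xe3, 0xf9, 0x22, 0xc1, 0x01}, // vpinsrq $0x1,%rcx,%xmm0,%xmm0
		{0xc4, 0xe2, 0x79, 0xdc, 0xc1},       // vaesenc %xmm1,%xmm0,%xmm0
	}
	for _, s := range samples {
		if v, ok := decodeVEX3(s); ok {
			fmt.Printf("% x: map=%d W=%t vvvv=%d L=%t pp=%d opcode=%#x\n",
				s, v.Map, v.W, v.VVVV, v.L, v.PP, s[3])
		}
	}
}
```

The byte after the prefix is the opcode, followed by ModRM and, for the `0F3A` map, an immediate byte, which is why `vpinsrq` is six bytes long while `vmovq` and `vaesenc` are five.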
Gerrit-Comment-Date: Mon, 16 Mar 2026 12:47:46 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No

Arseny Samoylov (Gerrit)

Mar 16, 2026, 10:40:50 AM
to goph...@pubsubhelper.golang.org, Cherry Mui, Junyang Shao, David Chase, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Attention needed from Austin Clements, Cherry Mui, Keith Randall and Michael Pratt

Arseny Samoylov added 1 comment

Patchset-level comments
File-level comment, Patchset 3:
Arseny Samoylov . resolved

Here are the results from `perf` for a fixed number of iterations: https://imgur.com/a/CsKrzDQ

Arseny Samoylov

Here are the results from replacing `AES` with `VAES` in the GOASM implementation:

Results:

```
goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
                                                 │ moved_to_runtime/base.stat │       base_vaes/base_vaes.stat        │      moved_to_runtime/opt.stat       │
│ sec/op │ sec/op vs base │ sec/op vs base │
MapAccessHit/Key=int32/Elem=int32/len=6-4 21.58n ± 0% 21.59n ± 0% +0.05% (p=0.006 n=30+40) 21.59n ± 0% +0.05% (p=0.009 n=30)
MapAccessHit/Key=int32/Elem=int32/len=64-4 25.46n ± 0% 25.49n ± 0% ~ (p=0.249 n=30+40) 26.48n ± 2% +4.01% (p=0.000 n=30)
MapAccessHit/Key=int32/Elem=int32/len=65536-4 40.91n ± 3% 40.66n ± 1% ~ (p=0.224 n=30+40) 41.96n ± 0% +2.58% (p=0.000 n=30)
MapAccessHit/Key=int64/Elem=int64/len=6-4 21.39n ± 0% 21.39n ± 0% ~ (p=0.365 n=30+40) 21.39n ± 0% ~ (p=0.842 n=30)
MapAccessHit/Key=int64/Elem=int64/len=64-4 25.32n ± 1% 25.34n ± 1% ~ (p=0.357 n=30+40) 25.61n ± 2% +1.13% (p=0.003 n=30)
MapAccessHit/Key=int64/Elem=int64/len=65536-4 43.90n ± 0% 43.98n ± 0% ~ (p=0.270 n=30+40) 44.25n ± 1% +0.82% (p=0.000 n=30)
MapAccessHit/Key=int32/Elem=bigType/len=6-4 104.0n ± 0% 104.5n ± 0% +0.48% (p=0.000 n=30+40) 105.2n ± 0% +1.15% (p=0.000 n=30)
MapAccessHit/Key=int32/Elem=bigType/len=64-4 173.4n ± 3% 171.9n ± 1% ~ (p=0.231 n=30+40) 170.1n ± 3% ~ (p=0.255 n=30)
MapAccessHit/Key=int32/Elem=bigType/len=65536-4 585.2n ± 0% 592.6n ± 0% +1.26% (p=0.000 n=30+40) 596.6n ± 0% +1.95% (p=0.000 n=30)
MapAccessHit/Key=int32/Elem=*int32/len=6-4 21.92n ± 0% 21.92n ± 0% ~ (p=0.749 n=30+40) 21.88n ± 0% -0.18% (p=0.004 n=30)
MapAccessHit/Key=int32/Elem=*int32/len=64-4 25.91n ± 1% 25.96n ± 2% ~ (p=0.092 n=30+40) 26.64n ± 1% +2.82% (p=0.000 n=30)
MapAccessHit/Key=int32/Elem=*int32/len=65536-4 45.74n ± 0% 45.16n ± 0% -1.28% (p=0.000 n=30+40) 45.91n ± 0% ~ (p=0.080 n=30)
MapAccessMiss/Key=int32/Elem=int32/len=6-4 23.48n ± 0% 23.48n ± 0% ~ (p=0.241 n=30+40) 23.49n ± 0% ~ (p=0.184 n=30)
MapAccessMiss/Key=int32/Elem=int32/len=64-4 25.37n ± 3% 24.96n ± 2% ~ (p=0.409 n=30+40) 25.82n ± 3% ~ (p=0.135 n=30)
MapAccessMiss/Key=int32/Elem=int32/len=65536-4 37.65n ± 0% 37.65n ± 0% ~ (p=0.740 n=30+40) 38.20n ± 0% +1.47% (p=0.000 n=30)
MapAccessMiss/Key=int64/Elem=int64/len=6-4 22.96n ± 1% 22.92n ± 1% -0.17% (p=0.001 n=30+40) 22.92n ± 1% -0.20% (p=0.004 n=30)
MapAccessMiss/Key=int64/Elem=int64/len=64-4 25.84n ± 3% 25.67n ± 4% ~ (p=0.396 n=30+40) 25.75n ± 2% ~ (p=0.739 n=30)
MapAccessMiss/Key=int64/Elem=int64/len=65536-4 38.34n ± 0% 38.31n ± 0% ~ (p=0.597 n=30+40) 42.30n ± 55% +10.33% (p=0.000 n=30)
MapAccessMiss/Key=int32/Elem=bigType/len=6-4 128.2n ± 4% 128.1n ± 0% ~ (p=0.057 n=30+40) 129.1n ± 3% +0.66% (p=0.001 n=30)
MapAccessMiss/Key=int32/Elem=bigType/len=64-4 130.3n ± 3% 129.7n ± 0% -0.50% (p=0.005 n=30+40) 130.7n ± 3% ~ (p=0.420 n=30)
MapAccessMiss/Key=int32/Elem=bigType/len=65536-4 150.7n ± 0% 150.8n ± 0% ~ (p=0.952 n=30+40) 151.1n ± 1% +0.23% (p=0.001 n=30)
MapAccessMiss/Key=int32/Elem=*int32/len=6-4 23.73n ± 0% 23.73n ± 0% ~ (p=0.880 n=30+40) 23.73n ± 0% ~ (p=0.221 n=30)
MapAccessMiss/Key=int32/Elem=*int32/len=64-4 25.23n ± 2% 24.81n ± 2% ~ (p=0.210 n=30+40) 25.73n ± 2% ~ (p=0.130 n=30)
MapAccessMiss/Key=int32/Elem=*int32/len=65536-4 38.14n ± 0% 38.18n ± 0% ~ (p=0.085 n=30+40) 38.75n ± 0% +1.61% (p=0.000 n=30)
MapAccessZero/Key=int64-4 2.708n ± 0% 2.708n ± 0% ~ (p=0.628 n=30+40) 2.708n ± 0% ~ (p=0.938 n=30)
MapAccessZero/Key=int32-4 2.708n ± 0% 2.707n ± 0% ~ (p=0.462 n=30+40) 2.708n ± 0% ~ (p=0.933 n=30)
MapAccessEmpty/Key=int64-4 3.095n ± 0% 3.094n ± 0% -0.03% (p=0.025 n=30+40) 3.095n ± 0% ~ (p=0.335 n=30)
MapAccessEmpty/Key=int32-4 3.095n ± 0% 3.095n ± 0% ~ (p=0.407 n=30+40) 3.095n ± 0% ~ (p=0.357 n=30)
geomean 30.30n 30.24n -0.18% 30.62n +1.07%

```
From these results we can say that there is no meaningful performance difference between `AES` and `VAES` (geomean -0.18%).

Here is the [link](https://www.canva.com/design/DAHEHtsQWmE/bXUm5cgg1tEZQCtT0CYFbg/edit?utm_content=DAHEHtsQWmE&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton) to the perf results.

Looking at the disassembly, the only remaining differences are:

  • `archsimd.X86.AVXAES()` checks two conditions, whereas GOASM checks only a single condition.
  • GOASM version uses `movq` to extract hash, while intrinsic version uses `vpextrq`.

I guess the only thing we can do now is try to address the latter point, but I don't know if it is worth it.

Overall, the intrinsic version almost reached baseline performance, with a degradation gap of about 1%.

The original goal was to reimplement the `memhash` functions so that they can be inlined, removing the redundant load/store of the key (the current hashing API takes a pointer, not the key itself).

My first version was inlined; however, the redundant load/store remained. My current version, I suspect, is not inlinable due to `linkname` usage.
In the future, I guess it will be feasible to make the hash functions inlinable, for example by moving them to `internal/runtime/...`.

Also, I think we could investigate why the compiler doesn't remove the redundant load/stores when the hashing functions are inlined (Patchset 1 can be used for that purpose).

Gerrit-Comment-Date: Mon, 16 Mar 2026 14:40:42 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Comment-In-Reply-To: Arseny Samoylov <samoylo...@gmail.com>
Comment-In-Reply-To: Cherry Mui <cher...@google.com>
Comment-In-Reply-To: Michael Pratt <mpr...@google.com>

Arseny Samoylov (Gerrit)

5:57 AM
to goph...@pubsubhelper.golang.org, Cherry Mui, Junyang Shao, David Chase, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Patchset-level comments
Arseny Samoylov

My first version was inlined; however, the redundant load/store remained.

...


Also, I think we could investigate why the compiler doesn't remove the redundant load/stores when the hashing functions are inlined (Patchset 1 can be used for that purpose).


I forgot to test how inlining works after the changes in Patchset 4 (replacing `Load` with `Set`, as @cher...@google.com suggested).

And with these changes the redundant load/store are eliminated! (For reference: https://godbolt.org/z/9vTM9jj3q. Note that there are still redundant `XCHGL AX, AX` instructions, but this is an improvement over the load/store.)

Note: the current version (Patchsets 6 and 7) isn't inlined.

So, the key takeaway: using `Set` instead of `Load` solved two issues:

  • abnormal performance degradation
  • persistence of redundant store/load after inlining
Gerrit-Comment-Date: Wed, 18 Mar 2026 09:57:10 +0000

Cherry Mui (Gerrit)

2:39 PM
to Arseny Samoylov, goph...@pubsubhelper.golang.org, Junyang Shao, David Chase, Austin Clements, Keith Randall, Michael Pratt, Gopher Robot, golang-co...@googlegroups.com
Attention needed from Arseny Samoylov, Austin Clements, Keith Randall and Michael Pratt

Cherry Mui added 1 comment

Patchset-level comments
Cherry Mui

Thank you for the update! The compiler probably should do better code generation with Load. In the meantime, you can use SetElem.

The `XCHGL AX, AX` is a NOP instruction for marking the line number of the inlined call frame. See https://github.com/golang/go/issues/73787#issuecomment-3212628199 for what I plan to do. I'd expect it also doesn't affect performance much. The CPU is very good at handling NOPs.

In the future, I guess it will be feasible to make the hash functions inlinable, for example by moving them to internal/runtime/....

For them to be inlineable, they need to be placed in the same package as the caller, or in a package imported by the caller. If you put them in the runtime package, they cannot be inlined into internal/runtime. It is probably a good idea to move all the memhash stuff to internal/runtime/maps. (There are probably also places where the memhash functions are called indirectly, though not in the map fast32/64 functions; those calls are not going to be inlined either.)

archsimd.X86.AVXAES() checks two conditions, whereas GOASM checks only a single condition.

If it matters we could memoize archsimd.X86.AVXAES(). May not matter too much.

GOASM version uses movq to extract hash, while intrinsic version uses vpextrq.

Like Load, I think we can do better code generation for GetElem(0).

Overall, the intrinsic version almost reached baseline performance, with a degradation gap of about 1%.

This is great! I would call this experiment a success! Thank you very much for doing this!
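
On the memoization point above, a minimal sketch of the idea follows. All names here (`cpuFeatures`, `features`, `avxAES`, `useAVXAES`, `pickHash`) are hypothetical and are not the `archsimd` or `internal/cpu` API. The sketch only assumes that the check tests two feature flags, as the two `CMPB` instructions against `internal/cpu.X86` in the disassembly earlier in this thread suggest.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical stand-in for a CPU feature table.
type cpuFeatures struct {
	HasAVX bool
	HasAES bool
}

// features holds placeholder values for the example.
var features = cpuFeatures{HasAVX: true, HasAES: true}

// avxAES tests two flags on every call, which compiles to two
// compare-and-branch pairs on the hot path.
func avxAES() bool {
	return features.HasAVX && features.HasAES
}

// useAVXAES memoizes the combined check once at package initialization,
// so the hot path only has to test a single byte.
var useAVXAES = avxAES()

// pickHash reports which implementation the memoized flag would select.
func pickHash() string {
	if useAVXAES {
		return "aes"
	}
	return "fallback"
}

func main() {
	fmt.Println(pickHash())
}
```

Go's package initialization order follows dependencies through function bodies, so `features` is initialized before `useAVXAES` is computed from it.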

Gerrit-Comment-Date: Wed, 18 Mar 2026 18:39:52 +0000