Hi Keith,
Perhaps split the added benchmarks into a parent CL in a chain; otherwise one has to apply them separately to properly benchmark master without the rest of your patch.
In any case, below are the numbers from my amd64 Zen 5 laptop, run in performance mode to avoid throttling. It appears to be a net win throughout.
goos: linux
goarch: amd64
pkg: runtime
cpu: AMD Ryzen AI 9 HX 370 w/ Radeon 890M
                         │     tip      │              cl678175               │
                         │    sec/op    │    sec/op     vs base               │
MemclrKnownSize112-24      1.0675n ± 1%   0.7465n ± 0%  -30.07% (p=0.000 n=10)
MemclrKnownSize128-24      1.0665n ± 2%   0.8535n ± 0%  -19.97% (p=0.000 n=10)
MemclrKnownSize192-24       1.492n ± 0%    1.280n ± 1%  -14.20% (p=0.000 n=10)
MemclrKnownSize248-24       2.237n ± 0%    2.072n ± 2%   -7.33% (p=0.000 n=10)
MemclrKnownSize256-24       1.917n ± 1%    1.707n ± 1%  -10.95% (p=0.000 n=10)
MemclrKnownSize512-24       3.623n ± 0%    3.412n ± 0%   -5.84% (p=0.000 n=10)
MemclrKnownSize1024-24      7.042n ± 1%    6.819n ± 0%   -3.17% (p=0.000 n=10)
MemmoveKnownSize112-24     1.2880n ± 0%   0.9974n ± 1%  -22.56% (p=0.000 n=10)
MemmoveKnownSize128-24      1.918n ± 1%    1.193n ± 2%  -37.83% (p=0.000 n=10)
MemmoveKnownSize192-24      1.918n ± 0%    1.862n ± 1%   -2.89% (p=0.000 n=10)
MemmoveKnownSize248-24      2.572n ± 2%    2.253n ± 1%  -12.38% (p=0.000 n=10)
MemmoveKnownSize256-24      2.557n ± 1%    2.306n ± 0%   -9.84% (p=0.000 n=10)
MemmoveKnownSize512-24      5.115n ± 0%    4.337n ± 1%  -15.21% (p=0.000 n=10)
MemmoveKnownSize1024-24    11.600n ± 1%    8.368n ± 4%  -27.87% (p=0.000 n=10)
geomean                     2.486n         2.078n       -16.39%
                         │     tip      │              cl678175               │
                         │     B/s      │     B/s       vs base               │
MemclrKnownSize112-24      97.73Gi ± 0%   139.73Gi ± 0%  +42.98% (p=0.000 n=10)
MemclrKnownSize128-24      111.8Gi ± 1%    139.7Gi ± 0%  +24.94% (p=0.000 n=10)
MemclrKnownSize192-24      119.8Gi ± 0%    139.6Gi ± 1%  +16.54% (p=0.000 n=10)
MemclrKnownSize248-24      103.3Gi ± 0%    111.4Gi ± 2%   +7.92% (p=0.000 n=10)
MemclrKnownSize256-24      124.4Gi ± 1%    139.7Gi ± 1%  +12.30% (p=0.000 n=10)
MemclrKnownSize512-24      131.6Gi ± 0%    139.8Gi ± 0%   +6.19% (p=0.000 n=10)
MemclrKnownSize1024-24     135.4Gi ± 1%    139.9Gi ± 0%   +3.28% (p=0.000 n=10)
MemmoveKnownSize112-24     80.98Gi ± 0%   104.58Gi ± 1%  +29.14% (p=0.000 n=10)
MemmoveKnownSize128-24     62.15Gi ± 1%    99.97Gi ± 2%  +60.85% (p=0.000 n=10)
MemmoveKnownSize192-24     93.26Gi ± 0%    96.04Gi ± 1%   +2.98% (p=0.000 n=10)
MemmoveKnownSize248-24     89.80Gi ± 2%   102.49Gi ± 1%  +14.13% (p=0.000 n=10)
MemmoveKnownSize256-24     93.24Gi ± 1%   103.42Gi ± 0%  +10.92% (p=0.000 n=10)
MemmoveKnownSize512-24     93.23Gi ± 0%   109.97Gi ± 1%  +17.95% (p=0.000 n=10)
MemmoveKnownSize1024-24    82.22Gi ± 1%   113.98Gi ± 4%  +38.63% (p=0.000 n=10)
geomean                    99.27Gi        118.7Gi        +19.60%
--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/golang-dev/CA%2BZMcOO_%3D%2BaJBTWiku0enKX1tiYJDdcu7YBhwcCgPik7iFTeVg%40mail.gmail.com.
To view this discussion visit https://groups.google.com/d/msgid/golang-dev/CAGeFq%2BnzhqWxX_hzXqsyWB%2B%3DkTf0BfjNczCZqLdS043S_dAK%3DA%40mail.gmail.com.
Hello.
I tested this patch on my Kunpeng920 and Kunpeng930 systems (arm64) and saw a degradation in all cases. The main cause is branch misprediction.
Ok, things look mostly positive, thanks.
I've made a more realistic stack of CLs; it should lower the overhead somewhat. Based on the results you all have provided, I've also unrolled the memmove loop further for the larger sizes. Hopefully that will mitigate some of the slowdowns we saw. CL stack at https://go-review.googlesource.com/c/go/+/679456