Hi,

Here is a small update with some preliminary results from the sweet benchmark suite.

x86 (Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz):
                      │ base.results  │              slp.results              │
                      │    sec/op     │    sec/op      vs base                │
BleveIndexBatch100-4      5.867 ±  2%     5.842 ±  2%        ~ (p=0.971 n=10)
ESBuildThreeJS-4         685.0m ±  2%    686.9m ±  1%        ~ (p=0.853 n=10)
ESBuildRomeTS-4          179.6m ±  1%    180.2m ±  1%        ~ (p=0.353 n=10)
EtcdPut-4                138.7m ± 37%    134.6m ± 19%        ~ (p=0.315 n=10)
EtcdSTM-4                526.0m ±  4%    531.9m ±  2%        ~ (p=0.579 n=10)
GoBuildKubelet-4          145.5 ±  1%     146.5 ±  0%   +0.73% (p=0.000 n=10)
GoBuildKubeletLink-4      9.647 ±  0%     9.670 ±  0%        ~ (p=0.739 n=10)
GoBuildIstioctl-4         117.4 ±  1%     118.3 ±  0%   +0.80% (p=0.000 n=10)
GoBuildIstioctlLink-4     10.13 ±  0%     10.09 ±  0%   -0.41% (p=0.019 n=10)
GoBuildFrontend-4         41.29 ±  1%     41.80 ±  0%   +1.22% (p=0.000 n=10)
GoBuildFrontendLink-4     1.613 ±  2%     1.619 ±  1%        ~ (p=0.971 n=10)
GoBuildTsgo-4             66.31 ±  1%     66.02 ±  1%        ~ (p=0.052 n=10)
GoBuildTsgoLink-4        842.0m ±  1%    838.3m ±  1%        ~ (p=0.089 n=10)
GopherLuaKNucleotide-4    27.62 ±  1%     27.63 ±  1%        ~ (p=0.739 n=10)
MarkdownRenderXHTML-4    237.6m ±  1%    239.7m ±  1%   +0.89% (p=0.001 n=10)
Tile38QueryLoad-4        532.3µ ±  0%    523.3µ ±  0%   -1.69% (p=0.000 n=10)
geomean                   2.391           2.390         -0.03%

Arm64 (HiSilicon Kunpeng-920):
                      │ base.results  │              slp.results              │
                      │    sec/op     │    sec/op      vs base                │
BleveIndexBatch100-4      7.467 ±  1%     7.513 ±  2%        ~ (p=0.165 n=10)
ESBuildThreeJS-4         755.1m ±  1%    754.5m ±  1%        ~ (p=0.912 n=10)
ESBuildRomeTS-4          195.5m ±  2%    194.0m ±  1%        ~ (p=0.123 n=10)
EtcdPut-4                55.08m ±  1%    54.69m ±  1%        ~ (p=0.165 n=10)
EtcdSTM-4                292.2m ±  1%    291.4m ±  1%        ~ (p=0.436 n=10)
GoBuildKubelet-4          157.7 ±  0%     158.7 ±  0%   +0.59% (p=0.000 n=10)
GoBuildKubeletLink-4      12.54 ±  2%     12.51 ±  1%        ~ (p=0.247 n=10)
GoBuildIstioctl-4         123.8 ±  0%     124.0 ±  0%   +0.17% (p=0.011 n=10)
GoBuildIstioctlLink-4     8.517 ±  1%     8.525 ±  0%        ~ (p=0.529 n=10)
GoBuildFrontend-4         45.04 ±  0%     45.55 ±  1%   +1.14% (p=0.000 n=10)
GoBuildFrontendLink-4     2.134 ±  1%     2.135 ±  1%        ~ (p=0.739 n=10)
GoBuildTsgo-4             75.66 ±  0%     75.74 ±  1%        ~ (p=0.796 n=10)
GoBuildTsgoLink-4         1.162 ±  1%     1.165 ±  1%        ~ (p=0.631 n=10)
GopherLuaKNucleotide-4    33.30 ±  3%     32.97 ±  1%        ~ (p=0.075 n=10)
MarkdownRenderXHTML-4    266.0m ±  0%    267.2m ±  0%   +0.45% (p=0.001 n=10)
Tile38QueryLoad-4        607.9µ ±  0%    608.0µ ±  0%        ~ (p=0.739 n=10)
geomean                   2.450           2.450         +0.03%
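
On both x86 and Arm64, the current prototype shows essentially no measurable change in overall performance: the geomean changes by -0.03% and +0.03%, respectively.
There is a small degradation in the go-build benchmarks (GoBuildKubelet, GoBuildIstioctl and GoBuildFrontend are 0.17% to 1.22% slower, while the Link variants show no regression). This is expected: the SLP pass adds work to the compiler, and the current implementation has not yet been optimized for compile-time overhead.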
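
One result that stands out is Tile38QueryLoad on x86, which shows a small improvement (-1.69% sec/op), while Arm64 shows no change.
I reran the benchmark separately to verify that the improvement persists, and the results appear stable across runs: sec/op improves by 1.49%, throughput by 1.51%, and p99 latency shows the largest gain (-3.89%).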

Rerun of the Tile38QueryLoad benchmark on x86:
                  │  base.results   │              slp.results              │
                  │    sec/op       │    sec/op       vs base               │
Tile38QueryLoad-4    531.3µ ±  0%      523.4µ ±  0%    -1.49% (p=0.000 n=10)

                  │  base.results   │              slp.results              │
                  │ p50-latency-sec │ p50-latency-sec vs base               │
Tile38QueryLoad-4    251.3µ ±  0%      250.6µ ±  0%    -0.25% (p=0.005 n=10)

                  │  base.results   │              slp.results              │
                  │ p90-latency-sec │ p90-latency-sec vs base               │
Tile38QueryLoad-4    862.6µ ±  0%      847.7µ ±  0%    -1.73% (p=0.000 n=10)

                  │  base.results   │              slp.results              │
                  │ p99-latency-sec │ p99-latency-sec vs base               │
Tile38QueryLoad-4    4.998m ±  1%      4.803m ±  1%    -3.89% (p=0.000 n=10)

                  │  base.results   │              slp.results              │
                  │     ops/s       │     ops/s       vs base               │
Tile38QueryLoad-4    5.646k ±  0%      5.731k ±  0%    +1.51% (p=0.000 n=10)
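As a consistency check on the rerun, the ops/s gain can be derived from the sec/op change, assuming throughput is inversely proportional to time per operation (an assumption, not something the benchmark guarantees). The ratio 531.3/523.4 is about 1.0151, which implies roughly +1.51% ops/s and matches the measured +1.51%. The helper `impliedThroughputChange` is a hypothetical name used only for this sketch.

```go
package main

import "fmt"

// impliedThroughputChange converts a relative change in time per operation
// (e.g. -0.0149 for -1.49%) into the relative throughput change it implies,
// assuming throughput is inversely proportional to sec/op.
func impliedThroughputChange(secPerOpChange float64) float64 {
	return 1/(1+secPerOpChange) - 1
}

func main() {
	// Medians from the x86 Tile38QueryLoad rerun.
	baseSec, slpSec := 531.3e-6, 523.4e-6 // sec/op
	baseOps, slpOps := 5.646e3, 5.731e3   // ops/s

	secChange := (slpSec - baseSec) / baseSec
	fmt.Printf("sec/op change:         %+.2f%%\n", secChange*100)
	fmt.Printf("implied ops/s change:  %+.2f%%\n", impliedThroughputChange(secChange)*100)
	fmt.Printf("measured ops/s change: %+.2f%%\n", (slpOps-baseOps)/baseOps*100)
}
```

Agreement between the implied and measured throughput change suggests the two metrics describe the same effect rather than independent noise.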
Here is how I ran the benchmarks:
taskset -c 44-47 ./sweet run -shell -work-dir "$(pwd)/tmp" config.toml 2>&1 | tee sweet.log
# Separate rerun for tile38 on x86
taskset -c 44-47 ./sweet run -run=tile38 -shell -work-dir "$(pwd)/tmp" config.toml 2>&1 | tee sweet.log
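All runs are pinned with `taskset -c 44-47`, i.e. to four CPUs. Assuming the `-4` suffix on the benchmark names follows the usual Go convention of reporting GOMAXPROCS, a small program can confirm that the pinning reaches the Go runtime: on Linux, `runtime.NumCPU` reports the CPUs in the process's affinity mask, and GOMAXPROCS defaults to no more than that. The function name `cpuReport` is hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// cpuReport returns the number of CPUs the Go runtime may use and the
// current GOMAXPROCS setting. On Linux, runtime.NumCPU is derived from the
// process's CPU affinity mask, so it reflects a taskset restriction.
func cpuReport() (numCPU, maxProcs int) {
	// GOMAXPROCS(0) queries the current value without changing it.
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	n, p := cpuReport()
	fmt.Printf("NumCPU=%d GOMAXPROCS=%d\n", n, p)
}
```

Built once and run as `taskset -c 44-47 ./cpucheck` (binary name hypothetical), it should report NumCPU=4 on a machine where CPUs 44-47 exist.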
config.toml:
[[config]]
name = "base"
goroot = "/home/asamoylov/go-upstream"
[[config]]
name = "slp"
goroot = "/home/asamoylov/go-slp"
envbuild = ["GOFLAGS=-d=ssa/slp/debug=2"]
Notes:
- In my environment, the cockroachdb benchmark fails for an unknown reason (with or without SLP), so it is not included in the tables above.
- For the go-build benchmark, I had to use a separate compiler build with SLP enabled by default (that's why the "slp" config has a different goroot and does not need GOEXPERIMENT=slp).
- GOFLAGS=-d=ssa/slp/debug=2 is not particularly useful without `=all`, which it does not accept.

Also, my previous reply, sent through the Groups web interface, appears to have gotten stuck and was only published after I replied via Gmail.
As a result, the thread now contains three copies of the same reply.
The Gmail version (the third one) does not contain proper links, so please refer to one of the first two copies.
I hope the benchmark tables display correctly; pasted as plain text, they look rather messy.