Ilya Tocar uploaded a change:
math: fix sqrt regression on AMD64
Go 1.7 introduced a significant regression compared to Go 1.6:
name            old time/op  new time/op  delta
SqrtIndirect-4  2.32ns ± 0%  7.86ns ± 0%  +238.79%  (p=0.000 n=20+18)
This is caused by SQRTSD preserving the upper part of the destination
register, which introduces a dependency on the previous value of X0.
In 1.6 the benchmark loop didn't use X0 immediately after the call:
In 1.7, however, X0 (xmm0) is used right after the call:
I've verified that this is caused by the dependency by inserting
XORPS X0,X0 at the beginning of math.Sqrt, which puts performance back on
Splitting SQRTSD mem,reg into a MOVSD load followed by a reg,reg SQRTSD
removes the dependency, because MOVSD (the load form) doesn't need to
preserve the upper part of the register, and the reg,reg operation is
handled by the register renamer in the CPU.
As a result of this change, the regression is gone:
name            old time/op  new time/op  delta
SqrtIndirect-4  7.86ns ± 0%  2.33ns ± 0%  -70.36%  (p=0.000 n=18+17)
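
For readers who want to try a similar measurement, here is a minimal sketch
of a benchmark that calls math.Sqrt through a function value and consumes
the result right after each call. It is an illustration only, not the
benchmark from the math package: the names sqrtFn, sumSqrtIndirect and sink
are hypothetical, and the timings it prints depend on the Go version and CPU.

```go
package main

import (
	"fmt"
	"math"
	"testing"
)

// sqrtFn is a package-level function value (hypothetical name), used so the
// call in the loop goes through an indirect call rather than being inlined.
var sqrtFn func(float64) float64 = math.Sqrt

// sink keeps the accumulated result live so the loop is not optimized away.
var sink float64

// sumSqrtIndirect calls f(y) n times and adds each result to a running sum,
// so every returned value is used immediately after the call returns.
func sumSqrtIndirect(f func(float64) float64, n int, y float64) float64 {
	x := 0.0
	for i := 0; i < n; i++ {
		x += f(y)
	}
	return x
}

// benchmarkSqrtIndirect runs the loop b.N times, as the testing package expects.
func benchmarkSqrtIndirect(b *testing.B) {
	sink = sumSqrtIndirect(sqrtFn, b.N, 10.0)
}

func main() {
	// testing.Benchmark runs the function outside of "go test" and
	// returns the measured result.
	res := testing.Benchmark(benchmarkSqrtIndirect)
	fmt.Println("SqrtIndirect", res)
}
```

Running the program before and after a change to the Sqrt implementation
gives two numbers to compare; a tool such as benchstat is the usual way to
judge whether a difference is significant.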
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/src/math/sqrt_amd64.s b/src/math/sqrt_amd64.s
index f8d825d..d72000f 100644
@@ -5,7 +5,8 @@
// func Sqrt(x float64) float64
- SQRTSD x+0(FP), X0
- MOVSD X0, ret+8(FP)
+TEXT ·Sqrt(SB), NOSPLIT, $0
+ MOVSD x+0(FP), X0
+ SQRTSD X0, X1
+ MOVSD X1, ret+8(FP)