[go] math: fix sqrt regression on AMD64

83 views
Skip to first unread message

Ilya Tocar (Gerrit)

unread,
Sep 1, 2016, 6:05:00 PM9/1/16
to Ian Lance Taylor, golang-co...@googlegroups.com
Ilya Tocar uploaded a change:
https://go-review.googlesource.com/28392

math: fix sqrt regression on AMD64

1.7 introduced a significant regression compared to 1.6:

SqrtIndirect-4 2.32ns ± 0% 7.86ns ± 0% +238.79% (p=0.000 n=20+18)

This is caused by sqrtsd preserving upper part of destination register.
Which introduces dependency on previous value of X0.
In 1.6 benchmark loop didn't use X0 immediately after call:

callq *%rbx
movsd 0x8(%rsp),%xmm2
movsd 0x20(%rsp),%xmm1
addsd %xmm2,%xmm1
mov 0x18(%rsp),%rax
inc %rax
jmp loop

In 1.7 however xmm0 is used just after call:

callq *%rbx
mov 0x10(%rsp),%rcx
lea 0x1(%rcx),%rax
movsd 0x8(%rsp),%xmm0
movsd 0x18(%rsp),%xmm1

I've verified that this is caused by dependency, by inserting
XORPS X0,X0 in the beginning of math.Sqrt, which puts performance back on
1.6 level.

Splitting SQRTSD mem,reg into:
MOVSD mem,reg
SQRTSD reg,reg

Removes dependency, because MOVSD (load version)
doesn't need to preserve upper part of a register.
And reg,reg operation is solved by renamer in CPU.

As a result of this change regression is gone:
SqrtIndirect-4 7.86ns ± 0% 2.33ns ± 0% -70.36% (p=0.000 n=18+17)

Change-Id: Ic7eebe8866445adff5bc38192fa8d64c9a6b8872
---
M src/math/sqrt_amd64.s
1 file changed, 4 insertions(+), 3 deletions(-)



diff --git a/src/math/sqrt_amd64.s b/src/math/sqrt_amd64.s
index f8d825d..d72000f 100644
--- a/src/math/sqrt_amd64.s
+++ b/src/math/sqrt_amd64.s
@@ -5,7 +5,8 @@
#include "textflag.h"

// func Sqrt(x float64) float64
-TEXT ·Sqrt(SB),NOSPLIT,$0
- SQRTSD x+0(FP), X0
- MOVSD X0, ret+8(FP)
+TEXT ·Sqrt(SB), NOSPLIT, $0
+ MOVSD x+0(FP), X0
+ SQRTSD X0, X1
+ MOVSD X1, ret+8(FP)
RET

--
https://go-review.googlesource.com/28392

Ilya Tocar (Gerrit)

unread,
Sep 1, 2016, 6:05:16 PM9/1/16
to golang-co...@googlegroups.com
Ilya Tocar has posted comments on this change.

math: fix sqrt regression on AMD64

Patch Set 1: Run-TryBot+1

--
https://go-review.googlesource.com/28392
Gerrit-Reviewer: Ilya Tocar <ilya....@intel.com>
Gerrit-HasComments: No

Gobot Gobot (Gerrit)

unread,
Sep 1, 2016, 6:05:22 PM9/1/16
to Ilya Tocar, golang-co...@googlegroups.com
Gobot Gobot has posted comments on this change.

math: fix sqrt regression on AMD64

Patch Set 1:

TryBots beginning. Status page: http://farmer.golang.org/try?commit=24bbc144

Gobot Gobot (Gerrit)

unread,
Sep 1, 2016, 6:20:57 PM9/1/16
to Ilya Tocar, golang-co...@googlegroups.com
Gobot Gobot has posted comments on this change.

math: fix sqrt regression on AMD64

Patch Set 1: TryBot-Result+1

TryBots are happy.

--
https://go-review.googlesource.com/28392
Gerrit-Reviewer: Gobot Gobot <go...@golang.org>

Keith Randall (Gerrit)

unread,
Sep 2, 2016, 3:13:45 PM9/2/16
to Ilya Tocar, Keith Randall, Gobot Gobot, golang-co...@googlegroups.com
Keith Randall has posted comments on this change.

math: fix sqrt regression on AMD64

Patch Set 1:

> TryBots are happy.

I'm not convinced this is necessary.

movsd also only writes the low 64 bits of the register. So that's not the
fix that is happening here. It just happens that you're using X1 as the
output of the sqrt instruction instead of X0. If you redo your patch to do:
MOVSD x+0(FP), X0
SQRTSD X0, X0
MOVSD X0, ret+8(FP)
I think you'll get the same bad behavior. Using XORPS on the destination
register first might be a more robust fix to this.

That said, I think the real problem is the benchmark, not the math.Sqrt
code. The benchmark is testing the throughput of the sqrt instruction, and
the "bug" in 1.7 is that because of partial register nonsense we end up
measuring the latency, not throughput. I think the benchmark should be
changed to measure latency all the time. So something like:
x := 1.0
f := math.Sqrt
for i := 0; i < b.N; i++ {
x = f(x)
}
Global = x

Should do that.

Maybe do x = f(x)+1 to make sure we aren't testing some sqrt(1) special
case.

--
https://go-review.googlesource.com/28392
Gerrit-Reviewer: Gobot Gobot <go...@golang.org>
Gerrit-Reviewer: Ilya Tocar <ilya....@intel.com>
Gerrit-Reviewer: Keith Randall <k...@golang.org>
Gerrit-HasComments: No

Ilya Tocar (Gerrit)

unread,
Sep 2, 2016, 3:25:41 PM9/2/16
to Keith Randall, Gobot Gobot, golang-co...@googlegroups.com
Ilya Tocar has posted comments on this change.

math: fix sqrt regression on AMD64

Patch Set 1:

> I'm not convinced this is necessary.
>
> movsd also only writes the low 64 bits of the register.
Nope. Load form of movsd zeros out upper 64 bits.
From Intel® 64 and IA-32 Architectures Software Developer’s Manual:

MOVSD (128-bit Legacy SSE version: MOVSD XMM1, m64)
DEST[63:0]  SRC[63:0]
DEST[127:64]  0 // Upper part is not preserved
DEST[VLMAX-1:128] (Unmodified)


> If you redo your patch to do:
> MOVSD x+0(FP), X0
> SQRTSD X0, X0
> MOVSD X0, ret+8(FP)
> I think you'll get the same bad behavior.
I've checked and both variants have the same performance


> That said, I think the real problem is the benchmark, not the
> math.Sqrt code. The benchmark is testing the throughput of the
> sqrt instruction, and the "bug" in 1.7 is that because of partial
> register nonsense we end up measuring the latency, not throughput.
> I think the benchmark should be changed to measure latency all the
> time.
What about other sqrt benches (sqrt,sqrtgo)?

Keith Randall (Gerrit)

unread,
Sep 2, 2016, 3:36:41 PM9/2/16
to Ilya Tocar, Keith Randall, Gobot Gobot, golang-co...@googlegroups.com
Keith Randall has posted comments on this change.

math: fix sqrt regression on AMD64

Patch Set 1: Code-Review+2

> > I'm not convinced this is necessary.
> >
> > movsd also only writes the low 64 bits of the register.
> Nope. Load form of movsd zeros out upper 64 bits.
> From Intel® 64 and IA-32 Architectures Software Developer’s Manual:
>
> MOVSD (128-bit Legacy SSE version: MOVSD XMM1, m64)
> DEST[63:0]  SRC[63:0]
> DEST[127:64]  0 // Upper part is not preserved
> DEST[VLMAX-1:128] (Unmodified)
>
Ah, sorry, I misread the manual. The reg->reg MOVSD doesn't zero the high
bits, but the mem->reg one does.
So this is a good fix then, sorry. Always good to avoid partial register
stalls.

>
> > If you redo your patch to do:
> > MOVSD x+0(FP), X0
> > SQRTSD X0, X0
> > MOVSD X0, ret+8(FP)
> > I think you'll get the same bad behavior.
> I've checked and both variants have the same performance
>
>
> > That said, I think the real problem is the benchmark, not the
> > math.Sqrt code. The benchmark is testing the throughput of the
> > sqrt instruction, and the "bug" in 1.7 is that because of partial
> > register nonsense we end up measuring the latency, not
> throughput.
> > I think the benchmark should be changed to measure latency all
> the
> > time.
> What about other sqrt benches (sqrt,sqrtgo)?

Yes, we should probably fix all of these benchmarks. For instance,
BenchmarkSqrt doesn't actually benchmark square roots - they all get
constant-folded away. It's benchmarking add.

Ilya Tocar (Gerrit)

unread,
Sep 5, 2016, 2:11:36 PM9/5/16
to Keith Randall, Gobot Gobot, golang-co...@googlegroups.com
Reviewers: Keith Randall, Gobot Gobot

Ilya Tocar uploaded a new patch set:
This also removes old Sqrt benchmarks, in favor of benchmarks measuring
latency.
Only SqrtIndirect is kept, to show impact of this patch.

Change-Id: Ic7eebe8866445adff5bc38192fa8d64c9a6b8872
---
M src/math/all_test.go
M src/math/sqrt_amd64.s
2 files changed, 24 insertions(+), 14 deletions(-)

Ilya Tocar (Gerrit)

unread,
Sep 5, 2016, 2:12:02 PM9/5/16
to Keith Randall, Gobot Gobot, golang-co...@googlegroups.com
Ilya Tocar has posted comments on this change.

math: fix sqrt regression on AMD64

Patch Set 2: Run-TryBot+1

--
https://go-review.googlesource.com/28392
Gerrit-Reviewer: Gobot Gobot <go...@golang.org>
Gerrit-Reviewer: Ilya Tocar <ilya....@intel.com>
Gerrit-Reviewer: Keith Randall <k...@golang.org>
Gerrit-HasComments: No

Gobot Gobot (Gerrit)

unread,
Sep 5, 2016, 2:12:22 PM9/5/16
to Ilya Tocar, Keith Randall, golang-co...@googlegroups.com
Gobot Gobot has posted comments on this change.

math: fix sqrt regression on AMD64

Patch Set 2:

TryBots beginning. Status page: http://farmer.golang.org/try?commit=5b30fac5

Gobot Gobot (Gerrit)

unread,
Sep 5, 2016, 2:18:23 PM9/5/16
to Ilya Tocar, Keith Randall, golang-co...@googlegroups.com
Gobot Gobot has posted comments on this change.

math: fix sqrt regression on AMD64

Patch Set 2:

Build is still in progress...
This change failed on darwin-amd64-10_10:
See
https://storage.googleapis.com/go-build-log/5b30fac5/darwin-amd64-10_10_1186cd15.log

Consult https://build.golang.org/ to see whether it's a new failure. Other
builds still in progress; subsequent failure notices suppressed until final
report.

Gobot Gobot (Gerrit)

unread,
Sep 5, 2016, 2:23:32 PM9/5/16
to Ilya Tocar, Keith Randall, golang-co...@googlegroups.com
Gobot Gobot has posted comments on this change.

math: fix sqrt regression on AMD64

Patch Set 2: TryBot-Result-1

1 of 18 TryBots failed:
Failed on darwin-amd64-10_10:
https://storage.googleapis.com/go-build-log/5b30fac5/darwin-amd64-10_10_1186cd15.log

Consult https://build.golang.org/ to see whether they are new failures.

Keith Randall (Gerrit)

unread,
Sep 5, 2016, 2:48:25 PM9/5/16
to Ilya Tocar, Keith Randall, Gobot Gobot, golang-co...@googlegroups.com
Keith Randall has posted comments on this change.

math: fix sqrt regression on AMD64

Patch Set 2:

LGTM.
The trybot failure looks unrelated.

Ilya Tocar (Gerrit)

unread,
Sep 6, 2016, 3:45:09 PM9/6/16
to golang-...@googlegroups.com, Keith Randall, Gobot Gobot, golang-co...@googlegroups.com
Ilya Tocar has submitted this change and it was merged.
This also removes old Sqrt benchmarks, in favor of benchmarks measuring
latency.
Only SqrtIndirect is kept, to show impact of this patch.

Change-Id: Ic7eebe8866445adff5bc38192fa8d64c9a6b8872
Reviewed-on: https://go-review.googlesource.com/28392
Run-TryBot: Ilya Tocar <ilya....@intel.com>
Reviewed-by: Keith Randall <k...@golang.org>
---
M src/math/all_test.go
M src/math/sqrt_amd64.s
2 files changed, 24 insertions(+), 14 deletions(-)

Approvals:
Keith Randall: Looks good to me, approved
Ilya Tocar: Run TryBots

Objections:
Gobot Gobot: TryBots failed
Reply all
Reply to author
Forward
0 new messages