float behaviour on arm64 v amd64

Dan Kortschak

Jan 12, 2020, 11:19:54 PM

to golang-nuts

I am going through failures that I see in Gonum tests when we build on
arm64 (Travis now provide this).

In many cases there are slight differences that I'm OK with adding a
tolerance to accept, but in one case (stat.ROC[0][1]) I see an error
that can be completely avoided by changing the expression from what is
at [1] to

```
for i := range tpr {
	tpr[i] = 1 - tpr[i]/nPos
	fpr[i] = 1 - fpr[i]/nNeg
}
```

Should I expect `inv := 1/c; v *= inv` and `v /= c` to give the same
results for reasonable cases? (or at least to match the behaviour on
amd64/386/arm - which all agree).

thanks
Dan

[0] https://godoc.org/gonum.org/v1/gonum/stat#ROC
[1] https://github.com/gonum/gonum/blob/683ee363d56e77121c6640345bb9d40644f02a1f/stat/roc.go#L107-L114
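
To make the question above concrete, here is a minimal sketch that compares `v /= c` with `inv := 1/c; v *= inv` over a range of integer values. The helper name `countMismatches` and the chosen divisors are assumptions made for illustration and do not come from Gonum. Neither expression contains an addition, so no multiply-add fusion is involved; the sketch only suggests that the two forms are not guaranteed to agree even with plain IEEE 754 rounding, because the reciprocal is rounded once before the multiply rounds again.

```go
package main

import "fmt"

// countMismatches reports how many integers v in [1, n] satisfy
// v/c != v*(1/c) in float64 arithmetic. The name and signature are
// hypothetical, chosen only for this illustration.
func countMismatches(c float64, n int) int {
	inv := 1 / c // rounded once here
	count := 0
	for i := 1; i <= n; i++ {
		v := float64(i)
		// v/c is rounded once; v*inv carries the rounding error of inv
		// and is then rounded a second time.
		if v/c != v*inv {
			count++
		}
	}
	return count
}

func main() {
	for _, c := range []float64{3, 10, 49} {
		fmt.Printf("c=%v: %d of 1000 values differ\n", c, countMismatches(c, 1000))
	}
}
```

A well-known instance is c = 49: 49/49 is exactly 1, while 49*(1/49) is slightly less than 1 in float64.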

Keith Randall

Jan 13, 2020, 6:45:17 PM

to golang-nuts

Note: discussion at https://github.com/golang/go/issues/36536. TL;DR: fused floating-point multiply-add rounds once instead of twice, so it gives higher-precision results.

Dan Kortschak

Jan 13, 2020, 8:46:05 PM

to Keith Randall, golang-nuts

Thanks for linking this here.

One thing that I did not follow up at the issue; why do we see the FMA
being applied when the value is a slice element, but not when it's a
single float64 value?

Second query, are there plans for adding FMA support to amd64 akin to
how it is on arm64?

Dan


Keith Randall

Jan 13, 2020, 9:53:20 PM

to Dan Kortschak, golang-nuts

On Mon, Jan 13, 2020 at 5:45 PM Dan Kortschak <d...@kortschak.io> wrote:

> Thanks for linking this here.
>
> One thing that I did not follow up at the issue; why do we see the FMA
> being applied when the value is a slice element, but not when it's a
> single float64 value?

I'd have to see an example to be sure. Possibly everything gets constant-propagated in the compiler, which has its own very high accuracy floating point (256 bits of mantissa, I think?).

> Second query, are there plans for adding FMA support to amd64 akin to
> how it is on arm64?

Not from code like this. Our minimum amd64 architecture has no FMA, so anything generated by the compiler would need to be conditioned by a runtime test, which would probably defeat the point of the optimization. The optimization only saves one instruction, and having to add a load / compare / branch, plus a fallback runtime call, it almost certainly isn't worth it.

Our arm64 minimum architecture has FMA.

If you use the new math.FMA, you will get the conditioned hardware instruction. Use of this function is advised when you really need the extra bits, not just when it might be faster than a separate multiply and add (which on amd64, it probably isn't).

Dan Kortschak

Jan 13, 2020, 10:01:53 PM

to Keith Randall, golang-nuts

Thanks. The example that shows the difference is at the issue.

```
package main

import "fmt"

func main() {
	nNeg := 10.0
	fpr := 10.0
	invNeg := 1 / nNeg
	fpr = 1 - fpr*invNeg
	fmt.Println(fpr)
}
```

And, yes, it looks like it could be constant propagation.

For the generation of FMA, the reason I ask is to know whether we're
going to hit the same kind of effects between Go versions that we are
now bumping into between architectures. I'm happy not to see this
happen in the near term.

Dan
