Fused Multiply Add; exact vs tolerant tests for floating point computation

Nigel Tao

Aug 31, 2017, 10:00:27 PM

to golang-dev

The golang.org/x/image/vector package contains a vector graphics
rasterizer, heavy on floating point computation.

It contains tests that ensure that the package's output is the same on
all GOARCHes. Some GOARCH-dependent code paths are asm-accelerated,
some are not. For example, in the amd64 assembly (using SIMD), we are
careful to set the MXCSR rounding bits so that the output is
identical, not just close.

Those tests haven't changed for about a year, but
https://golang.org/issue/21460, filed a couple of weeks ago, notes
that the tests fail on GOARCH=s390x and on GOARCH=ppc64le. The
speculation is that this is due to the new fused multiply add (FMA)
support on these architectures.

The Go spec (https://golang.org/ref/spec#Floating_point_operators)
says that "An implementation may combine multiple floating-point
operations into a single fused operation, possibly across statements,
and produce a result that differs from the value obtained by executing
and rounding the instructions individually."

Can someone with compiler or spec expertise confirm that this means
that the x/image/vector package should not test for identical results,
only for close results within some tolerance band? Even pure Go code
can produce different values on different GOARCHes, or on different
compilers for the same GOARCH, right?

What is the status of FMA support across the GOARCHes? I haven't been
following it closely, and I'm not very familiar with FMA.

Specifically, I would like to keep testing that the asm and no-asm
floating point computations produce identical (not just close) results
on amd64, even if I can't test that across all GOARCHes. Does (pure
Go, no asm) amd64 do FMA now? If not, is it likely to do FMA in the
future? Can FMA versus non-FMA produce different values on amd64?

Or is the solution to replace "x*y" with "float64(x*y)" throughout the
x/image/vector code, to explicitly disallow FMA, as per the examples
in the Go spec section linked to above?

As an aside, if an expression "x*y" has type T, I'm a little surprised
that "x*y" and "T(x*y)" can have different semantics, and the
conversion from T to T is not a no-op, so code simplifiers can't
always remove the conversion as redundant, but I guess that boat has
sailed.

Brendan Tracey

Aug 31, 2017, 10:34:43 PM

to golang-dev

Can someone with compiler or spec expertise confirm that this means
that the x/image/vector package should not test for identical results,
only for close results within some tolerance band? Even pure Go code
can produce different values on different GOARCHes, or on different
compilers for the same GOARCH, right?

...

As an aside, if an expression "x*y" has type T, I'm a little surprised
that "x*y" and "T(x*y)" can have different semantics, and the
conversion from T to T is not a no-op, so code simplifiers can't
always remove the conversion as redundant, but I guess that boat has
sailed.

In particular, based on my reading, this has been true at least since Go 1.0. Read the following comment and the one after it.

https://github.com/golang/go/issues/17895#issuecomment-293417087

Keith Randall

Sep 1, 2017, 1:07:38 AM

to Nigel Tao, golang-dev

On Thu, Aug 31, 2017 at 7:00 PM, Nigel Tao <nige...@golang.org> wrote:

The golang.org/x/image/vector package contains a vector graphics
rasterizer, heavy on floating point computation.

It contains tests that ensure that the package's output is the same on
all GOARCHes. Some GOARCH-dependent code paths are asm-accelerated,
some are not. For example, in the amd64 assembly (using SIMD), we are
careful to set the MXCSR rounding bits so that the output is
identical, not just close.

Those tests haven't changed for about a year, but
https://golang.org/issue/21460, filed a couple of weeks ago, notes
that the tests fail on GOARCH=s390x and on GOARCH=ppc64le. The
speculation is that this is due to the new fused multiply add (FMA)
support on these architectures.

The Go spec (https://golang.org/ref/spec#Floating_point_operators)
says that "An implementation may combine multiple floating-point
operations into a single fused operation, possibly across statements,
and produce a result that differs from the value obtained by executing
and rounding the instructions individually."

Can someone with compiler or spec expertise confirm that this means
that the x/image/vector package should not test for identical results,
only for close results within some tolerance band? Even pure Go code
can produce different values on different GOARCHes, or on different
compilers for the same GOARCH, right?

Right.

What is the status of FMA support across the GOARCHes? I haven't been
following it closely, and I'm not very familiar with FMA.

It is only implemented for PPC64 and S390X.

Specifically, I would like to keep testing that the asm and no-asm
floating point computations produce identical (not just close) results
on amd64, even if I can't test that across all GOARCHes. Does (pure
Go, no asm) amd64 do FMA now?

No.

If not, is it likely to do FMA in the
future?

I don't know of anyone working on it.

It is unlikely that the compiler would generate FMA on amd64 because the instructions involved are not guaranteed to be available on every amd64 we support. Any FMA would need to be guarded by CPUID instructions.

Guarding in assembly is easier because we can do it at larger granularity. Again, I know of no one working on it.

Can FMA versus non-FMA produce different values on amd64?

Yes, if we ever implement it.

Or is the solution to replace "x*y" with "float64(x*y)" throughout the
x/image/vector code, to explicitly disallow FMA, as per the examples
in the Go spec section linked to above?

Yes, that would work also. Seems unfortunate to pessimize code just to test it.

As an aside, if an expression "x*y" has type T, I'm a little surprised
that "x*y" and "T(x*y)" can have different semantics, and the
conversion from T to T is not a no-op, so code simplifiers can't
always remove the conversion as redundant, but I guess that boat has
sailed.

Yeah, it's not beautiful.

--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Michael Jones

Sep 1, 2017, 6:44:06 PM

to Keith Randall, Nigel Tao, golang-dev

Arguably you're not "pessimizing" code if you want it to be bit-exact across IEEE-754 implementations -- in such a case you must avoid any code interpretation / organization / generation flexibilities that could possibly make a difference.

If you had two versions, the "natural" version and the "strict" version and the strict was bit exact and the natural was within a small difference (due to less error) then the pair could be judged good.

If you have only the natural version (admitting FusedMultiplyAdd aka MultiplyAccumulate) then you could test that against golden reference data.

--

Michael T. Jones
michae...@gmail.com

Dmitri Shuralyov

Sep 8, 2017, 10:27:01 AM

to golang-dev

As an aside, if an expression "x*y" has type T, I'm a little surprised
that "x*y" and "T(x*y)" can have different semantics, and the
conversion from T to T is not a no-op, so code simplifiers can't
always remove the conversion as redundant, but I guess that boat has
sailed.

This was mentioned/considered in https://github.com/golang/go/issues/17895#issuecomment-293417087.

I've also filed https://github.com/mdempsky/unconvert/issues/24 about it.

Akhil Indurti

Dec 16, 2019, 7:55:13 PM

to golang-dev

Sorry to revive an old thread, but now that math.FMA has been merged into tip for 1.14, could the float rasterizer use that function instead? The caveat is that math.FMA takes in float64 values, which seems to be okay given the already existent use of float64 in raster_floating.go.

func fma(x, y, z float32) float32 {
	return float32(math.FMA(float64(x), float64(y), float64(z)))
}

Keith Randall

Dec 16, 2019, 8:17:23 PM

to Akhil Indurti, golang-dev

On Mon, Dec 16, 2019 at 4:55 PM Akhil Indurti <aind...@gmail.com> wrote:

Sorry to revive an old thread, but now that math.FMA has been merged into tip for 1.14, could the float rasterizer use that function instead? The caveat is that math.FMA takes in float64 values, which seems to be okay given the already existent use of float64 in raster_floating.go.

func fma(x, y, z float32) float32 {
	return float32(math.FMA(float64(x), float64(y), float64(z)))
}

Too late for 1.14, but you could send a CL for 1.15.

Why is fma useful here? I think you could get all the no-intermediate-rounding behavior just converting to float64 and back.

(Can a float64 hold all possible results of a float32*float32? I think so, but I'm not entirely sure.)

func fma(x, y, z float32) float32 {
	return float32(float64(x)*float64(y) + float64(z))
}

The compiler will ideally use an fma instruction to compute this without an explicit math.FMA call. (It does on arm64.)


Akhil Indurti

Dec 16, 2019, 9:07:32 PM

to golang-dev

Why is fma useful here? I think you could get all the no-intermediate-rounding behavior just converting to float64 and back.
(Can a float64 hold all possible results of a float32*float32? I think so, but I'm not entirely sure.)

I believe that intermediate rounding can still occur in a rare case for "float32(float64(x) * float64(y) + float64(z))". This can either be overcome by manipulating the bits of the result (likely more efficient) or just calling double-precision fma.

Keith Randall

Dec 16, 2019, 9:39:47 PM

to Akhil Indurti, golang-dev

What case is that?


Akhil Indurti

Dec 16, 2019, 10:25:06 PM

to golang-dev

What case is that?

The tie-breaking rule for round-to-nearest-even. If a double-precision result f falls exactly halfway between two consecutive single-precision values s1 and s2, the result needs to be adjusted to pick the one whose mantissa is even. Example test cases include:

fma32(-1.9826383e+28,1.04311766e-07,-3.7997959e-06) == -2.068125e+21; want -2.0681251e+21

fma32(2.758646,4.3234556e+18,-1.9912108) == 1.1926884e+19; want 1.1926883e+19

fma32(-9.836287e+37,-0.007827744,-1.3154883e-18) == 7.699594e+35; want 7.699593e+35

fma32(0.30859375,1.0737377e+09,1.2915433e-38) == 3.3134874e+08; want 3.3134877e+08

fma32(-16384.373,-3.8942226e+33,16.000484) == 6.3804394e+37; want 6.38044e+37

fma32(1.9384766,1.6648816e+35,8127.9834) == 3.2273338e+35; want 3.227334e+35

fma32(124.015625,2.1469594e+09,-2.3877207e-38) == 2.6625652e+11; want 2.662565e+11

fma32(3.6281894e+27,2.8311552e+07,6.89733e-40) == 1.0271967e+35; want 1.0271968e+35

fma32(1.2461512e+35,0.1071931,16.007748) == 1.335788e+34; want 1.3357882e+34

fma32(4.000503,4.1231686e+11,1.4952093e-08) == 1.6494748e+12; want 1.6494749e+12

Akhil Indurti

Dec 16, 2019, 10:26:07 PM

to golang-dev

I misspoke earlier: simply calling double-precision FMA won't work here, although it will produce consistently rounded output.

Keith Randall

Dec 16, 2019, 11:22:46 PM

to Akhil Indurti, golang-dev

I'm confused. This program reports no errors:

package main

import (
	"fmt"
	"math"
)

// f multiplies and adds in float64, then converts the sum to float32.
func f(x, y, z float32) float32 {
	return float32(float64(x)*float64(y) + float64(z))
}

// g does the same computation with an explicit double-precision FMA.
func g(x, y, z float32) float32 {
	return float32(math.FMA(float64(x), float64(y), float64(z)))
}

func main() {
	tests := []struct{ x, y, z float32 }{
		{x: -1.9826383e+28, y: 1.04311766e-07, z: -3.7997959e-06},
		{x: 2.758646, y: 4.3234556e+18, z: -1.9912108},
		{x: -9.836287e+37, y: -0.007827744, z: -1.3154883e-18},
		{x: 0.30859375, y: 1.0737377e+09, z: 1.2915433e-38},
		{x: -16384.373, y: -3.8942226e+33, z: 16.000484},
		{x: 1.9384766, y: 1.6648816e+35, z: 8127.9834},
		{x: 124.015625, y: 2.1469594e+09, z: -2.3877207e-38},
		{x: 3.6281894e+27, y: 2.8311552e+07, z: 6.89733e-40},
		{x: 1.2461512e+35, y: 0.1071931, z: 16.007748},
		{x: 4.000503, y: 4.1231686e+11, z: 1.4952093e-08},
	}
	for _, t := range tests {
		// Panic if the two formulations disagree on any input.
		if f(t.x, t.y, t.z) != g(t.x, t.y, t.z) {
			panic(fmt.Sprintf("bad test %v\n", t))
		}
	}
}

So it looks to me like FMA is not needed, at least for the examples you've provided.


Akhil Indurti

Dec 16, 2019, 11:38:55 PM

to golang-dev

I'm confused. This program reports no errors:

Right. If what we want are bit-identical results across platforms, then float32(float64(x)*float64(y) + float64(z)) is fine. I can submit a CL for this.

However, if what we want is an accurate 32-bit FMA, then the test above would look more like this:

package main

import "fmt"

// f multiplies and adds in float64, then converts the sum to float32.
func f(x, y, z float32) float32 {
	return float32(float64(x)*float64(y) + float64(z))
}

func main() {
	// want holds the correctly rounded single-precision FMA result.
	tests := []struct{ x, y, z, want float32 }{
		{x: -1.9826383e+28, y: 1.04311766e-07, z: -3.7997959e-06, want: -2.0681251e+21},
		{x: 2.758646, y: 4.3234556e+18, z: -1.9912108, want: 1.1926883e+19},
		{x: -9.836287e+37, y: -0.007827744, z: -1.3154883e-18, want: 7.699593e+35},
		{x: 0.30859375, y: 1.0737377e+09, z: 1.2915433e-38, want: 3.3134877e+08},
		{x: -16384.373, y: -3.8942226e+33, z: 16.000484, want: 6.38044e+37},
		{x: 1.9384766, y: 1.6648816e+35, z: 8127.9834, want: 3.227334e+35},
		{x: 124.015625, y: 2.1469594e+09, z: -2.3877207e-38, want: 2.662565e+11},
		{x: 3.6281894e+27, y: 2.8311552e+07, z: 6.89733e-40, want: 1.0271968e+35},
		{x: 1.2461512e+35, y: 0.1071931, z: 16.007748, want: 1.3357882e+34},
		{x: 4.000503, y: 4.1231686e+11, z: 1.4952093e-08, want: 1.6494749e+12},
	}
	for _, t := range tests {
		if f(t.x, t.y, t.z) != t.want {
			panic(fmt.Sprintf("bad test %v\n", t))
		}
	}
}

I misunderstood your previous reply and thought we wanted 32-bit FMA.

Keith Randall

Dec 17, 2019, 12:02:24 AM

to Akhil Indurti, golang-dev

So I guess my question is: where are you getting the "want" fields in this test?

So does the difference come down to the fact that g rounds to a float64 (after the + operation), then rounds again to a float32, and that's not the same as a hypothetical math.FMA32 which does both of those roundings in a single step?


Akhil Indurti

Dec 17, 2019, 12:30:17 AM

to golang-dev

So I guess my question is: where are you getting the "want" fields in this test?

I am running Berkeley TestFloat-3e http://www.jhauser.us/arithmetic/TestFloat.html, which exhaustively tests conformance against IEEE-754.

So does the difference come down to the fact that g rounds to a float64 (after the + operation), then rounds again to a float32, and that's not the same as a hypothetical math.FMA32 which does both of those roundings in a single step?

Yes. The result must be adjusted in the event of a tie-breaker.

Keith Randall

Dec 17, 2019, 12:58:41 AM

to Akhil Indurti, golang-dev

So for x/image/vector I don't see any need to involve FMA. I don't think bit-correct accuracy is required, although as Nigel noted it would be nice to use something that doesn't vary across architectures (though that's hard to absolutely guarantee going forward, we could at least do it for a particular release).

Separately, the tests you provided argue for adding FMA32 to the math package, as it might be hard to do in pure Go. But I'm kind of skeptical that anyone working with float32 instead of float64 really needs the accuracy of FMA.

