On Wednesday, March 28, 2018 at 11:59:25 AM UTC-4, Bonita Montero wrote:
> > Java running on the same computer using the same sequence of math
> > ops can produce a different total than the same sequence coded in C++.
>
> Where's the problem? Java is a different language and may have a
> different arithmetic behaviour.
That is the problem. Both are using the same IEEE-754 engine under
the hood, yet the same source code copied from a C++ app into a Java
app (something like "c = a + b", but more complex) can produce
different results in the two apps.
This is directly the result of the IEEE-754 standard being insufficient
in and of itself to perform math properly. It requires coddling to get
it to produce identical results across platforms (and even sometimes on
the same machine across processes which are scheduled on different
cores).
> > It is not supposed to, but C++ will inject periodic stores and reloads
> > to ensure proper rounding at times.
>
> Only with the x87-FPU. With SSE or AVX, there are explicit operations
> with less precision. And other architectures also have such
> instructions.
Various SIMD extensions have improved upon the issue relative to
the x87 FPU, but differences still remain. You cannot guarantee
that an ARM-based CPU will produce the same result as an AMD64-
based CPU if they both use the same sequence of operations. You
have to manually validate it, and that's the issue being discussed
by math-based hardware people and IEEE-754, and specifically at
the present time, John Gustafson.
> > This means the IEEE-754 "standard" allows for variability
> > in computation based on when certain steps are performed.
>
> But a consistent behaviour as one would naively expect
> can be configured.
Not without manual effort. You can't take a CPU that sports a fully
IEEE-754 compliant FPU and guarantee the results will be the same.
They likely will be, or as David points out, will be close enough
that no one would ever see the variance ... but I'm talking about
the actual variance which is there.
In the last few bits, rounding at various levels enters in and it
does change the result. It may have no real impact on the final
result rounded to four base-10 decimal places, but if you look at
the result the hardware computed, it is (can be) different.
> > It is not mathematically accurate, nor is it required to
> > produce identical results across architectures.
>
> You can easily configure different compilers of the same language
> on different architectures to give identical binary results.
This has not been my experience. I have listened to many lectures
where they state this is one of the biggest problems with IEEE-754.
In fact, one of the authors of Java was absolutely stunned to learn
that IEEE-754 on one machine may produce different results than the
same series of calculations on another. It floored him that such a
result was possible in a "standard" such as IEEE-754:
https://people.eecs.berkeley.edu/~wkahan/JAVAhurt.pdf
> >> FMA produces the same results as explicit instructions with
> >> some special behaviour when some operands are -0, Inf or NaN.
>
> > See section 2.3. FMA performs one rounding. Separate multiply-add
> > performs two. The results are different:
> > 2.3. The Fused Multiply-Add (FMA)
> >
> > http://docs.nvidia.com/cuda/floating-point/index.html
>
> NVidia GPUs aren't general purpose CPUs. And for the purpose these
> GPUs are designed, this behaviour isn't a restriction.
The document I posted outlines the issue. It is prevalent in any
implementation of the IEEE-754-2008 FMA extension. The operation
requires only one rounding, whereas separate multiply-add ops will
have two. This invariably produces different results, be it on a
GPU, CPU, or with trained mice mimicking FPU gates switching on
and off in a laboratory.
FMA is a different mathematical operation than two separate multi-
ply add operations. That's the end of that discussion.
The results from FMA are typically better than those of the two
separate multiply add operations, but the results are different.
One of the biggest arguments is whether or not a compiler should
be able to replace a multiply followed by an add with a single
fused_multiply_add operation. Most people think it shouldn't,
but to the compiler author, and in a pure mathematical sense,
it should not make any difference, so they optimize in that way.
But it does make a difference, and they shouldn't.
Some modern compilers are wising up to that realization. GCC and
Clang expose -ffp-contract to control whether contraction is allowed,
and Intel's compiler, for example, will only use FMA under certain
optimization flags (e.g. its -fp-model settings).
> >> That's not true. You can control how the FPU behaves through
> >> instructing the compiler or the FPU control word.
>
> > But you cannot guarantee the same operation across platforms.
>
> It can be configured so.
It cannot be guaranteed. You can work around it. You can wriggle
your way into it, but you cannot know with certainty if a particular
IEEE-754 compliant FPU will produce the same result on architectures
A, B, C, and D, without testing them.
IIRC, even some Pentium III-era and older CPUs produce different
results than modern Intel-based hardware (not due to the Pentium
FDIV bug, but to the way Intel removed "legacy DOS support" in the
Pentium 4 and later). You must now explicitly enable backward compati-
bility flags in the Pentium 4's x87 FPU and later CPUs to get the same
results in your debugger as you would've observed previously.
Such things have been altered for performance, but they impact the
fundamental operations of the x87 FPU.
With all things floating point, you must rigorously test things to
see if they are the same across platforms, and even arguably revisions
within platforms.
--
Rick C. Hodgin