Michael S
Sep 14, 2021, 11:22:59 AM
On Friday, April 24, 2020 at 12:10:58 AM UTC+3, MitchAlsup wrote:
> On Thursday, April 23, 2020 at 2:54:54 PM UTC-5, Terje Mathisen wrote:
> > David Brown wrote:
> > > On 23/04/2020 10:25, Tom Gardner wrote:
> > >> On 23/04/20 08:44, David Brown wrote:
> > >>> On 22/04/2020 20:49, Tom Gardner wrote:
> > >>>>
> > >> The HPC mob have complained that it is problematic for
> > >> them w.r.t. floating point behaviour.
> > >>>>
> > >>>
> > >>> If the language makes restrictions on floating point details, then I
> > >>> can see how that might annoy people. Some floating point users need
> > >>> highly predictable results that match IEEE specifications across
> > >>> different platforms - others want the highest possible speeds and
> > >>> will accept minor differences in results. If a language is
> > >>> specified to suit one of these groups, it is bound to annoy the other.
> > >>
> > >> There's an argument that if you are using floating point, it is never
> > >> exact anyway, and it does no harm to expose that inexactitude.
> > >
> > > Yes. That's my attitude for the code I write.
> >
> > Good for you!
> >
> > >
> > > There is another argument that says floating point should work exactly
> > > the same each time, and on all platforms - it is not an exact model of
> > > real numbers, but needs to be consistent and tightly specified nonetheless.
> > >
> > > Both attitudes are fine IMHO, and have their uses.
> >
> > Individual operations need to be both consistent and reproducible, but
> > HPC-style programming in the large has to accept that when the order of
> > operations is effectively random, there are limits to the
> > reproducibility of the final results.
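A minimal illustration of that point (my own two-liner, not anything from Terje): the same three doubles summed in two different orders give different answers, which is all it takes for a parallel reduction to stop being bit-reproducible.

#include <stdio.h>

int main(void)
{
    double a = 1.0e16, b = -1.0e16, c = 1.0;

    /* FP addition is not associative: regrouping changes the result. */
    printf("(a + b) + c = %g\n", (a + b) + c);   /* prints 1 */
    printf("a + (b + c) = %g\n", a + (b + c));   /* prints 0 */
    return 0;
}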
> > >
> > >>
> > >> Caution: I am not an FP expert; so this is from memory
> > >> and is possibly misunderstood...
> > >> More importantly, if you have higher "internal precision"
> > >> during operations (e.g. 80-bit for 64-bit FP numbers), then
> > >> the end result of a calculation can be more exact - but the
> > >> IEEE spec precludes that.
> >
> > It does not! Java, however, does so.
> >
> > I.e. partly because the IEEE 754 spec was written in parallel with the
> > 8087 FPU, more or less everything Intel's 80-bit format can do is legal. :-)
> >
> >
> > OTOH, since all x64 fp math is done using the SSE regs, even that
> > platform is now in the exact reproducibility camp.
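To make the extended-precision point concrete, a small sketch of my own (not from the thread): on an x86 target where long double is the 80-bit x87 format, an intermediate kept in extended precision holds on to a bit that a 53-bit double intermediate drops. On platforms where long double is just double, both lines print 0.

#include <stdio.h>

int main(void)
{
    double big = 9007199254740992.0;                  /* 2^53 */

    /* Forced out to a 53-bit double: 2^53 + 1 rounds back to 2^53. */
    volatile double t = big + 1.0;
    double d = t - big;                               /* 0 */

    /* Kept in long double; with the 64-bit x87 significand,
       2^53 + 1 is represented exactly.                         */
    long double e = ((long double)big + 1.0L) - big;  /* 1 on x87 */

    printf("double intermediate:      %g\n", d);
    printf("long double intermediate: %Lg\n", e);
    return 0;
}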
> >
> > Actually, there are still a few open slots for ulp offsets to occur
> > legally: The one I know best is related to rounding of
> > subnormal/denormal numbers where you can round first, before detecting
> > that the result is subnormal, or you can do it after normalization.
> >
> > The latter is the only "Right Way" (TM) to do it imho, but since both
> > exist in current CPUs, both options are grandfathered in.
> >
> >
> > > Yes. 80-bit FP dates mainly back to floating point coprocessors, I
> > > think (I'm sure others here know vastly more of floating point history
> > > than I do), which are pretty much gone now. But you do get other cases
> > > like fused MAC that can give more accurate results than separate
> > > operations.
> >
> > FMAC is completely specified, but compilers can't really use it unless
> > language lawyers allow them to, i.e. when you do
> >
> > s = a*b+c;
> >
> > without an internal rounding after the multiplication but instead keep
> > the exact product and feed it into the adder, you are explicitly
> > breaking the "as-if" rule since the results you get are different.
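A minimal sketch of that as-if violation (my example, not Terje's): the fused and the separately rounded results differ by exactly the low-order part of the product that the plain multiply rounds away.

#include <stdio.h>
#include <math.h>
#include <float.h>

int main(void)
{
    double a = 1.0 + DBL_EPSILON;          /* 1 + 2^-52                  */
    double c = -(1.0 + 2.0*DBL_EPSILON);   /* minus the rounded a*a      */

    double p        = a * a;               /* rounded product: 1 + 2^-51 */
    double separate = p + c;               /* 0                          */
    double fused    = fma(a, a, c);        /* exact a*a + c = 2^-104     */

    printf("separate: %a\n", separate);
    printf("fused:    %a\n", fused);
    return 0;
}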
> Fast Fourier Transforms have an accuracy dependency that precludes
> using FMACs for the last 2 Multiply ADDs in the inner loop::
>
>       DO 30 I = J, N, N1
>          XT = RE(I) - RO(I)
>          YT = IM(I) - IN(I)
>          RE(I) = RE(I) + RO(I)
>          IM(I) = IM(I) + IN(I)
>          RO(I) = XT*C - YT*S     ! no FMAC here
>          IN(I) = XT*S + YT*C     ! no FMAC here
>    30 CONTINUE
>
> The problem is that the COS() term and the SIN() term are balanced in
> a way* that causes FMAC to lose accuracy over the 2 MUL and 1 ADD
> version. This balance is inherent in S**2 + C**2 = 1.000000000000000
I don't understand why what's written above is a bigger problem than any other loss of accuracy that inevitably happens when implementing an FFT with fixed-width FP arithmetic.
Mitch, can you provide an example of an input vector for which an FFT with FMA really breaks accuracy in a significant/unusual way?
Also, your loop appears to be of the Radix-2 Decimation-In-Frequency variety.
If the problem really exists, does it apply to other common FFT algorithms,
e.g. Radix-4 Decimation-In-Frequency or Radix-2 Decimation-In-Time?
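For reference, here is roughly how I read the contested rotation in C, next to an FMA-contracted variant. This is a sketch with my own names, not Mitch's code; compile it with contraction disabled (e.g. -ffp-contract=off) so the first variant really performs two separate roundings.

#include <stdio.h>
#include <math.h>

/* One butterfly rotation: (xt, yt) is the difference term,
   (c, s) = (cos(w), sin(w)), so c*c + s*s = 1.              */

/* 2 MUL + 1 ADD per output: both products are rounded before the add. */
static void rotate_separate(double xt, double yt, double c, double s,
                            double *re, double *im)
{
    *re = xt*c - yt*s;
    *im = xt*s + yt*c;
}

/* FMA-contracted: the first product reaches the add exactly while the
   second is rounded, the asymmetry Mitch says costs accuracy here.    */
static void rotate_fma(double xt, double yt, double c, double s,
                       double *re, double *im)
{
    *re = fma(xt, c, -(yt*s));
    *im = fma(xt, s,   yt*c);
}

int main(void)
{
    double c = cos(0.3), s = sin(0.3);
    double re1, im1, re2, im2;

    rotate_separate(0.125, 0.375, c, s, &re1, &im1);
    rotate_fma(0.125, 0.375, c, s, &re2, &im2);

    printf("separate: %.17g %.17g\n", re1, im1);
    printf("fma:      %.17g %.17g\n", re2, im2);
    return 0;
}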