I have been noticing differences in the results of some code involving
only single precision REAL numbers when g77 (version included in gcc
2.95.3 release) optimizes (any level -O and up). I have attempted to
find perhaps one specific optimization accountable for this by turning
of as many optimize options as possible, but the only option that seems
to affect it is the -O.
Upon closer examination of intermediate results, and by printing out the
values to a greater precision than implied by REAL, it appears to me
that the optimized code _may_ be at times producing more precise
results, though sometimes the results matched exactly (at least to the
precision I printed out). I also took at look at the respective
assembly code. Not being too familiar with assembly, I have not made
much of it except that the optimized code uses the chip's "fsqrt" while
the unoptimized does a "call sqrt." Other than that, I did not notice
any double precision operations in either case. Interestingly enough,
the differences start appearing on a simple
multiply-by-2-and-subtract-by-1 operation, so it can't be just the
version of the SQRT that is used.
If I convert all intermediate values to DOUBLE PRECISION using
appropriate intrinsic calls and convert back to REAL at the end, the
unoptimized results seem to match the optimized single precision results
(once again, to the precision I chose to output). Also, optimization
does not seem to affect the double precision results...but I should
really look at it with more digits in the output to be sure.
Unfortunately, I have not been able to reproduce this in my own test
code _yet_, otherwise I'd post it.
So my question is, can anyone either confirm or refute my hypothesis
that the optimized code is more precise? And, in any case, is there
anything to be done, or any thing that _should_ be done? It is
disconcerting to me, but perhaps more so to others, that optimization
leads to different results, especially when other compilers (on other
platforms) do not appear to exhibit this behavior.
Thanks in advance, and sorry for such a long post.
--
Jonathan DeSena
Johns Hopkins University Applied Physics Laboratory
Power Projection Systems Department
Aviation Systems Group
In article <3B17F047...@jhuapl.edu>, <jonatha...@jhuapl.edu> wrote:
>This touches on some of what others have been discussing in the "mixed
>mode arithmetic" thread, but has not been addressed directly that I know
>of, so I'll post a new question here. I should also mention that this
>is a g77 (perhaps on Linux/Intel only?) specific question.
>
>I have been noticing differences in the results of some code involving
>only single precision REAL numbers when g77 (version included in gcc
>2.95.3 release) optimizes (any level -O and up). I have attempted to
>find perhaps one specific optimization accountable for this by turning
>of as many optimize options as possible, but the only option that seems
>to affect it is the -O.
--
Bill William Staniewicz
Amsterdam, NL
> Does this imply F77 is more precise?
What do you mean by "f77" and what are you comparing it to? It is not
at all clear from context.
The original poster was asking about possible precision differences
between optimized and unoptimized results with g77. If you are trying
to identify some other compiler by the term "f77", then I'm afraid it
is insufficient identification. There must be hundreds of compilers
named f77. My best guess (since the original poster was on a linux
system) is that you might be referring to the f77 script that comes in
some linux distributions and runs f2c+gcc. But that is just a guess.
If by "f77" you are referring to the Fortran 77 language, then I
haven't a clue what the comparison implied by the "more" is intended
to be too. I suppose that's probably not what you meant.
In either case, no I don't see that the original poster's data
implies anything related. (So perhaps I *REALLY* misunderstand
this question?)
On the original posters question, I certainly don't know for sure, but
my guess from the data given would be that it is just an artifact of
the particulat code that the optimized result is more precise than the
unoptimized one. In other codes, it could be the other way around.
Yes, optimization can change answers by reordering computations. For
a trivial example, mathematically a+b+c could be evaluated as either
(a+b)+c or a+(b+c). With finite precision arithmetic, however, these
two could give different results. It depends on the specific numbers
which is more accurate (or if both ways have the same accuracy, which
can also happen). You could easily have the compiler do one of those
computations optimized and the other unoptimized. If you know that it
matters, you should use parens to force the order. If you have an
algorithm that is overly sensitive to this difference and you don't
know about the sensitivity, then you may have deeper problems (such as
a numerically unstable algorithm). As long as the differences are
negligable, it probably doesn't matter - finite precision is like
that.
This is just my guess. If you had a small sample that reproduced the
phenomenon, it should be possible to get a more definitive analysis.
Lacking that, all I can do is guess like this.
--
Richard Maine | Good judgment comes from experience;
email: my last name at host.domain | experience comes from bad judgment.
host: altair, domain: dfrc.nasa.gov | -- Mark Twain
The following bit of code demonstrates what can happen:
REAL EPS,X
EPS=0.5
10 CONTINUE
X=EPS+1.0
IF (X .GT. 1.0) THEN
EPS=EPS/2
GOTO 10
ENDIF
EPS=EPS*2
PRINT *,EPS
END
If this is compiled with
g77 -ffloat-store eps.f -o eps
It will generate the output
1.1920929E-07
If this is compiled with
g77 -O eps.f -o eps
It will generate the output
1.08420217E-19
The machine epsilon of 1e-19 indicates that calculations are being done
with the Intel processor's extended precision (64 bit fraction, 16 bit
exponent.)
An examination of the assembler code generated in each case (you can
see the assembler code by using the "-S" compiler switch) shows that
with -O the variables are being kept in the floating point registers,
while with -ffloat-store, the result of "X=EPS+1.0" is stored to
a 4 byte location in memory and the rounding to single precision
is accomplished at that point.
Note that if you remove the "X=EPS+1.0" and just put "EPS+1.0" in
place of the X in the IF statement, you'll also get 1.0e-19. The
reason for this is that -ffloat-store only forces the rounding off
when it stores the result of an assignment- it doesn't store to memory
on every intermediate step in a calculation.
The Intel processors do have a user accessible control register that
could be set to force rounding to single precision after every
arithmetic operation. You can write a short assembler routine to
set the value of this control register.
I've done this in the past with simple code (+, -, *, /) and gotten
the results that I expected. However, I doubt that this would work
well with the floating point library routines (SIN, COS, LOG, EXP, etc.)
> This touches on some of what others have been discussing
> in the "mixed mode arithmetic" thread
> but has not been addressed directly that I know of
> so I'll post a new question here.
> I should also mention that this is a g77
> (perhaps on Linux/Intel only?) specific question.
I'll assume that you are compiling with g77
under Linux on a Pentium processor.
The Floating-Point Unit (FPU) has a stack of eight
80 bit extended precision floating-point registers.
Both single and double precision floating-point numbers
are always converted to this extended precision format
when they are loaded from memory.
All operations are performed in extended precision arithmetic.
The extended precision floating-point numbers are rounded
only when they stored back into memory as single or double
precision floating-point numbers.
> I have been noticing differences in the results of some code
> involving only single precision REAL numbers
> when g77 (version included in gcc 2.95.3 release)
> optimizes (any level -O and up). I have attempted to find
> perhaps one specific optimization accountable for this
> by turning of as many optimize options as possible
> but the only option that seems to affect it is the -O.
> Upon closer examination of intermediate results
> and by printing out the values
> to a greater precision than implied by REAL,
> it appears to me that the optimized code _may_ be at times
> producing more precise results,
> though sometimes the results matched exactly
> (at least to the precision I printed out).
> I also took at look at the respective assembly code.
> Not being too familiar with assembly,
> I have not made much of it except that
> the optimized code uses the chip's "fsqrt"
This is an 80 bit extended precision operation.
> while the unoptimized does a "call sqrt."
I believe that this is a 64 bit double precision operation.
> Other than that,
> I did not notice any double precision operations in either case.
> Interestingly enough, the differences start appearing
> on a simple multiply-by-2-and-subtract-by-1 operation,
I have no idea what you are talking about here.
> so it can't be just the version of the SQRT that is used.
>
> If I convert all intermediate values to DOUBLE PRECISION
> using appropriate intrinsic calls and convert back to REAL at the end,
> the unoptimized results seem to match the optimized single precision results
> (once again, to the precision I chose to output).
> Also, optimization does not seem to affect the double precision results...
> but I should really look at it with more digits in the output to be sure.
>
> Unfortunately, I have not been able to reproduce this
> in my own test code _yet_, otherwise I'd post it.
>
> So my question is, can anyone either confirm or refute my hypothesis
> that the optimized code is more precise?
> And, in any case, is there anything to be done
> or any thing that _should_ be done?
> It is disconcerting to me but perhaps more so to others
> that optimization leads to different results
> especially when other compilers (on other platforms)
> do not appear to exhibit this behavior.
The GNU compilers (and some other compilers too)
go to great pains to emit code that is faithful
to the appearance of your Fortran 77 code
so that it is easier to debug using tools like gdb.
The compiler inserts additional instructions
to ensure that, for example, a single precision result
actually gets rounded to single precision --
the compiler simply emits code to store a result
in an extended precision floating-point register
as a single precision result on the stack
then loads it back into the register again.
All of this nonsense disappears
when you select any optimization option.
Always use the highest level of optimization
unless you need to a version of your code
that you can use with the debugger.
Sorry, I should have used a code sample. What I meant was that the
operation which starts to introduce differences in this case (there are
most definitely other that i have not identified yet) is as follows:
X=2.0*Y-1.0 <== X,Y are type REAL
After this line, X differs in the optimized code.
After reviewing all the replies I think I have a decent understanding of
what is going on. It seems to me then that I should also notice some
differences when optimizing with DOUBLE PRECISION numbers due to the 80
to 64 bit conversion. Albeit these differences should be much smaller
than in the REAL case. Yes? So far I have not checked to see if this is
the case.
The question I am left with is why I do not see this with Sun's f77 on
Solaris/SPARC. The same principles should apply, except the
implementation may be 32 bit REALs with 64 bit registers (for example --
I do not know what SPARC II's FPU uses.
I also still wonder whether this precision variation answer will satisfy
others who have been conditioned to expect the exact same results after
optimization. My experience has been that the variation I've described
is often cited as a reason that Linux/g77 may not be suitable for
production runs. The ironic thing is that most recommend not to
optimize as a solution, but from what you're saying, the optimized code
may be "better."
I'd appreciate any more insights/comments into this question, especially
from some who work with other compiler/platform combinations for which
optimization produces the same results. Thanks for the help to all
those who've replied.
> The question I am left with is why I do not see this with Sun's f77 on
> Solaris/SPARC. The same principles should apply, except the
> implementation may be 32 bit REALs with 64 bit registers (for example --
> I do not know what SPARC II's FPU uses.
Proper IEEE implementations will use only as many bits as the code actually
specifies - that is, if you load a 32-bit real into an FPU register, all
subsequent oeprations on that register should, implicitly or explicitly, be
of single precision. It is a compplicated historical accident involving both
hardware and software that this, while I believe achievable on x86 systems,
it is not the default type of behaviour, and may actually involve OS assitence
in order to achieve it consistently.
Jan
> Sorry, I should have used a code sample. What I meant was that the
> operation which starts to introduce differences in this case (there are
> most definitely other that i have not identified yet) is as follows:
>
> X=2.0*Y-1.0 <== X,Y are type REAL
>
> After this line, X differs in the optimized code.
Can you provide a self-contained snippet along with the compiler options
and
the platform type ?
Regards,
Arnaud
I also must make a BIG APOLOGY to those following this thread. I
mistakenly associated the output of my program with the wrong versions!
In other words, the "error" (quotes because I am only assuming it to be
an error as of yet) is with the OPTIMIZED code. The unoptimized (or up
to -O) case produces the expected results. This COMPLETELY invalidates
the claim I made in my original post. Sorry for the confusion. The
comments on the orig. post were in fact useful to me in any case.
It seems to be a quite specific case. Changing a few seemingly minor
details in the code may remove the difference for a particular case,
though using different starting numbers may result again in the "errors"
for several items. (See code sample.)
Compiling on Solaris/SPARC with g77 and Sun's f77 produces same results
regardless of optimization, which are identical to those of Linux/Intel
unoptimized.
Some details...
Platform: Linux (kernel 2.4.3:probably not important), Intel Pentium III
processor
Compilers: g77 from gcc release 2.95.3 and
prelease 2.95.4 (g77 version 0.5.25 20010319)
Compile Command : optimized version: "g77 -O2 -o testopt test.f"
unoptimized: "g77 -o test test.f"
C********************* Begin Code Sample ***************
PROGRAM TEST
C *This program demonstrates a difference in the results
C *of optimized versus unoptimized code on Linux/x86.
C *Optimized code must be compiled with -O2 or higher.
INTEGER I1,I2
C *the following ints will cause the problem, others may or may not
I1=72132
I2=41951
CALL TESTME(I1)
CALL TESTME(I2)
STOP
END
C *A separate subroutine is required for some reason
SUBROUTINE TESTME(I)
C *The following parameter does not have to be the same as used
C *here, but not all numbers will result in the problem given a
C *certain integer, I, either.
PARAMETER (S=3.8115939E-6)
INTEGER I
REAL A,B
C *The next line seems to be required for the problem to occur.
C *Note that the result, A, is the same optimized or not.
A=FLOAT(I)*S
C *Now do some simple arithmetic. this should not let the compiler
C *do any reordering of operations: it _must_ multiply and then
C *subtract. This step is required. Multiplication or
C *addition/subtraction alone are not enough. The choice of
C *constants (2.0,1.0) is somewhat arbitrary, but changing them will
C *cause certain integer and S combinations to "work" or not.
B=2.0*A-1.0
C *Print out the results
PRINT 1, S,A,B
C *Format with more digits than implied by DOUBLE PRECISION
1 FORMAT (3E26.18)
RETURN
END
C************** End Code Sample ********************************
Here's the results:
unoptimized:
0.381159384232887533E-05 0.274937897920608521E+00
-0.450124204158782959E+00
0.381159384232887533E-05 0.159900173544883728E+00
-0.680199623107910156E+00
optimized:
0.381159384232887533E-05 0.274937897920608521E+00
-0.450124233961105347E+00
0.381159384232887533E-05 0.159900173544883728E+00
-0.680199682712554932E+00
differences underlined here
-----------
While the differences are right about at the limit for the precision of
a 32 bit floating point number, they imply that the value in memory is
different, which I believe to be a bug. More specifically, the lowest
bit appears to be being altered somehow.
Converting -0.450124204158782959E+00 to the bits in memory (in hex) we
get 0xbee676ae.
But the optimized result, -0.450124233961105347E+00, is 0xbee676af, the
last bit now a 1 rather than a 0.
Again -0.680199623107910156E+00 is 0xbf2e2190 (last bit=0)
and -0.680199682712554932E+00 is 0xbf2e2191 (last bit=1)
I am not sure what is going on here exactly, but perhaps I should send
off a bug report to the g77 people.
Thanks again.
>Compiling on Solaris/SPARC with g77 and Sun's f77 produces same results
>regardless of optimization, which are identical to those of Linux/Intel
>unoptimized.
> A=FLOAT(I)*S
RISC ISAs have enough opcode space to specify single- or double-
precision operations within the opcodes themselves but the x87
ISA doesn't distinguish between precisions within the opcode. To
obtain a correctly rounded single-precision value for A here an x87
machine must either change the FPU control word by saving it to
memory, clearing bits 8:9, storing the resulting mask to another
memory location, loading the new control word, carrying out the
multiplication, and finally restoring the saved control word or
store the intermediate value to memory and reload it into the
FP stack. Both of these approaches take time and so x87
compilers won't use either of them for optimized code unless
you explicitly tell them to. The optimized results are likely
more accurate because the rounding of the intermediate value
doesn't happen. I get the same results for debug as for release
version on CVF 6.5A so you can tell from that whether my home
PC is RISC or not. Perhaps am SSE-aware compiler could get the
results you want without slowing down.
The error is actually with the OPTIMIZED code, as I mentioned in my
UPDATE posting. Once again I am sorry for the mix up. Initially I
thought the optimized results were more correct, thus the title of the
original post.
The OPTIMIZED code is erring on the last bit of the floating point
number (see previous posting). The unoptimized code matches what you
would expect if you did the arithmetic operation by hand, or with a
calculator, or another computer platform.
Also, I do not believe the FLOAT call to be the sole cause, since the
results are exactly the same after it (A is same for both cases).
However, the FLOAT call seems to be important, since if I replace it
with
A=0.274937897920608521 (which is the result of A=FLOAT(I)*S)
then the problem "goes away."
> I am not sure what is going on here exactly, but perhaps I should send
> off a bug report to the g77 people.
Hi,
a bug report isn't necessary. The problem is that the floating-point
unit (FPU) in your Linux machine is doing the calculations in extended
precision (80-bit mode), while the results are stored to memory in
less precision. Depending on the level of optimization, intermediate
results may or may not be stored in memory (and thus truncated) before
the calculation is done. In your case, the value of A could differ
if moved temporarily to memory rather than left in a FPU register
before calculating B. This is known behaviour on x86 machines. On
Sparc machines, the memory and FPU use the same number of bits, so
there is no truncation if values are temporarily moved between the two.
On x86, there are some ways to get around this. The `-ffloat-store'
option of g77 kind of works by trying to force temporary values to
memory (with mixed success, since I think it misses some). You can
also set the FPU to work in double precision mode rather than extended
precision mode (but I believe that only affects the mantissa, and
not the exponent). That works better than using `-ffloat-store',
but doesn't work with transcendental functions (since they always
return 80-bit numbers in an FPU register, regardless of FPU
precision setting). I've included a snippet of C code below that
you can compile with gcc and link to your Fortran code to set the
mode of the FPU to double precision, and thus get fairly close to
`workstation' behaviour. You have to `call initialize_387_to_ieee'
at some point before starting your calculations.
On the other hand, maybe you don't care that the results are different
(not wrong) in the last 1 or 2 bits, and let the program be ;)
(*Note: The following comments in the code are Chris Hanson's. His
comment that the extended precision mode is non-IEEE is not entirely
true. Extended precision modes were recognized in the IEEE-854 standard
that followed the IEEE-754 standard. They are, however, `non-standard'
on most workstations.)
(Snip here) 8<----------------------------------------------------------
/*
Code to initialize the ix87 FP coprocessor control word.
This code must be run once before starting a computation.
Bit(s) Description
------ -----------
0 invalid operation (FP stack overflow or IEEE invalid arithmetic op)
1 denormalized operand
2 zero divide
3 overflow
4 underflow
5 precision (indicates that precision was lost; happens frequently)
The first 6 bits control how the chip responds to various
exceptions. If a given mask bit is 0, then that exception
will generate a processor trap. If the mask bit is 1, then
that exception will not trap but is handled by a "default
action", usually substituting an infinity or NaN.
Default is all masks set to 1.
8/9 precision control
00 IEEE single precision
01 (reserved)
10 IEEE double precision
11 non-IEEE extended precision
Default is non-IEEE extended precision.
10/11 rounding control
00 round to nearest or even
01 round toward negative infinity
10 round toward positive infinity
11 truncate toward zero
Default is round to nearest or even.
This code (0x0220) sets these bits as follows:
1. Precision mask 1, all others 0.
2. Precision control: IEEE double precision.
3. Rounding control: round to nearest or even.
*/
#ifdef __GNUC__
#if #cpu (i386)
void
initialize_387_to_ieee__ (void)
{
unsigned short control_word;
asm ("fclex" : : );
asm ("fnstcw %0" : "=m" (control_word) : );
asm ("andw %2,%0" : "=m" (control_word) : "0" (control_word), "n" (0xf0e0));
asm ("orw %2,%0" : "=m" (control_word) : "0" (control_word), "n" (0x0220));
asm ("fldcw %0" : : "m" (control_word));
}
#endif /* #cpu (i386) */
#endif /* __GNUC__ */
>The OPTIMIZED code is erring on the last bit of the floating point
>number (see previous posting). The unoptimized code matches what you
>would expect if you did the arithmetic operation by hand, or with a
>calculator, or another computer platform.
Let's test the thesis of the above paragraph with MS Calculator:
1) Punch in S = 3.81159384232e-6 (This is an approximation of
the single-precision value that the program will be using for S.
2) Punch in I1 = 72132
3) Multiply to get A = 0.2749378870342
4) Multiply by 2 and subtact 1 to get B = -0.4501242259315
Note that this is closer to the optimized -0.4501242339611
than to the unoptimized -0.4501242041588
I think what you are doing is punching in the single precision rounded
A = 0.274937897920 to get B = -0.45012420416. In this case the value
of B does indeed agree better with that of A than the optimized
version produces, but the value of B you get from the optimized
version is in fact more consistent with the initial parameters
(I1 and S) than the value from the unoptimized version because the
optimized version avoids the intermediate store to memory hence
rounding to single precision of A before calculating B.
[snip]
g77 -O2 -ffloat-store makes it possible to get the same results as
without
optimization. I agree this behaviour is not satisfying. I think that
this
type of problem is a known issue of gcc on x86 but nobody seems keen to
work on it.
More specifically, looking at the assembler produced, the line
A=REAL(I)*S
is translated to "fimul" in the optimised code instead of "fmulp".
This lead to different rounding.
As your example is both small and self contained, please send it to
gcc-...@gcc.gnu.org so that it is properly archived.
See below as well for some minor fixes to your code.
Thanks for your post.
Regards,
Arnaud
> Compilers: g77 from gcc release 2.95.3 and
> prelease 2.95.4 (g77 version 0.5.25 20010319)
> Compile Command : optimized version: "g77 -O2 -o testopt test.f"
> unoptimized: "g77 -o test test.f"
>
> C********************* Begin Code Sample ***************
> PROGRAM TEST
> C *This program demonstrates a difference in the results
> C *of optimized versus unoptimized code on Linux/x86.
> C *Optimized code must be compiled with -O2 or higher.
>
> INTEGER I1,I2
> C *the following ints will cause the problem, others may or may not
> I1=72132
> I2=41951
> CALL TESTME(I1)
> CALL TESTME(I2)
> STOP
> END
>
> C *A separate subroutine is required for some reason
> SUBROUTINE TESTME(I)
>
> C *The following parameter does not have to be the same as used
> C *here, but not all numbers will result in the problem given a
> C *certain integer, I, either.
Add:
REAL S
INTRINSIC REAL
> PARAMETER (S=3.8115939E-6)
> INTEGER I
> REAL A,B
>
> C *The next line seems to be required for the problem to occur.
> C *Note that the result, A, is the same optimized or not.
Use REAL instead of FLOAT. FLOAT is a specific version of the generic
REAL.
> A=FLOAT(I)*S
A=REAL(I)*S
Does this program have to be called before each floating point
operation, or can it be run just once at the beginning of a long
program? The comment in the code seems ambiguous to me.
Anyway, the more I think it through and get feedback, the more I am
convinced that this is not a problem that there is much to do about.
While I understand that one answer is no better than another, just
different, variability due to the level of optimization does not "feel
right." Still, as long as it can be defended and I have an
understanding of what's going on, I suppose I can live with it. I did
put a bug report in already, but I do not expect much to come of it.
I also tried the test and the full program using DOUBLE PRECISION. It
seems that if Intel is using 80 bit registers and a double is 64 bit,
that differences should be evident as well, even if it is really small.
But so far, no differences were perceptible, either with or without
optimization or cross platforms. (Why not?) So perhaps the code I've
been looking at is better suited for DOUBLE PRECISION, at least for
portability. Unfortunately, the whole program should probably be
converted to use DOUBLE PRECISION, which would be a time consuming task
that nobody really wants to do.
Thanks for all the help.
Chris
[about converting precision]
> Converting shouldn't be that hard if the compiler supports
> generic functions (i.e. function determined by type of
> arguments during compile); just add implicit real*8 (a-h,o-z)
> in all routines.
I didn't see a smiley, so I'll assume that the above was meant
seriously. :-(
Whereas I'd agree that converting *SHOULDN'T* be hard, I'd
also say that there *SHOULD* be peace on earth and that
world hunger *SHOULD* not exist. Alas, things are not always
as they should be. :-(
Trust me. I have personally spent many man-months converting
precisions of codes. Some codes are really trivial. Others
are easier to rewrite from scratch than to convert. I would say
it was an unusual code that could be converted just by adding
implicit real*8 declarations. Such codes exist, but I'd say
they were a pretty tiny minority.
To even describe all the possible complications would take
longer than I feel like spending at the moment (particularly
as I'm doing this from a hotel room on travel after a longish
day).
Yes, some codes are easy. My personal current f90 codes should be
convertable in a snap. But some are really, really hard. (The
ones that are most "fun" are full of equivalences and the like
that depend on the relative sizes of different types.)
I had the same question when I first saw this code. It only has
to be called once before the calculations are started. It also
only affects the process (program) it was called in. Each process
uses its own value of the FPU control word.
> Anyway, the more I think it through and get feedback, the more I am
> convinced that this is not a problem that there is much to do about.
> While I understand that one answer is no better than another, just
> different, variability due to the level of optimization does not "feel
> right." Still, as long as it can be defended and I have an
> understanding of what's going on, I suppose I can live with it. I did
> put a bug report in already, but I do not expect much to come of it.
> I also tried the test and the full program using DOUBLE PRECISION. It
> seems that if Intel is using 80 bit registers and a double is 64 bit,
> that differences should be evident as well, even if it is really small.
> But so far, no differences were perceptible, either with or without
> optimization or cross platforms. (Why not?) So perhaps the code I've
> been looking at is better suited for DOUBLE PRECISION, at least for
> portability. Unfortunately, the whole program should probably be
> converted to use DOUBLE PRECISION, which would be a time consuming task
> that nobody really wants to do.
My guess is that if you initialize double precision variables with
single precision values (ie. only fill <=23 bits of the 53 bit mantissa),
then you can do a few arithmetic operations before the non-zero bits
in the mantissa grow to over 53 bits (and you start to see truncation
in going from 80 to 64 bit precision). The `problem' is still there,
only smaller.
So, your bug report probably won't be addressed because many people
don't view this as a bug. I seem to recall some bitter battles between
the `more bits have to be better' and the `different answers go against
the holy IEEE-754 specs' camps a few years ago when Craig Burley looked
into addressing this in g77. In the end, I think he just threw his
hands up in despair because there was no consensus even among highly
experienced number-crunchers.
Cheers,
Rob Komar
In our case, it would be _nice_ if results did not vary across
platforms. Getting the "right answer" is not as important .
> In our case, it would be _nice_ if results did not vary across
> platforms. Getting the "right answer" is not as important .
>
> --
> Jonathan DeSena
Hello,
I can't agree with that statement in any case:
If uniformity is an issue, why not just output some arbitrary fixed
integer numbers?
I would regard varying results as indication, that the numbers
shouldn't be trusted too much. If the variations are small, that might
be an indication that the numbers are approximately right (at least
different arithmetic has no major influence on the algorithm).
Alois
--
Alois Steindl, Tel.: +43 (1) 58801 / 32558
Inst. for Mechanics II, Fax.: +43 (1) 58801 / 32598
Vienna University of Technology,
A-1040 Wiedner Hauptstr. 8-10 Email: Alois.Ste...@tuwien.ac.at
> Wasn't there a proviso in the Fortran standard at one time that required
> all data types to be of the same size, so there may have been some excuse
> for this assumption?
From at least F77, default integer and default real have to be of the same
size, and double precision twice this size. This is to make EUIQVALENCE
somewhat portable.
However, there are other problems that make simply recompiling often not a
viable solution, for instance in the use of intrinsic functions.
Jan
Chris
<snip>
I disagree. The main problem is that the gcc backend doesn't have the
ability to preserve the precision of floating point operations with
respect of the optimisation problem.
Yes, it is a known limitation (see gcc FAQ
http://gcc.gnu.org/fom_serv/cache/49.html). However gcc could do
the better job if somebody tackles the issue. Refer to the recent
thread called "Kahan's Floating Point Test" in May 2001 of the
g...@gnu.org (http://gcc.gnu.org/ml/gcc/2001-05/threads.html).
See especially http://gcc.gnu.org/ml/gcc/2001-05/msg01624.html.
In any case Jonathan DeSena's example is small enough to demonstrate
a problem and deserved IMO to be archived
Cheers,
Arnaud
> In our case, it would be _nice_ if results did not vary across
> platforms. Getting the "right answer" is not as important .
That is not easily achievable, trust me, even for well-behaved algorithms and
IEEE-conforming FP. Take the variations you see as an indication that results,
having been obtained via (slightly) different routes but still "similar" - a
relative error of one or two LSB in the mantissa, say - are "reliable".
Jan
Even I can remember when the competence of the programmer was measured
by how big a model he could squeeze into the available machine...
Catherine.
--
Catherine Rees Lay
Arnaud Desitter wrote:
> Jonathan DeSena's example is small enough to demonstrate
> a problem and deserved IMO to be archived
Seconded. Variations on the order of floating-point precision don't matter
from a scientific standpoint but they sure make debugging harder. Yet - for
the price, I can't complain.
Catherine Rees Lay wrote:
> There's
> (probably) no reason these days to equivalence real and integer arrays -
Dunno whether my current D1MACH and R1MACH still do this, but it was a
common way to put an exact machine representation into a real variable.
> Arnaud Desitter wrote:
Yes, I spent quite a bit of time a few years ago (when g77 was just becoming
stable) trying to verify our g77/x86 builds of a large Monte Carlo particle
physics simulation program by comparing the results against those generated
by commercial workstations (Sun, Dec OSF/1,...). My theory was that if I
could get the same results with g77/x86 as the workstations got, I could
trust the (still proclaimed as a beta) g77 compiler. I tinkered with the
FPU control word, and even hacked the libc math library to always store/load
the returned values from intrinsic functions to consistently truncate them,
but never got exactly the same results as the other machines. But then
again, the Sun and Dec results also differed somewhat, not only between
each other, but between themselves at different levels of optimization.
My guess was that I was seeing the tiny differences produced by performing
the arithmetic operations in different order at different levels of
optimization for calculations produced by the same compiler, and even
bigger differences caused by intrinsic functions returning different
results on different architectures. Monte Carlo simulations, if run
long enough, will branch into completely different histories if the
calculations aren't identical, so they are particularly sensitive to
even tiny differences in FPU calculations. In the end, I gave up on
trying to verify the results in this fashion, and fell back on checking
that the results matched statistically.
I seem to recall that the Sun compiler had a switch to make sure that the
FPU calculations were identical at various levels of optimization by
fixing the arithmetic operations to some consistent order. That's the
kind of effort you have to go to to achieve `standard' results. If
you don't, how do you know that the small differences (those in the
least significant bit position that started this whole thread) are
due to `bugs' rather than the mundane differences between even IEEE-754
compliant calculations?
Cheers,
Rob Komar
Thanks for your reply. I am glad others have tried similar tests as I
have, with similar results. I too have by now decided to compare results
after many runs as the validation rather than comparing numbers
throughout a single run. I will say however, that comparing each
individual run was useful because it pointed out some places in the code
which are not very forgiving to small differences and allowed me to
clean a few places up.
So far, statistically, g77/x86 and Solaris f77/SPARC seem to be very
close -- I expected a bigger difference. I will do some more tests
though to be sure.
> I seem to recall that the Sun compiler had a switch to make sure that the
> FPU calculations were identical at various levels of optimization by
> fixing the arithmetic operations to some consistent order.
In the version of the Solaris compiler we are using (f77 which came with
Workshop 5.0), optimization is producing exactly the same results as
unoptimized. Perhaps the switch you are reffering to is now the
standard.
> That's the
> kind of effort you have to go to to achieve `standard' results. If
> you don't, how do you know that the small differences (those in the
> least significant bit position that started this whole thread) are
> due to `bugs' rather than the mundane differences between even IEEE-754
> compliant calculations?
That's exactly what disturbed me about the observations I made in the
first place, and the reason I continued to pursue what was causing
differences until I could at least justify them.
>
> I seem to recall that the Sun compiler had a switch to make sure that the
> FPU calculations were identical at various levels of optimization by
> fixing the arithmetic operations to some consistent order. That's the
> kind of effort you have to go to to achieve `standard' results. If
> you don't, how do you know that the small differences (those in the
> least significant bit position that started this whole thread) are
> due to `bugs' rather than the mundane differences between even IEEE-754
> compliant calculations?
>
There is a good paper related to that topic in the Sun's Numerical
Computation
Guide.
http://www.validgh.com/goldberg/addendum.html
Cheers,
Arnaud
jonatha...@jhuapl.edu wrote:
> I will say however, that comparing each individual run was useful because it
> pointed out some places in the code
> which are not very forgiving to small differences and allowed me to
> clean a few places up.
>
John Rice's old book on Matrix Computation suggested running on two different
compilers to test the sensitivity of an algorithm to roundoff errors. The
optimization switch on g77 may (unpredictably) accomplish the same purpose, making
a virtue of necessity.
The old Cray compilers had a TRUNC=nn command line option which caused
the compiler to mask to zero the lower nn bits after almost every
floating point operation. It originally was put in to allow
benchmarkers
to investigate why different machines/compilers gave different answers.
Sometimes the differences between 48 bit mantissas and 47 bit mantissas
was spectacular.
Dick Hendrickson
Dick Hendrickson wrote:
>
>
> The old Cray compilers had a TRUNC=nn command line option which caused
> the compiler to mask to zero the lower nn bits after almost every
> floating point operation. It originally was put in to allow
> benchmarkers
> to investigate why different machines/compilers gave different answers.
If I had my druthers, though, I would prefer a quad precision compiler switch like on
the Lahey compiler.
Off topic, but is there any mention in the F2X Standard for quad precision as a standard?
A lot of compilers have it, but some still don't. Or is it more thought of as a brute
force method to make poorly designed algorithms work?
cheers,
paulv
--
Paul van Delst A little learning is a dangerous thing;
CIMSS @ NOAA/NCEP Drink deep, or taste not the Pierian spring;
Ph: (301)763-8000 x7274 There shallow draughts intoxicate the brain,
Fax:(301)763-8545 And drinking largely sobers us again.
Alexander Pope.
There is no proposal in F2K to require a quad real.
With the OOP features being added, there's more support
than ever before for making your own types.
On Wed, 13 Jun 2001 09:40:16 -0400, Paul van Delst
<paul.v...@noaa.gov> wrote:
<snip requoted>
>
>Off topic, but is there any mention in the F2X Standard for quad precision as a standard?
>A lot of compilers have it, but some still don't. Or is it more thought of as a brute
>force method to make poorly designed algorithms work?
>
>cheers,
>
>paulv
--
Cheers!
Dan Nagle
Purple Sage Computing Solutions, Inc.
Dan Nagle wrote:
> With the OOP features being added, there's more support
> than ever before for making your own types.
But I just want a debugging tool to use with mainly f77 code. Can't beat a compiler switch
for that.
Thanks for the link. I saved a copy of this some time ago, but
the equations were full of gifs that I hadn't saved, making it
pretty hard to read. Thanks also for reminding me about it;
it was worth reading again.
Cheers,
Rob Komar
Tim Prince wrote:
<snip>
> Consider that many Fortran compiler vendors have been forced to implement
> switches to select the size of default integer, logical, and real at compile
> time. There's plenty of valuable code written by authors who designed it
> not to run on a compiler with different defaults from theirs. Wasn't there
> a proviso in the Fortran standard at one time that required all data types
> to be of the same size, so there may have been some excuse for this
> assumption?
<snip>
The Fortran 66 standard allowed users, for the purpose of EQUIVALENCE, to
assume that REAL and INTEGER are the same size and DOUBLE PRECISION and COMPLEX
were twice that size. Fortran 77 added LOGICAL at the same size as REAL. The
Fortran 77 rules still apply to Fortran 95. The swiches work provided there is
no equivalencing of DOUBLE PRECISION entities with entities of other data
types, no attempt to call libraries compiled with different switches, no
attempt to read binary data files generated using other switches, and the
increase in memory useage is no problem. Changing the text of the program from
to convert all REAL entities to REAL(KIND=DBL), has additional problems if REAL
entities are equivalenced with INTEGER or LOGICAL entities, or the code calls a
standard or compiler defined intrinsic by its specific name instead of its
generic name. As the textual change is more hidden from view than the switch it
may also be more prone than the switch for problems with libraries and data
files. In Fortran 90/95 TRANSFER's behavior might change undesirably with
either the swich or the textual change.
Those are the problems (I know of) for legal code. Some codesdo illegal things
with COMMON block layouts in different routines, i.e., having the same named
entity having different types, that in effect assumes that the entities occupy
certain relative amounts of storage, implied by the EQUIVALENCE rules, and will
work for most processors (as the code is illegal it need not work on any of
them) if those size requirements are not broken. The textual change is highly
likely to break that requirement.
> Rob Komar wrote:
> > I seem to recall that the Sun compiler had a switch to make sure that the
> > FPU calculations were identical at various levels of optimization by
> > fixing the arithmetic operations to some consistent order.
> There is a good paper related to that topic in the Sun's Numerical
> Computation
> Guide.
> http://www.validgh.com/goldberg/addendum.html
Which, by sheer coincidence (hah!), has been part of our "Readings"
sections since years :-)
See: http://gcc.gnu.org/readings.html:
Miscellaneous information
What Every Computer Scientist Should Know about Floating-Point
Arithmetic by David Goldberg,
including Doug Priest's supplement (PostScript format)
Differences Among IEEE 754 Implementations by Doug Priest
(included in the PostScript-format document above)
Cheers,
--
Toon Moene - mailto:to...@moene.indiv.nluug.nl - phoneto: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html
Join GNU Fortran 95: http://g95.sourceforge.net/ (under construction)