Wolfgang Kilian <kil...@invalid.com> wrote:
(snip)
> For current 64bit hardware one might naively expect a 64bit REAL type as
> default ... so why not do it the other way round: if there is good
> reason for compressed storage, one can declare and work with arrays of a
> KIND that is shorter than the default, right?
Seems to me that it was something of an accident that C ended up
with double as its default floating-point type.
First, C was meant as a systems programming language, for writing
things like operating systems and compilers. It included floating
point, implemented in software on many systems, but assumed that it
would be a small part of the actual computation. (That is, K&R C.)
K&R C supplied only double constants; all floating-point expressions
were evaluated in double and converted to float only where necessary.
By the time the ANSI C standard was finished in 1989, the 8087 and
80287 were available, and it was not so unusual to do everything in
double.
Even so, ANSI C added single-precision constants (with a trailing f)
and, given a prototype, the ability to pass float arguments to
functions without promotion. (K&R converted float to double before
passing it to a function, and back again if needed.)
> However, Fortran programs are supposed to be portable in space and time.
> In the x86 world, 32bit has been a de-facto standard for the default
> REAL, and I don't like relying on compiler options to change such
> things. So I don't see how to avoid a syntax like 1._dp everywhere in
> production code.
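As an aside for anyone not following the rest of the thread, the
kind-parameter idiom in question looks roughly like this; dp is just
a conventional name and the 15-digit request is my choice here, not
anything from the standard:

   module kinds
     implicit none
     ! dp is only a convention; any kind with ~15 digits would do.
     integer, parameter :: dp = selected_real_kind(15, 307)
   end module kinds

   program demo
     use kinds
     implicit none
     real(dp) :: x
     ! Without the _dp suffixes the constants below would be default
     ! (usually 32-bit) real, and the division would happen there.
     x = 1.0_dp / 3.0_dp
     print *, x
   end program demo

Leave the suffixes off and the division happens in default real
first and only then gets converted, which is exactly the annoyance
being described.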
In the case of Fortran, DOUBLE PRECISION didn't come until Fortran
II, and I believe it was done in software. (Single-precision floating
point in vacuum-tube hardware already seems pretty amazing to me.)
As far as I know, Fortran started the E notation for constants.
I am more sure that it started the D notation, as no other language
that I know of uses D. By then it was already necessary to stay
compatible with older programs, so E had to keep its old meaning.
(From a reliable source, it was Fortran that originated
multi-character variable names. Mathematicians still seem not to have
figured that one out.)
Even a small matrix can have a large condition number, but large
condition numbers definitely come more easily with larger matrices.
Much work has been done over the years on numerical algorithms to
reduce the precision loss in matrix calculations; we use LU
decomposition instead of explicit matrix inversion, for example. But
as problems get larger, there is a natural need for more precision in
intermediate values. I suppose a double-precision default could have
been adopted when the Fortran 66 standard was written, but it seems
to me that they pretty much wrote what already existed into the
standard, maybe a little less.
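Just as a toy sketch of how fast the precision goes (made-up numbers,
nothing to do with any real code): solve the same nearly singular
2x2 system, whose exact solution is x = y = 1, by Cramer's rule in
default real and in double.

   program illcond
     use iso_fortran_env, only: real32, real64
     implicit none
     real(real32) :: a32(2,2), b32(2), x32(2), d32
     real(real64) :: a64(2,2), b64(2), x64(2), d64

     ! Made-up nearly singular matrix; exact solution of A x = b is (1, 1).
     a64 = reshape([1.0_real64, 1.0_real64, &
                    1.0_real64, 1.0001_real64], [2, 2])
     b64 = [2.0_real64, 2.0001_real64]
     a32 = real(a64, real32)
     b32 = real(b64, real32)

     ! Cramer's rule; the tiny determinant magnifies the rounding error.
     d64 = a64(1,1)*a64(2,2) - a64(1,2)*a64(2,1)
     x64(1) = (b64(1)*a64(2,2) - a64(1,2)*b64(2)) / d64
     x64(2) = (a64(1,1)*b64(2) - b64(1)*a64(2,1)) / d64
     d32 = a32(1,1)*a32(2,2) - a32(1,2)*a32(2,1)
     x32(1) = (b32(1)*a32(2,2) - a32(1,2)*b32(2)) / d32
     x32(2) = (a32(1,1)*b32(2) - b32(1)*a32(2,1)) / d32

     print *, 'single: ', x32
     print *, 'double: ', x64
   end program illcond

On typical IEEE hardware the single-precision answer is already off
around the third digit; LU with pivoting does better than Cramer, but
the condition number still sets the limit.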
I might have written this here too many times, but there is
a feature on the IBM 7030/Stretch that seems not to have been
used much since. One can select (maybe from a front-panel switch)
whether to shift in zeros or ones on post-normalization; that is,
whether to round down or up. The idea was that a calculation would
be done both ways, and differences would be due to precision loss.
Now, there is one obvious problem, in that subtracting two such
numbers can cancel out the effect. But overall it seems a good
idea that hasn't been used much.
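The nearest modern equivalent of that switch is the IEEE rounding
mode, so the run-it-both-ways idea can at least be sketched with the
standard ieee_arithmetic module (assuming a compiler that honors the
runtime rounding mode for ordinary arithmetic, which not all do at
higher optimization levels):

   program both_ways
     use ieee_arithmetic
     implicit none
     real :: s_down, s_up
     integer :: i

     ! Same naive sum, once rounding every operation down, once up.
     ! Whether ordinary arithmetic honors the runtime mode is
     ! compiler- and option-dependent.
     call ieee_set_rounding_mode(ieee_down)
     s_down = 0.0
     do i = 1, 1000000
       s_down = s_down + 0.1
     end do

     call ieee_set_rounding_mode(ieee_up)
     s_up = 0.0
     do i = 1, 1000000
       s_up = s_up + 0.1
     end do

     call ieee_set_rounding_mode(ieee_nearest)
     ! The spread between the two results brackets the accumulated
     ! rounding error, with the same caveat about cancellation.
     print *, 'rounded down:', s_down
     print *, 'rounded up:  ', s_up
     print *, 'spread:      ', s_up - s_down
   end program both_ways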
Otherwise, IBM, starting with the 360/85 and continuing through all
S/370 and later machines, has had hardware support for extended
precision (128-bit) floating point, except for divide. (DXR didn't
appear in hardware until much later.) VAX has H-float, a 128-bit
floating type, but not in hardware on many processors. (I believe it
was extra-cost microcode for some, and standard on the bottom-end
11/730.)
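These days the portable way to ask for a 128-bit kind, hardware or
not, is something like the following (assuming the compiler provides
real128 at all; on x86 it is usually done in software, much as divide
was before DXR):

   program quad
     use iso_fortran_env, only: real64, real128
     implicit none
     ! real128 may be software-emulated or absent, depending on the compiler.
     real(real128) :: x

     x = sqrt(2.0_real128)
     print *, 'decimal digits, real64: ', precision(1.0_real64)
     print *, 'decimal digits, real128:', precision(x)
     print *, 'sqrt(2) in 128-bit:     ', x
   end program quad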
But yes, a default of "double precision", considering the popularity
of the IEEE standard forms, does seem reasonable now.
-- glen