
Real number precision in Fortran

243 views

Kim Hanjoon

Aug 27, 2023, 9:52:41 AM
Hi, I'm new to Fortran. I'm studying numerical analysis and I have a question about real number precision.

program test
real*8 x

x = 1.0/3.0

write(*,*) x
stop
end

The above code gives the result 0.33333334326744080. I know why this happens: a computer cannot represent most real numbers exactly, only approximately.

However, similar code in Python or C++ gives 0.3333333333333333, which is closer to 1/3 than 0.33333334326744080. Somebody said the difference is small enough to ignore, but I want the representation to be as accurate as possible. How can I fix that rough approximation?

I am using gfortran on Windows 10.

Thank you.

gah4

Aug 27, 2023, 10:09:45 AM
On Sunday, August 27, 2023 at 6:52:41 AM UTC-7, Kim Hanjoon wrote:
> Hi, I'm new to Fortran. I'm studying numerical analysis and I have a question about real number precision.
>
> program test
> real*8 x
>
> x = 1.0/3.0
>
> write(*,*) x
> stop
> end

> The above code gives the result 0.33333334326744080. I know why this happens:
> a computer cannot represent most real numbers exactly, only approximately.

The first answer to your question is x = 1.d0/3.d0

(Since you have REAL*8, you don't need the modern version yet.)
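To make the difference concrete, here is a minimal sketch, keeping your REAL*8 style (the exact digits printed depend on the compiler's list-directed output):

program demo
real*8 x, y
! Single precision constants: the division happens in single
! precision, and only the widened result is stored in x.
x = 1.0/3.0
! Double precision constants: the division happens in double precision.
y = 1.d0/3.d0
write(*,*) x
write(*,*) y
end

With gfortran, x prints as roughly 0.33333334326744080 and y as roughly 0.33333333333333331.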

> However, similar code in Python or C++ gives 0.3333333333333333, which is
> closer to 1/3 than 0.33333334326744080.

I believe that Python only does double precision, but someone can say if
that is wrong.

In C, and I believe this is inherited by C++, floating-point constants default to double.

C, as defined by K&R, evaluated all expressions in double precision,
though still allowing for float (single precision) variables.

That works well with processors that automatically evaluate
to full precision, like x87.

But Fortran constants default to single precision, and so 1.0/3.0
is evaluated in single precision, even though you assign to a REAL*8
variable.

It is one of the first things Fortran programmers learn.

In actual fact, though, a large number of programs need double
precision intermediates to get single precision results.
(That is, to avoid precision loss in subtraction.)

Invert a 10x10 matrix in double precision and you will likely get results
accurate only to single precision. Close enough, though.


Gary Scott

Aug 27, 2023, 1:20:16 PM
Sorry, newly installed news reader went to email.

REXX has user-selectable precision.


Ron Shepard

Aug 27, 2023, 1:38:26 PM
On 8/27/23 8:52 AM, Kim Hanjoon wrote:
> Hi, I'm new to Fortran. I'm studying numerical analysis and I have a question about real number precision.
>
> program test
> real*8 x
>
> x = 1.0/3.0
>
> write(*,*) x
> stop
> end

First, let me rewrite your code in modern fortran and in an expanded form.

program testfp
use, intrinsic :: iso_fortran_env, only: real32, real64
real(real64) :: s
s = real( 1.0_real32 / 3.0_real32, kind=real64 )
write(*,*) s
end program testfp

There are of course many other ways to write this code, but this shows
explicitly what your code is doing. In the original expression, these
steps were all done implicitly by the compiler, so they might not be
obvious to a new fortran programmer.

Note that the assignment expression (the division) is evaluated in
real32 precision, and then that value is converted and assigned to the
real64 variable. That initial evaluation is where the precision was lost.

An important feature in fortran is that expressions are always evaluated
the same way, regardless of the type and kind of the lhs variable. Any
conversions, if necessary, are done just at the final assignment step.

If you want the expression to be evaluated entirely with real64
precision, then you would write it as something like

s = 1.0_real64 / 3.0_real64

There is no need for a conversion step, real(...,kind=real64), in this
assignment because the rhs and the lhs are both real64.
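Putting that together, a corrected version of the whole program might look like this sketch (the number of digits printed by list-directed output varies between compilers):

program testfp
use, intrinsic :: iso_fortran_env, only: real64
real(real64) :: s
! The constants carry the real64 kind, so the division itself
! is performed in real64 precision before the assignment.
s = 1.0_real64 / 3.0_real64
write(*,*) s
end program testfp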

Hopefully, that explains the results that you see, and it shows you how
to change things if you want something different to be done.

Fortran gives the programmer complete control over the precision of
expression evaluation, so if the default evaluation and conversion rules
aren't what you want, you can always change the expression to do the
right thing.

$.02 -Ron Shepard

Lynn McGuire

Aug 27, 2023, 5:44:26 PM
Use "x = 1.0d0 / 3.0d0".

The d0 suffix forces the compiler to use double precision for a
floating-point constant; I advise using it for all floating-point
constants. The default in Fortran is single precision. Most Fortran
compilers also let you change the default to double precision with a
command-line switch. BTW, modern C defaults to double precision.
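(With gfortran, for example, the -fdefault-real-8 option promotes default reals to 8 bytes. Such switches are compiler-specific and can silently change the behavior of code that deliberately mixes precisions, so check your compiler's documentation before relying on them.)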

Lynn

Kim Hanjoon

Aug 27, 2023, 8:45:25 PM
Thank you! I never expected Fortran's default precision to be single (float), not double.

Gary Scott

Aug 27, 2023, 8:51:39 PM
On 8/27/2023 7:45 PM, Kim Hanjoon wrote:
> Thank you! I never expected Fortran's default precision to be single (float), not double.

I was surprised some other languages don't do the same. In the past,
many systems only supported double precision in software, which was
painfully slow. I find single precision perfectly fine for the majority
of common programming uses. But then I've always been aware of the
availability of double precision, and on some systems of precision
beyond double.

Giorgio Pastore

Aug 28, 2023, 5:00:15 AM
On 8/28/23 02:51, Gary Scott wrote:
One should also take into account that a standard for floating-point
numbers came into use and was incorporated into programming languages
relatively late. At the beginning of the eighties it was still possible
to find 32-bit reals (floats) with different splits between mantissa and
exponent, or reals of 48 or 60 bits.

gah4

Aug 28, 2023, 9:19:15 AM
On Sunday, August 27, 2023 at 5:45:25 PM UTC-7, Kim Hanjoon wrote:
> Thank you! I never expected Fortran's default precision to be single (float), not double.

DOUBLE PRECISION didn't come until Fortran II, in 1958, on the
vacuum tube powered IBM 704, with up to (and often less than)
32K words of 36 bits each. The 704 was the first machine with hardware
floating point, and it had only the 36-bit single precision.

C originated as a systems programming language, for writing things
like operating systems and compilers. Most often those didn't need much
floating point, but C supplied it anyway. All expressions were evaluated
in double precision, and library routines existed only in double precision.

The C types were, and still are, float and double, terms I note you used in your comment.

With the ANSI C standard in 1989, expressions could be evaluated in single
precision, but constants continued to default to double. A trailing f marks
single precision constants, and the single precision library functions
(sinf and friends, standardized later) likewise use an f suffix.
That was about 31 years after Fortran introduced double precision,
in software, on the first machine with floating point hardware.

The first Fortran standard in 1966 included most, but not all, of the
features from IBM's Fortran IV, an upgrade of the Fortran II on the 704.

IBM had upgraded from the 704 to the still vacuum tube powered 709,
and then the transistorized version, the 7090, still with only single
precision hardware.

Double precision in hardware seems to go to the IBM 7094, an upgrade
of the 7090, in 1962.

IBM S/360 in the mid 1960s decreased the word size to 32 bits, with a
single precision 32-bit format and a double precision 64-bit format.
The change from 36 bits to 32 bits meant that much floating point work
that had been done in 36 bits instead moved to the 64-bit double precision type.

C was internal to Bell Labs in 1976, with the book defining it released in 1978.

Note also that Fortran requires the double precision type to be twice the
size of the single precision type, but does not specify the size of either one.

In the 1960s, CDC introduced its 60-bit machines, with hardware 60-bit
single precision and software 120-bit double precision floating point.
Fortran programmers commonly used single precision on such machines, but
double precision on the IBM S/360 machines that were popular at about that time.

The second version of the Fortran standard in 1977 didn't change much of
the way floating point was done, but did introduce a CHARACTER data type
for the first time.

With the Fortran 90 and 95 standards, it was desired to keep as much as
possible compatible with the previous versions, so old programs would still
work the same way they always did.


By this time, programmers (except in CDC installations) were used to
using double precision constants with the D0 suffix. (Or other D
values, as needed.)

So, default double precision constants originated in the systems
programming language C, which was originally not popular for scientific
programming. C became more popular for scientific work in the late 1980s,
about the time the IBM PC with its optional 8087 floating point
coprocessor, and not much later the PC/AT with the 80286 and optional
80287, were spreading. The 8087 does all arithmetic in 80-bit registers,
usually with the full 64-bit significand.

gah4

Aug 28, 2023, 9:23:16 AM
For the majority of common uses, single precision results are fine, but often
that requires double precision intermediates. Many matrix operations easily
lose precision when subtracting numbers that are close in value.

But yes, double precision hardware was slow in coming. First in the larger
mainframes, and then later also in the minicomputers that allowed for more
personal computing.

Spiros Bousbouras

Aug 28, 2023, 9:45:25 AM
On Mon, 28 Aug 2023 06:19:12 -0700 (PDT)
gah4 <ga...@u.washington.edu> wrote:
> The C types were, and still are, float and double, which I note in your comment.

And long double.

pehache

Aug 28, 2023, 12:22:17 PM
On 8/28/2023 at 02:45, Kim Hanjoon wrote:
> Thank you! I never expected Fortran's default precision to be single
> (float), not double.

The default precision is simply that of the REAL type, nothing more. The
standard does not specify the actual size or precision of this type.
Although most (if not all) compilers nowadays map it to the IEEE 754
single precision 32-bit floating point format, that was not the case in
the past. Notably, on the CRAY machines it was mapped to the
hardware-native 64-bit floating point.

But mapping the REAL type to 64 bits had an important drawback back then:
it was impossible to provide a 32-bit type in a portable way, as the
standard required only the REAL type and the DOUBLE PRECISION type,
which had to be twice the size of the REAL type.

Ron Shepard

Aug 28, 2023, 1:20:13 PM
On 8/28/23 8:19 AM, gah4 wrote:
> The second version of the Fortran standard in 1977 didn't change much of
> the way floating point was done, but did introduce a CHARACTER data type
> for the first time.

This is true regarding the floating point formats themselves, but an
important feature of f77 was the idea of generic functions. Before f77,
for example, the programmer was required to use SIN(X) for REAL argument
X and DSIN(X) for double precision argument X. The same was true for all
of the intrinsic functions ABS/DABS, SQRT/DSQRT, and so on.

At that time, a REAL variable might be any of 24, 32, 36, 48, 60, or 64
bits on a given compiler, which made writing portable programs, where
the source code could be used unchanged, difficult. If your application
required, say, 8 decimal digits of precision, then on some machines you
needed to use double precision while on others REAL was sufficient. The
generic functions introduced with f77 helped with that portability issue.

However, f77 did not define a generic way to specify literal constants.
A constant written without an exponent or with an E exponent was REAL
(e.g. 3.14 or 3.14E0), while a constant written with a D exponent was
double precision (e.g. 3.14D0).

You will see many workarounds for these limitations in legacy code. One
of the common ones, which I used when applicable, was to define
parameter constants of the appropriate type, but to specify their values
with double precision literal constants.

parameter (PI = 3.1415926535897932384626433832795028841971D0)

No matter how PI is defined, REAL or DOUBLE PRECISION, it always has the
correct value. Another common practice in legacy codes was to avoid the
REAL and DOUBLE PRECISION declarations and to use the nonstandard REAL*8
declaration instead.

REAL*8 PI

Compilers would then typically map this to either REAL or DOUBLE
PRECISION as appropriate, meaning 48-bit, 60-bit, 64-bit, or 72-bit
floating point. With these two conventions, the programmer could write
portable code that would compile unchanged and run correctly on a wide
variety of hardware and software combinations.

> With the Fortran 90 and 95 standards, it was desired to keep as much as
> possible compatible with the previous versions, so old programs would still
> work the same way they always did.

F90 kept all of these old conventions, but it introduced the new KIND
system. This is one of the nicest features of modern fortran. It allows
portable code to be written in a standard way, and it is open ended to
allow new floating point KIND values to be incorporated in the future.
For example, there are compilers that support 16-bit floating point,
80-bit floating point, and 128-bit floating point, in addition to the
traditional 32-bit REAL and 64-bit DOUBLE PRECISION KINDs. The fortran
KIND system also allows decimal floating point KINDs to be supported,
although none of the compilers I use right now are doing this.

F77 allowed intrinsic procedures to be generic, but it did not allow
programmers to write generic procedures of their own. F90 added that
ability. This is a nice feature when writing library code and when an
application requires mixed-precision or multiple-precision floating
point computations.
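As an illustration, a user-defined generic in modern fortran might look like the following sketch (the generic name halve and its specific routines are hypothetical):

module demo_generic
use, intrinsic :: iso_fortran_env, only: real32, real64
implicit none
! One generic name resolves to a specific routine based on the
! kind of the actual argument, just as the intrinsic SIN does.
interface halve
module procedure halve_r32, halve_r64
end interface
contains
function halve_r32(x) result(y)
real(real32), intent(in) :: x
real(real32) :: y
y = 0.5_real32 * x
end function halve_r32
function halve_r64(x) result(y)
real(real64), intent(in) :: x
real(real64) :: y
y = 0.5_real64 * x
end function halve_r64
end module demo_generic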

$.02 -Ron Shepard

Gary Scott

Aug 28, 2023, 1:59:48 PM
or 96 bits :)

<snip>

gah4

Aug 28, 2023, 2:19:58 PM
On Monday, August 28, 2023 at 9:22:17 AM UTC-7, pehache wrote:

(snip)

> The default precision is simply that of the REAL type, nothing more. The
> standard does not specify the actual size or precision of this type.
> Although most (if not all) compilers nowadays map it to the IEEE 754
> single precision 32-bit floating point format, that was not the case in
> the past. Notably, on the CRAY machines it was mapped to the
> hardware-native 64-bit floating point.

Portable programs in those days would have sets of declarations at the
top, all except the one in use commented out.

Besides the declarations needed for the different bit widths, some
programs accounted for finer details of the floating point, such as
machine epsilon, which depends on the significand size, or the rounding modes.
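Something like this, in the fixed-form style of the era (an illustrative sketch, not taken from any specific program):

C     MACHINE-DEPENDENT DECLARATIONS: KEEP EXACTLY ONE UNCOMMENTED.
      REAL SUM, TERM
C     DOUBLE PRECISION SUM, TERM

Modern code can query such properties portably instead, with intrinsics like EPSILON(1.0).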

robin vowels

Aug 30, 2023, 2:02:37 AM
On Sunday, 27 August 2023 at 23:52:41 UTC+10, Kim Hanjoon wrote:
> Hi, I'm new to Fortran. I'm studying numerical analysis and I have a question about real number precision.
>
> program test
> real*8 x
.
You need:
double precision x
to be standard conforming.
>
> x = 1.0/3.0
.
You have used single-precision constants. You need to write
x = 1d0 / 3d0
.

jfh

Aug 30, 2023, 5:43:20 PM
Double precision x was the only way to be standard-conforming in Fortran 77. Many of us would now use something like

integer, parameter :: dp = kind(1d0)
real(dp) x
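with the constants then written using the same kind suffix, for example (a minimal sketch):

program third
integer, parameter :: dp = kind(1d0)
real(dp) :: x
! The constants and the variable share the dp kind,
! so the division is done in double precision.
x = 1.0_dp / 3.0_dp
write(*,*) x
end program third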

robin vowels

Aug 30, 2023, 10:50:02 PM
.
The OP is a beginner in Fortran.
It makes sense to tell him something straightforward and simple.
I have been using what you suggest since the F90 days.

Clive Page

Aug 31, 2023, 7:03:09 AM
That is absolutely right. But what might be worth saying is that traditionally Fortran Standards have resolutely refused to specify the precision of arithmetic or the number of bits of storage. Almost since the start one could use REAL or DOUBLE PRECISION types, the latter now deprecated of course, but without being able to specify what they meant until fairly recently with functions like SELECTED_REAL_KIND. This used to be a serious problem. Some years ago I got code working on both a mini-computer and a mainframe where the DOUBLE PRECISION on the mini-computer had fewer digits than the REAL on the mainframe. And this was perfectly legal in the Fortran of the time, and indeed still would be if the hardware still existed.

What has changed is the introduction of the iso_fortran_env intrinsic module, which means that Fortran programmers can finally catch up with the rest of the computing world, since pretty much all current processors offer the choice of 32 bits or 64 bits for their floating-point storage and arithmetic (plus other options in some cases). But the syntax for using them, as Ron Shepard showed above, is not very obvious and really a bit verbose. It is only properly explained in a very few modern Fortran texts. Many old texts will mislead you badly.

It would be nice if we could get the next Fortran Standard written so that 32-bit and 64-bit arithmetic come by default, but I fear that this would break too much existing code for it to be possible.
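For what it's worth, here is a small sketch showing both routes (the names wp, x, and y are just illustrative):

program kinds
use, intrinsic :: iso_fortran_env, only: real64
! Request at least 15 decimal digits of precision, portably.
integer, parameter :: wp = selected_real_kind(p=15)
real(wp) :: x
real(real64) :: y
x = 1.0_wp / 3.0_wp
y = 1.0_real64 / 3.0_real64
write(*,*) x, y
end program kinds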

--
Clive Page

gah4

Aug 31, 2023, 8:40:56 AM
On Thursday, August 31, 2023 at 4:03:09 AM UTC-7, Clive Page wrote:

(snip)

> That is absolutely right. But what might be worth saying is that traditionally Fortran
> Standards have resolutely refused to specify the precision of arithmetic or the
> number of bits of storage. Almost since the start one could use REAL or
> DOUBLE PRECISION types, the latter now deprecated of course, but without being
> able to specify what they meant until fairly recently with functions like
> SELECTED_REAL_KIND. This used to be a serious problem.
> Some years ago I got code working on both a mini-computer and a mainframe
> where the DOUBLE PRECISION on the mini-computer had fewer digits than the
> REAL on the mainframe.

(snip)

> It would be nice if we could get the next Fortran Standard written so that 32-bit
> and 64-bit arithmetic come by default, but I fear that this would break too much
> existing code for it to be possible.

From the beginning, or close to it, it was 36 and 72 bits.

But yes, the CDC 60 bit machines, and then Cray 64 bit machines, did complicate things.

I believe that they had to support software double precision to satisfy the standard,
even if nobody used it.

I have noticed, though, that for IEEE 754 decimal, the basic types are the 64 and 128 bit types.
I think that means a Fortran compiler should support them as REAL and DOUBLE
PRECISION. Not that it couldn't support others, though.

It does seem that hardware support of 128 bit binary types has been slow at arriving.

IBM supported 128 bits (in hexadecimal) from the 360/85 on and in all S/370 models.
And then ESA/390 added the IEEE 754 binary types in 32, 64, and 128 bits.

Also, RISC-V defines a 128-bit format, though I don't know which processors
that you can actually buy support it.

robin vowels

Aug 31, 2023, 9:18:03 AM
On Thursday, 31 August 2023 at 22:40:56 UTC+10, gah4 wrote:
> On Thursday, August 31, 2023 at 4:03:09 AM UTC-7, Clive Page wrote:
>
> (snip)
> > That is absolutely right. But what might be worth saying is that traditionally Fortran
> > Standards have resolutely refused to specify the precision of arithmetic or the
> > number of bits of storage. Almost since the start one could use REAL or
> > DOUBLE PRECISION types, the latter now deprecated of course, but without being
> > able to specify what they meant until fairly recently with functions like
> > SELECTED_REAL_KIND. This used to be a serious problem.
> > Some years ago I got code working on both a mini-computer and a mainframe
> > where the DOUBLE PRECISION on the mini-computer had fewer digits than the
> > REAL on the mainframe.
> (snip)
> > It would be nice if we could get the next Fortran Standard written so that 32-bit
> > and 64-bit arithmetic come by default, but I fear that this would break too much
> > existing code for it to be possible.
.
> From the beginning, or close to it, it was 36 and 72 bits.
.
From the beginning, it was 32-bit words.
Pilot ACE (1950) and DEUCE (1955) were 32/64-bit machines.
.
> But yes, the CDC 60 bit machines, and then Cray 64 bit machines, did complicate things.
.
There were also 48-bit machines.

Ron Shepard

Aug 31, 2023, 12:53:40 PM
On 8/31/23 7:40 AM, gah4 wrote:
> I believe that they had to support software double precision to satisfy the standard,
> even if nobody used it.


The ANSI f77 standard book was published so that when you opened it, the
left-hand page showed the standard subset and the right-hand page the
full standard. If I remember correctly, the standard subset did not
require support for double precision, complex, or character types. I
used a fortran compiler in the early 1980s that supported complex and
character but not double precision. Its REAL type was 64-bit, so as you
say, most programmers did not really need 128-bit double precision. A
subsequent compiler revision did eventually support 128-bit double
precision using a mix of hardware and software, so it was slow.

Nowadays, the de facto standard is that REAL is REAL32, which then means
that REAL64 is DOUBLE PRECISION. I do not know of any compiler in use
these days that contradicts that convention (unless compiler options are
specified). I doubt that the standard will incorporate that convention
due to the backwards compatibility issue and to the possibility that
someone might still want to write a modern fortran compiler for legacy
36-bit, 60-bit, or 64-bit hardware.

$.02 -Ron Shepard

Thomas Koenig

Sep 1, 2023, 12:16:35 PM
Clive Page <use...@page2.eu> wrote:
> That is absolutely right. But what might be worth saying is that
> traditionally Fortran Standards have resolutely refused to specify
> the precision of arithmetic or the number of bits of storage.
> Almost since the start one could use REAL or DOUBLE PRECISION
> types, the latter now deprecated of course, but without being able
> to specify what they meant until fairly recently with functions
> like SELECTED_REAL_KIND.

"Fairly recently" meaning 1991, the year that Fortran 90 came out.
It's only been a bit more than 30 years.

By comparison, Java was released in 1996.

Gary Scott

Sep 1, 2023, 12:33:15 PM
1965 seems fairly recent to me; I remember tooling around in grandpa's
1956 Pontiac Star Chief :)

gah4

Sep 1, 2023, 5:27:02 PM
On Friday, September 1, 2023 at 9:16:35 AM UTC-7, Thomas Koenig wrote:
> Clive Page <use...@page2.eu> wrote:

(snip)

> "Fairly recently" meaning 1991, the year that Fortran 90 came out.
> It's only been a bit more than 30 years.

> By comparison, Java was released in 1996.

Java not only specifies the size of the data types, but requires floating point to be IEEE 754 binary.

Rumor is that IBM added BFP (IEEE 754 binary) to ESA/390 so that it could run Java.

gah4

Sep 1, 2023, 5:31:06 PM
On Friday, September 1, 2023 at 9:16:35 AM UTC-7, Thomas Koenig wrote:
> Clive Page <use...@page2.eu> wrote:

(snip)

> "Fairly recently" meaning 1991, the year that Fortran 90 came out.
> It's only been a bit more than 30 years.

> By comparison, Java was released in 1996.

My first Fortran programs were in 1972. Doesn't seem so long ago.

The IBM OS/360 Fortran manual was my 8th grade graduation present.
