READ-statement and a segmentation fault

rncdnet

Oct 12, 2006, 8:32:48 AM

Hello,

I have a problem with a "standard Fortran-77" code (written in July,
1997).
This code is given in

ftp://ssd.jpl.nasa.gov/pub/eph/export/fortran/testeph.f

and tests the calculations of positions and velocities of major planets
in our Solar System.

The code needs as an input the 'binary' file (for example)

ftp://ssd.jpl.nasa.gov/pub/eph/export/test-data/testpo.405

According to the description at the beginning I changed the following
variables to

NRECL=4
NAMFIL='testpo.405'
KSIZE = 2036

and chose the 'FSIZER3'-subroutine.
But after successful compiling with gfortran

[
===> gfortran -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ./configure --enable-languages=c,fortran
--prefix=/projects/tob/gcc-trunk
Thread model: posix
gcc version 4.2.0 20060703 (experimental)
]

it stops with a segmentation fault when executing.
The problem seems to be the line 884
'READ(NRFILE,REC=2)CVAL'

I checked the values of the variables given, all seems as usual.
Even, after changing the REC-variable to REC=1 the problem hasn't gone.

The same problem occurs for the other FSIZER-subroutines.

I don't understand what the problem is.
Any hint is appreciated,

rncdnet

Steven G. Kargl

Oct 12, 2006, 1:18:37 PM

In article <1160656368.1...@b28g2000cwb.googlegroups.com>,

"rncdnet" <rnc...@netscape.net> writes:
>
> I have a problem with a "standard Fortran-77" code (written in July,
> 1997).
> This code is given in
>
> ftp://ssd.jpl.nasa.gov/pub/eph/export/fortran/testeph.f
>
> and tests the calculations of positions and velocities of major planets
> in our Solar System.
>
> The code needs as an input the 'binary' file (for example)
>
> ftp://ssd.jpl.nasa.gov/pub/eph/export/test-data/testpo.405
>
>
> According to the description at the beginning I changed the following
> variables to
>
> NRECL=4
> NAMFIL='testpo.405'
> KSIZE = 2036
>
> and chose the 'FSIZER3'-subroutine.
> But after successful compiling with gfortran
>

> it stops with a segmentation fault when executing.
> The problem seems to be the line 884
> 'READ(NRFILE,REC=2)CVAL'
>
> I checked the values of the variables given, all seems as usual.
> Even, after changing the REC-variable to REC=1 the problem hasn't gone.
>
> The same problem occurs for the other FSIZER-subroutines.
>
> I don't understand what the problem is.
> Any hint is appreciated,

How is the 'binary' data file generated (i.e., by some other code
compiled with an old Fortran compiler)? I'd guess that you're
getting hit by gfortran's 64-bit record marker, whereas many/most
non-modern compilers use a 32-bit record marker.
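
One way to see which marker width a given compiler uses is to write a single sequential unformatted record and then look at the raw bytes. The sketch below is only an illustration: the file name marker_demo.bin and the payload values are made up, and it relies on Fortran 2003 stream access (plus the Fortran 2008 NEWUNIT specifier), which the compilers discussed in this thread may not have supported at the time.

```fortran
program show_record_marker
  use iso_fortran_env, only: int32
  implicit none
  integer :: u
  integer(int32) :: head(2)
  integer(int32), parameter :: payload(3) = [10_int32, 20_int32, 30_int32]

  ! Write one sequential unformatted record holding 12 bytes of data.
  open(newunit=u, file='marker_demo.bin', form='unformatted', &
       access='sequential', status='replace')
  write(u) payload
  close(u)

  ! Reopen the same file as a raw byte stream and read its first 8 bytes.
  open(newunit=u, file='marker_demo.bin', form='unformatted', &
       access='stream', status='old')
  read(u) head
  close(u, status='delete')

  ! With 4-byte markers: head = 12, 10 (the marker, then the first value).
  ! With 8-byte markers on a little-endian machine: head = 12, 0.
  print '(a,i0,1x,i0)', 'first two 32-bit words: ', head
end program show_record_marker
```

If the second word is the first data value, the leading marker is 32 bits wide; if it is zero, the marker most likely occupies 64 bits.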

--
Steve
http://troutmask.apl.washington.edu/~kargl/

glen herrmannsfeldt

Oct 12, 2006, 1:47:43 PM

rncdnet <rnc...@netscape.net> wrote:

> I have a problem with a "standard Fortran-77" code (written in July,
> 1997).
> This code is given in
> ftp://ssd.jpl.nasa.gov/pub/eph/export/fortran/testeph.f

(snip)

> The code needs as an input the 'binary' file (for example)

Reading binary files (UNFORMATTED, in this example) is, in general,
not portable between systems or compilers.

-- glen

Richard E Maine

Oct 12, 2006, 1:47:52 PM

Steven G. Kargl <ka...@troutmask.apl.washington.edu> wrote:

> How is the 'binary' data file generated (i.e., by some other code
> compiled with an old Fortran compiler)? I'd guess that you're
> getting hit by gfortran's 64-bit record marker, whereas many/most
> non-modern compilers use a 32-bit record marker.

I predict that lots of people will get hit by that one. I don't like
gfortran's choice of a default there. I know that it would be a problem
for users of my codes with gfortran. And no, having a compiler switch
doesn't really solve it as long as the default is the "wrong" way.
That's because the codes include libraries that are used in many other
applications. Making sure that the users always compile the libraries
with the needed compiler switches would be an issue. It isn't
insurmountable, but is an issue that will make gfortran stand out as
needing special attention for my users.

--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain| experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain

Jan Vorbrüggen

Oct 13, 2006, 4:06:56 AM

> How is the 'binary' data file generated (i.e., by some other code
> compiled with an old Fortran compiler)? I'd guess that you're
> getting hit by gfortran's 64-bit record marker, whereas many/most
> non-modern compilers use a 32-bit record marker.

For such incompatible changes, the DEC compiler RTLs tended to have an
elaborate set of ways of setting such parameters. You could specify the
default at compile time, use an environment variable to set or override,
if you will, at run time, and in at least one case override again on a
LUN-specific basis, again per environment variable.

Is there an extension to OPEN to set the size of the record marker in gfortran?

Jan

Steven G. Kargl

Oct 13, 2006, 1:02:00 PM

In article <4p8vnfF...@individual.net>,

There is a -frecord-marker option, but IIRC it has some issues

`-frecord-marker=LENGTH'
Specify the length of record markers for unformatted files. Valid
values for LENGTH are 4 and 8. Default is whatever `off_t' is
specified to be on that particular system. Note that specifying
LENGTH as 4 limits the record length of unformatted files to 2 GB.
This option does not extend the maximum possible record length on
systems where `off_t' is a four_byte quantity.

A patch was posted a few months ago. It hasn't made its way
into the code base (due to copyright assignment issues and
the patch submitter's relocation). A fairly long thread
concerning the issues starts at

http://gcc.gnu.org/ml/fortran/2006-07/msg00221.html

The technical discussion that followed was trying to get everyone
to agree on the best approach to "fixing" the incompatibility.

--
Steve
http://troutmask.apl.washington.edu/~kargl/

glen herrmannsfeldt

Oct 13, 2006, 1:58:52 PM

Jan Vorbrüggen <jvorbr...@not-mediasec.de> wrote:
>> How is the 'binary' data file generated (i.e., by some other code
>> compiled with an old Fortran compiler)? I'd guess that you're
>> getting hit by gfortran's 64-bit record marker, whereas many/most
>> non-modern compilers use a 32-bit record marker.

> For such incompatible changes, the DEC compiler RTLs tended to have an
> elaborate set of ways of setting such parameters. You could specify the
> default at compile time, use an environment variable to set or override,
> if you will, at run time, and in at least one case override again on a
> LUN-specific basis, again per environment variable.

A better choice would have been to put an identifying marker at
the beginning of the new file format, and switch to the old format
if that marker wasn't there. In this case, a very large offset would
have a very small chance of accidentally appearing.

-- glen

Steve Lionel

Oct 13, 2006, 2:38:17 PM

glen herrmannsfeldt wrote:

> Jan Vorbrüggen <jvorbr...@not-mediasec.de> wrote:
> > For such incompatible changes, the DEC compiler RTLs tended to have an
> > elaborate set of ways of setting such parameters. You could specify the
> > default at compile time, use an environment variable to set or override,
> > if you will, at run time, and in at least one case override again on a
> > LUN-specific basis, again per environment variable.
>
> A better choice would have been to put an identifying marker at
> the beginning of the new file format, and switch to the old format
> if that marker wasn't there. In this case, a very large offset would
> have a very small chance of accidentally appearing.

While it is true that the DEC compilers (and much more so the Intel
compilers) have a variety of environment variables for adjusting
semantics for I/O, there is no such control for 32 vs. 64 bit record
lengths in unformatted files. Instead, we use a record structure that
uses 32-bit lengths as long as the length is less than 2GB, and an
alternate format for longer records. This preserves compatibility with
most existing implementations while still allowing longer records
without having to "flag" the file first or deciding based on the
platform.

The method was suggested to us by Bob Corbett of Sun and we implemented
it about 7-8 years ago. I think Bob had hopes that Sun would also
implement it, but last I checked with him, they hadn't. At the time we
did this, no other Fortran compiler we knew of supported 64-bit-length
records (and we asked many vendors). I was somewhat disappointed a few
years later to see that straight 64-bit lengths were used by the GNU
compilers as this was incompatible with standard practice for all
files.

Here's some text from our documentation that describes the method we
use.

"For a record length greater than 2,147,483,639 bytes, the record is
divided into subrecords. The subrecord can be of any length from 1 to
2,147,483,639, inclusive.

"The sign bit of the leading length field indicates whether the record
is continued or not. The sign bit of the trailing length field
indicates the presence of a preceding subrecord. The position of the
sign bit is determined by the endian format of the file.

"A subrecord that is continued has a leading length field with a sign
bit value of 1. The last subrecord that makes up a record has a leading
length field with a sign bit value of 0. A subrecord that has a
preceding subrecord has a trailing length field with a sign bit value
of 1. The first subrecord that makes up a record has a trailing length
field with a sign bit value of 0."
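
A minimal sketch of how a reader might walk a file laid out this way. It assumes something the quoted text does not state: that a set sign bit means the marker holds the negated length as an ordinary two's-complement 32-bit integer. The file name subrec_demo.bin and the tiny 4-byte subrecords are made up so the example runs without creating a multi-gigabyte file, and the markers are written by hand through stream access rather than by any compiler's runtime.

```fortran
program walk_subrecords
  use iso_fortran_env, only: int32, int64
  implicit none
  integer :: u, ios, nrec
  integer(int32) :: head, tail
  integer(int64) :: p, nbytes, total

  ! Build a small synthetic file by hand (all sizes are made up).
  ! Record 1 is split into two 4-byte subrecords; record 2 is ordinary.
  open(newunit=u, file='subrec_demo.bin', form='unformatted', &
       access='stream', status='replace')
  write(u) -4_int32, 111_int32,  4_int32   ! continued, no predecessor
  write(u)  4_int32, 222_int32, -4_int32   ! last piece, has a predecessor
  write(u)  8_int32, 1_int32, 2_int32, 8_int32
  close(u)

  open(newunit=u, file='subrec_demo.bin', form='unformatted', &
       access='stream', status='old')
  p = 1
  total = 0
  nrec = 0
  do
     read(u, pos=p, iostat=ios) head
     if (ios /= 0) exit                      ! end of file
     nbytes = abs(int(head, int64))
     read(u, pos=p + 4 + nbytes) tail        ! trailer follows the data
     if (abs(int(tail, int64)) /= nbytes) error stop 'marker mismatch'
     total = total + nbytes
     p = p + 8 + nbytes
     if (head >= 0) then                     ! sign bit clear: record ends
        nrec = nrec + 1
        print '(a,i0,a,i0,a)', 'record ', nrec, ': ', total, ' bytes'
        total = 0
     end if
  end do
  close(u, status='delete')
end program walk_subrecords
```

A reader that knows nothing about subrecords still handles the second record correctly, which is the compatibility property described above: files with only "normal" records look exactly like plain 32-bit-marker files.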

Steve Lionel
Developer Products Division
Intel Corporation
Nashua, NH

User communities for Intel Software Development Products
http://softwareforums.intel.com/
Intel Fortran Support
http://developer.intel.com/software/products/support/
My Fortran blog
http://www.intel.com/software/drfortran

glen herrmannsfeldt

Oct 13, 2006, 3:05:59 PM

Steve Lionel <steve....@intel.com> wrote:

> The method was suggested to us by Bob Corbett of Sun and we implemented
> it about 7-8 years ago. I think Bob had hopes that Sun would also
> implement it, but last I checked with him, they hadn't. At the time we
> did this, no other Fortran compiler we knew of supported 64-bit-length
> records (and we asked many vendors). I was somewhat disappointed a few
> years later to see that straight 64-bit lengths were used by the GNU
> compilers as this was incompatible with standard practice for all
> files.

Since Sun wrote the standard (RFC) for record marking in TCP
(needed to do, for example, NFS over TCP), I thought
they might use that. It is similar to the one you mention,
but I think not exactly the same.

Personally, I don't see much need for records that long.
Files, containing multiple records, should be able to be
over 2G though.

If one wants to write out a large array with UNFORMATTED
I/O it isn't hard to split it into multiple records,
and depending on buffering, might be faster.
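
A minimal sketch of that splitting, under made-up sizes (n and chunk are placeholders; a real case would use a chunk far below the 2 GB limit and an array far above it). A small header record stores the sizes so the reader can reproduce the same slicing.

```fortran
program chunked_write
  implicit none
  integer, parameter :: n = 1000, chunk = 256   ! hypothetical sizes
  real :: a(n), b(n)
  integer :: u, i, lo, hi, n_in, chunk_in

  a = [(real(i), i = 1, n)]

  open(newunit=u, form='unformatted', access='sequential', status='scratch')

  ! Header record, then one record per slice of the array.
  write(u) n, chunk
  do lo = 1, n, chunk
     hi = min(lo + chunk - 1, n)
     write(u) a(lo:hi)
  end do

  ! Read it back with the same slicing.
  rewind(u)
  read(u) n_in, chunk_in
  do lo = 1, n_in, chunk_in
     hi = min(lo + chunk_in - 1, n_in)
     read(u) b(lo:hi)
  end do
  close(u)

  print '(a,l1)', 'round trip ok: ', all(a == b)
end program chunked_write
```

The cost is that the reader has to know the chunking convention, which is exactly the kind of source-level agreement that a single large record avoids.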

-- glen

Gary Scott

Oct 13, 2006, 9:20:58 PM

I would have prefixed the length field with a value indicating the
number of bytes in the length field (plus perhaps some additional stuff
that could be used for record locking independent of OS support).

--

Gary Scott
mailto:garylscott@sbcglobal dot net

Fortran Library: http://www.fortranlib.com

Support the Original G95 Project: http://www.g95.org
-OR-
Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html

Why are there two? God only knows.

If you want to do the impossible, don't hire an expert because he knows
it can't be done.

-- Henry Ford

Steve Lionel

Oct 13, 2006, 9:34:25 PM

Gary Scott wrote:

> I would have prefixed the length field with a value indicating the
> number of bytes in the length field (plus perhaps some additional stuff
> that could be used for record locking independent of OS support).

But that would make the files non-interchangeable with other
implementations that simply used a 32-bit length. The goal was to
preserve the maximum compatibility (in both directions) while still
allowing very large records. If for "normal" record lengths you do
anything that doesn't look like a 32-bit length, you've broken
compatibility. As all commercial compiler vendors will tell you, the
last thing you want to do is prevent a user of some other compiler from
using yours - that extends to data files as well as source files. If
on the other hand you're building a compiler just for the fun of it,
then you're free to be "elegant".

Steve

Gary Scott

Oct 13, 2006, 10:19:28 PM

Steve Lionel wrote:

I was thinking more in terms of an original design. Once you've
implemented the simple 32-bit integer you're pretty much stuck with it.
I wished for more thought going into the original design. Besides,
having file content delineate record breaks isn't portable to several
operating systems (e.g. MVS, VM, IMS, VSE and others). But they can
still read them with only a little effort.

glen herrmannsfeldt

Oct 13, 2006, 11:28:50 PM

Gary Scott wrote:
(snip)

> I would have prefixed the length field with a value indicating the
> number of bytes in the length field (plus perhaps some additional stuff
> that could be used for record locking independent of OS support).

If you really don't know the length of a field, the best way is
supposed to be to first put the number of zero bits equal to
the length of the field, followed by the appropriate number of bits.
There needs to be a special code for zero.

In this case, though, a byte should be enough for a long time.

Many of the MVS file formats still have a 32767 LRECL limit
which doesn't seem to have been much of a problem, as people still
seem to buy the systems.

-- glen

Richard E Maine

Oct 14, 2006, 12:08:19 AM

Steve Lionel <steve....@intel.com> wrote:

> Gary Scott wrote:
>
> > I would have prefixed the length field with a value indicating the
> > number of bytes in the length field (plus perhaps some additional stuff
> > that could be used for record locking independent of OS support).
>
> But that would make the files non-interchangeable with other
> implementations that simply used a 32-bit length. The goal was to
> preserve the maximum compatibility (in both directions) while still
> allowing very large records.

Yes. To me, compatibility is essentially the whole issue here. In the
abstract, just using 64-bit length fields ought to be just fine. But the
world isn't that abstract. That's a lot like thinking that the issues of
array extents larger than 2 billion can be solved just by making default
real/integer size 64 bits; I used to think that was the "obvious"
solution. I still think it would be nice, and my codes would deal with
it fine. But it isn't likely to be the dominant thing at least in the
near future. Anyway, redesigning unformatted file structure from
scratch is an exercise in pointlessness, at least if one expects to sell
compilers. You need compatibility with both existing files and with
other compilers. Today, that means 32-bit headers. The solution that
Steve mentioned, or variants on it, seems fine with me. That addresses
both compatibility with existing files and large-record support.

Yes, Glen, there are people who want/need/whatever records longer than
2 GB. Vendors need to be able to support those people also. In fact,
since those people are often ones with large budgets... well, I'm sure
you can fill in the rest there - support for them is important.

In my book, anything that depends on source code changes is going to
seriously hurt a vendor. Things that depend on compiler switches will
also hurt, although not as much. I'd really hope to be able to just take
existing programs and run them as is, with no special compiler switches,
and certainly no source code changes, and still be able to use existing
files. If I can't do that, I'm not going to be happy... not so much
because I couldn't deal with it for myself, but because users of my
codes would have to deal with making sure that they got the right source
code edits or compiler switches.

Yes, I know that depending on unformatted file compatibility is a "bad
idea" and has "issues". I tend to avoid that in new file specifications.
But sometimes my codes have to deal with existing requirements. So I do
what I have to do. I *KNOW* I'm not the only one in this position; I
have rather a lot of company, I'm sure. Compilers that make it harder
for me... well it isn't a make-or-break issue, but it goes into my
overall impression of how much I like and recommend the compiler.

robert....@sun.com

Oct 14, 2006, 1:06:19 AM

Steve Lionel wrote:

> The method was suggested to us by Bob Corbett of Sun and we implemented
> it about 7-8 years ago. I think Bob had hopes that Sun would also
> implement it, but last I checked with him, they hadn't.

It was one of three proposals I presented to the product team at Sun
responsible for Fortran, and that I subsequently posted to c.l.f. to
solicit outside opinions. I learned never to present a committee more
than two alternatives. Had there been only two proposals, it is likely
that there would have been a majority for one proposal. Given three
proposals, deadlock was possible. In this case, deadlock was achieved.

I would have been happy if any of the proposals had been approved.

Bob Corbett

robert....@sun.com

Oct 14, 2006, 1:28:58 AM

glen herrmannsfeldt wrote:

> Since Sun wrote the standard (RFC) for record marking in TCP
> (needed to do, for example, NFS over TCP), I thought
> they might use that. It is similar to the one you mention,
> but I think not exactly the same.

The sense of the bits is flipped. Also, the network records
have headers, but not trailers. Fortran needs trailers to
support backspacing efficiently.
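
As a rough illustration of why a trailer helps, the sketch below steps backwards through a file using only the length stored just before the current position, which is what a BACKSPACE could do without rescanning from the start. The file trailer_demo.bin and its contents are invented, and the 4-byte markers are written by hand through stream access so that the layout does not depend on any particular compiler's runtime.

```fortran
program backspace_by_trailer
  use iso_fortran_env, only: int32, int64
  implicit none
  integer :: u
  integer(int32) :: head, tail
  integer(int64) :: p, start

  ! Hand-built file: each record is <length> <data> <length>.
  open(newunit=u, file='trailer_demo.bin', form='unformatted', &
       access='stream', status='replace')
  write(u)  4_int32, 7_int32, 4_int32
  write(u) 12_int32, 1_int32, 2_int32, 3_int32, 12_int32
  write(u)  8_int32, 5_int32, 6_int32, 8_int32
  inquire(unit=u, pos=p)            ! position just past the last byte
  close(u)

  open(newunit=u, file='trailer_demo.bin', form='unformatted', &
       access='stream', status='old')
  ! Walk backwards: the trailer right before p gives the record length,
  ! so the start of the previous record is found with a single read.
  do while (p > 1)
     read(u, pos=p - 4) tail
     start = p - 8 - tail
     read(u, pos=start) head
     if (head /= tail) error stop 'marker mismatch'
     print '(a,i0,a,i0)', 'record starts at byte ', start, ', length ', tail
     p = start
  end do
  close(u, status='delete')
end program backspace_by_trailer
```

Without the trailing length, finding the start of the previous record would require either remembered positions or a scan from the beginning of the file.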

I know of file systems that used similar encodings for
variable-length records before Sun Microsystems even
existed. The only thing new in my proposal was the
somewhat obvious idea of applying the same principle
used for the headers to the trailers.

Bob Corbett

robert....@sun.com

Oct 14, 2006, 2:50:37 AM

Gary Scott wrote:

> I would have prefixed the length field with a value indicating the
> number of bytes in the length field (plus perhaps some additional stuff
> that could be used for record locking independent of OS support).

One of the other proposals we considered at Sun worked a little
like that. If the sign bit of the record header was on, the header
would contain a "flag" value that indicated the length of the rest
of the header.

What killed support for that approach was that we wanted
to allow large records to be written to devices and pseudo-devices
that did not allow seeks. Using that representation would, in
some cases, have required buffering the entire record before
writing it so that the length could be included as part of the header.
Multi-gigabyte records might have to be buffered in temporary files
to avoid hitting memory limits.

One of the problems in implementing Fortran I/O is that the size
of a record to be written cannot always be determined before the
values to be written have been computed. Using a representation
that includes subrecords allows records to be written a piece at a
time. Some of the file structures used for magnetic tapes supported
"segmented" records, which provide another way of encoding large
records.
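
To make the piecewise idea concrete, here is a hedged sketch of a writer that emits one logical record as a chain of subrecords without ever knowing the total length. It assumes a sign convention in which a negative leading marker means "continued" and a negative trailing marker means "has a predecessor", with the magnitude giving the subrecord length; the pieces, the count npieces, and the file name pieces.bin are all placeholders.

```fortran
program piecewise_record
  use iso_fortran_env, only: int32, int64
  implicit none
  integer, parameter :: npieces = 3           ! hypothetical piece count
  integer :: u, k
  integer(int32) :: n
  integer(int64) :: p
  character(len=:), allocatable :: piece

  open(newunit=u, file='pieces.bin', form='unformatted', &
       access='stream', status='replace')
  do k = 1, npieces
     ! Stand-in for a value that is only computed now: 'A', 'BB', 'CCC'.
     piece = repeat(achar(64 + k), k)
     n = len(piece)

     ! Leading marker: negative while more pieces follow.
     if (k < npieces) then
        write(u) -n
     else
        write(u) n
     end if

     write(u) piece

     ! Trailing marker: negative when an earlier piece exists.
     if (k > 1) then
        write(u) -n
     else
        write(u) n
     end if
  end do

  inquire(unit=u, pos=p)
  print '(a,i0)', 'bytes written: ', p - 1
  close(u, status='delete')
end program piecewise_record
```

Each piece goes out as soon as it exists, so nothing has to be buffered or rewritten, and the output could just as well be a pipe or another device that cannot seek.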

Bob Corbett

Jan Vorbrüggen

Oct 16, 2006, 5:35:47 AM

> "For a record length greater than 2,147,483,639 bytes, the record is
> divided into subrecords. The subrecord can be of any length from 1 to
> 2,147,483,639, inclusive.
>
> "The sign bit of the leading length field indicates whether the record
> is continued or not. The sign bit of the trailing length field
> indicates the presence of a preceding subrecord. The position of the
> sign bit is determined by the endian format of the file.

Ah! That's a nice idea. It does assume that other implementations don't see
the record length marker as an unsigned integer...but that is perhaps so.
I do think 2GB per I/O should be enough for the foreseeable future.

And to reply to Glen (IIRC) about the desirability of larger records: Yes,
you can do the subrecords yourself. But there's nothing like having a big
data structure in a module, say, with a FIRST and a LAST variable, and then
all you have to do is dump the thing to a file to save your program's state.
Easy and unlikely to miss something important.

Jan

Thomas Koenig

Oct 17, 2006, 5:06:02 PM

<robert....@sun.com> wrote:

>One of the problems in implementing Fortran I/O is that the size
>of a record to be written cannot always be determined before the
>values to be written have been computed.

Does that refer to formatted and/or unformatted I/O? Can you give
an example?

wclo...@lanl.gov

Oct 17, 2006, 6:12:50 PM

robert....@sun.com wrote:
> glen herrmannsfeldt wrote:
>
> > Since Sun wrote the standard (RFC) for record marking in TCP
> > (needed to do, for example, NFS over TCP), I thought
> > they might use that. It is similar to the one you mention,
> > but I think not exactly the same.
>
> The sense of the bits is flipped. Also, the network records
> have headers, but not trailers. Fortran needs trailers to
> support backspacing efficiently.

Not quite. The first ATT Unix Fortran compiler used the header/trailer
approach, and subsequent systems have adopted it for mutual
compatibility, but I see no intrinsic reason it should provide higher
performance.

The run time library could, for example, use a linked list (or a large
buffer) to keep all the headers in memory, or a single value to keep
track of the most recent header value and reread the headers backwards.
Although the programming is a bit trickier in these approaches, memory
system latency would still dominate performance and the simplified file
structure would save disk memory.

As another example the operating system could keep file structure
information in a hidden file such as in Mac OS's (prior to OS X?).
However, historically Unix systems have gone for simple file/directory
structures, and the hidden file descriptor approach would not be an
easy fit to that approach.

glen herrmannsfeldt

Oct 17, 2006, 7:11:31 PM

wclo...@lanl.gov wrote:
(snip on file formats for UNFORMATTED data)

> Not quite. The first ATT Unix Fortran compiler used the header/trailer
> approach, and subsequent systems have adopted it for mutual
> compatibility, but I see no intrinsic reason it should provide higher
> performance.

I could argue that, on average, it lowers performance. It increases
the I/O traffic by a small amount, but relative to the use of
BACKSPACE it is probably large. A small cache of the most recent
record positions would likely cover a large fraction of the small
number of times that BACKSPACE is used. The remainder would have
to read from the beginning. (Note also that many older systems,
including OS/360, support reading tapes backwards. Fortran doesn't
support this.)

The file system designed for OS/360 to support UNFORMATTED
has disk hardware that knows about block boundaries,
and record descriptors that allow up to 32763 byte records.
They don't have trailers, but at worst you read the
previous block and search forward until you find the right
record. (I believe RECFM=VBS was designed for Fortran.)

As far as I know, 32763 (32767-4 for the descriptor)
is still the maximum for z/OS VBS files used for Fortran
UNFORMATTED I/O. As one of the more expensive operating
systems, (I can't afford it) you can see why I am not
so convinced of the need for >2GB records.

-- glen

Richard Maine

Oct 17, 2006, 7:35:02 PM

Thomas Koenig <Thomas...@online.de> wrote:

Not worth my time to write out examples, but...

It can happen for either. For formatted, it is particularly trivial.
Non-advancing output can do that. Also, field widths (and thus record
sizes) can be data-dependent in many ways. Write an integer with an I0
format, the width of which depends on the data... or several other
similar data dependencies.

Even for unformatted, there can be data dependencies. Character data is
an obvious case. Write a trim(some_character_expression). You have to
evaluate the expression in order to see how long the output item is
going to be.

Implied DOs can cause all kinds of incredibly complicated behavior,
particularly if you mix them with function references. You would not
believe how messy the edge cases of that can get. Throw in some
zero-trip-count implied DOs for extra "fun". Some of the interp
questions that can raise took multiple revisions over quite a few years
to get consistent answers for.

Or speaking of function references, just writing f(x), where f is a
function that returns a dynamically-sized array, is a trivial example.
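
Two of these cases can be sketched briefly. The function name positives and the sample data are invented for illustration; the point is only that neither the formatted field width nor the number of values in the unformatted record is known until the data have been looked at.

```fortran
program data_dependent_lengths
  implicit none
  character(len=32) :: line
  real :: x(5) = [1.0, -2.0, 3.0, -4.0, 5.0]
  integer :: u

  ! Formatted: an I0 field is as wide as the value requires.
  write(line, '(i0)') 7
  print '(a,i0)', 'I0 width for 7: ', len_trim(line)
  write(line, '(i0)') 1234567
  print '(a,i0)', 'I0 width for 1234567: ', len_trim(line)

  ! Unformatted: the item is a function result whose size depends on x,
  ! so the record length is unknown until the function has returned.
  open(newunit=u, form='unformatted', status='scratch')
  write(u) positives(x)
  close(u)
  print '(a,i0)', 'values in the unformatted record: ', size(positives(x))

contains

  ! Hypothetical helper: returns only the positive elements of v.
  function positives(v) result(r)
    real, intent(in) :: v(:)
    real, allocatable :: r(:)
    r = pack(v, v > 0.0)
  end function positives

end program data_dependent_lengths
```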

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain

robert....@sun.com

Oct 19, 2006, 3:02:27 AM

Both. Because Solaris, Linux, MacOS and some other OSs
use one or more control characters to indicate the end of a
line of text and because Fortran implementations for those
systems encode formatted records as lines of text, there is
no need to know the length of a formatted record in advance.
The reason there is a problem for unformatted records is that
the record encoding for unformatted records used by many
implementations includes a header that contains the length
of the record.

> Can you give an example?

Yes.

The functions TRIM and LEN_TRIM are key to the simple
examples. Consider the following main program:

      PROGRAM MAIN
      CHARACTER*133 SFUNC
      EXTERNAL SFUNC
      INTRINSIC TRIM
      INTEGER I, N

      READ *, N
      OPEN(10, FILE='FILNAM', ACCESS='SEQUENTIAL', FORM='UNFORMATTED')
! One record whose length is the sum of the trimmed result lengths.
      WRITE (10) (TRIM(SFUNC(I)), I = 1, N)
      CLOSE (10)
      END

The length of the record to be written depends on the values
returned by the function references. Until the last function
reference has been evaluated, the length of the record is
unknown.
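
For readers who want to try this, here is a self-contained free-form variant. It is only a sketch: the internal function sfunc is a made-up stand-in that returns i copies of 'x' padded to 133 characters, N is fixed at 4 instead of being read from standard input, and the file name trimmed.bin is arbitrary.

```fortran
program trim_record
  implicit none
  integer, parameter :: n = 4     ! the program above reads N at run time
  integer :: u, i, fsize

  open(newunit=u, file='trimmed.bin', form='unformatted', &
       access='sequential', status='replace')
  ! One record; its length is the sum of the trimmed result lengths.
  write(u) (trim(sfunc(i)), i = 1, n)
  close(u)

  ! The payload here is 1 + 2 + 3 + 4 = 10 bytes. Whatever remains of the
  ! file size is the processor's record markers (width not assumed).
  inquire(file='trimmed.bin', size=fsize)
  print '(a,i0)', 'file size in bytes: ', fsize

  open(newunit=u, file='trimmed.bin', status='old')
  close(u, status='delete')

contains

  ! Hypothetical stand-in for SFUNC: i copies of 'x', blank padded.
  function sfunc(i) result(s)
    integer, intent(in) :: i
    character(len=133) :: s
    s = repeat('x', i)
  end function sfunc

end program trim_record
```

Subtracting the 10 payload bytes from the printed size gives the total space taken by the leading and trailing markers on the compiler in use.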

The external function SFUNC might or might not have side
effects. If it does have side effects, the side effects cannot
affect the "evaluation" of the function references involving
SFUNC executed by the WRITE statement, but they could,
for example, alter the values of variables in common blocks.

Some members of J3 hold that a Fortran processor is free to
call a function with side effects more times than is required
by a naive interpretation of the program. If that viewpoint is
accepted, the WRITE statement could be implemented by
first evaluating all the function references once to get the
length of the record and then evaluating the function
references a second time to obtain the values to be written.
I don't think any implementor has ever tried anything like
that.

Thank you for asking your important questions.

Bob Corbett

Arjen Markus

Oct 19, 2006, 4:04:58 AM

robert....@sun.com schreef:

>
> Some members of J3 hold that a Fortran processor is free to
> call a function with side effects more times than is required
> by a naive interpretation of the program. If that viewpoint is
> accepted, the WRITE statement could be implemented by
> first evaluating all the function references once to get the
> length of the record and then evaluating the function
> references a second time to obtain the values to be written.
> I don't think any implementor has ever tried anything like
> that.
>
> Thank you for asking your important questions.
>

But that would mean the side effects take place twice!

Suppose SFUNC(I) is something like:

function SFUNC(I) result(r)
integer:: i
character(len=i) :: r
integer, save :: a = 0
r = char(a)
a = a + 1
end function

The length of the result will of course vary but also the content.
The first round of evaluations will enable the program to determine
the length of the output, but then the second round will have
completely different results.

Regards,

Arjen

Richard Maine

Oct 19, 2006, 11:27:40 AM

<robert....@sun.com> wrote:

> Some members of J3 hold that a Fortran processor is free to
> call a function with side effects more times than is required
> by a naive interpretation of the program.

More times? Wow! I thought I was among the more extreme in my
interpretation of what the standard allowed the compiler to do with
functions. I think even I would balk at that one. There is enough
variation of opinion in the general area that you might be right, but I
half suspect that you might have read someone's extrapolation about the
limit of the implications of an opinion that they disagreed with.

For example, I have seen people suggest that

x = some_expression

is mathematically equivalent to

x = some_expression + 0.0*some_function(stuff)

which is to say that the compiler could randomly insert calls to pretty
much any function pretty much anywhere, as long as the result was
multiplied by 0, or otherwise didn't change the value of the expression.
That's one case of calling a function more times than specified - one
time instead of zero.

Although I've seen that example cited, I don't think anyone on the
committee seriously believes that one is allowed. I suppose someone
might believe that wording of the standard technically allows it, but
that "everyone" knows that isn't what it really means. Or maybe I've
just completely misunderstood someone's position - that can happen.

I suppose I would accept an argument in some cases based on a particular
code being illegal and thus allowing the processor to do anything with
it. That wouldn't apply to all cases - there would basically have to be
something that I'd claim would make the code illegal; some cases of side
effects could do that.

robert....@sun.com

Oct 19, 2006, 9:00:10 PM

Arjen Markus wrote:
> robert....@sun.com schreef:

> But that would mean the side effects take place twice!
>
> Suppose SFUNC(I) is something like:
>
> function SFUNC(I) result(r)
> integer:: i
> character(len=i) :: r
> integer, save :: a = 0
> r = char(a)
> a = a + 1
> end function
>
> The length of the result will of course vary but also the content.
> The first round of evaluations will enable the program to determine
> the length of the output, but then the second round will have
> completely different results.

That particular function is allowed in the case of the example I gave
only if N < 2. If two or more function references are evaluated, the
"evaluation" of each function reference affects and is affected by the
other function references, and so the references are not allowed by the
Fortran standard.

Bob Corbett

robert....@sun.com

Oct 19, 2006, 9:29:39 PM

Richard Maine wrote:

> For example, I have seen people suggest that
>
> x = some_expression
>
> is mathematically equivalent to
>
> x = some_expression + 0.0*some_function(stuff)
>
> which is to say that the compiler could randomly insert calls to pretty
> much any function pretty much anywhere, as long as the result was
> multiplied by 0, or otherwise didn't change the value of the expression.
> That's one case of calling a function more times than specified - one
> time instead of zero.

I too have seen that example.

I am not sure, but I think there was an interpretation that ruled that
the extra calls are allowed. IIRC, the example involved a substring
operation, something like STR(IFOO():N). In a particular
implementation, the function reference was evaluated twice, once
to establish the index of the starting character and once to establish
the length of the resulting character value. I admit that my memory
is fuzzy on this one, but I think it was ruled to be conforming.

There are cases, such as specification expressions in interface blocks
and subprograms, where implementations routinely evaluate the
expressions both in the caller and the callee. In those cases, the
routines are required to be pure, and so there should be no side
effects.

> Although I've seen that example cited, I don't think anyone on the
> committee seriously believes that one is allowed. I suppose someone
> might believe that wording of the standard technically allows it, but
> that "everyone" knows that isn't what it really means. Or maybe I've
> just completely misunderstood someone's position - that can happen.

You might recall that during the public review of CD-1 of the Fortran
2003 standard, I argued for putting additional restrictions on what the
"mathematical equivalence" rule allows. One of the specific restrictions
I proposed was to prohibit evaluating function references where the
function involved is not pure more than once. As anyone who reads
the standard can see, my proposal was not adopted.

Bob Corbett

Dick Hendrickson

Oct 19, 2006, 10:53:54 PM

robert....@sun.com wrote:
> Richard Maine wrote:
>
>> For example, I have seen people suggest that
>>
>> x = some_expression
>>
>> is mathematically equivalent to
>>
>> x = some_expression + 0.0*some_function(stuff)
>>
>> which is to say that the compiler could randomly insert calls to pretty
>> much any function pretty much anywhere, as long as the result was
>> multiplied by 0, or otherwise didn't change the value of the expression.
>> That's one case of calling a function more times than specified - one
>> time instead of zero.
>
> I too have seen that example.
>
> I am not sure, but I think there was an interpretation that ruled that
> the extra calls are allowed. IIRC, the example involved a substring
> operation, something like STR(IFOO():N). In a particular
> implementation, the function reference was evaluated twice, once
> to establish the index of the starting character and once to establish
> the length of the resulting character value. I admit that my memory
> is fuzzy on this one, but I think it was ruled to be conforming.
>

I don't remember that case (fuzzy memory seems to be going around).
But I do remember discussion about the rule that the character
strings in array constructors all be the same length. I think the
example was something like
[ (trim(f(i)), i = 1,n) ]
or maybe
[ ( string(f(i):g(i)), i = 1,n ) ]

The argument was that there was no practical way to evaluate
the constructor without evaluating the functions once to get the
maximum length, set up to make each element that length, and then
go through the sequence again term by term and pad with blanks as
necessary. The alternative would require continual back patching each
time a longer length occurred. It was a practical implementation
argument for large N, not a proof of impossibility. If I've
remembered everything correctly, that's an argument that J3
didn't believe that functions could be evaluated more than once
as an implementation strategy.
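
As a rough illustration of the two-pass strategy described above (not how
any particular compiler does it), the sketch below evaluates each element
once to find the maximum length and a second time to store it. The names
label, build_two_pass and ncalls are made up for the example; label stands
in for something like trim(f(i)), and ncalls exists only to make the extra
evaluations visible.

```fortran
! Illustrative sketch only: label, build_two_pass and ncalls are
! hypothetical names, not taken from any compiler or from the standard.
module two_pass_constructor
  implicit none
  integer :: ncalls = 0          ! counts evaluations of label()
contains

  ! Stand-in for trim(f(i)): the result length depends on the argument.
  function label(i) result(s)
    integer, intent(in) :: i
    character(len=:), allocatable :: s
    ncalls = ncalls + 1          ! the visible side effect
    s = repeat('x', i)
  end function label

  ! Build an array holding label(1), ..., label(n), padded to one length.
  subroutine build_two_pass(n, arr)
    integer, intent(in) :: n
    character(len=:), allocatable, intent(out) :: arr(:)
    character(len=:), allocatable :: tmp
    integer :: i, maxlen

    ! Pass 1: evaluate every element just to find the longest one.
    maxlen = 0
    do i = 1, n
      tmp = label(i)
      maxlen = max(maxlen, len(tmp))
    end do

    allocate(character(len=maxlen) :: arr(n))

    ! Pass 2: evaluate every element again; assignment pads with blanks.
    do i = 1, n
      arr(i) = label(i)
    end do
  end subroutine build_two_pass

end module two_pass_constructor
```

Starting from ncalls = 0, a call to build_two_pass(3, arr) leaves ncalls
at 6, that is, two evaluations per element, which is the doubling of side
effects being argued about here.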

Dick Hendrickson

robert....@sun.com

Oct 20, 2006, 2:03:36 AM

Dick Hendrickson wrote:
> robert....@sun.com wrote:

> > I am not sure, but I think there was an interpretation that ruled that
> > the extra calls are allowed. IIRC, the example involved a substring
> > operation, something like STR(IFOO():N). In a particular
> > implementation, the function reference was evaluated twice, once
> > to establish the index of the starting character and once to establish
> > the length of the resulting character value. I admit that my memory
> > is fuzzy on this one, but I think it was ruled to be conforming.
> >
> I don't remember that case (fuzzy memory seems to be going around).

It's entirely possible I am remembering a question asked outside of the
context of an official request for interpretation.

> But I do remember discussion about the rule that the character
> strings in array constructors all be the same length. I think the
> example was something like
> [ (trim(f(i)), i = 1,n) ]
> or maybe
> [ ( string(f(i):g(i)), i = 1,n ) ]
>
> The argument was that there was no practical way to evaluate
> the constructor without evaluating the functions once to get the
> maximum length, set up to make each element that length, and then
> go through the sequence again term by term and pad with blanks as
> necessary. The alternative would require continual back patching each
> time a longer length occurred. It was a practical implementation
> argument for large N, not a proof of impossibility. If I've
> remembered everything correctly, that's an argument that J3
> didn't believe that functions could be evaluated more than once
> as an implementation strategy.

Great! Then there should be no objection to adding language to the
standard prohibiting bogus function calls to be added via the
"mathematical equivalence" rule.

Bob Corbett

Dick Hendrickson

Oct 20, 2006, 11:39:20 AM

Before you get too excited, remember I started out by saying
"fuzzy memory" and the last sentence started with "If". So,
there's some room for negotiation.

More seriously, adding a rule that tries to say A=B can't be
evaluated "as if" it were A=B + 0*F(X) doesn't seem useful.
Nobody seriously thinks that would ever happen. True, the current
rules could be read to say it could happen ("as if" is pretty open
ended), but it doesn't happen. The current fuzzy definition of
"mathematical equivalence" works because A) everybody wants it to
work, B) everybody intuitively knows what it means, C) we don't want
to unnecessarily prevent future breakthroughs in optimization or
debugging. I once worked on a compiler that "evaluated" ARRAY(I,J)
as if it were FUNCTION(ARRAY, I, J, ...) where the ... had the
declared array bounds and the function did bounds checking. It
was a half-baked scheme to let us "check the box" on a procurement
that required some debugging capabilities. It would be a shame to
prevent something quasi-useful like this in order to clarify
something that "nobody" gets wrong in practice. At least, that's
my personal opinion today ;).
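
A scheme like that could look something like the sketch below. This is a
guess at the general shape, not the actual compiler's runtime: checked_ref
and its argument order are invented here, and a real implementation would
presumably report the offending subscript rather than just stop.

```fortran
! Hypothetical sketch of treating ARRAY(I,J) as a bounds-checked
! function reference. All names are assumptions made for illustration.
module checked_access
  implicit none
contains

  ! Return a(i,j) after checking i and j against the declared bounds,
  ! which the caller passes in alongside the array.
  function checked_ref(a, lo1, hi1, lo2, hi2, i, j) result(v)
    integer, intent(in) :: lo1, hi1, lo2, hi2   ! declared array bounds
    integer, intent(in) :: i, j                 ! subscripts to check
    real, intent(in) :: a(lo1:hi1, lo2:hi2)
    real :: v

    if (i < lo1 .or. i > hi1 .or. j < lo2 .or. j > hi2) then
      error stop 'checked_ref: subscript out of bounds'
    end if
    v = a(i, j)
  end function checked_ref

end module checked_access
```

With an array declared as REAL A(0:2,-1:1), a reference A(I,J) would then
be compiled as if it were checked_ref(A, 0, 2, -1, 1, I, J).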

Dick Hendrickson
> Bob Corbett
>

Richard Maine

Oct 20, 2006, 1:35:55 PM

Dick Hendrickson <dick.hen...@att.net> wrote:

> robert....@sun.com wrote:
> > Great! Then there should be no objection to adding language to the
> > standard prohibiting bogus function calls to be added via the
> > "mathematical equivalence" rule.
> >

> Before you get too excited,...

> More seriously, adding a rule that tries to say A=B can't be
> evaluated "as if" it were A=B + 0*F(X) doesn't seem useful.
> Nobody seriously thinks that would ever happen.

And if, for the sake of argument, I read Corbett's statement above
literally, then I'd have to say that it presupposes that there is a very
small set of possible reasons why anyone might object to something. This
does not match my empirical observation.

I'm sure that there is also universal agreement on the committee that

A = B

cannot be implemented as

write (*,*) 42

but I think you'd find substantial objection to adding language to the
standard prohibiting such bogus I/O. Yes, I know it isn't quite the
same. But still... convincing people to change things in the standard
requires more than abstract proof that the change is innocuous and
technically correct. Even if the proof is rock-solid, that doesn't do
the job. People object to things for more reasons than anyone could
enumerate.

robert....@sun.com

Oct 20, 2006, 5:35:57 PM

So, let's take the example of the substring operation I gave before.
Would you say that it is standard-conforming for the substring
operation STR(F():N) to evaluate the function reference F() twice?
If not, why not?

Bob Corbett

Dick Hendrickson

Oct 20, 2006, 7:01:57 PM

No, it's not (in my opinion) standard conforming. [I'm truly NOT
trying to be insulting here.] It's not standard conforming because
it makes no sense in the context of the general case that user written
external functions that are separately compiled often have side effects.
Chapter 1 says something like "the purpose of this standard is to
promote portability, reliability, ..." How can "portability" be
promoted or enhanced if F() is evaluated two (or more, what the heck)
times? It can't! Many of the statements in F2003 read better if
the phrase "use 1950s or 1960s common sense here" were added. The
standard can't possibly rule out every conceivable way to do
something (odd?) to a statement.

I think the same argument applies to those nuts (two of whom drank
my wine in my house last weekend!) who say that a statement like
A = user_function_with_side_effects()
doesn't have to evaluate the function. How on earth does that improve
portability or reliability? Rather than try to interpret a specific
sentence in a narrow way, people need to step back a tad and look
at the purpose of the language.

If I hadn't had a heckuva day and a martini I probably would have
phrased this differently.

Dick Hendrickson

>
> Bob Corbett
>

robert....@sun.com

Oct 25, 2006, 12:52:29 AM

I think you missed my point. I must not have been clear enough.

I believe that the standard should not allow a conforming processor
to evaluate a function reference where the referenced function has
side effects more than once. The standard, if the mathematical
equivalence rule is taken literally, says that it can. I intended the
example of STR(F():N) as an example where an implementor might
use the mathematical equivalence rule to claim he has the right to
evaluate the function reference F() more than once. Nothing
currently in the standard contradicts such a claim.

You state, in essence, that a literal reading of the Fortran standard
would allow evaluating a single function reference more than once,
but that such a reading is unreasonable. I agree that it is
undesirable for a processor to do so. Therefore, I believe that
the standard should prohibit a processor from doing so. I don't
believe it is in anyone's interest to encourage people to think that
if a portion of the Fortran standard taken literally is unreasonable,
it should be ignored.

Much of what the Fortran standard says seems unreasonable to
many people. In another recent thread, people were surprised to
learn that the standard allows the expression RAND() + RAND()
to be evaluated as 2*RAND(). A literal reading of the standard
makes it clear that that transformation is allowed, but many
people find that interpretation unreasonable.

I have seen instances where changes to the membership of J3
have changed the committee's interpretation of portions of the
standard. The same text that was interpreted one way at one
meeting was interpreted the opposite way in a later meeting.
The only difference was who attended the meetings.

The standard as written is an informal document, and so is
subject to differing interpretations. Nonetheless, the text of the
standard should be made as tight as it can be made to avoid as
many misinterpretations as possible.

One simple way of eliminating the possibility of some
misinterpretations would be to add examples illustrating the
complicated features of the language. One of the maddening
features of the current standard is that it has many examples of
the simple features of the language, which people are likely to
understand without assistance, but few examples of the
complicated features, which routinely trip people up. For
example, the semantics given in Section 12.3 of the standard
"Characteristics of procedures" have been misinterpreted by
many people, even members of the committee. There is not
a single example in that section.

Bob Corbett

glen herrmannsfeldt

Oct 25, 2006, 4:42:26 AM

robert....@sun.com wrote:

(snip)

> Much of what the Fortran standard says seems unreasonable to
> many people. In another recent thread, people were surprised to
> learn that the standard allows the expression RAND() + RAND()
> to be evaluated as 2*RAND(). A literal reading of the standard
> makes it clear that that transformation is allowed, but many
> people find that interpretation unreasonable.

I would only find it reasonable in the case of an optimizing
compiler. That is, where given a choice of optimization levels,
I chose a higher level of optimization. I do realize that the
standard doesn't say anything about optimization levels, but it
makes some sense to me.

-- glen

Thomas Koenig

Oct 25, 2006, 2:38:03 PM

<robert....@sun.com> wrote:

>I believe that the standard should not allow a conforming processor
>to evaluate a function reference where the referenced function has
>side effects more than once.

Just thinking... what should the following program print?
From what I understood, both 1.0 and 2.0 would be OK here.

program main
  implicit none
  integer :: iol
  inquire(iolength=iol) f(1.0)
  print *,f(1.0)
contains
  function f(x)
    implicit none
    real :: f
    real, intent(in) :: x
    logical :: init = .false.
    if (init) then
      f = x+1.0
    else
      f = x
      init = .true.
    end if
  end function f
end program main

robert....@sun.com

Oct 26, 2006, 1:04:37 AM

Thomas Koenig wrote:

> Just thinking... what should the following program print?
> From what I understood, both 1.0 and 2.0 would be OK here.
>
> program main
> implicit none
> integer :: iol
> inquire(iolength=iol) f(1.0)
> print *,f(1.0)
> contains
> function f(x)
> implicit none
> real :: f
> real, intent(in) :: x
> logical :: init = .false.
> if (init) then
> f = x+1.0
> else
> f = x
> init = .true.
> end if
> end function f
> end program main

You might be correct. Interpretation 95 for Fortran 90 states that
function references in the output list of an INQUIRE statement
with an IOLENGTH= specifier may be evaluated but do not have
to be evaluated. Of course, given a statement such as

INQUIRE(IOLENGTH=IOL) TRIM(SFUNC())

where SFUNC is a CHARACTER function, it is highly likely that
the function reference SFUNC() will be evaluated.

Technically, that interpretation does not apply to the current
Fortran standard. Interpretations are valid only for the
edition of the standard for which they are issued.

Bob Corbett

Dick Hendrickson

Oct 27, 2006, 1:16:14 PM

This is interesting, I've read your post several times, and I agree
with essentially everything you say except for your conclusion.
My reasoning is mostly a hodge-podge of semi-unrelated ideas
and not a proof of anything.

Mostly, I think we disagree about the breadth of the mathematical
equivalence rule. Should it be very literal or should it be
subject to an "implicit reasonableness" rule. I think the
strength of Fortran is the looseness and informality of many of the
rules. This has the major advantage that the rules don't
generally need to be overhauled when new things are added. For
example, the anti-dummy-argument-aliasing rule didn't need to
be modified when structures were added, nor did the C interop stuff
need a bunch of rules about what the c code could or could not do
with its arguments. F95 made a major accommodation for IEEE
machines mostly by adding the phrase "unless the processor supports
it" to the chapter 7 prohibition against doing mathematically
undefined things (like division by zero). The Fortran kind
mechanism will, apparently, support the various proposed IEEE
packed decimal formats by magic (although the poor guy who
supports the libraries will be somewhat inconvenienced).

The mathematical equivalence allows a ton of optimizations.
It is worded loosely and informally. But, messing with it
runs the risk of prohibiting something good sometime in the
future. True, I can't imagine why evaluating X+Y as
X+Y + 0*F() would ever be good. But, I can easily imagine
that "fixing" poor wording can have unintended consequences.
I claim that the existing wording is "obviously correct"
because everybody knows what it means and gets it right,
at least as far as unnecessary function calls are concerned.

I would change my opinion if somebody were to show me an
existing useful program that had extra side effects on some
processors. Otherwise, I think we just disagree.

Dick Hendrickson

I agree completely about the lack of examples. Did you know that
the 2008 draft is going through its internal review now? This
is a great time to add examples!

Dick Hendrickson

Oct 27, 2006, 1:20:02 PM

Yes, that's technically true. But, any interps that
result in wording changes will propagate forward (unless the
part in question gets heavily modified or somebody makes a
mistake.) Ditto, interps that say "no change is necessary
because what's there is clear enough" also propagate forward
(the wording remains as clear as ever, although that isn't
very satisfying ;) ).

Dick Hendrickson
> Bob Corbett
>

glen herrmannsfeldt

Oct 27, 2006, 1:55:46 PM

Dick Hendrickson <dick.hen...@att.net> wrote:
(very large snip)

> The mathematical equivalence allows a ton of optimizations.
> It is worded loosely and informally. But, messing with it
> runs the risk of prohibiting something good sometime in the
> future. True, I can't imagine why evaluating X+Y as
> X+Y + 0*F() would ever be good. But, I can easily imagine
> that "fixing" poor wording can have unintended consequences.
> I claim that the existing wording is "obviously correct"
> because everybody knows what it means and gets it right,
> at least as far as unnecessary function calls are concerned.

One of the more popular optimizations often done by Fortran compilers
is common subexpression elimination. That would tend to
evaluate a function less often, instead of more often.

If functions can have side effects, then they can't be
considered in common subexpression elimination. There
are probably examples that would fit X+Y+0*F(), but I
can't think of them right now.

-- glen

Dick Hendrickson

Oct 27, 2006, 5:30:59 PM

glen herrmannsfeldt wrote:
> Dick Hendrickson <dick.hen...@att.net> wrote:
> (very large snip)
>
>> The mathematical equivalence allows a ton of optimizations.
>> It is worded loosely and informally. But, messing with it
>> runs the risk of prohibiting something good sometime in the
>> future. True, I can't imagine why evaluating X+Y as
>> X+Y + 0*F() would ever be good. But, I can easily imagine
>> that "fixing" poor wording can have unintended consequences.
>> I claim that the existing wording is "obviously correct"
>> because everybody knows what it means and gets it right,
>> at least as far as unnecessary function calls are concerned.
>
> One of the more popular optimizations often done by Fortran compilers
> is common subexpression elimination. That would tend to
> evaluate a function less often, instead of more often.

That's a point of contention. There are several well respected
people who will claim that in a statement like
print *, F()
the processor doesn't have to evaluate F() at all!
Personally, I think they are reaching to over-interpret
"clear" words that say otherwise. But, they disagree and
are unlikely to ever see the light ;( .

>
> If functions can have side effects, then they can't be
> considered in common subexpression elimination.

That also is a matter of opinion. After all, if they
don't ever have to be evaluated, it surely is OK to
not evaluate them several times in a statement. ;)

> There are probably examples that would fit X+Y+0*F(), but I
> can't think of them right now.

My point was that I can't think of any either. So, it's a
mistake to try and add words to the standard that prohibit
something that never happens. As I think I said in my
earlier post, the chapter 7 text that allows optimization
is loosely worded. Personally, I think that's a good thing
because it allows progress (or whatever we call it). But,
the loose wording can be misinterpreted: that's the heart of
the function with side effects evaluation problem. But, I
claim there is no actual problem in production code or compilers.
Compilers don't abuse the mathematical equivalence rule; they
don't abuse functions with side effects. Hence, there's no
point in trying to fix (admittedly poorly worded) text that
works as intended. It runs too great a risk of breaking
something else.

Dick Hendrickson

PS: Just to be clear. In an explicit expression like
X + Y + 0*F()
the standard currently allows (and has always allowed) the processor
to skip evaluation of F() if it wants to. Anything F() would have
defined as a side effect is specifically undefined after the
expression is evaluated.

>
> -- glen

glen herrmannsfeldt

Oct 27, 2006, 5:52:02 PM

Dick Hendrickson <dick.hen...@att.net> wrote:

> glen herrmannsfeldt wrote:

>> There are probably examples that would fit X+Y+0*F(), but I
>> can't think of them right now.

> My point was that I can't think of any either.

OK, try this:

DO I=0,1000000
   J=0
   IF(I.NE.0) J=I*F(I)
   WRITE(*,*) J
ENDDO

An optimizer might be able to figure out that
it would get the same result executing J=I*F(I)
even when I=0, and save 999999 successful comparisons
and conditional jumps.

I probably don't believe that any compilers do this,
but it likely would be faster.
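
The difference between the two forms can be made visible with a counter.
In the sketch below, f, ncalls, as_written and branch_removed are all
made-up names, the loop bound is an argument so it can be kept small, and
the WRITE is replaced by a running sum. The call to f sits in its own
statement in the second loop so that the call count does not depend on how
the processor chooses to evaluate I*F(I).

```fortran
! Illustrative sketch: every name here is hypothetical.
module loop_side_effects
  implicit none
  integer :: ncalls = 0          ! visible side effect of f
contains

  integer function f(i)
    integer, intent(in) :: i
    ncalls = ncalls + 1
    f = i + 1
  end function f

  ! The loop as posted: f is never called when i == 0.
  subroutine as_written(n, total)
    integer, intent(in) :: n
    integer, intent(out) :: total
    integer :: i, j
    total = 0
    do i = 0, n
      j = 0
      if (i /= 0) j = i*f(i)
      total = total + j
    end do
  end subroutine as_written

  ! The imagined transformation: the test is gone, so f(0) is called too.
  subroutine branch_removed(n, total)
    integer, intent(in) :: n
    integer, intent(out) :: total
    integer :: i, j, k
    total = 0
    do i = 0, n
      k = f(i)                   ! evaluated for every i, including 0
      j = i*k
      total = total + j
    end do
  end subroutine branch_removed

end module loop_side_effects
```

Starting from ncalls = 0 each time and using n = 3, both subroutines
return the same total, but ncalls ends up at 3 after the first and 4 after
the second, because the second also calls f(0).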

-- glen

Richard Maine

Oct 27, 2006, 10:31:18 PM

glen herrmannsfeldt <g...@seniti.ugcs.caltech.edu> wrote:

> DO I=0,1000000
> J=0
> IF(I.NE.0) J=I*F(I)
> WRITE(*,*) J
> ENDDO
>
> An optimizer might be able to figure out that
> it would get the same result executing J=I*F(I)
> even when I=0, and save 999999 successful comparisons
> and conditional jumps.

I know of no possible justification for a compiler evaluating that. Even
in the most extreme interpretation of allowing random junk to be done in
evaluating expressions, there still has to be an expression being
evaluated in the first place. I don't even know of any straw-man
arguments that have ever been made for allowing the compiler to randomly
evaluate expressions just because it feels like it. This is not
comparable to the x+y+0*f() case.

Of course, if the compiler can do it in a way that has no effect (for
example, if the function has no side effects) then the "don't ask, don't
tell" policy applies. I.E. all is ok if it doesn't show.

glen herrmannsfeldt

Oct 28, 2006, 12:11:05 AM

Richard Maine wrote:

(I wrote)

>> DO I=0,1000000
>> J=0
>> IF(I.NE.0) J=I*F(I)
>> WRITE(*,*) J
>> ENDDO

(snip)

> I know of no possible justification for a compiler evaluating that. Even
> in the most extreme interpretation of allowing random junk to be done in
> evaluating expressions, there still has to be an expression being
> evaluated in the first place. I don't even know of any straw-man
> arguments that have ever been made for allowing the compiler to randomly
> evaluate expressions just because it feels like it. This is not
> comparable to the x+y+0*f() case.

I think I agree that the compiler shouldn't do it. Maybe I don't
understand the x+y+0*f() case, but I thought it was reasonably close.
Would you like it better if it actually had an x and y?

> Of course, if the compiler can do it in a way that has no effect (for
> example, if the function has no side effects) then the "don't ask, don't
> tell" policy applies. I.E. all is ok if it doesn't show.

I thought that was the original question: what is the compiler allowed
to assume regarding side effects.

-- glen

Richard Maine

Oct 28, 2006, 12:31:49 AM

glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

> Richard Maine wrote:
>
> (I wrote)
>
> >> DO I=0,1000000
> >> J=0
> >> IF(I.NE.0) J=I*F(I)
> >> WRITE(*,*) J
> >> ENDDO
>
> (snip)
>
> > I know of no possible justification for a compiler evaluating that. Even
> > in the most extreme interpretation of allowing random junk to be done in
> > evaluating expressions, there still has to be an expression being
> > evaluated in the first place. I don't even know of any straw-man
> > arguments that have ever been made for allowing the compiler to randomly
> > evaluate expressions just because it feels like it. This is not
> > comparable to the x+y+0*f() case.
>
> I think I agree that the compiler shouldn't do it. Maybe I don't
> understand the x+y+0*f() case, but I thought it was reasonably close.
> Would you like it better if it actually had an x and y?

No. The x+y+0*f() case is about how to evaluate the expression x+y,
assuming that the expression is to be evaluated. In your case, the
expression is not to be evaluated. That is a fundamental difference. It
makes no difference at all what is in the expression, since it isn't
evaluated anyway.

Dick Hendrickson

Oct 29, 2006, 11:55:19 AM

glen herrmannsfeldt wrote:
> Richard Maine wrote:
>
> (I wrote)
>
>>> DO I=0,1000000
>>> J=0
>>> IF(I.NE.0) J=I*F(I)
>>> WRITE(*,*) J
>>> ENDDO
>
> (snip)
>
>> I know of no possible justification for a compiler evaluating that. Even
>> in the most extreme interpretation of allowing random junk to be done in
>> evaluating expressions, there still has to be an expression being
>> evaluated in the first place. I don't even know of any straw-man
>> arguments that have ever been made for allowing the compiler to randomly
>> evaluate expressions just because it feels like it. This is not
>> comparable to the x+y+0*f() case.
>
> I think I agree that the compiler shouldn't do it. Maybe I don't
> understand the x+y+0*f() case, but I thought it was reasonably close.
> Would you like it better if it actually had an x and y?

It's a hard one to understand because, in a way, it makes no sense.
There's a statement in the standard something like "once the compiler
has figured out what the expression means, it can evaluate any
mathematically equivalent expression." And then gives legal examples
like x + x + x can be done as 3*x and some illegal examples like
I/J/K can't be done as I/(J*K) for integers. The problem is that
the sentence is pretty loosely worded, probably to give the
optimizers great freedom as new hardware evolves. The problem is
that "X+Y" is "mathematically equivalent" to "X+Y+0*F()" which
would seem to allow the compiler to evaluate the function and then
discard the result. And then, what happens to the side effects?

I think the crux of the argument is that Bob would like to add words
to the standard that explicitly prevent a compiler from making
"really silly" transformations under the guise of "mathematical
equivalence". I'm more or less opposed to mucking around with the
wording of a complicated section of the standard (that, in my
opinion, everybody understands and everybody gets right) merely
to prevent something that never happens. Bob would use a different
set of adjectives and, correctly, point out that loose wording in other
parts of the standard has caused countless problems. I just think the
possibility of unintended side effects is too great in this case and
we should leave the wording alone.

Dick Hendrickson

Gary Scott

Oct 29, 2006, 12:02:29 PM

Dick Hendrickson wrote:

Doesn't C interoperability need to say something about this issue, maybe
clarify things, since everything in C is a "function" and they obviously
can do just about anything possible.

>
> Dick Hendrickson
>
>>
>>> Of course, if the compiler can do it in a way that has no effect (for
>>> example, if the function has no side effects) then the "don't ask, don't
>>> tell" policy applies. I.E. all is ok if it doesn't show.
>>
>>
>> I thought that was the original question: what is the compiler allowed
>> to assume regarding side effects.
>>
>> -- glen
>>

--

Gary Scott
mailto:garylscott@sbcglobal dot net

Fortran Library: http://www.fortranlib.com

Support the Original G95 Project: http://www.g95.org
-OR-
Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html

Why are there two? God only knows.

If you want to do the impossible, don't hire an expert because he knows
it can't be done.

-- Henry Ford

glen herrmannsfeldt

Oct 29, 2006, 3:44:39 PM

Dick Hendrickson wrote:

(I wrote)
>>
>>>> DO I=0,1000000
>>>> J=0
>>>> IF(I.NE.0) J=I*F(I)
>>>> WRITE(*,*) J
>>>> ENDDO

(others commented in the mean time, then...)

> It's a hard one to understand because, in a way, it makes no sense.
> There's a statement in the standard something like "once the compiler
> has figured out what the expression means, it can evaluate any
> mathematically equivalent expression." And then gives legal examples
> like x + x + x can be done as 3*x and some illegal examples like
> I/J/K can't be done as I/(J*K) for integers. The problem is that
> the sentence is pretty loosely worded, probably to give the
> optimizers great freedom as new hardware evolves. The problem is
> that "X+Y" is "mathematically equivalent" to "X+Y+0*F()" which
> would seem to allow the compiler to evaluate the function and then
> discard the result. And then, what happens to the side effects?

I wrote the above as a case where it might be faster to evaluate
0*f() than not to evaluate it. I haven't seen anyone else even
try to do that. Before the addition of PURE, I might say it could
have been reasonable to allow the evaluation of functions a different
number of times than as written. With PURE, and the ability to tell the
compiler which functions don't have side effects, I am less sure.

> I think the crux of the argument is that Bob would like to add words
> to the standard that explicitly prevent a compiler from making
> "really silly" transformations under the guise of "mathematical
> equivalence". I'm more or less opposed to mucking around with the
> wording of a complicated section of the standard (that, in my
> opinion, everybody understands and everybody gets right) merely
> to prevent something that never happens. Bob would use a different
> set of adjectives and, correctly, point out that loose wording in other
> parts of the standard has caused countless problems. I just think the
> possibility of unintended side effects is too great in this case and
> we should leave the wording alone.

Higher levels of optimization sometimes require making mathematical
assumptions. I would say that with optimization off that a compiler
shouldn't do things like this. I am not so sure about the case
with high levels of optimization.

-- glen

Richard Maine

Oct 29, 2006, 4:10:25 PM

glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

> I wrote the above as a case where it might be faster to evaluate
> 0*f() than not to evaluate it. I haven't seen anyone else even
> try to do that. Before the addition of PURE, I might say it could
> have been reasonable to allow the evaluation of functions a different
> number of times than as written. With PURE, and the ability to tell the
> compiler which functions don't have side effects, I am less sure.

...

> Higher levels of optimization sometimes require making mathematical
> assumptions. I would say that with optimization off that a compiler
> shouldn't do things like this. I am not so sure about the case
> with high levels of optimization.

Maybe I'm misreading what you are saying again. I seem to have that
trouble a lot with your posts. I think the problem is that the subject
switches in subtle ways without warning to me.

Are you talking about what the standard requires or about how you think
things ought to be independent of the standard? I think that maybe you
switched to talking about what you think the rules ought to be
independent of the standard. If so, that's fine, but that's not a
discussion I will participate in.

If you are still talking about what the standard requires, then...

I am 100% sure on this one. There is nothing in the standard that even
hints at allowing this. None of the arguments presented for other cases
- even the arguments that nobody really believes, but give as
illustrations of silliness - apply.

The two parts of the standard that are often quoted are

1. A bit about evaluating mathematically equivalent expressions. You
aren't talking about a case where there is an expression to be
mathematically equivalent to. So that doesn't apply.

2. A bit saying that functions don't always have to be evaluated in some
cases. You are talking about evaluating calls that aren't there instead
of about not evaluating them, so that doesn't apply.

One might as well ask whether a compiler is allowed to evaluate some
function f() for the program

program main
write (*,*) "Hello, world."
end

But I'm going to stop replying to you on this one. It is really all a
waste of time. No, the standard does not allow that. If you don't
understand why it doesn't allow that, then I guess I just give up on
explaining it if that's the subject.

If you are talking about what you think "should" be the case, instead of
about what the standard says, then that's a discussion I decline to
participate in.

So, I suppose that either way, I have nothing more to say.

glen herrmannsfeldt

Oct 30, 2006, 12:44:38 AM

Richard Maine wrote:
> glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

(snip regarding function calls and optimizing compilers)

> Maybe I'm misreading what you are saying again. I seem to have that
> trouble a lot with your posts. I think the problem is that the subject
> switches in subtle ways without warning to me.

I suppose that happens sometimes.

> Are you talking about what the standard requires or about how you think
> things ought to be independent of the standard? I think that maybe you
> switched to talking about what you think the rules ought to be
> independent of the standard. If so, that's fine, but that's not a
> discussion I will participate in.

As far as I can tell, no-one seems to be certain about what the standard
requires. I thought it was you who originally brought up the question
about side effects and function evaluation.

> If you are still talking about what the standard requires, then...

> I am 100% sure on this one. There is nothing in the standard that even
> hints at allowing this. None of the arguments presented for other cases
> - even the arguments that nobody really believes, but give as
> illustrations of silliness - apply.

> The two parts of the standard that are often quoted are

> 1. A bit about evaluating mathematically equivalent expressions. You
> aren't talking about a case where there is an expression to be
> mathematically equivalent to. So that doesn't apply.

I had 0*f(x), which is mathematically equivalent to 0.
(For integers there is not usually a NaN or Inf to wonder about.)

> 2. A bit saying that functions don't always have to be evaluated in some
> cases. You are talking about evaluating calls that aren't there instead
> of about not evaluating them, so that doesn't apply.

Well, someone else brought that up in the first place. I was trying to
find an example where I believed the more optimal, and mathematically
equivalent, code would evaluate a function more often than it would
as written.

I have been surprised in the past to find some optimizations that
were allowed, and could generate different results. I don't know
that I believe in this one, only that it could generate more optimal
code by evaluating the function more often.
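
For illustration only, here is a minimal sketch of the 0*f(x) case using a
PURE function. The names pure_demo and f are made up for this example and
do not come from any code in the thread. Because f has no side effects, y
is 0 whether the processor calls f zero times, once, or more often, so the
number of evaluations cannot be observed by the program.

```fortran
module pure_demo
  implicit none
contains
  ! Hypothetical PURE function: no side effects, so the number of
  ! times it is evaluated cannot change the program's behaviour.
  pure integer function f(x)
    integer, intent(in) :: x
    f = 3*x + 1
  end function f
end module pure_demo

program zero_times_f
  use pure_demo
  implicit none
  integer :: y

  ! Mathematically equivalent to y = 0 for integer f.
  y = 0*f(7)

  ! Self-check: the value does not depend on whether f was called.
  if (y /= 0) stop 1
  print *, y
end program zero_times_f
```

The disagreement in this thread only matters once f is not PURE, since
then an extra or a missing call could become visible.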

-- glen

robert....@sun.com

Oct 30, 2006, 1:48:27 AM

Dick Hendrickson wrote:

> I think the crux of the argument is that Bob would like to add words
> to the standard that explicitly prevent a compiler from making
> "really silly" transformations under the guise of "mathematical
> equivalence". I'm more or less opposed to mucking around with the
> wording of a complicated section of the standard (that, in my
> opinion, everybody understands and everybody gets right) merely
> to prevent something that never happens. Bob would use a different
> set of adjectives and, correctly, point out that loose wording in other
> parts of the standard has caused countless problems. I just think the
> possibility of unintended side effects is too great in this case and
> we should leave the wording alone.

Right. I think the standard should not allow those transformations
that everyone agrees it should not allow.

Both you and Richard Maine have said that a processor that
added extra evaluations of a user-defined function under the
cover of the mathematical equivalence rule is not
standard-conforming. But such an implementation would, if
it was conforming in all other respects, satisfy the requirements
of the "Conformance" section of the standard. What then would
be the basis for saying it is not a conforming implementation?

Bob Corbett

Dick Hendrickson

Oct 30, 2006, 3:20:09 PM

Gary Scott wrote:
> Dick Hendrickson wrote:
>
>> glen herrmannsfeldt wrote:
>>
>>> Richard Maine wrote:
>>>

[snip, long discussion about function side effects.]

>
> Doesn't C interoperability need to say something about this issue, maybe
> clarify things, since everything in C is a "function" and they obviously
> can do just about anything possible.
>

Probably not. I believe that Fortran sets the rules for how functions
must behave and, if any called function doesn't follow the rules then
the program is non-conforming. I think C functions must follow the
same rules. So, for example, in a statement like
A = c_func(x)
c_func is allowed to modify its argument, but in
A = c_func(x) + c_func(x) + c_func(y)
only the third reference is allowed to modify its argument. It's
up to the programmer to do the right thing here.

Dick Hendrickson

glen herrmannsfeldt

Oct 30, 2006, 3:28:54 PM

Dick Hendrickson <dick.hen...@att.net> wrote:
> Probably not. I believe that Fortran sets the rules for how functions
> must behave and, if any called function doesn't follow the rules then
> the program is non-conforming. I think C functions must follow the
> same rules. So, for example, in a statement like

> A = c_func(x)

> c_func is allowed to modify its argument, but in

> A = c_func(x) + c_func(x) + c_func(y)
> only the third reference is allowed to modify its argument. It's
> up to the programmer to do the right thing here.

Well, C functions use call by value, so functions can't
modify their (actual) arguments. To modify something in
the calling program, it is necessary to pass a pointer.

As I understand it, C interop can do call by value,
or pass a pointer to the Fortran variable. C has rules
on modifying the same variable in one statement somewhat
different from those of Fortran.
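
As a rough sketch of the two passing conventions under F2003 C interop,
assuming two hypothetical routines by_value and by_pointer (written in
Fortran here so the example is self-contained; a real C function with the
prototype shown in each comment would be declared the same way in an
interface block):

```fortran
module interop_demo
  use, intrinsic :: iso_c_binding, only: c_int
  implicit none
contains
  ! Interoperable with a C prototype like:  int by_value(int n);
  ! The VALUE attribute gives the routine its own copy of the argument.
  function by_value(n) bind(c, name="by_value") result(r)
    integer(c_int), value :: n
    integer(c_int) :: r
    n = n + 1          ! changes only the local copy
    r = n
  end function by_value

  ! Interoperable with a C prototype like:  int by_pointer(int *n);
  ! Without VALUE, the C side receives a pointer to the Fortran variable.
  function by_pointer(n) bind(c, name="by_pointer") result(r)
    integer(c_int), intent(inout) :: n
    integer(c_int) :: r
    n = n + 1          ! changes the caller's variable
    r = n
  end function by_pointer
end module interop_demo

program interop_main
  use, intrinsic :: iso_c_binding, only: c_int
  use interop_demo
  implicit none
  integer(c_int) :: x, r1, r2

  x = 10
  r1 = by_value(x)                     ! x is unchanged
  if (x /= 10 .or. r1 /= 11) stop 1
  r2 = by_pointer(x)                   ! x is now 11
  if (x /= 11 .or. r2 /= 11) stop 2
  print *, x, r1, r2
end program interop_main
```

Each call sits in its own statement, so the question of one reference
affecting another within a single statement does not arise here.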

-- glen

Dick Hendrickson

Oct 30, 2006, 4:26:26 PM

robert....@sun.com wrote:
> Dick Hendrickson wrote:
>
>> I think the crux of the argument is that Bob would like to add words
>> to the standard that explicitly prevent a compiler from making
>> "really silly" transformations under the guise of "mathematical
>> equivalence". I'm more or less opposed to mucking around with the
>> wording of a complicated section of the standard (that, in my
>> opinion, everybody understands and everybody gets right) merely
>> to prevent something that never happens. Bob would use a different
>> set of adjectives and, correctly, point out that loose wording in other
>> parts of the standard has caused countless problems. I just think the
>> possibility of unintended side effects is too great in this case and
>> we should leave the wording alone.
>
> Right. I think the standard should not allow those transformations
> that everyone agrees it should not allow.

Exactly, we're almost in furious agreement here. There's only the
problem of getting the words correct enough so that
they say what we mean and don't say more than we mean. My
feeling is that the current words are good enough because
everybody gets it right. If there were a production compiler
that evaluated a function more than required in a production code
I'd give a different answer. But, there isn't and, therefore,
I think it's a not-good idea to mess with the existing wording.

>
> Both you and Richard Maine have said that a processor that
> added extra evaluations of a user-defined function under the
> cover of the mathematical equivalence rule is not
> standard-conforming. But such an implementation would, if
> it was conforming in all other respects, satisfy the requirements
> of the "Conformance" section of the standard. What then would
> be the basis for saying it is not a conforming implementation?

I'm not sure I said exactly that. I'm sure I would have qualified
it by saying "a function with side effects" and if I didn't say
something like that I either meant to say it, or thought it was obvious
from context. My argument is inferential, not direct. I certainly
don't believe there is a direct statement in the standard that says
the compiler can't do extra evaluations of functions with side effects.
But, chapter 2 describes order of evaluation and gives no hint that
extra evaluations are allowed. It specifically says the result is
"as if" the statements were executed in order, which, to me, means that
the optimizer must recognize the naive program as the real program
interpretation. Chapter 7 describes the allowable
transformations and gives no hint that adding things like 0*f()
was ever contemplated. Chapter 1 says that the purpose
of the standard is to ensure portability, reliability and lots of
other good things. Taken together, this has to mean that a processor
can't induce extra side-effects where they aren't called for by
the original program. It's a weak argument, but I can't believe that
extra side-effects increase portability. I think there is
no explicit prohibition merely because the original framers never
thought it was necessary. I can imagine many bizarre things that
compilers don't do and that are not specifically prohibited. I
don't think it's necessary to go through and add prohibitions against
things that will never happen. Yes, I can recognize that as a weak
argument.

Dick Hendrickson

>
> Bob Corbett
>

Dick Hendrickson

Oct 30, 2006, 4:33:57 PM

I don't think things like call by value or anything else matter.
The Fortran standard style is to list prohibitions by
philosophy, rather than as a strict list of specific things
you can't do. The general rule is that functions aren't allowed
to have side-effects on the statement they appear in. It's
not a list of particular statements you can't execute (although
that can often be inferred). It's just a rule that the programmer
has to get right. In my second example above, c_func is not
allowed to modify x, period. And, only the third reference
is allowed to modify y.

At least, it is my understanding that C interop functions must obey
the same rules that normal Fortran functions must obey.

Dick Hendrickson

Craig Powers

Oct 30, 2006, 5:05:14 PM

If I recall correctly, in C there's a sequence point introduced by a
function call, so it's not undefined behavior for c_func to modify its
argument in the second example.

Where things get more interesting is,
A = c_func_2(c_func(x), c_func(x), c_func(y))

where, as I recall, the order of evaluation is unspecified; the
arguments to c_func_2 can be evaluated in any order.

Gary Scott

Oct 30, 2006, 9:36:56 PM

Dick Hendrickson wrote:

> Gary Scott wrote:
>
>> Dick Hendrickson wrote:
>>
>>> glen herrmannsfeldt wrote:
>>>
>>>> Richard Maine wrote:
>>>>
> [snip, long discussion about function side effects.]
>
>>
>> Doesn't C interoperability need to say something about this issue,
>> maybe clarify things, since everything in C is a "function" and they
>> obviously can do just about anything possible.
>>
>
> Probably not. I believe that Fortran sets the rules for how functions
> must behave and, if any called function doesn't follow the rules then
> the program is non-conforming. I think C functions must follow the
> same rules. So, for example, in a statement like
> A = c_func(x)
> c_func is allowed to modify its argument, but in
> A = c_func(x) + c_func(x) + c_func(y)

What happens in this situation?

iretcode = c_f_setIODeviceFuncCode(1) + c_f_setIODeviceFuncCode(2)

Where you want to determine the return code of an IO device with one
function code value and then add it to the return code of an IO device
with another function code value. (this isn't contrived or uncommon)

> only the third reference is allowed to modify its argument. It's
> up to the programmer to do the right thing here.
>
> Dick Hendrickson

Dick Hendrickson

Oct 30, 2006, 10:25:17 PM

Gary Scott wrote:
> Dick Hendrickson wrote:
>
>> Gary Scott wrote:
>>
>>> Dick Hendrickson wrote:
>>>
>>>> glen herrmannsfeldt wrote:
>>>>
>>>>> Richard Maine wrote:
>>>>>
>> [snip, long discussion about function side effects.]
>>
>>>
>>> Doesn't C interoperability need to say something about this issue,
>>> maybe clarify things, since everything in C is a "function" and they
>>> obviously can do just about anything possible.
>>>
>>
>> Probably not. I believe that Fortran sets the rules for how functions
>> must behave and, if any called function doesn't follow the rules then
>> the program is non-conforming. I think C functions must follow the
>> same rules. So, for example, in a statement like
>> A = c_func(x)
>> c_func is allowed to modify its argument, but in
>> A = c_func(x) + c_func(x) + c_func(y)
>
>
> What happens in this situation?
>
> iretcode = c_f_setIODeviceFuncCode(1) + c_f_setIODeviceFuncCode(2)
>
> Where you want to determine the return code of an IO device with one
> function code value and then add it to the return code of an IO device
> with another function code value. (this isn't contrived or uncommon)

It's perfectly fine, so long as one of the function references
doesn't affect the other. It's no different from
A = sin(x) + sin(y) + cos(x) + cos(y).
Functions can do anything they want so long as they don't have side
effects on anything else in the statement.

The rule in F2003 is "The evaluation of a function reference shall
neither affect nor be affected by the evaluation of any other entity
within the statement." Like most Fortran rules, this is really a
restriction on the programmer, not the processor. It gives the
processor the freedom to evaluate things in any order. The same
rule has been in Fortran since F77.
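
A small sketch of what "doing the right thing" can look like, assuming a
hypothetical function bump that increments its argument and returns the
new value (the names counter_mod, bump and split_statements are invented
for this example). Two references to bump(x) in one statement would affect
each other, so the references are split across statements instead:

```fortran
module counter_mod
  implicit none
contains
  ! Hypothetical function with a side effect on its argument.
  integer function bump(n)
    integer, intent(inout) :: n
    n = n + 1
    bump = n
  end function bump
end module counter_mod

program split_statements
  use counter_mod
  implicit none
  integer :: x, t1, t2, total

  x = 0

  ! Not conforming, because each reference modifies x, which the
  ! other reference in the same statement depends on:
  !   total = bump(x) + bump(x)

  ! Conforming alternative: one side-effecting reference per statement,
  ! so the order of the two calls is fixed by statement order.
  t1 = bump(x)          ! x becomes 1, t1 is 1
  t2 = bump(x)          ! x becomes 2, t2 is 2
  total = t1 + t2

  if (total /= 3) stop 1
  print *, total
end program split_statements
```

A processor is not required to diagnose the commented-out form, which is
why the burden falls on the programmer.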

Dick Hendrickson

Oct 30, 2006, 10:25:39 PM

Craig Powers wrote:
> glen herrmannsfeldt wrote:
>> Dick Hendrickson <dick.hen...@att.net> wrote:
>>> Probably not. I believe that Fortran sets the rules for how functions
>>> must behave and, if any called function doesn't follow the rules then
>>> the program is non-conforming. I think C functions must follow the
>>> same rules. So, for example, in a statement like
>>
>>> A = c_func(x)
>>
>>> c_func is allowed to modify its argument, but in
>>
>>> A = c_func(x) + c_func(x) + c_func(y)
>>> only the third reference is allowed to modify its argument. It's
>>> up to the programmer to do the right thing here.
>>
>> Well, C functions use call by value, so functions can't
>> modify their (actual) arguments. To modify something in
>> the calling program, it is necessary to pass a pointer.
>>
>> As I understand it, C interop can do call by value,
>> or pass a pointer to the Fortran variable. C has rules
>> on modifying the same variable in one statement somewhat
>> different from those of Fortran.
>
> If I recall correctly, in C there's a sequence point introduced by a
> function call, so it's not undefined behavior for c_func to modify its
> argument in the second example.

NO, NO, NO, a thousand times NO! The second example is intended to
be a Fortran statement. There are no sequence points in a Fortran
statement. A function in a Fortran statement is NOT allowed to
modify anything else that affects the statement. You can't
come up with a way to modify something that the Fortran standard
doesn't forbid! It forbids everything! This is not a restriction
that Fortran is required to enforce or detect violations of. It's
not a syntax rule. It's an absolute prohibition in this case. It's
up to you to not do it. It doesn't matter if you write it in Fortran
or C or Pascal or java or Lisp or COMPASS. It's an incorrect
program if c_func modifies its arguments in the way I described!
There are no "well, what about ...?"s!

The rule in F2003 is "The evaluation of a function reference shall
neither affect nor be affected by the evaluation of any other entity
within the statement." Like most Fortran rules, this is really a
restriction on the programmer, not the processor.

>
> Where things get more interesting is,
> A = c_func_2(c_func(x), c_func(x), c_func(y))
>
> where, as I recall, the order of evaluation is unspecified; the
> arguments to c_func_2 can be evaluated in any order.

True, they can be evaluated in any order. But, they also can be
evaluated in any order in my example above. That's why the
restriction on function side-effects is in the language.

Dick Hendrickson

Gary Scott

Oct 31, 2006, 7:54:43 AM

OK, I see. I guess I missed the "in the statement" part; I only saw the
"can't have side effects" part.

>
> The rule in F2003 is "The evaluation of a function reference shall
> neither affect nor be affected by the evaluation of any other entity
> within the statement." Like most Fortran rules, this is really a
> restriction on the programmer, not the processor. It gives the
> processor the freedom to evaluate things in any order. The same
> rule has been in Fortran since F77.
>
> Dick Hendrickson
>
>>
>>
>>> only the third reference is allowed to modify its argument. It's
>>> up to the programmer to do the right thing here.
>>>
>>> Dick Hendrickson
>>
>>
>>

Craig Powers

Oct 31, 2006, 11:17:07 AM

Dick Hendrickson wrote:

> Craig Powers wrote:
>>
>> If I recall correctly, in C there's a sequence point introduced by a
>> function call, so it's not undefined behavior for c_func to modify its
>> argument in the second example.
>
> NO, NO, NO, a thousand times NO! The second example is intended to
> be a Fortran statement. There are no sequence points in a Fortran
> statement.

Sorry about the language confusion. I'm aware of the Fortran rules, and
should have altered the example so that it was clearly C only.