On 7/15/22 6:47 AM, Dave Tholen wrote:
> If you read (sequentially) past the end-of-file, gfortran sets the value
> of IOSTAT to -1, but if you read a second time, gfortran sets the value
> of IOSTAT to 5001. I discovered this situation after having written a
> program expecting subsequent READs to all return a negative number, and
> of course the program didn't behave as expected until I investigated the
> problem. So I'm curious as to why the choice was made to have IOSTAT
> set to something other than -1 on any READ attempt after the end-of-file
> condition has been triggered. Isn't the file pointer still just beyond
> the last record in the file, even after the first failed READ?
As others have explained, this dates back to when files were mostly
stored and accessed on magnetic tape. There were separate end-of-record
EOR and end-of-file EOF markers that were written to the tape, and you
could have separate files stored on a single tape, one after another.
The logical way to think of this is that each write to the tape would
write the data, followed by the EOR mark, followed by the EOF mark, and
then the tape was positioned to right before the EOF mark. If subsequent
records were written, they would overwrite the old EOF mark. If the tape
was rewound or unmounted or backspaced, or anything else, then that old
EOF mark would be there for the next time the tape was mounted or
positioned at that point.
After the last record in a tape file was read, a subsequent read would
detect the EOF mark on the tape. That information was returned through
the end= branch in the read statement or through the IOSTAT variable.
Once that happened, there were several things that could occur. One of
them was that you could read the next file. That next file might have
different record lengths, or it could have different
formatted/unformatted status, and so on, so the program needed some
ability to change those characteristics. This was all done in
nonportable, machine-specific ways before f77, and then f77 introduced
OPEN, CLOSE, and INQUIRE to make it more portable. But it was sometimes
still difficult to do things like this in a portable way.
Another thing that you could do was to BACKSPACE on the unit to move the
tape back to right before the EOF mark. From there, you could append
more records to that file. Of course not all files were stored on
magnetic tapes. There were other devices including paper tape, cards,
drums, disks, and now in modern times various kinds of solid state
devices and network file systems and so on. But the language allowed
these newer devices to look like they were magnetic tapes, so all this
stuff about records and EOR and EOF marks was still retained.
However, this one thing about EOF marks was not always done
consistently. In f77, the only way to append to a file was to read
records until the end= or IOSTAT value indicated that the file was
beyond the virtual EOF mark. From there, you were supposed to be able to
BACKSPACE the unit and then append new records. But many machines did
not do this correctly. There was often no actual EOF record written to
the file; instead, the end= condition was triggered when the file
pointer reached the end of the file as recorded in the filesystem
metadata. On
these machines, no BACKSPACE was required, you could just start writing
the new records at that point. Even worse, if you did execute the
BACKSPACE, as the nominally correct sequence required, it would back up
over the last actual record in the file. Then when you wrote the new
records, you would lose the information in that last record and the
record counts thereafter would all be off by one. So for the 1980s and
even 1990s, a cautious programmer had to test for this condition and
only do the BACKSPACE conditionally. I have not tested this recently,
but I expect it still happens when one tries to append to a file that
has been positioned with end=.
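The old idiom looked something like this sketch (the unit number, file
name, and the MACHEOF flag are all illustrative; f77 offered no
portable way to discover whether a given machine actually needed the
BACKSPACE, so the flag had to be set by hand for each target):

```fortran
C     Hypothetical sketch of the f77-era append idiom: read forward to
C     the EOF, then BACKSPACE only on machines that wrote a real EOF
C     record.  MACHEOF is an assumed, machine-specific flag.
      PROGRAM APPEND
      LOGICAL MACHEOF
      PARAMETER (MACHEOF = .FALSE.)
      INTEGER IOS
      CHARACTER*80 LINE
      OPEN (10, FILE='data.txt', STATUS='OLD')
   10 READ (10, '(A)', IOSTAT=IOS) LINE
      IF (IOS .EQ. 0) GO TO 10
C     the unit is now positioned past the (real or virtual) EOF mark
      IF (MACHEOF) BACKSPACE 10
      WRITE (10, '(A)') 'new record'
      CLOSE (10)
      END
```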
F90 allowed a file to be opened and positioned appropriately to append
new records. So the conditional BACKSPACE code could be replaced with
this new portable functionality.
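With F90, the whole read-to-EOF dance collapses into a single OPEN
specifier (the unit number and file name here are just illustrative):

```fortran
! F90 replacement for the conditional-BACKSPACE idiom: let OPEN
! position the file for appending in a portable way.
program append_f90
   implicit none
   integer :: ios
   open (10, file='data.txt', status='old', position='append', &
         iostat=ios)
   if (ios == 0) then
      write (10, '(a)') 'new record'
      close (10)
   end if
end program append_f90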
So for your question, it is because EOF is not treated as an error
condition. It is intended to be a normal situation that a programmer can
encounter, detect, and continue processing accordingly in a portable
way. However, reading or writing past the EOF might or might not be
allowed on a given device, or for a particular file on that device. If
it is not allowed, then it is an error condition, an exceptional
situation that should not normally be encountered.
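So the portable approach is to treat only the first EOF return as the
normal termination signal and not read again. A minimal sketch,
assuming an ordinary formatted file, using the f2003 IS_IOSTAT_END
intrinsic rather than comparing against a processor-dependent value
such as -1 (or relying on what a second READ returns, such as
gfortran's 5001):

```fortran
! Read every record once, stopping cleanly at the first EOF.
program read_all
   implicit none
   integer :: ios, n
   character(len=80) :: line
   n = 0
   open (10, file='data.txt', status='old')
   do
      read (10, '(a)', iostat=ios) line
      if (is_iostat_end(ios)) exit     ! normal end of file
      if (ios > 0) stop 'read error'   ! genuine error condition
      n = n + 1
   end do
   close (10)
   print *, n, ' records read'
end program read_all
```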
> On a completely separate matter, I have a different program that
> didn't behave as expected, and that misbehavior was totally repeatable.
> In an attempt to debug the program, I added a WRITE statement to check
> on the value of a variable during execution. However, once the WRITE
> statement was added, the program started behaving properly, repeatably.
> Comment out the added WRITE statement, and the program once again
> misbehaves, repeatedly. Re-enable the WRITE statement, and everything
> is once again hunky-dory. Damned frustrating. It's too easy to blame
> the optimizer. Anybody have any generic advice on what to look for in
> such a situation?
These are called 'Heisenbugs'. The program is corrupting either the data
or the stored instruction code somehow. It could be caused by
out-of-range array indexing, or by some illegal argument mismatch (e.g.
a scalar actual argument associated with a dummy array argument), or
referencing an undefined or stale pointer, or maybe by switching the
order of some arguments that still pass the type-kind-rank (TKR)
checks that the compiler can perform (an example might be switching
row and column indices of an array). The difficult thing about
locating these types of errors is that the failure does not always
surface right where the underlying error occurs. The corruption can
happen at one point, and then execution can continue for some time
before that data is accessed or those instructions are executed.
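A minimal, deliberately buggy sketch of the first cause, out-of-range
indexing. Whether the corruption is visible depends entirely on how
the compiler happens to lay out memory, which is exactly why adding a
WRITE statement or changing the optimization level can make the bug
appear or vanish:

```fortran
! Hypothetical Heisenbug: writing one element past the end of A may
! silently clobber whatever the compiler placed next to it.
program heisenbug
   implicit none
   integer :: a(3), x, i
   x = 42
   do i = 1, 4         ! off-by-one: a(4) does not exist
      a(i) = 0
   end do
   print *, x          ! may or may not still be 42
end program heisenbug
```

Compiling with run-time bounds checking (e.g. gfortran's
-fcheck=bounds option) turns this kind of silent corruption into an
immediate, repeatable abort at the offending line, which is usually
the fastest way to hunt such bugs down.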
These bugs were quite common before f90. F90 introduced several new
features that help to detect or eliminate these errors. These include
TKR checking for subroutines that have explicit interfaces, assumed
shape array declarations, the IMPLICIT NONE declaration (which helps
eliminate local variables that arise because of typos and misspellings),
and free-form source (which eliminates errors due to running past column
72 in the older fixed-form source). If you are writing new code, then
try to use these features. If you are working with legacy code, then you
might try moving some of the newer features into the code in order to
get some help from the compiler. Of course, modern fortran also has new
ways to introduce Heisenbugs, such as pointers which can become stale
and reference data that no longer exists in memory.
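For instance, putting procedures in a module gives them explicit
interfaces, so the scalar-for-array argument mismatch mentioned above
is rejected at compile time instead of corrupting memory at run time
(the module and procedure names here are just illustrative):

```fortran
! Sketch: explicit interfaces via a module enable TKR checking.
module safe
   implicit none
contains
   subroutine scale(v, s)
      real, intent(inout) :: v(:)   ! assumed-shape dummy array
      real, intent(in) :: s
      v = v * s
   end subroutine scale
end module safe

program demo
   use safe
   implicit none
   real :: x(3) = [1.0, 2.0, 3.0]
   call scale(x, 2.0)
   print *, x
   ! call scale(x(1), 2.0)   ! would now be a compile-time error
end program demo
```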
$.02 -Ron Shepard