Reading an unformatted file

Arjan

Jan 27, 2011, 3:58:49 AM

Hi!

I first compiled my program with ifort, and it converted some data files
from a nasty format to unformatted files using:

OPEN(ScratchFile,FILE=TRIM(OutName),FORM='UNFORMATTED')
WRITE(ScratchFile) x%GridSpecs
WRITE(ScratchFile) x%Values
CLOSE(ScratchFile)

New runs recognize these converted files, and the new files are read in
correctly using:

OPEN(ScratchFile,FILE=TRIM(InName),FORM='UNFORMATTED')
READ(ScratchFile) x%GridSpecs
IF (.NOT.ALLOCATED(x%Values)) ALLOCATE(x%Values(x%GridSpecs%NGridX, &
                                                 x%GridSpecs%NGridY))
READ(ScratchFile) x%Values
CLOSE(ScratchFile)

Now I recompiled with g95 on the same machine (a Linux box), and the
program crashes reading one of the unformatted files, telling me:

"Fortran runtime error: Reading more data than the record size (RECL)"

--> How can I write/read my unformatted files such that they are
accepted by both compilers?

Regards,

Arjan

Jan 27, 2011, 4:07:51 AM

ps.
Reals are all of kind "Float", defined as:
INTEGER, PARAMETER :: Float = SELECTED_REAL_KIND(p=6)
Integers are just default "INTEGER". Maybe the size in bytes of an INTEGER
differs from compiler to compiler.
How can I ensure conformity? Or how can I account for non-conformity?

psps. Yeah, I know that trimming a filename is unnecessary when
opening a file...

A.

Arjan

Jan 27, 2011, 4:20:04 AM

pspsps.
The ifort version was compiled with -assume byterecl.
For g95 I didn't know how to do the same.
Would this have made the difference?
And if so, how do I solve the problem?

A.

Tobias Burnus

Jan 27, 2011, 4:44:32 AM

On 01/27/2011 09:58 AM, Arjan wrote:
> I first compiled my program with ifort and it converted some data-
> files from a nasty format to unformatted files using:

...

> Now I re-compiled with g95 on the same machine (linux box), and the
> program crashes reading one of the unformatted files, telling me:
>
> "Fortran runtime error: Reading more data than the record size (RECL)"

Works here with ifort, gfortran and g95. I think "-assume byterecl" is
not needed either, as no explicit RECL= is used. (g95 and gfortran
use byte RECL units by default.)

I could imagine that you have the following problem:

g95 is available in a 64-bit integer version, labelled on the download
site as:
"Linux x86_64/EMT64 (64 bit D.I.)"
In that case, the default integer is 64 bits (8 bytes) wide, while most
compilers default to 32 bits (4 bytes). Thus, you might be reading
4-byte variables into 8-byte variables, which is problematic ;-)

Solution:
(a) For checking: explicitly use INTEGER(4) to see whether I was right
(b) Use the "32 bit D.I." version
(c) Explicitly specify the integer kind you want to have in some more
portable way.
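A minimal sketch of option (c), using hypothetical names (module portable_kinds, kind parameter ik, type grid_specs_t). SELECTED_INT_KIND(9) asks for the smallest integer kind that can hold about nine decimal digits, so it does not change when a compiler widens the default INTEGER to 8 bytes. On mainstream compilers that kind is 4 bytes, but the standard specifies range, not storage size. Compilers that support Fortran 2008 also offer INT32 and REAL32 in ISO_FORTRAN_ENV.

```fortran
! Sketch only: fix the kinds explicitly instead of relying on the
! default INTEGER. Module, parameter and type names are hypothetical.
module portable_kinds
  implicit none
  private
  ! Smallest integer kind with a decimal range of at least 10**9.
  ! Usually 4 bytes, even if the default INTEGER is 8 bytes ("64 bit D.I.").
  integer, parameter, public :: ik = selected_int_kind(9)
  ! Same precision request as the original Float kind (usually 4 bytes).
  integer, parameter, public :: Float = selected_real_kind(p=6)

  ! Hypothetical grid header that uses the explicit integer kind.
  type, public :: grid_specs_t
    integer(ik) :: NGridX, NGridY
  end type grid_specs_t
end module portable_kinds
```

Declaring every component written to the file with ik or Float keeps the on-disk sizes independent of compiler defaults.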

Note: 32 bit D.I. vs. 64 bit D.I only affects the kind used for the
default integer. It does not influence whether you have a 32 bit or 64
bit compiler.

Side note: g95 does not seem to have been maintained for the past half year.

Tobias

James Van Buskirk

Jan 27, 2011, 4:53:51 AM

"Arjan" <arjan.v...@rivm.nl> wrote in message
news:f5de0fc9-50ba-4242...@l18g2000yqm.googlegroups.com...

I'm not very good at I/O. My approach would be to switch to
ACCESS='STREAM'. Also, the writing program might want to check that
x%Values is really allocated to (x%GridSpecs%NGridX,x%GridSpecs%NGridY),
and the reading program might want to check that, if x%Values is
already allocated, it has that shape, and reallocate it if not.
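The stream suggestion, combined with these shape checks, might look roughly like the sketch below. The derived types are hypothetical stand-ins for Arjan's x, and the kinds are fixed explicitly because stream access removes the record markers but not the question of INTEGER/REAL sizes. Stream access is Fortran 2003; NEWUNIT= and ERROR STOP assume a Fortran 2008 compiler.

```fortran
! Sketch: the original two-WRITE dump / two-READ load, but with
! ACCESS='STREAM'. Module, type and routine names are hypothetical.
module grid_io
  implicit none
  private
  public :: grid_specs_t, grid_t, dump_grid, load_grid

  integer, parameter :: ik = selected_int_kind(9)
  integer, parameter :: Float = selected_real_kind(p=6)

  type :: grid_specs_t
    integer(ik) :: NGridX = 0, NGridY = 0
  end type grid_specs_t

  type :: grid_t
    type(grid_specs_t) :: GridSpecs
    real(Float), allocatable :: Values(:,:)
  end type grid_t

contains

  subroutine dump_grid(x, filename)
    type(grid_t), intent(in) :: x
    character(len=*), intent(in) :: filename
    integer :: u
    ! Refuse to write a file whose header disagrees with the data.
    if (.not. allocated(x%Values)) error stop "dump_grid: Values not allocated"
    if (any(shape(x%Values) /= [x%GridSpecs%NGridX, x%GridSpecs%NGridY])) &
      error stop "dump_grid: Values shape does not match GridSpecs"
    ! Stream access writes raw bytes, with no record markers.
    open(newunit=u, file=filename, access='stream', form='unformatted', &
         status='replace', action='write')
    write(u) x%GridSpecs
    write(u) x%Values
    close(u)
  end subroutine dump_grid

  subroutine load_grid(x, filename)
    type(grid_t), intent(inout) :: x
    character(len=*), intent(in) :: filename
    integer :: u
    open(newunit=u, file=filename, access='stream', form='unformatted', &
         status='old', action='read')
    read(u) x%GridSpecs
    ! Drop an existing allocation whose shape does not match the header.
    if (allocated(x%Values)) then
      if (any(shape(x%Values) /= [x%GridSpecs%NGridX, x%GridSpecs%NGridY])) &
        deallocate(x%Values)
    end if
    if (.not. allocated(x%Values)) &
      allocate(x%Values(x%GridSpecs%NGridX, x%GridSpecs%NGridY))
    read(u) x%Values
    close(u)
  end subroutine load_grid

end module grid_io
```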

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end

Dieter Britz

Jan 27, 2011, 5:17:05 AM

Arjan wrote:

> Hi!
>
> I first compiled my program with ifort and it converted some data-
> files from a nasty format to unformatted files using:
>
> OPEN(ScratchFile,FILE=TRIM(OutName),FORM='UNFORMATTED')
> WRITE(ScratchFile) x%GridSpecs
> WRITE(ScratchFile) x%Values
> CLOSE(ScratchFile)

My reaction is: why do you want to do that? I used to use unformatted in the
bad old days when we had limited disk space, but these days ASCII rules;
there is plenty of space, and you can look at your files outside a program.
--
Dieter Britz (dieterhansbritz<at>gmail.com)

Arjan

Jan 27, 2011, 5:34:29 AM

> My reaction is: why do you want to do that? I used to use unformatted in the
> bad old days when we had limited disk space, but these days ASCII rules;
> there is plenty of space, and you can look at your files outside a program.

1) It is MUCH faster if you have a lot of files.
2) This way I can dump the lot with just 2 WRITE statements.
3) The dump routine does not need modification if I add components to
x%GridSpecs or x%Values
(as long as I don't try to read old files with a new definition...).

A.

Arjan

Jan 27, 2011, 5:36:25 AM

> I think the "-assume byterecl" is also not needed as an explicit RECL= is not used.

It is necessary for some other routines that write .bmp files, where a
larger record length caused problems.

A.

Arjan

Jan 27, 2011, 5:38:07 AM

> Side note: g95 does not seem to have been maintained for the past half year.

But g95 is still very strong in its feedback, which helps in tracking
down bugs!
@Andy: How can I/we tempt you to continue with g95??

A.

JB

Jan 27, 2011, 6:40:19 AM

On 2011-01-27, Arjan <arjan.v...@rivm.nl> wrote:
> Hi!
>
> I first compiled my program with ifort and it converted some data-
> files from a nasty format to unformatted files using:
>
> OPEN(ScratchFile,FILE=TRIM(OutName),FORM='UNFORMATTED')
> WRITE(ScratchFile) x%GridSpecs
> WRITE(ScratchFile) x%Values
> CLOSE(ScratchFile)
>
> New runs recognize these converted files and the new files are read in
> well using:
>
> OPEN(ScratchFile,FILE=TRIM(InName),FORM='UNFORMATTED')
> READ(ScratchFile) x%GridSpecs
> IF (.NOT.ALLOCATED(x%Values)) ALLOCATE(x%Values(x%GridSpecs%NGridX, &
>                                                  x%GridSpecs%NGridY))
> READ(ScratchFile) x%Values
> CLOSE(ScratchFile)
>
> Now I re-compiled with g95 on the same machine (linux box), and the
> program crashes reading one of the unformatted files, telling me:
>
> "Fortran runtime error: Reading more data than the record size (RECL)"

Does g95 use 4-byte record markers like ifort and gfortran 4.2+?

> --> How can I write/read my unformatted files such that they are
> accepted by both compilers?

ACCESS='stream' might help if the above issue is the culprit. More
generally, libraries like NetCDF and HDF5 allow you to create
platform-, language- and compiler-independent binary files.

--
JB

Richard Maine

Jan 27, 2011, 12:12:51 PM

James Van Buskirk <not_...@comcast.net> wrote:

> I'm not very good at I/O. My approach would be to switch to
> ACCESS='STREAM'.

I've got lots of I/O experience. I agree with the recommendation to use
stream (that would be unformatted stream in this case; both formatted
and unformatted stream exist).

Various comments on points raised in other postings in the thread:

Sequential unformatted is not particularly good at portability across
different machines/compilers. That's not what it was designed for. It
often works, but you can't really count on it and sometimes there just
isn't a fix at all short of things like changing compilers or redoing
the code to not use sequential unformatted. There is no single "right"
answer, so you can't properly submit bug reports for compilers that do
it differently.

Stream unformatted is better. It still has some portability issues, but
they mostly have to do with portability between different hardware that
uses different internal data representations.

Byterecl has little to do with sequential unformatted. If you had to use
it before, that would most likely have been for direct access
unformatted. Yes, it makes a difference; sequential versus direct versus
stream is the central point here - not one you can neglect. Byterecl is
about the units used for the recl= specifier. If you don't have a recl=
specifier (which you would not normally have for sequential
unformatted), then it won't matter. I suppose you might have needed to
use recl= for an unformatted sequential file in some cases, but it would
have been the recl= that mattered.
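To illustrate the RECL= point for direct access: INQUIRE(IOLENGTH=) asks the compiler for the record length in its own RECL= units, so the same source works whether those units are bytes (g95, gfortran, or ifort with -assume byterecl) or 4-byte words (ifort's default). A minimal sketch with hypothetical module and routine names:

```fortran
! Sketch: let the compiler compute RECL= so the code does not depend on
! whether RECL= counts bytes or 4-byte words. Names are hypothetical.
module direct_io
  implicit none
  private
  public :: sp, write_row, read_row
  integer, parameter :: sp = selected_real_kind(p=6)
contains
  subroutine write_row(filename, irec, row)
    character(len=*), intent(in) :: filename
    integer, intent(in) :: irec
    real(sp), intent(in) :: row(:)
    integer :: u, reclen
    ! IOLENGTH= returns the record length in this compiler's RECL= units.
    inquire(iolength=reclen) row
    open(newunit=u, file=filename, access='direct', form='unformatted', &
         recl=reclen, status='unknown', action='write')
    write(u, rec=irec) row
    close(u)
  end subroutine write_row

  subroutine read_row(filename, irec, row)
    character(len=*), intent(in) :: filename
    integer, intent(in) :: irec
    real(sp), intent(out) :: row(:)
    integer :: u, reclen
    inquire(iolength=reclen) row
    open(newunit=u, file=filename, access='direct', form='unformatted', &
         recl=reclen, status='old', action='read')
    read(u, rec=irec) row
    close(u)
  end subroutine read_row
end module direct_io
```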

To Dieter: there are *LOTS* of reasons to use unformatted. If you
haven't run into them, then I can only conclude your I/O experience is
pretty limited.

1. We still have limited disk space. The limits are a lot larger than
they used to be. In fact, generally, I'd say disk capacities have grown
particularly rapidly in the last decade or two. But they still are not
infinite. Yes, there *ARE* applications where disk space is important.
My applications have been only relatively large; they only use a few
terabytes (that's with unformatted data and specialized compression that
takes advantage of the particular properties of the data). They would
use about 10 times that for formatted. When those applications first
went into service, a single terabyte cost about a million dollars (if
you were satisfied with robotic semi-online access; it would have been
more to make it all "hot" disk). Costs have come down a lot, but they
are still not zero. And there are people with apps much larger than
mine.

2. We also have finite system time. As a rough rule of thumb, formatted
I/O tends to take somewhere around an order of magnitude more time than
unformatted. Yes, it can matter - sometimes a lot. When people already
complain about the time that the unformatted I/O takes, the last thing
you need is to make it a factor of 10 slower.

3. Formatted I/O involves conversion between the internal forms and the
formatted forms. If you care about the values being preserved to the
last bit, that can be a problem. Sometimes you don't care, but sometimes
you do. The standard provides no guarantee of last-bit accuracy.
Sometimes you get it, but sometimes you don't. It depends on many
things, and it isn't as trivial as you might think.

4. Just saying that "ASCII rules" sweeps a multitude of compatibility
problems under the carpet. I suppose you have never had to deal with the
issues of different record terminators for sequential formatted files on
different systems. Since the OP's problem has to do with record
structure, it seems relevant to note that formatted files also have
portability issues related to record structure.

5. Unformatted I/O is inherently simpler just because you don't have to
worry about specifying formats and all the complications that can go
with that. Note that an entire "chapter" ("clause" in standard-speak) of
the Fortran standard is about the details of format specification. It
isn't a particularly short chapter either. If you think that you can
ignore all that stuff and just use list-directed, then... well... that's
a whole extra list of issues - search past postings in this newsgroup
for a sample, as this posting of mine is already long and rambling.

6. You don't always actually get a choice in file format. If you want to
read/write some standardized file format, you have to follow its
standard. There do exist a few cases of standardized file formats that
are formatted, but they are definitely the exceptions. (FITS comes to
mind; I think it is formatted, but it's not one I have actually used, so
I could be wrong, partly wrong, or out of date).
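Point 3 above (last-bit accuracy) can be illustrated with a small sketch: write a single-precision value to a string with a given edit descriptor, read it back, and check whether every bit survived. Six significant digits lose information for 1/3, while nine significant digits are enough for IEEE single precision when the conversion is correctly rounded. As noted above, the standard does not guarantee correct rounding, so the second check reflects what common compilers such as gfortran do, not a language rule. The module and function names are hypothetical.

```fortran
! Sketch: does a formatted write/read round trip preserve the value
! bit for bit? Module and function names are hypothetical.
module roundtrip_demo
  implicit none
  private
  public :: sp, survives
  integer, parameter :: sp = selected_real_kind(p=6)
contains
  ! Write x to a string with edit descriptor fmt, read it back with
  ! list-directed input, and report whether the value is unchanged.
  logical function survives(x, fmt)
    real(sp), intent(in) :: x
    character(len=*), intent(in) :: fmt
    character(len=40) :: buf
    real(sp) :: y
    write(buf, fmt) x
    read(buf, *) y
    survives = (y == x)
  end function survives
end module roundtrip_demo
```

For example, survives(1.0_sp/3.0_sp, '(ES13.5)') is false, while survives(1.0_sp/3.0_sp, '(ES16.8)') is true on a compiler with correctly rounded I/O. Unformatted I/O sidesteps the question entirely because no conversion takes place.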

Probably more, but this is already much longer than justified. There
would be a good argument for me harshly editing the above by cutting most
of it out... but I don't think I will.

--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain

Tobias Burnus

Jan 27, 2011, 1:00:55 PM

On 01/27/2011 10:44 AM, Tobias Burnus wrote:
> I could imagine that you have the following problem:
>
> g95 is available in a 64-bit integer version, labelled on the download
> site as:
> "Linux x86_64/EMT64 (64 bit D.I.)"
> in that case, the default integer is 64 bits (8 bytes) wide while most
> compilers default to 32 bits (4 bytes).

Did the suggestion above actually help or not?

(If not, it would be helpful to have more details about the operating
system/platform (32-bit, 64-bit, ...) and the exact compiler version
number. If you use tricks like -r8, -i8 etc., the command-line
options used would also help.)

As written, in principle a simple unformatted sequential I/O should work
between g95, gfortran and ifort - with some environment
variable/compiler flag, it should even work across systems with different
endianness.

Thus, unless you run into some compiler-specific issue (which might be
version dependent), I assume that the variable size is simply different.

Whether an unformatted stream is better depends on the
application: if you do not simply dump and read the data but follow a
specific format, a stream is better than sequential; otherwise,
it does not really matter.

As JB mentioned, you could also use NetCDF or HDF5 to save the files in
a well-defined format.

>> > Side note: g95 does not seem to have been maintained for the past half year.

> But g95 is still very strong in its feedback, which helps tracking
> bugs!

For checking, I usually try the NAG compiler (unfortunately only v5.1),
though I also miss better F2003 support and it has some bugs;
however, I find that it delivers the best compile-time and run-time
diagnostics of all the 10 compilers I have access to.

Otherwise, I use ifort and gfortran as my default compilers - and only
look at Crayftn, g95 and pathf95 (and, if need be, openf95, sunf95,
pgf95 and g77) if I need to check something.

Tobias

Dave Allured

Jan 27, 2011, 2:30:51 PM

Others have said that stream unformatted is the way to go in this case.
I have no problem with that. However you can probably debug your
current situation with sequential unformatted if you are interested and
willing to learn more about the details.

In my experience, several of the modern Fortran compilers have features
and conventions to support compatibility for sequential unformatted. In
particular, current versions of gfortran and ifort on both Linux and Mac
OS (x86 Macs, little endian) seem to play nicely together.

Note that the actual Fortran standards do NOT guarantee compatibility
for sequential unformatted between compilers and platforms, in
particular because the actual low level file format is not part of the
standards.

The common Unix-ish convention for sequential unformatted looks like
this.

* Each binary record consists of an integer byte count, the binary data,
and a second byte count which is an exact copy of the first one.
* The count is the number of bytes in the record data only, NOT
including the count integers.
* Records are concatenated sequentially until end of file.
* There is no extra padding anywhere.
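The layout just described can be checked from Fortran itself by reopening a sequential unformatted file with stream access and reading the markers as plain integers. A sketch, assuming 4-byte markers in native byte order (the common convention described above, not something the standard guarantees); the module and routine names are hypothetical:

```fortran
! Sketch: read the leading and trailing record markers of the first
! record, assuming 4-byte markers in native byte order.
module record_inspect
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
  private
  public :: first_record_markers
contains
  subroutine first_record_markers(filename, lead, trail)
    character(len=*), intent(in) :: filename
    integer(int32), intent(out) :: lead, trail
    integer :: u
    ! Stream access exposes the raw bytes, record markers included.
    open(newunit=u, file=filename, access='stream', form='unformatted', &
         status='old', action='read')
    ! Leading marker: byte count of the record data (bytes 1-4).
    read(u, pos=1) lead
    ! The trailing marker follows 4 marker bytes plus lead data bytes;
    ! stream positions start at 1.
    read(u, pos=1 + 4 + lead) trail
    close(u)
  end subroutine first_record_markers
end module record_inspect
```

If the leading count looks absurdly large, the file probably uses the opposite byte order; if it is plausible but does not match the expected record size, suspect different INTEGER/REAL sizes or 8-byte markers.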

The usual variations that cause portability problems are these:

* Variations in storage size of the data types, as others have
mentioned.
* The record counts are traditionally 4 byte integers, but 8 bytes are
sometimes used.
* Big endian vs. little endian. This affects the record counts as well
as the record data.

There are usually compiler options and standard language features to
give you control over all of the above. So debugging becomes a matter
of looking directly into the binary files to understand the structure
and differences, followed by learning about mode controls to fix the
problem.

Here are a couple of particular suggestions to get you started:

1. Use a binary viewing tool such as "od" to see and understand your
current file structure.

2. Make a short test program to write out a dummy file that is supposed
to have exactly the same record dimensions as your original data file.
In particular, hard code x%GridSpecs%NGridX and x%GridSpecs%NGridY to
have the intended original values. Then compile and run this test on
g95, and examine the binary structure of the dummy file. This will tell
you what g95 is expecting for input, such as size of count integers and
data types, endian, and so on. HTH.
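A sketch of suggestion 2, with hypothetical names: the header record here is a stand-in made of two default INTEGERs (not Arjan's real GridSpecs), so its byte count directly reveals the size of default INTEGER on the compiler that built it. The grid size is passed in so it can be hard coded to the original values.

```fortran
! Sketch: write a dummy file with the same record layout as the real
! data (header record + values record). Names are hypothetical.
module dummy_writer
  implicit none
  private
  public :: write_dummy
  integer, parameter :: Float = selected_real_kind(p=6)
contains
  subroutine write_dummy(filename, nx, ny)
    character(len=*), intent(in) :: filename
    integer, intent(in) :: nx, ny        ! hard-coded grid size to mimic
    real(Float), allocatable :: values(:,:)
    integer :: u
    allocate(values(nx, ny))
    values = 1.0_Float                   ! content is irrelevant, only layout
    open(newunit=u, file=filename, form='unformatted', &
         access='sequential', status='replace')
    write(u) nx, ny                      ! record 1: header
    write(u) values                      ! record 2: grid values
    close(u)
  end subroutine write_dummy
end module dummy_writer
```

After calling it, `od -A d -t d4 dummy.bin` prints the file as 4-byte decimal integers with decimal offsets. On a little-endian machine with 4-byte markers, the first value is the header's byte count: 8 for two 4-byte INTEGERs, 16 if default INTEGER is 8 bytes.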

--Dave

Arjan

Jan 27, 2011, 2:55:32 PM

> > I could imagine that you have the following problem:
>
> > g95 is available in a 64-bit integer version, labelled on the download
> > site as:
> > "Linux x86_64/EMT64 (64 bit D.I.)"
> > in that case, the default integer is 64 bits (8 bytes) wide while most
> > compilers default to 32 bits (4 bytes).
>
> Did the suggestion above actually help or not?

Sorry, Tobias, but I had to hack around the bug by recompiling with
ifort simply to get results today. I hope to give your suggestion a try
next Monday! I'll let you know what comes out!

A.

Arjan

Jan 27, 2011, 3:00:58 PM

> Others have said that stream unformatted is the way to go in this case.
> I have no problem with that. However you can probably debug your
> current situation with sequential unformatted if you are interested and
> willing to learn more about the details.

One of our operational systems runs only a limited set of compilers.
The ancient operating system makes it hard for me to copy
versions compiled on other systems to this particular operational box.
An ancient pgf90 that does not eat STREAM yet is the main engine...
So I am interested. I'll try your suggestions next Monday, along with
the other help offered above. Thanks!

A.

Richard Maine

Jan 27, 2011, 3:30:17 PM

Arjan <arjan.v...@rivm.nl> wrote:

> An ancient pgf90 that does not eat STREAM yet is the main engine...

It might do one of the nonstandard alternatives that do the same thing,
but spell it differently. It has been too long since I used pgf90 for me
to recall that particular detail.

Thinking that the current docs might give a hint, I pretty easily found
the PGI Fortran Reference manual for their 2011 release. Somewhat
"interestingly", even though (as noted here just a day or two ago) the
marketing blurbs claim it is a "full f2003" compiler, and the manual
mentions f2003 features that haven't actually yet made it into the
compiler, it doesn't seem to acknowledge the existence of stream access.
That's for the current version. Whether this reflects a disconnect
between the manual and the product or whether it is just an omission
from the manual, I couldn't say.

Hmm. Doing a search for "stream" reveals that it is mentioned in Chapter
3 on the Fortran statements. I had first looked in Chapter 5
(Input/Output), which seems to be out of sync with Chapter 3.