Skipping lines in an ASCII datafile

Arjan

Jan 22, 2021, 11:22:10 AM

Today, I came across a problem with reading an ASCII data file. I solved it, but would like to understand why.

The datafile is in UTF-8 format with Windows-style line endings (CR LF).
It has lines of quantity "x" for men and for women.
Each line holds quantity x for several years.
First, a set of lines gives the values for men for different age groups. Then two bogus lines follow (a line with a lot of spaces, followed by a line announcing the data for the women), and then come the values for the women for the respective age groups.

The central part of the data is:
...
80-84 33.57 53.31 49.11 36.94
85+ 38.36 45.7 42.88 31.15

Women:
0-4 0 0.36 0.35 0
5-9 0 0 0 0
10-14 0.64 0.32 0.66 0.33
15-19 1 1.98 1.62 2.22
...

The problem occurs when I try to skip the two bogus lines.
All goes well when I use
READ(ScratchFile,*)
or
READ(ScratchFile,'(A)') ALine
where ALine is a character variable:
CHARACTER(DefaultLength) :: ALine

but when I use
READ(ScratchFile,*) ALine
then the first read statement already results in ALine containing "Women:". The second read statement, which was meant to skip the header of the second part of my data, then eats the first meaningful line of data for women.

Is it "a feature" that if you read a character string with format "*" then reading continues across multiple lines until it finds more than just spaces, CR or LF?

Thomas Koenig

Jan 22, 2021, 11:55:20 AM

Arjan <arjan.v...@rivm.nl> wrote:

> Is it "a feature" that if you read a character string with format
> "*" then reading continues across multiple lines until it finds
> more than just spaces, CR or LF?

In short: Yes.

Somewhat longer: If you want to be pedantic, what gets skipped is
actually the end of a record (which is usually marked by LF, or by
a combination of CR and LF, on modern systems). List-directed input
(with the format given as *) is geared more towards interactive use,
where you usually don't want to read an empty line if the user just
hits Enter.

When doing I/O from files, it is usually better to use formatted
I/O with an explicit format, or to read whole lines into character
variables and then do internal I/O on them; you can then use
list-directed input more safely.
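
A minimal sketch of the second approach, assuming a record layout like the one in the question. The scratch file, the variable names and the 200-character line buffer are made up for illustration and are not taken from the original code:

```fortran
program read_whole_lines
  implicit none
  integer :: u, ios
  character(len=200) :: line   ! assumed to be long enough for any record
  character(len=10)  :: age
  real :: x(4)

  ! Build a small scratch file that mimics the layout from the question.
  open(newunit=u, status='scratch', action='readwrite')
  write(u,'(A)') '85+ 38.36 45.7 42.88 31.15'
  write(u,'(A)') '     '
  write(u,'(A)') 'Women:'
  write(u,'(A)') '0-4 0 0.36 0.35 0'
  rewind(u)

  do
    read(u,'(A)',iostat=ios) line     ! consumes exactly one record
    if (ios /= 0) exit                ! end of file (or error)
    if (len_trim(line) == 0) cycle    ! blank separator line
    read(line,*,iostat=ios) age, x    ! internal list-directed read
    if (ios /= 0) cycle               ! not a data line, e.g. "Women:"
    write(*,'(A,4F8.2)') trim(age)//':', x
  end do
  close(u)
end program read_whole_lines
```

Because the internal read only ever sees one line, a short or non-numeric record makes it fail with a nonzero iostat instead of silently pulling values from the next record.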

C's scanf() shares this "feature", by the way.

Robin Vowels

Jan 22, 2021, 12:00:19 PM

Reading continues until the data list is satisfied.
What's wrong with READ(ScratchFile,*) which is safest?
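
A small sketch that contrasts the two ways of skipping, assuming data shaped like the excerpt in the question. The scratch file and the names aline and next are illustrative only:

```fortran
program skip_compare
  implicit none
  integer :: u
  character(len=40) :: aline, next

  open(newunit=u, status='scratch', action='readwrite')
  write(u,'(A)') '85+ 38.36 45.7 42.88 31.15'
  write(u,'(A)') '   '
  write(u,'(A)') 'Women:'
  write(u,'(A)') '0-4 0 0.36 0.35 0'
  write(u,'(A)') '5-9 0 0 0 0'

  ! Variant 1: empty input list, each READ skips exactly one record.
  rewind(u)
  read(u,*)             ! last line for men
  read(u,*)             ! line of spaces
  read(u,*)             ! "Women:"
  read(u,'(A)') next    ! next record is the 0-4 line
  write(*,'(A)') 'empty list -> '//trim(next)

  ! Variant 2: a character item in the list must be satisfied.
  rewind(u)
  read(u,*)             ! last line for men
  read(u,*) aline       ! passes over the line of spaces, gets "Women:"
  read(u,*) aline       ! gets "0-4", so the first data line is consumed
  read(u,'(A)') next    ! next record is now the 5-9 line
  write(*,'(A)') 'with ALine -> '//trim(next)
  close(u)
end program skip_compare
```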

John

Jan 22, 2021, 12:16:03 PM

The key issue is that, in list-directed input, an end-of-record is equivalent to a space unless it is encountered inside a character constant. Since you are reading unquoted strings, the answer is basically "Yes". There are several quirks to reading character variables with list-directed input, and it is best avoided in favor of formatted input. Note that the same would be true in this case when reading numbers. For example, read an integer value from a file where the value is preceded by five blank lines: the blank lines will basically be treated as a few spaces, and the value will be read from the sixth line.
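
The integer case can be tried with a few lines. This is only a sketch; the scratch file and the value 42 are made up for the demonstration:

```fortran
program blank_lines_demo
  implicit none
  integer :: u, k, n

  open(newunit=u, status='scratch', action='readwrite')
  do k = 1, 5
    write(u,'(A)') ''       ! five empty records
  end do
  write(u,'(I0)') 42        ! the value sits on the sixth line
  rewind(u)

  read(u,*) n               ! empty records act like blanks and are passed over
  write(*,'(A,I0)') 'n = ', n
  close(u)
end program blank_lines_demo
```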

John

Jan 22, 2021, 12:29:16 PM

Using this short test program:

character(len=40) :: a
read(*,*)a
write(*,*)'STRING=['//a//']'
end

Here are a few other inputs that might have surprising results:

$ ./a.out
4*A
STRING=[A ]
$ ./a.out
"a
b
c
d
e
fghijklmnopqrstuvwxyz"
STRING=[abcdefghijklmnopqrstuvwxyz ]

Arjan

Jan 22, 2021, 1:33:50 PM

> Reading continues until the data list is satisfied.
> What's wrong with READ(ScratchFile,*) which is safest?

Nothing, you are right. The construction in my code was probably the result of a copy/paste action or an earlier data format, and certainly not my first choice. My problem is that to my eyes it did not look wrong, although it is.

Arjan

Jan 22, 2021, 1:36:38 PM

Thank you all!