Read question?

john.chl...@gmail.com

Nov 3, 2012, 2:26:00 AM

I have a file I'm parsing that contains lines such as:

ind_var_index 3 # independent variable for table lookups ()
num_eng 7 #maximum number of engines allowable
eng_ena 0 0 1 1 1 1 0 0 0 0 #engine enable flags[]

I would (ideally) like to read the id in the line (first string) then compare it against a search id before deciding whether to read the rest of the line. But if I try to use list directed I/O I can't use: ADVANCE='NO'. So I ended up with the following solution:

read( unit=fp, fmt="(a)", iostat=stat2 ) line
read( unit=line, fmt=*, iostat=stat2 ) word, (var(i), i = 1, array_length)

The question I have is what if, after the id, there is junk, what does:
(var(i), i = 1, array_length)
evaluate to?

I checked the iostat with junk and it was 0 - no help.

---John

Robin Vowels

Nov 3, 2012, 6:01:48 AM

Apparently everything from and including # is comment?
If so, then remove the comment.
Then count the blank fields between the various items.
(It's 1 less than the number of blank fields.)
Finally do your internal read of line, as above.
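
A minimal sketch of the steps above, assuming the fields are separated by blanks only (no commas, tabs, or quoted strings). The names line, word, var and the token-counting loop are illustrative and not taken from the original program:

```fortran
program count_fields
  implicit none
  character(len=80) :: line
  character(len=30) :: word
  real :: var(10)
  integer :: n, ios, cpos, i

  line = 'eng_ena 0 0 1 1 1 1 0 0 0 0   #engine enable flags[]'

  ! Blank out the comment, guarding against lines with no '#'.
  cpos = index(line, '#')
  if (cpos > 0) line(cpos:) = ' '

  ! Count blank-separated tokens: a token starts wherever a non-blank
  ! character follows a blank (or begins the line).
  n = 0
  do i = 1, len_trim(line)
    if (line(i:i) /= ' ') then
      if (i == 1) then
        n = n + 1
      else if (line(i-1:i-1) == ' ') then
        n = n + 1
      end if
    end if
  end do

  ! The first token is the id; the remaining n-1 are values.
  n = min(n - 1, size(var))
  var = 0.0
  read (line, *, iostat=ios) word, (var(i), i = 1, n)
  print *, trim(word), n, ios
end program count_fields
```

Because the I/O list is sized to the number of values actually present, the internal read never asks for more items than the line holds.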

Arjen Markus

Nov 3, 2012, 9:42:55 AM

In addition to Robin's answer:

if there are enough data items, the list-directed READ is satisfied. Anything
else in the string is simply ignored.

Therefore: if you remove the comment (including the comment character), you
can be sure that the READ will fail with an error code if there are too few
data. If there are more data, these are silently ignored.
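
A small sketch of both cases, assuming the comment has already been removed. The variable names are illustrative; the exact nonzero iostat value for the short line is processor dependent (gfortran reports -1 elsewhere in this thread):

```fortran
program item_count_demo
  implicit none
  character(len=40) :: s
  character(len=20) :: word
  integer :: v(3), ios

  ! Too few values: the internal file runs out before v is filled,
  ! so iostat comes back nonzero.
  s = 'num_eng 7'
  v = -1
  read (s, *, iostat=ios) word, v
  print *, 'too few:  ios =', ios

  ! Extra values: the read stops once v is full; the trailing 4 and 5
  ! are never examined.
  s = 'eng_ena 1 2 3 4 5'
  v = -1
  read (s, *, iostat=ios) word, v
  print *, 'too many: ios =', ios, ' v =', v
end program item_count_demo
```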

Regards,

Arjen

jski

Nov 3, 2012, 10:02:12 AM

I converted the '#' to char(0). It didn't really affect anything
though. It works either way: I get to the 3rd line, it recognizes
the id and reads 7 of the following 0's and 1's.

When I check the values of var(:) for the case of junk after the id, I
get junk values - not surprisingly. I thought there would be some
indication that there was nothing meaningful to read but evidently
not?

Interestingly, the second line doesn't even have 7 items to try to
convert to REALs?

I also tried:
read( unit=fp, fmt=*, iostat=stat ) word, (w(i), i = 1, array_length)
where:
character(30) :: w(array_length)
to check out its behavior.

Of course, this simply ploughs right through newlines and eats up
whatever it needs to complete (w(i), i = 1, array_length), including
the next id on the next line.

---John

jski

Nov 3, 2012, 10:05:34 AM

It appears to have no ill effects if there are less than array_length
(=7) items in the line? No problems? Am I getting away with
something I shouldn't be doing to begin with?

---John

dpb

Nov 3, 2012, 10:09:32 AM

What is the value of array_length?

If it is the size of the dimension of array var (in which case you might
as well write read(...) word, var as the implied do does nothing useful)
then I would certainly expect an i/o error if there aren't sufficient
entries on the record and there is a non-numeric field.

But, back to your case, there really should be a format conversion error
and if the compiler doesn't complain I'd think that would be a bug if
not documented as a feature/extension, anyway (and I think a detailed
reading would show it to be nonconforming though I didn't go searching
for a citation).

Anyway, the "real" answer here is to parse the line to remove the
trailing comment before scanning.

PS. CVF will return the values but I don't believe it is guaranteed
behavior; I think it is "processor dependent" as to what happens to the
i/o list when the error occurs.

program test
implicit none
character(len=80) :: s
character(len=20) :: v
integer :: idx(2)
integer :: ios

s='ind_var_index 3 # independent variable'
idx=-1

read(s,*,iostat=ios) v,idx

write(*,*) v,idx
write(*,*) 'ios ', ios
end program test

C:\Temp> df /nologo jski.f90
jski.f90

C:\Temp> jski
ind_var_index 3 -1
ios 59

C:\Temp>

--

jski

Nov 3, 2012, 10:31:53 AM

On Nov 3, 10:09 am, dpb <n...@non.net> wrote:

Doesn't matter if there are less than array_length (=7) items in the
line. iostat=5010 in the case of a bad read BUT it continues along.
No ill effects. I would say this is fine but is there a better way?

BTW, I'm using gfortran.

---John

dpb

Nov 3, 2012, 10:46:40 AM

On 11/3/2012 9:31 AM, jski wrote:
> On Nov 3, 10:09 am, dpb<n...@non.net> wrote:

...

>> But, back to your case, there really should be a format conversion error
>> and if the compiler doesn't complain I'd think that would be a bug if
>> not documented as a feature/extension, anyway (and I think a detailed
>> reading would show it to be nonconforming though I didn't go searching
>> for a citation).
>>
>> Anyway, the "real" answer here is to parse the line to remove the
>> trailing comment before scanning.
>>

...

> Doesn't matter if there are less than array_length (=7) items in the
> line. iostat=5010 in the case of a bad read BUT it continues along.
> No ill effects. I would say this is fine but is there a better way?
>
> BTW, I'm using gfortran.

I don't understand what you mean above at all, precisely. :)

Returning an iostat will prevent the i/o rtl from aborting if that is
what you mean by "it continues along. No ill effects."

OTOH, as noted above, I do _not_ believe the Standard guarantees
that the i/o list items are returned correctly if there is a
conversion error on the READ.

As for the better way, as noted earlier, only parse the record to the
comment character. So (caution, aircode):

read(s(1:max(index(s,'#')-1,len_trim(s))),*,iostat=ios) vnam, values

--

jski

Nov 3, 2012, 10:49:41 AM

If I use:

read( unit=line, fmt=*, iostat=stat2 ) word, w

where:

character(30) :: w(array_length)

and there are fewer than array_length (=7) items, I get iostat=-1.

Gordon Sande

Nov 3, 2012, 10:55:02 AM

On 2012-11-03 11:02:12 -0300, jski said:

> On Nov 3, 6:01 am, Robin Vowels <robin.vow...@gmail.com> wrote:
>> On Nov 3, 5:26 pm, john.chludzin...@gmail.com wrote:
>>
>>> I have a file I'm parsing that contains lines such as:
>>
>>> ind_var_index 3 # independent variable for table lookups ()
>>> num_eng 7 #maximum number of engines allowable
>>> eng_ena 0 0 1 1 1 1 0 0 0 0 #engine enable flags[]
>>
>>> I would (ideally) like to read the id in the line (first string) then
>>> compare it against a search id before deciding whether to read the rest
>>> of the line. But if I try to use list directed I/O I can't use:
>>> ADVANCE='NO'. So I ended up with the following solution:
>>
>>> read( unit=fp, fmt="(a)", iostat=stat2 ) line
>>> read( unit=line, fmt=*, iostat=stat2 ) word, (var(i), i = 1, array_length)
>>
>>> The question I have is what if, after the id, there is junk, what does:
>>> (var(i), i = 1, array_length)
>>> evaluate to?
>>
>>> I checked the iostat with junk and it was 0 - no help.
>>
>> Apparently everything from and including # is comment?
>> If so, then remove the comment.
>> Then count the blank fields between the various items.
>> (It's 1 less than the number of blank fields.)
>> Finally do your internal read of line, as above.
>
> I converted the '#' to char(0).

You complain that folks think you are resistant to using your intelligence
to finally notice that Fortran is not C and then you go out of your way
to prove it in spades. Any observer would think you are either a troll
or an idiot. Or is it the old story that you have lots and lots of time
to do it repeatedly wrong but not enough time to do it correctly at an
early stage. That biases the vote towards idiot.

The null character is a Cism, used as a sentinel to follow the last meaningful
element in a C array of characters. Only half of the C string library uses
the convention as the other half uses explicit lengths. So it is at best
a rather weak convention even in C. Fortran strings are arrays of characters
that do not use a sentinel convention. Many string related things are easier
in Fortran but there is a problem with knowing how many trailing blanks are in
a string that has first been read and then blank padded, as the trailing read
blanks and the padded blanks are very hard to tell apart.

When you were told to remove the comment they meant to overwrite it with
blanks. The null character has no special meaning in Fortran and is not even
listed in the Fortran character set with the possible exception of in a
character context.
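
The difference between the two approaches can be seen in a short sketch. The string and variable names here are made up for illustration:

```fortran
program null_vs_blank
  implicit none
  character(len=40) :: s
  integer :: cpos

  s = 'num_eng 7 #maximum number of engines'
  cpos = index(s, '#')

  ! Writing char(0) at the comment position does not shorten the string:
  ! the rest of the comment text is still there, so len_trim is unchanged.
  s(cpos:cpos) = char(0)
  print *, len_trim(s)

  ! Overwriting from the comment position onward with blanks removes it.
  s(cpos:) = ' '
  print *, len_trim(s), '[' // trim(s) // ']'
end program null_vs_blank
```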

jski

Nov 3, 2012, 11:03:29 AM

But, again, there is no guarantee that
s(1:max(index(s,'#')-1,len_trim(s))) contains SIZE(values)
items. So there may be an I/O error.

---John

jski

Nov 3, 2012, 11:08:43 AM

Gordon, thanks for the small minded, buffoonish remarks. They add a
lot to the dialog.

Gordon Sande

Nov 3, 2012, 12:38:05 PM

I must admire the intelligence, skill and diligence with which you go
about emulating a complete fool. Keep it up. You will be able to waste
both your time and the time of those who bother to try to help you in
spite of your studious ignoring of the advice.

By the way, did you do anything about the null characters or are you
just insisting that Fortran must follow your notion of a Cism? You
forgot to address the content of your inability to get your conversion
to work.

dpb

Nov 3, 2012, 12:51:08 PM

On 11/3/2012 10:03 AM, jski wrote:
...

>> As for the better way, as noted earlier, only parse the record to the
>> comment character. Sotoo, (caution, aircode)
>>
>> read(s(1:max(index(s,'#')-1,len_trim(s)),*,iostat=ios) vnam, values

...

>
> But, again, there is no guarantee that:
> s(1:max(index(s,'#')-1,len_trim(s)) contains SIZE(values) number of
> items. So there may be an I/O error.

The above of course was intended to be min() instead of max(), except it
isn't robust to there not being a comment...so be sure to protect
against index() returning zero...

Which is what I was telling you...

But, there is a way...replace the '#' comment indicator w/ the '/'
record terminator (or, if there isn't a comment, add a trailing slash).
This has the effect of terminating the input record for list-directed
input and of supplying null values for any remaining values in the i/o
list. See section 10.10.3 in the Standard I've given you links to previously
(or look up the section on list-directed input in the reference texts;
I'm sure it has the information in more nearly plain English).

Or, as Robin says, parse the line and count the input fields and adjust
the i/o list count accordingly...
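
A hedged sketch of the slash approach, with the index() guard included. The buffer buf and the -1.0 sentinel are illustrative choices, not part of the original code; items that receive null values are simply left as they were before the READ:

```fortran
program slash_terminator
  implicit none
  character(len=80) :: line
  character(len=82) :: buf
  character(len=30) :: word
  real :: var(7)
  integer :: cpos, ios

  line = 'num_eng 7    #maximum number of engines allowable'

  ! Keep only the text before the comment (or the whole line if there is
  ! no '#'), then append the '/' that ends list-directed input.
  cpos = index(line, '#')
  if (cpos > 0) then
    buf = line(:cpos-1) // '/'
  else
    buf = trim(line) // ' /'
  end if

  word = ''
  var = -1.0   ! sentinel: entries not present on the line keep this value
  read (buf, *, iostat=ios) word, var
  print *, trim(word), ios, var
end program slash_terminator
```

Checking which elements of var still hold the sentinel afterwards gives the number of values that were actually on the line, provided the sentinel can never be a legitimate input value.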

--

john.chl...@gmail.com

Nov 3, 2012, 1:15:44 PM

Thanks, that works well.

glen herrmannsfeldt

Nov 3, 2012, 3:32:42 PM

Gordon Sande <Gordon...@gmail.com> wrote:

(big snip)

> The null character is a Cism, used as a sentinel to follow the last meaningful
> element in a C array of characters. Only half of the C string library uses
> the convention as the other half uses explicit lengths. So it is at best
> a rather weak convention even in C. Fortran strings are arrays of characters
> that do not use a sentinel convention. Many string related things are easier
> in Fortran but there is a problem with knowing how many trailing blanks are in
> a string that has first been read and then blank padded, as the trailing read
> blanks and the padded blanks are very hard to tell apart.

This is true, but doesn't mean that you won't have problems using
strings that have CHAR(0) in them.

For one, some of the underlying library routines in many Fortran
libraries are written in C, often using the C I/O routines along
the way. If you use CHAR(0) in I/O operations, you might be
surprised. Other than that, though, I would not expect problems,
but you never know.

"9.2.2 Formatted record
A formatted record consists of a sequence of characters that are
representable in the processor; however, a processor may prohibit
some control characters (3.1.1) from appearing in a formatted
record. The length of a formatted record is measured in characters
and depends primarily on the number of characters put into the
record when it is written. However, it may depend on the processor
and the external medium. The length may be zero. Formatted records
shall be read or written only by formatted input/output statements."

I believe this also applies to internal I/O, though I am not so
sure about that.

-- glen

Dick Hendrickson

Nov 3, 2012, 5:37:43 PM

I'm sure you can be sure (although I haven't looked). One of the
beauties of Fortran is that it really tries to be consistent. A
formatted record is a formatted record; regardless of where it came from
or how it is used. Internal I/O is the same as external I/O unless
there is some obvious reason for a restriction.

Dick Hendrickson

> -- glen

elzbieta...@gmail.com

Nov 4, 2012, 8:05:52 PM

Hopefully, no one will call me "idiot" for asking but what defines "end of record" in fortran? I too use gfortran.

<Ella>

glen herrmannsfeldt

Nov 4, 2012, 9:00:54 PM

elzbieta...@gmail.com wrote:

(snip, I wrote)

>> > This is true, but doesn't mean that you won't have problems using
>> > strings that have CHAR(0) in them.

(snip, quote from Fortran 2008)

>> > "9.2.2 Formatted record
>> > A formatted record consists of a sequence of characters that are
>> > representable in the processor; however, a processor may prohibit
>> > some control characters (3.1.1) from appearing in a formatted
>> > record. The length of a formatted record is measured in characters
>> > and depends primarily on the number of characters put into the
>> > record when it is written. However, it may depend on the processor
>> > and the external medium. The length may be zero. Formatted records
>> > shall be read or written only by formatted input/output statements."

>> > I believe this also applies to internal I/O, though I am not so
>> > sure about that.

(snip)

> Hopefully, no one will call me "idiot" for asking but what
> defines "end of record" in fortran? I too use gfortran.

For external files, it pretty much has to be whatever the host
system uses. For unix and unix-like systems, that is CHAR(10)
(X'0A', the ASCII linefeed character). For MS-DOS and Windows
systems, it is CRLF, CHAR(13) followed by CHAR(10), with it left
undefined what happens if you have one and not the other.

I believe that there is still a gcc, with gfortran included,
that runs on MVS descendants, using either IBM fixed length
records or VB (variable length, blocked, with a record length
in the record and block headers.)

For internal I/O, it should be the length of the CHARACTER
variable (or array element).
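
As a small illustration of the internal-file case (a sketch with made-up names), each element of a character array acts as one record, and a list-directed READ moves on to the next element when it needs more values:

```fortran
program internal_records
  implicit none
  character(len=12) :: lines(2)
  integer :: a, b, ios

  lines(1) = '10'
  lines(2) = '20'

  ! Each array element is one record of the internal file; the record
  ! ends at the declared length (12), and trailing blanks are padding.
  ! List-directed input moves on to the next record when it needs more
  ! values than the current one supplies.
  read (lines, *, iostat=ios) a, b
  print *, a, b, ios
end program internal_records
```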

-- glen

Dick Hendrickson

Nov 4, 2012, 9:03:04 PM

There isn't a specific definition of "end of record"; that is, there
isn't a specified character. To understand historic Fortran I/O you
mostly have to think about card readers and line printers. When Fortran
reads a "record" it reads one card: the entire card goes through the card
reader. When it outputs a record, it prints one line and the printer
moves the paper forward one notch. Of course, things are different now.
Punch cards and line printers don't exist. But the idea is the same.
Historically, a Fortran output statement would produce at least one
record; you could get more by using a FORMAT statement with either long
lists or special format things. Similarly, a read statement would read
one (or more) input records.

So, the simple answer is that an output statement produces one (or more)
records; each separated from the other by a processor dependent
end-of-record thing and a read statement reads one (or more) records; it
reads the entire record, up to the end-of-record thing. The standard
doesn't specify what the end-of-record thing is; but, Fortran exists in
a computing system and usually uses whatever the other languages use.

It's a little harder to understand with the later versions of Fortran.
Non-advancing I/O allows you to read or write part of a record (sort of
like reading the first half of a card). Stream I/O allows you to use the
C model of I/O; I/O just produces a stream of characters and the OS
doesn't impose a particular structure on the characters. Although other
programs might require or expect some sort of structure. If you send a
stream output file to a screen something has to happen when the cursor
comes to the right hand edge of the screen.
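
A minimal sketch of reading part of a record with non-advancing I/O. Non-advancing input requires an explicit format, which is why it cannot be combined with list-directed input. The scratch file, the 8-character id width, and the variable names are assumptions made for this example:

```fortran
program nonadvancing_demo
  implicit none
  character(len=8)  :: id
  character(len=40) :: rest
  integer :: u, ios

  ! A scratch file stands in for the real input file.
  open (newunit=u, status='scratch', action='readwrite', form='formatted')
  write (u, '(a)') 'num_eng 7'
  write (u, '(a)') 'eng_ena 0 0 1 1'
  rewind (u)

  do
    ! Read the first 8 characters and stay on the same record.
    read (u, '(a8)', advance='no', iostat=ios) id
    if (ios /= 0) exit   ! end of file, or a record shorter than 8 chars
    ! A second, advancing read picks up the remainder of that record.
    read (u, '(a)', iostat=ios) rest
    if (ios /= 0) exit
    print *, trim(id), ' | ', trim(rest)
  end do
  close (u)
end program nonadvancing_demo
```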

Dick Hendrickson

Gordon Sande

Nov 4, 2012, 9:04:36 PM

The notion of "end of record" is outside Fortran. It depends on
whatever the file system used by Fortran uses. Sometimes it is a
character embedded within the stream of characters. For Unix that is
usually <LF>, for classic Mac OS it is <CR>, for Windows it is the pair
<CR><LF> with sometimes the <LF> becoming a <FF>. ASCII would suggest
using a <RS> and some systems have taken the suggestion. When cards
were common "all" records were 80 characters and the end was defined by
the count (and it was possible for 80 to be other lengths) in blocked
files on IBM's OS. Others used a count stored external to the stream.
So the short answer is "it depends". gfortran is available on several
systems so it has several answers. Since you did not specify a system
the details are left as an exercise for the reader.

Elzbieta Burnett

Nov 4, 2012, 9:11:53 PM

On Nov 4, 9:00 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

If I do: read(unit=file, fmt="(a)", iostat=ios) string
then system looks for char(10) in Linux? So system reads up to
char(10) for string?

<Ella>

glen herrmannsfeldt

Nov 4, 2012, 11:20:08 PM

(snip, someone wrote)

>> Hopefully, no one will call me "idiot" for asking but what
>> defines "end of record" in fortran? I too use gfortran.

then Dick Hendrickson <dick.hen...@att.net> wrote:
> There isn't a specific definition of "end of record"; that is, there
> isn't specified character. To understand historic Fortran I/O you
> mostly have to think about card readers and line printers. When Fortran
> reads a "record" it reads one card-the entire card goes through the card
> reader. When it outputs a record, it prints one line and the printer
> moves the paper forward one notch. Of course, things are different now.
> Punch cards and line printers don't exist. But the idea is the same.
> Historically, a Fortran output statement would produce at least one
> record; you could get more by using a FORMAT statement with either long
> lists or special format things. Similarly, a read statement would read
> one (or more) input records.

It is interesting. The IBM 704, where Fortran started, read each row
of holes into two 36 bit words. Software had to rearrange that to
72 characters. The last 8 card columns weren't read at all.

> So, the simple answer is that an output statement produces one (or more)
> records; each separated from the other by a processor dependent
> end-of-record thing and a read statement reads one (or more) records; it
> reads the entire record, up to the end-of-record thing. The standard
> doesn't specify what the end-of-record thing is; but, Fortran exists in
> a computing system and usually uses whatever the other languages use.

> It's a little harder to understand with the later versions of Fortran.
> Non-advancing I/O allows you to read or write part of a record (sort of
> like reading the first half of a card). Stream I/O allows you to use the
> C model of I/O; I/O just produces a stream of characters and the OS
> doesn't impose a particular structure on the characters. Although other
> programs might require or expect some sort of structure. If you send a
> stream output file to a screen something has to happen when the cursor
> comes to the right hand edge of the screen.

I started learning PL/I, not so long after Fortran. Even though designed
around IBM's record oriented file system, PL/I uses STREAM I/O for
what Fortran calls FORMATTED. It is still record buffered, but the
program can write parts of records at a time.

I don't know so well the systems where C originated, but at least in
the ANSI standard, and I believe even in K&R the idea that the system
will buffer the I/O is there.

The usual implementation of getc() and putc() uses C macros to load
and store from/to a buffer, only doing an actual function call when
the buffer is empty/full.

The early Fortran systems might have directly used the I/O buffer
that the system used for actual I/O operations.

Many systems now put another level of buffering between the program
and the I/O devices. Unix uses that as a disk cache, to speed up
I/O by prefetching and delaying writing full buffers.

So, for a long time now, there has not necessarily been a connection
between records as seen by the program and those actually seen
by the I/O device.

-- glen

glen herrmannsfeldt

Nov 4, 2012, 11:28:51 PM

Elzbieta Burnett <elzbieta...@gmail.com> wrote:

(snip)

> If I do: read(unit=file, fmt="(a)", iostat=ios) string
> then system looks for char(10) in Linux? So system reads up to
> char(10) for string?

It reads up to char(10) or until the end of string. If it hasn't
filled string, then it fills the rest with blanks.

If string filled before the char(10), then the rest of the
characters up to the char(10) are ignored.
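
The padding and truncation behaviour can be checked with a short sketch. It uses a scratch file and a deliberately short string; both are illustrative choices:

```fortran
program a_format_demo
  implicit none
  character(len=10) :: string
  integer :: u, ios

  open (newunit=u, status='scratch', action='readwrite', form='formatted')
  write (u, '(a)') 'short'
  write (u, '(a)') 'a line longer than ten characters'
  write (u, '(a)') 'next'
  rewind (u)

  ! Record shorter than string: the remainder is blank-filled.
  read (u, '(a)', iostat=ios) string
  print *, '[' // string // ']'

  ! Record longer than string: only the first 10 characters are kept,
  ! and the rest of the record is skipped.
  read (u, '(a)', iostat=ios) string
  print *, '[' // string // ']'

  ! The next read starts at the following record, not mid-line.
  read (u, '(a)', iostat=ios) string
  print *, '[' // string // ']'
  close (u)
end program a_format_demo
```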

-- glen

Robin Vowels

Nov 5, 2012, 7:49:28 AM

On Nov 5, 1:11 pm, Elzbieta Burnett <elzbieta.burn...@gmail.com>
wrote:

> If I do: read(unit=file, fmt="(a)", iostat=ios) string
> then system looks for char(10) in Linux? So system reads up to
> char(10) for string?

System reads the entire line, as far as CR or whatever the
system recognizes as end of line.

STRING should be at least as long as the line if you want all
the characters on it to be stored in STRING.