EOF during Internal List-Directed Read

Ragu

May 6, 2009, 10:32:09 AM

I am using a parsing (lexing) module to split the input into different
string tokens (CSV). Then I use internal read to get the real, integer
or character value of a token.

When the string token is blank or empty, the internal read fails. Is
there any way to avoid or overcome this issue?

program internal_read
implicit none
character(len = 200) :: strvalue
real :: realvalue

! This crashes with "end-of-file during read, unit -5,
! file Internal List-Directed read"
strvalue = ' '
read(strvalue, *) realvalue

end program internal_read

Paul van Delst

May 6, 2009, 10:50:02 AM

Ragu wrote:
> I am using a parsing (lexing) module to split the input into different
> string tokens (CSV). Then I use internal read to get the real, integer
> or character value of a token.
>
> When the string token is blank or empty, the internal read fails. Is
> there any way to avoid or overcome this issue?
>
> program internal_read
> implicit none
> character(len = 200) :: strvalue
> real :: realvalue
>
> ! This crashes with "end-of-file during read, unit -5,
> ! file Internal List-Directed read"
> strvalue = ' '

IF ( LEN_TRIM(strvalue) > 0 ) THEN
read(strvalue, *) realvalue
ELSE
realvalue = BLANK_VALUE
END IF
>
> end program internal_read

?

For a suitable BLANK_VALUE (0.0? -999.999?)

BTW, you might also want to use the IOSTAT specifier on the internal read just in case
strvalue contains alpha characters, i.e.

strvalue = 'nothing but words'
read(strvalue, *) realvalue

gives me:

At line 11 of file blah.f90
Fortran runtime error: Bad real number in item 3 of list input
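
A minimal sketch of the LEN_TRIM guard combined with IOSTAT, for illustration only. The module name token_read_sketch, the routine read_real_token, and the blank_value argument are assumptions made for this example, not code from the thread:

```fortran
module token_read_sketch
  implicit none
  private
  public :: read_real_token
contains
  ! Convert one token to a real without aborting on bad input.
  ! ios == 0 means success (a blank token counts as success and
  ! yields blank_value); ios /= 0 means the read failed.
  subroutine read_real_token(token, blank_value, value, ios)
    character(len=*), intent(in)  :: token
    real,             intent(in)  :: blank_value
    real,             intent(out) :: value
    integer,          intent(out) :: ios

    value = blank_value
    ios   = 0
    if (len_trim(token) > 0) then
      ! IOSTAT keeps a malformed token from stopping the program.
      read(token, *, iostat=ios) value
      ! After a failed read the variable is undefined, so reset it.
      if (ios /= 0) value = blank_value
    end if
  end subroutine read_real_token
end module token_read_sketch
```

A call site would then look like: call read_real_token(strvalue, -999.999, realvalue, ios), followed by a test of ios.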

cheers,

paulv

Ragu

May 6, 2009, 11:03:49 AM

Thanks Paul. I have nearly 100 or so internal reads without the proper
check using IOSTAT (iostat = +ve integer --> Error; -ve integer -->
EOF or EOR). All works well as long as the user provides meaningful
input.

From your suggestion, it might be worthwhile to do a small subroutine
or function that does the internal read and also performs all the
IOSTAT checks.

Thanks again.
Ragu

Paul van Delst

May 6, 2009, 11:36:12 AM

Ragu wrote:
>
> Thanks Paul. I have nearly 100 or so internal reads without the proper
> check using IOSTAT (iostat = +ve integer --> Error; -ve integer -->
> EOF or EOR). All works well as long as the user provides meaningful
> input.

I'm going to print out that last sentence and stick it on the entrance to my cube! :o)

> From your suggestion, it might be worthwhile to do a small subroutine
> or function that does the internal read and also performs all the
> IOSTAT checks.

Maybe. It depends on how you read your variables - and the combinations could be nearly
endless. I would rather let the internal read statement handle the internal read. The one
constant thing you have to deal with is the IOSTAT result. I would first think about using
a script to do two things:
a) Insert an IOSTAT=ierr in your current 100+ internal read statements,
b) immediately after each internal read statement, insert a
CALL Check_IO_Status(ierr)

Initially the subroutine would do nothing, but eventually you could add the required
check(s) for error or EOF or EOR as you said. You don't want to pepper your code with
SELECT CASE (ierr)
CASE (:-1)
...process eof or eor
CASE (0)
...success!
CASE (1:)
...process error
END SELECT
after each of your 100+ internal read statements.
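
One way the Check_IO_Status routine could start out is sketched below. The module name io_status_sketch, the messages, and the optional ok flag are assumptions added for this example; the post itself only proposes CALL Check_IO_Status(ierr):

```fortran
module io_status_sketch
  implicit none
  private
  public :: Check_IO_Status
contains
  ! Central place to react to the IOSTAT value of a read, so the
  ! SELECT CASE lives in one routine instead of after every READ.
  subroutine Check_IO_Status(ierr, ok)
    integer, intent(in)            :: ierr
    logical, intent(out), optional :: ok   ! hypothetical convenience flag

    select case (ierr)
    case (:-1)
      write(*, '(A,I0)') 'Read hit end of file or record, IOSTAT = ', ierr
    case (0)
      continue   ! success, nothing to report
    case (1:)
      write(*, '(A,I0)') 'Read error, IOSTAT = ', ierr
    end select

    if (present(ok)) ok = (ierr == 0)
  end subroutine Check_IO_Status
end module io_status_sketch
```

Each internal read would then become read(strvalue, *, iostat=ierr) realvalue followed by call Check_IO_Status(ierr), and the handling can grow inside the routine without touching the call sites.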

(Truth be told, if I were doing it from scratch I would use an error handling structure,
one component of which would be the IOSTAT result. Then I could add to the structure as
the need arises without changing the interface to the error handling routine. But, small
incremental changes are better with existing code IMO :o)

cheers,

paulv

Dave Allured

May 6, 2009, 12:20:58 PM

> Thanks Paul. I have nearly 100 or so internal reads without the proper
> check using IOSTAT (iostat = +ve integer --> Error; -ve integer -->
> EOF or EOR). All works well as long as the user provides meaningful
> input.
>
> From your suggestion, it might be worthwhile to do a small subroutine
> or function that does the internal read and also performs all the
> IOSTAT checks.

Such routines proved invaluable for dozens of my applications over the
years. IIRC I have two main versions, for integers and reals.

I found that if the only check is (iostat /= 0) on internal read, then
it catches blank strings as well as all errors that the runtime system
cares about, and is fully portable as well.

However I also found that if I wanted to be fully rigorous, I needed a
character scan to guard against a few strange things that some
runtimes accept. These include stray delimiters in the middle of the
input string that cause the right side of the input number to be
ignored, such as space, tab, slash, and maybe semicolon. Also included
are decimal point in integer field, and more than one decimal point in a
real number. All these things do occur in real world data handled by
humans.
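
A rough sketch of such a character scan for a real-valued token follows. The module name token_scan_sketch, the function looks_like_real, and the exact set of allowed characters are assumptions for illustration; this is a character filter, not a full syntax check, and it is not the routine described above:

```fortran
module token_scan_sketch
  implicit none
  private
  public :: looks_like_real
contains
  ! Conservative pre-check for a token that should hold exactly one
  ! real number. It rejects embedded blanks, tabs, slashes, commas,
  ! semicolons, and a second decimal point before any READ is tried.
  logical function looks_like_real(token) result(ok)
    character(len=*), intent(in)  :: token
    character(len=*), parameter   :: allowed = '0123456789+-.eEdD'
    character(len=:), allocatable :: t
    integer :: i, ndots

    t  = trim(adjustl(token))
    ok = .false.
    if (len(t) == 0) return                 ! blank token
    if (verify(t, allowed) /= 0) return     ! stray character found
    if (scan(t, '0123456789') == 0) return  ! no digit at all

    ! Count decimal points; more than one cannot be a single number.
    ndots = 0
    do i = 1, len(t)
      if (t(i:i) == '.') ndots = ndots + 1
    end do
    ok = (ndots <= 1)
  end function looks_like_real
end module token_scan_sketch
```

An integer variant would use the same idea with only digits and a sign in the allowed set, which also catches a decimal point in an integer field.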

Just the iostat check alone will be a big improvement over your initial
situation.

--Dave

Steve Lionel

May 6, 2009, 3:34:26 PM

Ragu wrote:
> I am using a parsing (lexing) module to split the input into different
> string tokens (CSV). Then I use internal read to get the real, integer
> or character value of a token.
>
> When the string token is blank or empty, the internal read fails. Is
> there any way to avoid or overcome this issue?

My usual advice is to not use list-directed READ unless you're willing
to accept all of the vagueness/flexibility an implementation may allow.
Are repeated values (4*3.4) acceptable to you? How about commas for
omitted values? Undelimited strings?

I recommend using an explicit format rather than *. You would not have
the EOF problem then, either.
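
As a sketch of what an explicit-format internal read might look like: with a numeric edit descriptor an all-blank field is read as zero, so the blank token no longer produces an end-of-file condition. The module name explicit_format_sketch and the routine read_real_fixed are hypothetical, and sizing the field to the token length is a choice made for this example:

```fortran
module explicit_format_sketch
  implicit none
  private
  public :: read_real_fixed
contains
  ! Read one real from a token with an explicit F edit descriptor
  ! instead of list-directed (*) input. The field width is set to the
  ! token length so that the whole token is treated as one field.
  subroutine read_real_fixed(token, value, ios)
    character(len=*), intent(in)  :: token
    real,             intent(out) :: value
    integer,          intent(out) :: ios
    character(len=32) :: fmt

    value = 0.0
    ios   = 0
    if (len(token) == 0) return   ! zero-length token: treat as blank

    ! Build a format such as (F200.0) for a 200-character token.
    write(fmt, '(A,I0,A)') '(F', len(token), '.0)'
    read(token, fmt, iostat=ios) value
    ! After a failed read the variable is undefined, so reset it.
    if (ios /= 0) value = 0.0
  end subroutine read_real_fixed
end module explicit_format_sketch
```

Whether a blank token should silently become 0.0 is a separate decision; a LEN_TRIM test before the read lets the caller choose.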

--
Steve Lionel
Developer Products Division
Intel Corporation
Nashua, NH

For email address, replace "invalid" with "com"

User communities for Intel Software Development Products
http://software.intel.com/en-us/forums/
Intel Fortran Support
http://support.intel.com/support/performancetools/fortran
My Fortran blog
http://www.intel.com/software/drfortran

Ragu

May 6, 2009, 5:09:23 PM

> My usual advice is to not use list-directed READ unless you're willing
> to accept all of the vagueness/flexibility an implementation may allow.
> Are repeated values (4*3.4) acceptable to you? How about commas for
> omitted values? Undelimited strings?
>
> I recommend using an explicit format rather than *. You would not have
> the EOF problem then, either.
>
> --
> Steve Lionel

Steve,

The user is not restricted to provide input in one form (i.e. he/she
can input in F format or E format). If I say an explicit format as
"F25.0" in internal read, then will it work for E format too? I don't
know. I don't have enough grey hair yet (which means I haven't spent
enough time in the Fortran world to understand the possible combinations).

I used * because I thought the compiler implementations will be much
superior to an explicit format given by myself.

I suppose that there are some utilities that provide support for an
input like (4*3.4) or even sin() and cos() math operations in one
token. It is overkill for non-CS people.

Thanks.

Richard Maine

May 6, 2009, 7:06:22 PM

Ragu <ssrag...@gmail.com> wrote:

> The user is not restricted to provide input in one form (i.e. he/she
> can input in F format or E format). If I say an explicit format as
> "F25.0" in internal read, then will it work for E format too?

Yes. There is no difference between E and F edit descriptors (and for
that matter, D, ES, EN, and G) on input. The distinction is only on
output.

> I used * because I thought the compiler implementations will be much
> superior to an explicit format given by myself.

I'd say the opposite - that compiler implementations of list-directed
input are much more likely to surprise you in many ways. Some of those
ways are actually required by the standard. The blank treatment that you
noted is one, but there are plenty of others. If you know that the data
field should contain a single real value and nothing else, it is far
more robust to use an explicit format.

> I suppose that there are some utilities that provide support for an
> input like (4*3.4) or even sin() and cos() math operations in one
> token. It is overkill for non-CS people.

You appear to misunderstand what 4*3.4 would mean in the context of
list-directed input. It would not mean multiplication with a result of
about 13.6. Instead, it would mean 4 repetitions of the value 3.4, with
all after the first being ignored since you are reading a single value;
so it would effectively mean 3.4. Is that what one of your users that
typed this would most likely mean?

In my view, more interesting cases would include such things as

3,4

which might be typed by someone used to European decimal conventions, or
it could be simply a typo, the comma and decimal point being adjacent on
most keyboards. Regardless of why it was typed that way, list-directed
input will read it as just 3, with no error indication.
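
Both cases are easy to reproduce with a small wrapper around a list-directed internal read. The module name list_directed_demo and the routine list_directed_real are made up for this sketch:

```fortran
module list_directed_demo
  implicit none
  private
  public :: list_directed_real
contains
  ! Read a single real from a token with list-directed (*) input and
  ! report the IOSTAT value, so the surprising cases can be inspected.
  subroutine list_directed_real(token, value, ios)
    character(len=*), intent(in)  :: token
    real,             intent(out) :: value
    integer,          intent(out) :: ios

    value = 0.0
    read(token, *, iostat=ios) value
  end subroutine list_directed_real
end module list_directed_demo
```

Passing the tokens '3,4' and '4*3.4' to this routine returns 3.0 and 3.4 respectively, each with ios equal to zero, because the comma is a value separator and 4* is a repeat count.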

There are really quite a large number of ways that list-directed input
can give similarly confusing results. If those correspond to
flexibility that you would actually like to have, then that's great. But
if you know that the input should be a single real value and nothing
else, then most of the flexibility of list-directed input is
counterproductive.

True, list-directed input is quick and easy to code. I could see it for
small, one-shot quickie programs. But if the program has "nearly 100 or
so internal reads", then I'd say it was out of that category. For cases
like that, I'd recommend making a short subroutine to do what you want,
as you suggested in another post. There are lots of advantages to that.
For example, my versions of that kind of thing include optional
arguments to specify allowed ranges for the value.

This is the kind of quite general library subroutine that you'd probably
want to have in your toolkit of things for all your Fortran programs
rather than redoing the same thing for each new program. And once you
have written it, it is just as quick and easy to use as list-directed
input - easier if you want the error checking.
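
A rough idea of what such a toolkit routine could look like, with optional range arguments. This is not the string_to_real code discussed in this thread; the module name range_checked_read_sketch, the function token_to_real, and its argument names are all assumptions for illustration:

```fortran
module range_checked_read_sketch
  implicit none
  private
  public :: token_to_real
contains
  ! Convert a token to a real with an explicit format, optionally
  ! requiring min_value <= value <= max_value. Returns .true. on
  ! success. After a failed read, value is reset to 0.0; after a
  ! range failure it keeps the parsed number for error messages.
  logical function token_to_real(token, value, min_value, max_value) result(ok)
    character(len=*), intent(in)  :: token
    real,             intent(out) :: value
    real, optional,   intent(in)  :: min_value, max_value
    character(len=32) :: fmt
    integer :: ios

    ok    = .false.
    value = 0.0
    if (len_trim(token) == 0) return   ! reject blank tokens

    ! One field as wide as the token, e.g. (F25.0) for 25 characters.
    write(fmt, '(A,I0,A)') '(F', len(token), '.0)'
    read(token, fmt, iostat=ios) value
    if (ios /= 0) then
      value = 0.0
      return
    end if

    if (present(min_value)) then
      if (value < min_value) return
    end if
    if (present(max_value)) then
      if (value > max_value) return
    end if
    ok = .true.
  end function token_to_real
end module range_checked_read_sketch
```

A caller can then write, for example, if (.not. token_to_real(strvalue, damping, min_value=0.0, max_value=1.0)) and report the offending token, where damping is just an example variable name.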

--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain

Louis Krupp

May 7, 2009, 5:48:07 AM

Ragu wrote:
> I am using a parsing (lexing) module to split the input into different
> string tokens (CSV). Then I use internal read to get the real, integer
> or character value of a token.

<snip>

For what it's worth, splitting the input into tokens sounds more like
lexical analysis than parsing.

I've been known to confuse the two myself.

Louis

Ragu

May 7, 2009, 11:07:11 PM

On May 6, 7:06 pm, nos...@see.signature (Richard Maine) wrote:

Richard,

Thanks for the detailed insight. I actually did not understand what
Steve meant i.e. (4*3.4), before you pointed it out. Thanks.

Some more info on what I do: I read a master command parameter input
file separated by commas. Based on that I generate a second input file
for transient dynamic analysis. Most of the numbers are real values
with a few integers. So an explicit format that you mentioned is
applicable.

I am delimiting the input values by commas, so the European-style
decimal input will not work for what I have.

I have your freeware code in my computer and tried to look for the
utility that you had mentioned above. I looked in \generic\source\misc
\readCmd.f90. Is it the right place to start?

Thanks.
Ragu

Richard Maine

May 7, 2009, 11:38:33 PM

Ragu <ssrag...@gmail.com> wrote:

> I have your freeware code in my computer and tried to look for the
> utility that you had mentioned above. I looked in \generic\source\misc
> \readCmd.f90. Is it the right place to start?

Close. I'm talking more about the string.f90 file in the same directory,
in particular the subroutines string_to_int and string_to_real. Feel
free to use or modify them as appropriate for your needs.

And it isn't directly relevant to those two routines, but if you are
using any of the other routines in that file (notably upper_case,
lower_case, string_eq, or string_comp), there is a bug that I didn't fix
until Sept 2006. Don't know whether you have a version from before or
after then. Most of the code is vintage 1990-1992, but that one bug went
undetected for a long time.

I had assumed that iachar would always return a result in the range
0-127. The standard requires that... but it turns out that the
requirement has what I regard as a loophole in that it only explicitly
applies to characters in the ascii character set (or some such thing).
When you feed garbage input into it, the result doesn't have to be in
that range... which non-coincidentally happened to be the range of the
bounds of an array I indexed into with that result. I had apps that fed
potentially garbage input into this (to see if there was some specific
recognizable stuff amidst the garbage). I was going to turn in a bug
report on the compiler that failed on this until I reread the standard's
requirement carefully and noticed the loophole. Oops. The 2006 fix
explicitly limits the output of iachar to the right range before using
it as an array index.
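
The kind of guard described above can be sketched as follows. This is not the actual string.f90 code; the module name upper_case_sketch, the function to_upper, and the choice to pass out-of-range characters through unchanged are assumptions for illustration:

```fortran
module upper_case_sketch
  implicit none
  private
  public :: to_upper
contains
  ! Upper-case a string through a 0..127 lookup table indexed by IACHAR.
  ! Codes outside 0..127 are passed through unchanged instead of being
  ! used as an array index.
  function to_upper(string) result(upper)
    character(len=*), intent(in) :: string
    character(len=len(string))   :: upper
    character(len=1), save :: table(0:127)
    logical, save :: first_call = .true.
    integer :: i, code

    ! Build the table once: identity, then map a..z onto A..Z.
    if (first_call) then
      do i = 0, 127
        table(i) = achar(i)
      end do
      do i = iachar('a'), iachar('z')
        table(i) = achar(i - 32)
      end do
      first_call = .false.
    end if

    upper = string
    do i = 1, len(string)
      code = iachar(string(i:i))
      ! Range check before indexing: garbage input may fall outside 0..127.
      if (code >= 0 .and. code <= 127) upper(i:i) = table(code)
    end do
  end function to_upper
end module upper_case_sketch
```

Without the range test, a character whose IACHAR result falls outside 0..127 would index past the bounds of the table.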