I am using a parsing (lexing) module to split the input into different string tokens (CSV). Then I use internal read to get the real, integer or character value of a token.
When the string token is blank or empty, the internal read fails. Is there any way to avoid or overcome this issue?
program internal_read
  implicit none
  character(len = 200) :: strvalue
  real :: realvalue

  ! This crashes with "end-of-file during read, unit -5,
  ! file Internal List-Directed read"
  strvalue = ' '
  read(strvalue, *) realvalue
end program internal_read
Ragu wrote:
> I am using a parsing (lexing) module to split the input into different
> string tokens (CSV). Then I use internal read to get the real, integer
> or character value of a token.
>
> When the string token is blank or empty, the internal read fails. Is
> there any way to avoid or overcome this issue?
>
> program internal_read
>   implicit none
>   character(len = 200) :: strvalue
>   real :: realvalue
>
>   ! This crashes with "end-of-file during read, unit -5,
>   ! file Internal List-Directed read"
>   strvalue = ' '

    IF ( LEN_TRIM(strvalue) > 0 ) THEN
      read(strvalue, *) realvalue
    ELSE
      realvalue = BLANK_VALUE
    END IF

> end program internal_read

?

For a suitable BLANK_VALUE (0.0? -999.999?)
BTW, you might also want to use the IOSTAT specifier on the internal read just in case strvalue contains alpha characters, i.e.

    strvalue = 'nothing but words'
    read(strvalue, *) realvalue

gives me:

    At line 11 of file blah.f90
    Fortran runtime error: Bad real number in item 3 of list input
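The two suggestions above (guard against a blank token, and use IOSTAT for everything else) could be combined roughly as follows. This is only a sketch: the module name safe_read_sketch and the routine read_real_token are made up for illustration, and the choice of blank value is left to the caller. The read goes into a temporary because an input item becomes undefined when the read fails.

```fortran
module safe_read_sketch
  implicit none
  private
  public :: read_real_token   ! hypothetical name, not from the thread

contains

  ! Convert one token to a real.
  !   blank token      -> value = blank_value, ios = 0
  !   readable token   -> value = number read, ios = 0
  !   unreadable token -> value unchanged,     ios /= 0
  subroutine read_real_token(token, blank_value, value, ios)
    character(len=*), intent(in)    :: token
    real,             intent(in)    :: blank_value
    real,             intent(inout) :: value
    integer,          intent(out)   :: ios
    real :: tmp

    ios = 0
    if (len_trim(token) == 0) then
      ! List-directed input would hit end-of-file here, so skip the read.
      value = blank_value
      return
    end if

    ! Read into a temporary; only copy it out if the read succeeded.
    read(token, *, iostat=ios) tmp
    if (ios == 0) value = tmp
  end subroutine read_real_token

end module safe_read_sketch
```

A call would then look like: call read_real_token(strvalue, -999.999, realvalue, ierr), followed by a test of ierr.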
> cheers,
> paulv
Thanks Paul. I have nearly 100 or so internal reads without the proper check using IOSTAT (iostat = +ve integer --> Error; -ve integer --> EOF or EOR). All works well as long as the user provides meaningful input.
From your suggestion, it might be worthwhile to do a small subroutine or function that does the internal read and also performs all the IOSTAT checks.
> Thanks Paul. I have nearly 100 or so internal reads without the proper
> check using IOSTAT (iostat = +ve integer --> Error; -ve integer -->
> EOF or EOR). All works well as long as the user provides meaningful
> input.

I'm going to print out that last sentence and stick it on the entrance to my cube! :o)

> From your suggestion, it might be worthwhile to do a small subroutine
> or function that does the internal read and also performs all the
> IOSTAT checks.
Maybe. It depends on how you read your variables - and the combinations could be nearly endless. I would rather let the internal read statement handle the internal read. The one constant thing you have to deal with is the IOSTAT result. I would first think about using a script to do two things:
  a) insert an IOSTAT=ierr in your current 100+ internal read statements,
  b) immediately after each internal read statement, insert a
       CALL Check_IO_Status(ierr)
Initially the subroutine would do nothing, but eventually you could add the required check(s) for error or EOF or EOR as you said. You don't want to pepper your code with

    SELECT CASE (ierr)
      CASE (:-1)
        ...process eof or eor
      CASE (0)
        ...success!
      CASE (1:)
        ...process error
    END SELECT

after each of your 100+ internal read statements.
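For illustration, one possible shape for such a Check_IO_Status routine is sketched below. Only the name and the single ierr argument come from the post; the module name, the optional ok argument, and the decision to just print a message to the error unit are assumptions, and a real code would decide for itself what "process" means in each case.

```fortran
module io_status_sketch
  use, intrinsic :: iso_fortran_env, only: error_unit
  implicit none
  private
  public :: Check_IO_Status

contains

  ! Central place for the IOSTAT checks, so the SELECT CASE is written once.
  ! ok is an optional extra (an assumption of this sketch) for callers that
  ! want to branch on the outcome; CALL Check_IO_Status(ierr) still works.
  subroutine Check_IO_Status(ierr, ok)
    integer, intent(in)            :: ierr
    logical, intent(out), optional :: ok

    select case (ierr)
    case (:-1)
      ! Negative IOSTAT: end-of-file or end-of-record.
      write(error_unit, '(a,i0)') 'internal read: EOF or EOR, iostat = ', ierr
    case (0)
      ! Success, nothing to report.
      continue
    case (1:)
      ! Positive IOSTAT: an error condition.
      write(error_unit, '(a,i0)') 'internal read: error, iostat = ', ierr
    end select

    if (present(ok)) ok = (ierr == 0)
  end subroutine Check_IO_Status

end module io_status_sketch
```

Because ok is optional, the routine has to live in a module (or otherwise have an explicit interface) so the compiler can see the argument list at each call.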
(Truth be told, if I was doing it from scratch I would use an error handling structure, one component of which would be the IOSTAT result. Then I could add to the structure as the need arises without changing the interface to the error handling routine. But, small incremental changes are better with existing code IMO :o)
> On May 6, 10:50 am, Paul van Delst <paul.vande...@noaa.gov> wrote:
> > Ragu wrote:
> > > I am using a parsing (lexing) module to split the input into different
> > > string tokens (CSV). Then I use internal read to get the real, integer
> > > or character value of a token.
> > >
> > > When the string token is blank or empty, the internal read fails. Is
> > > there any way to avoid or overcome this issue?
> >
> > At line 11 of file blah.f90
> > Fortran runtime error: Bad real number in item 3 of list input
>
> Thanks Paul. I have nearly 100 or so internal reads without the proper
> check using IOSTAT (iostat = +ve integer --> Error; -ve integer -->
> EOF or EOR). All works well as long as the user provides meaningful
> input.
>
> From your suggestion, it might be worthwhile to do a small subroutine
> or function that does the internal read and also performs all the
> IOSTAT checks.
Such routines proved invaluable for dozens of my applications over the years. IIRC I have two main versions, for integers and reals.
I found that if the only check is (iostat /= 0) on internal read, then it catches blank strings as well as all errors that the runtime system cares about, and is fully portable as well.
However I also found that if I wanted to be fully rigorous, I needed a character scan to guard against a few strange things that some runtimes accept. These include stray delimiters in the middle of the input string that cause the right side of the input number to be ignored, such as space, tab, slash, and maybe semicolon. Also included are decimal point in integer field, and more than one decimal point in a real number. All these things do occur in real world data handled by humans.
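A character scan of the kind described might look something like the sketch below. It is not the poster's code; the module and function names are invented, and the accepted character sets are an assumption. It is meant as a pre-filter in front of the internal read (it would still let through oddities such as '+-' or a lone 'e'), so the iostat check stays in place afterwards.

```fortran
module token_scan_sketch
  implicit none
  private
  public :: plausible_integer, plausible_real   ! hypothetical names

contains

  ! True if the token holds only sign characters and digits.
  ! Rejects embedded blanks, tabs, slashes, semicolons and decimal points.
  logical function plausible_integer(token) result(ok)
    character(len=*), intent(in)  :: token
    character(len=:), allocatable :: t

    t  = trim(adjustl(token))
    ok = len(t) > 0
    if (ok) ok = verify(t, '+-0123456789') == 0
  end function plausible_integer

  ! True if the token holds only characters that can appear in a real
  ! literal and contains at most one decimal point.
  logical function plausible_real(token) result(ok)
    character(len=*), intent(in)  :: token
    character(len=:), allocatable :: t
    integer :: i, ndots

    t  = trim(adjustl(token))
    ok = len(t) > 0
    if (.not. ok) return

    ! Any character outside this set (space, tab, slash, comma, ...) fails.
    ok = verify(t, '+-.0123456789eEdD') == 0
    if (.not. ok) return

    ! Count decimal points; more than one is rejected.
    ndots = 0
    do i = 1, len(t)
      if (t(i:i) == '.') ndots = ndots + 1
    end do
    ok = ndots <= 1
  end function plausible_real

end module token_scan_sketch
```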
Just the iostat check alone will be a big improvement over your initial situation.
Ragu wrote:
> I am using a parsing (lexing) module to split the input into different
> string tokens (CSV). Then I use internal read to get the real, integer
> or character value of a token.
>
> When the string token is blank or empty, the internal read fails. Is
> there any way to avoid or overcome this issue?
My usual advice is to not use list-directed READ unless you're willing to accept all of the vagueness/flexibility an implementation may allow. Are repeated values (4*3.4) acceptable to you? How about commas for omitted values? Undelimited strings?
I recommend using an explicit format rather than *. You would not have the EOF problem then, either.
--
Steve Lionel
Developer Products Division
Intel Corporation
Nashua, NH
Steve,
The user is not restricted to providing input in one form (i.e. he/she can input in F format or E format). If I specify an explicit format such as "F25.0" in the internal read, will it work for E format too? I don't know. I don't have enough grey hair yet (which means I haven't spent enough time in the Fortran world to understand the possible combinations).
I used * because I thought the compiler implementations will be much superior to an explicit format given by myself.
I suppose that there are some utilities that provide support for an input like (4*3.4) or even sin() and cos() math operations in one token. That is overkill for non-CS people.
Ragu <ssragun...@gmail.com> wrote:
> The user is not restricted to provide input in one form (i.e. he/she
> can input in F format or E format). If I say an explicit format as
> "F25.0" in internal read, then will it work for E format too?
Yes. There is no difference between E and F edit descriptors (and for that matter, D, ES, EN, and G) on input. The distinction is only on output.
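As a small illustration of that point, the sketch below reads a token with the F25.0 format from the question. The module and routine names are invented for the example, and it assumes tokens are no longer than 25 characters (anything beyond the field width would simply not be looked at). Note that with an explicit format an all-blank field reads as zero with iostat = 0 rather than hitting end-of-file, so a blank token still needs its own check if zero is not the wanted meaning.

```fortran
module explicit_format_sketch
  implicit none
  private
  public :: read_real_fixed   ! hypothetical name

contains

  ! Read a real from a token using an explicit F edit descriptor.
  ! On input, F accepts plain, decimal and exponent forms alike.
  subroutine read_real_fixed(token, value, ios)
    character(len=*), intent(in)  :: token
    real,             intent(out) :: value
    integer,          intent(out) :: ios
    real :: tmp

    value = 0.0
    ! Short tokens are blank-padded to the field width on an internal read.
    read(token, '(F25.0)', iostat=ios) tmp
    if (ios == 0) value = tmp
  end subroutine read_real_fixed

end module explicit_format_sketch
```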
> I used * because I thought the compiler implementations will be much
> superior to an explicit format given by myself.
I'd say the opposite - that compiler implementations of list-directed input are much more likely to surprise you in many ways. Some of those ways are actually required by the standard. The blank treatment that you noted is one, but there are plenty of others. If you know that the data field should contain a single real value and nothing else, it is far more robust to use an explicit format.
> I suppose that there are some utilities that provide support for an
> input like (4*3.4) or even sin() and cos() math operations in one
> token. That is overkill for non-CS people.
You appear to misunderstand what 4*3.4 would mean in the context of list-directed input. It would not mean multiplication with a result of about 13.6. Instead, it would mean 4 repetitions of the value 3.4, with all after the first being ignored since you are reading a single value; so it would effectively mean 3.4. Is that what one of your users that typed this would most likely mean?
In my view, more interesting cases would include such things as
3,4
which might be typed by someone used to European decimal conventions, or it could be simply a typo, the comma and decimal point being adjacent on most keyboards. Regardless of why it was typed that way, list-directed input will read it as just 3, with no error indication.
There are really quite a large number of ways that list-directed input can give similarly confusing results. If those correspond to flexibility that you would actually like to have, then that's great. But if you know that the input should be a single real value and nothing else, then most of the flexibility of list-directed input is counterproductive.
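The two cases discussed above are easy to try. The sketch below wraps a list-directed internal read in a small routine (the module and routine names are invented for the example) so that tokens such as '4*3.4' and '3,4' can be fed to it. Going by the description above, the first should come back as 3.4 and the second as 3.0, both with iostat = 0.

```fortran
module list_directed_demo
  implicit none
  private
  public :: list_read_real   ! hypothetical name

contains

  ! Plain list-directed read of a single real from a token.
  ! Repeat counts (r*c) and value separators (comma, blank, slash)
  ! are all interpreted, which is exactly the flexibility in question.
  subroutine list_read_real(token, value, ios)
    character(len=*), intent(in)  :: token
    real,             intent(out) :: value
    integer,          intent(out) :: ios
    real :: tmp

    value = 0.0
    read(token, *, iostat=ios) tmp
    if (ios == 0) value = tmp
  end subroutine list_read_real

end module list_directed_demo
```

Calling list_read_real with '4*3.4' or '3,4' and printing value and ios shows what a user's typo would silently turn into.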
True, list-directed input is quick and easy to code. I could see it for small, one-shot quickie programs. But if the program has "nearly 100 or so internal reads", then I'd say it was out of that category. For cases like that, I'd recommend making a short subroutine to do what you want, as you suggested in another post. There are lots of advantages to that. For example, my versions of that kind of thing include optional arguments to specify allowed ranges for the value.
This is the kind of quite general library subroutine that you'd probably want to have in your toolkit of things for all your Fortran programs rather than redoing the same thing for each new program. And once you have written it, it is just as quick and easy to use as list-directed input - easier if you want the error checking.
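A rough sketch of what such a routine with optional range arguments could look like follows. It is not the routine from the poster's library; the names token_to_real, min_value and max_value are made up here, and it assumes an explicit F25.0 format and tokens of at most 25 characters. A blank token is treated as a failure rather than as zero.

```fortran
module range_checked_read_sketch
  implicit none
  private
  public :: token_to_real   ! hypothetical name

contains

  ! Convert a token to a real, optionally checking it against a range.
  ! ok is .true. only if the token is non-blank, readable and in range;
  ! otherwise value is returned as 0.0.
  subroutine token_to_real(token, value, ok, min_value, max_value)
    character(len=*), intent(in)           :: token
    real,             intent(out)          :: value
    logical,          intent(out)          :: ok
    real,             intent(in), optional :: min_value, max_value
    real    :: tmp
    integer :: ios

    value = 0.0
    ok    = .false.
    if (len_trim(token) == 0) return

    read(token, '(F25.0)', iostat=ios) tmp
    if (ios /= 0) return

    ! Range checks apply only when the caller supplied the bound.
    if (present(min_value)) then
      if (tmp < min_value) return
    end if
    if (present(max_value)) then
      if (tmp > max_value) return
    end if

    value = tmp
    ok    = .true.
  end subroutine token_to_real

end module range_checked_read_sketch
```

A call such as call token_to_real(strvalue, realvalue, ok, min_value=0.0) is then about as short as the list-directed read it replaces.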
--
Richard Maine                    | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle           |  -- Mark Twain
Ragu wrote:
> I am using a parsing (lexing) module to split the input into different
> string tokens (CSV). Then I use internal read to get the real, integer
> or character value of a token.
<snip>
For what it's worth, splitting the input into tokens sounds more like lexical analysis than parsing.
Richard,
Thanks for the detailed insight. I did not actually understand what Steve meant by (4*3.4) until you pointed it out. Thanks.
Some more info on what I do: I read a master command parameter input file separated by commas. Based on that I generate a second input file for transient dynamic analysis. Most of the numbers are real values with a few integers. So an explicit format that you mentioned is applicable.
I am delimiting the input values by commas, so the European type decimal input will not work for what I have.
I have your freeware code on my computer and tried to look for the utility that you mentioned above. I looked in \generic\source\misc\readCmd.f90. Is that the right place to start?
Ragu <ssragun...@gmail.com> wrote:
> I have your freeware code in my computer and tried to look for the
> utility that you had mentioned above. I looked in \generic\source\misc
> \readCmd.f90. Is it the right place to start?
Close. I'm talking more about the string.f90 file in the same directory, in particular the subroutines string_to_int and string_to_real. Feel free to use or modify them as appropriate for your needs.
And it isn't directly relevant to those two routines, but if you are using any of the other routines in that file (notably upper_case, lower_case, string_eq, or string_comp), there is a bug that I didn't fix until Sept 2006. Don't know whether you have a version from before or after then. Most of the code is vintage 1990-1992, but that one bug went undetected for a long time.
I had assumed that iachar would always return a result in the range 0-127. The standard requires that... but it turns out that the requirement has what I regard as a loophole in that it only explicitly applies to characters in the ascii character set (or some such thing). When you feed garbage input into it, the result doesn't have to be in that range... which non-coincidentally happened to be the range of the bounds of an array I indexed into with that result. I had apps that fed potentially garbage input into this (to see if there was some specific recognizable stuff amidst the garbage). I was going to turn in a bug report on the compiler that failed on this until I reread the standard's requirement carefully and noticed the loophole. Oops. The 2006 fix explicitly limits the output of iachar to the right range before using it as an array index.
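To make the failure mode concrete, here is a sketch of a table-driven upper-casing routine that checks the iachar result before using it as an index. It is not the code from string.f90; the names are invented, the table layout is an assumption, and it passes out-of-range characters through unchanged rather than clamping them, which is one of several reasonable ways to keep the index inside the bounds.

```fortran
module iachar_guard_sketch
  implicit none
  private
  public :: upper_case_sketch   ! hypothetical name

contains

  ! Upper-case a string via a lookup table indexed by ASCII code.
  function upper_case_sketch(string) result(upper)
    character(len=*), intent(in) :: string
    character(len=len(string))   :: upper
    character(len=1), save :: table(0:127)
    logical,          save :: initialized = .false.
    integer :: i, code

    ! Build the table once: identity, except a-z map to A-Z.
    if (.not. initialized) then
      do code = 0, 127
        table(code) = achar(code)
      end do
      do code = iachar('a'), iachar('z')
        table(code) = achar(code - 32)
      end do
      initialized = .true.
    end if

    do i = 1, len(string)
      code = iachar(string(i:i))
      ! iachar is only guaranteed to return 0-127 for ASCII characters,
      ! so check the range before indexing the table.
      if (code >= 0 .and. code <= 127) then
        upper(i:i) = table(code)
      else
        upper(i:i) = string(i:i)   ! non-ASCII: pass through unchanged
      end if
    end do
  end function upper_case_sketch

end module iachar_guard_sketch
```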
--
Richard Maine                    | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle           |  -- Mark Twain