I found C99 to be less than clear about the expected behaviour in
parsing numbers. What surprised me even more was the wildly varying
results I got from several libraries regarding this topic.
What would be the *expected* output of the following code?
#include <stdio.h>
#include <stdlib.h>
int main()
{
    char test1[] = "0xz"; // "0x" followed by non-digit
    int rc, i, count;
    unsigned u;
    char * endptr;
    /* set up test 1 */
    FILE * fh = tmpfile();
    fprintf( fh, "%s", test1 );
    rewind( fh );
    i = -1; count = -1;
    rc = fscanf( fh, "%i%n", &i, &count );
    printf( "case 1 - fscanf %%i: rc = %d, value = %2d, "
            "count = %2d, position = %ld\n",
            rc, i, count, ftell( fh ) );
    rewind( fh );
    u = -1; count = -1;
    rc = fscanf( fh, "%x%n", &u, &count );
    printf( "case 1 - fscanf %%u: rc = %d, value = %2d, "
            "count = %2d, position = %ld\n",
            rc, u, count, ftell( fh ) );
    i = strtol( test1, &endptr, 0 );
    printf( "case 1 - strtol: value = %d, "
            "count = %td\n", i, endptr - test1 );
    fclose( fh );
    return 0;
}
The relevant parts of the standard are, AFAICT:
6.4.4.1 Integer constants
7.19.6.2 The fscanf function, paragraphs 9, 10, and 12 as well as
footnote 251.
7.20.1.4 The strtol, strtoll, strtoul, and strtoull functions
I'll try to pinpoint my confusion...
7.19.6.2 paragraph 9 states: "An input item is defined as the longest
sequence of input characters [...] which is, or is a prefix of, a
matching input sequence. 251) The first character, if any, after the
input item remains unread."
Paragraph 10 continues: "If the input item is not a matching sequence,
the execution of the directive fails: this condition is a matching
failure."
Footnote 251 states: "fscanf pushes back at most one input character
onto the input stream. Therefore, some sequences that are acceptable
to strtod, strtol, etc., are unacceptable to fscanf."
I could read that as:
- "0x" is a prefix of a matching input sequence, and thus an input
item (the "z" is being read and recognized as the first non-matching
character after the input item);
- the input item "0x" is not a matching sequence, so the execution of
the whole directive fails;
- because fscanf cannot push back everything it read, characters are
consumed but not parsed, effectively leaving the input stream in an
undefined position.
I could *also* read that as:
- the "z" is never actually "read" as far as the standard is concerned
("...remains unread");
- fscanf uses its one character pushback to unget the "x" and parses
the longest matching sequence, i.e. "0";
- parsing does not fail.
Apparently, some library implementors came up with yet other
interpretations:
newlib
case 1 - fscanf %i: rc = 1, value = 0, count = 1, position = 1
case 1 - fscanf %u: rc = 1, value = 0, count = 1, position = 1
case 1 - strtol: value = 0, count = 0
MinGW
case 1 - fscanf %i: rc = 0, value = -1, count = -1, position = 2
case 1 - fscanf %u: rc = 0, value = -1, count = -1, position = 2
case 1 - strtol: value = 0, count = 0
glibc
case 1 - fscanf %i: rc = 1, value = 0, count = 2, position = 2
case 1 - fscanf %u: rc = 1, value = 0, count = 2, position = 2
case 1 - strtol: value = 0, count = 1
IBM AIX
case 1 - fscanf %i: rc = 0, value = -1, count = -1, position = 2
case 1 - fscanf %u: rc = 0, value = -1, count = -1, position = 2
case 1 - strtol: value = 0, count = 1
Since I am exactly in that position (implementing a library), I'd like
to hear some authoritative opinion on that matter.
Help?!?
This subject has already been discussed, with no definite results, at
the following websites:
http://stackoverflow.com/questions/1425730/difference-between-scanf-and-strtol-strtod-in-parsing-numbers
http://forum.osdev.org/viewtopic.php?f=13&t=20934
Regards,
--
Martin Baute
so...@rootdirectory.de
> I could read that as:
>
> - "0x" is a prefix of a matching input sequence, and thus an input
> item (the "z" is being read and recognized as the first non-matching
> character after the input item);
Correct.
> - the input item "0x" is not a matching sequence, so the execution of
> the whole directive fails;
Correct.
> - because fscanf cannot push back everything it read, characters are
> consumed but not parsed, effectively leaving the input stream in an
> undefined position.
Wrong. The 'z' was read, found to be of the wrong form, and pushed back.
So, the next character to be read would be that 'z'.
Look for the '100ergs' example in fscanf in Standard C.
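To make that reading concrete, here is a minimal sketch of a "%x"-style
scanner that works on a string instead of a FILE and allows itself only
one character of pushback. Everything in it (model_stream, model_getc,
model_ungetc, model_scan_hex, MODEL_EOF) is made up for illustration and
is not taken from any library; sign and white-space handling are left
out on purpose.

```c
#include <ctype.h>
#include <stddef.h>

#define MODEL_EOF ( -1 )   /* hypothetical end-of-input marker */

/* Hypothetical string-backed "stream"; pos indexes the next unread char. */
struct model_stream
{
    const char * buf;
    size_t pos;
};

int model_getc( struct model_stream * s )
{
    return s->buf[ s->pos ] ? (unsigned char)s->buf[ s->pos++ ] : MODEL_EOF;
}

/* One character of pushback, mirroring the limit named in footnote 251. */
void model_ungetc( struct model_stream * s, int c )
{
    if ( c != MODEL_EOF && s->pos > 0 ) --s->pos;
}

/* Models "%x" without sign or white-space handling.
   Returns 1 and stores the value on success, 0 on a matching failure. */
int model_scan_hex( struct model_stream * s, unsigned * value )
{
    unsigned v = 0;
    int digits = 0;
    int c = model_getc( s );
    if ( c == '0' )
    {
        digits = 1;            /* "0" alone is a matching sequence */
        c = model_getc( s );
        if ( c == 'x' || c == 'X' )
        {
            digits = 0;        /* "0x" is only a prefix of one */
            c = model_getc( s );
        }
    }
    while ( c != MODEL_EOF && isxdigit( c ) )
    {
        v = v * 16 + ( isdigit( c ) ? c - '0' : tolower( c ) - 'a' + 10 );
        ++digits;
        c = model_getc( s );
    }
    model_ungetc( s, c );      /* first character after the input item */
    if ( digits == 0 ) return 0;   /* input item is not a matching sequence */
    *value = v;
    return 1;
}
```

Fed the buffer "0xz", model_scan_hex returns 0, leaves the value
untouched, and stops at position 2, so the next model_getc yields 'z'.
That corresponds to the "rc = 0, position = 2" lines reported for MinGW
and AIX above.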
---
Fred J. Tydeman Tydeman Consulting
tyd...@tybor.com Testing, numerics, programming
+1 (775) 358-9748 Vice-chair of PL22.11 (ANSI "C")
Sample C99+FPCE tests: http://www.tybor.com
Savers sleep well, investors eat well, spenders work forever.
Thanks for that answer!
That would mean MinGW and the IBM AIX libc got it right, and the Open
Source libs got it wrong? Heh. Who'd have figured...
But now, what would be the correct behavior for strtol()? In 7.20.1.4,
paragraph 4, it reads:
"The subject sequence is defined as the longest initial subsequence of
the input string, starting with the first non-white-space character,
that is of the expected form."
That wording is a bit different from that for fscanf(), and I would
read it to mean that, since "0x" is not of the expected form, strtol()
would be expected to parse only the "0" and point endptr to the "x".
Is that assumption (as in the IBM AIX libc) correct? Or should strtol()
also fail (as it does in MinGW)?
> That wording is a bit different from that for fscanf(), and I would
> read it to mean that, since "0x" is not of the expected form, strtol()
> would be expected to parse only the "0" and point endptr to the "x".
>
> Is that assumption (as in the IBM AIX libc) correct?
Yes, that is correct.
That is correct. Since there's no pushback limit for the strto*()
functions, they're required to get it right rather than being allowed to
give up like the I/O functions are. :-)
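For anyone who wants to check a library against that answer, here is a
small sketch. strtol_consumed() only wraps the real strtol() call from
the test program above; subject_length_base0() is a hand-rolled,
hypothetical helper that computes the length of the subject sequence
for base 0 as I read 7.20.1.4 (no sign or leading white-space handling),
so treat it as an illustration rather than a reference implementation.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: characters consumed by strtol( s, &end, 0 ). */
ptrdiff_t strtol_consumed( const char * s )
{
    char * end;
    (void)strtol( s, &end, 0 );
    return end - s;
}

/* Hypothetical helper: length of the longest initial subsequence that is
   of the expected form for base 0 (sign and white space not handled). */
size_t subject_length_base0( const char * s )
{
    size_t n = 0;
    if ( s[0] == '0' && ( s[1] == 'x' || s[1] == 'X' )
         && isxdigit( (unsigned char)s[2] ) )
    {
        /* "0x" counts only when at least one hex digit follows */
        n = 2;
        while ( isxdigit( (unsigned char)s[n] ) ) ++n;
    }
    else if ( s[0] == '0' )
    {
        /* a leading "0" starts an octal constant; the "0" itself is valid */
        n = 1;
        while ( s[n] >= '0' && s[n] <= '7' ) ++n;
    }
    else
    {
        while ( isdigit( (unsigned char)s[n] ) ) ++n;
    }
    return n;
}
```

For "0xz" the helper returns 1, i.e. only the "0" is parsed and endptr
would point at the "x". Comparing that against strtol_consumed( "0xz" )
shows which side of the results listed earlier a given library falls
on: glibc and AIX reported 1, newlib and MinGW reported 0.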
--
Larry Jones
Well, it's all a question of perspective. -- Calvin