fscanf(), strtol() and the parsing of numbers


DevSolar

Sep 18, 2009, 4:09:54 AM
Hello,

I found C99 to be less than clear about the expected behaviour in
parsing numbers. What surprised me even more was the wildly varying
results I got from several libraries regarding this topic.

What would be the *expected* output of the following code?

#include <stdio.h>
#include <stdlib.h>

int main( void )
{
    char test1[] = "0xz"; // "0x" followed by non-digit
    int rc, i, count;
    unsigned u;
    char * endptr;

    /* set up test 1 */
    FILE * fh = tmpfile();
    fprintf( fh, "%s", test1 );
    rewind( fh );

    i = -1; count = -1;
    rc = fscanf( fh, "%i%n", &i, &count );
    printf( "case 1 - fscanf %%i: rc = %d, value = %2d, "
            "count = %2d, position = %ld\n",
            rc, i, count, ftell( fh ) );
    rewind( fh );
    u = -1; count = -1;
    rc = fscanf( fh, "%x%n", &u, &count );
    printf( "case 1 - fscanf %%u: rc = %d, value = %2d, "
            "count = %2d, position = %ld\n",
            rc, u, count, ftell( fh ) );
    i = strtol( test1, &endptr, 0 );
    printf( "case 1 - strtol: value = %d, "
            "count = %td\n", i, endptr - test1 );

    fclose( fh );
    return 0;
}


The relevant parts of the standard are, AFAICT:

6.4.4.1 Integer constants

7.19.6.2 The fscanf function, paragraphs 9, 10, and 12 as well as
footnote 251.

7.20.1.4 The strtol, strtoll, strtoul, and strtoull functions


I'll try to pinpoint my confusion...

7.19.6.2 paragraph 9 states: "An input item is defined as the longest
sequence of input characters [...] which is, or is a prefix of, a
matching input sequence. 251) The first character, if any, after the
input item remains unread."

Paragraph 10 continues: "If the input item is not a matching sequence,
the execution of the directive fails: this condition is a matching
failure."

Footnote 251 states: "fscanf pushes back at most one input character
onto the input stream. Therefore, some sequences that are acceptable
to strtod, strtol, etc., are unacceptable to fscanf."


I could read that as:

- "0x" is a prefix of a matching input sequence, and thus an input
item (the "z" is being read and recognized as the first non-matching
character after the input item);

- the input item "0x" is not a matching sequence, so the execution of
the whole directive fails;

- because fscanf cannot push back everything it read, characters are
consumed but not parsed, effectively leaving the input stream in an
undefined position.


I could *also* read that as:

- the "z" is never actually "read" as far as the standard is concerned
("...remains unread");

- fscanf uses its one character pushback to unget the "x" and parses
the longest matching sequence, i.e. "0".

- parsing does not fail.


Apparently, some library implementors came up with yet other
interpretations:

newlib
case 1 - fscanf %i: rc = 1, value = 0, count = 1, position = 1
case 1 - fscanf %u: rc = 1, value = 0, count = 1, position = 1
case 1 - strtol: value = 0, count = 0

MinGW
case 1 - fscanf %i: rc = 0, value = -1, count = -1, position = 2
case 1 - fscanf %u: rc = 0, value = -1, count = -1, position = 2
case 1 - strtol: value = 0, count = 0

glibc
case 1 - fscanf %i: rc = 1, value = 0, count = 2, position = 2
case 1 - fscanf %u: rc = 1, value = 0, count = 2, position = 2
case 1 - strtol: value = 0, count = 1

IBM AIX
case 1 - fscanf %i: rc = 0, value = -1, count = -1, position = 2
case 1 - fscanf %u: rc = 0, value = -1, count = -1, position = 2
case 1 - strtol: value = 0, count = 1


Since I am exactly in that position (implementing a library), I'd like
to hear some authoritative opinion on that matter.

Help?!?

This subject has already been discussed, with no definite results, at
the following websites:

http://stackoverflow.com/questions/1425730/difference-between-scanf-and-strtol-strtod-in-parsing-numbers
http://forum.osdev.org/viewtopic.php?f=13&t=20934

Regards,
--
Martin Baute
so...@rootdirectory.de

Fred J. Tydeman

Sep 18, 2009, 11:15:12 AM
On Fri, 18 Sep 2009 08:09:54 UTC, DevSolar <so...@rootdirectory.de> wrote:

> I could read that as:
>
> - "0x" is a prefix of a matching input sequence, and thus an input
> item (the "z" is being read and recognized as the first non-matching
> character after the input item);

Correct.



> - the input item "0x" is not a matching sequence, so the execution of
> the whole directive fails;

Correct

> - because fscanf cannot push back everything it read, characters are
> consumed but not parsed, effectively leaving the input stream in an
> undefined position.

Wrong. The 'z' was read, found to be of the wrong form, and pushed back.
So, the next character to be read would be that 'z'.

Look for the '100ergs' example in fscanf in Standard C.
---
Fred J. Tydeman Tydeman Consulting
tyd...@tybor.com Testing, numerics, programming
+1 (775) 358-9748 Vice-chair of PL22.11 (ANSI "C")
Sample C99+FPCE tests: http://www.tybor.com
Savers sleep well, investors eat well, spenders work forever.

DevSolar

Sep 18, 2009, 6:43:34 PM
On 18 Sep., 17:15, "Fred J. Tydeman" <tydeman.consult...@sbcglobal.net> wrote:
> On Fri, 18 Sep 2009 08:09:54 UTC, DevSolar <so...@rootdirectory.de> wrote:
> > - the input item "0x" is not a matching sequence, so the execution of
> >   the whole directive fails;
>
> Correct
>
> > - because fscanf cannot push back everything it read, characters are
> >   consumed but not parsed, effectively leaving the input stream in an
> >   undefined position.
>
> Wrong.  The 'z' was read, found to be of the wrong form, and pushed back.
> So, the next character to be read would be that 'z'.

Thanks for that answer!

That would mean MinGW and the IBM AIX libc got it right, and the Open
Source libs got it wrong? Heh. Who'd have figured...

But now, what would be the correct behavior for strtol()? In 7.20.1.4,
paragraph 4, it reads:

"The subject sequence is defined as the longest initial subsequence of
the input string, starting with the first non-white-space character,
that is of the expected form."

That wording is a bit different from that for fscanf(), and I would
read it to mean that, since "0x" is not of the expected form, strtol()
would be expected to parse only the "0" and point endptr to the "x".

Is that assumption (as in the IBM AIX libc) correct? Or should strtol()
also fail (as it does in MinGW)?

Fred J. Tydeman

Sep 20, 2009, 7:40:50 PM
On Fri, 18 Sep 2009 22:43:34 UTC, DevSolar <so...@rootdirectory.de> wrote:

> That wording is a bit different from that for fscanf(), and I would
> read it to mean that, since "0x" is not of the expected form, strtol()
> would be expected to parse only the "0" and point endptr to the "x".
>
> Is that assumption (as in the IBM AIX libc) correct?

Yes, that is correct.

lawrenc...@siemens.com

Sep 20, 2009, 7:58:56 PM
DevSolar <so...@rootdirectory.de> wrote:
>
> But now, what would be the correct behavior for strtol()? In 7.20.1.4,
> paragraph 4, it reads:
>
> "The subject sequence is defined as the longest initial subsequence of
> the input string, starting with the first non-white-space character,
> that is of the expected form."
>
> That wording is a bit different from that for fscanf(), and I would
> read it to mean that, since "0x" is not of the expected form, strtol()
> would be expected to parse only the "0" and point endptr to the "x".

That is correct. Since there's no pushback limit for the strto*()
functions, they're required to get it right rather than being allowed to
give up like the I/O functions are. :-)
--
Larry Jones

Well, it's all a question of perspective. -- Calvin
