C std: scanf conversion when a field is empty.

A. McKenney

Dec 24, 2009, 9:52:59 PM

I've been using sscanf to do some data conversions,
and I looked at Harbison & Steele to see what
it would do if there were, say, blanks instead
of an integer or float value.

While it was not very explicit, it seemed to suggest
that you didn't need any digits for a conversion like
%d to succeed, and if there weren't any digits, it
would put in the value zero and count it as a successful
conversion.

This isn't the behavior I want (I want it to fail), and
it looks like the Solaris 10 C++ compiler (and, I assume,
the C compiler) will do what I want. Some test code:

const char *str = " : : ";
int a = 0; int b = 0; int c = 0; int ret;
ret = sscanf( str, "%d:%d:%d", &a, &b, &c );
printf( "'%s' is scanned as %d fields: %d:%d:%d\n", str, ret, a, b, c );

yielded

' : : ' is scanned as 0 fields: 0:0:0

However, I worry that the next version of the C library
might do something different.

My version of Harbison & Steele doesn't say "C Standard" on
it, and the book I have that _does_ say "C standard"
(Plauger & Brodie) doesn't address the issue, so I don't
know what the standard says.

What does the C standard say about whether integer or
float conversion should fail if there are no digits?
--
comp.lang.c.moderated - moderation address: cl...@plethora.net -- you must
have an appropriate newsgroups line in your header for your mail to be seen,
or the newsgroup name in square brackets in the subject line. Sorry.

Barry Schwarz

Dec 25, 2009, 1:43:43 AM

On Thu, 24 Dec 2009 20:52:59 -0600 (CST), "A. McKenney"
<alan_mc...@yahoo.com> wrote:

>I've been using sscanf to do some data conversions,
>and I looked at Harbison & Steele to see what
>it would do if there were, say, blanks instead
>of an integer or float value.
>
>While it was not very explicit, it seemed to suggest
>that you didn't need any digits for a conversion like
>%d to succeed, and if there weren't any digits, it
>would put in the value zero and count it as a successful
>conversion.
>
>This isn't the behavior I want (I want it to fail), and
>it looks like the Solaris 10 C++ compiler (and, I assume,
>the C compiler) will do what I want. Some test code:
>
>const char *str = " : : ";
>int a = 0; int b = 0; int c = 0; int ret;

Initializing to a value other than 0 would have shown whether the
variables were being modified.

>ret = sscanf( str, "%d:%d:%d", &a, &b, &c );
>printf( "'%s' is scanned as %d fields: %d:%d:%d\n", str, ret, a, b,
>c );
>
>yielded
>
>' : : ' is scanned as 0 fields: 0:0:0

The return value cannot count the conversion that failed. Since that
is the first conversion in this case, 0 is the only compliant value.

>
>However, I worry that the next version of the C library
>might do something different.
>
>My version of Harbison & Steele doesn't say "C Standard" on
>it, and the book I have that _does_ say "C standard"
>(Plauger & Brodie) doesn't address the issue, so I don't
>know what the standard says.
>
>What does the C standard say about whether integer or
>float conversion should fail if there are no digits?

Why not download n1256 and see for yourself? As near as I can tell
from 7.19.6.2 (particularly paragraphs 4, 8, and 9), the %d directive
fails. There is no indication that a zero should be stored in the
corresponding int.

--
Remove del for email

Clive D. W. Feather

Dec 25, 2009, 7:24:18 PM

In message <clcm-2009...@plethora.net>, A. McKenney
<alan_mc...@yahoo.com> wrote:
>I've been using sscanf to do some data conversions,
>and I looked at Harbison & Steele to see what
>it would do if there were, say, blanks instead
>of an integer or float value.
>
>While it was not very explicit, it seemed to suggest
>that you didn't need any digits for a conversion like
>%d to succeed,

You have misunderstood this.

The definition of the scanf family is that, when a conversion
specification (such as %d) is reached, the following happens:
(1) Any white space is skipped (unless it's %[, %c, or %n).
(2) An input item is read. An input item is the longest possible
sequence of characters that does not exceed any field width, and which
could be a matching sequence or the first part of a matching sequence.
For %d, a matching sequence consists of:
(a) an optional sign
(b) one or more decimal digits
(3) If the length of the input item is zero, the directive fails.
(4) If the input item is not a matching sequence, the directive fails.
(5) Otherwise the input item is converted to a value and stored (unless
suppressed by a *).

For %d, the valid input sequences are:
- all matching sequences
- a sign on its own
and the latter is an error.

>const char *str = " : : ";
>int a = 0; int b = 0; int c = 0; int ret;
>ret = sscanf( str, "%d:%d:%d", &a, &b, &c );

Your example finds a colon after the white space. Therefore the input
sequence has zero characters in it and the conversion fails. None of the
variables a, b, or c will be affected by this code.

--
Clive D.W. Feather | Home: <cl...@davros.org>
Mobile: +44 7973 377646 | Web: <http://www.davros.org>
Please reply to the Reply-To address, which is: <cl...@davros.org>

Ersek, Laszlo

Dec 26, 2009, 7:49:11 AM

From: "Clive D. W. Feather" <cl...@davros.org>
Date: Fri, 25 Dec 2009 18:24:18 -0600 (CST)
Message-ID: <clcm-2009...@plethora.net>

> The definition of the scanf family is that, when a conversion
> specification (such as %d) is reached, the following happens:
> (1) Any white space is skipped (unless it's %[, %c, or %n).
> (2) An input item is read. An input item is the longest possible
> sequence of characters that does not exceed any field width, and which
> could be a matching sequence or the first part of a matching sequence.
> For %d, a matching sequence consists of:
> (a) an optional sign
> (b) one or more decimal digits
> (3) If the length of the input item is zero, the directive fails.
> (4) If the input item is not a matching sequence, the directive fails.
> (5) Otherwise the input item is converted to a value and stored (unless
> suppressed by a *).
>
> For %d, the valid input sequences are:
> - all matching sequences
> - a sign on its own
> and the latter is an error.

How are range errors reported? The description of the %d conversion
specifier refers to strtol(), and indeed, strtol() reports such errors
properly. However,

$ cat try.c

#include <errno.h>  /* errno */
#include <stdio.h>  /* sscanf(), fprintf() */
#include <string.h> /* strerror() */

int main(int argc, char **argv)
{
    int a, b, count;

    /* Guard against a missing argument before touching argv[1]. */
    if (argc < 2) {
        (void)fprintf(stderr, "usage: try 'A B'\n");
        return 1;
    }

    errno = 0;
    a = 0;
    b = 0;
    count = sscanf(argv[1], "%d %d", &a, &b);
    (void)fprintf(stdout, "count=%d a=%d b=%d \"%s\"\n",
        count, a, b, strerror(errno));

    return 0;
}

On my platform (gcc (Debian 4.3.2-1.1), x86_64 GNU/Linux, glibc 2.7-18)

$ ./try '99999999999999999999 4'

count=2 a=-1 b=4 "Numerical result out of range"

Looking at the C89 standard,

7.9.6.2 The fscanf function

[...]

Unless assignment suppression was indicated by a *, the result of
the conversion is placed in the object pointed to by the first
argument following the format argument that has not already
received a conversion result. If this object does not have an
appropriate type, or if the result of the conversion cannot be
represented in the space provided, the behavior is undefined.

Thus the program invocation above leads to undefined behavior and its
result is accidental. I believe this makes the scanf() family of functions
unsuitable for portably parsing integers from untrusted sources.

Personally, I use the following function to parse isolated decimal strings
as integers:

/* Requires <errno.h> and <stdlib.h>. Returns 0 if src is accepted, -1 if not. */
int long_parse(long *dest, const char *src, long min, long max)
{
    char *endptr;

    errno = 0;
    *dest = strtol(src, &endptr, 10);
    return ('\0' != *src && '\0' == *endptr && 0 == errno
            && min <= *dest && *dest <= max) ? 0 : -1;
}

In order to parse a value that can be converted to an int later on, min is
set to INT_MIN and max is set to INT_MAX. A list of examples, showing which
condition refuses each string (the last three rows assume a 64-bit long and
a 32-bit int):

string                   refused by
------                   ----------
"a"                      '\0' == *endptr
" "                      '\0' == *endptr
""                       '\0' != *src
"1 "                     '\0' == *endptr
"9223372036854775808"    0 == errno
"2147483648"             *dest <= max
"-2147483649"            min <= *dest

Leading whitespace is accepted.

Thanks,
lacos

Clive D. W. Feather

Dec 29, 2009, 10:27:31 AM

In message <clcm-2009...@plethora.net>, "Ersek, Laszlo"
<la...@caesar.elte.hu> wrote:
>> The definition of the scanf family is that,

[...]

>> For %d, the valid input sequences are:
>> - all matching sequences
>> - a sign on its own
>> and the latter is an error.
>
>How are range errors reported?

You mean a value too big to fit in the relevant variable? That's
undefined behaviour.

>The description of the %d conversion specifier refers to strtol(), and
>indeed, strtol() reports such errors properly.

The description refers to strtol() to define what a valid input sequence
looks like, not for all the semantics. There are known differences
between the two.

>However,
[...]

>On my platform (gcc (Debian 4.3.2-1.1), x86_64 GNU/Linux, glibc 2.7-18)
> $ ./try '99999999999999999999 4'
>
> count=2 a=-1 b=4 "Numerical result out of range"

Since the behaviour is undefined, that's a permissible behaviour.

>Thus the program invocation above leads to undefined behavior and its
>result is accidental.

Right.

>I believe this makes the scanf() family of functions unsuitable for
>portably parsing integers from untrusted sources.

Not completely so. For example, you can use field widths to limit the
length of the parsed value. If integers are 32 bits on your platform,
then %8d can never result in undefined behaviour from this cause.

Also note that %u doesn't have this problem. Nor does %*d.

--
Clive D.W. Feather | Home: <cl...@davros.org>
Mobile: +44 7973 377646 | Web: <http://www.davros.org>
Please reply to the Reply-To address, which is: <cl...@davros.org>