scanf("%d", &n) behaviour on overflow


Szabolcs Nagy

Aug 26, 2012, 11:45:08 AM
in c11, section 7.21.6.2 specifies fscanf
(7.19.6.2 in c99), but the %d case seems
to be under specified

what can the following code output:

#include <stdio.h>
#include <errno.h>
int main(void)
{
    int r, n;
    errno = 0;
    r = sscanf("12345678901234", "%d", &n);
    printf("%d %d %d\n", r, n, errno);
}

the input format for %d is required to be the
same as expected for strtol, but there is no
detail about the result stored after the
conversion (so it's not clear what happens
in the overflow/underflow case)

7.21.6.2p10:
"Except in the case of a % specifier, the input item (or, in the case
of a %n directive, the count of input characters) is converted to a
type appropriate to the conversion specifier. If the input item is not
a matching sequence, the execution of the directive fails: this
condition is a matching failure. Unless assignment suppression was
indicated by a *, the result of the conversion is placed in the object
pointed to by the first argument following the format argument that
has not already received a conversion result. If this object does not
have an appropriate type, or if the result of the conversion cannot be
represented in the object, the behavior is undefined."

the last line here could be interpreted
in a way that if the result is too big
for a signed int then the behaviour is
undefined

but that's bad as it means scanf and fscanf
must not be used with %d on unchecked input
(so i think either the result should be
explicitly unspecified or well defined)


James Kuyper

Aug 26, 2012, 4:15:00 PM
On 08/26/2012 11:45 AM, Szabolcs Nagy wrote:
> in c11, section 7.21.6.2 specifies fscanf
> (7.19.6.2 in c99), but the %d case seems
> to be under specified
>
> what can the following code output:
>
> #include <stdio.h>
> #include <errno.h>
> int main(void)
> {
> int r, n;
> errno = 0;
> r = sscanf("12345678901234", "%d", &n);
> printf("%d %d %d\n", r, n, errno);
> }
...
> 7.21.6.2p10:
> "... if the result of the conversion cannot be
> represented in the object, the behavior is undefined."
>
> the last line here could be interpreted
> in a way that if the result is too big
> for a signed int then the behaviour is
> undefined

That's correct.

> but that's bad

So is that.

> ... as it means scanf and fscanf
> must not be used with %d on unchecked input

and so is that.

> (so i think either the result should be
> explicitly unspecified or well defined)

It would be nice.


--
James Kuyper

Szabolcs Nagy

Aug 26, 2012, 6:44:49 PM
James Kuyper <james...@verizon.net> wrote:
> On 08/26/2012 11:45 AM, Szabolcs Nagy wrote:
>> 7.21.6.2p10:
>> "... if the result of the conversion cannot be
>> represented in the object, the behavior is undefined."
>>
>> the last line here could be interpreted
>> in a way that if the result is too big
>> for a signed int then the behaviour is
>> undefined
>
> That's correct.
>

but then scanf is no better than gets

actually it's even worse as bad input
can invoke undefined behaviour in
unexpected ways

and many of the scanf examples promote
the use of %d and none of them mention
anything about undefined behaviour

and in annex I, where the undefined
behaviours are collected, this particular
item is not listed

and fscanf_s in annex K, which tries to
"mitigate security vulnerabilities", does
not address this issue either

these suggest to me that the ub interpretation
is not the right one, otherwise at least
there should be a note about it

Keith Thompson

Aug 26, 2012, 7:58:27 PM
Szabolcs Nagy <n...@port70.net> writes:
> James Kuyper <james...@verizon.net> wrote:
>> On 08/26/2012 11:45 AM, Szabolcs Nagy wrote:
>>> 7.21.6.2p10:
>>> "... if the result of the conversion cannot be
>>> represented in the object, the behavior is undefined."
>>>
>>> the last line here could be interpreted
>>> in a way that if the result is too big
>>> for a signed int then the behaviour is
>>> undefined
>>
>> That's correct.
>
> but then scanf is no better than gets
>
> actually it's even worse as bad input
> can invoke undefined behaviour in
> unexpected ways

It's not as bad as gets(). scanf() *can* be used safely, if you're
sufficiently careful about the format string; for example "%20s" is ok
as long as the target array is big enough. (And sscanf is safe if you
exercise control over the input string.)

But yes, it's very easy to use it unsafely.
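
For illustration, a minimal sketch of the bounded "%20s" pattern mentioned above. The names read_word and WORD_MAX are hypothetical and not from the thread; the only assumption is that the target array holds one byte more than the field width, to leave room for the terminating null character.

```c
#include <assert.h>
#include <stdio.h>

#define WORD_MAX 20   /* hypothetical name; must match the width in "%20s" */

/* Reads one whitespace-delimited word of at most WORD_MAX characters
 * from src into buf, which must have room for WORD_MAX + 1 bytes.
 * Returns 1 if a word was stored, 0 otherwise. */
static int read_word(const char *src, char buf[WORD_MAX + 1])
{
    assert(src != NULL && buf != NULL);
    /* The field width stops %s after 20 characters, so the write,
     * including the terminating '\0', stays inside buf. */
    return sscanf(src, "%20s", buf) == 1;
}
```

A longer word is not an error here: the first 20 characters are stored and the rest stay in the input for the next directive.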

> and many of the scanf examples promote
> the use of %d and none of them mention
> anything about undefined behaviour
>
> and in annex I, where the undefined
> behaviours are collected, this particular
> item is not listed

That's in J.2, at least as of N1570, the latest draft. Yes, I agree
that that's a serious oversight.

> and fscanf_s in annex K, which tries to
> "mitigate security vulnerabilities", does
> not address this issue either
>
> these suggest to me that the ub interpretation
> is not the right one, otherwise at least
> there should be a note about it

But I can't think of another interpretation for N1570 7.21.6.2p10:

If this object does not have an appropriate type, or if the
result of the conversion cannot be represented in the object,
the behavior is undefined.

I would *love* to see this corrected.

One solution would be to say that if the result cannot be represented
in the object, it's treated as a matching failure. Another would be
to say that an implementation-defined value is stored in the object
(but that would make it difficult to detect the problem).
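
Until something like that is adopted, the first option can be approximated in user code. The following is only a sketch, assuming the caller already has the text in a string; parse_int is a hypothetical helper, not anything the standard provides. It relies on strtol, whose out-of-range behaviour is defined, and reports an unrepresentable value the same way as a matching failure.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts the decimal integer at the start of s.
 * Returns 1 and stores the value in *out on success. Returns 0 and
 * leaves *out untouched if there are no digits or if the value does
 * not fit in an int. */
static int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s)
        return 0;   /* nothing converted: ordinary matching failure */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;   /* out of range for int: reported, not undefined */
    *out = (int)v;
    return 1;
}
```

With the input from the original question, parse_int("12345678901234", &n) returns 0 on an implementation with 32-bit int, and n keeps its previous value.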

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Lew Pitcher

Aug 26, 2012, 9:18:18 PM
On Sunday 26 August 2012 19:58, in comp.std.c, ks...@mib.org wrote:

> Szabolcs Nagy <n...@port70.net> writes:
[snip]
>> and fscanf_s in annex K, which tries to
>> "mitigate security vulnerabilities", does
>> not address this issue either
>>
>> these suggest to me that the ub interpretation
>> is not the right one, otherwise at least
>> there should be a note about it
>
> But I can't think of another interpretation for N1570 7.21.6.2p10:
>
> If this object does not have an appropriate type, or if the
> result of the conversion cannot be represented in the object,
> the behavior is undefined.
>
> I would *love* to see this corrected.
>
> One solution would be to say that if the result cannot be represented
> in the object,

Keith, in this sense, what do you mean by "the object"?

If I read the excerpted passage correctly (within the context of the
standard), "the object" referred to is "the object pointed to by the first
argument following the format argument that has not already received a
conversion result".

How do you propose that the *scanf*() functions determine what values can be
represented in the "object" in question? Wouldn't the *scanf*() functions
need to know the /type/ of the object first, rather than the (possibly
incorrect) type *as specified by the format string* ?

> it's treated as a matching failure. Another would be
> to say that an implementation-defined value is stored in the object
> (but that would make it difficult to detect the problem).



--
Lew Pitcher
"In Skills, We Trust"

Keith Thompson

Aug 26, 2012, 11:55:56 PM
Lew Pitcher <lpit...@teksavvy.com> writes:
> On Sunday 26 August 2012 19:58, in comp.std.c, ks...@mib.org wrote:
[...]
>> But I can't think of another interpretation for N1570 7.21.6.2p10:
>>
>> If this object does not have an appropriate type, or if the
>> result of the conversion cannot be represented in the object,
>> the behavior is undefined.
>>
>> I would *love* to see this corrected.
>>
>> One solution would be to say that if the result cannot be represented
>> in the object,
>
> Keith, in this sense, what do you mean by "the object"?
>
> If I read the excerpted passage correctly (within the context of the
> standard), "the object" referred to is "the object pointed to by the first
> argument following the format argument that has not already received a
> conversion result".

Yes.

> How do you propose that the *scanf*() functions determine what values can be
> represented in the "object" in question? Wouldn't the *scanf*() functions
> need to know the /type/ of the object first, rather than the (possibly
> incorrect) type *as specified by the format string* ?

If the object isn't of the appropriate type, the behavior is undefined
anyway.

James Kuyper

Aug 27, 2012, 8:06:31 AM
On 08/26/2012 06:44 PM, Szabolcs Nagy wrote:
> James Kuyper <james...@verizon.net> wrote:
>> On 08/26/2012 11:45 AM, Szabolcs Nagy wrote:
>>> 7.21.6.2p10:
>>> "... if the result of the conversion cannot be
>>> represented in the object, the behavior is undefined."
>>>
>>> the last line here could be interpreted
>>> in a way that if the result is too big
>>> for a signed int then the behaviour is
>>> undefined
>>
>> That's correct.
....
> these suggest to me that the ub interpretation
> is not the right one, otherwise at least
> there should be a note about it

Well, the clause in question very explicitly says that under some
circumstance the behavior IS undefined. To me "the result is too big for
a signed int" is a pretty close match to "the result of the conversion
cannot be represented in the object". All I have to do to make the match
perfect is to assume that "the result" refers to "the result of the
conversion", and that signed int is the type of the relevant object. If
this isn't an example of such a circumstance, could you give an example
of a circumstance that you believe does actually fit the description?
--
James Kuyper

Szabolcs Nagy

Aug 27, 2012, 8:54:20 AM
actually it's not clear to me what "result" is,
eg. it could be the return value of strtol, which
is well defined on overflowing inputs
(although it returns long int so there still can
be an implementation defined conversion to signed
int, but that's better than ub)

rereading the text it seems to me that it's not
specified anywhere what the result of the
conversion is in general, only that the format of
the input must match and then something is put
into the appropriate object

so a conforming implementation could put anything
into the given object after a match is found

but this would render scanf useless

James Kuyper

Aug 27, 2012, 9:19:48 AM
No, it could not be. Footnote 285 on 7.21.6.2p9 points out that the
requirements of that paragraph imply that "fscanf pushes back at most
one input character onto the input stream. Therefore, some sequences
that are acceptable to strtod, strtol, etc., are unacceptable to
fscanf." The "result" referred to must be the result of an algorithm
similar to that used by strtol(), but modified to not require more than
one input character of pushback. 7.21.6.2p9 also allows for an "input
item" to include a sequence of characters representing a value too large
to be represented in an object of the specified type, and 7.21.6.2p10
allows for the algorithm used by fscanf() to be so different from that
used by strtod(), strtol(), etc, that the permitted range of behavior
when processing such an input item can only be described as "undefined".

I don't like this, it's clearly not necessary, and I favor changing it.
But that's what the standard currently says.
--
James Kuyper

James Kuyper

Aug 27, 2012, 9:40:26 AM
Even if you're right with that interpretation, that just means that
section 7.21.6.2p10 says that "... if the result of the conversion
[returned by strtol] cannot be represented in the object, the behavior
is undefined." If the input sequence describes a number larger than
LONG_MAX, strtol() returns LONG_MAX; if it describes one smaller than
LONG_MIN, strtol() returns LONG_MIN. Your interpretation of what
"result" refers to produces different consequences than mine only on
systems where LONG_MAX==INT_MAX, or LONG_MIN==INT_MIN. Otherwise, the
result when a value is greater than INT_MAX or smaller than INT_MIN
"cannot be represented in the object", so according to 7.21.6.2p10, "the
behavior is undefined".
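
To make the three cases concrete, here is a small sketch. The function name classify and its return codes are hypothetical, chosen only for this illustration. It separates values that fit in int, values that fit in long but not in int, and values for which strtol itself reports a range error (returning LONG_MAX or LONG_MIN and setting errno to ERANGE).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical classifier for a decimal string s:
 *   0 = the value fits in int
 *   1 = the value fits in long but not in int
 *   2 = strtol reported ERANGE (it returned LONG_MAX or LONG_MIN) */
static int classify(const char *s)
{
    long v;

    assert(s != NULL);
    errno = 0;
    v = strtol(s, NULL, 10);
    if (errno == ERANGE)
        return 2;
    if (v < INT_MIN || v > INT_MAX)
        return 1;
    return 0;
}
```

Where long and int have the same range, case 1 can never occur, which is exactly the situation in which the two readings of "result" give different consequences.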
--
James Kuyper

Szabolcs Nagy

Aug 27, 2012, 11:06:08 AM
James Kuyper <james...@verizon.net> wrote:
> On 08/27/2012 08:54 AM, Szabolcs Nagy wrote:
>> eg. it could be the return value of strtol, which
>> is well defined on overflowing inputs
>
> No, it could not be. Footnote 285 on 7.21.6.2p9 points out that the
> requirements of that paragraph imply that "fscanf pushes back at most
> one input character onto the input stream. Therefore, some sequences
> that are acceptable to strtod, strtol, etc., are unacceptable to
> fscanf." The "result" referred to must be the result of an algorithm
> similar to that used by strtol(), but modified to not require more than
> one input character of pushback.

i don't think the number of push backs has relevance to the
accepted number formats

strtol does not need more than one push back for decimal strings

and it is explicitly specified that the %d conversion specifier
accepts the same input as strtol(input, 0, 10)

the %x case is interesting: "0xy" would need two push backs to
match "0" only, but i don't think that changes the acceptable
input, it's only an inconvenience to the libc implementation
(which needs to support more than one character push back, but
that's most likely needed for multibyte string parsing anyway).

Szabolcs Nagy

Aug 27, 2012, 11:13:30 AM
Szabolcs Nagy <n...@port70.net> wrote:
> James Kuyper <james...@verizon.net> wrote:
>> No, it could not be. Footnote 285 on 7.21.6.2p9 points out that the
>> requirements of that paragraph imply that "fscanf pushes back at most
>> one input character onto the input stream. Therefore, some sequences
>> that are acceptable to strtod, strtol, etc., are unacceptable to
>> fscanf." The "result" referred to must be the result of an algorithm
>> similar to that used by strtol(), but modified to not require more than
>> one input character of pushback.
>
> i don't think the number of push backs has relevance to the
> accepted number formats
>

ok i checked that note, you are right and i was wrong

> strtol does not need more than one push back for decimal strings
>
> and it is explicitly specified that the %d conversion specifier
> accepts the same input as strtol(input, 0, 10)
>

this still stands though

> the %x case is interesting: "0xy" would need two push backs to
> match "0" only, but i don't think that changes the acceptable
> input, it's only an inconvenience to the libc implementation

i'm not sure what's the solution here
(ie. what is exactly the acceptable input for %x)
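
For comparison, the strtol side of the "0xy" case can be checked directly. This is only a sketch with a hypothetical helper name, hex_prefix_len. It shows what strtol consumes, following the "longest initial subsequence of the expected form" rule in its definition, and it says nothing about what a given scanf implementation does with the same input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns how many characters of s strtol
 * consumes when asked to convert a base-16 number. */
static size_t hex_prefix_len(const char *s)
{
    char *end;

    assert(s != NULL);
    (void)strtol(s, &end, 16);   /* only the end pointer matters here */
    return (size_t)(end - s);
}
```

For "0xy" the subject sequence is just "0", so a conforming strtol consumes one character and leaves the end pointer at the 'x'. A stream-based fscanf has already read past the 'x' by the time it sees the 'y', which is where the one-character pushback limit matters.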

lawrenc...@siemens.com

Aug 27, 2012, 10:19:35 AM
Szabolcs Nagy <n...@port70.net> wrote:
>
> and in annex I, where the undefined
> behaviours are collected, this particular
> item is not listed

In the current standard, it's Annex J and it *is* listed:

-- The result of a conversion by one of the formatted input functions
cannot be represented in the corresponding object, or the receiving
object does not have an appropriate type (7.21.6.2, 7.29.2.2).
--
Larry Jones

Why is it you always rip your pants on the day everyone has to
demonstrate a math problem at the chalkboard? -- Calvin

jacob navia

Aug 27, 2012, 2:34:39 PM
On 27/08/12 16:19, lawrenc...@siemens.com wrote:
> Szabolcs Nagy <n...@port70.net> wrote:
>>
>> and in annex I, where the undefined
>> behaviours are collected, this particular
>> item is not listed
>
> In the current standard, it's Annex J and it *is* listed:
>
> -- The result of a conversion by one of the formatted input functions
> cannot be represented in the corresponding object, or the receiving
> object does not have an appropriate type (7.21.6.2, 7.29.2.2).
>


This is better, but maybe it would be possible to specify an errno value
for that case? Or a means for Xscanf functions to detect overflow and
pass it to the calling program?

As it stands now, I have to scan the number like scanf, then see if
there could be any overflow, THEN call scanf.