scanf and integer overflow, is it undefined behavior

Kees Bakker

Apr 30, 2014, 6:04:33 AM
Hi,

This item has been discussed before on comp.std.c, but I still am
unclear what the outcome is. Is it undefined behaviour or not?
(( Search for Subject: scanf("%d", &n) behaviour on overflow ))

With the danger of giving an example and then have the discussion
go in unwanted directions, I still want to give one.

Let's say I have a string "194" and the sscanf format is %hhd or %hhi.

signed char c;
sscanf("194", "%hhi", &c);

For some implementations you'd get c==-62. Is this the defined
behavior or is it undefined?

I'd argue that 194 does not fit in a signed char (because of hhi),
and that the last part of C99 7.19.6.2p10 comes into action.
"... if the result of the conversion cannot be represented
in the object, the behavior is undefined."
--
Kees Bakker

James Kuyper

Apr 30, 2014, 7:52:02 AM
On 04/30/2014 06:04 AM, Kees Bakker wrote:
> Hi,
>
> This item has been discussed before on comp.std.c, but I still am
> unclear what the outcome is. Is it undefined behaviour or not?
> (( Search for Subject: scanf("%d", &n) behaviour on overflow ))
>
> With the danger of giving an example and then have the discussion
> go in unwanted directions, I still want to give one.
>
> Let's say I have a string "194" and the sscanf format is %hhd or %hhi.
>
> signed char c;
> sscanf("194", "%hhi", &c);
>
> For some implementations you'd get c==-62. Is this the defined
> behavior or is it undefined?
>
> I'd argue that 194 does not fit in a signed char (because of hhi),

That depends upon the value of SCHAR_MAX, but yes, that is likely to be
the case.
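
Whether 194 fits can be checked against <limits.h> instead of being assumed. A minimal sketch; the helper name fits_in_schar is made up for illustration and does not come from the thread:

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if v is representable in signed char,
 * 0 otherwise. SCHAR_MIN and SCHAR_MAX come from <limits.h>. */
int fits_in_schar(long v)
{
    return v >= SCHAR_MIN && v <= SCHAR_MAX;
}
```

On an implementation with 8-bit char, SCHAR_MAX is 127, so fits_in_schar(194) would return 0.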

> and that the last part of C99 7.19.6.2p10 comes into action.
> "... if the result of the conversion cannot be represented
> in the object, the behavior is undefined."

That's 7.21.6.2p10 in the current version of the standard, and it seems
to be pretty clear on the subject. On what basis was it ever argued
otherwise? (I failed to locate the previous discussion you refer to above).
--
James Kuyper

Kees Bakker

Apr 30, 2014, 10:21:41 AM
On 30-04-14 13:52, James Kuyper wrote:
> On 04/30/2014 06:04 AM, Kees Bakker wrote:
>> Hi,
>>
>> This item has been discussed before on comp.std.c, but I still am
>> unclear what the outcome is. Is it undefined behaviour or not?
>> (( Search for Subject: scanf("%d", &n) behaviour on overflow ))
>>
>> With the danger of giving an example and then have the discussion
>> go in unwanted directions, I still want to give one.
>>
>> Let's say I have a string "194" and the sscanf format is %hhd or %hhi.
>>
>> signed char c;
>> sscanf("194", "%hhi", &c);
>>
>> For some implementations you'd get c==-62. Is this the defined
>> behavior or is it undefined?
>>
>> I'd argue that 194 does not fit in a signed char (because of hhi),
>
> That depends upon the value of SCHAR_MAX, but yes, that is likely to be
> the case.
>
>> and that the last part of C99 7.19.6.2p10 comes into action.
>> "... if the result of the conversion cannot be represented
>> in the object, the behavior is undefined."
>
> That's 7.21.6.2p10 in the current version of the standard, and it seems
> to be pretty clear on the subject.

Sorry to be persistent, I'm not trying to be offensive by any means, but ...
even though you say that it seems pretty clear, you still don't explicitly say:

A. yes, it is undefined behavior
B. no, it is not undefined behavior

> On what basis was it ever argued
> otherwise?

Well, I'm having a discussion with someone who I'm trying to convince that it
is undefined behavior. It has to do with validation of our (i.e. TASKING) scanf
implementation.

> (I failed to locate the previous discussion you refer to above).
>

It's the first Google hit when you search for:
comp.std.c scanf("%d", &n) behaviour on overflow

--
Kees

James Kuyper

Apr 30, 2014, 11:19:51 AM
Sorry, I thought it was clear that I was agreeing with your argument,
and in particular, with the applicability of the specified clause. Yes,
if 194 > SCHAR_MAX, then "the result of the conversion" to signed char
"cannot be represented in the object" so "the behavior is undefined".

>> On what basis was it ever argued
>> otherwise?
>
> Well, I'm having a discussion with someone who I'm trying to convince that it
> is undefined behavior. It has to do with validation of our (i.e. TASKING) scanf
> implementation.
>
>> (I failed to locate the previous discussion you refer to above).
>>
>
> It's the first Google hit when you search for:
> comp.std.c scanf("%d", &n) behaviour on overflow

Sorry - I used the American spelling of "behavior" on my previous search.

I found a discussion with that Subject: header, started by Szabolcs Nagy
on 2012-08-26, which produced responses from me, Keith Thompson, Lew
Pitcher, Larry Jones, and jacob navia. Nagy actually presented the
correct argument, which he did not like, because that conclusion is
inconvenient. All of us except Lew agreed that the argument was correct,
that the behavior is undefined, and that it is inconvenient for this to
be the case. Lew just asked a question, without indicating whether or
not he agreed.

Keith Thompson

Apr 30, 2014, 11:41:25 AM
Yes, it is undefined behavior.

7.21.6.2p10 is very clear; it explicitly states that the behavior is
undefined. I'm curious why you would think it could be otherwise. (I
also wonder why your question might be considered offensive; it isn't.)

>> On what basis was it ever argued
>> otherwise?
>
> Well, I'm having a discussion with someone who I'm trying to convince
> that it is undefined behavior. It has to do with validation of our
> (i.e. TASKING) scanf implementation.

Does this person argue that the behavior is not undefined?

(If you're *implementing* scanf, it's probably a good idea to have it
behave more sanely than the standard requires. I suggest treating a
numeric overflow as a matching failure.)
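
For callers who cannot rely on the library doing this, one possible way to get the same effect is to parse with strtol and range-check before storing. This is only a sketch: the helper name parse_schar and its return convention are assumptions made for illustration, not part of any standard API.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a signed char from s, treating an
 * out-of-range value like a matching failure.
 * Returns 1 on success (value stored in *out), 0 on failure
 * (*out is left untouched).
 */
int parse_schar(const char *s, signed char *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 0);     /* base 0 accepts the same prefixes as %i */
    if (end == s)
        return 0;               /* no digits were consumed */
    if (errno == ERANGE)
        return 0;               /* value does not even fit in long */
    if (v < SCHAR_MIN || v > SCHAR_MAX)
        return 0;               /* fits in long, but not in signed char */
    *out = (signed char)v;      /* in range, so the conversion is exact */
    return 1;
}
```

With SCHAR_MAX equal to 127, parse_schar("194", &c) reports failure and never writes to c, so the question of what value ends up in the object does not arise.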

>> (I failed to locate the previous discussion you refer to above).
>
> It's the first Google hit when you search for:
> comp.std.c scanf("%d", &n) behaviour on overflow

https://groups.google.com/forum/#!msg/comp.std.c/7mSxlJir4Eo/MmxZ0ejUGv8J

Other than the original post in that thread, I see nobody arguing that
the behavior is not undefined. The original poster seems to have
*assumed* that the behavior must be well defined or unspecified, because
leaving it undefined would be bad. Conclusion: Yes, it's bad, and it
makes scanf() nearly useless for numeric input.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Kaz Kylheku

Apr 30, 2014, 3:14:48 PM
On 2014-04-30, Kees Bakker <sp...@tasking.nl> wrote:
> Hi,
>
> This item has been discussed before on comp.std.c, but I still am
> unclear what the outcome is. Is it undefined behaviour or not?
> (( Search for Subject: scanf("%d", &n) behaviour on overflow ))

It is undefined behavior. scanf is not a robust mechanism for
general integer input.

It is robust if you use it with a limit on how many characters
are scanned, like "%3d".
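
A sketch of that idea, assuming the target is a plain int: with a field width of 3 the input item is at most three characters, sign included, so the converted value lies between -99 and 999 and always fits. The helper name scan_small_int is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads at most three characters of a decimal
 * integer from s. Returns 1 and stores the value on success, 0 if
 * no conversion took place. */
int scan_small_int(const char *s, int *out)
{
    return sscanf(s, "%3d", out) == 1;
}
```

Given "123456789012345678901234567890", this stores 123 and leaves the remaining digits unread; given "-12345" it stores -12, because the sign counts toward the width.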

> With the danger of giving an example and then have the discussion
> go in unwanted directions, I still want to give one.
>
> Let's say I have a string "194" and the sscanf format is %hhd or %hhi.
>
> signed char c;
> sscanf("194", "%hhi", &c);
>
> For some implementations you'd get c==-62. Is this the defined
> behavior or is it undefined?
>
> I'd argue that 194 does not fit in a signed char (because of hhi),
> and that the last part of C99 7.19.6.2p10 comes into action.
> "... if the result of the conversion cannot be represented
> in the object, the behavior is undefined."

Undefined behavior creates the possibility of trapping the situation. That is to
say scanf is allowed to terminate the program with or without a diagnostic.

Pragmatically speaking, you cannot write C code in which manipulation of 194 in
signed char is undefined due to overflow. For instance, if you have two char
variables whose arithmetic sum is 194, and add them together with +, what is
added is the promoted values (type int) which is reliably 194, so the addition
itself doesn't overflow. The assignment of the resulting value back to type
char is an implementation-defined conversion, not undefined behavior.
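
The two steps can be separated in a small sketch (function names are made up for illustration). The addition happens in int after promotion; only the narrowing conversion depends on the implementation.

```c
#include <limits.h>

/* a and b are promoted to int before the addition, so the sum of two
 * signed char values cannot overflow int. */
int add_promoted(signed char a, signed char b)
{
    return a + b;
}

/* Converting an out-of-range int to signed char gives an
 * implementation-defined result (or, per C99, may raise an
 * implementation-defined signal). It is not undefined behavior. */
signed char narrow_to_schar(int v)
{
    return (signed char)v;
}
```

add_promoted(100, 94) is 194 everywhere. On common two's-complement implementations with 8-bit char, narrow_to_schar(194) would be expected to give -62, the value mentioned in the original post, but that outcome is the implementation's choice and not a guarantee of the standard.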

So if scanf ignores the situation (doesn't diagnose), it would have to be
contrived in some language other than C, or else very weirdly written, in order
to produce a value other than that of the expression (signed char) 194, or some
other surprising behavior.

Kaz Kylheku

Apr 30, 2014, 3:25:22 PM
On 2014-04-30, Kees Bakker <sp...@tasking.nl> wrote:
That person probably isn't using "undefined behavior" to strictly mean
"behavior not defined by some edition of the ISO C standard".

He or she may have a broader definition: behavior not defined by the language,
the mainstream toolchains and libraries we are using.

Even if we take the broader view, you probably won't find any document which
assures you of the behavior of scanning "194" to a signed char, short of
viewing the source code of your C library.

If you don't have the source, you can make an educated guess about how it is
almost certainly written. The argument is sound, but doesn't quite amount to a
definition of behavior.

Exhaustive empirical testing of all possible domain values of all the inputs
also doesn't amount to a definition of behavior; the data from testing must be
combined with the hypothesis that the behavior depends only on those inputs,
and not on some hidden parameters, like the phase of the moon.

Programmers do not understand "undefined behavior". I once made an answer on
the stackoverflow site (to a question about undefined behavior) in which I
argued that "undefined behavior" isn't "wrong behavior" or "bad behavior".

I gave the example that #include <unistd.h> is undefined behavior.
It is a necessary undefined behavior which you need in order to get the C
job done on a Unix platform.

A big comment war ensued, mostly by bullies shouting "it's not undefined
behavior!" and the answer was heavily modded down.

I was not able to successfully argue that it's only ISO C undefined behavior,
not absolute undefined behavior (not defined by any document on the planet).

People couldn't wrap their heads around the fact that #include <unistd.h> can do
something other than produce a diagnostic about a missing header, or else
include some part of the Unix interface: namely that it can *succeed*, and
bring in unknown contents into the translation unit, on a platform that happens
not to conform to POSIX, and happens to have a unistd.h file in the include
path.
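
One way to illustrate the difference between "a file called unistd.h was found" and "the implementation claims POSIX" is sketched below. It assumes a compiler that provides __has_include (standard as of C23, and a common extension in GCC and Clang); FOUND_UNISTD_H and unistd_claims_posix are hypothetical names. _POSIX_VERSION is the macro POSIX requires <unistd.h> to define.

```c
/* If the compiler lacks __has_include, this sketch simply skips the
 * include and reports 0. */
#if defined(__has_include)
#  if __has_include(<unistd.h>)
#    include <unistd.h>
#    define FOUND_UNISTD_H 1
#  endif
#endif

/* Hypothetical helper: returns 1 only if a <unistd.h> was found AND it
 * defined _POSIX_VERSION; returns 0 otherwise. Finding the header by
 * itself says nothing about what it contains. */
int unistd_claims_posix(void)
{
#if defined(FOUND_UNISTD_H) && defined(_POSIX_VERSION)
    return 1;
#else
    return 0;
#endif
}
```

Even a result of 1 only reflects what the header says about itself; the guarantee about its contents comes from POSIX and the platform documentation, not from ISO C.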

Keith Thompson

Apr 30, 2014, 4:16:40 PM
Kaz Kylheku <k...@kylheku.com> writes:
[...]
> Pragmatically speaking, you cannot write C code in which manipulation
> of 194 in signed char is undefined due to overflow. For instance, if
> you have two char variables whose arithmetic sum is 194, and add them
> together with +, what is added is the promoted values (type int) which
> is reliably 194, so the addition itself doesn't overflow. The
> assignment of the resulting value back to type char is an
> implementation-defined conversion, not undefined behavior.

Or, as of C99 and C11, the conversion can raise an
implementation-defined signal; the behavior if that happens *can*
be undefined (i.e., not defined by the C standard). As a QoI issue,
I'd expect any implementation that does this to define the behavior
that occurs when the signal is raised, though the standard doesn't
require it. And I know of no implementation that raises a signal
in this case anyway.

I suspect the standard would be improved by (a) dropping the
permission to raise an implementation-defined signal, and (b)
requiring the behavior of an out-of-range integer-to-signed
conversion to be the "obvious" result for one of the three permitted
signed integer representations. It seems likely that no existing
implementation would have to be changed. (Obviously a bit more
wordsmithing would be needed.)

Kees Bakker

May 1, 2014, 3:18:57 AM
On 30-04-14 17:19, James Kuyper wrote:
> On 04/30/2014 10:21 AM, Kees Bakker wrote:
[...]
>
> I found a discussion with that Subject: header, started by Szabolcs Nagy
> on 2012-08-26, which produced responses from me, Keith Thompson, Lew
> Pitcher, Larry Jones, and jacob navia. Nagy actually presented the
> correct argument, which he did not like, because that conclusion is
> inconvenient. All of us except Lew agreed that the argument was correct,
> that the behavior is undefined, and that it is inconvenient for this to
> be the case. Lew just asked a question, without indicating whether or
> not he agreed.
>

Thank you James for this excellent summary.

Tim Rentsch

Jun 9, 2014, 7:58:43 PM
Kaz Kylheku <k...@kylheku.com> writes:

[snip]

> Programmers do not understand "undefined behavior". I once made an answer on
> the stackoverflow site (to a question about undefined behavior) in which I
> argued that "undefined behavior" isn't "wrong behavior" or "bad behavior".
>
> I gave the example that #include <unistd.h> is undefined behavior.
> [snip]

It's a bad example as the behavior of #include <unistd.h> is
not in fact undefined. In particular, 6.10.2 p2 (the same as
6.8.2 p2 in C89) defines it.

Kaz Kylheku

Jun 9, 2014, 8:07:33 PM
Suppose the search succeeds in finding a header, and the #include <unistd.h>
directive is now replaced by the header's contents. So far so good; but what is
the subsequent behavior when that substituted material begins to be processed?

Tim Rentsch

Jun 12, 2014, 1:05:35 PM
That is determined of course by the contents of the header, just
like what happens for #include "foo.h". The Standard makes no
guarantees about what is in the header <unistd.h>, just like it
makes no guarantees for what is in "foo.h". But it does guarantee
that those contents will be processed according to the other rules
of the language. If you had said, "#include <unistd.h> can lead to
undefined behavior, if its contents are not well-formed", I would
agree with that. But a claim that #include <unistd.h> is always
undefined behavior, independent of the contents of the header,
is not consistent with what the Standard requires. As long as
the contents of a #include'd header or source file are well-formed,
the behavior of #include'ing them is well-defined, even if we
don't know in advance what those results will be.

Kaz Kylheku

Jun 12, 2014, 2:14:54 PM
On 2014-06-12, Tim Rentsch <t...@alumni.caltech.edu> wrote:
> Kaz Kylheku <k...@kylheku.com> writes:
>
>> On 2014-06-09, Tim Rentsch <t...@alumni.caltech.edu> wrote:
>>> Kaz Kylheku <k...@kylheku.com> writes:
>>>
>>> [snip]
>>>
>>>> Programmers do not understand "undefined behavior". I once made an
>>>> answer on the stackoverflow site (to a question about undefined
>>>> behavior) in which I argued that "undefined behavior" isn't "wrong
>>>> behavior" or "bad behavior".
>>>>
>>>> I gave the example that #include <unistd.h> is undefined behavior.
>>>> [snip]
>>>
>>> It's a bad example as the behavior of #include <unistd.h> is
>>> not in fact undefined. In particular, 6.10.2 p2 (the same as
>>> 6.8.2 p2 in C89) defines it.
>>
>> Suppose the search succeeds in finding a header, and the
>> #include <unistd.h> directive is now replaced by the header's
>> contents. So far so good; but what is the subsequent behavior
>> when that substituted material begins to be processed?
>
> That is determined of course by the contents of the header, just
> like what happens for #include "foo.h". The Standard makes no
> guarantees about what is in the header <unistd.h>, just like it
> makes no guarantees for what is in "foo.h".

"foo.h" is part of the C program, and so the standard provides interpretation
for that.

The contents of <unistd.h> aren't part of the C program; if it exists
at all, the contents are from the implementation.

When C programs which depend on <unistd.h> are ported from one system
to another, the unistd.h header is not taken with those programs,
and doing so probably will not work very well.

If you have a POSIX implementation, but you consider its <unistd.h> to be just
another source file in your C program, then you're not correctly understanding
and following that standard.

> But it does guarantee
> that those contents will be processed according to the other rules
> of the language.

What rule prevents <fortran.h> from containing some special pragma or whatever
that causes it to be interpreted as Fortran? (And the rest of the translation
unit, too?)

An implementation can respond in whatever way they want to #include <unistd.h>
and is not required to document it. That in itself amounts to undefined
behavior.

> If you had said, "#include <unistd.h> can lead to
> undefined behavior, if its contents are not well-formed", I would
> agree with that.

There is no "can lead to undefined behavior" here. The preprocessing directive
brings in unknown content which is not in the C program, and so cannot be
interpreted. (Remember: "This International Standard specifies the form and
establishes the interpretation of programs written in the C programming
language.")

The contents of headers do not have to be well-formed according to
the C syntax. Headers can be binary structures, or whatever,
or in a C dialect with extensions, or a completely different programming
language.

> But a claim that #include <unistd.h> is always
> undefined behavior, independent of the contents of the header,

Just like the use of an indeterminately-valued variable is always
undefined behavior, independently of its actual content.

That the actual contents may look correct is not a matter of definition in the
standard any more. It's something which can be locally defined.

The toolchain/platform documentation can say that <unistd.h> contains
strictly conforming ISO C, with such and such contents. That documentation
is not making it standard-defined, even though it is giving us a way to
continue interpreting the meaning of the program, using the standard
as a tool.

> is not consistent with what the Standard requires. As long as
> the contents of a #include'd header or source file are well-formed,
> the behavior of #include'ing them is well-defined, even if we
> don't know in advance what those results will be.

The behavior may be well-defined, but it is not "ISO C 9899:XXXX well-defined".

The behavior of fflush(stdin) can also be well-defined if a given
library documentation says so.

Dereferencing a null pointer is very well defined on many systems as raising
a predictable CPU exception that is translated to a signal or whatever.

Martin Shobe

Jun 12, 2014, 7:08:45 PM
On 6/12/2014 1:14 PM, Kaz Kylheku wrote:
> On 2014-06-12, Tim Rentsch <t...@alumni.caltech.edu> wrote:
[snip]
>> But it does guarantee
>> that those contents will be processed according to the other rules
>> of the language.
>
> What rule prevents <fortran.h> from containing some special pragma or whatever
> that causes it to be interpreted as Fortran? (And the rest of the translation
> unit, too?)
>
> An implementation can respond in whatever way they want to #include <unistd.h>
> and is not required to document it. That in itself amounts to undefined
> behavior.

Not quite. There are some restrictions on how it responds. See below.

>> If you had said, "#include <unistd.h> can lead to
>> undefined behavior, if its contents are not well-formed", I would
>> agree with that.
>
> There is no "can lead to undefined behavior" here. The preprocessing directive
> brings in unknown content which is not in the C program, and so cannot be
> interpreted. (Remember: "This International Standard specifies the form and
> establishes the interpretation of programs written in the C programming
> language.")
>
> The contents of headers do not have to be well-formed according to
> the C syntax. Headers can be binary structures, or whatever,
> or in a C dialect with extensions, or a completely different programming
> language.

Not quite. While the header can be stored in any of those ways (by the
as-if rule), it must work as a textual substitution of the contents of
the header for the #include directive. It also has to process that text
as C code. See n1570 6.10.2p2-3.

[snip]

Martin Shobe

Kaz Kylheku

Jun 12, 2014, 8:31:27 PM
First of all, nowhere does it say that <unistd.h> must find a "header".
Only some additional documentation, or conformance to a standard such
as IEEE 1003.1 assures us that there is a header.

In an implementation that doesn't say there is a <unistd.h> header,
#include <unistd.h> can find anything whatsoever. It could open a frame buffer
device and read pixels.

> as-if rule), it must work as a textual substitution of the contents of
> the header for the #include directive.

"textual substitution" really amounts to "binary substitution": the bit string
#include <unistd.h> is replaced by another bit string.

> It also has to process that text as C code. See n1570 6.10.2p2-3.

Really, now. What is "C code"? Surely you don't mean "strictly conforming ISO
C dialect".

Actual implementations of a POSIX-conforming <unistd.h> contain things
like this:

void _exit(int) __attribute__ ((noreturn));

Is that "C code"?

Even if the #include mechanism strictly works by substitution, and the result
is earnestly processed from the first translation phase on up, that leaves a
lot of room for all kinds of behavior.

Before p2, C99 6.10.2 p1 gives a Constraints paragraph:

A #include directive shall identify a header or source file that can be
processed by the implementation.

But no restrictions are given as to what constitutes "can be processed".
A diagnostic is required if the directive fails to identify a header
or file that can be processed, but it's up to the implementation what
this means. Any sort of bit string "can be processed", in principle,
and with arbitrary semantics.

This constraint requirement, I think, is only useful to the extent that
it seems to require a diagnostic for the header/file not found case.
The one universal way in which a datum cannot be processed is if it doesn't
exist. Anything that can be found can be processed in some way.

Martin Shobe

Jun 12, 2014, 8:58:31 PM
I never said it did.

> In an implementation that doesn't say there is a <unistd.h> header,
> #include <unistd.h> can find anything whatsoever. It could open a frame buffer
> device and read pixels.

It would not be conforming if it did.

>> as-if rule), it must work as a textual substitution of the contents of
>> the header for the #include directive.
>
> "textual substitution" really amounts to "binary substitution": the bit string
> #include <unistd.h> is replaced by another bit string.

But that bit string must be interpreted as text, and furthermore, as C code.

>> It also has to process that text as C code. See n1570 6.10.2p2-3.
>
> Really, now. What is "C code"? Surely you don't mean "strictly conforming ISO
> C dialect".

No. I mean as if the text had existed in the translation unit. It does
not have to be conforming, or even valid, but any diagnostics that would
be issued must still be issued, and if the text is strictly conforming
then it must produce the required result when executed.

> Actual implementations of a POSIX-conforming <unistd.h> contain things
> like this:
>
> void _exit(int) __attribute__ ((noreturn));
>
> Is that "C code"?

It must be compiled as if it were.

> Even if the #include mechanism strictly works by substitution, and the result
> is earnestly processed from the first translation phase on up, that leaves a
> lot of room for all kinds of behavior.
>
> Before p2, C99 6.10.2 p1 gives a Constraints paragraph:
>
> A #include directive shall identify a header or source file that can be
> processed by the implementation.
>
> But no restrictions are given as to what constitutes "can be processed".
> A diagnostic is required if the directive fails to identify a header
> or file that can be processed, but it's up to the implementation what
> this means. Any sort of bit string "can be processed", in principle,
> and with arbitrary semantics.

> This constraint requirement, I think, is only useful to the extent that
> it seems to require a diagnostic for the header/file not found case.
> The one universal way in which a datum cannot be processed is if it doesn't
> exist. Anything that can be found can be processed in some way.

Why are you ignoring what's in paragraphs 2 and 3?

Martin Shobe

Keith Thompson

Jun 12, 2014, 9:57:39 PM
Kaz Kylheku <k...@kylheku.com> writes:
[...]
> First of all, nowhere does it say that <unistd.h> must find a "header".
> Only some additional documentation, or conformance to a standard such
> as IEEE 1003.1 assures us that there is a header.
>
> In an implementation that doesn't say there is a <unistd.h> header,
> #include <unistd.h> can find anything whatsoever. It could open a
> frame buffer device and read pixels.

N1570 6.10.2:

Constraints

A #include directive shall identify a header or source file that can
be processed by the implementation.

Semantics

A preprocessing directive of the form
# include <h-char-sequence> new-line
searches a sequence of implementation-defined places for a header
identified uniquely by the specified sequence between the < and >
delimiters, and causes the replacement of that directive by the
entire contents of the header. How the places are specified or the
header identified is implementation-defined.

The header needn't be a header *file*, but the standard specifically
calls it a "header".

[...]

>> It also has to process that text as C code. See n1570 6.10.2p2-3.
>
> Really, now. What is "C code"? Surely you don't mean "strictly
> conforming ISO C dialect".

I don't recall anyone mentioning strict conformance. Something as
simple as printf("%d\n", INT_MAX) is not strictly conforming.

> Actual implementations of a POSIX-conforming <unistd.h> contain things
> like this:
>
> void _exit(int) __attribute__ ((noreturn));
>
> Is that "C code"?

Yes. It depends on certain extensions, as explicitly permitted by the C
standard. It could be part of a *conforming program* as defined by
section 4 paragraph 7.

> Even if the #include mechanism strictly works by substitution, and the result
> is earnestly processed from the first translation phase on up, that leaves a
> lot of room for all kinds of behavior.

Certainly.

[...]

Kaz Kylheku

Jun 12, 2014, 10:05:12 PM
Why not? Not conforming to what requirement?

>>> as-if rule), it must work as a textual substitution of the contents of
>>> the header for the #include directive.
>>
>> "textual substitution" really amounts to "binary substitution": the bit string
>> #include <unistd.h> is replaced by another bit string.
>
> But that bit string must be interpreted as text, and furthermore, as C code.
>
>>> It also has to process that text as C code. See n1570 6.10.2p2-3.
>>
>> Really, now. What is "C code"? Surely you don't mean "strictly conforming ISO
>> C dialect".
>
> No. I mean as if the text had existed in the translation unit. It does
> not have to be conforming, or even valid, but any diagnostics that would
> be issued must still be issued, and if the text is strictly conforming
> then it must produce the required result when executed.

There is no "required result" because the contents are unknown.

The standard specifies an interpretation only for what is in the C program, not
for what may or may not be in some header pulled out of the implementation.

If your program consists of files, and one of those is "foo.h", then #include
"foo.h" has a precise interpretation: the substitution of "foo.h" at that
point.

The inclusion of a standard header like <limits.h> also has a precise
interpretation. Although this header is not in the program, the standard
defines what must be in it (what must be the effect of including it).

There is no interpretation for #include <fortran.h>. The file is not part of
the program and isn't a standard header.

The implementation can emit an unconditional diagnostic like "watch out: I'm
switching to Fortran", thereby satisfying the requirement for a diagnostic,
real or imagined.

There is no point in even implementing it as a substitution; it can be
treated as a canned command. No program can tell the difference.
If you cannot tell the difference, the difference isn't real.

(Why would the implementor of #include <fortran.h> bother with some
text file that contains #pragma go_into_fortran_mode, when
the program cannot tell the difference between that, and a built-in
implementation that just goes into Fortran mode without opening any file?)

Kaz Kylheku

Jun 12, 2014, 10:26:05 PM
Paragraph 3 deals with #include "...". I'm only discussing
#include <...>, so I don't see it as relevant.

The first part of paragraph 2 deals with the searching for a header, which
is irrelevant. I'm concerned with the case that the search for <fortran.h> or
whatever succeeds, somewhere in the internals of the implementation or out in
some /usr/include directory or whatever, so the search issue is settled.

The relevant part is "causes the replacement of that directive by the entire
contents of the header.".

Sure, so if the replacement happens to begin with something normal like

int foo(void);

that part should be processed normally.

This kind of thing isn't anything you can detect with a program though;
you have to peek into the internals of the implementation.

You yourself agree that the file need not actually contain that text;
it could have another representation, such as a binary blob which has
the effect of:

int foo(void);

Such a "pre-compiled" binary blob is certainly not being parsed according to
the syntax of C!

And that opens the door to other blobs that do something completely different
without a shred of C syntax.

The ISO C standards only specify the interpretation of C programs, not some
hypothetical contents of implementation-specific nonstandard headers.

The C construct is the #include <...>, and its behavior is in question.
This is basically indivisible. The requirements about how it works
are not requirements that pertain to a C program.

Really, the whole bit about replacing the directive with the full contents
is completely pointless. It is an unenforceable, untestable requirement
that has no bearing on the interpretation of C programs or meaningful
implementation conformance.

Martin Shobe

Jun 13, 2014, 1:06:39 AM
Okay. I overstated it here. It's conforming only if it documents that it
will look for headers there, and what constitutes finding <unistd.h> in
that location. (From 6.10.2p2 "How the places are specified or the
header identified is implementation-defined.") Please note that if it
was undefined behavior, the implementation wouldn't have to have that
documentation.

>>>> as-if rule), it must work as a textual substitution of the contents of
>>>> the header for the #include directive.
>>>
>>> "textual substitution" really amounts to "binary substitution": the bit string
>>> #include <unistd.h> is replaced by another bit string.
>>
>> But that bit string must be interpreted as text, and furthermore, as C code.
>>
>>>> It also has to process that text as C code. See n1570 6.10.2p2-3.
>>>
>>> Really, now. What is "C code"? Surely you don't mean "strictly conforming ISO
>>> C dialect".
>>
>> No. I mean as if the text had existed in the translation unit. It does
>> not have to be conforming, or even valid, but any diagnostics that would
>> be issued must still be issued, and if the text is strictly conforming
>> then it must produce the required result when executed.
>
> There is no "required result" because the contents are unknown.

But there are some restrictions, unlike with undefined behavior.

> The standard specifies an interpretation only for what is in the C program, not
> for what may or may not be in some header pulled out of the implementation.

While it doesn't tell you what is in the header, it does tell you how
the contents of the header have to be processed.

> If your program consists of files, and one of those is "foo.h", then #include
> "foo.h" has a precise interpretation: the substitution of "foo.h" at that
> point.
>
> The inclusion of a standard header like <limits.h> also has a precise
> interpretation. Although this header is not in the program, the standard
> defines what must be in it (what must be the effect of including it).
>
> There is no interpretation for #include <fortran.h>. The file is not part of
> the program and isn't a standard header.
>
> The implementation can emit an unconditional diagnostic like "watch out: I'm
> switching to Fortran", thereby satisfying the requirement for a diagnostic,
> real or imagined.

If it were undefined behavior, it wouldn't even have to issue the
diagnostic.

> There is no point in even implementing it as a substitution; it can be
> treated as a canned command. No program can tell the difference.
> If you cannot tell the difference, the difference isn't real.

Applications of the "as-if" rule are not undefined behavior.

Martin Shobe

Kaz Kylheku

Jun 13, 2014, 1:31:50 PM
On 2014-06-13, Martin Shobe <martin...@yahoo.com> wrote:
> On 6/12/2014 9:05 PM, Kaz Kylheku wrote:
>> There is no "required result" because the contents are unknown.
>
> But there are some restrictions, unlike with undefined behavior.

The restrictions are only that in the construct #include <aaa>,
aaa can be a valid operand: namely a standard header, in which case
the construct is replaced with the contents of the standard header
described in the Library clause. Then this "as if" kicks in: it doesn't
literally have to be textual substitution, as long as the effect is the same.

If "aaa" isn't a standard header, then anything is possible.

An analogy can be made to

#if FOO / BAR

where BAR substitutes to 0.

Obviously, #if has a definition, and so does preprocessor arithmetic.

But whether the #if as a whole invokes undefined behavior, or just the division
operation, is not real (to me): it resembles the question of how many angels
fit on the head of a pin.

Martin Shobe

Jun 13, 2014, 3:48:25 PM
On 6/13/2014 12:31 PM, Kaz Kylheku wrote:
> On 2014-06-13, Martin Shobe <martin...@yahoo.com> wrote:
>> On 6/12/2014 9:05 PM, Kaz Kylheku wrote:
>>> There is no "required result" because the contents are unknown.
>>
>> But there are some restrictions, unlike with undefined behavior.
>
> The restrictions are only that in the construct #include <aaa>,
> aaa can be a valid operand: namely a standard header, in which case
> the construct is replaced with the contents of the standard header
> described in the Library clause. Then this "as if" kicks in: it doesn't
> literally have to be textual substitution, as long as the effect is the same.

There are further restrictions stipulated in 6.10.2p2. You keep ignoring
them.

> If "aaa" isn't a standard header, then anything is possible.
>
> An analogy can be made to
>
> #if FOO / BAR
>
> where BAR substitutes to 0.
>
> Obviously, #if has a definition, and so does preprocessor arithmetic.

A better analogy would be where we don't know what BAR is. (Your
argument for calling #include <unistd.h> "undefined behavior" centers on
not knowing its contents.) Calling the if directive "undefined behavior"
even in the case where BAR isn't 0 does not fit with the standard's
definition of "undefined behavior".
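A minimal sketch of that version of the analogy. The values given to FOO
and BAR are hypothetical stand-ins, and ratio_nonzero is an invented name.
The guarded directive has a defined meaning whatever integer BAR expands
to; only an unguarded FOO / BAR with BAR equal to 0 is the problem case.
It assumes the usual treatment, as in GCC and Clang, of a division that
sits in an unevaluated operand of &&.

```c
/* Hypothetical stand-ins for the FOO and BAR of the analogy. */
#define FOO 10
#define BAR 0

/* A bare "#if FOO / BAR" would divide by zero with these values.
   Testing BAR first leaves the division in an operand of && that
   is not evaluated when BAR is 0. */
#if BAR != 0 && FOO / BAR
#define RATIO_NONZERO 1
#else
#define RATIO_NONZERO 0
#endif

/* Report which branch the preprocessor selected. */
int ratio_nonzero(void)
{
    return RATIO_NONZERO;
}
```

Changing BAR to a nonzero value changes which branch is taken, but not
whether the directive has a meaning.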

Martin Shobe

Tim Rentsch

Jun 13, 2014, 11:05:17 PM
Kaz Kylheku <k...@kylheku.com> writes:

> On 2014-06-12, Tim Rentsch <t...@alumni.caltech.edu> wrote:
>> Kaz Kylheku <k...@kylheku.com> writes:
>>
>>> On 2014-06-09, Tim Rentsch <t...@alumni.caltech.edu> wrote:
>>>> Kaz Kylheku <k...@kylheku.com> writes:
>>>>
>>>> [snip]
>>>>
>>>>> Programmers do not understand "undefined behavior". I once made
>>>>> an answer on the stackoverflow site (to a question about
>>>>> undefined behavior) in which I argued that "undefined behavior"
>>>>> isn't "wrong behavior" or "bad behavior".
>>>>>
>>>>> I gave the example that #include <unistd.h> is undefined
>>>>> behavior. [snip]
>>>>
>>>> It's a bad example as the behavior of #include <unistd.h> is
>>>> not in fact undefined. In particular, 6.10.2 p2 (the same as
>>>> 6.8.2 p2 in C89) defines it.
>>>
>>> Suppose the search succeeds in finding a header, and the
>>> #include <unistd.h> directive is now replaced by the header's
>>> contents. So far so good; but what is the subsequent behavior
>>> when that substituted material begins to be processed?
>>
>> That is determined of course by the contents of the header, just
>> like what happens for #include "foo.h". The Standard makes no
>> guarantees about what is in the header <unistd.h>, just like it
>> makes no guarantees for what is in "foo.h".

I have read ahead and am giving a consolidated response to
several postings. In the following the term "header" should
be taken to mean a header other than a standard header.

You make several claims regarding headers and how they work.
I repeat them here as best I can.

1. Headers are part of the implementation.

2. Included headers are not part of the C program.

3. The content of headers is not C code (or needn't be
interpreted or interpretable as C code).

4. Implementations can do anything they choose upon
encountering a #include for a header (or equivalently
they can process the contents of a header any way
they choose).

5. What is done for a #include'd header need not be
documented.

6. Because the contents of a header are unknown, any
behavior is possible (ie, allowed by the Standard).

7. Since the Standard does not define what happens when
any particular (for emphasis, non-standard) header
is #include'd, the result of doing such an #include
is necessarily undefined behavior.

8. What happens when a header is #include'd is not
defined by the Standard, unlike the case for when
a source file is #include'd (eg, #include "foo.h").

9. If I believe that what happens when a header is #include'd
is defined by the Standard, then I don't understand what the
Standard is saying (and implicitly that you do).

Is this a fair recitation of what your claims are?

Let's take these points one by one.

Point 1. Headers can be part of an implementation but they don't
have to be. Under 6.10.2 p2, where headers are searched for and
how they are identified is implementation-defined. If I can
arrange to put a header in one of the places the implementation
searches, which on my system I certainly can, then that header
will be found by a #include <whatever.h> even though <whatever.h>
is not part of the implementation.

However, if a header /is/ part of an implementation (and not a
standard header), then it qualifies as an extension, and under
section 4 p8 all extensions must be documented. To say that
another way, if an implementation does /not/ document some
particular (non-standard) header, then that header cannot be
part of the implementation.

Point 2. Section 5.1.1.1 p1 makes clear that any #include'd
headers are part of, after pre-processing, the corresponding
translation unit, and therefore part of the C program.

Point 3. Section 5.1.1.2 p1 part 4 makes clear that #include'd
headers are processed in exactly the same way as #include'd
source files. That is, the Standard requires the implementation
to treat them as C code.

Point 4. Section 6.10.2 p2 says where headers are looked for
and how they are identified. Section 5.1.1.2 explains how they
must be processed. How a #include'd header is processed is just
as constrained, and just as defined, as #include'ing a source
file.

Point 5. If the header in question is part of the implementation
then indeed it must be documented as an extension, under 4 p8.
If the implementation fails to provide such documentation, then
the header in question is not part of the implementation; the
contents of such a header need not be documented, but how they
are processed is defined under section 5.1.1.2.

Point 6. No one disagrees that, if the contents of a header are
ill-formed then undefined behavior can result. However, the
implementation is obliged to process a header under the rules of
5.1.1.2, so a header whose contents are well-formed will have
defined behavior. If the implementation does not provide
documentation for the header in question, then per 4 p8 it is not
part of the implementation and outside the bounds of what the
implementation can choose to affect.

Point 7. Look at the definition of undefined behavior in 3.4.3.
Certainly the #include statement itself has requirements imposed
upon it, and so is not undefined behavior. The only other
possibility for undefined behavior (since no run-time behavior is
involved) is encountering a program construct that has no
requirements imposed on it. However this depends on the content:
as long as the content is well-formed, then by definition there
are no such program constructs. So again the result of
#include'ing a header can give undefined behavior, but it is not
/necessarily/ undefined behavior. Indeed it will not be whenever
the contents are well-formed.

Point 8. Under 6.10.2 p2,3 and 5.1.1.2, what happens when a
header is included is exactly as defined as when a source
file is included. In both cases well-formed content implies
defined behavior.

Point 9. In every point you have raised, I have responded
with a citation from the Standard supporting my position and
contradicting yours. If you can't support your position(s)
with some specific citations from the Standard, I see no
reason to conclude that your interpretations are right and
mine are wrong.

Kaz Kylheku

Jun 14, 2014, 3:20:42 PM
The characterization of the claims is fair.

> 1. Headers are part of the implementation.
>
> 2. Included headers are not part of the C program.

I would say that included headers can contribute to the image of the
translation unit, but they are not part of the source program: not part of that
text which is interpreted by the standard.

An analogy here is the library. A C library linked with the user's translation
units is part of the translated and linked program, but it is not part of the C
program source that is subject to the semantic descriptions. It need not even
be written in C.

> 3. The content of headers is not C code (or needn't be
> interpreted or interpretable as C code).

This is correct, and is allowed by the "as if" principle in spite
of 5.1.1.2 (which says that a header is processed from translation phase 1).

So #include <stdio.h> can simply flip some bits in a compiler to open
access to ready made symbolic data structures which provide declarations.
(And those data structures can be prepared in ways that don't require
the processing of C source, such as by hand, or using a notation which is
not C.)
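A small sketch of the idea that what counts is the set of declarations
made visible, not the text they came from. It leans on the permission in
7.1.4 p2 to declare a library function by hand when no header-defined type
is needed; distance is a hypothetical name used only for illustration.

```c
/* Declared by hand; no #include <stdlib.h> appears in this file. */
int abs(int);

/* Hypothetical helper that calls the hand-declared library function. */
int distance(int a, int b)
{
    return abs(a - b);
}
```

A translation unit like this cannot tell whether the declaration of abs
arrived as header text or by some other route.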

> 4. Implementations can do anything they choose upon
> encountering a #include for a header (or equivalently
> they can process the contents of a header any way
> they choose).

Yes, keeping in mind the introductory note that header refers
to "non-standard header".

> 5. What is done for a #include'd header need not be
> documented.

This is correct, and examples of this abound. Implementations commonly have
headers which are internal, yet accessible via the #include mechanism.

> 9. If I believe that what happens when a header is #include'd
> is defined by the Standard, then I don't understand what the
> Standard is saying (and implicitly that you do).
>
> Is this a fair recitation of what your claims are?

Pretty much.

>
> Let's take these points one by one.
>
> Point 1. Headers can be part of an implementation but they don't
> have to be. Under 6.10.2 p2, where headers are searched for and
> how they are identified is implementation-defined. If I can
> arrange to put a header in one of the places the implementation
> searches, which on my system I certainly can, then that header
> will be found by a #include <whatever.h> even though <whatever.h>
> is not part of the implementation.

I would say that you are sticking the proverbial fork in the toaster
by doing that. This use case is not interesting to me. I'm concerned
with the case that #include <something.h> appears in the program,
and no arrangement was made to sneak such a header into the place
which is searched by the #include <...>.

There is no requirement that this be possible to do. The places searched
by <...> could be some read-only, non-user-serviceable part.

> However, if a header /is/ part of an implementation (and not a
> standard header), then it qualifies as an extension, and under
> section 4 p8 all extensions must be documented.

This does not help you. For example, void main(void) { }
could be considered an extension, and documented.

Or the effect of fflush(stdin).

Moreover, as stated already, implementations often have undocumented
headers which are easily found, and are reachable by #include.

Using those headers is not only undefined behavior in the ISO C standard sense,
it is undefined by the implementation also; it is reliance upon undocumented
behavior.

> To say that
> another way, if an implementation does /not/ document some
> particular (non-standard) header, then that header cannot be
> part of the implementation.

That is incorrect. That header is not part of the documented
interface to the implementation. It may be a key component of
the implementation such that if it is tampered with or removed,
the implementation breaks.

> Point 2. Section 5.1.1.1 p1 makes clear that any #include'd
> headers are part of, after pre-processing, the corresponding
> translation unit, and therefore part of the C program.

5.1.1.1 does not state that that program has a well-defined
behavior after including an unknown header.

> Point 3. Section 5.1.1.2 p1 part 4 makes clear that #include'd
> headers are processed in exactly the same way as #include'd
> source files. That is, the Standard requires the implementation
> to treat them as C code.

"C code" is a troublesome term, as I pointed out elsethread.

As a synonym for "material acceptable to an implementation" it is not
a suitable choice, because material acceptable to an implementation
may not resemble C.

I believe that 5.1.1.2 is subject to the latitude between abstract machine and
actual implementation.

Moreover, that is completely moot. Whether or not headers must be
processed exactly as stated (read as text from translation phase 1)
or not (could be binary files or just data structures in a compiler)
it doesn't rescue the claim that the behavior is well-defined
for any header whatsoever.

Either way, the proverbial header can make the proverbial demons fly out of
your nose.


> Point 4. Section 6.10.2 p2 says where headers are looked for
> and how they are identified. Section 5.1.1.2 explains how they
> must be processed. How a #include'd header is processed is just
> as constrained, and just as defined, as #include'ing a source
> file.

My point has never revolved around any notion that the mechanism of
inclusion is not well-defined.

An easy analogy can be made here to other features, like calling
a function, declaring a function or dereferencing a pointer.

There are aspects of "void main(void) { }" or "fflush(stdin)"
which have a well-defined description.

> Point 6. No one disagrees that, if the contents of a header are
> ill-formed then undefined behavior can result.

If the contents of a header can be ill-formed or nonexistent
in any conforming implementation, then its use is not behavior
which is defined in the standard.

Some aspects of the inclusion of that header are covered by semantic
description, but not the overall operation.

#include <unistd.h>
/* X */

is not required anywhere in the standard to successfully bring in
some material, and then continue scanning at point /* X */.

I do not dispute that #include <X> is a kind of operator, and X is
its operand, and that the operator denotes a search for something
identified by X.

Your point rests on the very incorrect notion that since there *can* exist
content behind X defined by some implementations as a documented extension, the
behavior is not undefined.

> Point 9. In every point you have raised, I have responded
> with a citation from the Standard supporting my position and
> contradicting yours. If you can't support your position(s)
> with some specific citations from the Standard

Of course I cannot!

If a behavior is undefined because it is not defined anywhere in the standard
(not because it is a violation of a requirement, or explicitly described as
undefined behavior), we cannot cite the section which makes it undefined,
because there is no such section.

If you think some text does define the meaning of #include <unistd.h>
(the complete meaning, beyond just the mechanism of searching for a header),
then please cite that.

The sections you have referenced so far do not help you, because you're
ultimately appealing to the wrong principle that content made available by
specific implementations as a documented extension somehow back-propagates a
definition of behavior to the standard; i.e. that "gcc-defined",
"POSIX-defined" or "Ubuntu-defined" or "Intel-defined" amount to "standard
defined".

Tim Rentsch

Jun 26, 2014, 6:41:20 PM
The Standard neither defines nor uses the term "source program".
Section 5.1.1.1, titled "Program Structure", establishes that
programs are made up of separately compiled translation units,
and that translation units comprise, after preprocessing, the
contents of all included headers and included source files. The
text in 5.1.1.2 p4 makes clear that included headers are
processed in exactly the same manner as included source files.

Your claim is inconsistent with the text of the Standard.

> An analogy here is the library. A C library linked with the
> user's translation units is part of the translated and linked
> program, but it is not part of the C program source that is
> subject to the semantic descriptions. It need not even be written
> in C.

The analogy is faulty because the Standard does describe, in
detail, how included headers are processed, but does not give
such detail for library components. Compare 5.1.1.2 p4 and
5.1.1.2 p8. The analogy is thus irrelevant to the question
of how headers are processed.

>> 3. The content of headers is not C code (or needn't be
>> interpreted or interpretable as C code).
>
> This is correct, and is allowed by the "as if" principle in spite
> of 5.1.1.2 (which says that a header is processed from translation
> phase 1).

Your understanding of the "as if" principle, to use your phrase,
is flawed. See below.

> So #include <stdio.h> can simply flip some bits in a compiler to
> open access to ready made symbolic data structures which provide
> declarations. (And those data structures can be prepared in ways
> that don't require the processing of C source, such as by hand, or
> using a notation which is not C.)

This conclusion is based on a false assumption, and so
incorrect. See below.

>> 4. Implementations can do anything they choose upon
>> encountering a #include for a header (or equivalently
>> they can process the contents of a header any way
>> they choose).
>
> Yes, keeping in mind the introductory note that header refers to
> "non-standard header".
>
>> 5. What is done for a #include'd header need not be
>> documented.
>
> This is correct, and examples of this abound. Implementations
> commonly have headers which are internal, yet accessible via the
> #include mechanism.

You are offering a proof by example, but the examples don't
matter - only what is said in the Standard matters. Any header
provided by an implementation that is #include'able qualifies as
an extension to what the Standard describes, and section 4 p8
requires implementations to document all extensions. Your
position is inconsistent with the text of the Standard.

>> 9. If I believe that what happens when a header is #include'd
>> is defined by the Standard, then I don't understand what the
>> Standard is saying (and implicitly that you do).
>>
>> Is this a fair recitation of what your claims are?
>
> Pretty much.
>
>>
>> Let's take these points one by one.
>>
>> Point 1. Headers can be part of an implementation but they don't
>> have to be. Under 6.10.2 p2, where headers are searched for and
>> how they are identified is implementation-defined. If I can
>> arrange to put a header in one of the places the implementation
>> searches, which on my system I certainly can, then that header
>> will be found by a #include <whatever.h> even though <whatever.h>
>> is not part of the implementation.
>
> I would say that you are sticking the proverbial fork in the
> toaster by doing that. This use case is not interesting to me.
> I'm concerned with the case that #include <something.h> appears in
> the program, and no arrangement was made to sneak such a header
> into the place which is searched by the #include <...>.

My position is that non-standard headers may be, but needn't
necessarily be, part of an implementation. Your position is that
all headers, either standard or non-standard, are always part of
said implementation. Whether this case is of interest to you or
not, it proves my point. So either you didn't understand what
point I was making, or your logic is flawed.

> There is no requirement that this be possible to do. The places
> searched by <...> could be some read-only, non-user-serviceable
> part.

That is irrelevant since it is possible to do in some cases,
which proves my point that even though headers may be part of an
implementation, they need not be part of said implementation.
(Moreover they may be part of one implementation but not another
on the same system.)

>> However, if a header /is/ part of an implementation (and not a
>> standard header), then it qualifies as an extension, and under
>> section 4 p8 all extensions must be documented.
>
> This does not help you. For example, void main(void) { }
> could be considered an extension, and documented.
>
> Or the effect of fflush(stdin).

Your logic is screwy. The Standard requires all extensions to be
documented. Here is section 4 p8 in its entirety:

An implementation shall be accompanied by a document that
defines all implementation-defined and locale-specific
characteristics and all extensions.

Notice the phrase, "all extensions". The existence of other
documented extensions doesn't remove the obligation to document
non-standard headers, if they are part of the implementation.
Any non-standard header not documented therefore cannot be part
of a conforming implementation.

> Moreover, as stated already, implementations often have undocumented
> headers which are easily found, and are reachable by #include.

Again, giving existing implementations as examples is irrelevant.
The only thing that matters is what the Standard has to say on
the subject.

> Using those headers is not only undefined behavior in the ISO C
> standard sense, it is undefined by the implementation also; it is
> reliance upon undocumented behavior.

A completely unsupported statement. The issue here is exactly
the point under discussion, so you are simply begging the
question. If you can't offer any Standard citations supporting
your comments there is no reason anyone should believe them.

>> To say that
>> another way, if an implementation does /not/ document some
>> particular (non-standard) header, then that header cannot be
>> part of the implementation.
>
> That is incorrect. That header is not part of the documented
> interface to the implementation. It may be a key component of
> the implementation such that if it is tampered with or removed,
> the implementation breaks.

Even if the last predicate is true, it doesn't remove the
obligation to document all extensions. One of three things
is always true regarding any non-standard header: (1) the
implementation documents the header; (2) the header is not
part of the implementation; or (3) the implementation is
not a conforming implementation of C. No other cases are
possible. Feel free to re-read section 4, "Conformance".

>> Point 2. Section 5.1.1.1 p1 makes clear that any #include'd
>> headers are part of, after pre-processing, the corresponding
>> translation unit, and therefore part of the C program.
>
> 5.1.1.1 does not state that that program has a well-defined
> behavior after including an unknown header.

It doesn't need to because 5.1.1.2 defines the behavior.
Furthermore the Standard never gives an explicit statement,
except non-normatively in a few examples of specific code, that
behavior under certain circumstances is defined. To do so is
redundant when the text gives the definition.

>> Point 3. Section 5.1.1.2 p1 part 4 makes clear that #include'd
>> headers are processed in exactly the same way as #include'd
>> source files. That is, the Standard requires the implementation
>> to treat them as C code.
>
> "C code" is a troublesome term, as I pointed out elsethread.
>
> As a synonym for "material acceptable to an implementation" it is
> not a suitable choice, because material acceptable to an
> implementation may not resemble C.

Are you being obtuse? To treat something as C code is to
interpret it according to the rules of 5.1.1.1, 5.1.1.2, etc,
with all the same semantics as what the Standard describes
for the C language. To quote from the Abstract:

This International Standard specifies the form and
establishes the interpretation of programs expressed in
the programming language C.

The sections 5.1.1.1 and 5.1.1.2 make clear that any content that
was #include'ed, whether from a header or a source file, is
treated in exactly the same way as a physical source file that is
given at the top level, ie, not #include'ed.

> I believe that 5.1.1.2 is subject to the latitude between abstract
> machine and actual implementation.

You are mistaken. The section 5.1.2.3, titled Program Execution,
gives in paragraph 1 the statement about an abstract machine.
(This section is part of 5.1.2, titled Execution Environments.)
The entire section 5.1.2 has no bearing on section 5.1.1, which
is solely about program translation, not program execution.

> Moreover, that is completely moot. Whether or not headers must be
> processed exactly as stated (read as text from translation phase
> 1) or not (could be binary files or just data structures in a
> compiler)

The claim here (specifically Point 3) is that how headers are
processed is well-defined, and that is true for all headers with
5.1.1.1 and 5.1.1.2 giving (the top level of) the definition.
You have offered no evidence to support the last parenthetical
phrase - again begging the question.

> it doesn't rescue the claim that the behavior is
> well-defined for any header whatsoever.
>
> Either way, the proverbial header can make the proverbial demons
> fly out of your nose.

More unsupported statements, and again begging the question.

>> Point 4. Section 6.10.2 p2 says where headers are looked for
>> and how they are identified. Section 5.1.1.2 explains how they
>> must be processed. How a #include'd header is processed is just
>> as constrained, and just as defined, as #include'ing a source
>> file.
>
> My point has never revolved around any notion that the mechanism
> of inclusion is not well-defined.

Why do you keep bringing it up then?

> An easy analogy can be made here to other features, like calling a
> function, declaring a function or dereferencing a pointer.
>
> There are aspects of "void main(void) { }" or "fflush(stdin)"
> which have a well-defined description.

These comments are related to the next point, so I will
fold any responses in to those for the next section.

(There were no responses given to Points 5, 7, or 8.)

>> Point 6. No one disagrees that, if the contents of a header are
>> ill-formed then undefined behavior can result.
>
> If the contents of a header can be ill-formed or nonexistent in
> any conforming implementation, then its use is not behavior which
> is defined in the standard.

You seem to think that's synonymous with undefined behavior.
It isn't. See section 3.4.3 for the definition.

> Some aspects of the inclusion of that header are covered by
> semantic description, but not the overall operation.
>
> #include <unistd.h>
> /* X */
>
> is not required anywhere in the standard to successfully bring in
> some material, and then continue scanning at point /* X */.

That's true, but that doesn't make the #include line undefined
behavior. The #include line might _result_ in undefined
behavior, depending on the contents of <unistd.h>, but the
#include line itself is defined, not undefined. Furthermore
you snipped out the important follow-on Point 8, which
explains that if the content of <unistd.h> is well-formed,
then the behavior of #include <unistd.h> is well-defined,
in which case scanning must continue at point /* X */.
So either you don't understand my position or you are
misrepresenting it.

> I do not dispute that #include <X> is a kind of operator, and X is
> its operand, and that the operator denotes a search for something
> identified by X.
>
> Your point rests on the very incorrect notion that since there
> *can* exist content behind X defined by some implementations as a
> documented extension, the behavior is not undefined.

Again begging the question. Either that or you are confused
about the logic involved.

Your position is that #include <unistd.h> is always undefined
behavior. My position is that #include <unistd.h> is defined
behavior if the contents of <unistd.h> are well-formed. Note that
these positions are in conflict: they can't both be true.

A good analogy here is the behavior of a call to scanf().
Depending on the arguments supplied, and the state of the input
stream, a call to scanf() can result in undefined behavior. That
doesn't mean all calls to scanf() are undefined behavior -- some
of them are, others are not. Similarly a #include <unistd.h>
might result in undefined behavior, but that doesn't mean having
a #include <unistd.h> is always undefined behavior: whether it
is or not depends on the contents of <unistd.h> (and also it
being there, which doesn't matter here since that is covered by
a constraint violation).
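To make the scanf() side of the analogy concrete, here is a hedged
sketch. scan_in_range and parse_schar are hypothetical names, not anything
from the Standard. The first function is a call whose behavior is defined
because 94 fits in a signed char on every implementation; the second shows
one way to handle input such as "194" without depending on what the
conversion does when the value cannot be represented.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Defined behavior: SCHAR_MAX is at least 127, so 94 is representable. */
int scan_in_range(void)
{
    signed char c = 0;
    return sscanf("94", "%hhd", &c) == 1 ? c : -1;
}

/* Hypothetical helper: parse a decimal integer into a signed char.
   Returns 0 on success, -1 if the text is not a complete number,
   and -2 if the value does not fit in a signed char. */
int parse_schar(const char *text, signed char *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return -1;
    if (errno == ERANGE || value < SCHAR_MIN || value > SCHAR_MAX)
        return -2;
    *out = (signed char)value;
    return 0;
}
```

Where SCHAR_MAX is 127, parse_schar("194", &c) reports the range error
and leaves c untouched, whereas sscanf("194", "%hhd", &c) falls under the
"cannot be represented" wording quoted at the start of the thread.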

>> Point 9. In every point you have raised, I have responded
>> with a citation from the Standard supporting my position and
>> contradicting yours. If you can't support your position(s)
>> with some specific citations from the Standard
>
> Of course I cannot!
>
> If a behavior is undefined because it is not defined anywhere in
> the standard (not because it is a violation of a requirement, or
> explicitly described as undefined behavior), we cannot cite the
> section which makes it undefined, because there is no such
> section.

I've explained repeatedly that the semantics of such #include's
is defined in 5.1.1.1, 5.1.1.2, etc.

> If you think some text does define the meaning of #include
> <unistd.h> (the complete meaning, beyond just the mechanism of
> searching for a header), then please cite that.

The meaning is determined by the contents of the header. See
the above analogy with scanf().

> The sections you have referenced so far do not help you, because
> you're ultimately appealing to the wrong principle that content
> made available by specific implementations as a documented
> extension somehow back-propagates a definition of behavior to the
> standard; i.e. that "gcc-defined", "POSIX-defined" or
> "Ubuntu-defined" or "Intel-defined" amount to "standard defined".

Once again you have failed to give even one citation from the
Standard supporting your various positions. In contrast I have
given a citation for nearly every point responded to.

Considering all the above, I think one of two things is true (or
possibly both points are true, but no matter). Either you are
confused about the meaning of the phrase "undefined behavior", or
you have just made up your mind and are attempting to supply a
Proof by Repeated Assertion. So I hope if you respond you'll
have something new to offer rather than just more unsupported
statements.

Kaz Kylheku

unread,
Jun 27, 2014, 2:16:07 PM6/27/14
to
On 2014-06-26, Tim Rentsch <t...@alumni.caltech.edu> wrote:
> Your position is that #include <unistd.h> is always undefined
> behavior.

That isn't my position. It's possible for a program to have a "unistd.h"
header, along with some instructions that it is to be presented for translation
in such a way that #include <unistd.h> finds that header (in preference to any
other matching header). In my experience, there exist plenty of code bases
that use <> inclusion for their own materials; these programs can be interpreted
to have a meaning if we accept the conventions of their structure.

And, of course, #include <unistd.h> can be conditionally
skipped.
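
One way such a conditional skip can look, as a sketch only: __has_include is standardized in C23 and was available earlier as an extension in GCC and Clang. On an implementation without it, this sketch simply assumes the header is absent. The macro HAVE_UNISTD_H and the function have_unistd are hypothetical names chosen for illustration.

```c
/* Include <unistd.h> only when the implementation says it can find it. */
#if defined(__has_include)
#  if __has_include(<unistd.h>)
#    include <unistd.h>
#    define HAVE_UNISTD_H 1
#  endif
#endif

#ifndef HAVE_UNISTD_H
#  define HAVE_UNISTD_H 0   /* assumption: no __has_include means no header */
#endif

/* Returns 1 if the header was included in this translation unit, else 0. */
int have_unistd(void)
{
    return HAVE_UNISTD_H;
}
```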

I am not interested in these situations; my focus is on programs which do
include <unistd.h>, and expect it to be externally provisioned.

Simply put, if I hold such a program in my left hand, and the standard in the
right, then between these two deliverables, I cannot determine the meaning of
the program.

For instance, this program:

#include <unistd.h>
int main(void) { return 0; }

My whole argument is that this program has no meaning, based only on the
C standard (and neither do other programs which share the same issue as this
example).

Those who think it has a meaning have merely to reveal what it is.

> My position is that #include <unistd.h> is defined
> behavior if the contents of <unistd.h> is well-formed.

I agree with this conditional; unfortunately, a conditional is vacuous when
its antecedent is false.

In the situations covered by my argument, no requirement emanates from the
standard that the header should exist and have well-formed content.
Thus, we do not have well-formed content.

If content is provisioned, it is no different from any other conforming
extension based on undefined behavior, such as accepting void main(void) { } as
a startup function or fflush(stdin).

Tim Rentsch

unread,
Jul 3, 2014, 1:59:38 PM7/3/14
to
The mistake is thinking that this is the same as undefined
behavior. It isn't. If you think it is then you don't
understand the term (ie, as the ISO standard defines it).

Kaz Kylheku

unread,
Jul 3, 2014, 2:18:53 PM7/3/14
to
I believe that I am justified in my claim that it's undefined behavior, because
one category of undefined behavior includes situations for which there is an
absence of requirements. The program contains some construct or run-time
situation or whatever and, try as we might, we cannot determine *at all* what the
behavior should be from the standard alone.

If you don't believe it's undefined behavior, then what category
of behavior is it? We have:

1 unspecified
2 implementation-defined
3 defined
4 ... anything else?

1. I do not believe it is unspecified, because unspecified behavior requires a
choice from among some enumeration of possible behaviors. For instance,
in the function call f(g(), h()) it is unspecified behavior whether g()
or h() is called first, but it must be one of these two possibilities.
There is no limit on what #include <unistd.h> could bring in, or what
the subsequent behavior could be.

2. I do not believe it is implementation-defined, because that is similar
to unspecified behavior, with the additional requirement that the choice
be documented. I do not believe that implementations are required by ISO C
to document their response to #include <unistd.h>.
If it is indeed a deliberate extension, then I agree it should be documented;
but even that does not preclude it being undefined behavior. Just like if an
implementation chooses to extend the startup function to be void main(void),
and documents it, it is still undefined behavior for a program to use it.

3. It could be, but my own efforts have failed in finding a definition; I
here defer to someone else.

4. I'm not aware of any other classification of behavior.
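
The difference between a choice from an enumerable set and no limit at all can be seen in a small sketch of the f(g(), h()) case. The names order, pos and call_order are made up for illustration. The two calls are indeterminately sequenced, so the recorded string must be either "gh" or "hg"; which one you get is up to the implementation, but nothing else is allowed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static char order[3];   /* records which of g/h ran first */
static size_t pos;

static int g(void) { order[pos++] = 'g'; return 1; }
static int h(void) { order[pos++] = 'h'; return 2; }
static int f(int a, int b) { return a + b; }

/* Hypothetical helper: evaluate f(g(), h()) and report the call order. */
const char *call_order(void)
{
    pos = 0;
    memset(order, 0, sizeof order);

    int sum = f(g(), h());   /* order of g() and h() is unspecified */
    assert(sum == 3);        /* the value itself does not depend on the order */

    return order;
}
```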

Keith Thompson

unread,
Jul 3, 2014, 3:28:40 PM7/3/14
to
Kaz Kylheku <k...@kylheku.com> writes:
[...]
> If it is indeed a deliberate extension, then I agree it should be documented;
> but even that does not preclude it being undefined behavior. Just like if an
> implementation chooses to extend the startup function to be void main(void),
> and documents it, it is still undefined behavior for a program to use it.
[...]

I'm not convinced using void main(void) if the implementation documents
it as an extension is undefined behavior.

N1570 5.1.2.2.3p1 says:

If the return type is not compatible with int, the termination
status returned to the host environment is unspecified.

(IMHO it would make more sense for the termination status to be
implementation-defined rather than unspecified, but there it is.)

In any case, it's only the termination status, not the behavior prior
to termination, that's unspecified. Everything else is defined
(or not) in the same way as if the program used "int main(void)"
rather than "void main(void)". (Again, this applies only if the
implementation documents "void main(void)" as an acceptable form.)

If a program uses "void main(void)" under an implementation that
*doesn't* document that form, then its behavior is undefined (because
it violates a "shall" outside a constraint). If, as you claim,
it's also undefined even for implementations that do document such
a form, then the phrase "or in some other implementation-defined
manner" in 5.1.2.2.1p1 is completely meaningless, which I presume
is not the intent.

Summary: Under a hosted implementation that does not document
additional forms for defining main, this program:

void main(void){}

has undefined behavior. Under a hosted implementation that documents
"void main(void)", it has well defined behavior, except that the
termination status is unspecified. (Of course an implementation
is free to document the termination status anyway, but I don't
believe the standard requires it to obey its own documentation in
such a case.)

(Not that it matters, but it's not clear that "void main(void)"
would be an "extension" as the Standard uses that term; rather it's
an "implementation-defined characteristic" as mentioned in 4p8.)

Richard Bos

unread,
Jul 3, 2014, 4:34:26 PM7/3/14
to
Keith Thompson <ks...@mib.org> wrote:

> If a program uses "void main(void)" under an implementation that
> *doesn't* document that form, then its behavior is undefined (because
> it violates a "shall" outside a constraint). If, as you claim,
> it's also undefined even for implementations that do document such
> a form, then the phrase "or in some other implementation-defined
> manner" in 5.1.2.2.1p1 is completely meaningless, which I presume
> is not the intent.

Hm. AIUI, the intent behind that clause was entirely political, meant to
appease certain compiler vendors (Microsoft, perhaps?); as such, it may
well be completely meaningless.

Richard

Kaz Kylheku

unread,
Jul 3, 2014, 6:06:38 PM7/3/14
to
On 2014-07-03, Keith Thompson <ks...@mib.org> wrote:
> Kaz Kylheku <k...@kylheku.com> writes:
> [...]
>> If it is indeed a deliberate extension, then I agree it should be documented;
>> but even that does not preclude it being undefined behavior. Just like if an
>> implementation chooses to extend the startup function to be void main(void),
>> and documents it, it is still undefined behavior for a program to use it.
> [...]
>
> I'm not convinced using void main(void) if the implementation documents
> it as an extension is undefined behavior.

This view seems to lead to the conclusion that nothing is undefined
behavior. Everything that we understand to be undefined behavior could be
documented by some imaginary implementation. a[i] = i++, division by zero, void
main, fflush(stdin), you name it.

I'm convinced that these are all *ISO C* undefined behavior, local
definitions notwithstanding.

Local extensions are defined only when we consider the standard, and that
implementation's document together. The provenance of the behavior is traced
back to the implementation's document, not to the standard; it
is not "ISO C standard-defined".

For implementations to make something standard-defined, they would have to
cause new text to magically appear in the standard.

> N1570 5.1.2.2.3p1 says:
>
> If the return type is not compatible with int, the termination
> status returned to the host environment is unspecified.

That text is pointless. If the return type is not compatible with int,
then main does not have one of the two standard-defined type signatures.

And so any response to the construct is conforming.

> (IMHO it would make more sense for the termination status to be
> implementation-defined rather than unspecified, but there it is.)

It makes no sense to give requirements about something that is not
required in the first place, as a whole.

Think about it: one implementation can ignore the void main(void) situation,
with or without a diagnostic message, with unpredictable results.
The power supply of the build machine can catch fire, yet the implementation is
conforming.

But another implementation that supports a different main that returns int has
to have a stable termination status?

It just does not compute.

If you leave something undefined (so that implementors can pick up the
slack and add their own requirements) you can't dictate how those local
requirements should be written.

> If a program uses "void main(void)" under an implementation that
> *doesn't* document that form, then its behavior is undefined (because
> it violates a "shall" outside a constraint).

So if all I have is a hard copy of that program in one hand,
and the standard in the other, what is the meaning of the program?

It is in the hands of whatever implementation it is intended for.

> If, as you claim,
> it's also undefined even for implementations that do document such
> a form, then the phrase "or in some other implementation-defined
> manner" in 5.1.2.2.1p1 is completely meaningless, which I presume
> is not the intent.

The intent may not be that the text be meaningless, but that is the effect.

In documentation, meaningless happens.

> (Not that it matters, but it's not clear that "void main(void)"
> would be an "extension" as the Standard uses that term; rather it's
> an "implementation-defined characteristic" as mentioned in 4p8.)

"implementation-defined" is something that must be provided,
but a choice exists, which must be documented.

For instance, choices for aspects of the representation of int have to be
settled by the implementation and given. The type int is a characteristic of
the C language, and its features are characteristics also: some of those
features are implementation-defined, hence they are implementation-defined
characteristics.

Since void main (void) isn't described anywhere, it's not a characteristic.

Martin Shobe

unread,
Jul 7, 2014, 6:02:09 PM7/7/14
to
On 7/3/2014 5:06 PM, Kaz Kylheku wrote:
> On 2014-07-03, Keith Thompson <ks...@mib.org> wrote:
>> Kaz Kylheku <k...@kylheku.com> writes:
>> [...]
>>> If it is indeed a deliberate extension, then I agree it should be documented;
>>> but even that does not preclude it being undefined behavior. Just like if an
>>> implementation chooses to extend the startup function to be void main(void),
>>> and documents it, it is still undefined behavior for a program to use it.
>> [...]
>>
>> I'm not convinced using void main(void) if the implementation documents
>> it as an extension is undefined behavior.
>
> This view seems to lead to the conclusion that nothing is undefined
> behavior. Everything that we understand to be undefined behavior could be
> documented by some imaginary implementation. a[i] = i++, division by zero, void
> main, fflush(stdin), you name it.

> I'm convinced that these are all *ISO C* undefined behavior, local
> definitions notwithstanding.

The void main(void) situation is something of a special case. The reason
is that implementations that document their acceptance of void
main(void) will have met the documentation requirements set under
5.1.2.2.1 p1 (in n1570). This means that it's not undefined behavior
since the rest of the C standard applies to the rest of the program.
This is unlike the other cases you mentioned where, even if the behavior
were documented by the implementation, the C standard would still
consider it undefined behavior.

[snip]

>> N1570 5.1.2.2.3p1 says:
>>
>> If the return type is not compatible with int, the termination
>> status returned to the host environment is unspecified.

> That text is pointless. If the return type is not compatible with int,
> then main does not have one of the two standard-defined type signatures.

> And so any response to the construct is conforming.

As long as it doesn't violate some other part of the standard. It would
be hard to see how it could violate some other part of the standard in
this case, but any behavior that did would be non-conforming.

[snip]

Martin Shobe

Tim Rentsch

unread,
Jul 9, 2014, 1:04:12 PM7/9/14
to
Believe it or not I understood what you were saying the first
five times you said it. Saying the same thing again does not
change either my understanding or the (in)correctness of your
comments.

In posts upthread I have repeatedly given citations explaining
why the behavior is defined and what portions of the Standard
define it. Your pattern is to ignore those citations, or simply
assert that they are irrelevant, give no citations to the
Standard in response, and just repeat your previously stated
position. Let me say this quite clearly: if you DO NOT CITE any
specific portions of the Standard in response, then what you are
offering is NOTHING MORE THAN A PROOF BY REPEATED ASSERTION. So
you might want to decide whether you want to support your claim
by providing specific citations that underpin your reasoning,
or simply continue repeating yourself without any evidence to
back up your arguments.

> If you don't believe it's undefined behavior, then what category
> of behavior is it?

See above.

> We have:
>
> 1 unspecified
> 2 implementation-defined
> 3 defined
> 4 ... anything else?
>
> 1. I do not believe it is unspecified, because unspecified
> behavior requires a choice from among some enumeration of
> possible behaviors. For instance, in the function call f(g(),
> h()) it is unspecified behavior whether g() or h() is called
> first, but it must be one of these two possibilities. There is
> no limit on what #include <unistd.h> could bring in, or what the
> subsequent behavior could be.
>
> 2. I do not believe it is implementation-defined, because that
> is similar to unspecified behavior, with the additional
> requirement that the choice be documented. I do not believe
> that implementations are required by ISO C to document their
> response to #include <unistd.h>.
>
> If it is indeed a deliberate extension, then I agree it should be
> documented; but even that does not preclude it being undefined
> behavior. Just like if an implementation chooses to extend the
> startup function to be void main(void), and documents it, it is
> still undefined behavior for a program to use it.

This inference is incorrect. If an implementation documents an
allowed form of main as 'void main(void)', then using that form
of main() on that implementation is defined behavior. In
particular, the function's behavior is defined by section 6.5 and
by 5.1.2.2.1 p1 and 5.1.2.2.3 p1 (and the preprocessor, etc).
The final clause of the first paragraph of 5.1.2.2.1 allows use
of such a form of main without violating a 'shall' requirement.
Since no 'shall' requirements are violated, and program behavior
is defined in absence of such a violation, programs can use such
a form of main() on those implementations and still have defined
behavior.

> 3. It could be, but my own efforts have failed in finding a
> definition; I here defer to someone else.
>
> 4. I'm not aware of any other classification of behavior.

Apparently you think program constructs that lead to undefined
behavior in some circumstances are undefined behavior in all
circumstances. This belief is wrong, and obviously so. For
example, the expression

32767 + 1

is undefined behavior on implementations with 16-bit int's, but
defined behavior on other implementations. Similarly a call to
scanf may result in undefined behavior if the input is bad, but
will be defined behavior with good input.
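
As a sketch of how a program can stay on the defined side regardless of the width of int: the helper below (checked_increment is a hypothetical name, not anything from the Standard) performs the addition only when the result is representable, so the overflowing case is never evaluated.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: store x + 1 in *out only when that sum is
   representable in int; otherwise report failure without adding. */
bool checked_increment(int x, int *out)
{
    assert(out != NULL);

    if (x == INT_MAX)
        return false;   /* x + 1 would overflow, which is undefined behavior */

    *out = x + 1;       /* representable, so the result is fully defined */
    return true;
}
```

On an implementation with 16-bit int, checked_increment(32767, &r) fails; on one with a wider int it succeeds and stores 32768.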

What happens when a preprocessor directive like

#include <unistd.h>

is encountered is defined behavior (sections 5.1.1 and 6.10).
Processing the contents of the header in question may lead to
undefined behavior, if the contents are ill-formed, but will not
lead to undefined behavior if the contents are well-formed. Like
what happens with a call to scanf(), the behavior of the construct
itself is defined. What happens when the construct is invoked may
lead to undefined behavior in some circumstances, but will
certainly not lead to undefined behavior in other circumstances.
The Standard doesn't say what the contents of any non-standard
header will be, but it does define how the header is located and
how its contents are processed. Implementations are obliged to
process the contents of headers following the specifications
stated in 5.1.1 and 6.10. Therefore it is certainly wrong to
say that #include <unistd.h> is necessarily undefined behavior,
since if it were then implementations could do whatever they
wanted regardless of the header's content. But headers with
well-formed content, which must be processed according to the
rules in 5.1.1 and 6.10, are required to produce well-defined
results. Because of that, the notion that any such #include
is (necessarily) undefined behavior is obviously wrong.

Kaz Kylheku

unread,
Jul 9, 2014, 3:26:48 PM7/9/14
to
On 2014-07-09, Tim Rentsch <t...@alumni.caltech.edu> wrote:
> Apparently you think program constructs that lead to undefined
> behavior in some circumstances are undefined behavior in all
> circumstances.

The best way to articulate my position is that constructs lead to undefined
behavior when we cannot deduce the behavior of the program, at least for some
implementations that have certain choices of characteristics, using only a copy
of the program and a copy of the standard, without resorting to our imagination
to *invent* a dialect which gives meaning to the program.

> This belief is wrong, and obviously so. For example, the expression
>
> 32767 + 1
>
> is undefined behavior on implementations with 16-bit int's, but defined
> behavior on other implementations.

That is true, and this hinges on the idea that the size and range of an integer
is an implementation-defined characteristic which must be selected.

In all implementations which choose the representation of int such that 32768
is in range of int, the calculation must yield 32768, and not any other value
or behavior. That requirement is standard-defined.

So when we interpret this construct, we just have to narrow down the possible
choices of an implementation-defined characteristic which make it work.
After noting this constraint, we can continue interpreting that program, still
relying on no deliverables other than the standard and that program itself.
Certainly not our imagination: we are making what is de facto a tight,
mathematical deduction.

Speaking of which, this is similar to derivation in mathematics. If a
beneficial step requires us to divide both sides of an equation by a variable,
we can keep going. But in that step, we note the constraint that further
derivation is only valid on the condition that the variable is nonzero. This
constraint doesn't change the meaning of division or the workings of algebra.

> Similarly a call to scanf may result in undefined behavior if the input is
> bad, but will be defined behavior with good input.
>
> What happens when a preprocessor directive like
>
> #include <unistd.h>
>
> is encountered is defined behavior (sections 5.1.1 and 6.10).

This is not similar to 32767 + 1 at all.

When we see #include <unistd.h>, we cannot simply assign a value, or range of
values, to some implementation-defined characteristic, and then keep
interpreting the meaning of the program, relying on nothing but that program
and the standard.

The problem is that now a portion of the program body is unknown; we do not
know what constitutes the translation unit after preprocessing!

I accept your well-supported, standard-backed argument that the program
consists of all the contents of all translation units after pre-processing.

But in cases like #include <unistd.h>, a chunk of the program is completely
unknown. Its provenance is from outside of that program, and outside of the
standard.

We cannot deduce from an unknown. The only way we can interpret the meaning of
the program is to appeal to an external document, if there is one. The
definition is then courtesy of that document.

By contrast, we do not have to appeal to an external document to understand
32767 + 1; we can deduce that it requires int to be wider than 16 bits and keep
going.

So, apples and oranges.

By your argument regarding void main(void), the following also has defined
behavior, on those implementations where it does:

Program Foo;

Begin
Writeln('Hello World!');
End.

A C implementation just has to emit a diagnostic for the syntax rule violation,
and then treat it as Pascal, yet remain conforming.

Just like for void main (void), we can infer that, aha, this "C" program
requires a dialect that accepts Pascal programs. In those dialects, it is
well-defined.

But if we do that, we are guessing; our inference is not a deduction. We are
relying on exposure to other programming languages, and intuition. In essence,
our imagination. To be sure that that is in fact Pascal, we need to refer to an
implementation's document.

void main(void) is similarly not defined (in any sense) without a document
which specifies exactly what it does.

When we see a program that relies on void main(void), of course, the first step
in the reasoning is "it is undefined on numerous implementations that do not
accept void main(void)", just like "32767 + 1 is not defined on numerous
implementations that have a 16 bit int".

However, we do not know what the definition is on the remaining set of possible
implementations, like we do for 32767 + 1.

To keep going, we have to invent a make-believe dialect in which it is a valid
form of the startup function, and pretend that the program is written in that
make-believe dialect.

That make-believe dialect is not produced by assigning a value to an
implementation-defined characteristic; it is purely a work of the imagination.

The imagination places few limits on what combination of characters may be
accepted as a valid program, so if the concept of standard-definedness is tied
to using your imagination, it is a broken concept.

Kaz Kylheku

unread,
Jul 9, 2014, 3:42:54 PM7/9/14
to
On 2014-07-09, Tim Rentsch <t...@alumni.caltech.edu> wrote:
> Believe it or not I understood what you were saying the first
> five times you said it. Saying the same thing again does not
> change either my understanding or the (in)correctness of your
> comments.
>
> In posts upthread I have repeatedly given citations explaining
> why the behavior is defined

What's needed is the HOW, not the WHY.

If the behavior is defined, well then, what is it?

> and what portions of the Standard define it.

You mean #include <unistd.h>?

Is this in C11 or a new draft?

I have an electronic copy of C99; do you believe C99 defines it?

Maybe my PDF reader is broken, because it fails to find the substring
"unistd" in the document. Maybe the string is encoded somehow, or
built from pieces using preprocessing?

> Your pattern is to ignore those citations, or simply
> assert that they are irrelevant, give no citations to the

None of your citations so far have revealed what the ISO-C-defined behavior is
for #include <unistd.h>: what are the contents of the resulting translation
unit and its meaning.

Please tell me what implementation-defined characteristics should be
given what values, so that then #include <unistd.h> has a meaning, deduced from
just the standard and the C program at hand.

> Standard in response, and just repeat your previously stated
> position.

I have made it clear in an earlier posting that I believe #include <unistd.h>
to be undefined because the standard doesn't contain anywhere, between its
front cover and back cover, a definition of the behavior.

I cannot cite a section from the standard, which indicates where it DOESN'T
define something. In effect, every section of the standard which doesn't
define it contributes to the undefined status.

Of course I do not believe that repetition has the power to prove!

I'm only repeating myself, with variations, because that strategy is rumoured
among educators to be somewhat effective on retarded children.

Keith Thompson

unread,
Jul 9, 2014, 4:35:21 PM7/9/14
to
Kaz Kylheku <k...@kylheku.com> writes:
> On 2014-07-09, Tim Rentsch <t...@alumni.caltech.edu> wrote:
>> Apparently you think program constructs that lead to undefined
>> behavior in some circumstances are undefined behavior in all
>> circumstances.
>
> The best way to articulate my position is that constructs lead to undefined
> behavior when we cannot deduce the behavior of the program, at least for some
> implementations that have certain choices of characteristics, using only a copy
> of the program and a copy of the standard, without resorting to our imagination
> to *invent* a dialect which gives meaning to the program.
[...]

We are not limited to a copy of the program and a copy of the standard.
The standard explicitly requires each implementation to be "accompanied
by a document that defines all implementation-defined and
locale-specific characteristics and all extensions" (4p8). That
document may be used to determine the behavior of programs whose
behavior is implementation-defined.

Given this program:

void main(void) { }

plus a copy of the standard *and* a copy of the current implementation's
accompanying document that specifies
void main(void) { /* ... */ }
as a permitted definition, we can infer the behavior of the program for
that implementation (noting that "the termination status returned to the
host environment is unspecified", 5.1.2.2.3p1).

Of course the program's behavior is undefined for an implementation that
doesn't document that form.

Similar arguments apply to a program that does:

printf("%d\n", 32767+1);

Keith Thompson

unread,
Jul 9, 2014, 4:40:12 PM7/9/14
to
Kaz Kylheku <k...@kylheku.com> writes:
[...]
> You mean #include <unistd.h>?
>
> Is this in C11 or a new draft?
>
> I have an electronic copy of C99; do you believe C99 defines it?
>
> Maybe my PDF reader is broken, because it fails to find the substring
> "unistd" in the document. Maybe the string is encoded somehow, or
> built from pieces using preprocessing?
[...]

The C standard does not include the string "unistd", as I'm sure you
know.

The lack of any occurrence of "foobar" in the standard does not render
puts("foobar");
or
FILE *f = fopen("foobar", "r");
undefined.

6.10.2 discusses the syntax and semantics of
# include <h-char-sequence> new-line
of which
#include <unistd.h>
is an instance.
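
As a sketch of the fopen case (can_open is a made-up name used only for illustration): whether the call succeeds depends on the environment, but both outcomes are described by the Standard, so the code has defined behavior either way.

```c
#include <stdio.h>

/* Hypothetical helper: report whether a file with the given name can be
   opened for reading. Returns 1 on success, 0 if fopen() fails. */
int can_open(const char *name)
{
    FILE *f = fopen(name, "r");
    if (f == NULL)
        return 0;   /* failure is a specified outcome, not undefined behavior */

    fclose(f);
    return 1;
}
```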

Kaz Kylheku

unread,
Jul 9, 2014, 6:09:27 PM7/9/14
to
On 2014-07-09, Keith Thompson <ks...@mib.org> wrote:
> Kaz Kylheku <k...@kylheku.com> writes:
>> On 2014-07-09, Tim Rentsch <t...@alumni.caltech.edu> wrote:
>>> Apparently you think program constructs that lead to undefined
>>> behavior in some circumstances are undefined behavior in all
>>> circumstances.
>>
>> The best way to articulate my position is that constructs lead to undefined
>> behavior when we cannot deduce the behavior of the program, at least for some
>> implementations that have certain choices of characteristics, using only a copy
>> of the program and a copy of the standard, without resorting to our imagination
>> to *invent* a dialect which gives meaning to the program.
> [...]
>
> We are not limited to a copy of the program and a copy of the standard.

In that case, that will lead you to have different notions of defined behavior
from someone who believes we are so limited.

To me, the procedure involving just those two deliverables is the golden
procedure for any sort of standard-definedness testing.

(By the way, strictly speaking, we also require the documents which
are cited as normative references, at least if we require a definition
from one of them.)

> The standard explicitly requires each implementation to be "accompanied
> by a document that defines all implementation-defined and
> locale-specific characteristics and all extensions" (4p8). That

If we just use the standard and the program, we can supply these things
ourselves; we can be our own document.

> document may be used to determine the behavior of programs whose
> behavior is implementation-defined.

Without a doubt! It can even determine the behavior of acceptable programs
that are in a nonstandard dialect, or a different language entirely.

The standard begins with (text from C99):

1. Scope

This International Standard specifies the form and establishes the
interpretation of programs written in the C programming language.

It doesn't say, "this standard, plus the GCC and GNU C Library Manuals
and whatever else".

It is darn clear what is included and what isn't: footnotes and examples
out, normative references in.

So even if you see something that looks like a definition between the covers,
it might not actually be a definition: is it in a footnote, annex or example?

A third party header file does not even have the normative status of a footnote
or annex.

> Given this program:
>
> void main(void) { }
>
> plus a copy of the standard *and* a copy of the current implementation's
> accompanying document that specifies
> void main(void) { /* ... */ }
> as a permitted definition, we can infer the behavior of the program for

Yes; thanks for the obvious!

Given a document from some C implementation, we can also infer the
behavior of

program Foo;

begin
writeln('Hello!');
end.

That doesn't make it C, let alone "standard-defined".

It's only defined in the Merriam-Webster or OED sense of the word "defined".

> that implementation (noting that "the termination status returned to the
> host environment is unspecified", 5.1.2.2.3p1).

That text is meaningless bunk, because the behavior required to reach
the return from that function isn't defined.

Richard Bos, elsethread:
RB> Hm. AIUI, the intent behind that clause was entirely political, meant to
RB> appease certain compiler vendors (Microsoft, perhaps?); as such, it may
RB> well be completely meaningless.

> Of course the program's behavior is undefined for an implementation that
> doesn't document that form.

No implementation is required to document that form, or to establish
any implementation-defined characteristic which gives that form any
standard-defined meaning.

void main(void) is an utterance in a dialect which resembles ISO C
syntactically (indeed, requires no diagnostic), but has no meaning in any
dialects that are generated from the ISO C spec by the assignment of
characteristics.

> Similar arguments apply to a program that does:
>
> printf("%d\n", 32767+1);

No, they absolutely do not. An implementation must define the characteristics
of the type int. If int is defined in such a way that the result of 32767 + 1
is representable, the above must calculate 32768. And this is deducible
logically, without referring to third party documents.

By contrast, if an implementation defines void main(void), there is no
similar definition of behavior, like the rigor of having to compute 32768.

If I'm interpreting the above using nothing but the program and a standard,
then when I see 32767+1, I note "okay, this program requires an implementation
with ints 17 bits or wider". The program is still a program of some dialects
of ISO C, just not all of them: it is not an utterance of those ISO C variants
which feature a width of 16 bits for the value-contributing parts of type int.

Then suppose elsewhere in the same program I deduce, "this code
requires int to be precisely 16 bits wide". Then I have a contradiction:
int cannot be simultaneously 16 bits wide, and greater than 16 bits wide.
The program fails parameter satisfiability: implementation-defined
characteristics cannot be assigned in a non-conflicting way.

When I see #include <unistd.h> or void main(void), then I may immediately
jump to the conclusion that it's not a program written in the ISO C
dialect. It is in the hands of some implementation. It has exactly
the same status as a shell script, or a Fortran program, if such a specimen
were presented as a C program (except that depending on how these are crafted,
they may require a diagnostic).

Kaz Kylheku

Jul 9, 2014, 6:25:28 PM
On 2014-07-09, Keith Thompson <ks...@mib.org> wrote:
> Kaz Kylheku <k...@kylheku.com> writes:
> [...]
>> You mean #include <unistd.h>?
>>
>> Is this in C11 or a new draft?
>>
>> I have an electronic copy of C99; do you believe C99 defines it?
>>
>> Maybe my PDF reader is broken, because it fails to find the substring
>> "unistd" in the document. Maybe the string is encoded somehow, or
>> built from pieces using preprocessing?
> [...]
>
> The C standard does not include the string "unistd", as I'm sure you
> know.
>
> The lack of any occurrence of "foobar" in the standard does not render
> puts("foobar");
> or
> FILE *f = fopen("foobar", "r");
> undefined.

"foobar" is a token in the program (one of the two deliverables we need to
establish meaning). With the other deliverable, the standard, we can deduce how
that token is transformed into a string literal object that makes its way into
the translated image.

At no point in this exercise do we have to refer to a third document.

Keith Thompson

Jul 9, 2014, 9:40:49 PM
Kaz Kylheku <k...@kylheku.com> writes:
> On 2014-07-09, Keith Thompson <ks...@mib.org> wrote:
>> Kaz Kylheku <k...@kylheku.com> writes:
>>> On 2014-07-09, Tim Rentsch <t...@alumni.caltech.edu> wrote:
>>>> Apparently you think program constructs that lead to undefined
>>>> behavior in some circumstances are undefined behavior in all
>>>> circumstances.
>>>
>>> The best way to articulate my position is that constructs lead to undefined
>>> behavior when we cannot deduce the behavior of the program, at least for some
>>> implementations that have certain choices of characteristics, using only a copy
>>> of the program and a copy of the standard, without resorting to our imagination
>>> to *invent* a dialect which gives meaning to the program.
>> [...]
>>
>> We are not limited to a copy of the program and a copy of the standard.
>
> In that case, that will lead you to have different notions of defined behavior
> from someone who believes to be so limited.
>
> To me, the procedure involving just those two deliverables is the golden
> procedure for any sort of standard-definedness testing.

You accept the authority of the C standard, yet you choose to
ignore the normative wording in which it discusses extensions and
implementation-defined definitions of "main".

[...]

>> Given this program:
>>
>> void main(void) { }
>>
>> plus a copy of the standard *and* a copy of the current implementation's
>> accompanying document that specifies
>> void main(void) { /* ... */ }
>> as a permitted definition, we can infer the behavior of the program for
>
> Yes; thanks for the obvious!
>
> Given a document from some C implementation, we can also infer the
> behavior of
>
> program Foo;
>
> begin
> writeln('Hello!');
> end.
>
> That doesn't make it C, let alone "standard-defined".

Does the required accompanying document define that code's behavior *as
an extension to the C language*, as explicitly permitted by the C
standard? If so, then that's relevant. If, on the other hand, we're
dealing with something that can compile either C or Pascal (which is far
more likely for this example), then it's not relevant. (I think there
are contrived programs that are valid C and valid Pascal; I wonder how
your hypothetical compiler would deal with them.)

[...]

>> Of course the program's behavior is undefined for an implementation that
>> doesn't document that form.
>
> No implementation is required to document that form, or to establish
> any implementation-defined characteristic which gives that form any
> standard-defined meaning.

True. And yet some happen to do so (Microsoft's documentation
specifically permits "void main(void)"). If it happened that *no*
existing implementations permitted and documented "void main(void)",
then "void main(void){}" would have undefined behavior on all existing
implementations.

> void main(void) is an utterance in a dialect which resembles ISO C
> syntactically (indeed, requires no diagnostic), but has no meaning in any
> dialects that are generated from the ISO C spec by the assignment of
> characteristics.

Why do you choose to ignore the normative phrase "or in some other
implementation-defined manner" in 5.1.2.2.1?

>> Similar arguments apply to a program that does:
>>
>> printf("%d\n", 32767+1);
>
> No, they absolutely do not. An implementation must define the characteristics
> of the type int.

Yes, and it must do so in that accompanying document required by 4p8.

> If int is defined in such a way that the result of 32767 + 1
> is representable, the above must calculate 32768. And this is deducible
> logically, without referring to third party documents.
>
> By contrast, if an implementation defines void main(void), there is no
> similar definition of behavior, like the rigor of having to compute 32768.

Given that void main(void) is permitted by a particular
implementation, its behavior is defined in much the same way as
void foo(void). main is to a large degree just another function.
The ways in which it differs from other functions (that it's the
program's entry point, for example) are already addressed by the
standard.

If an implementation documents "void main(void)" as a permitted form,
and if this program:

#include <stdio.h>
void main(void) { puts("hello"); }

does something other than printing "hello" under that implementation,
then the implementation is non-conforming.

> If I'm interpreting the above using nothing but the program and a standard,
> then when I see 32767+1, I note "okay, this program requires an implementation
> with ints 17 bits or wider". The program is still a program of some dialects
> of ISO C, just not all of them: it is not an utterance of those ISO C variants
> which feature a width of 16 bits for the value-contributing parts of type int.

And if you interpret it using the program, the standard, and the
implementation's accompanying document, then you know either that it
prints "32768" or that its behavior is undefined, depending on the
documented value of INT_MAX.

> Then suppose elsewhere in the same program I deduce, "this code
> requires int to be precisely 16 bits wide". Then I have a contradiction:
> int cannot be simultaneously 16 bits wide, and greater than 16 bits wide.
> The program fails parameter satisfiability: implementation-defined
> characteristics cannot be assigned in a non-conflicting way.
>
> When I see #include <unistd.h> or void main(void), then I may immediately
> jump to the conclusion that it's not a program written in the ISO C
> dialect. It is in the hands of some implementation. It has exactly
> the same status as a shell script, or a Fortran program, if such a specimen
> were presented as a C program (except that depending on how these are crafted,
> they may require a diagnostic).

I think you can reach this conclusion only by assuming that some
normative parts of the standard are nonsense that can be safely ignored.
If that's your personal opinion, that's fine, but in the context of
discussing the standard itself it doesn't make much sense.

6.10.2 defines the syntax and semantics of

# include <h-char-sequence> new-line

That description is not restricted to the standard headers defined in
section 7.

The (compile-time) behavior of the #include directive itself is well
defined: the directive is replaced by the entire contents of the header.
Of course the program's behavior depends on just what those contents
are. In effect, the contents of the <unistd.h> header are part of the
program, and the program's definedness or lack thereof must be judged by
those contents (along with the standard and the implementation's
accompanying document, of course).

Keith Thompson

Jul 9, 2014, 9:45:04 PM
unistd.h is an h-char-sequence, which is mapped to a uniquely identified
header in an implementation-defined manner. The meaning of a program
containing "#include <unistd.h>" depends on the contents of that header.
For some contents, the program's behavior will be undefined; for other
contents, it will be defined. Your unwillingness to refer to a third
document is not supported by your first document.

Tim Rentsch

Jul 23, 2014, 10:08:39 AM
Kaz Kylheku <k...@kylheku.com> writes:

> On 2014-07-09, Tim Rentsch <t...@alumni.caltech.edu> wrote:
>> Apparently you think program constructs that lead to undefined
>> behavior in some circumstances are undefined behavior in all
>> circumstances.
>
> The best way to articulate my position is that constructs lead to
> undefined behavior when we cannot deduce the behavior of the
> program, at least for some implementations that have certain
> choices of characteristics, using only a copy of the program and a
> copy of the standard, without resorting to our imagination to
> *invent* a dialect which gives meaning to the program.

This view is not consistent with how the Standard defines the
term 'undefined behavior'. If you want to think of such things
as "undefined", that's fine, but it's not 'undefined behavior' in
the sense of how that phrase is defined and used in the Standard.

jasper...@live.com

Nov 4, 2014, 3:56:38 AM
On Wednesday, April 30, 2014 8:04:33 PM UTC+10, Kees Bakker wrote:
> Hi,
>
> This item has been discussed before on comp.std.c, but I still am
> unclear what the outcome is. Is it undefined behaviour or not?
> (( Search for Subject: scanf("%d", &n) behaviour on overflow ))
>
> With the danger of giving an example and then have the discussion
> go in unwanted directions, I still want to give one.
>
> Let's say I have a string "194" and the sscanf format is %hhd or %hhi.
>
> signed char c;
> sscanf("194", "%hhi", &c);
>
> For some implementations you'd get c==-62. Is this the defined
> behavior or is it undefined?
>
> I'd argue that 194 does not fit in a signed char (because of hhi),
> and that the last part of C99 7.19.6.2p10 comes into action.
> "... if the result of the conversion cannot be represented
> in the object, the behavior is undefined."
> --
> Kees Bakker
