On Saturday, December 25, 2021 at 11:52:06 AM UTC-5, Tim Rentsch wrote:
> James Kuyper <james...@alumni.caltech.edu> writes:
...
> > I've long understood that, during translation phase 4, as soon as a
> > compiler reaches the new-line at the end of a #if directive, it knows
> > whether the #if group will be included. If not, and there's a
> > corresponding #else, it knows that the #else group will be included.
> > Either way, as soon as it starts reading a group that will be included,
> > it can immediately start preprocessing that group (and this is the
> > important part:) while searching for the #else or #endif directive that
> > terminates the group.
> >
> > I've also long understood that the #if, #else (if any) and #endif
> > directives that make up an if-section must all occur in the same file.
> > I'm not sure how I reached that conclusion - it's not anything that the
> > standard says explicitly. [...]
>
> The first rule of grammar in 6.10 paragraph 1 says (with \sub()
> to mean subscript)
>
>     preprocessing-file:
>         group \sub(opt)
>
> Thus each preprocessing file must consist of an integral number
> of group-part, and so cannot contain any unbalanced #if/#endif
> directives, or any #else directive outside an #if/#endif section.
I believe that what you're saying, using the terms defined in the C
preprocessing grammar, is that neither an if-group, an else-group,
nor an endif-line qualifies separately as a group-part; only a complete
if-section can do so.
When the standard defines the meaning of a term, that definition takes
precedence over any other interpretation you might reach by analyzing
the meaning of the words making up that term. "preprocessing-file" is
simply a symbol in the grammar - its definition is the grammar rule
associated with that symbol.
I've always interpreted the specification given in 6.10.2 as meaning that
a given preprocessing file must match the grammar described in 6.10 up
until the point that it recognizes a #include directive, which 'causes the
replacement of that directive by the entire contents of the source file
identified by the specified sequence between the " delimiters.' It's only the
file after that replacement (and all other such replacements) that must
fully parse in accordance with the grammar in 6.10.
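To make that reading concrete, here is a small sketch. The file names
main.c and tail.h, the macro USE_FAST, and the function mode() are all made
up for illustration. The comment shows an if-section split across two files;
the code below it is the single-file text I would expect to result from the
replacement. Whether the two-file form is actually permitted is exactly the
question under discussion, so this shows only what the "replacement" reading
implies, not what any implementation accepts.

```c
/*
 * Hypothetical two-file layout (not compiled here):
 *
 *   main.c                            tail.h
 *   ------                            ------
 *   #define USE_FAST 1                #else
 *   #if USE_FAST                      int mode(void) { return 0; }
 *   int mode(void) { return 1; }      #endif
 *   #include "tail.h"
 *
 * The #include sits inside a group that is being included, so it would
 * be processed while the search for the terminating #else or #endif is
 * still in progress. Neither file matches preprocessing-file on its own.
 */

#define USE_FAST 1

/* Single-file equivalent of main.c after the directive is replaced: */
#if USE_FAST
int mode(void) { return 1; }
/* the line #include "tail.h" is replaced by the three lines below */
#else
int mode(void) { return 0; }
#endif
```

The combined text is one complete if-section and parses against 6.10 without
difficulty; the two separate files do not.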
However, the term "preprocessing file" is also defined in 5.1.1.1p1. That's
a section of the standard that seldom comes up in discussion, so I'd
forgotten about that definition. I agree that it makes sense that a
"preprocessing file" is meant to match the syntax specified for a
"preprocessing-file". The standard often uses a grammar symbol name,
with '-' replaced by spaces, to refer to things matching that grammar
symbol. However, this is one of the few places where the name, with that
replacement, is formally defined separately from the grammar, implying a
connection between those two definitions.
This is not the clearest way to impose such a requirement. If each
preprocessing file is supposed to separately parse as a preprocessing-file,
I think it would have been better to explicitly mention that fact in the
description of 6.10.2 "Source file inclusion." The "replacement" wording
actually used gave me the strong impression that there were no content
restrictions on the #included file itself, but only on the result after
replacing the directive with those contents.
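For what it's worth, the per-file requirement is simple to state as a check.
The following is only a rough sketch: the function names are my own, and it
ignores comments, backslash-newline splicing, and the ordering of #elif and
#else within a section. It reports whether the conditional directives in one
file's text balance without reference to any other file.

```c
#include <ctype.h>
#include <string.h>

/* True if the directive name at p is exactly `name` (so "ifdef" does not
 * match "if"). */
static int directive_is(const char *p, const char *name)
{
    size_t n = strlen(name);
    return strncmp(p, name, n) == 0
        && !isalnum((unsigned char)p[n]) && p[n] != '_';
}

/* Returns 1 if every #if/#ifdef/#ifndef in `text` is closed by an #endif
 * within `text` itself and no #elif/#else/#endif appears outside such a
 * section; returns 0 otherwise. */
int conditionals_balanced(const char *text)
{
    int depth = 0;              /* number of currently open if-sections */
    const char *line = text;

    while (*line != '\0') {
        const char *p = line;
        const char *nl;

        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '#') {
            p++;
            while (*p == ' ' || *p == '\t')
                p++;
            if (directive_is(p, "if") || directive_is(p, "ifdef")
                || directive_is(p, "ifndef")) {
                depth++;
            } else if (directive_is(p, "elif") || directive_is(p, "else")) {
                if (depth == 0)
                    return 0;   /* #elif or #else outside any if-section */
            } else if (directive_is(p, "endif")) {
                if (depth == 0)
                    return 0;   /* #endif with nothing to close */
                depth--;
            }
        }

        nl = strchr(line, '\n');
        if (nl == NULL)
            break;              /* last line had no trailing new-line */
        line = nl + 1;
    }
    return depth == 0;          /* 0 if some #if was never closed */
}
```

A file containing only "#if 1" followed by an #include fails this check, as
does one that begins with #else, even though the two could combine into a
complete if-section after replacement.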