Let's take the code below.
#define m !(m)+n
#define n(n) n(m)
m(m)
It seems odd that Microsoft's compiler expands this code to
!(m)+!(m)+n(!(m)+n)
and the EDG front end expands it to
!(m)+ !(m)+n(!(m)+n)
(with a space before the second "!").
As I understand, the expansion gives, at various stages:
m(m)
!(m)+n(m)
!(m)+!(m)+n(m)
If so, why is the last invocation of the macro "m" expanded?
Thank you.
--
ik
Because the n(m) expansion was not done in the course of the first m
expansion, so the recursion prevention doesn't kick in. I used this
feature in my IOCCC winner. http://www.ioccc.org/years.html#1998_fanf
Tony.
--
f.a.n.finch <d...@dotat.at> http://dotat.at/
RATTRAY HEAD TO BERWICK ON TWEED: SOUTHEAST 3 OR 4 INCREASING 4 OR 5 THIS
MORNING, LOCALLY 6 THIS AFTERNOON, EASING EAST OR SOUTHEAST 3 OR 4 OVERNIGHT.
MAINLY FAIR. MODERATE OR GOOD. SLIGHT TO MODERATE.
The expansion goes as follows, with ^ under the next thing to be expanded:
m(m)
^
!(m)+n(m)
^^^^
!(m)+m(m)
^
!(m)+!(m)+n(m)
^
[ n(m) isn't expanded because it's part of an expansion of n ]
!(m)+!(m)+n(!(m)+n)
I strongly recommend not depending on this behavior.
m(m) is expanded as follows, where [] indicates a set of macro names
being expanded:
m(m)
~
[m]
!(m)+n(m)
~~~~~~
Now, it's not clear that n(m) should be considered to be a recursive
expansion of m; for more details, please see DR017Q19, Q23 and
http://groups.google.com/groups?selm=x5BK8.385%24r%254.582%40news.hananet.net
Thus, we have two cases for its expansion:
(1) If it's considered to be a recursive expansion of m,
!(m)+n(m)
~~~~
[m, n]
!(m)+m(m)
~~~~
(2) or if it's considered to be a new expansion,
!(m)+n(m)
~~~~
[n]
!(m)+!(m)+n(m)
~~~~~~~~~
Note that expansion of the argument "m" given to n() is not directly
related to (i.e., doesn't preclude expansion of) "(m)" following "n"
during rescanning. Thus,
[n, m]
!(m)+!(m)+n(!(m)+n)
~~~~~~~~~~~~~~
In the following example, you can see the same behavior as the above
expansion occurring during rescanning:
#define foo bar
#define mac(x) x(foo)
mac(foo)
results in "bar(bar)", not "bar(foo)".
--
Jun, Woong (myco...@hanmail.net)
Dept. of Physics, Univ. of Seoul
> Now, it's not clear that n(m) should be considered to be a recursive
> expansion of m; for more details, please see DR017Q19, Q23 and
The non-normative reference is incorrect. The behavior is specified, it just
isn't very clear. The secondary indirect macro expansion should expand because
the macro name is not found during the rescanning and further replacement of the
replacement list. This is somewhat hidden though. A function-like macro is not
invoked unless it is trailed by a left parenthesis. This causes the
replacement list rescan to be complete prior to the invocation of the tail
macro:
#define A() B
#define B() A
A() // B
A()() // A
A()()() // B
// etc.
However, if the opening parenthesis of the invocation *is* found during the
rescan, the macro is considered invoked, and therefore it is a recursive
invocation attempt:
#define A() B(
#define B() A
A() ) () // A()
The primary difficulty in implementing macro expansion is how to handle this
situation, because a name may be disabled only partway through a macro
invocation's arguments, but still be disabled for the entire replacement list
rescan of that macro. Nevertheless, the specified behavior is clear.
Regards,
Paul Mensonides
Then, could you show me a more reliable non-normative reference, such as
the committee's answer to another DR or the committee minutes, rather than
your own interpretation? Because what I cited were the authoritative
interpretations which the committee formed for C90, and C99 still has
the following non-normative wording (even if Doug said differently in
the discussion cited in my previous posting, which was the reason I
mentioned it),
J.1 (Unspecified behavior)
When a fully expanded macro replacement list contains a function-
like macro name as its last preprocessing token and the next
preprocessing token from the source file is a (, and the fully
expanded replacement of that macro ends with the name of the first
macro and the next preprocessing token from the source file is
again a (, whether that is considered a nested replacement.
I need an interpretation of equivalent authority to
overturn these.
>
> However, if the opening parenthesis of the invocation *is* found during the
> rescan, the macro is considered invoked,
#define a()
a(
According to the literal reading of the standard (and your
interpretation) this macro "a" is considered invoked but never
terminated, since there is no closing parenthesis. Is this code
invalid? I think that this means an implementation is required to
recognize the corresponding closing parenthesis to decide expansion for
the invoked macro.
> and therefore it is a recursive
> invocation attempt:
>
> #define A() B(
> #define B() A
>
> A() ) () // A()
>
My understanding was that this is not different from your first
example above, even if the committee mentioned nothing direct about
this. The invocation is completed with tokens from the remaining
source file, and in the Rationale (for both C90 and C99) the committee
provides a rationale to leave it as unspecified:
However, given the definitions
#define f(a) a*g
#define g(a) f(a)
the expansion will be either 2*f(9) or 2*9*g: there are no
clear grounds for making a decision whether the f(9) token string
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
resulting from the initial expansion of f and the examination of
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
the rest of the source file should be considered as nested within
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
the expansion of f or not.
~~~~~~~~~~~~~~~~~~~~~~~~~
which can apply to your second example, too.
Ooops, sorry. The above code is invalid. Ignore this paragraph.
Did I really?
Yes.
http://groups.google.com/groups?hl=ko&lr=&ie=UTF-8&selm=3D050F91.8734304A%40null.net
> Then, could you show me a more reliable non-normative reference like
> the committee's answer to another DR or the committee minutes, not
> your own interpretation? Because what I cited were the authoritative
6.10.3/10
"...Each subsequent instance of the function-like macro name followed by a ( as
the next preprocessing token introduces the sequence of preprocessing tokens
that is replaced by the replacement list in the definition (an invocation of the
macro)...."
This is the definition of a macro invocation and replacement in the context of a
function-like macro. I.e. both the identifier and a left parenthesis are
required for macro replacement of a function-like macro.
6.10.3.4/2
"...Furthermore, if any nested replacements encounter the name of the macro
being replaced, it is not replaced...."
This is the part that prohibits recursion. Application of these sections to the
following:
#define A() B
#define B() A
A()()() // B
The instance of B found during rescanning of the replacement list of A does not
constitute a nested invocation because there is no left parenthesis found during
the rescanning of the A's replacement list. Scanning must continue out of A's
replacement list in order to find the left parenthesis and therefore a macro
invocation. It is not nested. It is a different scenario when the open
parenthesis is found during the rescan, as in:
#define A() B(
A() ) () // A()
...because here the invocation occurs during the rescan of the replacement list
of A and is therefore nested, and A consequently gets disabled.
The non-normative reference contradicts the definitions and rules specified in
the normative sections, regardless of what the intention may have been.
Further (and I realize that this is irrelevant--just some further points about this
behavior in practice), this is the behavior of every preprocessor that I have
encountered as the author of Boost's preprocessor library. Any other
interpretation would break a great deal of code, in the preprocessor library
itself, in Boost itself in general, and in users of Boost all over the place.
There is also no such non-normative text in the C++ standard. Lastly, this
interpretation follows the normative sections of the standard to the letter, but
does not break the original reasoning to prohibit unrestricted recursion. It
cannot recurse indefinitely.
Ultimately, there are only three possible solutions: 1) a fundamental
divergence of the C and C++ preprocessors, causing support for C in this field
to be dropped--which would be a pity, 2) breaking a great deal of code in both C
and C++, or 3) clarifying what I believe the standard already says.
I have a vested interest in this area (obviously), which is why I am as adamant
as I am that that is the only correct behavior given the normative portions of
both the C and C++ standards.
Regards,
Paul Mensonides
The standard doesn't say that it's the definition.
Clause 3:
For the purposes of this International Standard, the following
definitions apply. Other terms are defined where they appear in
/italic/ type or on the left side of a syntax rule.
> of a macro invocation and replacement in the context of a
> function-like macro. I.e. both the identifier and a left parenthesis are
> required for macro replacement of a function-like macro.
And the matching right parenthesis and possible pp-tokens between the
two matching parentheses also constitute the invocation and
replacement of the function-like macro; see below.
>
> #define A() B
> #define B() A
>
> A()()() // B
>
> The instance of B found during rescanning of the replacement list of A does not
> constitute a nested invocation because there is no left parenthesis found during
> the rescanning of the A's replacement list. Scanning must continue out of A's
> replacement list in order to find the left parenthesis and therefore a macro
> invocation. It is not nested.
A part of the macro invocation (the macro name) comes from expansion
of "A", and the other part (two parentheses) from the remaining source
file, which is one of the major reasons that the Committee said
"unspecified" about that. And with the authoritative interpretation,
I don't see any reason to agree with your personal interpretation. If
you really think that the committee's interpretation is wrong, then a
DR should be in order. Unfortunately, you can't change the state of
C90; you can only change C99.
> It is a different scenario when the open
> parenthesis is found during the rescan, as in:
>
> #define A() B(
>
> A() ) () // A()
>
> ...because here the invocation occurs during the rescan of the replacement list
> of A and is therefore nested, and A consequently gets disabled.
The closing parenthesis and possible pp-tokens given as arguments also
constitute the invocation and replacement of the macro:
The replaced sequence of preprocessing tokens is terminated by the
~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
matching ) preprocessing token, skipping intervening matched pairs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
of left and right parenthesis preprocessing tokens. Within the
~~~~~~~~~~
sequence of preprocessing tokens making up an invocation of a
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
function-like macro, new-line is considered a normal white-space
~~~~~~~~~~~~~~~~~~~
character.
>
> The non-normative reference contradicts the definitions and rules specified in
> the normative sections, regardless of what the intention may have been.
Then, please show me any other non-normative reference where the
committee agrees with your interpretation and throws away one already
formed.
> this is the behavior of every preprocessor that I have
> encountered as the author of Boost's preprocessor library.
According to your interpretation, what's the result of the following
expansion?
#define A B(
#define B() A
A )
Most implementations I've used gave me "B (", not "A". Are they simply
broken?
> Any other
> interpretation would break a great deal of code, in the preprocessor library
> itself, in Boost itself in general, and in user's of Boost all over the place.
The committee's interpretation does/did not break existing
implementations if they give one of two possible behaviors. But every
program which depends on that behavior is simply non-portable if it
was made after publication of the DR.
> There is also no such non-normative text in the C++ standard. Lastly, this
> interpretation follows the normative sections of the standard to the letter,
But it doesn't follow the intent of the committee.
> but
> does not break the original reasoning to prohibit unrestricted recursion. It
> cannot recurse indefinitely.
I don't think that the interpretation given in the DR permits the
indefinite recursion of the macro expansion.
>
> Ultimately, there are only three possible solutions: 1) a fundamental
> divergence of the C and C++ preprocessors, causing support for C in this field
> to be dropped--which would be a pity,
NB> C++ has the same as C90 and C99; but they deemed it not clear enough,
NB> hence the DR suggested resolution
NB>
NB> http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/cwg_active.html
NB>
NB> #268.
JW> If the Committee accepts the suggested
JW> resolution in the C++98 DR, then the example of the Rationale must be
JW> removed (or revised to say that it's well-defined). [...]
JW> Anyway, did the WG14 Committee accept the suggested resolution
JW> officially (or at least have a plan to do)?
DG> I don't think it was entered into the WG14 agenda, although it
DG> ought to be. (I thought there was an agreement that the C part
DG> of C++ would be interpreted by WG14; this is especially needed
DG> for preprocessing issues since so many C++ gurus disparage use
DG> of the preprocessing facility and hence would not be expected
DG> to be as concerned about maximizing its usefulness.)
> 2) breaking a great deal of code in both C
> and C++,
If they are already considered to be non-portable, I don't think they
are worth being protected by the standard now.
> The standard doesn't say that it's the definition.
Irrelevant. If the standard says X == Y in normative text, then that
constitutes a definition. I wasn't using the term as a formal terminological
definition.
> And the matching right parenthesis and possible pp-tokens between the
> two matching parentheses also constitute the invocation and
> replacement of the function-like macro; see below.
The point is not what constitutes a full invocation, but what constitutes an
invocation. I.e. as opposed to:
#define A() B
#define B
A() +
Even though the name B exists in the rescan, it does not constitute an invocation.
> A part of the macro invocation (the macro name) comes from expansion
> of "A", and the other part (two parentheses) from the remaining source
> file, which is one of the major reasons that the Committee said
> "unspecified" about that. And with the authoritative interpretation,
> I don't see any reason to agree with your personal interpretation.
There is no authoritative non-normative interpretation, especially considering
the C++ DR that you mention below. This is what it fundamentally comes down to:
"A" does not constitute a macro invocation, while "A(" does. The question then
is only whether or not that happens during the rescan of a replacement list. In
the first case, i.e.
#define A() B
#define B() A
...it does not. In the second case,
#define A() B(
#define B() A
...it does. Consequently, the identifier token "A" is not disabled in the first
scenario and is disabled in the second.
>> The non-normative reference contradicts the definitions and rules
>> specified in
>> the normative sections, regardless of what the intention may have
>> been.
>
> Then, please show me any other non-normative reference where the
> committee agrees with your interpretation and throws away one already
> formed.
The C++ DR #268 that you mention below, for example--though that isn't the C
committee.
>> this is the behavior of every preprocessor that I have
>> encountered as the author of Boost's preprocessor library.
>
> According to your interpretation, what's the result of the following
> expansion?
>
> #define A B(
> #define B() A
>
> A )
>
> Most implementations I've used gave me "B (", not "A". Are they simply
> broken?
Yes, in this case. You'd be surprised at how buggy most preprocessors
are--though I'm not complaining about this specific area in particular (i.e.
partial invocation). However, partial invocation like the above is the only
area where preprocessors do different and/or strange things (of those I've had
experience with anyway). Boost preprocessor doesn't do that, but it does the
other type of test case (where the parentheses are both outside the replacement
list) all the time. Without this behavior, a massive dependency is created, along
with a massive inefficiency in the preprocessor itself; it also tosses out the
window nearly all of existing practice. I personally don't care about the partial
invocation case above, though I'd prefer a well-defined behavior of some sort or
rejection as ill-formed.
> The committee's interpretation does/did not break existing
> implementations if they give one of two possible behaviors. But every
> program which depends on that behavior is simply non-portable if it
> was made after publication of the DR.
I disagree with how you define "breaking code." If code existed, the standard
changed, and, as a result, code that was once valid according to the rules
specified (as they currently are, normatively) is now unspecified or undefined,
then the standard broke code. Obviously, at certain times the standard must
break code, but not when there is no logical reason. Any resolution that makes
trailing expansion like this...
#define A() B
#define B() A
A()()() // B
...undefined behavior would effectively destroy the validity of an entire class
of generative metaprogramming. The hacks necessary to circumvent that problem
are extremely cumbersome, and it would drastically increase the complexity of
the field.
>> There is also no such non-normative text in the C++ standard.
>> Lastly, this
>> interpretation follows the normative sections of the standard to the
>> letter,
>
> But it doesn't follow the intent of the committee.
According to the C++ DR #268 it was:
"The original intent of the J11 committee in this text was that the result
should be 42, as demonstrated by the original pseudo-code description of the
replacement algorithm provided by Dave Prosser, its author. The English
description, however, omits some of the subtleties of the pseudo-code and thus
arguably gives an incorrect answer for this case."
>> but
>> does not break the original reasoning to prohibit unrestricted
>> recursion. It
>> cannot recurse indefinitely.
>
> I don't think that the interpretation given in the DR permits the
> indefinite recursion of the macro expansion.
Nor do I. The reason for name disabling, first and foremost, is to prevent
indefinite recursion from ever being an issue. None of the "possible"
resolutions to this can cause infinite recursion. That was my only point here.
>> If the Committee accepts the suggested
>> resolution in the C++98 DR, then the example of the Rationale must be
>> removed (or revised to say that it's well-defined). [...]
>> Anyway, did the WG14 Committee accept the suggested resolution
>> officially (or at least have a plan to do)?
I don't know. I'm not on the committee myself, though I plan to be in Kona this
fall as a technical expert regarding the preprocessor and preprocessor
metaprogramming.
>> I don't think it was entered into the WG14 agenda, although it
>> ought to be. (I thought there was an agreement that the C part
>> of C++ would be interpreted by WG14; this is especially needed
>> for preprocessing issues since so many C++ gurus disparage use
>> of the preprocessing facility and hence would not be expected
>> to be as concerned about maximizing its usefulness.)
That is true. However, C++ gurus are getting less afraid of it than they used
to be. It is all about using the right tool for the right job. A collection of
uses of the preprocessor, for which core C++ had better alternatives, are the
primary cause of this.
I have no great issue with the C committee handling the preprocessor in
general--except in situations that are specific to C++ (and, bitand, etc.) or
have a much stronger context in C++.
>> 2) breaking a great deal of code in both C
>> and C++,
>
> If they are already considered to be non-portable, I don't think they
> are worth being protected by the standard now.
They aren't considered non-portable according to (what I consider to be) strict
interpretation of the C and C++ standards (sans the non-normative reference in
the C99 standard). Nor are they non-portable in practice.
Regards,
Paul Mensonides
? The initial replacement buffer is A() which is replaced
by B. Then that is rescanned for further replacement
opportunities (with any occurrences of A painted blue).
In C89 there were words in 6.8.3.4 to ensure that the
following pp-tokens *are* used in the rescan; that should
be unchanged in C99 (although my copy isn't at hand to
check). Therefore B() *is* matched and expanded, so the
extended replacement buffer then contains A, which is the
name of the original macro being expanded, which raises
the following question: If this expansion of B() is
considered "nested" under the original expansion of A(),
then that occurrence of A has to be painted blue. Since
"nesting" was not carefully defined in C89 (it could be
either a spatial subsetting or a procedural hierarchy),
this was an issue open to debate, and *as I recall* we
decided for C99 to define it in the affirmative (although
like some other resolutions it might not have made it into
the published document). If this isn't addressed in the
C99 wording, then it remains a debatable question suitable
for a DR. Assuming that the encounter of the name A
during that expansion of B() *is* considered "nested",
then the extended replacement buffer now contains A' where
' denotes permanent blue paint. A' does not trigger any
macro replacement, ever. At this point the original pp
sequence has become A'() and macro substitution is
finished. The paint ' does not appear in any form in the
resulting phase 7 token sequence, which looks like A().
If, on the other hand, that is *not* considered nesting,
then one would expect that inner A to be rescanned within
the context of B()-replacement along with further pp-
tokens, which would require it to trigger another
replacement, resulting in the token sequence B at phase 7.
As I recall we intended to resolve the ambiguity about the
meaning of "nested" for C99, and the resolution was
supposed to be such that this example does constitute
nesting of the inner A within the replacement of the outer
A(), so blue paint gets applied to the inner A.
As to the C++ standard, they don't much care for
preprocessing, and their evident intent has been that it
be whatever the C standard specifies. Since the two
standards are not updated concurrently, they tend to get
slightly out of sync in such areas of overlap.
If you aren't using the term as a formal terminological definition,
then it can be argued that the wording in question is just an
explanation of the term.
>
> > And the matching right parenthesis and possible pp-tokens between the
> > two matching parentheses also constitute the invocation and
> > replacement of the function-like macro; see below.
>
> The point is not what constitutes a full invocation, but what constitutes an
> invocation. I.e. as opposed to:
>
> #define A() B
> #define B
You meant "#define B()"?
>
> A() +
>
> Even though the name B exists in the rescan, it does not constitute an invocation.
On the other hand,
A() ( +
constitutes an invocation, but not full invocation, thus results in
undefined behavior which means that the implementation can do
anything. The valid (thus meaningful) invocation of the function-like
macro is made up with all pp-tokens between two matching parentheses
(inclusive).
By the way, in this case, because the committee already provided the
intended interpretation through the answer to the DR, it's not very
useful to discuss what the text of the standard means. Sure, a DR can
be filed if the wording has a big problem in it when compared with the
committee's interpretation.
>
> > A part of the macro invocation (the macro name) comes from expansion
> > of "A", and the other part (two parentheses) from the remaining source
> > file, which is one of the major reasons that the Committee said
> > "unspecified" about that. And with the authoritative interpretation,
> > I don't see any reason to agree with your personal interpretation.
>
> There is no authoritative non-normative interpretation, especially considering
> the C++ DR that you mention below.
Why should we, as readers of the C standard, bother about a C++ DR which
has never been processed by the C committee? When discussing what the
C standard means, the important thing is what the C committee intended
with the text, not what any other committee thinks about it or
how you interpret it.
> >
> > According to your interpretation, what's the result of the following
> > expansion?
> >
> > #define A B(
> > #define B() A
> >
> > A )
> >
> > Most implementations I've used gave me "B (", not "A". Are they simply
> > broken?
>
> Yes, in this case.
Only if we assume that the authoritative interpretation agrees with
what you claimed, which is not true.
[...]
>
> > The committee's interpretation does/did not break existing
> > implementations if they give one of two possible behaviors. But every
> > program which depends on that behavior is simply non-portable if it
> > was made after publication of the DR.
>
> I disagree with how you define "breaking code." If code existed, the standard
> changed, and, as a result, code that was once valid according to the rules
> specified (as they currently are, normatively) is now unspecified or undefined,
> then the standard broke code.
The story can change if the program is made *after* publication of the
DR where the committee provides the correct (or intended)
interpretation for the wording in question.
[...]
>
> According to the C++ DR #268 it was:
>
> "The original intent of the J11 committee in this text was that the result
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> should be 42, as demonstrated by the original pseudo-code description of the
~~~~~~~~~~~~
> replacement algorithm provided by Dave Prosser, its author. The English
> description, however, omits some of the subtleties of the pseudo-code and thus
> arguably gives an incorrect answer for this case."
I have no idea whether or not the underlined wording is true,
but it's not what the C committee said in many places. The C89
committee explicitly stated an intent that differs from it.
> >
> > If they are already considered to be non-portable, I don't think they
> > are worth being protected by the standard now.
>
> They aren't considered non-portable according to (what I consider to be) strict
> interpretation of the C and C++ standards
It's non-portable according to the interpretation of the committee in
the current state (or at least in C90). I don't see how it can be
thought differently here.
> > Let's take the code below.
> >
> > #define m !(m)+n
> > #define n(n) n(m)
> > m(m)
> >
> The expansion goes as follows, with ^ under the next thing to be expanded:
>
> m(m)
> ^
>
> !(m)+n(m)
> ^^^^
>
> !(m)+m(m)
> ^
>
> !(m)+!(m)+n(m)
> ^
> [ n(m) isn't expanded because it's part of an expansion of n ]
Yes, "n(m)" is not expanded. But why is the "m" in the parentheses expanded?
--
ik
> > Let's take the code below.
> >
> > #define m !(m)+n
> > #define n(n) n(m)
> > m(m)
> >
>
> I strongly recommend not to depend on this behavior.
This code is a part of a test suite for my own commercial C preprocessor.
So, I've no choice. ;-) And I strongly need to know whether this code causes
undefined behaviour or not. And if it's not, who's right.
> m(m) is expanded as follows, where [] indicates a set of macro names
> being expanded:
>
> m(m)
> ~
> [m]
> !(m)+n(m)
> ~~~~~~
>
> Now, it's not clear that n(m) should be considered to be a recursive
> expansion of m;
Firstly, that's not what I am asking about.
Secondly, I think it's clear that it should be. If not, why not?
> for more details, please see DR017Q19, Q23 and
>
>
http://groups.google.com/groups?selm=x5BK8.385%24r%254.582%40news.hananet.net
We've some other wording now (C99 J.1):
"When a fully expanded macro replacement list contains a function-like macro
name as its last preprocessing token and the next preprocessing token from
the source file is a (, and the fully expanded replacement of that macro
ends with the name of the first macro and the next preprocessing token from
the source file is again a (, whether that is considered a nested
replacement (6.10.3)."
So, these references are about some other cases.
> In the following example, you can see the same behavior as the above
> expansion occurring during rescanning:
>
> #define foo bar
> #define mac(x) x(foo)
>
> mac(foo)
>
> results in "bar(bar)", not "bar(foo)".
And again, this code doesn't illustrate the same case.
But thank you anyway.
--
ik
The following pp-tokens aren't used during the "rescan." Rather, scanning
proceeds out of the rescan and into the trailing tokens. This is fundamental,
as any invocation of a macro would disable that macro for the entire rest of the
translation unit otherwise.
> that should
> be unchanged in C99 (although my copy isn't at hand to
> check). Therefore B() *is* matched and expanded, so the
> extended replacement buffer then contains A, which is the
> name of the original macro being expanded, which raises
> the following question: If this expansion of B() is
> considered "nested" under the original expansion of A(),
> then that occurrence of A has to be painted blue.
If so, yes.
> As I recall we intended to resolve the ambiguity about the
> meaning of "nested" for C99, and the resolution was
> supposed to be such that this example does constitute
> nesting of the inner A within the replacement of the outer
> A(), so blue paint gets applied to the inner A.
If this is the case, it isn't contained in the C99 standard. The non-normative
reference does say that it is unspecified, but IMO that conflicts with the
normative portions. The normative section explicitly states that a left
parenthesis is required to begin the replacement process of a function-like
macro. It then further says, when discussing the rescanning of the replacement
list, that if a nested replacement yields an "outer" macro name, i.e. an identifier
referring to an outer macro, that identifier is disabled. There is no way that
it can be "nested," by any possible definition of the word, because there is no
left parenthesis in the replacement list and therefore it cannot be a
replacement:
#define A() B
#define B() A
A()()()()
On the contrary, a partially opened invocation is not so clear:
#define A() B(
#define B() A
A() ) () )
Though I argue that it does constitute a nested replacement, because of the left
parenthesis.
> As to the C++ standard, they don't much care for
> preprocessing, and their evident intent has been that it
> be whatever the C standard specifies. Since the two
> standards are not updated concurrently, they tend to get
> slightly out of sync in such areas of overlap.
I don't have a problem with that. What I have a problem with is a great deal of
code that existed prior to C99 is apparently broken or unspecified because the C
committee added non-normative text to the standard that supposedly alters the
meaning of the normative text in the standard.
Regards,
Paul Mensonides
I am arguing for a formal definition of what constitutes a macro invocation. Just
not the term itself. If you cannot do this, then the standard is meaningless.
>> The point is not what constitutes a full invocation, but what
>> constitutes an
>> invocation. I.e. as opposed to:
>>
>> #define A() B
>> #define B
>
> You meant "#define B()"?
Yes, sorry.
> On the other hand,
>
> A() ( +
>
> constitutes an invocation, but not full invocation, thus results in
> undefined behavior which means that the implementation can do
> anything. The valid (thus meaningful) invocation of the function-like
> macro is made up with all pp-tokens between two matching parentheses
> (inclusive).
It isn't undefined, it is ill-formed. The right parenthesis must exist.
> By the way, in this case, because the committee already provided the
> intended interpretation through the answer to the DR, it's not very
> useful to discuss what the text of the standard means. Sure, a DR can
> be formed if the wording has a big problem in it when comparing the
> committee's interpretation.
I'm not arguing current intent. I'm arguing that that apparent resolution does
not mesh with existing practice, nor does it mesh with what the standard itself
says.
> Why we, as readers of the C standard, should bother about C++DR which
> has never been processed by the C committee? When discussing what the
> C standard means, the important thing is what the C committee intended
> with the text, neither what any other committee thinks about it nor
> how you interpret it.
Because the preprocessor has a context in C++ as well. It is pointless to not
consider C++ when discussing a common feature that may or may not change or be
more clearly specified. Plus, I don't agree that this was necessarily the
original design.
> Only if we assume that the authoritative interpretation agree with
> what you claimed, which is not true.
There is only one authoritative text: the normative portions of the standard.
Interpretation of the normative portions can result in only one possible
resolution for the A()() case and several possible resolutions for the A() )
case.
>> According to the C++ DR #268 it was:
>>
>> "The original intent of the J11 committee in this text was that the
>> result
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> should be 42, as demonstrated by the original pseudo-code
>> description of the
> ~~~~~~~~~~~~
>> replacement algorithm provided by Dave Prosser, its author. The
>> English
>> description, however, omits some of the subtleties of the
>> pseudo-code and thus
>> arguably gives an incorrect answer for this case."
>
> I have no idea about whether or not the underlined wording is true,
> but it's not what the C committee said in many places. The C89
> committee explicitly formed their intent that differs from it.
Given the non-normative reference, that's obvious. My entire point from the
beginning of this thread is that the non-normative text is incorrect when
compared to the wording of the standard itself. You can't "form" an intent,
only an interpretation. The original intent is a different thing.
Regards,
Paul Mensonides
It results in unspecified behavior. I don't know why you and Paul
Mensonides don't believe this even though Doug (who is a committee member)
and I have already provided some evidence. That ("the behavior is
unspecified/undefined") is stated plainly by the committee even if
not in the normative part; the normative part has been left
ambiguous intentionally since C89. If you don't care about
the committee, why care about the standard itself?
>
> > m(m) is expanded as follows, where [] indicates a set of macro names
> > being expanded:
> >
> > m(m)
> > ~
> > [m]
> > !(m)+n(m)
> > ~~~~~~
> >
> > Now, it's not clear that n(m) should be considered to be a recursive
> > expansion of m;
>
> Firstly, that's not what I asking about.
But, it's important because, in the current state, the net result can
differ depending on it.
>
> Secondly, I think it's clear that it should be. If not, why not?
See below.
>
> > for more details, please see DR017Q19, Q23 and
> >
> >
> http://groups.google.com/groups?selm=x5BK8.385%24r%254.582%40news.hananet.net
>
> We've some other wording now (C99 J.1):
>
> "When a fully expanded macro replacement list contains a function-like macro
> name as its last preprocessing token and the next preprocessing token from
> the source file is a (, and the fully expanded replacement of that macro
> ends with the name of the first macro and the next preprocessing token from
> the source file is again a (, whether that is considered a nested
> replacement (6.10.3)."
>
> So, these references are about some other cases.
The informative wording in J.1 is not intended to be exhaustive.
Please don't apply it to cases literally, and read the rationale for
the unspecified (undefined) behavior given in the DR mentioned above.
>
> > In the following example, you can see the same behavior as the above
> > expansion occurring during rescanning:
> >
> > #define foo bar
> > #define mac(x) x(foo)
> >
> > mac(foo)
> >
> > results in "bar(bar)", not "bar(foo)".
>
> And again, this code doesn't illustrates the same case.
It exactly illustrates the same case, i.e., what you were asking (why
the "m" in the parentheses should be expanded).
If "m(m)" is expanded as follows anyway,
!(m) + n(m)
"m" is given to n() as an argument. During expansion of "n()", the
argument is fully expanded, thus you get this:
!(m) + !(m')+n'(m)
[m' and n' indicate that they are permanently blue-painted]
Now, your question is why "m" in the parentheses should be expanded,
even though it looks like a nested expansion of "m" done during the
argument expansion, isn't it? But the expansion of the argument "m" is
irrelevant (spatially and procedurally) to the *rescan* that occurs after
it's done. My simplified "foo/bar" example above is meant to show
this.
#define foo bar
#define mac(x) x(foo)
mac(foo)
To expand mac(), the argument "foo" should be fully expanded:
bar(foo)
But the expansion of the argument "foo" doesn't affect expansion of
"foo" in the parentheses during the rescan, which is the reason we get
"bar(bar)". If you agree that we should get "bar(bar)" as the
result, the same thing applies to the case you asked about above.
Then, you picked out just a part of the definition. The closing
parenthesis and the pp-tokens between the two matching parentheses also
constitute the macro invocation. You aren't able to say whether a given
macro invocation has well-defined behavior (i.e., is valid) without
looking at them.
> >>
> >> #define A() B
> >> #define B()
> >
[...]
>
> > On the other hand,
> >
> > A() ( +
> >
> > constitutes an invocation, but not full invocation, thus results in
> > undefined behavior which means that the implementation can do
> > anything. The valid (thus meaningful) invocation of the function-like
> > macro is made up with all pp-tokens between two matching parentheses
> > (inclusive).
>
> It isn't undefined, it is ill-formed. The right parenthesis must exist.
Yes, the right parenthesis must exist, thus the behavior always
results in undefined behavior. There is nothing like "ill-formed" in
the standard.
>
> > By the way, in this case, because the committee already provided the
> > intended interpretation through the answer to the DR, it's not very
> > useful to discuss what the text of the standard means. Sure, a DR can
> > be formed if the wording has a big problem in it when comparing the
> > committee's interpretation.
>
> I'm not arguing current intent. I'm arguing that that apparent resolution does
> not mesh with existing practice, nor does it mesh with what the standard itself
> says.
In the *current* state, the correct and intended interpretation is
what the committee said. As I said repeatedly, if you think it doesn't
match what the standard itself says or how the real world works,
please submit a DR to the committee. Until the committee accepts it,
you can't claim that yours is officially correct. The standard is a
means to express the intent. And even if the committee agrees with you
and it's published as a DR or even a TC, the official interpretation
for C90, the superseded standard, can't differ.
>
> > Why we, as readers of the C standard, should bother about C++DR which
> > has never been processed by the C committee? When discussing what the
> > C standard means, the important thing is what the C committee intended
> > with the text, neither what any other committee thinks about it nor
> > how you interpret it.
>
> Because the preprocessor has a context in C++ as well.
As Doug said, the preprocessor part which C and C++ share should be
interpreted by the C committee, not by the C++ committee alone. As far as I
know, the C++ DR in question has never been reviewed by the C committee.
> > Only if we assume that the authoritative interpretation agree with
> > what you claimed, which is not true.
>
> There is only one authoritative text: the normative portions of the standard.
The standard is a means to express the intent. If you completely ignore
the committee's intent given in the DRs, it would leave many more holes in
the standard.
> Interpretation of the normative portions can result in only one possible
> resolution for the A()() case and several possible resolutions for the A() )
> case.
That's just your belief, not the way the standard is interpreted.
[...]
> >
> > I have no idea about whether or not the underlined wording is true,
> > but it's not what the C committee said in many places. The C89
> > committee explicitly formed their intent that differs from it.
>
> Given the non-normative reference, that's obvious. My entire point from the
> beginning of this thread is that the non-normative text is incorrect when
> compared to the wording of the standard itself.
My entire point is that you should submit a DR to the C committee if
you think so, and that until then the intended interpretation doesn't
agree with yours.
> You can't "form" an intent,
> only an interpretation. The original intent is a different thing.
>
The original intent is what the C committee said in the DR. There is
no doubt about it.
It constitutes a macro invocation if there is a left parenthesis. If there is
no closing parenthesis, or not enough arguments, etc., it is an error. The same
is not true without the left parenthesis. An identifier representing a
function-like macro name does not constitute an invocation.
>> It isn't undefined, it is ill-formed. The right parenthesis must
>> exist.
>
> Yes, the right parenthesis must exist, thus the behavior always
> results in undefined behavior. There is nothing like "ill-formed" in
> the standard.
A.k.a. an error. The right parenthesis is explicitly required. It isn't
undefined, it is an outright compile-time error.
>>> By the way, in this case, because the committee already provided the
>>> intended interpretation through the answer to the DR, it's not very
>>> useful to discuss what the text of the standard means. Sure, a DR
>>> can
>>> be formed if the wording has a big problem in it when comparing the
>>> committee's interpretation.
>>
>> I'm not arguing current intent. I'm arguing that that apparent
>> resolution does
>> not mesh with existing practice, nor does it mesh with what the
>> standard itself
>> says.
>
> In the *current* state, the correct and intended interpretation is
> what the committee said. As I said repeatedly, if you think it doesn't
> match what the standard itself says or what the real world works,
> please submit a DR to the committee. Before the committee accepts it,
> you can't claim that yours is officially correct. The standard is
> means to express the intent. And even if the committee agrees with you
> and it's published as a DR or even a TC, the official interpretation
> for C90, the superseded standard, can't differ.
Yes, I realize that. I am also not arguing the interpretation of the committee.
I'm arguing that the normative text of the standard contradicts the
interpretation of the committee. I agree that this needs a DR.
>> Because the preprocessor has a context in C++ as well.
>
> As Doug said, the preprocessor part which C and C++ share should be
> interpreted by the C committee, not by the C++ committee alone. As I
> know, the C++DR in question has never reviewed by the C committee.
I agree. The point being that C does not exist in a void within the community.
It exists alongside C++, and therefore changes to shared facilities such as the
preprocessor by either the C or C++ committee should take into account (to a
degree) the effects on the other language. That is the only way not to work
against the larger community. I'm not saying that the C++ committee can dictate
anything to the C committee or vice versa. I'm simply saying that acting in the
interests of, relative to existing code especially, C alone is unwise and a
disservice to the community.
>>> Only if we assume that the authoritative interpretation agree with
>>> what you claimed, which is not true.
>>
>> There is only one authoritative text: the normative portions of the
>> standard.
>
> The standard is means to express the intent. If you completely ignore
> the committee's intent given in the DRs, it'd make much more holes in
> the standard.
I'm not ignoring it. I'm saying 1) I don't like it (which is obvious :)) and 2)
the standard does not specify the presumably unspecified behavior in any
normative section. Further, regardless of the intent of the committee or DRs,
the standard text itself defines what the language is for a certain minimum
number of years. Obviously, it is an imperfect world, and I'm not criticizing
the committee. Nor am I saying that you can't consider the language to be
changing dynamically rather than in fixed steps defined by official releases of
the standard document.
This is the position that I am in right now. Based entirely on the standards
prior to C99, the behavior was presumably specified by the normative text.
Given that assumption, without a separate external source of information such as
committee minutes, the behavior is well-defined by the parts that I mentioned
before. Given that interpretation, a large body of code has been created that
relies on that interpretation as a core element. *If* this remains unspecified
or it is defined to be a nested invocation, it more or less renders the
preprocessor library (the only C/C++ library in Boost) and all libraries or
codebases that depend on it 100% unspecified or 100% broken (depending on the
eventual resolution). That is unacceptable, and could cause C++ to diverge
from C in this regard--because of the existing codebase. This is definitely not
what I want to see.
>> Interpretation of the normative portions can result in only one
>> possible
>> resolution for the A()() case and several possible resolutions for
>> the A() )
>> case.
>
> That's just your belief, not the way the standard is interpreted.
That is a literal interpretation of the English language used in the standard.
Whether or not that is the intent does not change that fact.
> [...]
>>>
>>> I have no idea about whether or not the underlined wording is true,
>>> but it's not what the C committee said in many places. The C89
>>> committee explicitly formed their intent that differs from it.
>>
>> Given the non-normative reference, that's obvious. My entire point
>> from the
>> beginning of this thread is that the non-normative text is incorrect
>> when
>> compared to the wording of the standard itself.
>
> My entire point is that you should submit a DR to the C committee if
> you think so, and before that the intended interpretation doesn't
> agree with yours.
Will do.
>> You can't "form" an intent,
>> only an interpretation. The original intent is a different thing.
>>
>
> The original intent is what the C committee said in the DR. There is
> no doubt about it.
Yes there is. Especially considering what the C++ DR mentioned previously says.
That is not to say exactly *what* the original intent was, but rather to say
that doubt definitely exists.
Regards,
Paul Mensonides
If this is true, then the right course is to revise the text of the standard
in accordance with the committee's intent, not to revise the intent
in accordance with the text of the standard, though I don't think the
text itself explicitly conflicts with what the committee intends,
considering that the term "nested" was left ambiguous intentionally
and considering the wording in the informative annex. To change the intended
interpretation, you should submit a DR rather than keep claiming here that
your own interpretation is the correct one.
> It then further says, when discussing the rescanning of the replacement
> list, if a nested replacement yields an "outer" macro name and identifier
> referring to an outer macro, that identifier is disabled. There is no way that
> it can be "nested,"
The procedural hierarchy. An implementation of the standard C
preprocessor can be quite different from what you have in mind.
[...]
> > As to the C++ standard, they don't much care for
> > preprocessing, and their evident intent has been that it
> > be whatever the C standard specifies. Since the two
> > standards are not updated concurrently, they tend to get
> > slightly out of sync in such areas of overlap.
>
> I don't have a problem with that. What I have a problem with is a great deal of
> code that existed prior to C99
Did that great deal of code exist prior to C90 or to the publication of
the DR in question?
> is apparently broken or unspecified because the C
> committee added non-normative text to the standard that supposedly alters the
> meaning of the normative text in the standard.
>
The code is broken because it didn't follow the intended
interpretation of the standard (C90), not because the committee broke
it while drafting C99.
Yes, just *partial* constitution.
Within the sequence of preprocessing tokens making up an
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
invocation of a function-like macro,
~~~~~~~~~~
AFAIK the standard doesn't distinguish "to make up" from "to
constitute" here.
> >
> > Yes, the right parenthesis must exist, thus the behavior always
> > results in undefined behavior. There is nothing like "ill-formed" in
> > the standard.
>
> A.k.a. an error. The right parenthesis is explicitly required. It isn't
> undefined, it is an outright compile-time error.
>
I'm talking about the standard terminology. The standard doesn't use
terms like "error", "compile-time error" and so on. What the standard
uses are "undefined behavior", "constraint violation", ...
> >
> > In the *current* state, the correct and intended interpretation is
> > what the committee said. As I said repeatedly, if you think it doesn't
> > match what the standard itself says or what the real world works,
> > please submit a DR to the committee. Before the committee accepts it,
> > you can't claim that yours is officially correct. The standard is
> > means to express the intent. And even if the committee agrees with you
> > and it's published as a DR or even a TC, the official interpretation
> > for C90, the superseded standard, can't differ.
>
> Yes, I realize that. I am also not arguing the interpretation of the committee.
> I'm arguing that the normative text of the standard does contradicts the
> interpretation of the committee. I agree that this needs a DR.
>
And you need to agree that what the committee addressed in the DR
is the intended interpretation until the committee accepts your DR.
This is my point.
> >
> > As Doug said, the preprocessor part which C and C++ share should be
> > interpreted by the C committee, not by the C++ committee alone. As I
> > know, the C++DR in question has never reviewed by the C committee.
>
> I agree. The point being that C does not exist in a void within the community.
> It exists alongside C++, and therefore changes to shared facilities such as the
> preprocessor by either the C or C++ committee should take into account (to a
> degree) the effects on the other language. That is the only way not to work
> against the larger community. I'm not saying that the C++ committee can dictate
> anything to the C committee or vice versa. I'm simply saying that acting in the
> interests of, relative to existing code especially, C alone is unwise and a
> disservice to the community.
>
Sorry, but I don't know the committee's business in detail,
especially the interaction between the C and C++ committees.
> >
> > The standard is means to express the intent. If you completely ignore
> > the committee's intent given in the DRs, it'd make much more holes in
> > the standard.
>
> I'm not ignoring it. I'm saying 1) I don't like it (which is obvious :))
C has a few (or many?) parts that other people, including me, dislike.
But most of them don't think those parts are broken or incorrect just
for that reason.
> and 2)
> the standard does not specify the presumably unspecified behavior in any
> normative section. Further, regardless of the intent of the committee or DRs,
> the standard text itself defines what the language is for a certain minimum
> number of years. Obviously, it is an imperfect world, and I'm not criticizing
> the committee. Nor am I saying that you can't consider the language to be
> changing dynamically rather than in fixed steps defined by official releases of
> the standard document.
>
> This is the position that I am in right now. Based entirely on the standards
> prior to C99, the behavior was presumably specified by the normative text.
Nope. The wording in C99's annex was added by C90 COR1, IIRC, even
if COR1's wording was less exact. And the annex of the standard is no more
normative than the DR in question. If you care about the annex wording
at all, you should care about the DR wording too.
> Given that assumption, without a separate external source of information such as
> committee minutes, the behavior is well-defined by the parts that I mentioned
> before.
At the same time, the behavior is unspecified (or undefined) by the
parts that I mentioned before. And those parts I mentioned contain
the committee's interpretation for the parts you mentioned.
> Given that intepretation,
But it's a wrong (or problematic) interpretation.
> a large body of code has been created that
> relies on that interpretation as a core element.
The code shouldn't have depended on that behavior if the
implementers weren't quite sure about what the committee intends.
[...]
> >
> > That's just your belief, not the way the standard is interpreted.
>
> That is a literal interpretation of the English language used in the standard.
> Whether or not that is the intent does not change that fact.
>
I'm not a native speaker of English, but most members of the committee
are. I partly agree with your literal interpretation of the wording,
but think that there is ambiguity, especially in the meaning of
"nested", about which the committee said "it's intentional".
> >
> > The original intent is what the C committee said in the DR. There is
> > no doubt about it.
>
> Yes there is. Especially considering what the C++ DR mentioned previously says.
I have no idea why the C++ DR was not reviewed by the C committee.
But I get the impression that the interpretation of the preprocessor
part the standards share is entirely (or primarily) up to the C
committee.
> That is not to say exactly *what* the original intent was, but rather to say
> that doubt definitely exists.
>
Okay, I also agree that this problem should be tidied up on this
occasion.
For the record, this particular error violates a constraint (6.10.3#4).
--
Clive D.W. Feather, writing for himself | Home: <cl...@davros.org>
Tel: +44 20 8371 1138 (work) | Web: <http://www.davros.org>
Fax: +44 870 051 9937 | Work: <cl...@demon.net>
Written on my laptop; please observe the Reply-To address
I think that you're missing my point. I'm saying that "MACRO(" constitutes
an invocation *as opposed to* just "MACRO" which does not constitute an
invocation. The definition of invocation in this context is the minimum
preprocessing tokens required. Yes, for an _actual_ invocation that isn't
invalid code, it needs the complete arguments and closing parenthesis. That
isn't what I'm saying. I'm saying only that the minimum tokens required to say
that a token sequence is an invocation is "MACRO(". Anything less than that
does not constitute an invocation as defined by the standard for a function-like
macro.
>> A.k.a. an error. The right parenthesis is explicitly required. It
>> isn't
>> undefined, it is an outright compile-time error.
>>
>
> I'm talking about the standard terminology. The standard doesn't use
> the term "error", "compile-time error" and so on. What the standard
> uses are "undefined behavior", "constraints violation", ...
6.10.3/4 explicitly says that there must be a closing parenthesis. If there is
no closing parenthesis, it must be rejected by a conforming implementation.
Otherwise, it is not C and the C standard is irrelevant because it is an
extension that directly ignores a requirement of the standard. It is not left
open to the implementation in this case.
>> Yes, I realize that. I am also not arguing the interpretation of
>> the committee.
>> I'm arguing that the normative text of the standard does contradicts
>> the
>> interpretation of the committee. I agree that this needs a DR.
>
> And you need to agree with that what the committee addressed in the DR
> is the intended interpretation until the committee accepts your DR.
> This is my point.
The language is defined by the normative text in the standard. It is not
defined by DR resolutions, nor is it defined by minutes--those only become
binding if they are incorporated into a future release of the standard. I agree
with you about what the current general *intention* is. I disagree that that
intention is portrayed by the text in any normative part of the standard. I
also don't agree that the text in the standard is ambiguous. If the behavior is
re-specified, or if it is clarified in the normative text as being unspecified
(which would be a mistake, IMHO) then I will agree that that is the meaning of
the standard. I do not, and never will, believe that the language is defined by
some kind of abstract intention and the standard itself is just a rough
approximation of that intention. Intention is a vague thing. It is something
that is drastically different from person to person--even on the committee. The
common ground is the wording itself--which obviously needs to be well chosen
(because of situations like this, for example). Further, the wording is what we
the users have to go on as the primary source for correctness.
>> I'm not ignoring it. I'm saying 1) I don't like it (which is
>> obvious :))
>
> C has a few (or many?) parts that other people including me dislike.
> But most of them don't think those are broken or incorrect just
> because of the reason.
Okay, let me be absolutely clear. I am saying several things. Some of them are
factual, some are observation, and some are pure opinion: 1) the normative part
of the standard is not ambiguous--though I agree it isn't as clear as it could
be, 2) the apparent interpretation by the committee is in opposition to the
literal meaning of the standard, 3) if the normative text does not match the
committee's intent, the normative text must be changed for the intent to be
relevant, 4) if the normative text changes, that constitutes breaking code--or
breaking outstanding portability of code at minimum, 5) several major and
well-respected public domain libraries--in C++ that I know of first hand from
Boost--Python, the MPL, type traits, not to mention the pp-lib itself, as well
as several libraries that aren't at this point part of Boost such as John
Torjo's smart assertion facility recently published on the CUJ website--rely on
this behavior and pointless divergence from heavily used existing C++ practice
is not a good option for C or C++, 6) the supposed interpretation is less
functional, causes a massive increase in dependencies for programmers, and is
generally worse in every way (that one is pure personal opinion based on
extensive experience in this field), 7) the supposed interpretation causes a
decrease in preprocessor efficiency because more information must be maintained
during macro expansion (under any implementation model) because it increases the
amount of processing that is all the result of a single invocation, and 8) I am
referring specifically to the case where the left parenthesis is outside of the
replacement list rather than inside. I can easily see where it is unclear
regarding partial invocation where the left parenthesis is *inside* the
replacement list because that achieves the minimum requirement for invocation.
>> This is the position that I am in right now. Based entirely on the
>> standards
>> prior to C99, the behavior was presumably specified by the normative
>> text.
>
> Nope. The wording in the C99's annex was added by C90 COR1, IIRC, even
> if COR1's wording was less exact. And the annex of the standard is not
> normative as the DR in question is. If you care for the annex wording
> at all, you should do for the DR wording.
Where does that specification exist in normative text in the current standard?
A DR that caused a non-normative note does not constitute normative
text--regardless of the intended purpose and especially not when it is contrary
to existing normative text. The normative text of the standard itself is the
contract between the designers of the language and the users of the
language--not the reasoning behind it nor abstract concepts like intention which
are impossible to pinpoint precisely. Once again, I'm not disagreeing that that
is the current general interpretation of the committee. I'm disagreeing that
non-normative notes, intentions, and arbitrary reasoning supersede the literal
text of the standard.
>> Given that assumption, without a separate external source of
>> information such as
>> committee minutes, the behavior is well-defined by the parts that I
>> mentioned
>> before.
>
> At the same time, the behavior is unspecified (or undefined) by the
> parts that I mentioned before. And those parts I mentioned contain
> the committee's interpretation for the parts you mentioned.
The parts that you mention are either non-normative parts of the standard or
text from sources external to the standard document. The parts that I mention
are entirely normative. Given an inconsistency between the two, the normative
text wins. If, OTOH, there were two opposing normative references, looking into
non-normative references gives you a basis for choosing the alternative most
likely to be used in a future iteration of the standard.
>> Given that intepretation,
>
> But it's wrong (or problematic) interpretation.
In what way is it wrong or problematic--other than that it deviates from the
current, *non-normative* interpretation?
>> a large body of code has been created that
>> relies on that interpretation as a core element.
>
> The codes shouldn't have depended on that behavior, if the
> implementers aren't quite sure about what the committee intends.
I'm not sure what you are referring to here? Implementers of preprocessors or
implementers of the code in question? I have never encountered a preprocessor
(though that isn't to say that none exist) that produces different results for
the situation where the left parenthesis is outside the replacement list. I'll
say again that the case where the left parenthesis is inside the replacement
list is not so clear cut. Regarding the code in question, there was no question
at the time of its initial implementation based on the literal meaning of the
normative text in the standard. I find it absolutely clear regarding the
"A()()()" context, but not so clear regarding the "A() )" context which I have
avoided using for specifically that reason. Further, this code existed before I
took it over, and many programs and libraries were already using it at that
time. If a resolution to this dilemma means that the A()()() case is nested, it
utterly destroys the entire field of preprocessor metaprogramming--not just my
work--because this is absolutely base level behavior. The pure physical number
of macros required to mimic the currently accepted semantics in use would grow
exponentially. It is fundamental and absolutely necessary that a replacement
list that terminates solely in the name of a function-like macro _cannot_ be a
nested invocation. This is not because of extremely simple examples of
iterative recursion like A()()(), but rather much more involved examples
involving normal logical flow structures. With this behavior, something as
simple as a general purpose IF construct cannot be implemented without external
context induced by multiple scans of the same text caused by use in both
parameters and replacement lists. This would cause such a pathological increase
in both user code size and library code size that it would no longer be
worthwhile to use code generation facilities in-source, thereby increasing the
number of tools external to the language itself and decreasing portability even
further. The ramifications are legion and are not just restricted to obscure
corners.
For the sake of argument though, this brings in other aspects of the behavior of
macro expansion, such as the scope of arguments to a macro, whether or not
arguments to "indirect" calls like this one are included as "nested" when they
are replaced and rescanned:
#define X() // ...
#define A() B(
#define B(x) x
A() X() )
Is X rescanned with A disabled? If that is the intent, that is wholly
counterintuitive, because the entirety of the invocation is outside of the
replacement list of A. The rules can be utterly simple: only macros that are
invoked with a left parenthesis inside the replacement list of another macro
have that outer macro disabled when the replacement list of said macro is
rescanned. This simplifies everything to a simple singular point in the
sequence of preprocessing tokens that marks the point where a macro becomes
enabled again.
>> Yes there is. Especially considering what the C++ DR mentioned
>> previously says.
>
> I have no idea why the C++ DR was not reviewed by the C committee.
> But I get an impression that the interpretation of the preprocessor
> part the standards share is entirely (or primarily) up to the C
> committee.
I agree that it should be primarily up to the C committee--unless the C
committee does something that is an extremely bad move regarding the C++
context--which is why I say that C++ is relevant though not the absolute
be-all-and-end-all. A good example of this is variadic macros and
placemarkers--which likely never would have been added by C++, yet are excellent
features. In fact, they are arguably more useful in C++ than in C given C++
types that contain open commas:
std::pair<int, int>
^
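[A rough C-only analogue of the open-comma problem, as a sketch. The struct point type, the DEFINE_POINT macro and the sum_xy function are hypothetical names made up for this illustration. A brace-enclosed initializer contains a comma that is not protected by parentheses, so it can only be passed through a macro if the trailing parameter is variadic.]

```c
#include <assert.h>

/* Hypothetical type used only for this illustration. */
struct point { int x, y; };

/* The initializer "{ 3, 4 }" contains an open comma. With a fixed
   two-parameter macro it would be split into two arguments; the variadic
   parameter collects everything after the name instead. */
#define DEFINE_POINT(name, ...) struct point name = __VA_ARGS__

static int sum_xy(void)
{
    DEFINE_POINT(p, { 3, 4 });  /* becomes: struct point p = { 3, 4 }; */
    return p.x + p.y;
}

/* Self-check: both members must have been initialized from the list. */
static void check_sum_xy(void)
{
    assert(sum_xy() == 7);
}
```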
>> That is not to say exactly *what* the original intent was, but
>> rather to say
>> that doubt definitely exists.
>>
>
> Okay, I also agree that this problem should be tidied up while we have
> the chance.
I definitely agree here. Hopefully, a reasonable solution can be produced. If
not, I and others have wasted a great deal of time gaining expertise and
providing tools that others use to make things easier. That is not something
that I'm willing to throw away without good cause.
Regards,
Paul Mensonides
It constitutes a macro invocation if there is a left parenthesis. If there is
no closing parenthesis, or not enough arguments, etc., it is an error. The same
is not true without the left parenthesis. An identifier representing a
function-like macro name does not constitute an invocation.
>> It isn't undefined, it is ill-formed. The right parenthesis must
>> exist.
>
> Yes, the right parenthesis must exist, thus the behavior always
> results in undefined behavior. There is nothing like "ill-formed" in
> the standard.
A.k.a. an error. The right parenthesis is explicitly required. It isn't
undefined, it is an outright compile-time error.
>>> By the way, in this case, because the committee already provided the
>>> intended interpretation through the answer to the DR, it's not very
>>> useful to discuss what the text of the standard means. Sure, a DR
>>> can
>>> be formed if the wording has a big problem in it when comparing the
>>> committee's interpretation.
>>
>> I'm not arguing current intent. I'm arguing that that apparent
>> resolution does
>> not mesh with existing practice, nor does it mesh with what the
>> standard itself
>> says.
>
> In the *current* state, the correct and intended interpretation is
> what the committee said. As I said repeatedly, if you think it doesn't
> match what the standard itself says or what the real world works,
> please submit a DR to the committee. Before the committee accepts it,
> you can't claim that yours is officially correct. The standard is
> a means to express the intent. And even if the committee agrees with you
> and it's published as a DR or even a TC, the official interpretation
> for C90, the superseded standard, can't differ.
Yes, I realize that. I am also not arguing the interpretation of the committee.
I'm arguing that the normative text of the standard contradicts the
interpretation of the committee. I agree that this needs a DR.
>> Because the preprocessor has a context in C++ as well.
>
> As Doug said, the preprocessor part which C and C++ share should be
> interpreted by the C committee, not by the C++ committee alone. As I
> know, the C++ DR in question has never been reviewed by the C committee.
I agree. The point being that C does not exist in a void within the community.
It exists alongside C++, and therefore changes to shared facilities such as the
preprocessor by either the C or C++ committee should take into account (to a
degree) the effects on the other language. That is the only way not to work
against the larger community. I'm not saying that the C++ committee can dictate
anything to the C committee or vice versa. I'm simply saying that acting in the
interests of, relative to existing code especially, C alone is unwise and a
disservice to the community.
>>> Only if we assume that the authoritative interpretation agree with
>>> what you claimed, which is not true.
>>
>> There is only one authoritative text: the normative portions of the
>> standard.
>
> The standard is a means to express the intent. If you completely ignore
> the committee's intent given in the DRs, it'd make much more holes in
> the standard.
I'm not ignoring it. I'm saying 1) I don't like it (which is obvious :)) and 2)
the standard does not specify the presumably unspecified behavior in any
normative section. Further, regardless of the intent of the committee or DRs,
the standard text itself defines what the language is for a certain minimum
number of years. Obviously, it is an imperfect world, and I'm not criticizing
the committee. Nor am I saying that you can't consider the language to be
changing dynamically rather than in fixed steps defined by official releases of
the standard document.
This is the position that I am in right now. Based entirely on the standards
prior to C99, the behavior was presumably specified by the normative text.
Given that assumption, without a separate external source of information such as
committee minutes, the behavior is well-defined by the parts that I mentioned
before. Given that interpretation, a large body of code has been created that
relies on that interpretation as a core element. *If* this remains unspecified
or it is defined to be a nested invocation, it more or less renders the
preprocessor library (the only C/C++ library in Boost) and all libraries or
codebases that depend on it either 100% unspecified or 100% broken (depending
on the eventual resolution). That is unacceptable, and could cause C++ to diverge
from C in this regard--because of the existing codebase. This is definitely not
what I want to see.
>> Interpretation of the normative portions can result in only one
>> possible
>> resolution for the A()() case and several possible resolutions for
>> the A() )
>> case.
>
> That's just your belief, not the way the standard is interpreted.
That is a literal interpretation of the English language used in the standard.
Whether or not that is the intent does not change that fact.
> [...]
>>>
>>> I have no idea about whether or not the underlined wording is true,
>>> but it's not what the C committee said in many places. The C89
>>> committee explicitly formed their intent that differs from it.
>>
>> Given the non-normative reference, that's obvious. My entire point
>> from the
>> beginning of this thread is that the non-normative text is incorrect
>> when
>> compared to the wording of the standard itself.
>
> My entire point is that you should submit a DR to the C committee if
> you think so, and before that the intended interpretation doesn't
> agree with yours.
Will do.
>> You can't "form" an intent,
>> only an interpretation. The original intent is a different thing.
>>
>
> The original intent is what the C committee said in the DR. There is
> no doubt about it.
Yes there is. Especially considering what the C++ DR mentioned previously says.
That is not to say exactly *what* the original intent was, but rather to say
that doubt definitely exists.
Regards,
Paul Mensonides
#define func(x) x
#define bar func(
#define foo bar foo
foo )
should expand to "foo" according to the C89 pseudo-code that Dave
Prosser showed me, which is what was translated into English for the
standard. Many preprocessors issue a diagnostic about an unterminated
argument list invoking "foo" because they expand too much because they
lose the blue paint on the parenthesized foo in the process of dropping
to the file level to find the ')'.
Neil.
This one isn't directly relevant to what we're discussing though. In the
example above, foo directly expands to itself. No invocation attempt is
required to disable the 'foo' token in foo's replacement list. The standard is
clear on this point.
There are fundamentally two types of disabling (blue paint): 1) disabling of
macro names, which in turn causes 2) disabling of specific identifier
preprocessing tokens. The first one occurs only during the rescan of a macro's
replacement list (including nested replacements/rescans) and the second occurs
when an identifier token is found (whether it is part of an invocation or not)
that refers to a macro name that is currently disabled. The first type is a
temporary disabling, the second is permanent. Once a specific token is painted,
it is always painted--unless it becomes a new token caused by token-pasting
(excluding placemarker pasting).
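[A minimal sketch of the second, permanent kind of disabling, for anyone who wants to try it on their own preprocessor. The STR/STR_ and ID helpers and the function names are assumptions added for illustration. The self-referential foo is the usual textbook case, chosen because its result is not in dispute.]

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification: the argument is fully macro-expanded before
   STR_ turns the result into a string literal. */
#define STR(x) STR_(x)
#define STR_(x) #x

#define ID(x) x
#define foo bar foo   /* foo names itself in its own replacement list */

/* One expansion: the inner foo is found while the name foo is disabled, so
   that particular token is left alone and painted. */
static const char *once(void) { return STR(foo); }

/* ID() causes the same tokens to be scanned again after foo's own rescan
   has finished; the painted token is still not replaced. */
static const char *rescanned(void) { return STR(ID(foo)); }

#undef foo   /* keep the self-referential macro out of any later code */

/* Self-check: both forms are expected to stop at "bar foo". */
static void check_paint(void)
{
    assert(strcmp(once(), "bar foo") == 0);
    assert(strcmp(rescanned(), "bar foo") == 0);
}
```

If the paint were only temporary, the second function would be expected to yield "bar bar foo" instead.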
The context that we're discussing is the first type--the disabling of macro
names themselves rather than specific tokens. However, this example does show
how pathologically bad current preprocessors' conformance is. Another
simple one is this:
#define TWO a, b
#define UNARY(x) BINARY(x)
#define BINARY(x, y) x + y
UNARY(TWO)
A great many preprocessors fail this test, but the behavior is absolutely clear
from the standard--the parameter (TWO) should be replaced and rescanned before
it is inserted into the replacement list of UNARY. At that point, UNARY's
replacement list is rescanned causing a valid invocation of BINARY.
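[A small sketch of how this test could be turned into a compile-and-run check. The wrapper function unary_two and the sample values are assumptions added for illustration; the three macros are the ones shown above.]

```c
#include <assert.h>

#define TWO a, b
#define UNARY(x) BINARY(x)
#define BINARY(x, y) x + y

/* Hypothetical wrapper: this only compiles if TWO is replaced by "a, b"
   before it is substituted into UNARY's replacement list, so that the
   rescan sees a valid two-argument invocation of BINARY. */
static int unary_two(int a, int b)
{
    return UNARY(TWO);  /* expected to read as: a + b */
}

/* Self-check with arbitrary sample values. */
static void check_unary_two(void)
{
    assert(unary_two(1, 2) == 3);
}
```

A preprocessor that fails the test would typically reject the file with a complaint that BINARY was given one argument.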
Regards,
Paul Mensonides
Okay, but I don't see why this fact is really connected to
interpreting what the term "nested" used in the standard means; see
below.
[...]
> >
> > I'm talking about the standard terminology. The standard doesn't use
> > the term "error", "compile-time error" and so on. What the standard
> > uses are "undefined behavior", "constraints violation", ...
>
> 6.10.3/4 explicitly says that there must be a closing parenthesis. If there is
> no closing parenthesis, it must be rejected by a conforming implementation.
Nope. In this case, a conforming implementation can do anything it
wants provided that an implementation-defined diagnostic is issued.
> Otherwise, it is not C and the C standard is irrelevant because it is an
> extension that directly ignores a requirement of the standard.
The only requirement I can see is to issue a diagnostic. The standard
doesn't require an implementation to reject translation of a program
which contains undefined behavior or constraints violation.
[...]
> I disagree that that
> intention is portrayed by the text in any normative part of the standard. I
> also don't agree that the text in the standard is ambiguous. If the behavior is
> re-specified, or if it is clarified in the normative text as being unspecified
> (which would be a mistake, IMHO) then I will agree that that is the meaning of
> the standard.
If the name of the macro being replaced is found during this scan
of the replacement list (not including the rest of the source
file's preprocessing tokens), it is not replaced. Furthermore, if
any nested replacements encounter the name of the macro being
replaced, it is not replaced.
If we remove these two statements from the standard, what should the
following code result in? Is there any other part of the standard that
prohibits nested macro expansion?
#define foo bar foo
foo
As I understand, the first statement quoted above says about the
one-level nested expansion of macros like this:
#define foo() bar foo
foo()() /* results in "bar foo()", not "bar bar foo" */
And the second one is for all indirectly-nested expansions like what
we are discussing. Then, now we have the interpretation problem on
what the term "nested" means in the above context. The committee
decided, a long time ago, to leave it as unspecified and made various
interpretations possible depending on implementation details. The
interpretation you are claiming to be right,
- a macro name alone doesn't constitute invocation of function-like
macros
- but, a macro name with the opening parenthesis does constitute the
invocation
is just one possible interpretation among them, not the absolute one.
> >
> > At the same time, the behavior is unspecified (or undefined) by the
> > parts that I mentioned before. And those parts I mentioned contain
> > the committee's interpretation for the parts you mentioned.
>
> The parts that you mention are either non-normative parts of the standard or
> text from sources external to the standard document. The parts that I mention
> are entirely normative. Given an inconsistency between the two, the normative
> text wins.
That's what you want, not what is always right. If there is an
inconsistency between them, a DR to revise the normative text in
accordance with the intent should be in order, not a DR to revise the
intent in accordance with the text.
> >
> > The codes shouldn't have depended on that behavior, if the
> > implementers aren't quite sure about what the committee intends.
>
> I'm not sure what you are referring to here? Implementers of preprocessors or
> implementers of the code in question?
I meant anyone whose codes depend on that behavior.
[...]
> It is fundamental and absolutely necessary that a replacement
> list that terminates solely in the name of a function-like macro _cannot_ be a
> nested invocation. This is not because of extremely simple examples of
> iterative recursion like A()()(), but rather much more involved examples
> involving normal logical flow structures. With this behavior, something as
> simple as a general purpose IF construct cannot be implemented without external
> context induced by multiple scans of the same text caused by use in both
> parameters and replacement lists.
Could you elaborate on your last statement here? I'm not sure about
what you meant.
And I've never seen that the behavior on which the codes in question
depend is used for a good purpose, as I recall, which was a reason
that the committee decided not to specify it.
>
> For the sake of argument though, this brings in other aspects of the behavior of
> macro expansion, such as the scope of arguments to a macro, whether or not
> arguments to "indirect" calls like this one are included as "nested" when they
> are replaced and rescanned:
>
> #define X() // ...
>
> #define A() B(
> #define B(x) x
>
> A() X() )
>
> Is X rescanned with A disabled?
As I know, it's also unspecified.
[...]
> Hopefully, a reasonable solution can be produced. If
> not, I and others have wasted a great deal of time gaining expertise and
> providing tools that others use to make things easier. That is not something
> that I'm willing to throw away without good cause.
Unfortunately, according to Doug, the supposed revision the committee
has/had in their mind is not the same as what you want. I don't think
that the problem is quite easy; if the committee wants to specify any
piece of the behavior, they should take account of almost all possible
cases which were left as (explicitly or implicitly) unspecified after
C89, in order to define the exact meaning of "nested".
> Nope. In this case, a conforming implementation can do anything it
> wants provided that an implementation-defined diagnostic is issued.
> The only requirement I can see is to issue a diagnostic. The standard
> doesn't require an implementation to reject translation of a program
> which contains undefined behavior or constraints violation.
The terminology used here is irrelevant. The only relevant point here is that
the standard requires the closing right parenthesis and the right number of
parameters.
> If the name of the macro being replaced is found during this scan
> of the replacement list (not including the rest of the source
> file's preprocessing tokens), it is not replaced. Furthermore, if
> any nested replacements encounter the name of the macro being
> replaced, it is not replaced.
>
> If we remove these two statements from the standard, what should the
> following code result in? Is there any other part to prohibit the
> nested macro expansion in the standard?
>
> #define foo bar foo
>
> foo
It would be infinite replacement yielding an infinite sequence of "bar"
preprocessing tokens. There is no other part of the standard that discusses
this concept directly, though other parts of the standard discuss the terms
used.
> As I understand, the first statement quoted above says about the
> one-level nested expansion of macros like this:
>
> #define foo() bar foo
>
> foo()() /* results in "bar foo()", not "bar bar foo" */
>
> And the second one is for all indirectly-nested expansions like what
> we are discussing.
As mentioned in another post, there are two types of disabling. 1) The
disabling of the macro name itself. This is done while a macro's replacement
list is rescanned. Any time that a corresponding identifier token is found
while a macro is disabled causes 2) a specific identifier token to become
"painted blue" permanently. The two rules aren't really two rules, but rather
one rule with a clarification for nested replacements found during the
rescan--i.e. it clarifies that, when further macros are expanded during the
rescan, the outer macro name is still considered to be rescanning the
replacement list and is therefore still disabled. Further still, a macro name
that is disabled is only temporarily disabled during the rescan of its
replacement list. Once that rescan is complete, the restriction on the macro
name itself is lifted, though all specific identifier tokens that were disabled
remain disabled. More on this in answer to your question below...
> Then, now we have the interpretation problem on
> what the term "nested" means in the above context. The committee
> decided, a long time ago, to leave it as unspecified and made various
> interpretations possible depending on implementation details. The
> interpretation you are claiming to be right,
Are you saying, that because the term "nested" is not explicitly defined by the
standard itself, that the term has no meaning? Or has meaning only as a general
notion? If so, then the entire section has no meaning whatsoever, and could
just as easily mean that this should expand forever:
#define A() B()
#define B() A()
A()
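[For reference, a sketch of what the ordinary reading of "nested" gives for this pair. The STR/STR_ helpers and the function names are assumptions added for illustration. B is invoked entirely inside A's replacement list, so this is the uncontroversial nested case.]

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification so that the argument is expanded first. */
#define STR(x) STR_(x)
#define STR_(x) #x

#define A() B()
#define B() A()

/* A() -> B() -> A(), and the final A is found while A is still disabled,
   so replacement stops there instead of continuing forever. */
static const char *expanded(void) { return STR(A()); }

/* Self-check: the expansion terminates with the painted A(). */
static void check_terminates(void)
{
    assert(strcmp(expanded(), "A()") == 0);
}
```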
> - a macro name alone doesn't constitute invocation of function-like
> macros
> - but, a macro name with the opening parenthesis does constitute the
> invocation
>
> is just one possible interpretation among them, not the absolute one.
It is the only interpretation that is not contrary to the details of
function-like macro invocation presented elsewhere--which you've summarized in
the above two points. The fundamental reason that this is the case is that
macro expansion is not defined by the standard as a recursive process. The text
of the invocation is replaced *prior* to rescanning according to strict
interpretation of the standard. This is not recursion, but rather in-place
scanning iteration. There is no "replacement buffer" in the pure model. That
is only a detail of a possible implementation intended to mimic the exact
behavior. That exact behavior is replacement _before_ rescanning allowing
rescanning of a replacement list to logically proceed directly into regular
scanning (or rescanning of some outer macro replacement). This model--which is
defined by the standard--in essence defines a point at which a name becomes
enabled again. In the case that I'm arguing, that point is here:
#define A() B
#define B() A
A () ()
B ()
^
In order for the macro B to be considered invoked in any way, the preprocessor
can no longer be rescanning the replacement list of A which terminated at the
circumflex according to the most minimal possible definition of "invocation"
which is "MACRO(". This is the only logical result. Further, from the other
example:
#define A() B(
#define B(x) x
#define C() // ...
A() C() )
The absolute most that can be said is that "B(" constitutes a nested invocation,
so "A" would be disabled when B gets rescanned. At the same time, even though
"C()" is an argument to B, it is not nested because it is thoroughly invoked
outside of the replacement list of A and therefore beyond the point at which A
becomes re-enabled. Therefore, A is not disabled when C's replacement list is
rescanned. The only thing that could change here is that "B(" not be considered
a full invocation and therefore is not nested. The semantics of "C()" would
remain the same whether or not "B(" constitutes an invocation inside the
replacement list of A. This case I'm not arguing, as the case can be made for
either of two possible definitions of "invocation"--either "MACRO(" or
"MACRO(...)". "MACRO(" constitutes only the minimum by which the token sequence
can be considered an invocation.
>>> At the same time, the behavior is unspecified (or undefined) by the
>>> parts that I mentioned before. And those parts I mentioned contain
>>> the committee's interpretation for the parts you mentioned.
>>
>> The parts that you mention are either non-normative parts of the
>> standard or
>> text from sources external to the standard document. The parts that
>> I mention
>> are entirely normative. Given an inconsistency between the two,
>> the normative
>> text wins.
>
> That's what you want, not what is always right. If there is an
> inconsistency between them, a DR to revise the normative text in
> accordance with the intent should be in order, not a DR to revise the
> intent in accordance with the text.
It isn't what I necessarily want. It is a fact. DRs have no relevance unless
they impact future iterations of the standard. Before that future standard is
released, the normative text of the standard remains fixed and binding as
required by the standards bodies for the purpose of stabilization.
> [...]
>> It is fundamental and absolutely necessary that a replacement
>> list that terminates solely in the name of a function-like macro
>> _cannot_ be a
>> nested invocation. This is not because of extremely simple examples
>> of
>> iterative recursion like A()()(), but rather much more involved
>> examples
>> involving normal logical flow structures. With this behavior,
>> something as
>> simple as a general purpose IF construct cannot be implemented
>> without external
>> context induced by multiple scans of the same text caused by use in
>> both
>> parameters and replacement lists.
>
> Could you elaborate on your last statement here? I'm not sure about
> what you meant.
A macro name is disabled only during the initial rescan of a replaced macro.
Further scans of that same token sequence do not have that name disabled.
Specific identifier preprocessing tokens that were found during that initial
rescan that refer to a currently disabled macro are permanently disabled. Which
is why the reference to "other contexts" exists. When an argument is passed to
a macro, and that argument is used in the replacement list of the macro without
being adjacent to # or ##, it effectively causes the token sequence to be
scanned twice...
#define ID(x) x
...once when the parameter is replaced and rescanned and once when the
replacement list of ID is rescanned. This introduces the notion of deferred
evaluation which is what I was referring to above. The left parenthesis can be
hidden through the rescan of a replacement list, preventing the name of a
currently disabled macro from being generated:
#define LP (
#define A() B LP )
#define B() A LP )
A()
A normal replacement of this sequence of preprocessing tokens results in:
B ( )
At this point, the rescanning of A's replacement list is complete and the
restriction on A is lifted. Further scans induced by use as parameter +
replacement-list will cause this to expand back and forth from B ( ) to A ( ) to
B ( ) and so on--indefinitely:
#define SCAN(x) x
A() // B ( )
SCAN( A() ) // A ( )
SCAN(SCAN( A() )) // B ( )
SCAN(SCAN(SCAN( A() ))) // A ( )
This is a technique called deferral, and is the root of what I was referring to
above. Massive, large-scale deferral, with the outer context being manual
increases in the number of scans.
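[The scan-counting table above can be checked with a sketch like the following. The STR/STR_ helpers, the same_tokens function and the scans0..scans2 names are assumptions added for illustration. Spaces are ignored in the comparison because the exact spacing of stringified output is beside the point.]

```c
#include <assert.h>

/* Two-level stringification: the argument is expanded (one scan), then
   turned into a string literal without any further replacement. */
#define STR(x) STR_(x)
#define STR_(x) #x

/* Hypothetical helper: compares two strings while ignoring spaces. */
static int same_tokens(const char *lhs, const char *rhs)
{
    for (;;) {
        while (*lhs == ' ') lhs++;
        while (*rhs == ' ') rhs++;
        if (*lhs != *rhs) return 0;
        if (*lhs == '\0') return 1;
        lhs++;
        rhs++;
    }
}

#define LP (
#define A() B LP )
#define B() A LP )
#define SCAN(x) x

/* Each SCAN() adds one more scan over the tokens produced so far. */
static const char *scans0(void) { return STR(A()); }
static const char *scans1(void) { return STR(SCAN(A())); }
static const char *scans2(void) { return STR(SCAN(SCAN(A()))); }

/* Self-check: the result alternates between B ( ) and A ( ). */
static void check_deferral(void)
{
    assert(same_tokens(scans0(), "B()"));
    assert(same_tokens(scans1(), "A()"));
    assert(same_tokens(scans2(), "B()"));
}
```

Note that this sketch relies only on the hidden left parenthesis and on argument prescan, not on the disputed A()() case.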
> And I've never seen that the behavior on which the codes in question
> depend is used for a good purpose, as I recall, which was a reason
> that the committee decided not to specify it.
Wow, that's quite a judgement given that some very popular and respected
libraries are implemented using this behavior in order to avoid a massive
increase in code size and consequently a massive increase in maintenance
points. You just tossed some major components of Boost for example as well as
many other libraries and programs that use the same techniques that Boost uses
in both C and C++.
...simply because you haven't seen code that uses this behavior to good purpose
and are making a naive and uneducated assumption based purely on that fact. I'm
also guessing that you haven't even tried, and, lacking expertise in the
specific field, you have no way of knowing whether it can or cannot be used for
good purpose. Frankly, I find that implication of poor design offensive. What
I have given you are specific examples of existing libraries that use this
behavior to increase functionality, decrease code size, and decrease the number
of maintenance points, and that list continues.
[Another more specific, practical example is the implementation of the
function(s) that determines whether a universal character name is valid in an
identifier (i.e. Annex D in C99). I guarantee that that code can be implemented
cleaner and faster using preprocessor metaprogramming than without using it
while simultaneously separating the data (the ranges of valid values and
isolated values) from the algorithm itself in the source, thereby reducing the
number of maintenance points and causing the dataset to be specified as a
trivial copy and replacement from the standard itself with a few global
replacements. In fact, I can implement that algorithm in all of ten minutes
instead of meticulous entry of the dataset.]
Further, you say that the code should not have used this behavior to begin with
(which is beside the point now) because of a C DR, yet at the same time an
existing C++ DR points towards the exact opposite resolution--including a
reference to original intent. That DR is just as valid as any other from a
user's point of view.
What this ultimately comes down to is that this could cause C and C++ to diverge
in this relatively small area, because of the sheer amount of existing modern
C++ code that relies on this behavior. This is unnecessary and would be
unfortunate for C, because it would lose the field altogether--and this field
revolves around use of the preprocessor as a program manipulation tool which is
the overarching reason for having a preprocessor in the first place.
>> Hopefully, a reasonable solution can be produced. If
>> not, I and others have wasted a great deal of time gaining expertise
>> and
>> providing tools that others use to make things easier. That is not
>> something
>> that I'm willing to throw away without good cause.
>
> Unfortunately, according to Doug, the supposed revision the committee
> has/had in their mind is not the same as what you want. I don't think
> that the problem is quite easy; if the committee wants to specify any
> piece of the behavior, they should take account of almost all possible
> cases which were left as (explicitly or implicitly) unspecified after
> C89, in order to define the exact meaning of "nested".
Not necessarily. The number of cases that rely on A()() constituting a nested
invocation are quite rare to the point of being nil. Of course, that is my
personal observation because that behavior has no utility in practice.
Regards,
Paul Mensonides
> It results in unspecified behavior. I don't know why you and Paul
> Mensonides don't believe this even if Doug (who is a committee member)
> and I've already provided some evidences. That ("the behavior is
I don't need to believe. I need to understand.
> > And again, this code doesn't illustrate the same case.
>
> It exactly illustrates the same case, i.e., what you were asking (why
> the "m" in the parentheses should be expanded).
>
> If "m(m)" is expanded as follows anyway,
>
> !(m) + n(m)
>
> "m" is given to n() as an argument. During expansion of "n()", the
> argument is fully expanded, thus you get this:
>
> !(m) + !(m')+n'(m)
>
> [m' and n' indicate that they are permanently blue-painted]
>
> Now, your question is why "m" in the parentheses should be expanded,
> even if it looks like nested expansion of "m" which is done during the
> argument expansion, isn't it? But the expansion of the argument "m" is
Yes.
> irrelevant (spatially and procedurally) to the *rescan* occurred after
> it's done. My simplified "foo/bar" example above is meant to show
All locks that we made during an argument expansion shall be considered
during further rescanning, when all arguments are substituted already.
Seeing that, the code below doesn't cause an endless expansion (a "recursive
death").
#define a(x) x(b)
#define b a
a(b)
> this.
> #define foo bar
> #define mac(x) x(foo)
>
> mac(foo)
>
> To expand mac(), the argument "foo" should be fully expanded:
>
> bar(foo)
>
> But the expansion of the argument "foo" doesn't affect expansion of
> "foo" in the parentheses during the rescan, which is the reason we get
> "bar(bar)". If you agree with that we should get "bar(bar)" as the
> result, the same thing applies to the above case you asked.
I agree, but this code is still about a different case: the argument "foo" in
the parentheses is not a part (neither directly nor recursively) of the
replacement list of the macro "foo", whereas the macro "m" in my code
is.
--
ik
Is that in fact the case? DR resolutions can be put into several
categories:
1. Non-responsive. Real life example: a yes or no question for which
the committee's entire response was something along the lines of "The
standard's wording is clear". I believe such a response constitutes a
dereliction of the committee's responsibility. The wording was
certainly not clear to the person who filed the DR, even if the
committee felt that he was an idiot for not figuring it out. The
committee's response left him still uncertain whether the correct
answer was "yes", "no", or "that's a meaningless question".
2. A response that explains the clear meaning of the text. This
clearly doesn't change anything.
3. A response that clarifies unclear text of the standard. I believe
that such a resolution has the same normative effect as if that
version of the standard had been re-written to make the meaning clear.
4. A response that corrects something the committee feels is a defect
in the standard, directly contradicting the actual text. I believe
that such a resolution has the same normative effect as if that
version of the standard had been re-written to correct the defect.
I've seen people argue that responses in categories 3 and 4 should be
considered to apply normatively even to later versions of the same
standard, but that doesn't make sense to me. Issuing a new standard
provides an opportunity to correct those problems. If the committee
fails to approve new wording that implements such DRs, I believe that
should be considered a repudiation of those DRs, at least with respect
to the new standard.
Of course, what I think doesn't really matter. What does matter is
what ANSI/ISO rules say about DRs. Can anyone enlighten me as to what
those rules say?
Yes, you need to understand what the standard really means, which can
be done by believing what the committee said in this special case. I
said "believe" since you might claim that the DR mentioned before is
irrelevant since it's not normative and that the normative text says
different things. Anyway this problem being discussed in other branch
of this thread is not directly related to what you don't understand;
see below.
[...]
>
> All locks that we made during an argument expansion shall be considered
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> during further rescanning,
~~~~~~~~~~~~~~~~~~~~~~~~~
Nope. If this is true, the simplified example I made should result in
"bar(foo)", not "bar(bar)".
> when all arguments are substituted already.
> Seeing that, the code below doesn't cause an endless expansion (a "recursive
> death").
>
> #define a(x) x(b)
> #define b a
>
> a(b)
>
Which results in:
a(b)
| [a]
[argument replacement]
|
b
| [a, b]
a'
|
[argument replacement done]
| [a]
a'(b)
| [a, b]
a'(a')
Note that the "b" in the replacement list of "a(x)" is expanded
because it's not from the expansion of the argument "b", which is what
you asked and is what I explained with the simplified example.
As far as I know, the DRs against the previous version of the standard
don't have any official effect on the new version of the standard,
which means that the committee should have carefully considered all of
the C90 DRs and made the wording related to them clearer. Because
almost nothing was done for many of the C90 DRs, someone should submit
them again to find out whether the committee's answers in them are
still effective.
> but that doesn't make sense to me. Issuing a new standard
> provides an opportunity to correct those problems. If the committee
> fails to approve new wording that implements such DRs, I believe that
> should be considered a repudiation of those DRs, at least with respect
> to the new standard.
In this case, because the wording in the annex was not removed while
drafting C99 (it was rather clarified), the committee seems to believe
that the normative text itself has no problem expressing their intent
(the second case above), while one might believe that the text directly
contradicts the committee's intent.
> > All locks that we made during an argument expansion shall be considered
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > during further rescanning,
> ~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Nope. If this is true, the simplified example I made should result in
> "bar(foo)", not "bar(bar)".
I don't think so.
> #define foo bar
> #define mac(x) x(foo)
>
> mac(foo)
It results in:
mac(foo) (original expression)
mac(bar[foo]) (argument expansion)
bar[foo.mac](foo[mac]) (argument substitution)
bar[foo.mac](bar[foo.mac]) (rescanning)
bar(bar) (fully expanded result)
All locks that appeared during the expansion of the argument "foo" are
considered, and the original expression still results in "bar(bar)". So this
example has nothing to do with the matter at hand.
> Which results in:
>
> a'(a')
Okay, I'll give you an example which is quite simple and ultimately clear.
#define a a|
#define b(x) x
b(a)
which results in:
b(a) (original expression)
b(a[a]|) (argument expansion)
a[a.b]| (argument substitution)
a[a.b]| (rescanning)
a| (fully expanded result)
But what if you are right and the argument expansion locks are not
considered during rescanning? As I understand, then we should expect the
following:
b(a) (original expression)
b(a[a]|) (argument expansion)
a[b]| (argument substitution)
a[a.b]|| (rescanning)
a|| (fully expanded result)
Which is something wrong, of course.
--
ik
No; there is an ambiguity, as I pointed out before. The ambiguity
doesn't matter for "normal" use of the macro facility, where both
alternative meanings are in agreement.
While "B (" triggers the specification for invocation of a
function-like macro, it does *not* constitute the entirety of
the invocation of the macro; there are also arguments and a
closing ")".
> #define A() B
> #define B() A
> A () ()
> B ()
> ^
> In order for the macro B to be considered invoked in any way, the preprocessor
> can no longer be rescanning the replacement list of A which terminated at the
> circumflex according to the most minimal possible definition of "invocation"
> which is "MACRO(". This is the only logical result.
Except that's wrong. The "B" is produced during replacement
for the outer A() and is seen and its *entire* invocation
replaced *during the rescan* which involves *use of following
pp-tokens* as specified in the standard.
The *replacement processes* are certainly nested. The *spatial
scopes* are not nested. Which one is meant when the standard says
"nested"? This is not determinable by logical analysis alone.
Normally we would have said "clear *enough*", meaning clear enough
that additional wording would run as much risk of confusing as of
clarifying. ("Why does this seem to say the same thing twice? Is
it a genuine redundancy or am I missing something?")
We do try to not respond by impugning the intelligence or motives
of the originator, even when we in fact have our doubts about them.
The procedure followed when I was collecting, editing, and
formatting the responses to public comments during the initial
review period in the late 1980s was that each subgroup submitted
a two-letter "response code", that would be macro-expanded into a
single introductory summary of the disposition of the comment,
*and* additional narrative further explaining the response. This
worked well, and the one-line summaries were valuable when
collected into an overview listing.
Since then, occasionally some subgroup tries to submit (no longer
to me) just a summary response and not the additional narrative.
This is appropriate as a suggested direction while the issue is
still open, but runs the danger of being approved as the entirety
of the final response.
> 2. A response that explains the clear meaning of the text. This
> clearly doesn't change anything.
Since with our limited resources we can't really afford to act as
C programming tutors, this is the case that most often gets a
terse response.
> 3. A response that clarifies unclear text of the standard. I believe
> that such a resolution has the same normative effect as if that
> version of the standard had been re-written to make the meaning clear.
Originally we had, rightly or not, the impression that revisions
to the normative specification were undesirable. During that
period, so long as there was consensus on the correct interpretation,
we endeavored to explain how to read and understand whatever was
in the existing text whenever it seemed a reasonable way to arrive
at the desired interpretation.
More recently, we have tended to treat such cases as worthy of
revising the standard, at least to the extent of adding some
non-normative text such as examples or footnotes. These stack
up until there are enough more serious revisions needed, ...
> 4. A response that corrects something the committee feels is a defect
> in the standard, directly contradicting the actual text. I believe
> that such a resolution has the same normative effect as if that
> version of the standard had been re-written to correct the defect.
... which takes effect when the responses are incorporated into
a Technical Corrigendum that goes through an approval process.
Approved TCs officially modify the current standard. Anything
else indicates committee intentions, but has no official clout.
> I've seen people argue that responses in categories 3 and 4 should be
> considered to apply normatively even to later versions of the same
> standard, but that doesn't make sense to me. Issuing a new standard
> provides an opportunity to correct those problems. If the committee
> fails to approve new wording that implements such DRs, I believe that
> should be considered a repudiation of those DRs, at least with respect
> to the new standard.
We tried to review all C90 DR responses and incorporate them into
C99. Sometimes we thought that other changes in C99 would resolve
the issues better than trying to turn explanatory responses into
normative text. Some issues were reopened during preparation of
C99 and the outcome of the new discussion might have differed from
the previous one.
It is certainly not intended that any of the C90/C95 DR responses,
other than perhaps some that are of a purely explanatory nature,
be applied to C99.
> Of course, what I think doesn't really matter. What does matter is
> what ANSI/ISO rules say about DRs. Can anyone enlighten me as to what
> those rules say?
We're obliged to (eventually) respond to them. How we respond is
up to the committee and in particular its Convenor. How we have
responded in the recent past is to post the current state of
resolution on the SC22/WG14 web site; resolutions that were
accepted for incorporation into a TC accumulate until it "weighs"
enough to justify the work to submit it for ratification. On
rare occasion a pending TC item can be reopened when there are
new developments.
I have not said that it does, only that it is the minimum that *could* be
considered an invocation.
>> #define A() B
>> #define B() A
>> A () ()
>> B ()
>> ^
>> In order for the macro B to be considered invoked in any way, the
>> preprocessor can no longer be rescanning the replacement list of A
>> which terminated at the circumflex according to the most minimal
>> possible definition of "invocation" which is "MACRO(". This is the
>> only logical result.
>
> Except that's wrong. The "B" is produced during replacement
> for the outer A() and is seen and its *entire* invocation
> replaced *during the rescan* which involves *use of following
> pp-tokens* as specified in the standard.
No, that is wrong. The following pp-tokens do not constitute part of the
rescanning caused by replacement. That is fundamentally incorrect. Rather
rescanning *becomes* regular scanning (or outer rescanning) at the end of the
replacement list in question. Because of that, it is unclear whether or not a
partial invocation (i.e. the minimum two tokens) is considered invoked with the
rest of the invocation outside of the replacement list. This lack of clarity
does not extend to the case above, because under no possible definition of
"invocation" can B be considered invoked during the rescan of the replacement
list.
> The *replacement processes* are certainly nested. The *spatial
> scopes* are not nested. Which one is meant when the standard says
> "nested"? This is not determinable by logical analysis alone.
The replacement processes are certainly not nested. You seem to have the idea
that the macro expansion process is recursive. It is not. It is in-place
iteration over pp-tokens.
Regards,
Paul Mensonides
The correct result of this expansion is "bar(bar)" and is not relevant to the
discussion in the other thread. The argument to mac, "foo", is expanded and
rescanned thoroughly before it ever gets inserted into the replacement list of
mac. This replacement and rescanning yields only "bar" which does not
constitute an invocation at all--irrelevant to the other discussion because no
trailing source tokens are available during the replacement/rescan of the
argument. That result, "bar", is inserted into the replacement list of mac,
yielding "bar(foo)". At this point, there is no context disabling foo, only a
context disabling mac, which causes the final result to be "bar(bar)".
> Okay, I'll give you an example which is quite simple and ultimately
> clear.
>
> #define a a|
> #define b(x) x
You are correct here, the result is "a|" but not for the same reasons as above.
Macro replacement creates a context that specifies a range of tokens for which a
certain *macro* is disabled. This range is defined as the tokens that make
up the replacement list of a macro. That context exists *only* while the
replacement list is rescanned, and its only purpose is to "paint" specific
identifier tokens as disabled. After the replacement list is rescanned, that
context no longer exists, yet all tokens that were painted remain painted
forever (unless they are converted to a new token via token-pasting). To boil
down what this means: disabling ultimately occurs on specific tokens.
Disabling is not a specific range of tokens for which macro XYZ is not
available. That context exists *only* during the rescan of the replacement list
of the relevant macro (including nested replacements). The context that causes
(or _can_ cause) an identifier to be painted is not carried around. Only the
tokens are, some of which may still be painted. To illustrate:
#define lp (
#define a b lp ) a
#define b() a
#define scan(x) x
a // b ( ) a'
scan(a) // a a' -> b' ( ) a' a'
Regards,
Paul Mensonides
I misunderstood what you meant by "considered". Yes, the locks are
considered, but they have no effect on the rescan of the remaining part of
the replacement list in this case, which is the reason I said they are not
"considered".
>
> > Which results in:
> >
> > a'(a')
>
> Okay, I'll give you an example which is quite simple and ultimately clear.
>
> #define a a|
> #define b(x) x
>
> b(a)
b(a)
| [b]
[argument expansion]
| [b, a]
a'|
|
[argument expansion done]
| [b]
a'|
Which is completely same as what you get below. I guess that you are
confusing here the permanently blue-painted tokens (which is indicated
as ' following the token) with the disabled names (which is indicated
as a set in []). This is the reason I carefully showed those two in my
every example.
>
> which results in:
>
> b(a) (original expression)
>
> b(a[a]|) (argument expansion)
>
> a[a.b]| (argument substitution)
>
> a[a.b]| (rescanning)
>
> a| (fully expanded result)
>
>> #define a a|
>> #define b(x) x
>>
>> b(a)
>
> b(a)
> | [b]
> [argument expansion]
> | [b, a]
> a'|
> |
> [argument expansion done]
> | [b]
> a'|
>
> Which is completely same as what you get below. I guess that you are
> confusing here the permanently blue-painted tokens (which is indicated
> as ' following the token) with the disabled names (which is indicated
> as a set in []). This is the reason I carefully showed those two in my
> every example.
The [b, a] part is incorrect. b is not disabled when the arguments to b are
replaced and rescanned. That context doesn't exist until b's replacement list
is rescanned. E.g.
#define i(x) x
i(i(i(1))) // 1
Regards,
Paul Mensonides
Yes, you're right. I should have said "[a]" there. It was a mistake
from just copying and editing an example in my previous post where the
mistake started for a reason.
>> The [b, a] part is incorrect.
>
> Yes, you're right. I should have said "[a]" there. It was a mistake
> from just copying and editing an example in my previous post where the
> mistake started for a reason.
Okay, just making sure. :)
Regards,
Paul Mensonides
There's no clear distinction between your categories 2 and 3 -- both
simply explain the existing text. Whether the existing text is
sufficiently clear to be comprehensible to the average motivated reader
is a matter of opinion. So, any such explanatory DR response should
also apply to a later version of the standard, provided it still
contains the wording in question; the committee did not feel that
revising the text was advisable. Category 4 responses actually change
the standard and should therefore be incorporated in future versions of
the standard; if that doesn't happen, it's almost certainly an oversight
on the committee's part.
-Larry Jones
It's hard to be religious when certain people are never
incinerated by bolts of lightning. -- Calvin
> I misunderstood what you meant by "considered". Yes, the locks are
> considered, but they have no effect on the rescan of the remaining part of
> the replacement list in this case, which is the reason I said they are not
> "considered".
Agreed.
Now, let's get back to the first example.
#define m !(m)+n
#define n(n) n(m)
m(m)
It results in:
m(m)
!(m[m])+n[m](m)
!(m[m])+n[m](!(m[m])+n[m])
!(m[m])+!(m[n.m])+n[n.m](m[n]) or
!(m[m])+!(m[n.m])+n[n.m](m[m.n])
And it is unspecified which of the two cases is subject to further
expansion.
Did I understand you correctly?
--
ik
The standard says otherwise.
> The replacement processes are certainly not nested. You seem to have the idea
> that the macro expansion process is recursive. It is not. It is in-place
> iteration over pp-tokens.
The specification is recursive. Implementations might use
a well-known technique for converting tail recursion to
iteration, but care has to be taken at the boundary.
No it does not, and this is explicit. 6.10.3.4/1 does not say that subsequent
preprocessing tokens are "rescanned". It only says that scanning continues
directly from the rescan of a replacement list to the subsequent pp-tokens.
Whether or not it is called "scanning" or called "rescanning" depends only on
whether a disabling context exists. Further, *all* of the remaining pp-tokens
in the source file are included here, not just what is needed to "finish" a
nested expansion.
-----
If the name of the macro being replaced is found during this scan of the
replacement list (not including the rest of the source file's preprocessing
tokens), it is not replaced. Furthermore, if any nested replacements encounter
the name of the macro being replaced, it is not replaced....
-----
The pp-tokens that follow a macro invocation are not part of what is called
"rescanning." (They may be part of an outer macro replacement's rescanning, of
course.) If at file level, what is "rescanning" becomes regular "scanning" when
scanning passes the end of the replacement list:
#define m() 2
1 m() 3
   1      2      3
|_____||_____||_____|
   |      |      |
   *      **     *
*  scanning
** rescanning
[--
Note also that the term "nested", interpreted in the context of the paragraph in
which it is used, implies an invocation found during "this scan of the
replacement list (not including the rest of the source file's preprocessing
tokens)". Given that contextual interpretation, which is the only
interpretation that can be deduced from the given text, the following cannot
constitute a nested invocation because it does not meet the absolute minimum
requirement for invocation within the replacement list itself as defined by
6.10.3/10:
#define A() B
#define B() A
A()()()
--]
>> The replacement processes are certainly not nested. You seem to
>> have the idea that the macro expansion process is recursive. It is
>> not. It is in-place iteration over pp-tokens.
>
> The specification is recursive. Implementations might use
> a well-known technique for converting tail recursion to
> iteration, but care has to be taken at the boundary.
Your conceptual idea of macro expansion is flawed. The macro expansion process
is not recursive, though in most contexts it can be viewed as if it were. A
macro invocation is replaced by the replacement list _before_ rescanning
occurs--which eliminates the possibility of a recursive specification:
-----
6.10.3/9
...defines an object-like macro that causes each subsequent instance of the
macro name to be replaced by the replacement list of preprocessing tokens that
constitute the remainder of the directive.
6.10.3/10
.... Each subsequent instance of a function-like macro name followed by a ( as
the next preprocessing token introduces the sequence of preprocessing tokens
that is replaced by the replacement list in the definition (an invocation of the
macro).
-----
A macro invocation is replaced by the macro definition's replacement list--not
by the result of rescanning applied to the replacement list. The pure model is
non-recursive, it is iterative:
#define A() - B() -
#define B() 2
1 A() 3
"A()" is replaced with "- B() -" yielding
1 - B() - 3
 |_______|
     |
     A
with a context that will cause "A" identifier pp-tokens to be disabled in the
marked range. Rescanning begins at the beginning of the marked range, causing
"B()" to be replaced by "2" yielding (ignore the extra whitespace here):
1 -  2  - 3
 | |___| |
 |   |   |
 |   B   |
 |_______|
     |
     A
with a context that will cause "B" identifier pp-tokens to be disabled in the
nested marked range while the A marked range remains because rescanning is still
in effect for A's replacement. Rescanning begins again at the beginning of the
B marked range (rescanning is a no-op here). When the end of the B marked range
is reached, the B context is removed, yielding
1 - 2 - 3
 |_____|
    |
    A
And the rescanning of A's replacement list continues. When the end of the A
marked range is reached, the A context is removed, yielding
1 - 2 - 3
       |
with normal top-level scanning continuing at the marked point. This is an
in-place iterative process.
(An implementation can of course use a recursive algorithm so long as it mimics
the exact behavior of the pure iterative model.)
Regards,
Paul Mensonides
A preprocessor gives "a a" for "scan(a)". The expansion process I
first thought of was:
#1
scan(a)
| [scan]
(argument expansion)
|
a
| [a]
b lp ) a'
| [a, lp]
b ( ) a'
|
(argument expansion done)
| [scan]
b ( ) a' (rescan)
| [scan, b]
a a'
| [scan, b, a]
b' lp ) a' a'
| [scan, b, a, lp]
b' ( ) a' a'
This case is somewhat different from the other examples we've seen,
because the tokens ("b() a") from the argument expansion are still
available for the further replacement if rescanned. But, the standard
says:
6.10.3.1p1:
Before being substituted, each argument’s preprocessing tokens
are completely macro replaced as if they formed the rest of the
preprocessing file; no other preprocessing tokens are available.
which implies that the following is correct.
#2
scan(a)
| [scan]
(argument expansion)
|
a
| [a]
b lp ) a'
| [a, lp]
b ( ) a' (rescan to form the rest of the pp-file)
| [a, lp, b]
a' a'
|
(argument expansion done)
| [scan]
a' a'
Or even if the rescan doesn't occur during the argument expansion as
in the above, the set [a, lp] (used during the argument expansion) can
be effective for the tokens from the argument expansion:
#3
scan(a)
| [scan]
(argument expansion)
|
a
| [a]
b lp ) a'
| [a, lp]
b ( ) a'
|
(argument expansion done)
| [scan] and {a, lp}
b ( ) a' (rescan)
| [scan, b], and {a, lp}
a' a'
Note that the disabling set {a, lp}, which applies only to the tokens
resulting from the argument expansion, was not needed (or could be
ignored) in the previous examples, since in those examples the tokens
from the argument expansion were not available for further replacement
during rescanning anyway.
Among the three expansion processes above, what's the correct behavior
according to the standard?
No, it says the replacement buffer is rescanned *with* subsequent
tokens of the source file. "With" means inclusion.
The correct result is: b' ( ) a' a'
> Among the three expansion processes above, what's the correct behavior
> according to the standard?
None. :) The argument is replaced and rescanned (as if it constituted all the
available tokens) before it is inserted into the replacement list. The
replacement list is then rescanned separately later. All that is true.
However, the fundamental reason that result is incorrect is that
scanning/rescanning is left-to-right, and it does not repeatedly scan/rescan
over and over until it makes an entire pass of a file with no replacements. It
gets exactly one macro expansion scan, unless the same tokens are put into
multiple scanning contexts (such as use as an argument and in the replacement
list, which causes two.) E.g. the following:
#define lp (
#define macro() 1
macro lp )
Does not expand to 1. It expands to "macro ( )". VC makes this mistake during
rescanning in some contexts, for example:
#define lp (
#define macro() 1
#define other macro lp )
macro lp ) // macro ( )
other // macro ( )
VC expands the first one correctly and the second one incorrectly. Both should
expand to "macro ( )".
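The one-scan rule described above can be checked on a given compiler by capturing the expansion as a string. This is only a sketch: STR, XSTR, EXPAND, same_tokens and show_scans are hypothetical helpers that are not part of the post, while lp, macro and other are the macros from the post. XSTR(other) gives the tokens exactly one scan; wrapping them in EXPAND puts them through a second scanning context (once as an argument, once in EXPAND's replacement list), which is the situation in which the invocation is legitimately picked up.

```c
#include <stdio.h>

/* Hypothetical helpers, not from the post. */
#define STR(x) #x
#define XSTR(x) STR(x)   /* fully expands x (one scan), then stringifies it */
#define EXPAND(x) x      /* puts x through one additional scan */

/* Macros from the post. */
#define lp (
#define macro() 1
#define other macro lp )

/* One scan: "macro" is followed by "lp", not "(", when it is examined. */
static const char one_scan[] = XSTR(other);
/* Two scans: the argument scan leaves macro ( ), and the rescan of
   EXPAND's replacement list then sees a complete invocation. */
static const char two_scans[] = XSTR(EXPAND(other));

#undef lp
#undef macro
#undef other

/* Returns 1 if s and t are equal once spaces are ignored, else 0. */
int same_tokens(const char *s, const char *t)
{
    for (;;) {
        while (*s == ' ') s++;
        while (*t == ' ') t++;
        if (*s != *t) return 0;
        if (*s == '\0') return 1;
        s++;
        t++;
    }
}

/* Prints what this preprocessor produced for both cases. */
void show_scans(void)
{
    printf("one scan:  %s\n", one_scan);
    printf("two scans: %s\n", two_scans);
}
```

Calling show_scans() from main should print macro ( ) for the first case and 1 for the second on a preprocessor that follows the rule; a preprocessor with the VC behavior described above would be expected to print 1 for both.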
During the rescanning of a's replacement list above (the scan(a) example), a
left parenthesis pp-token is not the next token following b, so b is not an
invocation (yet). This causes the initial replacement of a to result in:
b ( ) a'
Use as an argument to "scan" causes a secondary rescan when the replacement list
of "scan" is rescanned. That secondary scan *will* pick up the "b ( )"
invocation. That invocation is no longer occurring during the rescanning of the
replacement list of a. Consequently, the "b ( )" expands to "a", which expands
to "b' ( ) a'". Ultimately, this yields a fully disabled and final result of
b' ( ) a' a', regardless of whether more scanning is added by an outer context.
#define lp (
#define a b lp ) a
#define b() a
#define scan(x) 1 x 2
scan(a) // original
|
b lp ) a // replacement of a
|^ | // rescanning...
|________|
|
a // disabling context of a
b ( ) a // replacement of lp
| |^ | | // rescanning...
| |__| |
| | |
| lp | // disabling context of lp
|________|
|
a // disabling context of a
b ( ) a' // a found in disabling context and painted
^
b ( ) a' // full result of replacement/rescanning of a
|
scan( )
1 b ( ) a' 2 // replacement of scan(...)
|^ | // rescanning...
|____________|
|
scan // disabling context of scan
1 a a' 2 // replacement of b()
| | ^ | | // rescanning...
| |___| |
| | |
| b | // disabling context of b
|__________|
|
scan // disabling context of scan
1 b lp ) a a' 2 // replacement of a
| ||^ || | // rescanning...
| ||________|| |
| | | | |
| | a | | // disabling context of a
| |__________| |
| | |
| b | // disabling context of b
|________________|
|
scan // disabling context of scan
1 b' lp ) a a' 2 // b found in disabling context and painted
^
1 b' ( ) a a' 2 // replacement of lp
| || |^ | || | // rescanning
| || |__| || |
| || | || |
| || lp || | // disabling context of lp
| ||_________|| |
| | | | |
| | a | | // disabling context of a
| |___________| |
| | |
| b | // disabling context of b
|__________________|
|
scan // disabling context of scan
1 b' ( ) a' a' 2 // a found in disabling context and painted
^
result:
1 b' ( ) a' a' 2
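Since implementations are reported to disagree on this case, it may be easiest to ask the compiler at hand. The sketch below uses the four macros from the trace above unchanged; STR, XSTR, scan_of_a and show_scan_of_a are hypothetical names added only to capture and print the result. Paint marks (the ' in the trace) are not visible in the output, so the result argued for above would appear as 1 b ( ) a a 2.

```c
#include <stdio.h>

/* Hypothetical helpers, not from the post. */
#define STR(x) #x
#define XSTR(x) STR(x)   /* fully expands x, then stringifies it */

/* Macros from the trace above. */
#define lp (
#define a b lp ) a
#define b() a
#define scan(x) 1 x 2

/* Whatever this preprocessor produces for scan(a), as a string. */
static const char scan_of_a[] = XSTR(scan(a));

/* Remove the one-letter macros so they cannot disturb later code. */
#undef lp
#undef a
#undef b
#undef scan

/* Prints the captured expansion; the output depends on the preprocessor. */
void show_scan_of_a(void)
{
    printf("scan(a) -> %s\n", scan_of_a);
}
```

No expected output is asserted here beyond the leading 1 and trailing 2 that come from scan's own replacement list, because the middle part is exactly what is in dispute.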
EDG gets this wrong, for example, because it doesn't remove the disabling
contexts on scans other than the initial rescan, though the standard says
(emphasis mine):
"If the name of the macro being replaced is found during **this** scan of the
replacement list..."
I.e. the context that causes a specific pp-token to be painted exists only
during the rescan of its replacement list; any pp-tokens that get painted remain
so--even outside the original context in which they were painted:
"These nonreplaced macro name preprocessing tokens are no longer available for
further replacement even if they are later (re)examined in contexts in which
that macro name preprocessing token would otherwise have been replaced."
(That other context, BTW, is created by argument replacement/rescanning followed
by rescanning of the replacement list of the macro invocation to which the
argument is a parameter. As in "scan" above.)
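The second half of that rule (a painted token stays painted in later contexts) can be seen in a much smaller case. This is a sketch with hypothetical names (STR, XSTR, ID, count, painted_count, show_painted_count), none of which come from the post: count is self-referential, so the inner count is painted during the rescan of its own replacement list, and the two later scans added by the ID wrappers do not replace it even though the disabling context of count is gone by then.

```c
#include <stdio.h>

/* Hypothetical helpers, not from the post. */
#define STR(x) #x
#define XSTR(x) STR(x)   /* fully expands x, then stringifies it */
#define ID(x) x          /* each ID adds one more scan of its argument */

/* Self-referential macro: the inner "count" is painted on first rescan. */
#define count count + 1

/* The painted token is examined again by both ID rescans and stays as is. */
static const char painted_count[] = XSTR(ID(ID(count)));

#undef count

/* Prints the expansion; it is count + 1, not count + 1 + 1 + ... */
void show_painted_count(void)
{
    printf("ID(ID(count)) -> %s\n", painted_count);
}
```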
Regards,
Paul Mensonides
> No, it says the replacement buffer is rescanned *with* subsequent
> tokens of the source file. "With" means inclusion.
What are we debating exactly? The name of an operation? "Scanning" and
"rescanning" are only terms that depend on whether or not some pp-tokens have
been scanned before. If that's all you're arguing, which makes no semantic
difference, then I'll concede this point. If you are arguing that the inclusion
mentioned above means a recursive specification, then you are wrong, as cited in
the previous post of this sub-thread. The preprocessing of a file does not do a
recursive step every time that a macro gets invoked, which is what you are
saying. Further, you're using that to prove what constitutes "nested", in which
case, the entire remainder of the file is "nested." That is incorrect.
Regards,
Paul Mensonides
Apologies, the first version is almost exactly right, it just skips some steps.
Regards,
Paul Mensonides
It expands to either
!(m')+!(m')+n'(!(m')+n')
or
!(m')+m'(m')
depending on whether the n at the end of the replacement of m is considered
nested in the first invocation. I believe that it does, others believe that it
is unspecified whether it does or does not.
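Which of the two results a particular preprocessor produces can be checked with a sketch like the following. The macros m and n are the ones from the original question; STR, XSTR, m_of_m, equal_ignoring_spaces and classify_m_of_m are hypothetical names introduced only for this illustration. Paint marks do not appear in the output, and spacing is ignored in the comparison because it differs between implementations.

```c
#include <stdio.h>

/* Hypothetical helpers, not from the thread. */
#define STR(x) #x
#define XSTR(x) STR(x)   /* fully expands x, then stringifies it */

/* Macros from the original question. */
#define m !(m)+n
#define n(n) n(m)

/* Whatever this preprocessor produces for m(m), as a string. */
static const char m_of_m[] = XSTR(m(m));

#undef m
#undef n

/* Returns 1 if s and t are equal once spaces are ignored, else 0. */
int equal_ignoring_spaces(const char *s, const char *t)
{
    for (;;) {
        while (*s == ' ') s++;
        while (*t == ' ') t++;
        if (*s != *t) return 0;
        if (*s == '\0') return 1;
        s++;
        t++;
    }
}

/* Names which of the two readings discussed above was implemented. */
const char *classify_m_of_m(void)
{
    if (equal_ignoring_spaces(m_of_m, "!(m)+!(m)+n(!(m)+n)"))
        return "n(m) treated as a new expansion";
    if (equal_ignoring_spaces(m_of_m, "!(m)+m(m)"))
        return "n(m) treated as nested in m";
    return "some other result";
}
```

Printing m_of_m together with classify_m_of_m() from main shows both the raw expansion and the reading it corresponds to.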
Regards,
Paul Mensonides
Distinguishing which DRs fall into those two categories is a matter of
opinion. However, having made that distinction, there's a major
difference between them in terms of what needs to be done. Category 3
DRs call for a re-write of the relevant text to make the intent
clearer; that's not called for by category 2.
Why not? It obviously isn't part of an expansion of m -- it's
in the file-level text! Any "intent" by the committee that m be
treated as part of an expansion of m is technical doo-doo.
Sorry for the delay. Due to a problem with my newsreader, I couldn't
reply to this post.
>
> > Then, now we have the interpretation problem on
> > what the term "nested" means in the above context. The committee
> > decided, a long time ago, to leave it as unspecified and made various
> > interpretations possible depending on implementation details. The
> > interpretation you are claiming to be right,
>
> Are you saying, that because the term "nested" is not explicitly defined by the
> standard itself, that the term has no meaning? Or has meaning only as a general
> notion?
The ambiguity of the wording matters only for the cases we are
discussing. Other normal invocations cause no problem because it's
clear whether a given macro invocation is nested or not.
[...]
> It is the only interpretation that is not contrary to the details of
> function-like macro invocation presented elsewhere--which you've summarized in
> the above two points. The fundamental reason that this is the case is that
> macro expansion is not defined by the standard as a recursive process.
I don't think that either "recursive process" or "in-place iteration" is
very clear from the text of the standard alone.
[...]
>
> #define A() B
> #define B() A
>
> A () ()
>
> B ()
> ^
>
> In order for the macro B to be considered invoked in any way, the preprocessor
> can no longer be rescanning the replacement list of A which terminated at the
> circumflex according to the most minimal possible definition of "invocation"
> which is "MACRO(". This is the only logical result. Further, from the other
> example:
>
> #define A() B(
> #define B(x) x
> #define C() // ...
>
> A() C() )
>
> The absolute most that can be said is that "B(" constitutes a nested invocation,
> so "A" would be disabled when B gets rescanned.
OTOH in both cases the full invocation of B() is made up of tokens
from the replacement list and the remaining part of the source file,
which is where the "nested or not" problem occurs.
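The A/B pair quoted above gives a compact way to observe the "nested or not" question on a real preprocessor. In this sketch A and B are the macros from the quoted example; STR, XSTR, two_pairs, three_pairs, tokens_match and classify_three_pairs are hypothetical names. With two pairs of parentheses both readings give A, so nothing can be observed. A third pair separates them: if the B() invocation that spans the end of A's replacement list counts as nested in A, the resulting A is painted and the last () is left alone; if it does not, A is invoked again and the result is B.

```c
#include <stdio.h>

/* Hypothetical helpers, not from the thread. */
#define STR(x) #x
#define XSTR(x) STR(x)   /* fully expands x, then stringifies it */

/* Macros from the quoted example. */
#define A() B
#define B() A

static const char two_pairs[]   = XSTR(A () ());     /* A under both readings */
static const char three_pairs[] = XSTR(A () () ());  /* depends on the reading */

#undef A
#undef B

/* Returns 1 if s and t are equal once spaces are ignored, else 0. */
int tokens_match(const char *s, const char *t)
{
    for (;;) {
        while (*s == ' ') s++;
        while (*t == ' ') t++;
        if (*s != *t) return 0;
        if (*s == '\0') return 1;
        s++;
        t++;
    }
}

/* Names the reading this preprocessor implements for the spanning case. */
const char *classify_three_pairs(void)
{
    if (tokens_match(three_pairs, "A()"))
        return "spanning invocation treated as nested";
    if (tokens_match(three_pairs, "B"))
        return "spanning invocation treated as not nested";
    return "some other result";
}
```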
> >
> > That's what you want, not what is always right. If there is an
> > inconsistency between them, a DR to revise the normative text in
> > accordance with the intent should be in order, not a DR to revise the
> > intent in accordance with the text.
>
> It isn't what I necessarily want. It is a fact. DRs have no relevance unless
> they impact future iterations of the standard. Before that future standard is
> released, the normative text of the standard remains fixed and binding as
> required by the standards bodies for the purpose of stabilization.
But this is not a good attitude for understanding the standard. If the
normative text isn't very clear, then a DR is proper. When we have the
committee's answer explaining the correct or intended interpretation
in the DR, there is nothing to gain by sticking to the normative text and
ignoring the DR, even if the DR itself is informative.
> > And I've never seen that the behavior on which the codes in question
> > depend is used for a good purpose, as I recall, which was a reason
> > that the committee deicded not to specify it.
>
> Wow, that's quite a judgement given that some very popular and respected
> libraries are implemented using this behavior in order to avoid a massive
> increase in code size and consequently a massive increase in maintainence
> points. You just tossed some major components of Boost for example as well as
> many other libraries and programs that use the same techniques that Boost uses
> in both C and C++.
The fact that many implementations and programs depend on the behavior
can affect the next revision of the standard, but not change the
current (or previous) state of the standard.
>
> ...simply because you haven't seen code that uses this behavior to good purpose
> and making a naive and uneducated assumption based purely on that fact. I'm
> also guessing that you haven't even tried, and, lacking expertise in the
> specific field, you have no way of knowing whether it can or cannot be used for
> good purpose. Frankly, I find that implication of poor design offensive. What
> I have given you are specific examples of existing libraries that use this
> behavior to increase functionality, decrease code size, and decrease the number
> of maintenance points, and that list continues.
I'm not claiming that the behavior on which those programs depend
is useless just because I've never seen a case where it's used for a
good purpose. What I'm saying is that, as I recall, one of the reasons
for leaving it unspecified was to discourage its use, so you need to
convince the committee with hard evidence that it's already
used for a reasonably good purpose, especially when the future revision
the committee has in mind is quite different from yours.
> >
> > Unfortunately, according to Doug, the supposed revision the committee
> > has/had in their mind is not the same as what you want. I don't think
> > that the problem is quite easy; if the committee wants to specify any
> > piece of the behavior, they should take account of almost all possible
> > cases which were left as (explicitly or implicitly) unspecified after
> > C89, in order to define the exact meaning of "nested".
>
> Not necessarily. The number of cases that rely on A()() constituting a nested
> invocation are quite rare to the point of being nil. Of course, that is my
> personal observation because that behavior has no utility in practice.
>
If the revised wording is not solid enough to cover almost all
possible odd cases, one could submit a DR to request the intended
interpretation, which puts us in the same situation again.
> Sorry for the delay. Due to a problem of my news reader, I couldn't
> reply this post.
No problem.
> [...]
>> It is the only interpretation that is not contrary to the details of
>> function-like macro invocation presented elsewhere--which you've
>> summarized in
>> the above two points. The fundamental reason that this is the case
>> is that
>> macro expansion is not defined by the standard as a recursive
>> process.
>
> I don't think that "recursive process" or "in-place interation" is
> very clear only from the text of the standard.
It is clear, and it is absolutely unarguable. A macro invocation is literally
replaced by the replacement list of the macro before rescanning occurs. This
defines an iterative model, not a recursive model. With macros, there is no
"procedural hierarchy"; further, the macro expansion mechanism is not a
procedural language and such a viewpoint (though it normally results in the same
thing) runs into fundamental flaws when discussing detailed technical border
cases. The only thing that could come even close to being called recursive in
the specification is the independent replacement and rescanning of the arguments
to a macro. An implementation can of course implement the macro expansion process
any way it wants so long as it simulates the iterative model exactly.
>> The absolute most that can be said is that "B(" constitutes a nested
>> invocation,
>> so "A" would be disabled when B gets rescanned.
>
> OTOH in both cases the full invocation of B() is made up with tokens
> from the replacement list and the remaining part of the source file,
> where the "nested or not" problem occurs.
The point is that the preprocessor should never find the left parenthesis--which
constitutes the minimum required for invocation--until after the disabling
context that exists during the rescan of the replacement list is gone.
>> It isn't what I necessarily want. It is a fact. DRs have no
>> relevance unless
>> they impact future iterations of the standard. Before that future
>> standard is
>> released, the normative text of the standard remains fixed and
>> binding as
>> required by the standards bodies for the purpose of stabilization.
>
> But this is not a good attitude to understand the standard. If the
> normative text isn't very clear, then a DR is proper. When we have a
> committee's answer to explain the correct or intended interpretation
> in the DR, it gives no gain to stick the normative text ignoring the
> DR, even if the DR itself is informative.
It is a perfectly good attitude to have when 1) a C++ DR says the opposite and
cites the original pseudo-code from which the text of the standard was derived
and 2) a great deal of existing code requires this behavior. If it goes back to
original intent, then I'm right. If it goes back to the C DR and subsequent
non-normative text in C99, then it is unspecified (if you go only by intent).
If you go by the even-more-current C++ DR, which references the original intent,
then I am right again. Ultimately, it comes down to this simple idea: the
normative text of the standard is absolute, because "intent" changes. What you
are saying is that the current "intent" effectively defines the language. This
is an extremely unstable foundation on which to write code.
> The fact that many implementations and programs depend on the behavior
> can affect the next revision of the standard, but not change the
> current (or previous) state of the standard.
I'm not suggesting that it should. I am saying that the normative text and
original intent already favors my point-of-view. It is only the C committee's
divergence from original intent and a non-normative section of text that does
not.
The point is that intent has obviously changed, and if I was to go by intent
alone, I would have absolutely no stability, because "intent" is often
interpreted from the normative text--rather than being that actual intent which
produced the normative text. This causes the intent itself to become
unreliable--unless you have the original intent clearly specified (by
pseudo-code in this case).
Regards,
Paul Mensonides
Clearly not, or there would be no blue-paint issue.
Rescanning occurs *within* the procedural scope of
the replacement process, thus any replacement that
occurs at that point is recursive.
There is no procedural scope. The only place that you're getting that from is
the normal semantics of function calls--which are completely different than
macro expansion. Macros are not "called" as if they were functions; they are
replaced and subsequently rescanned. There is only a *context* that dictates
that corresponding pp-tokens should be painted. A macro invocation can be
considered "nested" within a disabling context (and obviously is in the
non-spanning case) which is why we have the blue-paint issue, but not recursive.
This is a fundamental and well-defined fact: replacement occurs prior to
rescanning in the pure model. That, by definition, is not a recursive or even
procedural model. A recursive (procedural) model would exist only if the
macro's replacement was rescanned prior to "returning." In short, a procedural
model follows a totally different form (invoke -> execute -> return) from macro
expansion (invoke -> return -> execute). The specification *is* a flat
(excluding argument replacement/rescanning), iterative model.
This fact indirectly relates to an end-of-replacement-list spanning invocation
because there is no independent rescanning of a replacement list (except, of
course, with macro arguments) that at times needs to "look outside" to get more
preprocessing tokens before finally "returning." Rather, the preprocessing
tokens are *already there* because the macro has *already returned*, and the
disabling context is defined only by a point between two pp-tokens at which
subsequent pp-tokens of the respective name are not painted. There is no
distinction between spatial nesting and semantic nesting. The relevance of this
to the replacement-boundary-spanning invocations is only that the preprocessor
*must* move beyond the boundary in order to invoke a macro if the replacement
list terminates only in the identifier preprocessing token. This is not true
for the partial invocation case such as... #define A() B( ...because the
standard makes it clear that the sequence... IDENTIFIER( ...introduces an
invocation (though not the entire invocation--just the minimum).
[Note--in situations like this:
#define A(m) m(
#define B(x) x
A(B) A() ))
this should expand to () because the second invocation of A is completely outside
of the replacement list of A, and because whether B is expanded within a context
that disables A is irrelevant.--]
Regards,
Paul Mensonides
No, I was using ordinary computer-science vernacular.
> for the partial invocation case such as... #define A() B( ...because the
> standard makes it clear case that the sequence... IDENTIFIER( ...introduces an
> invocation (though not the entire invocation--just the minimum). ...
We clearly disagree about the "model" as well as details.
Since it is getting repetitive, I'll refrain from posting
essentially the same explanation over again. I felt
obliged to respond when you stated your model as the
"clear intent" of the standard, which it is not. If you
want a definitive resolution, have a national body submit
a DR on the issue; it takes time but eventually we have
to provide an official response.
The scanning of the rest of the source file is described in the same
place as the rescanning of the replaced tokens, which gives an
implementer a basis for interpreting it to mean that the scanning of
the tokens following the replaced tokens belongs to the expansion
process of the macro that produced the replaced tokens. Even if he
implements the macro expansion process in his preprocessor using
in-place iteration, there can still be a procedural hierarchy
*conceptually*.
> >
> > But this is not a good attitude to understand the standard. If the
> > normative text isn't very clear, then a DR is proper. When we have a
> > committee's answer to explain the correct or intended interpretation
> > in the DR, it gives no gain to stick the normative text ignoring the
> > DR, even if the DR itself is informative.
>
> It is a perfectly good attitude to have when 1) a C++ DR says the opposite and
> cites the original psuedo-code from which the text of the standard was derived
> and
Officially, it's irrelevant to interpreting the C standard, especially
when the C committee never reviewed the answer to the C++ DR.
> 2) a great deal of existing code requires this behavior.
This can be a basis to claim that the future standard should be
revised as you want, but not a basis to claim that the current
interpretation of the standard is broken.
> If it goes back to
> original intent, then I'm right. If it goes back to the C DR and subsequent
> non-normative text in C99, then it is unspecified (if you go only by intent).
It doesn't matter what the original intent was when we have the
*current* intent of the committee.
> If you go by the even-more-current C++ DR, which references the original intent,
> then I am right again.
It's certainly unfortunate that the answer to the C++ DR wasn't
reviewed by the C committee, even if there is a note in the DR to say
that the answer should be reviewed by the C committee.
>
> > The fact that many implementations and programs depend on the behavior
> > can affect the next revision of the standard, but not change the
> > current (or previous) state of the standard.
>
> I'm not suggesting that it should. I am saying that the normative text
If your interpretation is correct.
> and
> original intent
Which is irrelevant as I said above.
> already favors my point-of-view. It is only the C committee's
> divergence from original intent
Which is most important in this context.
> and a non-normative section of text that does
> not.
Which just reflects the C committee's intent.
>
> The point is that intent has obviously changed, and if I was to go by intent
> alone, I would have absolutely no stability, because "intent" is often
> interpreted from the normative text--rather than being that actual intent which
> produced the normative text. This causes the intent itself to become
> unreliable--unless you have the original intent clearly specified (by
> psuedo-code in this case).
>
The C committee has never said that any intent other than the one given
in the C DR is valid. The C committee has never said that the C++
committee's answer to the C++ DR is valid. Why do you depend on them
in interpreting the C standard, not the C++ standard?