C99 has the following in its informative part (J.1 Unspecified
behavior):
"When a fully expanded macro replacement list contains a function-like
macro name as its last pp-token and the next pp-token from the source
file is a (, and the fully expanded replacement of that macro ends
with the name of the first macro and the next pp-token from the
source file is again a (, whether that is considered a nested
replacement (6.10.3)."
Of course, this is only informative, so it can be forgiven for not
covering all possible cases; you can see more cases that can cause a
similar problem in
http://groups.google.com/groups?as_umsgid=8s45d8%2453h%2...@nnrp1.deja.com&lr=&hl=en
From han.comp.lang.c, I got another case:
#define TESTCONCAT(a, b) a ## b
#define RECURSION() TESTCONCAT
#define CC RECURSION()(1,0)
#define DD RECURSION()(C,C)
DD
A version of GCC expands this as follows:
DD
RECURSION()(C,C)
TESTCONCAT(C,C)
CC
RECURSION()(1,0)
TESTCONCAT(1,0)
[Notice that the remaining tokens that complete the invocation of the
function-like macro TESTCONCAT() come from the replacement list of
the previous macro invocation, not from the source file text, and
that the two macro invocations in question are connected only
indirectly, through the token-pasting operator ##.]
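
One way to see what a particular preprocessor does with the example
above is to stringify the expansion. The following is only a minimal
sketch, not part of the original post: STR, XSTR, dd_expansion and
same_ignoring_spaces are hypothetical names introduced for
illustration. It accepts both candidate results on purpose, since
which one the Standard requires is exactly the open question.

```c
#define TESTCONCAT(a, b) a ## b
#define RECURSION() TESTCONCAT
#define CC RECURSION()(1,0)
#define DD RECURSION()(C,C)

/* STR stringifies its argument as written. XSTR's argument is fully
   macro-replaced before it is handed to STR, so XSTR(DD) captures
   whatever DD expands to on this implementation. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Hypothetical helper: the expansion of DD as a string. */
static const char *dd_expansion(void)
{
    return XSTR(DD);
}

/* Hypothetical helper: compare two strings while ignoring blanks, so
   the check does not depend on how stringification spaces tokens. */
static int same_ignoring_spaces(const char *s, const char *t)
{
    for (;;) {
        while (*s == ' ') ++s;
        while (*t == ' ') ++t;
        if (*s != *t) return 0;
        if (*s == '\0') return 1;
        ++s;
        ++t;
    }
}
```

Printing dd_expansion() from a small main shows the result directly.
A preprocessor that treats the second RECURSION() as a new expansion
yields TESTCONCAT(1,0); one that treats it as nested would leave
RECURSION()(1,0).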
My conclusion is that the result of this invocation should be
considered unspecified behavior too, and that the cited wording of
the Standard could be extended to cover more cases by replacing
"source file" and the description of how the two macro invocations
are connected with more inclusive wording. Fixing the wording of the
Standard is not my main concern, though: it is just an informative
part, so we need not try to cover all "incorrect" cases.
What do you think about my conclusion?
Thanks.
--
Jun Woong (myco...@hanmail.net)
Dept. of Physics, Univ. of Seoul
>
> [Notice that the remaining part which constitute the whole invocation
> of the function-like macro TESTCONCAT() is from the replacement list of
> the previous macro invocation, not from the source file text, and that
> the connection of the two macro invocations in question is indirect by
> the token-pasting operator ##.]
>
> My conclusion of this is, the result from the invocation should be
> considered unspecified behavior too, and the cited wording of the
> Standard can be elaborated to cover more possible cases if we replace
> "source file" and the connection of two macro invocations in question
> with more inclusive wording; fixing the wording of the Standard is not
> my main concern since it's just informative part so we need not try to
> cover all "incorrect" cases.
>
> What do you think about my conclusion?
I think you haven't justified why it is unspecified behaviour.
I think it is well-defined. Which compilers don't handle it
the same way as GCC? I'd be tempted to say they're buggy.
Neil.
[From my original question]
#define TESTCONCAT(a, b) a ## b
#define RECURSION() TESTCONCAT
#define CC RECURSION()(1,0)
#define DD RECURSION()(C,C)
DD
is expanded as follows:
1: DD
2: RECURSION()(C,C)
3: TESTCONCAT(C,C)
4: CC
5: RECURSION()(1,0)
6: TESTCONCAT(1,0)
[Line numbers are added by me.]
If it has well-defined behavior, the expansion between 5 and 6 implies
that the second expansion of RECURSION() shall NOT be regarded as
nested. Which line in the expansion sequence removes RECURSION from
the set of names being checked for nested expansion? How can I derive
that interpretation from the Standard?
The reason I consider this example unspecified is that I don't think
a conforming implementation is required to treat the first expansion
of TESTCONCAT, together with the following (C,C), as the start of a
"new" expansion, as in the examples in the Rationale. I don't see any
requirement to do so in the Standard.
If I am missing something, please let me know.
Thanks in advance.
No text has been taken from outside of the TESTCONCAT invocation, so it
is, by any reasonable interpretation, entirely self-contained and "still
in progress", i.e. nested.
However, I refer you to an amendment to the C++ 98 description of macro
expansion owing to a DR that, I think, probably covers this case.
The DR involved a similar macro expansion question, and in the
long process of resolving it, Dave Prosser made available (to me at
least) the original pseudo-code algorithm that the relevant members
of the C89 committee and he originally had agreed to as the "standard"
macro expansion algorithm, and that they tried to translate into
"standardese" English.
The algorithm is unambiguous, and GCC >= 3.1 implements it perfectly
(3.0 implements it 99.999% perfectly, you have to be pretty good to
find a difference 8-)). This algorithm involved the concept of per-token
"hide sets" FWIW, though GCC does not use this concept in its
implementation.
I believe LCC does use this concept in its implementation, and it gives
the same answers in the corner-cases I've checked.
Anyway, why complicate the standard? I ask you again - which compilers
don't give the answer that GCC gives? If the answer is the null set,
and I suspect it is, then what's the point?
Neil.
I agree with you completely on this point. But you haven't yet answered
what I'd like to know: what is the rationale for regarding the second
invocation of RECURSION() [5-6] as a NEW expansion, even though the prior
invocation of RECURSION() [2-3] affects the second invocation in part?
At which line does the new expansion process start? And why?
>
> However, I refer you to an amendment to the C++ 98 description of macro
> expansion owing to a DR that, I think, probably covers this case.
Does C99 (or one of C90 CORs) reflect the C++98's description?
How can I find the DR?
[...]
>
> The algorithm is unambiguous, and GCC >= 3.1 implements it perfectly
> (3.0 implements it 99.999% perfectly, you have to be pretty good to
> find a difference 8-)).
I don't have the C99 Rationale handy. Does it now define the resultant
expansion of the example in the C90 Rationale? If not, I'm afraid that
one implementation, however well implemented, can't settle the meaning
of the Standard in this case.
> This algorithm involved the concept of per-token
> "hide sets" FWIW, though GCC does not use this concept in its
> implementation.
>
> I believe LCC does use this concept in its implementation, and it gives
> the same answers in the corner-cases I've checked.
>
> Anyway, why complicate the standard? I ask you again - which compilers
> don't give the answer that GCC gives? If the answer is the null set,
> and I suspect it is, then what's the point?
Yes, the answer is the null set, at least on the implementations on
which I can test this problem. But I want to hear either that my
conclusion is correct or, if it's not (i.e., my example has well-defined
behavior, not unspecified behavior), the correct "interpretation",
focusing on the two invocations of RECURSION(). It seems
counter-intuitive to me that the example is well-defined when compared
with the example in the Rationale. I'm not trying to complicate the
Standard; I just want to understand it.
Thanks in advance.
>> > #define TESTCONCAT(a, b) a ## b
>> > #define RECURSION() TESTCONCAT
>> > #define CC RECURSION()(1,0)
>> > #define DD RECURSION()(C,C)
>> >
>> > DD
>> >
>> > is expanded as follows:
>> >
>> > 1: DD
>> > 2: RECURSION()(C,C)
>> > 3: TESTCONCAT(C,C)
>> > 4: CC
>> > 5: RECURSION()(1,0)
>> > 6: TESTCONCAT(1,0)
> Does C99 (or one of C90 CORs) reflect the C++98's description? How can I
> find the DR?
C++ has the same wording as C90 and C99, but they deemed it not clear
enough, hence the suggested resolution of DR #268:
http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/cwg_active.html
> But you don't answer what I'd
> like to know yet: what's the rationale for that the second invocation of
> RECURSION() [5-6] shall be regarded as a NEW expansion
IMO, because to get there you've had to take the (C, C) from 2:
and therefore completely left and finished the expansion of RECURSION().
There are no tokens left from it, and you've taken tokens from beyond it.
FWIW the bit that the appendix says is undefined is a new macro
expansion according to the hideset rules.
Neil.
Thanks.
I think it's useful to simplify the problem for this discussion.
[original example]
#define TESTCONCAT(a, b) a ## b
#define RECURSION() TESTCONCAT
#define CC RECURSION()(1,0)
#define DD RECURSION()(C,C)
DD
I simplify this example as follows:
#define R() T
#define T() R()()
R()()
The only differences from the original example are that the tokens
(the argument list) in question come from the source file, not the
replacement list of the previous macro expansion and that there is no
intervening token pasting operation.
Just as the original example yields TESTCONCAT(1,0), the simplified
one yields T() on GCC, with the following expansion sequence:
R()()
T()
R()()
T()
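
The simplified case can be probed the same way. This is a hedged
sketch under the same assumptions as before: STR, XSTR,
simplified_expansion and same_ignoring_spaces are hypothetical names,
and both possible outcomes are accepted rather than one being claimed
as correct.

```c
#define R() T
#define T() R()()

/* XSTR macro-replaces its argument before STR stringifies it. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Hypothetical helper: the expansion of R()() as a string. */
static const char *simplified_expansion(void)
{
    return XSTR(R()());
}

/* Keep the one-letter macro names from leaking into later code. */
#undef R
#undef T

/* Hypothetical helper: compare two strings while ignoring blanks. */
static int same_ignoring_spaces(const char *s, const char *t)
{
    for (;;) {
        while (*s == ' ') ++s;
        while (*t == ' ') ++t;
        if (*s != *t) return 0;
        if (*s == '\0') return 1;
        ++s;
        ++t;
    }
}
```

If the inner R() is expanded again, the result is T(); if it is
treated as nested and left alone, the result is R()().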
Let's compare this with the example in C99 (and C90) Rationale:
#define f(a) a*g
#define g(a) f(a)
The resulting expansion of f(2)(9) is unspecified (either 2*f(9) or
2*9*g), as the Rationale explicitly states. This example can also be
simplified without changing its unspecified behavior:
#define f() g
#define g() f()
The resultant expansion of f()() is unspecified (either f() or g), and
adding () to the replacement list does not change that:
#define f() g
#define g() f()()
Still, the resultant expansion of f()() is unspecified. Now my
"simplified" example and the Rationale's are the same! If we say that
the (unsimplified) original example is well-defined, then it is the
two differences mentioned above that make it well-defined, which seems
very unreasonable to me! How am I to understand this situation?
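
For comparison, here is the Rationale's f(2)(9) example in testable
form. Again this is only a sketch: STR, XSTR, rationale_expansion and
same_ignoring_spaces are hypothetical names, and the two results the
Rationale lists are both accepted.

```c
#define f(a) a*g
#define g(a) f(a)

/* XSTR macro-replaces its argument before STR stringifies it. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Hypothetical helper: the expansion of f(2)(9) as a string. */
static const char *rationale_expansion(void)
{
    return XSTR(f(2)(9));
}

/* Keep the one-letter macro names from leaking into later code. */
#undef f
#undef g

/* Hypothetical helper: compare two strings while ignoring blanks. */
static int same_ignoring_spaces(const char *s, const char *t)
{
    for (;;) {
        while (*s == ' ') ++s;
        while (*t == ' ') ++t;
        if (*s != *t) return 0;
        if (*s == '\0') return 1;
        ++s;
        ++t;
    }
}
```

The result 2*9*g corresponds to treating the f reached through g(9)
as a new expansion, and 2*f(9) to treating it as nested.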
My original example and the example in the C++98 DR seem essentially
identical. So I think, if the Committee accepts the suggested
resolution in the C++98 DR, then the example of the Rationale must be
removed (or revised to say that it's well-defined). Or, if the
Committee accepts the resolution but still regards the Rationale's
example as unspecified, it's strange that the tokens from the previous
expansion ((C,C) in the original example) *must* start a new expansion
sequence while the tokens from the source file ((9) in the Rationale's
example) merely *can* start a new sequence. Is this logic incorrect?
Anyway, has WG14 officially accepted the suggested resolution (or does
it at least plan to)?
>
> > But you don't answer what I'd
> > like to know yet: what's the rationale for that the second invocation of
> > RECURSION() [5-6] shall be regarded as a NEW expansion
>
> IMO, because to get there you've had to take the (C, C) from 2:
> and therefore completely left and finished the expansion of RECURSION().
> There are no tokens left from it, and you've taken tokens from beyond it.
This applies exactly to the Rationale's example, too.
>
> FWIW the bit that the appendix says is undefined is a new macro
> expansion according to the hideset rules.
>
Could you elaborate on what you mean by "a new macro expansion
according to the hideset rules"?
There is a general principle in the C standard that the
description of a transformation process always pertains
to one specific instance (grammar match), and if there
is another invocation of the same rule during that
process, the nested instance is processed entirely
within the context where it occurred. The one exception
to this is that pp token substitution occurs in a context
where any identifier matching an identifier currently
being replaced (in any context) is permanently marked
("painted blue") and *never* substituted.
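
The undisputed part of this rule, that a macro name met while that
same macro is being replaced is not replaced again, can be shown with
a small sketch. The macros foo, x_ping and x_pong and the helper names
below are hypothetical, chosen only for illustration.

```c
/* Direct self-reference: the inner foo is found while foo is being
   replaced, so it is left alone. */
#define foo (4 + foo)

/* Indirect self-reference: x_ping -> x_pong -> x_ping, and the last
   x_ping is found while x_ping is still being replaced. */
#define x_ping x_pong
#define x_pong x_ping

/* XSTR macro-replaces its argument before STR stringifies it. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Hypothetical helpers returning the expansions as strings. */
static const char *self_reference(void)
{
    return XSTR(foo);      /* tokens: ( 4 + foo ) */
}

static const char *mutual_reference(void)
{
    return XSTR(x_ping);   /* token: x_ping */
}

/* Hypothetical helper: compare two strings while ignoring blanks. */
static int same_ignoring_spaces(const char *s, const char *t)
{
    for (;;) {
        while (*s == ' ') ++s;
        while (*t == ' ') ++t;
        if (*s != *t) return 0;
        if (*s == '\0') return 1;
        ++s;
        ++t;
    }
}
```

Neither case involves picking up tokens from beyond the replacement
list, which is why these results are not in dispute.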
In C89, we deliberately left a "grey area" concerning
picking up a "(" token from just beyond the replacement
buffer, since there were reasonable implementations both
ways and we didn't care to encourage such programming
by making this usage have well-defined behavior.
As I recall, for C99 an argument was made and accepted
that the behavior should be defined, and C99 requires
that the subsequent tokens be seen *during the rescan*,
i.e. still within the context of the original replacement
process. This means that a replacement context might
be "stretched" by acquiring more pp-tokens behind the
original span of replacement.
In the example:
> > #define TESTCONCAT(a, b) a ## b
> > #define RECURSION() TESTCONCAT
> > #define CC RECURSION()(1,0)
> > #define DD RECURSION()(C,C)
> > DD
> > ... expanded as follows:
> > 1: DD
> > 2: RECURSION()(C,C)
> > 3: TESTCONCAT(C,C) // C99 says (C,C) is picked up
> > 4: CC
> > 5: RECURSION()(1,0)
> > 6: TESTCONCAT(1,0) // WRONG, for C99 at least
RECURSION() is *not* expanded from line 5 to 6 during
the rescan within the context of the replacement from
4 to 5, because the context from 2 to 3 is still active
(replacement is not finished until there are no more
rescan replacements, not even by picking up additional
tokens); blue paint is applied to that "RECURSION" and
hence it (with its apparent but not actual arguments)
will never be replaced. This is somewhat strange in
that the (1,0) are then not macro arguments but just
pp-tokens, and if they had happened to contain
expandable macro invocations then those would be
processed (replaced) after the DD process is complete.
At that point, nothing within the pp-tokens resulting
from the full DD-expansion would be further examined
(during preprocessing), let alone the blue-painted
"RECURSION".
I will say that I don't think anybody who is really
"into" macro processing much likes the way this is done.
Standard C's rules are constrained by compatibility with
historical practice (especially the Reiser cpp) and have
been only partly cleaned up, in order to disrupt only a
minimal amount of existing code. I will further say
that it is not good industrial coding practice to make
a C program depend on details of token aggregation of
the kind under discussion, even when the official rules
are clear.
I don't think it was entered into the WG14 agenda, although it
ought to be. (I thought there was an agreement that the C part
of C++ would be interpreted by WG14; this is especially needed
for preprocessing issues since so many C++ gurus disparage use
of the preprocessing facility and hence would not be expected
to be as concerned about maximizing its usefulness.)
But the C99 Rationale seems to keep the view that the example has an
unspecified result. Whether it is officially defined as a nested
expansion or as the start of a new expansion, the Committee will have
to change the wording in the Rationale if they want to define the
behavior at all.
And if you remember correctly and the C99 Rationale (or C99 itself)
reflects it, then gcc and some other implementations that claim
conformance to C99 would have to revise their preprocessors. Of course,
this would also make the C++ Committee reject the suggested resolution,
as they said.
>
[snip the example and kind explanation]
Thanks for the kind explanation.
>
> I will say that I don't think anybody who is really
> "into" macro processing much likes the way this is done.
> Standard C's rules are constrained by compatibility with
> historical practice (especially the Reiser cpp) and have
> been only partly cleaned up, in order to disrupt only a
> minimal amount of existing code. I will further say
> that it is not good industrial coding practice to make
> a C program depend on details of token aggregation of
> the kind under discussion, even when the official rules
> are clear.
I absolutely agree with you on this point. I got the example from a
project related to C++ preprocessing. If they are depending on this
tricky expansion, they are walking on thin ice.
Thanks.
The Rationale did not manage to track every change that was
made since C89; we're still revising it to make it better.
We probably should have had a policy that no change would
be accepted without corresponding Rationale wording; this
was attempted at times but not consistently.
Yes, a corresponding change to the Rationale is sometimes very important
because reading the Standard alone doesn't make things very clear. DRs
or the Rationale should contain wording for the changes made since C89,
especially when they occur without modification of the Standard itself.
Thanks.
--
Jun, Woong (myco...@hanmail.net)