I'm looking at define directives in winnt.h and notice some constants
are defined in brackets and have an upper case 'L' at the end
#define DELETE (0x00010000L)
#define READ_CONTROL (0x00020000L)
#define WRITE_DAC (0x00040000L)
#define WRITE_OWNER (0x00080000L)
#define SYNCHRONIZE (0x00100000L)
while others don't have the 'L' or the brackets
#define CONTEXT_i386 0x00010000 // this assumes that i386 and
#define CONTEXT_i486 0x00010000 // i486 have identical context
What's the deal?
--
Gerry Hickman (London UK)
The 'L' suffix means that the literal is `long int' as opposed to
`int'. However, both types have the same binary representation in
VC++, so it doesn't matter much. The brackets are just a matter of
style.
Alex
History and inertia. Many of the Windows include files began their life in
Windows 3.x, which was a 16-bit operating system. In the 16-bit compilers,
an unadorned integer constant was 16 bits wide. So, 0x00010000 compiled as
0. You had to use the L suffix to force a 32-bit constant.
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.
> #define DELETE (0x00010000L)
> #define READ_CONTROL (0x00020000L)
> #define WRITE_DAC (0x00040000L)
> #define WRITE_OWNER (0x00080000L)
> #define SYNCHRONIZE (0x00100000L)
>
> What's the deal?
The L was once needed to define a 32bit integer, as has been said.
Brackets are quite generally a very good habit to get into in defining
things. They may not make much difference in this case but in many cases
they are absolutely crucial.
Consider either
#define one_more_than(x) x+1
or
#define one_more_than(x) (x+1)
and work out the result of
int x = 42;
int y = 2*one_more_than(x);
int z = one_more_than(x)*2;
in each case :-)
[Remember that #define just does a text substitution before the code is
compiled.]
Once you've been bitten a few times, it becomes second nature to include
lots of brackets in #define's. Clearly the guy who wrote the first lot
didn't bother thinking about whether he could safely miss them out. :-)
Dave
--
David Webber
Author of 'Mozart the Music Processor'
http://www.mozart.co.uk
For discussion/support see
http://www.mozart.co.uk/mozartists/mailinglist.htm
True, but for a numerical constant using brackets may actually *hide* coding
errors. For instance, consider the following:
#define SHRINK_BOX_AMOUNT -20
CRect rct(l,t,r,b);
rct.DeflateRect(-SHRINK_BOX_AMOUNT, -SHRINK_BOX_AMOUNT);
leads to a compiler error (cannot decrement an rvalue). So by leaving out
the brackets you are implicitly saying that this quantity is *negative*, which
is something you can't tell without looking at the actual #define.
Conversely:
#define SHRINK_BOX_AMOUNT (-20)
CRect rct(l,t,r,b);
rct.DeflateRect(-SHRINK_BOX_AMOUNT, -SHRINK_BOX_AMOUNT);
Compiles fine, even though it's clear that the intention is to make a
smaller box, not a larger one.
- Alan Carre
> "Gerry Hickman" <gerry...@newsgroup.nospam> wrote in message
> news:%23nDeGJN...@TK2MSFTNGP06.phx.gbl...
>
>> #define DELETE (0x00010000L)
>> #define READ_CONTROL (0x00020000L)
>> #define WRITE_DAC (0x00040000L)
>> #define WRITE_OWNER (0x00080000L)
>> #define SYNCHRONIZE (0x00100000L)
>>
> The L was once needed to define a 32bit integer, as has been said.
OK, as we are now in a similar situation with x86 and x64, are there any
similar issues?
> Brackets are quite generally a very good habit to get into in defining
> things. They may not make much difference in this case but in many
> cases they are absolutely crucial.
> Consider either
>
> #define one_more_than(x) x+1
>
> or
>
> #define one_more_than(x) (x+1)
Yes, I can see the problem here. When it's just one number it looks
strange, but I guess it's easy to get used to. So would you use brackets
for every definition in your own header files?
-SHRINK_BOX_AMOUNT won't get parsed as --20, but as - -20 (which will
likely compile just fine, but I'm too lazy to check). The preprocessor works
after the lexer, handling tokens rather than individual characters. You need
the ## operator to create new tokens.
> Conversely:
>
> #define SHRINK_BOX_AMOUNT (-20)
>
> CRect rct(l,t,r,b);
> rct.DeflateRect(-SHRINK_BOX_AMOUNT, -SHRINK_BOX_AMOUNT);
>
> Compiles fine
Again, there's actually no difference between these two examples.
--
With best wishes,
Igor Tandetnik
With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. It is hard to be sure where they are going to
land, and it could be dangerous sitting under them as they fly
overhead. -- RFC 1925
LL or ULL suffix for 64-bit.
>> The L was once needed to define a 32bit integer, as has been said.
>
> OK, as we are now in a similar situation with x86 and x64, are there any
> similar issues?
The int type on Intel x64 is actually 32-bit. A big gotcha in x64 is
when you try to add a negative integer offset to a pointer. Consider the
following:
char * ptr = new char[100];
int delta = -1;
...
ptr = ptr + delta;
The problem is that delta's value is not FFFFFFFFFFFFFFFF, but
00000000FFFFFFFF, which is definitely not -1 in 64-bit. The solution is
to use a 64-bit integer type, such as ptrdiff_t or INT_PTR, for pointer
offsets. In other words, constants like 0xFFFFFFFF should raise an eyebrow.
Go to http://www.viva64.com/articles/ , and read the article "20 issues
of porting C++ code on the 64-bit platform", especially "3. Magic numbers".
Tom
Nope, just try it. It won't compile, no spaces are inserted after the minus
sign. The preprocessor simply replaces the #define with -20.
And --20 is not a *new* token, it is not a token at all!
What is probably confusing you is the fact that there's only one macro here
and no arguments. If I wanted to append two macro arguments then I would
need to use ##. As in
#define __ADDSIGN__(SIGN,NUMBER) SIGN##NUMBER
But that's not what I wrote. I placed a minus sign before the number -20
(-SHRINK_BOX_AMOUNT), which the preprocessor spits out as --20 (as it's
supposed to) and results in a compile error.
- Alan Carre
#define SHRX -10
#define SHRY -20
#define SHRINK(rect) rect.DeflateRect(-6-SHRX,6-SHRY)
You get a compile error.
- Alan Carre
and why would you expect this to work?
#define SHRINK(rect) rect.DeflateRect(-6-(SHRX),6-(SHRY))
does the trick.
--
It's a matter of catching bugs, not hiding them. By not using brackets for
the negatives I discover that I made a mistake in SHRINK automatically. If I
wrote it your way it would compile, but it would still be wrong.
- Alan Carre
This program successfully compiles:
#define X -1
int main()
{
return -X;
}
> And --20 is not a *new* token, it is not a token at all!
--20 is parsed as two tokens, "--" and "20". -X in the above example
gets expanded to three tokens, "-", "-" and "1". Compiler proper works
on a stream of tokens (after lexer and preprocessor), not a stream of
characters. For more details, see C++ standard 2.1 "Phases of
translation".
This looks like a bug in MSVC compiler. This program:
void f(int, int) {}
#define SHRX -10
#define SHRY -20
#define SHRINK() f(-6-SHRX,6-SHRY)
int main()
{
SHRINK();
return 0;
}
produces an error in VC (I tried VC7.1 and VC8), but compiles fine with
Comeau. Does anybody have later versions handy to check?
I don't get it. I would never expect the above to compile. I don't
view this as a MSVC bug. Why would you consider it valid and thus the
MSVC compiler wrong for not compiling?
Think of the OP CODES here.
It's a MACRO, not a variable. Either you put parentheses around the
MACRO or you change that to variables:
int SHRX = -10;
int SHRY = -20;
#define SHRINK(rect) rect.DeflateRect(-6-SHRX,6-SHRY)
--
SHRINK() should expand to the following stream of tokens (note - tokens,
not characters):
f ( - 6 - - 10 , 6 - - 20 ) ;
This is perfectly valid code. Under no circumstances should a new "--"
token be formed. But MSVC does precisely that, if the error message is
to be believed.
According to the standard, the compiler runs the preprocessor after
lexical analysis, on a sequence of tokens. '-' is a separate token from
SHRX. Macro substitution doesn't re-tokenize the stream. It doesn't
concatenate two separate '-' tokens into a single '--' one.
What you're talking about is a real problem, but your example doesn't
demonstrate it. "delta" is a signed int, so when it is promoted to 64-bits
for addition to the pointer, it will be sign-extended. This actually works
just fine.
I *suspect* you were trying to show this:
char * ptr = new char[100];
unsigned int delta = -1;
...
ptr = ptr + delta;
THAT fails, but it also gets a warning. This doesn't get a warning:
char * ptr = new char[100];
unsigned int delta = ~0;
...
ptr = ptr + delta;
Partly because of this, I am trying to go back through all of my existing
code to use "ptrdiff_t" for variables like this.
So what you are saying is that it should be viewed as a variable?
No. MACRO expansion must take place.
What the draft standard says,
2.1 Phase of translation
...
4. Preprocessing directives are executed, macro invocations are
expanded, and _Pragma unary operator expressions are executed.
If a character sequence that matches the syntax of a
universal-character-name is produced by token concatenation
(16.3.3), the behavior is undefined.
That is how I expect it - the keyword is MACRO EXPANSION which is
before tokenization.
As we do in our p-code processor, macro substitution is done first.
MACROS will not work right if this is not done first. I can see all
kinds of logic breaking. By definition, a MACRO is a concept of
substitution before processing.
In section "16.3 Macro replacement" I believe this is critical point
about this:
9 A preprocessing directive of the form
# define identifier replacement-list new-line
defines an object-like macro that causes each subsequent instance
of the macro name to be replaced by the replacement list of
preprocessing tokens that constitute the remainder of the
directive. The replacement list is then rescanned for more macro
names as specified below.
Note it says "rescanned" - that means that total expansion must take
place before it processes the tokens into OP CODES, etc.
But using parentheses helps the tokenization. And when you use a
variable, the OP CODES are much different. Both make it work.
I don't see how the macro expansion here:
#define SHRX -10
#define SHRY -20
#define SHRINK() f(-6-SHRX,6-SHRY)
can ever be expected to be correct for tokenization.
To prove the point, compile the code as so:
cl testshrink.cpp /P
and you will see a preprocessed file, testshrink.i, being created. This
is the file that is finally tokenized for op code translations.
Remove the /P and the 2nd phase begins.
--
>> I don't get it. I would never expect the above to compile. I don't
>> view this as a MSVC bug. Why would you consider it valid and thus the
>> MSVC compiler wrong for not compiling?
>
> SHRINK() should expand to the following stream of tokens (note - tokens,
> not characters):
>
> f ( - 6 - - 10 , 6 - - 20 ) ;
>
> This is perfectly valid code. Under no circumstances should a new "--"
> token be formed. But MSVC does precisely that, if the error message is
> to be believed.
True Igor.
But that is not what you have above. You don't have a macro with
white-spaces, like so:
#define SHRINK() f( - 6 - SHRX, 6 - SHRY)
Add your spaces and you will see it work. The white-space is a
delimiter here, just as parentheses would be.
This is one of the major reasons I use the format:
const DWORD SHRY = -10;
because you never know how it will be used in code.
--
That's obviously not true:
3. The source file is decomposed into preprocessing tokens...
Phase 3 naturally occurs before phase 4.
> As we do in our p-code processor, macro substitution is done first.
> MACROS will not work right if this is not done first. I can see all
> kinds of logic breaking. By definition, a MACRO is a concept of
> substitution before processing.
Macros manipulate a stream of tokens, not a stream of characters.
> In section "16.3 Macro replacement" I believe this is critical point
> about this:
>
> 9 A preprocessing directive of the form
>
> # define identifier replacement-list new-line
>
> defines an object-like macro that causes each subsequent instance
> of the macro name to be replaced by the replacement list of
> preprocessing tokens that constitute the remainder of the
> directive. The replacement list is then rescanned for more macro
> names as specified below.
Note - the replacement list of _tokens_, not characters.
> Note it says "rescanned"
Rescanned for _macro names_ - not for tokens that might get glued
together to form other tokens. Macro substitution doesn't rerun the
lexer. It shouldn't turn two consecutive "-" tokens into a single "--"
token (except by application of ## operator, see 16.3.3)
Under your theory, can you explain why this program happily compiles on
MSVC?
#define X -1
int main()
{
return -X;
}
Why doesn't the compiler complain about "return --1;" statement?
> - that means that total expansion must take
> place before it processes the tokens into OP CODES, etc.
I can't seem to find the term "OP CODE" anywhere in the C++ standard.
> I don't see how the macro expansion here:
>
> #define SHRX -10
> #define SHRY -20
> #define SHRINK() f(-6-SHRX,6-SHRY)
>
> can ever be expected to be correct for tokenization.
What do you mean by "correct for tokenization"? Again, according to 2.1,
tokenization happens before macro replacement.
> To prove the point, compile the code as so:
>
> cl testshrink.cpp /P
>
> and you will see a pro-processed file, testshrink.i, be created.
Unfortunately, MSVC appears to have a bug when outputting this file. In
particular, the example I show above compiles directly, but doesn't
compile if explicitly preprocessed first.
It doesn't matter. The replacement list of the macro is broken into
tokens before the macro replacement takes place. "-" and "SHRX" are two
separate tokens. The replacement list of SHRX macro again consists of
two tokens, "-" and "10". Finally, after macro substitution, -SHRX
should be turned into a sequence of three tokens, "-" "-" "10". The fact
that MSVC instead turns it into two tokens "--" and "10", making up "--"
token where none existed before, is a bug.
In my example above, I didn't mean spaces literally. I'm showing a
sequence of 14 tokens, not a string consisting of some letters, digits,
punctuation and whitespace. Characters are gone at the tokenizing phase
(phase 3 in C++ standard 2.1), the rest of the processing (phase 4 and
onward) deals only with tokens. I'm not sure how I can make it any
clearer.
Lexical analysis of the code can also be called scanning, and it can
also be required in a multi-pass compiler before syntax analysis
and actual code generation take place.
I think it is fairly safe to say that in the annals of computer software,
MACROS were always meant to be a REPLACEMENT before processing -
compilers included.
>> To prove the point, compile the code as so:
>>
>> cl testshrink.cpp /P
>>
>> and you will see a pro-processed file, testshrink.i, be created.
>
> Unfortunately, MSVC appears to have a bug when outputting this file.
> In particular, the example I show above compiles directly, but
> doesn't compile if explicitly preprocessed first.
Not sure what that means here. /P (preprocess to file) auto creates
the filename with a forced extension '.i' (dot i).
No error is produced until it begins the actual compilation.
Because you are not clear, and you are expecting something most C people
will not expect. Now I see why you think "MACROS are evil", but now it
is clear you are expecting something that simply isn't there.
When you have macros:
#define SHRX -10
#define SHRY -20
#define SHRINK1() f(-6-SHRX,6-SHRY)
#define SHRINK2() f( - 6 - SHRX, 6 - SHRY)
These are two different macros with two different macro expansions and
syntax processing.
How clearer can that be? SHRY and SHRX are not reference and
variables here.
If you expect to have code like this in macros, then you have to pay
special attention to lexical analysis, which includes macro expansion.
It may require help, using parenthesis or spaces or use variables
instead.
How clearer can that be? It doesn't violate any lexical, syntax and
translation rules or guidelines whatsoever.
--
> When you have macros:
>
> #define SHRX -10
> #define SHRY -20
> #define SHRINK1() f(-6-SHRX,6-SHRY)
> #define SHRINK2() f( - 6 - SHRX, 6 - SHRY)
>
> [Here ] are two different macros with two different macro expansions
> and syntax processing.
Igor, you may find this to be a worthy point. Section 16.3 pretty much
makes this very clear:
16.3 Macro replacement
1 Two replacement lists are identical if and only if the
preprocessing tokens in both have the same number,
ordering, spelling, and *WHITE_SPACE SEPARATION*, where all
white-space separations are considered identical.
No one can presume SHRINK1() and SHRINK2() to be equivalent in
syntax processing, not when SHRX and SHRY are non-parenthesized
macros themselves.
I'm not sure what this is supposed to prove. The standard only uses this
definition of "identical" when describing in 16.3p2 when and how an
already #defined macro can be redefined. All it says is that the
following is illegal:
#define X -1
#define X - 1
while this is legal:
#define Y -1
#define Y -1
In addition, per 16.3.2, the whitespace in the macro definition is
significant when stringizing operator # is involved:
#include <stdio.h>
#define X -1
#define Y - 1
#define STR1(x) #x
#define STR(x) STR1(x)
int main()
{
printf("%s\n", STR(X));
printf("%s\n", STR(Y));
return 0;
}
This prints
-1
- 1
I'm only expecting that the C or C++ compiler follow the letter of their
corresponding languages' standards. Is this too much to ask?
Those C people should read section 5.1.1.2 of C99 standard, which
describes phases of translation of a C program. Just as for C++, phase 3
is "The source file is decomposed into preprocessing tokens...", and
phase 4 is "Preprocessing directives are executed, macro invocations are
expanded...".
> Now I see why you think "MACROS are evil"
I do believe this, but not at all because of this (rather minor and
esoteric) issue. Macros are evil because they stomp all over scopes and
namespaces. E.g., you write a class with GetMessage method (a nice,
generally useful method name). In one source file, the header for this
class is included after <windows.h>, while another source (perhaps one
that actually implements class' methods) uses the class but doesn't
include any Windows headers. You get linker errors. Turns out the Windows
headers define a GetMessage macro that expands to, say, GetMessageW.
> When you have macros:
>
> #define SHRX -10
> #define SHRY -20
> #define SHRINK1() f(-6-SHRX,6-SHRY)
> #define SHRINK2() f( - 6 - SHRX, 6 - SHRY)
>
> are two different macros with two different macro expansions and
> syntax processing.
Many different sequences of characters may be parsed to the same
sequence of tokens, and are then treated the same by the syntax
analysis. You wouldn't say that (1+2) and ( 1 + 2 ) are two
syntactically different expressions, would you?
> How clearer can that be? SHRY and SHRX are not reference and
> variables here.
Of course not. Have I ever mentioned the words "reference" or "variable"
anywhere in this thread? These terms have well-defined meaning within
C++ standard, and don't in any way apply to macros.
> if you expect to have code like this in macros, then you have to pay
> special attention to lexical analysis which include macro expansion.
Again - lexical analysis (translation phase 3) happens before macro
expansion (translation phase 4).
Again, consider this program:
#define X -1
int main()
{
return -X;
}
Save this code in a file named, say, test.c. This works:
cl test.c
This fails:
cl /P test.c
cl /TC test.i
The compiler can't be right in both cases, can it? Either the program is
illegal and shouldn't compile at all, or the preprocessed file is
generated incorrectly. Either way, there's a bug. My reading of the C++
standard suggests that the latter is the case - the program is valid,
and the bug is in generating the preprocessed file.
Note that under the current standard, compiling directly can give a
different result than saving the preprocessing output to a file and then
compiling that. Making those equivalent would require a concept of
"invisible whitespace" in the input file.
"Igor Tandetnik" <itand...@mvps.org> wrote in message
news:ukeLokwT...@TK2MSFTNGP06.phx.gbl...
Well, I don't see why preprocessor cannot introduce "real" whitespace
into the text stream, to make sure that the preprocessed file is
tokenized into the same stream of tokens as the original file. In my
oft-repeated example, why can't the preprocessor output "return - -1;" ?
Because it would fundamentally break the concept of MACROS.
It's the same as this:
#include <stdio.h>
#include <windows.h>
#define firstname "Igor"
#define lastname "Tandetnik"
#define fullname1 firstname lastname
#define fullname2 firstname lastname
#define fullname3 firstname " " lastname
int main(int argc, char *argv[])
{
printf("fullname1 = |%s|\n",fullname1);
printf("fullname2 = |%s|\n",fullname2);
printf("fullname3 = |%s|\n",fullname3);
}
output:
fullname1 = |IgorTandetnik|
fullname2 = |IgorTandetnik|
fullname3 = |Igor Tandetnik|
Why would you expect it to add a white space for you in fullname1,
fullname2?
The only reason for the space in the macro is to separate the tokens.
--
Semantic differentials.
I think you should step back and just look at this from the
programmer's common sense point of view.
All MACRO concepts, whether it's a compiler or not, are a replacement
concept: a line translation before it is processed, or in the case of
C, syntax processing.
Does MSVC multi-pass compiler mean here? What do you think is the
purpose of multi-pass compiler?
>> Now I see why you think "MACROS are evil"
>
> I do believe this, but not at all because of this (rather minor and
> esoteric) issue. Macros are evil because they stomp all over scopes and
> namespaces.
It's a matter of opinion. MACROS have survived and have been used
successfully for ages, and like anything else, proper understanding of
their utility, functionality and usage is required. At no point do you
view it as a promotion of bugs if you know what you are doing - just
like anything else.
>> When you have macros:
>>
>> #define SHRX -10
>> #define SHRY -20
>> #define SHRINK1() f(-6-SHRX,6-SHRY)
>> #define SHRINK2() f( - 6 - SHRX, 6 - SHRY)
>>
>> are two different macros with two different macro expansions and
>> syntax processing.
>
> Many different sequences of characters may be parsed to the same
> sequence of tokens, and are then treated the same by the syntax
> analysis. You wouldn't say that (1+2) and ( 1 + 2 ) are two
> syntactically different expressions, would you?
No, but I would
(1+MACRO_X) and (1 + MACRO_X)
If MACRO_X introduces a construct that conflicts with the language,
then you need to take that into account.
What I think you are saying is that, when the compiler is scanning, it
should read:
--
- -
as the same by ignoring the white space?
The issue is the specific MACRO in this case, where there is clear
potential for syntax processing confusion.
But if the long traditional idea of MACRO string/token substitution
and line translations BEFORE processing is not being understood, I can
understand where you are coming from.
>> How clearer can that be? SHRY and SHRX are not reference and
>> variables here.
>
> Of course not. Have I ever mentioned the words "reference" or "variable"
> anywhere in this thread? These terms have well-defined meaning within
> C++ standard, and don't in any way apply to macros.
I know you didn't, Igor. I am just highlighting that it isn't, because
it highlights the clear distinction in how the syntax processing will
be done. The compiler must make a decision on what OP CODES to push
into the final compilation. The variable/reference is an instant
trigger for that logic. A Macro is expanded first, so the processing
takes place BEFORE intermediate code is created.
What bothers me is that common sense - maybe it's not in this era of
C++ templates, which used to be a MACRO concept as well.
>> if you expect to have code like this in macros, then you have to pay
>> special attention to lexical analysis which include macro expansion.
>
> Again - lexical analysis (translation phase 3) happens before macro
> expansion (translation phase 4).
Igor, you are a smart guy. You know your _hit.
So I am vexed: either you are mis-reading the standard, locking
yourself into something that was never there, which means someone
needs to inform the draft C++ standard committee they still have
semantic ambiguity to clear up, or you need to better understand what
a compiler does.
What is critical is that a MACRO by definition is a STRING TOKEN
SUBSTITUTION concept and you can not process it until the final
translations have taken place.
Don't know if you have written a compiler, but we have, several, as part of
our server-side hosting server. A heavy emphasis is placed on MACROs.
We must get it right otherwise all our 3rd party developers and system
web operators will have it ALL wrong.
First, there are many implementation factors; the number of passes is
a key one, and it can be done line by line or it can go into an
intermediate file. But in general, expansion takes place during the
lexical analysis, tokenization, scanning and line translation, what
have you, before it moves into a syntax processing phase. How that
is done depends on the compiler: single pass, multiple passes, etc.
In general, this process is called lexical analysis; it may or may not
include parsing for syntax. It depends, e.g. an interpreter may be
done that way.
If you come to feel MACROs are evil and promote problems, then it's only
because the key point of substitution before processing is not coming
across. In some ways, I think the C++ template evolution could be a
reason for the confusion.
The MSVC is a multi-pass compiler. When someone mentally codes a
program with MACROS, he/she must think of it like a "TEMPLATE", which
means that string substitutions will take place before it is analyzed
for final output.
But I don't think any of the above matters, because this is a critical
programming concept for languages that do offer MACRO concepts.
Again, go back to the /P switch to compile a MACRO that you know is
clearly wrong:
#include <stdio.h>
#include <windows.h>
void f(int x, int y) {}
#define SHRX -10
#define SHRY -20
#define SHRINK1() f( - 6 - SHRX, 6 - SHRY)
#define SHRINK2() f(-6-SHRX, 6-SHRY)
#define SHRINK3() f(FOO, FOO)
int main()
{
f(- 6 - -10, 6 - - 20);
SHRINK1();
SHRINK2();
SHRINK3();
return 0;
}
When you compile with /P, take note of two things:
- No error
- Substitution and lack of substitution in the '.i' file:
This is in the .i file:
int main()
{
f(- 6 - -10, 6 - - 20);
f( - 6 - -10, 6 - -20);
f(-6--10, 6--20);
f(FOO, FOO); <<--- no substitution found.
return 0;
}
When you finally try to compile this for image code output, you
will notice the error is not at the #define statement but at the line
where it is used.
More proof, and this is very critical in understanding that a MACRO is a
substitution concept: add the following after the SHRINK3() macro:
#ifdef SHRINK3
#undef SHRINK3
#define SHRINK3() f(FOO, BAR)
#endif
compile this with /P and you will see the .i file have:
int main()
{
f(- 6 - -10, 6 - - 20);
f( - 6 - -10, 6 - -20);
f(-6--10, 6--20);
f(FOO, BAR);
return 0;
}
All this is more than enough clear indication that macro
substitution takes place before syntax processing.
--
>...
> Yes, I can see the problem here, but when it's just one number it looks
> strange, but I guess it's easy to get used to. So would you use brackets
> for every definition in your own header files?
I see I have started a long argument in this thread. But (I think) the
result is that if you use brackets you can be sure of what you're doing, and
ergo it is a good idea.
And no I haven't always used brackets when #defining simple constants, but
these days I tend to avoid defining simple constants.
I (almost) never write
#define XYZ 0x20
and (almost) always write
const UINT XYZ = 0x00000020;
or, where appropriate, use enums.
I like having the type of the number well-defined, but have no scruples
about using explicit casts (in any style) where I don't want the compiler
pestering me that I might be losing data. E.g.
(BYTE)XYZ;
Dave
--
David Webber
Author of 'Mozart the Music Processor'
http://www.mozart.co.uk
For discussion/support see
http://www.mozart.co.uk/mozartists/mailinglist.htm
Why do you believe that your opinion represents "common sense", whatever
that means? Have you noticed that the compiler itself disagrees with
your "common" sense? Let me try and make it even more obvious:
#define MINUS -
int main()
{
return -MINUS 1;
}
Please explain why the above code compiles, while the below code
doesn't:
int main()
{
return -- 1;
}
> All MACRO concepts, whether its a compiler or not, is a replacment
> concept, a line translation before it is processed or in the case of
> C, syntax processing.
Quite. However, you believe the replacement occurs at the level of
individual characters, while I claim (and the C and C++ standards concur)
that the replacement occurs at the level of tokens.
> Does MSVC multi-pass compiler mean here?
I don't understand this question. I can't parse it as a valid English
sentence.
> What do you think is the
> purpose of multi-pass compiler?
I don't know - I don't believe I use one. Please enlighten me.
>> Many different sequences of characters may be parsed to the same
>> sequence of tokens, and are then treated the same by the syntax
>> analysis. You wouldn't say that (1+2) and ( 1 + 2 ) are two
>> syntactically different expressions, would you?
>
> No, but I would
>
> (1+MACRO_X) and (1 + MACRO_X)
And you would be incorrect.
> What I think you are saying is that, when the compiler is scanning, it
> should read:
>
> --
> - -
>
> as the same by ignoring the white space?
No, of course not. Translation phase 3 would parse -- as a single token,
and - - as two tokens. However, the same translation phase would
parse -MINUS as two tokens - and MINUS. Then, at translation phase 4,
token MINUS gets replaced with token - by the preprocessor (assuming
"#define MINUS -" line as shown above). The stream of tokens fed to
phase 5 then consists of two tokens - - (note: two tokens "-" and "-",
not three characters "-", " " and "-").
You insist on lumping all translation phases together under the "compiler
is scanning" term. You should be more precise, then perhaps the picture
will become clearer.
> But if the long traditional idea of MACRO string/token substitution
> and line translations BEFORE processing is not being understood, I can
> understand where you are coming from.
You keep conflating string substitution and token substitution. They are
not the same thing.
> I know you didn't Igor. I am just highlighting that it isn't because
> it highlight the clear distinction in how the syntax processing will
> be done. The compiler must make a decision on what OP CODES to push
> into the final compilation. The variable/reference is an instant
> trigger for that logic. A Macro is expanded first so the processing
> takes before BEFORE intermediate code is created.
Of course. Both tokenization (phase 3) and macro expansion (phase 4)
occur before syntactic and semantic analysis (phase 7). Have I ever
argued otherwise? The disagreement we have is about how tokenization and
macro expansion interact.
> What bothers me is that common sense - maybe its not in this era of
> C++ templates which use to be a MACRO concept as well.
What bothers me is your taking upon yourself a role of the arbiter for
common sense. Do you have any evidence suggesting that your view is
indeed shared by the majority of programmers? It doesn't seem to be
shared by C++ standardization committee, nor by C++ compiler
implementers.
> What is critical is that a MACRO by definition is a STRING TOKEN
> SUBSTITUTION
What's "string token"? I don't believe the standard ever mentions this
term.
> concept and you can not process it until the final
> translations have taken place.
What are "final translations"? Could you be more precise, preferably using
the terms actually defined by normative documents?
If by "final translations" you mean the point after all nine translation
phases are completed, then this statement is clearly untrue: macro
substitution is phase 4 out of 9.
> Don't know if you written a compiler, but we have, several, as part of
> our server-side hosting server. A heavy emphasis is done with MACROs.
> We must get it right otherwise all our 3rd party developers and system
> web operators will have it ALL wrong.
http://en.wikipedia.org/wiki/Argumentum_ad_verecundiam
> First, there is many implementation factors, i.e. levels of passes is
> a key one, it can be done line by line or it can go into a
> intermediate file. But in general, expansions takes place during the
> lexical analysis, tokenization, scanning and line translation what
> have you before it is moves into a syntax processing phase.
Not quite. Macro expansion in C and C++ happens _after_, not during,
tokenization, and indeed before the compiler moves into a syntax
processing phase. I'm not familiar with the term "line translation", and
I'm not sure what you mean by "scanning" in this context.
> In general, this process is called lexical analysis
Well, you could, I guess, lump tokenization and macro expansion together
under the heading "lexical analysis". I don't quite see how making up
new terms advances the discussion.
> If you come to feel MACROs are evil, promotes problems, then its only
> because the key point of substitution before processing is not coming
> across.
I understand perfectly well how macros work, thank you very much. They
are evil precisely _because_ macro expansion happens before syntax
analysis. Macros work outside the grammar of the language, and that's
the problem.
See also http://www.gotw.ca/gotw/063.htm
> The MSVC is a multi-pass compiler.
I don't believe that's correct, but even if it is, I don't see what it
has to do with anything. A compiler, whether multi-pass or otherwise,
must behave as if phases of translation occur as specified in the
standard. Otherwise it's non-conformant (which is really a euphemism for
"buggy").
>When someone mentally codes a
> program with MACROS, he/she must think of it like a "TEMPLATE" which
> means that string substitutions will take place before it is analyze
> for final output.
Not string substitutions - token substitutions.
> Again, go back to the /P switch to compile a MACRO that you know is
> clearly wrong:
I believe the program is correct, but the output of the compiler with /P
switch is wrong. The compiler has a bug.
> All this, are more than enough clear indications that macro
> substitution takes place before syntax processing.
Of course it does. I've never stated otherwise.
I don't know what "the concept of MACROS" is. Is this term defined
anywhere?
I know that the standards prescribe certain behavior for the conformant
C and C++ compilers, and MSVC compiler doesn't follow these
prescriptions.
> Its the same as this:
>
> #include <stdio.h>
> #include <windows.h>
>
> #define firstname "Igor"
> #define lastname "Tandetnik"
> #define fullname1 firstname lastname
> #define fullname2 firstname lastname
> #define fullname3 firstname " " lastname
>
> void main(char argc, char *argv[])
> {
> printf("fullname1 = |%s|\n",fullname1);
> printf("fullname1 = |%s|\n",fullname2);
> printf("fullname1 = |%s|\n",fullname3);
> }
>
> output:
>
> fullname1 = |IgorTandetnik|
> fullname1 = |IgorTandetnik|
> fullname1 = |Igor Tandetnik|
I have no idea what this is supposed to prove, or how it is related to
our discussion.
> Why would you expect it to add a white space for you in fullname1,
> fullname2?
Why do you think I would expect that?
Note that, if the preprocessor expanded this line
printf("fullname1 = |%s|\n",fullname1);
into this
printf("fullname1 = |%s|\n", "Igor" "Tandetnik" );
the behavior of the program wouldn't change (note extra whitespace
around individual tokens). I see no reason why the preprocessor couldn't
do this if it were so inclined.
> I see I have started a long argument in this thread.
But it's interesting to learn about tokens, white space and macros and
how they can lead to subtle bugs. I thought #define was easy until now!
> And no I haven't always used brackets when #defining simple constants,
> but these days I tend to avoid defining simple constants.
>
> I (almost) never write
>
> #define XYZ 0x20
>
> and (almost) always write
>
> const UINT XYZ = 0x00000020;
Yes, I like this but why isn't it done this way in winnt.h?
> or, where appropriate, use enums.
Could they have used enums for this kind of thing in winnt.h (line 8060)
#define FILE_READ_EA ( 0x0008 ) // file & directory
#define FILE_WRITE_EA ( 0x0010 ) // file & directory
#define FILE_EXECUTE ( 0x0020 ) // file
#define FILE_TRAVERSE ( 0x0020 ) // directory
#define FILE_DELETE_CHILD ( 0x0040 ) // directory
and I see they put some whitespace between the brackets and the number here.
--
Gerry Hickman (London UK)
Because it's not valid C (though it's valid C++).
> Could they have used enums for this kind of thing in winnt.h
Probably.
They are easy. Used wrong, like anything else, they can produce
unexpected behavior.
> #define FILE_READ_EA ( 0x0008 ) // file & directory
> #define FILE_WRITE_EA ( 0x0010 ) // file & directory
> #define FILE_EXECUTE ( 0x0020 ) // file
> #define FILE_TRAVERSE ( 0x0020 ) // directory
> #define FILE_DELETE_CHILD ( 0x0040 ) // directory
>
> and I see they put some whitespace between the brackets and the number
> here.
Right. In this case, the white space is for clarity, and the parentheses
(round or curved brackets) act as token delimiters.
Some notes on bracket terminology:
() parentheses, or round/curved brackets
[] square brackets
<> angle brackets
{} curly brackets, or braces
---
VC9 (15.00.21022.08):
test.cpp(9) : error C2105: '--' needs l-value
Schobi
"Tommy" <b...@reallybad.com> wrote in message
news:e$Lv09xTJ...@TK2MSFTNGP05.phx.gbl...
For example
# /*
*/ define /*
*/ MACRO 1
is a valid preprocessor statement. Easy, isn't it?
That's just as bad as that other thread regarding that construct:
x = x++ + y++;
Can't have it both ways. Clarity and ambiguity go across the board,
including macros.
--
> I *suspect* you were trying to show this:
>
> char * ptr = new char[100];
> unsigned int delta = -1;
> ...
> ptr = ptr + delta;
Yes. Thanks Tim.
Tom
Although I am enjoying the remainder of this debate, here you have
presented an utterly specious argument. There is a VAST difference between
"whitespace" as a simple syntactic element and "whitespace" as a character
in a string. This example demonstrates nothing about the fundamental
nature of either macros or MACROS. All you have shown is that all of the
following are compiled identically:
"Aaa""bbb"
"Aaa" "bbb"
"Aaa" "bbb"
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.
Have a good turkey day - turkey. :-)
--
"Tommy" <b...@reallybad.com> wrote in message
news:ui$GD1IUJ...@TK2MSFTNGP02.phx.gbl...
I wasn't the one who made the suggestion that it should - quite the
opposite. Read the messages again and FOLLOW IT twit!
==
> Could they have used enums for this kind of thing in winnt.h (line 8060)
>
> #define FILE_READ_EA ( 0x0008 ) // file & directory
> #define FILE_WRITE_EA ( 0x0010 ) // file & directory
> #define FILE_EXECUTE ( 0x0020 ) // file
> #define FILE_TRAVERSE ( 0x0020 ) // directory
> #define FILE_DELETE_CHILD ( 0x0040 ) // directory
Yes (I've done it) but I don't recommend it. It is fairly painful using
enums for bit fields which you can combine with | as the result tends not to
be in the set defined by the enum - unless you have an awfully long enum :-)
enums are best used for things with a finite number of values which don't
combine.
> and I see they put some whitespace between the brackets and the number
> here.
Microsoft typists are trained to put white space (and braces) in all sorts
of silly places, bless their cotton socks.
<tantrum>
I hate hate hate
if (x) {
//...
}
when it should obviously be
if( x )
{
//...
}
</tantrum>
Still, what with the recession, there's going to be a world shortage of
spaces and carriage returns, and so I'll have to get used to it. :-)
Don't be a hater <g>
And how do you feel when you see?
if ( x ) { ... } // comment
or just
if ( x ) ... // comment
or
// comment
if ( x )
...
> Still, what with the recession, there's going to be a world shortage of
> spaces and carriage returns, and so I'll have to get used to it. :-)
hehehehe
>> I hate hate hate
>> ...
> Don't be a hater <g>
Sorry - I meant "I deprecate, deprecate, deprecate..." but it doesn't quite
have the same ring :-)
> And how do you feel when you see?
>
> if ( x ) { ... } // comment
Great! I love if, else if, else choices where every action fits in {....}
on one line. [But if even one of the options needs more than one line I
arrange them all in the standard way.]
> or just
>
> if ( x ) ... // comment
Again fine. But I don't like
if( x ) ...
else
{
...
}
Braces for one means braces for all.
> or
>
> // comment
> if ( x )
> ...
Occasionally if I have to. I try to limit myself to 90 characters wide, as
that way I can print selected chunks (when needed) from visual studio on A4
paper without line wraps. I have, however, managed to overcome my earlier
predilection for 80 characters which had been with me since the punch card
FORTRAN days - progress! :-)
I suspect, like everyone else, I choose conventions which I feel easiest to
read to facilitate code maintenance.
Sometimes I'll do your approach
if ()
{
}
else
{
}
if it looks cleaner, but most of the time I use:
if () {
} else {
}
I don't prefer
if (x)
...;
else
...;
and I'll change it to the above almost without thinking twice. :-)
> Occasionally if I have to. I try to limit myself to 90 characters
> wide, as that way I can print selected chunks (when needed) from visual
> studio on A4 paper without line wraps. I have, however, managed to
> overcome my earlier predilection for 80 characters which had been with
> me since the punch card FORTRAN days - progress! :-)
You mean 74. :) 5 for line numbers, 6th column for continuation :)
Did you program the drums too?
Same here. I still keep to 75 characters 99.99% of the time. But of late,
with these freakishly wide monitors and weary eyes, I've been using
60x132 character text windows with greater frequency, if only because of
reading other people's code with long lines. Sometimes I will find myself
cleanly chopping such code up, maybe to better understand it and see it
all at once.
Of course, for text based messages, it still works better to have
within 75-80 since you never know how a remote viewer will render it.
> I suspect, like everyone else, I choose conventions which I feel easiest
> to read to facilitate code maintenance.
It happens to all, IMO. Geez, when I was using APL, it was all one
line baby! <g>
--
I am still stuck with 80 characters. It is because I am used to
divide VS client space into two vertical panes, so I can open .CPP
files in one pane, while .H files in the other. Making a code
wider than 80 characters forces me to use horizontal scroll, which
I hate to no end. Probably, when I get wider monitor (which with
current budget cuts seems as a rather distant future) then I will
relax width limits to 90 characters or more.
As per coding style, I for one, easily adopt surrounding coding
style (unless it is outstandingly stupid). My default coding style
is similar to MFC/ATL.
As an interesting fact, there are far fewer holy wars about
coding style in the C#/Java world. Just because the editor lays out a whole
coding construct for you most of the time. Then nobody cares to
reformat it by hand according to his/her own style.
Alex
Boy, go away for a day or 2 and all hell breaks loose!
----
I think you guys are probably talking way over my head about multiple pass
compilation and re-tokenization expansion blah blah blah... But, for my own
sake at least, let's just look at a simple, very well-known fact about the
pre-processor for one minute:
#define STRIZE(x) #x            // converts the argument into a string
#define _STRIZE_(x) STRIZE(x)   // converts the pre-processed argument to a string
#define NUM -10
int main(int argc, ...)
{
puts(STRIZE(NUM));
puts(_STRIZE_(NUM));
puts(_STRIZE_(-NUM));
return 0;
}
Output:
--------
NUM
-10
--10
What can we learn from this?
This is how I see it:
=============
Case 1: STRIZE is passed a value x as NUM. The preprocessor allocates a
textual representation for x (*x* not NUM) as the value "NUM" (no quotes). I
just mean internally it's a string containing the word NUM. The macro says
#x which means to put quotations around this value so out pops "NUM" in the
source file. Case 1 explained.
Case 2: _STRIZE_ is passed x as NUM again. The preprocessor again creates a
textual representation for x, substituting it again as "NUM" (no quotes!).
Next, it must evaluate the macro but this time instead of #NUM it is given
STRIZE(NUM). At this point the preprocessor must decide what to pass as x
for STRIZE. So it is forced to try to expand NUM; to see if it already
exists as a macro (which it does) and that macro evaluates to -10 (again
textually, and internally). So the preprocessor then sets x to "-10" (no
quotes) passes that to STRIZE and finally evaluates #-10 which is "-10" in
the source file (BTW. There are NO spaces before the minus sign, you can
double check if you don't believe me).
Case 3: In this case the preprocessor is passed -NUM into _STRIZE_ so x has
the textual representation -NUM. Again, it must re-evaluate this to see if
NUM is already a macro (and boy, if it didn't do that, and treated -NUM as a
*single token*, then NO macros would work at all!). So the minus sign is not
part of the token, only NUM is evaluated which happens to be stored off
as -10 [without any meaning! it's just what NUM is when dereferenced from
some string table]. Hence the expression -NUM evaluates to --10 which is
then passed into STRIZE for x and it winds up as "--10" in the source file
as expected (NO SPACES again).
============
So now then, the question is (or should be) "why does this work"?
int main(int argc,...)
{
printf("%d", -NUM);
}
Well I passed this through cl /E (preprocess to stdout) and got this as the
result:
int main(int argc, ...)
{
...lots of space...
printf("%d", --10);
return 0;
...lots of space...
}
Again, no spaces after the first minus sign, which should clearly have
generated a compiler error. CLEARLY. The above is cut and pasted from my CMD
box: It is byte-for-byte what is handed to the compiler from the
preprocessor. So it appears that there is a bug in the compiler in that it
must have re-interpreted the preprocessed file!
However it looks to me as though it is only smart enough to re-pre-process 1
level deep where, in this case, it can "see" that it would be better to
throw in a space there (at compilation time) because that's probably what
the programmer intended. Thanks cl.exe, but I wanted an error!
So there's your bug. The compiler shouldn't be in the business of
second-guessing the programmer's intentions with macros.
- Alan Carre
Per the ANSI C standard, there is NO re-pre-processing (NO re-tokenization).
The compiler doesn't second-guess. It just follows the standard. May not
always be doing that, but in these cases it does.
By the way, I don't see what's the point in your stringizing example.
Operator # replaces the whole preprocessing token sequence for an argument
with a string.
"Alan Carre" <al...@twilightgames.com> wrote in message
news:eAOBdgWU...@TK2MSFTNGP04.phx.gbl...
I think you missed Igor's point. You make the same logical
mistake that Tommy does. The preprocessor doesn't work with text
and doesn't output text. It works with a list of tokens, which
is prepared by the previous stage of parsing. So, the representation
of the "printf("%d", -NUM);" statement is (in pseudocode):
// tokenization phase
typedef std::vector<std::string> token_stream;
token_stream t;
t.push_back("printf");
t.push_back("(");
t.push_back("\"%d\"");
t.push_back(",");
t.push_back("-");
t.push_back("NUM");
t.push_back(")");
t.push_back(";");
// macro expansion phase: splice each macro's token list in place of its name
for(token_stream::iterator it = t.begin(); it != t.end(); )
{
    if(*it == "NUM")
    {
        token_stream pr = tokenize(macros["NUM"]); // yields {"-", "10"}
        it = t.erase(it);                          // remove the macro name
        it = t.insert(it, pr.begin(), pr.end());   // splice in its expansion
        it += pr.size();                           // skip past it
    }
    else
        ++it;
}
HTH
Alex
So then you would characterize such behaviour as "a bug".
> The compiler doesn't second-guess. It just follows the standard. May not
> always be doing that, but in these cases it does.
Let me show you again:
cl /E test.cpp :
line0: int main(int,...) {
line1: printf("%d", --10);
line2: return 0;
line3: }
Please explain why line1 compiles under the current standard. Is it
compilable, conforming ANSI C code?
> By the way, I don't see what's the point in your stringizing example.
> Operator # replaces the whole proprocessing token sequence for an argument
> with a string.
You don't see the point? I was trying to show that the preprocessor
evaluates MACRO arguments during the process of preprocessing. If a macro
argument is not re-used as an argument (or otherwise) in another macro then
it is NOT evaluated. That's why STRIZE(NUM) --> "NUM" and _STRIZE_(NUM) -->
"-10". NUM gets evaluated (dereferenced) during preprocessing in the second
case, but NOT the first.
Know what I mean? If NUM appears in a macro it becomes -10, if it doesn't it
remains as NUM. Don't you think that might have some RELEVANCE here when
#define X -1
int main(int,...)
{
return -X;
}
compiles fine, but
#define X -1
#define Y -X
int main(int,....)
{
return Y;
}
doesn't?
- Alan Carre
Well that may be.
It may very well be the case, but when I *compile* the code, at least for
the STRIZE cases, I see no extra spaces being thrown in. It just spits out
exactly what the macro directs it to. Now perhaps there's some penultimate
step where it's all pulled together and sorted, checked for spelling
mistakes or something, but I have yet to see any evidence of that except for
this ONE instance where a macro argument was NOT involved in being evaluated
by the preprocessor due to its being present within another macro.
That (secondary) behaviour, no matter whose fault it is, is exactly the same
as treating the last step differently from all the preceding steps. So I
would label the last step/phase (?untokenizing?) of the process "buggy".
- Alan Carre
No, it doesn't spit out exactly. That's the bug that Igor talks
about. You cannot see any spaces because the result is in
compiler's memory in some sort of collection of tokens. However,
when this collection is output to a file, the compiler erroneously
combines two separate tokens (that is, two minus signs) together,
as if they were one token (prefix decrement). In order to make it
correctly readable by a human reader, the compiler should introduce
whitespace in the preprocessor output:
printf("%d", - -10);
so there won't be any confusion for humans.
Alex
Also known as "preprocessing token", as defined by C++ standard 2.4
> Case 2: _STRIZE_ is passed x as NUM again. The preprocessor again
> creates a textual representation for x, substituting it again as
> "NUM" (no quotes!). Next, it must evaluate the macro but this time
> instead of #NUM it is given STRIZE(NUM). At this point the
> preprocessor must decide what to pass as x for STRIZE. So it is
> forced to try to expand NUM; to see if it already exists as a macro
> (which it does) and that macro evaluates to -10 (again textually, and
> internally). So the preprocessor then sets x to "-10" (no quotes)
Not quite. The result of macro replacement is a sequence of tokens, not
a string. Macro NUM expands to a sequence of two tokens "-" and "10"
(without quotes). This sequence, not a string "-10", is then passed to
STRSIZE as a replacement list for parameter x.
> passes that to STRIZE and finally evaluates #-10 which is "-10" in
> the source file
Preprocessor evaluates #x in a way specified by C++ standard 16.3.2:
1 Each # preprocessing token in the replacement list for a function-like
macro shall be followed by a parameter as the next preprocessing token
in the replacement list.
2 If, in the replacement list, a parameter is immediately preceded by a
# preprocessing token, both are replaced by a single character string
literal preprocessing token that contains the spelling of the
preprocessing token sequence for the corresponding argument. Each
occurrence of white space between the argument’s preprocessing tokens
becomes a single space character in the character string literal. White
space before the first preprocessing token and after the last
preprocessing token comprising the argument is deleted. Otherwise, the
original spelling of each preprocessing token in the argument is
retained in the character string literal, except for special handling
for producing the spelling of string literals and character literals: a
\ character is inserted before each " and \ character of a character
literal or string literal (including the delimiting " characters). If
the replacement that results is not a valid character string literal,
the behavior is undefined. The order of evaluation of # and ## operators
is unspecified.
So, given #x, with x having a replacement list of two tokens "-" and
"10", preprocessor produces "-10" (a single string literal with each
token spelled out; no intervening whitespace since there was none in the
replacement list).
> Case 3: In this case the preprocessor is passed -NUM into _STRIZE_ so
> x has the textual representation -NUM.
To be precise, x has a replacement list consisting of two tokens, "-"
and "NUM".
> Again, it must re-evaluate
> this to see if NUM is already a macro (and boy, if it didn't do that,
> and treated -NUM as a *single token*, then NO macros would work at
> all!).
Quite.
> So the minus sign is not part of the token, only NUM is
> evaluated which happens to be stored off as -10
To be precise, NUM has a replacement list consisting of two tokens, "-"
and "10".
> [without any meaning! it's just what NUM is when dereferenced
> from some string table]. Hence the expression -NUM evaluates to --10
To be precise, -NUM expands to a sequence of three tokens, "-", "-" and
"10".
> which is then passed into STRIZE for x and it winds up as "--10" in
> the source file as expected (NO SPACES again).
Correct. Note that the distinction between two "-" tokens is preserved
all the way to # operator. It is this operator that reverts tokens back
to their textual spellings, in the process of building a string literal.
> So now then, the question is (or should be) "why does this work"?
>
> int main(int argc,...)
> {
> printf("%d", -NUM);
> }
To me, the question is: why do you think it shouldn't?
> Well I passed this through cl /E (preprocess to stdout) and got this
> as the result:
>
> int main(int argc, ...)
> {
> ...lots of space...
>
> printf("%d", --10);
> return 0;
>
> ...lots of space...
> }
This, I believe, is due to a compiler bug.
> Again, no spaces after the first minus sign, which should clearly have
> generated a compiler error. CLEARLY. The above is cut and pasted from
> my CMD box: It is byte-for-byte what is handed to the compiler from
> the preprocessor.
That's the misconception I'm trying to clear all this time. Normally,
what's handed to the compiler from the preprocessor is a stream of
_tokens_, not a stream of bytes.
Per C++ standard 2.1, a source file goes through nine phases of
translation. An implementation doesn't of course have to explicitly
perform each phase separately, but it has to behave as if it does. For
the purposes of this discussion, three phases are important:
Phase 3: The source file is decomposed into preprocessing tokens (2.4)
and sequences of white-space characters (including comments)...
Phase 4: Preprocessing directives are executed and macro invocations are
expanded...
Phase 7: White-space characters separating tokens are no longer
significant. Each preprocessing token is converted into a token (2.6).
The resulting tokens are syntactically and semantically analyzed and
translated...
So, phase 3 breaks the source file into a sequence of tokens. Phase 4
manipulates and transforms this sequence. Phase 7 actually parses this
sequence, recognizing language constructs. In this model, the source is
_not_ rendered back to a stream of characters between the preprocessor
(phase 4) and the compiler proper (phase 7).
Now, many compilers implement a "preprocess-only" mode whereby the
source file passes through translation phases 1-4 and then is written to
disk, rather than being sent to phase 5 and on. Remember - the output of
phase 4 is a sequence of tokens, but the file has to be written to disk
as a sequence of characters. If the intention is that the resulting file
be equivalent to the original file wrt the translation process, then
this conversion from tokens to characters has to be done in such a way
that the resulting file, after passing through translation phases 1-6
(of which phase 4 is presumably a no-op since there are no preprocessing
directives left), produces the same sequence of tokens as the original
file would after phase 6 (remember that, by phase 7, whitespace between
tokens is no longer significant).
To do this right, the tokens-to-characters converter would have to
introduce a space here and there. E.g. it would have to output a
sequence of two tokens ("-", "-") as "- -" and not as "--". Otherwise,
when the preprocessed file passes through phase 3 again, "--" will be
interpreted as a single token, thus violating the "round-trip"
requirement stated above. MSVC compiler gets it wrong.
Please read again: [Q]When I COMPILE THE CODE[/Q]. Not pre-compile. That is
I compile, and then print out MACRO(-NUM) and get "--10" as the output of
the PE/Win32.EXE program resulting from the pre-processed source.
>
> printf("%d", --10);
>
This (above) printout was not *compiled* code. I hypothesized that the
compiler must be doing something after receiving this from the preprocessor,
but perhaps only 1 level deep. I guessed that the fact that the "apparent"
pre-compiled source is illegal and that it subsequently compiled must be the
result of a compiler bug.
That was a guess. The compiled and executed code is an irrefutable fact.
- Alan Carre
Yes. The question is - which part is buggy: that your original example
successfully compiles, or that it fails to compile after being
preprocessed. You believe the original example shouldn't have compiled
in the first place, and that's where the bug lies. I claim that the
original example is valid, but the preprocessor output is incorrect, and
that's where the bug lies.
> Let me show you again:
> cl /E test.cpp :
>
> line0: int main(int,...) {
> line1: printf("%d", --10);
> line2: return 0;
> line3: }
>
> Please explain why line1 compiles under the current standard.
It doesn't, but -NUM does. The output of the stand-alone preprocessor
doesn't match the original code. The bug manifested when said
preprocessor generated "--10" from "-NUM".
> #define X -1
> int main(int,...)
> {
> return -X;
> }
>
> compiles fine, but
>
> #define X -1
> #define Y -X
>
> int main(int,....)
> {
> return Y;
> }
>
> doesn't?
With a conforming compiler such as Comeau, both programs compile.
You yourself show the compiler contradicting itself (the same code
compiles directly, but fails to compile if preprocessed first), and
agree there's a bug somewhere. You cannot then proceed to hold the
behavior of a known-buggy compiler as proof of any facts about the C++
language. You have to argue from first principles - in this case, from
the normative text of the C++ standard. Whenever the compiler disagrees
with the standard, it is, by definition, a bug in the compiler.
>lots and lots and lots of information which I can hardly (pre)process
>myself...
Ok well, then so it separates everything into these tokens and then pulls
them out (or pops them off or whatever) as separate entities which
presumably are all separable by spaces, correct? At least by stage 7:
"White-space characters separating tokens are no longer significant."
... You know, actually I find that a little difficult to comprehend: We all
know that white-space characters are significant in the case of consecutive
minus-sign tokens. So how can that be? How can it be that there exists a
point where we can eliminate all whitespace without changing the program? I
mean what about *pnNum1/ *pnNumber ? I can't remove that space following the
division symbol, the whole program would become a comment...
In any case, what if one's macro was, in fact, "--X" (ie. intentionally
decrementing X)? Should the preprocessor separate those minus signs as well?
How come they are treated differently? Is the preprocessor aware of C++
syntax? Is it actually compiling when it's deciphering macro expressions?
Personally, I think instead of learning all these so-called rules and
standards and so on, I'll just let the compiler teach me what the compiler
does. For one thing I KNOW it converts #define -X to --10. That's fact, end
of story. These rules and standards and lofty committees etc etc... are just
pissing in the wind.
The ultimate arbiter is the compiler.
- Alan Carre
Oh boy, I think you've started another holy war subthread right
now. o_O
Alex
Yes, stringizing operator # builds a string literal by concatenating
textual representations of each token without any additional whitespace
(but sometimes introducing backslashes). I'm not sure what this fact is
supposed to prove though.
> Now perhaps there's
> some penultimate step where it's all pulled together and sorted,
> checked for spelling mistakes or something, but I have yet to see any
> evidence of that except for this ONE instance where a macro argument
> was NOT involved in being evaluated by the preprocessor due to it's
> being present within another macro.
That's because MSVC compiler has a bug when dealing with a macro that IS
present within another macro. You arbitrarily assume this behavior to be
correct, then declare the opposite case to be buggy, since of course
they can't both be correct at the same time. You got them the wrong way
round.
I'm not asserting any facts about the C++ language. I was surprised
when -NUM compiled w/o being pre-processed since by text-replacement it
should not. Now I learn that it's not text replacement it's token-streams
and some kind of intelligent sorting of operator symbols from their
neighbors. I'm led to believe that a macro such as NUM should be treated as
if it were an actual *integer number* by the pre-processor *as well as* the
compiler (if that isn't idiotic duplication I don't know what is).
I don't know if I want a "smart" pre-processor. I want it to do EXACTLY what
I tell it to do. If I make a mistake, I want an error to come out. I expect
errors to come out, and I strive to go out of my way to make sure that any
missteps I make in code are caught *at compile time* and not, say, 2 weeks
after release.
Also, I do believe the compiler is in error for "accepting" -NUM, not for
rejecting the preprocessor's interpretation (meaning no interpretation at
all - which is correct as far as I'm concerned).
- Alan Carre
This only proves that # operator doesn't introduce additional whitespace
when building a string literal from a sequence of tokens - which is of
course quite expected. I never argued that # operator does, or should,
introduce whitespace into a string literal it generates.
> Personally, I think instead of learning all these so-called rules and
> standards and so on, I'll just let the compiler teach me what the compiler
> does. For one thing I KNOW it converts #define -X to --10. That's fact, end
> of story. These rules and standards and lofty committees etc etc... are just
> pissing in the wind.
>
> The ultimate arbiter is the compiler.
Good points Alan.
And I might add, I like how you worked out a solution.
What comes to my mind, maybe because I've come across this in so
many different ways in my engineering experience: when you use a negative
entity along with a formula that includes a negation, and the operation is
in essence an evaluation of concatenated strings and/or tokens, then
inevitably it can yield results or behavior you do not expect.
C or C++ or any other language is not the issue - it is how the
programmer is using the tool in a wrong way. This is bad:
#define X -10
#define Y -6-X
period. An "aware" programmer will know, or come to know, that that is
a bad practice in any language, regardless of whether the issue here is
specific to languages that support decrement operators.
In other words, people learn that you need to add delimiters in order
to be precise, or even to change the formula to
#define Y -1*X-6
The same is true when one might use De Morgan's theorem to reduce
logical operations into simpler ones, for clarity, for optimization,
or to let the CPU work in its more natural state! Remember the chants,
the truth tables?
NAND is equal to NEGATED INPUT OR
!(A && B) == (!A || !B)
And we all know the big calculator (AKA com-putt-er) is just a bunch
of AND and NAND gates. ORs and NORs are manufactured logic.
Anyhow, not unexpectedly, Javascript has the same issue:
<script type='text/javascript'>
function f(tag, x,y)
{
alert(tag+"\n\nx: "+x+"\ny: "+y);
}
var SHRX = "-10";
var SHRY = "-20";
var SHRINK = "f('eval',-6-SHRX,6-SHRY)";
eval(SHRINK);
f('direct',-6-SHRX,6-SHRY);
f('direct',-6--10,6-SHRY); //<--- EXCEPTION ERROR
f('direct',-6- -10,6-SHRY);
</script>
Any language that allows tokens to be concatenated and evaluated has
the same related issues, which doesn't mean the compiler or
interpreter is faulty, but rather that the programmer created a
situation that can yield a conflicting result. This was one of them.
--
At phase 7 there are no comments anymore. They're all gone at phase 3, where
comments are replaced by whitespaces. At phase 7, the compiler doesn't pay
attention to '/' '*' character sequence at all (there is no character
sequences, anyway, at that time, the tokens may not even contain those
characters in their internal representation).
You're confused because you think of the compiler always operating on
character stream. It only does that on earlier phases.
I don't get where it should not. In your stringizing operator example? Why?
I wasn't referring to that example. I was referring to the
"printf("%d", -NUM);" example (which compiled). Sorry for any confusion.
- Alan Carre
Yes, you could take a sequence of tokens generated after phase 6, and
output them to a file separated by spaces, if you are so inclined. When
the resulting file is translated, after phase 6 the same sequence of
tokens would be produced as that from the original file.
> .. You know, actually I find that a little difficult to comprehend:
> We all know that white-space characters are significant in the case
> of consecutive minus-sign tokens.
And that's precisely why, in the process above, you would add a space
between two - tokens, so that they don't accidentally get mistaken for a
single -- token when the file is re-translated. It doesn't matter by
that point whether the two - tokens were explicitly present in the
source file (in which case they would necessarily have been separated by
whitespace) or were produced by the preprocessor's manipulation of the
token stream (in which case they might not be).
> So how can that be? How can it be
> that there exists a point where we can eliminate all whitespace
> without changing the program?
Roughly, by that time, the program is not represented by a single
string, but by an array of strings, where each element stands for a
single token. Consider:
char* sequence1[] = {"-", "-"};
char* sequence2[] = {"--"};
The two sequences are obviously different, when compared
element-by-element. But if you were to concatenate their elements into a
single string, you would end up with the same string. And then it would
be impossible to tell which sequence this string was originally produced
from.
The compiler proper (phase 7 onward) works on this array of strings,
detecting sequences of tokens that form various language constructs.
> I mean what about *pnNum1/ *pnNumber ?
> I can't remove that space following the division symbol, the whole
> program would become a comment...
Phase 3 (tokenizer) turns the sequence of characters "*pnNum1/
*pnNumber" into the sequence of tokens ["*", "pnNum1", "/", ws, "*",
"pnNumber"]. "ws" here stands for a special whitespace token: phase 3
needs to preserve it only because the sequence might be passed to #
operator in phase 4 (preprocessor), where whitespace is still somewhat
significant (it generates a single space character in the resulting
string literal). After phase 6, ws tokens are dropped, and the compiler
(phase 7) sees only ["*", "pnNum1", "/", "*", "pnNumber"].
> In any case, what if one's macro was, in fact, "--X" (ie.
> intentionally decrementing X)?
You mean, with "--" actually appearing in the source text? It would be
parsed as a single token in phase 3, and would then travel as a single
token through the rest of the process. Phase 3 doesn't know nor care
whether the token is or isn't part of a macro.
> Should the preprocessor seperate those
> minus signs as well?
Of course not. After phase 3, the preprocessor (phase 4) sees a sequence
of two tokens ["--", "X"]. It's possible that X is a macro, which for
example expands into a sequence of tokens ["-", "10"]. Then, after
preprocessing, the resulting sequence becomes ["--", "-", "10"], which
is fed to the compiler (phase 7).
> How come they are treated differently?
In what way do you believe they are treated differently?
> Is the
> preprocessor aware of C++ syntax?
No (unless you count preprocessing directives as part of the C++ syntax,
which formally they are, but I understand what you are trying to say
here).
> Is it actually compiling when it's
> decifering macro expressions?
No, in the sense that it doesn't parse C++ language constructs other
than preprocessing directives.
> The ultimate arbiter is the compiler.
I wonder what you mean by _the_ compiler. Which implementation, and
which version of that implementation, run with which command line
options, do you hold up as the ultimate arbiter?
Is there such a thing as a "compiler bug"? What happens when two
different compilers contradict each other, by producing different
results when given the same source - which one is more ultimate than the
other? Heck, what happens when the same compiler contradicts itself (as
was shown in this thread when the same source file either passes or
fails compilation when processed by the same compiler in two different
ways)?
It's like saying that the ultimate arbiter of what an electrical plug
should look like is the electrical outlet in your wall. If you have two
of them, slightly different, which one is defective? If a particular
plug doesn't fit the outlet, why would you automatically assume the plug
is defective, and not the outlet? That's what we have standards for.
The C++ language is what the C++ standard says it is. A C++ compiler
that doesn't follow the C++ standard is called non-conforming, which is
just a euphemism for "buggy". The way you demonstrate compiler bugs is
you produce an example program, figure out from the normative language
of the standard how it should behave, then observe that it behaves
differently when processed by a particular compiler.
> The C++ language is what the C++ standard says it is. A C++ compiler
> that doesn't follow the C++ standard is called non-conforming, which is
> just a euphemism for "buggy".
I see your message vanity lines cite an RFC document. If every
implementor followed your line of thought (which they don't), every
MAIL PRODUCT on the market would be in some way or another
"non-conforming", broken, and "buggy."
The only thing of interest shown in this thread, like in other similar
threads, is that poor, ambiguous coding can confuse compilers,
interpreters and/or results.
So what else is new?
--
You don't? Didn't you claim, in essence, that this program shouldn't
compile ("Thanks cl.exe I wanted an error!"):
#define X -1
int main()
{
return -X;
}
> I don't know if I want a "smart" pre-processor. I want it to do
> EXACTLY what I tell it to do.
And it does. It's just that you are telling it something different than
what you think you do. See also http://en.wikipedia.org/wiki/DWIM
> If I make a mistake, I want an error to
> come out.
I'd love to live in such a world, too. Just to think, I'd never have to
touch a debugger again! Heck, if the compiler knows enough to tell me
whenever I make a mistake, why can't it just write the damn program for
me?
Upon further consideration, no, I don't want to live in such a world. My
employment prospects would be rather grim.
> Also, I do believe the compiler is in error for "accepting" -NUM, not
> for rejecting the preprocessor's interpretation
You believe incorrectly.
And also:
http://www.unt.edu/benchmarks/archives/2004/february04/screwupcolor.gif
:)
Alex
"Tommy" <b...@reallybad.com> wrote in message
news:ekG2F$YUJHA...@TK2MSFTNGP05.phx.gbl...
First, If an RFC or a STD was published TODAY as an update, that does
not BREAK existing systems. In fact, many do disagree with an update
and simply ignore it. That does not make them "BUGGY." Case in
point, The SMTP STD standard is 26 years old. There is NO UPDATED
STD, just a 8 years old RFC. Legacy software supporting only the STD
still work and not supporting what is considered the pseudo-standard
RFC does not make them buggy and non-conforming.
Second, you guys are referencing a DRAFT C/C++ standard document,
which is analogous to an RFC.
So for one to continue to suggest that long-existing compilers are
magically BUGGY because they may not follow verbatim an ever-evolving
standard or draft today is just plain silly and unrealistic.
--
New versions of C and C++ standards work hard not to render existing
programs invalid (though it does happen).
Specific versions of C and C++ compilers typically claim conformance
with a specific version of the standard. Thus, when a new version of the
standard is released, existing compilers remain conformant with the
previous version (or at least, as conformant as they ever were).
Compiler authors then proceed to work on making future versions of their
compilers conformant with the new version of the standard.
> In fact, many do disagree with an update
> and simply ignore it.
Maybe this is common with mail systems, but not with C++ compilers. All
major compiler vendors are thoroughly represented when the new standard
is drafted. All decisions are made by consensus. The fact that the
standard is released at all means that all interested parties have
signed off on it. It's not like some external force tries to foist the
new text on unsuspecting population.
> That does not make them "BUGGY." Case in
> point, The SMTP STD standard is 26 years old. There is NO UPDATED
> STD, just a 8 years old RFC. Legacy software supporting only the STD
> still work and not supporting what is considered the pseudo-standard
> RFC does not make them buggy and non-conforming.
By definition, they are non-conforming with said RFC, aren't they?
You seem to be talking about RFC 2821. What new features does it add
compared to the previous RFCs or standards? I can't help but notice that
it begins with the phrase "[This document] consolidates, updates and
clarifies, but doesn't add new or change existing functionality...".
> Second, if you guys reference a DRAFT C/C++ standard document that is
> analogous to an RFC.
Everything I said in this thread is true against ISO/IEC 14882:1998 aka
C++98, as well as the C++0x draft (which didn't change the description
of the phases of translation in any significant way, if at all; I
haven't compared the two texts character by character).
> So for one to continue to suggest that long-existing compilers are
> magically BUGGY because they may not follow verbatim an ever-evolving
> standard or draft today is just plain silly and unrealistic.
A compiler is buggy when it doesn't follow the standard it claims to
conform to.
I believed (and you confirmed) that if --X were passed into, or used
within, another macro, it would be treated as either a single or 2 tokens
rather than a sequence of 3 tokens (X being a macro itself like NUM was: so
it would be --NUM for instance using the previous notation).
I was initially being given the impression that the expression -NUM was
split into 2 tokens and NUM would be looked up and expanded where necessary
during the precompile phase (somewhere before or during phase 4 I think it
was). NUM was -10 which you asserted was to be split up (yet again) into 2
tokens "-" and "10".
So the resulting expression from -NUM was 3 tokens
["-", "-", "10"]
not
["-", "-10"].
Quoting from your response:
[Alan]
>> [without any meaning! it's just what NUM is when dereferenced
>> from some string table]. Hence the expression -NUM evaluates to --10
[Igor]
>To be precise, -NUM expands to a sequence of three tokens, "-", "-" and
>"10".
So if the macro were
#define NUM 10
#define Y --NUM
then by the same reasoning, we should get 3 tokens ["-", "-", "10"] after
resolving NUM to 10.
You can't have it both ways; if -10 (literal) splits to "-" and "10",
then --10 splits to 3 tokens since the "literal-ness" of -10 was discarded
in favor of ["-" and "10"] how could it be different? But you assert
that --NUM would be resolved to 2 tokens ["--", "10"], which is the
difference I was referring to.
>> The ultimate arbiter is the compiler.
>
> I wonder what you mean by _the_ compiler. Which implementation, and which
> version of that implementation, run with which command line options, do
> you hold up as the ultimate arbiter?
It's analogous to what you call _the_ standard. It's an ever-evolving
construct with built-in rules. It may not be perfect, it may contain
contradictions and so on, but in the end it (that is, whatever compiler you
happen to be using today) has the final say.
- Alan Carre
Then you'd find yourself happy on planet earth and in a real-life, highly
sophisticated software development environment.
You know I have a friend who works for a very respectable (and very
successful) investment firm [name omitted] writing software (in C++ using
VC2005) that, not surprisingly, computes the various dollar-values of
different forms of risk in real-time. This software is highly optimized, and
highly complex and also (obviously) needs to be *highly reliable* as the
firm's future depends *entirely* on that one crucial element (risk
analysis).
Well, I can share with you something about working in "the real world";
a world where even the slightest errors have "real outcomes", such as the
total wreckage of the institution you're working for:
The average time between "RUNS" of their software is between 6 to 8 MONTHS.
That means that the only help you have in detecting bugs is, in fact, the
compiler. I mean, you can't just "run the program" and test that it works
because it requires the whole world's financial system essentially as a
command line parameter. And unfortunately you can't reliably simulate the
entire world. You may be thinking "use records from previous years etc..."
but no such detailed records, in fact, exist. What you might not realize is
that these ridiculously-complex systems work in time-frames of
*picoseconds*, not hours, not even seconds (a second would be
essentially an infinite time in the banking world).
So how do they debug their code if they can't run it? You guessed it, the
COMPILER. The code is mostly written in "meta-code" style with built-in
"template-assertions" that alert the programmers to bugs without even having
to execute a single line of code. This is no fairy tale. When you hit
compile, if a variable happens to have the wrong sign or type, or if a
branch condition happens to be incorrect, they get *compilation* errors.
They are forced to code this way because they are not afforded the luxury of
running and debugging the code "on a whim". That magical occasion (ie.
running/debugging "live code") only happens about twice a year.
So yes, wherever I can, I design my code such that when I make errors they
are automatically caught at compile time. If I think I might make a mistake
due to some implicit type conversion say, then I define that conversion as
"explicit". That's not magic, that's common sense. If I'm worried that I
might accidentally assign a B to an A (where such a conversion happens to be
legal) but I know that 99.9999% of the time that's a mistake, then I declare
A::operator= (const B&); and then deliberately *not implement it*. Then I
get a linker error when I make that mistake.
That's automatic bug detection at compile time. It's not magic. And the
compiler isn't doing it, I am: I'm FORCING the compiler to do it for me.
- Alan Carre
Sorry to be pedantic, but as per the C++ standard the use
of identifiers starting with an underscore followed by an
upper-case letter is reserved for the implementation. You
must not use them.
> [...]
> - Alan Carre
Schobi
I am very sorry, but as someone who has, during the last decade,
written software to be compiled by half a dozen compilers (or
compiler versions) and standard lib implementations (or versions
thereof) I just have to ask the obvious question:
_Which_ compiler?
You are free to ponder the implications of this question given
the background I provided.
> - Alan Carre
Schobi
--
- Alan Carre
http://www.twilightgames.com
"Hendrik Schober" <spam...@gmx.de> wrote in message
news:ggre2u$ab4$4...@hoshi.visyn.net...
That's alright, I forgive you.
- Alan Carre
Forgiven again.
In my case VC8 or to be more specific:
===========================================
File= cl.exe
CompanyName= Microsoft Corporation
FileDescription= Microsoft® C/C++ Compiler Driver
FileVersion= 14.00.50727.762 (SP.050727-7600)
InternalName= CL.EXE
LegalCopyright= © Microsoft Corporation. All rights reserved.
LegalTrademarks=
OriginalFilename= CL.EXE
ProductName= Microsoft® Visual Studio® 2005
ProductVersion= 8.00.50727.762
PrivateBuild=
SpecialBuild=
===========================================
- Alan Carre
NUM was already 2 tokens:
"-", "10"
> So the resulting expression from -NUM was 3 tokens
>
> ["-", "-", "10"]
>
> not
>
> ["-", "-10"].
>
> Quoting from your response:
>
> [Alan]
>>> [without any meaning! it's just what NUM is when dereferenced
>>> from some string table]. Hence the expression -NUM evaluates to --10
NUM is NOT dereferenced, and -NUM is NOT an expression, it's NOT evaluated.
When a token "NUM" is found in the original sequence of tokens, it's
REPLACED with its sequence of tokens ["-", "10"].
> [Igor]
>>To be precise, -NUM expands to a sequence of three tokens, "-", "-" and
>>"10".
>
> So if the macro were
>
> #define NUM 10
> #define Y --NUM
>
> then by the same reasoning, we should get 3 tokens ["-", "-", "10"] after
> resolving NUM to 10.
>
> You can't have it both ways; if -10 (literal) splits to "-" and "10",
> then --10 splits to 3 tokens since the "literal-ness" of -10 was discarded
> in favor of ["-" and "10"] how could it be different? But you assert
> that --NUM would be resolved to 2 tokens ["--", "10"], which is the
> difference I was referring to.
>
Y corresponds to a sequence of tokens ["--", "NUM"]. You don't get 3 tokens
here.
When "--" is found in the original source, it's one token.
Implementors are known to Pick and Choose whatever fits the bill.
Users are known to be anal with specifics. I'll explain this more at
the end.
> Thus, when a new version of the
> standard is released, existing compilers remain conformant with the
> previous version (or at least, as conformant as they ever were).
Right, implementors are known to Pick and Choose whatever fits the bill.
> Compiler authors then proceed to work on making future versions of their
> compilers conformant with the new version of the standard.
Implementors MAY|SHOULD proceed to PICK and CHOOSE whatever fits the
bill.
>> In fact, many do disagree with an update
>> and simply ignore it.
>
> Maybe this is common with mail systems, but not with C++ compilers. All
> major compiler vendors are thoroughly represented when the new standard
> is drafted.
Thus leaving out all the minor ones?
> All decisions are made by consensus.
Minor or Major?
Here's food for thought:
Isn't the lexical analysis PERFECT? Is not the state machine perfect?
Therefore, there should be 100% consensus?
> The fact that the
> standard is released at all means that all interested parties have
> signed off on it. It's not like some external force tries to foist the
> new text on unsuspecting population.
Of course it is; Osmosis By Consensus, as I've always called it. The
fact that there can be disagreements that may not suit the agenda of
the powers that be shows the consensus is not gospel and does not
speak for all.
The reality is that those involved in the process DO NOT always
appeal to all, nor to all END USERS, and they may even be too
involved, having created a "mental block" about what may end up being
a bad mistake for the majority.
And I've been involved in such groups long enough to know it is
usually just 1 or 2 champions and others who have blind trust and
sign off on things they generally don't or may not fully comprehend.
Most of the time that is ok, the world has to move on, but it can
lead to conflicts and problems, and it does.
> By definition, they are non-conforming with said RFC, aren't they?
It depends. The reality is that most implementors pick and choose,
and it is a known understanding that USERS are the ones that get hung
up on a specific document as if it were their proof of righteousness
to throw against any implementor.
> You seem to be talking about RFC 2821.
The update, RFC 5321, was just released.
http://tools.ietf.org/html/rfc5321
So now anyone not following this is broken? <g> There were one or two
things that WILL break a system that reads the original as it was
UNDERSTOOD.
(BTW, I'm in the acknowledgments, by real name <g>).
> What new features does it add
> compared to the previous RFCs or standards?
Too long to list, but I will give you a few.
> I can't help but notice that
> it begins with the phrase "[This document] consolidates, updates and
> clarifies, but doesn't add new or change existing functionality...".
Well, that's the intent. The fact is it has added a few new things
and has changed existing functionality by codifying existing
practice. In other words, if the old doc said X, but the practice (by
larger interested parties) has been X1, then you may find a push to
document X1. While X1 should be ok with X, sometimes it may not be.
Reality? It depends on whose software it breaks!! <g>
Specific to 2821/5321
A common dispute that you might relate to are ambiguous ABNF
statements - the "Lexical syntax" of the protocol.
A more practical one that can and does have an effect today on all
parties (implementors, users, operators, networks) is the QUIT
command requirement, with a direct correlation to the advent of
higher speeds, anti-spam operations and higher scalability needs. You
see, I don't expect you to see all this, but a QUIT is an RFC
requirement to complete a transaction. The original STD and an
informational RFC speak of having a relaxed attitude about it. There
were a few who did envision that one day this could be an
implementation problem.
Another that comes to mind (because it was a recent hot topic) is the
meaning or semantics of Negative vs Permanent rejection, and how many
*bigger* systems changed the meaning of *Permanent* to fit their bill
(mode of business operation). This is akin to not following the
"phases" of the C++ parsing. This one was a clear example of how a
standard protocol was clear and concise yet not 100% followed by all
parties the same way.
Overall, the most damaging one which has some engineering analogy to
the single or multi-pass compiler, is based on what I call the
Philosophical Differences between Post vs Dynamic SMTP operations.
The simple way to put this is:
A Dynamic system (Single Pass) will check everything as it is issued,
command by command, "parsing" (the context) of the command as it is
issued. It fails on the spot. It does not allow you to proceed to the
next state.
A POST system (Multiple Pass) will accept everything first, then
process it.
A Post SMTP is one of the major reasons we have such a major spam
control problem. A dynamic system is a more expensive mode.
Now, you may retort, "But C++ is more concise", and by and large I
will agree. But in the same way the state machine for the parser is
concise, the SMTP state machine is also concise, and there are those
who will not accept it any other way.
It would be wonderful if everyone follow the same rules. The reality
it is not like that, and unfortunately, you will have Users who will
throw a book at someone when they see something that does not make
their world perfect.
In short, the question can be:
Can you take a standard document and build the perfect
implementation?
Don't be surprised if you find that it isn't 100% perfect.
>> Second, if you guys reference a DRAFT C/C++ standard document that is
>> analogous to an RFC.
>
> Everything I said in this thread is true against ISO/IEC 14882:1998 aka
> C++98, as well as the C++0x draft (which didn't change the description
> of the phases of translation in any significant way, if at all; I
> haven't compared the two texts character by character).
Well, as long as one notices the redlining and understands the
background that C++ is an augmentation of the C language, then I
personally have no issue with that.
The issue I have is an insistence of the specific "parsing protocol"
and I am not entirely convinced it has been all read correctly in its
total context.
But either way, thats ok too, because I know from experience that
implementations do tend to differ and not always because they are buggy.
FWIW, in this case, I happen to agree with you: the parser should
view the macro as a separate token, and it is therefore safe, IN THIS
CASE, to embed a white space. But overall, I think it would cause
more harm than good, and it would be better if the programmer simply
made his macros more concise for the parser to handle.
>> So for one to continue to suggest that long-existing compilers are
>> magically BUGGY because they may not follow verbatim an ever-evolving
>> standard or draft today is just plain silly and unrealistic.
>
> A compiler is buggy when it doesn't follow the standard it claims to
> conform to.
Are there any aspects of the C++ standard that are NOT implemented?
Anyway, this is typical behavior of USERS of a system. Users are known
to be anal with specifics; they appeal to docs as if they were the
"bible", in an it's-my-way-or-the-highway approach to protocol
implementation. They are the ones that generally throw in your face,
"look, it's buggy because of this standard section XYZ"
without really understanding all angles about it. The question is
then does it specifically state in some form or another:
"Implementators of this C++ standard MUST follow
the parsing protocol 100%"
I will be surprised if it is in there, because, as I said, there are
far too many implementors that MAY or MAY NOT follow it for whatever
reason. At best, the C++ standard can only serve as a guideline for
implementors to use, to provide an INTENT in order to make the
meaning correct.
At the end of the day, it was about how a parser took a string "-X"
and created tokens.
I know for a fact, based on compiler and interpreter creation and on
seeing how others, and how other languages, work, not just C/C++ but
many others, that it could see that as TWO tokens
"-" "X"
or it can do MACRO substitution FIRST and see a string:
"--10"
and see two different tokens now:
"--" "10"
There are just far too many factors to all this. Appealing to a
STANDARD does not always prove that something is broken. What you
(speaking in general) think is correct may or may not be viewed the
same way by others.
--
Nah. Doubtful.
It was ["-10"], a literal constant according to all experiments, with ONE
exception:
Namely, when -NUM was used *directly* in source, where the
preprocessor interpreted NUM as being the numeric constant -10
instead of replacing NUM with the "source code" string "-10".
> Y corresponds to a sequence of tokens ["--", "NUM"]. You don't get 3
> tokens here.
>
> When "--" is found in the original source, it's one token.
WOW! Beat me to it... I was just about to pre-empt this obvious reply
(because I knew it was coming) with the following:
If, as you claim, "--" (quoteless) is considered to be ONE token (as with
"-" (quoteless)), then how would the preprocessor interpret this one:
EXPRESSION: "---" (quoteless)
Should it be: ["-","--"] or ["--", "-"] or else ["-","-","-"] ?
Is it the negation of the decrement? Or the decrement of the negation, or
the negation of the negation of the negation?
All of these are possible interpretations of "---" (again quoteless).
Certainly you aren't going to claim that "---" is yet another
specially-recognized token are you?
So which is it?
No, what seems abundantly clear is that the preprocessor isn't doing any
math, it's simply replacing string "tokens" with their corresponding
definitions. Tokens are separated by "delimiters" such as SPACE (' ') and
other accepted delimiters such as the minus sign ('-'). Brackets, space, + -,
slash, comma, basically anything that's not a "csym" seems to serve as a
proper delimiter, with the exception of the hash symbol (and not by
coincidence! that's what the preprocessor uses as its own "special escape
character").
Anyway, either the preprocessor knows C++ and does math, or it's a
strtok(er)/srep(er). And I seriously doubt it does any "token" mathematics
[though it will do some basic algebra on numeric constants such as 10/2,
but that's about the extent of it from my experience].
- Alan Carre
Typo: "basic arithmetic" not "basic algebra".
- Alan Carre
> Another that comes to mind (because it was a recent hot topic) is the
> meaning or semantics of Negative vs Permanent rejection, and how many
> *bigger* systems changed the meaning of *Permanent* to fit their bill
> (mode of business operation).
Sorry, meant Temporary vs Permanent rejection.
45x - Temporary rejection, try later
55x - Permanent rejection, GO AWAY!
==
>> Could they have used enums for this kind of thing in winnt.h (line 8060)
>>
>> #define FILE_READ_EA ( 0x0008 ) // file & directory
>> #define FILE_WRITE_EA ( 0x0010 ) // file & directory
>> #define FILE_EXECUTE ( 0x0020 ) // file
>> #define FILE_TRAVERSE ( 0x0020 ) // directory
>> #define FILE_DELETE_CHILD ( 0x0040 ) // directory
>
> Yes (I've done it) but I don't recommend it. It is fairly painful
> using enums for bit fields
OK, thanks, that makes sense.
> <tantrum>
> I hate hate hate
>
> if (x) {
> //...
> }
> when it should obviously be
>
> if( x )
> {
> //...
> }
> </tantrum>
Hmm, I've just been looking at my coding style, and I wonder if I'm
doing it back to front? This example is JScript, but I use a similar style in
C/C++:
var s4byte1 = oSrcTS.Read(0x4); // 0x26 to 0x29
var s4byte2 = oSrcTS.Read(0x4); // 0x2A to 0x2D
// Search each DWORD for any mis-matched bytes
var bMatch = true;
for (var i = 0; i < 4; i++) {
if (s4byte1.charCodeAt(i) != s4byte2.charCodeAt(i)) {
bMatch = false;
break;
}
}
if (bMatch) {
// All bytes match, but are they all zero?
if (s4byte1 == "\0\0\0\0" && s4byte2 == "\0\0\0\0") {
return 0;
} else {
return 1;
}
} else {
return 2;
}
--
Gerry Hickman (London UK)
Minor compiler vendors are free to join if they are so inclined, and so
is anyone else willing to commit time and effort - you don't need to be
a compiler vendor to participate in the committee. I know that major
vendors _are_ represented. I couldn't say "all vendors are represented",
as that would have required me to interview every person not
represented, to make sure they are not writing a new C++ compiler in
their copious spare time.
>> All decisions are made by consensus.
>
> Minor or Major?
By consensus of everyone who chose to join the committee.
> Here's food for thought:
>
> Isn't the lexical analysis PERFECT, Is not the state machine perfect?
Define "PERFECT".
> Therefore, there should be 100% consensus?
Are you suggesting this is not the case? The description of the lexical
analysis hasn't changed between the two versions of the standard.
Presumably, everyone agrees it's fine as is.
>> The fact that the
>> standard is released at all means that all interested parties have
>> signed off on it. It's not like some external force tries to foist
>> the new text on unsuspecting population.
>
> Of course it is, Osmosis By Consensus as I always called it. The fact
> that there could be disagreements that may not suit the agenda of the
> powers to be, does show the consensus is not gospel or speaks for all.
Who do you perceive to hold these mystical "powers to be", whatever that
means? And what do you believe their secret agenda is? You begin to
sound like a conspiracy theorist. I'm pretty sure they are not out to
get you.
> The reality is that the those involved in the process DO NOT always
> appeal to all, nor all END USERS, including the fact, they may be too
> involved to have created a "mental block" on what may end up being a
> bad mistake for the majority.
Those involved will have to sell their products to said end users. Their
bottom line is firmly in their minds during the discussions.
Besides, many of those end users are themselves on the committee. I know
the company I work for is represented, though we don't produce C++
compilers. We have a very large and growing C++ codebase, and we are
there to make sure that C++ language meets our needs.
> And I've been long involved in such groups to know it is usually just
> 1 or 2 champions and others that have blind trust and sign off on
> things they generally don't or may not fully comprehend.
I assure you that our company has very smart and capable people
representing it on the committee. I personally have read many of the
proposals, and I believe I comprehend them (I'm not personally involved
in the official business of the committee). You seem to be accusing
people you don't even know of being stupid, or naive, or lazy. Do you
have any evidence for such accusations?
>> You seem to be talking about RFC 2821.
>
> The update, RFC 5321, was just released.
In year 2008. You were talking about an RFC that is 8 years old.
> So now anyone not following this is broken? <g>
Anyone who claims to follow it but doesn't is, by definition, broken.
> There were one or two
> things that WILL break a system that read the original as it was
> UNDERSTOOD.
I'm sure the participants involved have deemed the (presumably
considerable) benefits of these changes to outweigh the drawbacks, and
have carefully considered the migration process and compatibility
issues. At least that's what would have happened in the C++
standardization committee. I have no evidence suggesting that some
mysterious "powers that be" have subverted the process to serve their
evil agenda, and unless and until such evidence surfaces, I assume that
all parties are acting in good faith. I'm not sufficiently familiar with
SMTP or its development procedures to comment further.
>>> Second, if you guys reference a DRAFT C/C++ standard document that
>>> is analogous to an RFC.
>>
>> Everything I said in this thread is true against ISO/IEC 14882:1998
>> aka C++98, as well as the C++0x draft (which didn't change the
>> description of the phases of translation in any significant way, if
>> at all; I haven't compared the two texts character by character).
>
> Well, as long as one notices the redlining and understand the
> background that C++ is an augmentation of the C language, then I
> personally have no issue with that.
ISO/IEC 9899:1999 aka C99 prescribes the same phases of translation, in
substantially the same normative language. I wouldn't be surprised if
the text hasn't changed since C89, but I don't have a copy handy to
check.
> The issue I have is an insistence of the specific "parsing protocol"
> and I am not entirely convinced it has been all read correctly in its
> total context.
Then please feel free to point out specifically where you believe my
reading is incorrect.
> But either way, thats ok too, because I know from experience that
> implementations do tend to differ and not always because they are
> buggy.
Differ from each other, or differ from specification? C++
implementations are allowed to differ from each other - that's why the
standard has the concept of undefined (1.3.12) and unspecified (1.3.13)
behaviors. It is possible to write portable programs that don't exhibit
either of those.
As to differing from specification - how do you define "buggy" if not as
"doesn't follow specification"? Do you subscribe to Alan Carre's notion
that "the compiler is the ultimate arbiter", in which case there
apparently ain't no such thing as a compiler bug?
> FWIW, in this case, I happen to agree with you, the parser should
> view the macro as a separate token and therefore safe, IN THIS CASE,
> to embed a white space. But overall, I think it will cause more harm
> than good, and it would be better if the programmer simply made his
> macros more concise for the parser to handle.
I never suggested otherwise. This bug is a minor bug in an obscure
corner of the language, and is quite easy to work around, with the
workaround arguably improving the readability of the program, so a good
thing all around. Nevertheless, it's a bug (and I distinctly recall you
did argue against that last statement. Perhaps I managed to convince
you.)
>>> So for one to continue to suggest that a long existing compilers are
>>> magically BUGGY because it may not follow verbatim an ever evolving
>>> standard or draft today is just plain silly, unrealistic.
>>
>> A compiler is buggy when it doesn't follow the standard it claims to
>> conform to.
>
> Are there any aspects of the C++ standard that are NOT implemented?
Yes. For MSVC compiler, they are documented here:
http://msdn.microsoft.com/en-us/library/x84h5b78.aspx
These are, arguably, known bugs that the vendor doesn't plan to fix in
the near future.
> Anyway, this is typical behavior of USERS of a system. Users are known
> to be anal with specifics; they appeal to docs as the "bible" in a
> one-way-or-the-highway approach towards protocol implementation.
I don't know about protocol implementors (who seem to have a rather
uneasy relationship with their users), but compiler vendors appear to be
happy when users report bugs to them. For MSVC compiler, you can do it
here:
http://connect.microsoft.com/feedback/default.aspx?SiteID=210
> They are the ones that generally throw in your face,
>
> "look, its buggy because of this standard section XYZ"
You mean, they provide free QA service to you by filing a bug report?
Why again is this something to despise, rather than cherish?
> The question is
> then does it specifically state in some form or another:
>
> "Implementators of this C++ standard MUST follow
> the parsing protocol 100%"
1.4 Implementation compliance
2 Although this International Standard states only requirements on C++
implementations, those requirements are often easier to understand if
they are phrased as requirements on programs, parts of programs, or
execution of programs. Such requirements have the following meaning:
- If a program contains no violations of the rules in this International
Standard, a conforming implementation shall, within its resource limits,
accept and correctly execute that program.
- If a program contains a violation of any diagnosable rule, a
conforming implementation shall issue at least one diagnostic message,
except that
- If a program contains a violation of a rule for which no diagnostic is
required, this International Standard places no requirement on
implementations with respect to that program.
In our case, we have a program that contains no violations of the rules
in the standard, and yet the implementation fails to accept it. Ergo,
it's non-conforming (unless you want to argue that the program in
question strains the compiler's resource limits).
> I will be surprised if it was there
Then, I guess, I've just managed to surprise you.
> because as I said, there are far
> too many implementors that MAY or MAY NOT follow it for whatever
> reason it is.
In other words, there exist buggy implementations. Tell me something I
don't know.
> At best, the C++ standard can only serve as a
> guideline for implementors to use, to provide an INTENT in order to
> make the meaning correct.
You may put it this way, yes. Then, whenever the implementation doesn't
follow the guidelines, its users file a bug report and its authors fix
it.
C++ users, you see, are interested in writing portable code. For that,
they demand compilers that agree with each other on the meaning and
interpretation of their programs. One way to achieve that, the way that
C++ community chose to follow, is to draft a written specification of
the language, and then demand compliance with this specification from
the vendors. It also helps vendors, by introducing clarity as to what
the compiler should do.
> At the end of the day, it was about how a parser took a string "-X"
> and created tokens.
>
> I know as a fact based on compiler and interpreter creation and
> seeing how others, and how other languages work, not just C/C++ but
> many others, that it could see that as TWO tokens
>
> "-" "X"
>
> or it can do MACRO substitution FIRST and see a string:
>
> "--10"
>
> and see two different tokens now:
>
> "--" "10"
A compiler that does the former is conforming. A compiler that does the
latter is non-conforming, aka buggy, as it fails to compile a valid C++
program.
> There are just far too many factors to all this. Appealing to a
> STANDARD does not always make it broken.
How else do you determine that a compiler is broken? What other
measuring stick can you compare it against?
> What you (speaking in
> general) think is correct may or may not be viewed the same way by
> others.
A program is either valid with respect to the standard, or it isn't. The
compiler either accepts it, or it doesn't. These are objective facts not
subject to personal opinion. 2+2==4 regardless of what someone might
think about it.
Read The Frigging Standard. First, the character stream is split into
tokens, then macro substitution is performed. Macro definition is stored as
a sequence of tokens (NOT as a string).
> It was ["-10"], a literal constant according to all experiments, with ONE
> exception:
>
> Namely when -NUM was used *directly* in source where the preprocessor
> interpreted NUM as being the numeric constant -10 instead of replacing NUM
> with the "source code" string "-10".
>
>> Y corresponds to a sequence of tokens ["--", "NUM"]. You don't get 3
>> tokens here.
>>
>> When "--" is found in the original source, it's one token.
>
> WOW! Beat me to it... I was just about to pre-empt this obvious reply
> (because I knew it was coming) with the following:
>
> If, as you claim, "--" (quoteless) is considered to be ONE token (as with
> "-" (quoteless)), then how would the preprocessor interpret this one:
>
> EXPRESSION: "---" (quoteless)
>
> Should it be: ["-","--"] or ["--", "-"] or else ["-","-","-"] ?
>
> Is it the negation of the decrement? Or the decrement of the negation, or
> the negation of the negation of the negation?
>
Read The Frigging C Standard, 6.4/4. The longest character sequence that
could constitute a preprocessing token is selected as a token. This is also
covered in about every C/C++ FAQ and "gotcha" list.
And yet it is the case.
> It was ["-10"], a literal constant according to all experiments
Let's accept this, for the sake of argument. This makes your belief that
the compiler could go from ["-", "-10"] to ["--", "10"] even more
puzzling. It would require the compiler to break up an already
established token. Why do you think it is a reasonable, or desirable,
thing for a compiler to do?
> If, as you claim, "--" (quoteless) is considered to be ONE token (as
> with "-" (quoteless)), then how would the preprocessor interpret this
> one:
> EXPRESSION: "---" (quoteless)
As ["--", "-"], according to the so-called maximal munch rule:
2.4p3 If the input stream has been parsed into preprocessing tokens up
to a given character, the next preprocessing token is the longest
sequence of characters that could constitute a preprocessing token, even
if that would cause further lexical analysis to fail.
> Is it the negation of the decrement? Or the decrement of the
> negation, or the negation of the negation of the negation?
The tokenizer is not concerned with whether the sequence of tokens it
produces is a meaningful C++ construct (nor is the preprocessor).
However, I can give you an example where --- appears in a valid program:
int x = 0;
int y = x---1;
The last statement is equivalent to
int y = (x--) - 1;
I'll leave it as an exercise for the reader to figure out how the
following two programs work:
int main() {
int x = 0;
return ------x; // any even number of dashes.
}
struct S {
int& operator-() {
static int x = 0;
return x;
}
};
int main() {
S s;
return -----s; // any odd number of dashes.
}
Another exercise for the reader: construct a valid C++ program that
contains sequences &&&, &&&& and &&&&& (I believe a sequence of five
ampersands is the longest possible, but would love to learn otherwise).
Not inside a comment or a macro that's never used of course - that would
be cheating.
> All of these are possible interpretations of "---" (again quoteless).
But only one is correct.
> Certainly you aren't going to claim that "---" is yet another
> specially-recognized token are you?
Certainly not.
> So which is it?
["--", "-"]
> No, what seems abundantly clear is that the preprocessor isn't doing
> any math, it's simply replacing string "tokens" with their
> corresponding definitions.
Quite.
> Tokens are separated by "delimiters" such
> as SPACE (' ') and other accepted delimiters such as the minus sign
> ('-').
Minus sign is itself a token, not a delimiter between tokens.
> Brackets, space, + - slash, comma, basically anything that's
> not a "csym" seems to serve as a proper delimiter
Or rather, as a token:
2.12 Operators and punctuators
1 The lexical representation of C++ programs includes a number of
preprocessing tokens which are used in the syntax of the preprocessor or
are converted into tokens for operators and punctuators:
preprocessing-op-or-punc: one of
{ } [ ] # ## ( )
<: :> <% %> %: %:%: ; : ...
new delete ? :: . .*
+ - * / % ^ & | ~
! = < > += -= *= /= %=
^= &= |= << >> >>= <<= == !=
<= >= && || ++ -- , ->* ->
and and_eq bitand bitor compl not not_eq
or or_eq xor xor_eq
> Anyway, either the preprocessor knows C++ and does math or it's a
> strtok(er)/srep(er).
It's mostly the latter, though the tokenizing process is more
complicated than what's achievable with strtok (and I'm not at all
familiar with srep).
> And I seriously doubt it does any "token"
> mathematics
I'm not sure what you mean by "token mathematics".
> [though will do some basic algebra on numeric constants
> such as 10/2 but that's about the extent of it from my experience].
No, the preprocessor won't. The compiler will. Consider:
#define X 10/2
#define STR1(x) #x
#define STR(x) STR1(x)
printf("%s", STR(X));
This prints "10/2", not "5". On the other hand, if you write
int main() { return X; }
the generated assembly will be equivalent to that generated for "return
5;". There will be no division instruction.
> Read The Frigging Standard. First, the character stream is split into
> tokens, then macro substitution is performed. Macro definition is stored
> as a sequence of tokens (NOT as a string).
&
> Read The Frigging C Standard, 6.4/4. The longest character sequence that
> could constitute a preprocessing token is selected as a token. This is
> also covered in about every C/C++ FAQ and "gotcha" list.
Ok, no reason to get all huffy about it. So let's look...
#define X -10
#define Y -X
So then here's my character stream:
int main(int,...)
{
return Y;
}
// NOTE: no quotation marks following this line are intended to indicate
*actual quotes*.
// I am using them ONLY for clarity:
Ok... line 3 reads: return Y;
Longest string following "return" that could be considered a token is "Y".
So now we have 3 tokens namely ["return","Y",";"] (no substitutions yet! as
you said).
Now we're done with the preprocessing and it's time to do substitution
correct? Ok, so let's substitute:
Y --> -X
X --> -10
so now we have:
int main(int,...)
{
return --10;
}
What did I do wrong? I took my character string, extracted the 3 tokens and
then substituted just like you said the standard says. But I got the wrong
answer (I mean I got the answer you don't want me to get).
Let me try the steps again:
> the character stream is split into tokens
["return","Y",";"]
> then macro substitution is performed.
#define Y -X
["return","-X",";"]
Wait, now we have a problem perhaps. You said macros were stored as tokens,
but tokenization happens before substitution. X never appeared in the
character stream so it was never tokenized according to the rule that
"substitution happens AFTER tokenization". So phewf! ok, no problems.
continue substitutions:
#define X -10
["return","--10",";"]
int main(int,...)
{
return --10;
}
Well I don't get it. When I follow the standard I get code which doesn't
compile (ie. what cl.exe produces). Maybe I'm missing something. I did
exactly what you said the standard said to do.
As far as --- sure, that sounds like a fine rule. Longest string etc. Never
thought of that. Sounds a bit like an afterthought but whatever. Works for
me.
- Alan Carre
Is there a &&& operator?
X = ["-", "10"]
Y = ["-", "X"]
"int", " ", "main", "(", "int", ",", "...", ")",
"{",
"return", " ", "Y", ";",
" }"
After Y is substituted:
"return", " ", "-", "X", ";",
Now X is substituted:
"return", " ", "-", "-", "10", ";",
Resulting program (note that whitespaces are now removed):
"int", "main", "(", "int", ",", "...", ")",
"{",
"return", "-", "-", "10", ";",
" }"
as a sequence of tokens goes to the next compilation phase. It DOES NOT
EXIST IN PLAIN TEXT FORM as:
> int main(int,...)
> {
> return --10;
> }
ANYWHERE, AND IS NOT SUBJECT TO ANY FURTHER RE-TOKENIZATION.
As #define directive is processed, X definition is stored as a sequence of
tokens.
I don't know. I just know that that minus sign seemed to stick with that 10
no matter how you look at it (ie. as strings, or numeric constants or
whatever). If it were separated from the 10 then everything would have
worked out and compiled. That did not happen so that's why I think X was
"stored" (however these things are stored) as a single token namely [-10].
> 2.4p3 If the input stream has been parsed into preprocessing tokens up
> to a given character, the next preprocessing token is the longest sequence
> of characters that could constitute a preprocessing token, even if that
> would cause further lexical analysis to fail.
Ya, someone already explained this to me. I didn't know the rule but now I
do.
> The tokenizer is not concerned with whether the sequence of tokens it
> produces is a meaningful C++ construct (nor is the preprocessor). However,
> I can give you an example where --- appears in a valid program:
>
> int x = 0;
> int y = x---1;
Sure, I can accept that. If you follow the rule it works. I have no problems
with that rule. Seems like a good rule.
> I'll leave it as an exercise for the reader to figure out how the
> following two programs work:
I look forward to looking into these examples and trying to solve that
puzzle ;)
- Alan Carre
No. There is a unary &, a binary & and a binary &&.
I don't understand. If, in your opinion, minus sticks with 10 no matter
what, then why doesn't "return --10;" compile, while "return - -10;"
does?