Macro not replaced when not preceded by whitespace?

Philipp Klaus Krause

Jul 15, 2022, 3:46:44 PM

In the following code,

#define BAD(x) ((x) & 0xff)
unsigned char b=0xfe-BAD(3);

The BAD in the second line is not replaced by the GCC 11 and clang 11
preprocessors.

-, BAD, and ( are all preprocessing tokens.
The standard (N2912) states "Each subsequent instance of the
function-like macro name followed by a ( as the next preprocessing token
introduces the sequence of preprocessing tokens that is replaced by the
replacement list in the definition (an invocation of the macro)."
I don't see why BAD is not replaced.

Philipp

P.S.: The same happens for an object-like macro:

#define BAD 7
unsigned char b=0xfe-BAD;

Philipp Klaus Krause

Jul 15, 2022, 3:59:25 PM

On 15.07.22 at 21:46, Philipp Klaus Krause wrote:

I think I got it now (reading section 6.4.8, "preprocessing numbers" in
the standard): 0xfe-BAD is a preprocessing number, and thus a single
preprocessing token.
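One way to observe this tokenization directly is to stringify the text. The following is a minimal sketch, not taken from the thread: the helper macros `STR` and `XSTR` and the function names are hypothetical, and `BAD` is the object-like macro from the P.S. above. Because `XSTR` macro-expands its argument before stringifying, an unexpanded `BAD` in the result indicates that it was never seen as a separate identifier token.

```c
#include <string.h>

#define STR(x)  #x       /* stringify the argument exactly as spelled */
#define XSTR(x) STR(x)   /* macro-expand the argument first, then stringify */
#define BAD 7            /* the object-like macro from the P.S. */

/* "0xfe-BAD" is a single pp-number, so BAD is not an identifier here
   and is not replaced. */
const char *glued(void)  { return XSTR(0xfe-BAD); }

/* With spaces there are three tokens (0xfe, -, BAD), so BAD is replaced. */
const char *spaced(void) { return XSTR(0xfe - BAD); }

/* Returns nonzero only if both spellings produced the same text. */
int same_spelling(void)  { return strcmp(glued(), spaced()) == 0; }
```

Here `glued()` yields the text `0xfe-BAD` unchanged, while `spaced()` yields `0xfe - 7`.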

Lew Pitcher

Jul 15, 2022, 4:06:23 PM

Not a bug.

0xfe-BAD is a valid hexadecimal floating point constant,
((f) (e-BAD) or (15 times 10 to the power of -2989))
and the lexer is required to interpret it as such ("maximal munch").

You /should/ get an error on the "(3)" portion of the statement, though.

--
Lew Pitcher
"In Skills, We Trust"

Philipp Klaus Krause

Jul 15, 2022, 4:12:09 PM

On 15.07.22 at 22:06, Lew Pitcher wrote:

>
> 0xfe-BAD is a valid hexadecimal floating point constant,
> ((f) (e-BAD) or (15 times 10 to the power of -2989))
> and the lexer is required to interpret it as such ("maximal munch").
>
> You /should/ get an error on the "(3)" portion of the statement, though.
>

It is a valid preprocessing-number, not a valid hexadecimal
floating-point constant. So I get the error for both (though later, not
at preprocessing time).

Thiago Adams

Jul 15, 2022, 5:30:15 PM

If you try to use this inside #if, then the preprocessor needs to convert
the pp-number to a value.

#if 0xfe-BAD(3)
#endif
clang gives me
error: invalid suffix '-BAD' on integer constant
this is funny because

#if 0xfeULL
#endif

works, but (in my understanding) the preprocessor doesn't have the concept of types:
everything is "signed long long" inside the preprocessor.

Scott Lurndal

Jul 15, 2022, 6:02:03 PM

Philipp Klaus Krause <p...@spth.de> writes:
>In the following code,
>
>#define BAD(x) ((x) & 0xff)
>unsigned char b=0xfe-BAD(3);
>
>The BAD in the second line is not replaced by the GCC 11 and clang 11
>preprocessors.
>
>-, BAD, and ( are all preprocessing tokens.
>The standard (N2912) states "Each subsequent instance of the
>function-like macro name followed by a ( as the next preprocessing token
>introduces the sequence of preprocessing tokens that is replaced by the
>replacement list in the definition (an invocation of the macro)."
>I don't see why BAD is not replaced.

A hexadecimal number is allowed to contain all of 'b', 'a', 'd' and '-',
albeit in a defined sequence. Does the preprocessor tokenizer treat
it as an [invalid] number?

olcott

Jul 15, 2022, 6:42:44 PM

The above works fine in Microsoft C.
BAD(3):3
b:fb

As long as the sequence of substitutions results in a lexically correct
token it should not be rejected.

--
Copyright 2022 Pete Olcott

"Talent hits a target no one else can hit;
Genius hits a target no one else can see."
Arthur Schopenhauer

olcott

Jul 15, 2022, 7:22:31 PM

On 7/15/2022 5:42 PM, olcott wrote:
> On 7/15/2022 5:01 PM, Scott Lurndal wrote:
>> Philipp Klaus Krause <p...@spth.de> writes:
>>> In the following code,
>>>
>>> #define BAD(x) ((x) & 0xff)
>>> unsigned char b=0xfe-BAD(3);
>>>
>>> The BAD in the second line is not replaced by the GCC 11 and clang 11
>>> preprocessors.
>>>
>>> -, BAD, and ( are all preprocessing tokens.
>>> The standard (N2912) states "Each subsequent instance of the
>>> function-like macro name followed by a ( as the next preprocessing token
>>> introduces the sequence of preprocessing tokens that is replaced by the
>>> replacement list in the definition (an invocation of the macro)."
>>> I don't see why BAD is not replaced.
>>
>> A hexadecimal number is allowed to contain all of 'b', 'a', 'd' and '-',
>> albeit in a defined sequence. Does the preprocessor tokenizer treat
>> it as an [invalid] number?
>>
>>
>
> The above works fine in Microsoft C.
> BAD(3):3
> b:fb
>
> As long as the sequence of substitutions results in a lexically correct
> token it should not be rejected.
>

Here is a snippet of the Lex source code for ANSI C hexadecimal constants:
H [a-fA-F0-9]
IS (u|U|l|L)*
0[xX]{H}+{IS}? { count(); return(CONSTANT); }

Thiago Adams

Jul 15, 2022, 7:26:36 PM

On Friday, July 15, 2022 at 5:12:09 PM UTC-3, Philipp Klaus Krause wrote:

I think that, to be part of a valid pp-number, "-BAD" must be "identifier-continue",
that is, "universal-character-name of class XID_Continue":

identifier-continue:
    digit
    nondigit
    universal-character-name of class XID_Continue

pp-number:
    digit
    . digit
    pp-number identifier-continue
    pp-number ' digit
    pp-number ' nondigit
    pp-number e sign
    pp-number E sign
    pp-number p sign
    pp-number P sign
    pp-number .

Richard Damon

Jul 15, 2022, 7:54:09 PM

The issue is that it matches the form of a pre-processing number that
can match hexadecimal floating point values (so - can follow e) which
was intentionally made a bit sloppy to keep the preprocessor simpler.

A hex constant ending in e and followed by a + or - sign, that is
expected to be part of a new token, needs a space between them.
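A small illustration of that rule, as a sketch with made-up names (`SPACED`, `NOT_E`, `decimal_exponent`): only a trailing `e` (or `p`) glues a following sign onto the number, and the same rule is what keeps an ordinary decimal exponent such as `1e+1` together as one token.

```c
/* Spaced out: three tokens 0xe, +, 1. */
enum { SPACED = 0xe + 1 };

/* No 'e' directly before the sign, so 0xf+1 is already three tokens. */
enum { NOT_E = 0xf+1 };

/* In decimal, the e-sign rule is what makes 1e+1 a single constant. */
double decimal_exponent(void) { return 1e+1; }

/* Left commented out on purpose: 0xe+1 is one pp-number that is not a
   valid constant, so this line would not compile.
enum { GLUED = 0xe+1 };
*/
```

`SPACED` is 15, `NOT_E` is 16, and `decimal_exponent()` returns 10.0.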

Keith Thompson

Jul 15, 2022, 8:08:49 PM

Thiago Adams <thiago...@gmail.com> writes:
> On Friday, July 15, 2022 at 5:12:09 PM UTC-3, Philipp Klaus Krause wrote:
>> On 15.07.22 at 22:06, Lew Pitcher wrote:
>> >
>> > 0xfe-BAD is a valid hexadecimal floating point constant,
>> > ((f) (e-BAD) or (15 times 10 to the power of -2989))
>> > and the lexer is required to interpret it as such ("maximal munch").
>> >
>> > You /should/ get an error on the "(3)" portion of the statement, though.
>> >
>> It is a valid preprocessing-number, not a valid hexadecimal
>> floating-point constant. So I get the error for both (though later, not
>> at preprocessing time).
>
> I think to be valid ppnumber "-BAD" must be " identifier-continue"
> that is "universal-character-name of class XID_Continue"

No, "-BAD" is not a valid pp-number.

> identifier-continue
> digit
> nondigit
> universal-character-name of class XID_Continue
>
> pp-number:
> digit
> . digit
> pp-number identifier-continue
> pp-number ’ digit
> pp-number ’ nondigit
> pp-number e sign
> pp-number E sign
> pp-number p sign
> pp-number P sign
> pp-number .

You must be looking at a recent draft of the standard.

In C11 and C17, "identifier-continue" is called "identifier-nondigit".
It's an underscore or any of the 62 uppercase or lowercase letters (or a
universal-character-name or other implementation-defined character, but
those don't come into play here).

(In the following, 'e' and 'p' can be lower or upper case.)

"-BAD" is not a pp-number, since a pp-number can't start with a sign.
Any sign must follow an 'e' or 'p' (i.e., be part of an exponent). (A sign
applied to a numeric constant is not part of the constant; it's a
separate unary minus operator.)

"0xfe-BAD" is not a valid hexadecimal floating constant. They use 'p',
not 'e', to introduce the exponent, since 'e' is a valid hex digit.

But "0xfe-BAD" is a valid pp-number. It consists of:

- '0', a digit
- 'x', 'f', both identifier-nondigits
- 'e' followed by a sign '-'
- 'B', 'A', 'D', all identifier-nondigits

pp-numbers encompass all integer and floating-point constants, in all
supported bases, with or without suffixes like "ULL". They also include
a lot of things that aren't valid tokens, basically to make the
preprocessor's job easier. If a pp-number can't be converted to a valid
token, which is the case here, it's a syntax error.

The apparent ambiguity (the 'e' could be either an identifier-nondigit
or the first character of an "e sign" sequence) is resolved in favor of
the longest sequence.

    If the input stream has been parsed into preprocessing tokens up to
    a given character, the next preprocessing token is the longest
    sequence of characters that could constitute a preprocessing token.
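The decomposition and the longest-match rule above can be sketched as a tiny scanner. This is an illustrative approximation, not any compiler's actual lexer: the function name `pp_number_length` is hypothetical, it follows the C11/C17 form of the grammar, and it ignores universal-character-names, implementation-defined identifier characters, and the digit separators of newer drafts. It also assumes the "C" locale for `isalnum`.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the pp-number at the start of s,
   or 0 if no pp-number starts there. */
size_t pp_number_length(const char *s)
{
    size_t i;

    /* A pp-number starts with a digit, or with '.' followed by a digit. */
    if (isdigit((unsigned char)s[0]))
        i = 1;
    else if (s[0] == '.' && isdigit((unsigned char)s[1]))
        i = 2;
    else
        return 0;

    for (;;) {
        unsigned char c = (unsigned char)s[i];

        /* e, E, p or P directly followed by a sign: take both characters.
           Trying this first is what gives the longest match. */
        if ((c == 'e' || c == 'E' || c == 'p' || c == 'P') &&
            (s[i + 1] == '+' || s[i + 1] == '-'))
            i += 2;
        /* Otherwise any letter, digit, underscore or '.' extends it. */
        else if (isalnum(c) || c == '_' || c == '.')
            i += 1;
        else
            break;
    }
    return i;
}
```

Under these assumptions, `pp_number_length("0xfe-BAD(3)")` is 8 (the whole of `0xfe-BAD`), `pp_number_length("0xfe - BAD(3)")` is 4, and `pp_number_length("-BAD")` is 0 because a pp-number cannot start with a sign.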

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

Keith Thompson

Jul 15, 2022, 8:12:13 PM

Philipp Klaus Krause <p...@spth.de> writes:

> In the following code,
>
> #define BAD(x) ((x) & 0xff)
> unsigned char b=0xfe-BAD(3);
>
> The BAD in the second line is not replaced by the GCC 11 and clang 11
> preprocessors.

Just as a matter of style, I would have written this as:

#define BAD(x) ((x) & 0xff)
unsigned char b = 0xfe - BAD(3);

just because I find it more legible with spaces around the "-" operator.
It has the added bonus of avoiding the spurious pp-number.
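As a quick check of the spaced version, here is a minimal sketch that wraps the declaration in a function (the name `spaced_result` is made up for illustration):

```c
#define BAD(x) ((x) & 0xff)

/* 0xfe, - and BAD are separate tokens here, so BAD(3) expands to
   ((3) & 0xff) and the initializer is 0xfe - 3. */
unsigned char spaced_result(void)
{
    unsigned char b = 0xfe - BAD(3);
    return b;
}
```

The function returns 0xfb, which agrees with the "b:fb" output reported earlier in the thread.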

Lynn McGuire

Jul 15, 2022, 10:03:50 PM

On 7/15/2022 7:11 PM, Keith Thompson wrote:
> Philipp Klaus Krause <p...@spth.de> writes:
>> In the following code,
>>
>> #define BAD(x) ((x) & 0xff)
>> unsigned char b=0xfe-BAD(3);
>>
>> The BAD in the second line is not replaced by the GCC 11 and clang 11
>> preprocessors.
>
> Just as a matter of style, I would have written this as:
>
> #define BAD(x) ((x) & 0xff)
> unsigned char b = 0xfe - BAD(3);
>
> just because I find it more legible with spaces around the "-" operator.
> It has the added bonus of avoiding the spurious pp-number.

You and me both.

I use extra spaces due to bad and worsening vision.

Lynn

Philipp Klaus Krause

Jul 18, 2022, 9:06:03 AM

On 15.07.22 at 23:30, Thiago Adams wrote:

>
> works..but (in my understanding) preprocessor doesn't have the concept of types.
> everything is "signed long long" inside preprocessor.

In the preprocessor, everything is intmax_t or uintmax_t. In practice,
on all implementations known to me, those are the same as signed long
long and unsigned long long.
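A sketch of how this can be observed, with hypothetical names (`PP_RESULT`, `preprocessor_result`, `compiler_result`): the same expression is evaluated once in #if, where `0U` behaves as uintmax_t, and once in ordinary code, where `0U` is an unsigned int.

```c
/* In #if, 0U acts as uintmax_t, so 0U - 1 wraps to the maximum value of
   uintmax_t, which is wider than 32 bits. */
#if (0U - 1) > 0xFFFFFFFF
#define PP_RESULT 1
#else
#define PP_RESULT 0
#endif

int preprocessor_result(void) { return PP_RESULT; }

/* In ordinary code, 0U is an unsigned int, so the result depends on the
   width of unsigned int on the target. */
int compiler_result(void) { return (0U - 1) > 0xFFFFFFFF; }
```

`preprocessor_result()` returns 1. On an implementation where unsigned int is 32 bits wide, `compiler_result()` returns 0, because `0U - 1` wraps to exactly 0xFFFFFFFF there.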

Philipp

Thiago Adams

Jul 18, 2022, 11:01:01 AM

I have implemented a preprocessor (this one: http://thradams.com/web3/playground.html)
and I am not doing any type analysis (unlike for the compiler's constant expressions).
Instead, I work with signed long long throughout. For instance, if a number is too big
for that, I just emit an error rather than using unsigned.
It would be possible to be smart and keep two internal representations, for cases like
A + B with both operands unsigned, but I guess this is uncommon.

Andrey Tarasevich

Jul 19, 2022, 1:30:32 PM

On 7/15/2022 1:06 PM, Lew Pitcher wrote:
> Not a bug.
>
> 0xfe-BAD is a valid hexadecimal floating point constant,

Um... How is it valid?

Firstly, the grammar requires `binary-exponent-part` in
`hexadecimal-floating-constant`

hexadecimal-floating-constant:
    hexadecimal-prefix hexadecimal-fractional-constant
        binary-exponent-part floating-suffix_opt
    hexadecimal-prefix hexadecimal-digit-sequence
        binary-exponent-part floating-suffix_opt

which means that only P-exponents are allowed, not E-exponents.

Secondly, the actual exponent in the P-exponent of a
`hexadecimal-floating-constant` is still required to be written in
_decimal_ (!) format

binary-exponent-part:
    p sign_opt digit-sequence
    P sign_opt digit-sequence

Note: `digit-sequence`, not `hexadecimal-digit-sequence` is used in the
above. No hex digits are allowed in the exponent part even in
hexadecimal floating point constants.
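For comparison, a few constants that do match this grammar, as a small sketch with made-up function names. The significand is written in hexadecimal, the exponent is introduced by `p`, is written in decimal, and scales by a power of two.

```c
/* 0x1p10: significand 1, decimal exponent 10, so 1 * 2^10. */
double two_to_the_tenth(void) { return 0x1p10; }

/* 0xfp1: 'f' is a hex digit of the significand, so 15 * 2^1. */
double fifteen_times_two(void) { return 0xfp1; }

/* 0x1.8p-1: hex fraction 1.8 is 1.5 in decimal, so 1.5 * 2^-1. */
double three_quarters(void) { return 0x1.8p-1; }
```

These return 1024.0, 30.0 and 0.75 respectively.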

--
Best regards,
Andrey

Keith Thompson

Jul 19, 2022, 2:19:48 PM

Andrey Tarasevich <andreyta...@hotmail.com> writes:
> On 7/15/2022 1:06 PM, Lew Pitcher wrote:
>> Not a bug.
>> 0xfe-BAD is a valid hexadecimal floating point constant,
>
> Um... How is it valid?

[...]

As previously discussed in this thread, it's not a valid hexadecimal
floating point constant.

The issue is that it's a valid pp-number, which is why it's not parsed
as three tokens 0xfe, -, BAD. Adding spaces around the '-' fixes the problem.

Andrey Tarasevich

Jul 19, 2022, 2:24:07 PM

Again, the issue is that preprocessor uses `pp-number` - a "simplified"
form of the grammar, which kinda lumps the grammars of
`decimal-floating-constant` and `hexadecimal-floating-constant`
together. So, at preprocessing stage the original sequence is accepted
as a `pp-number`. Nevertheless, this does not make it a "valid
hexadecimal floating point constant".

--
Best regards,
Andrey

Thiago Adams

Jul 20, 2022, 8:05:52 AM

The preprocessor also requires correct numbers, but only where it
needs their value, that is, inside #if.

#define BAD(x) ((x) & 0xff)

#if 0xfe-BAD(3)
#endif

clang : error: invalid suffix '-BAD' on integer constant
gcc: error: invalid suffix "-BAD" on integer constant
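For contrast, a sketch of the same condition with spaces around the operator, which the preprocessor can evaluate. The names `SPACED_IF_OK` and `spaced_if_ok` are hypothetical.

```c
#define BAD(x) ((x) & 0xff)

/* With spaces, #if sees the tokens 0xfe, - and BAD(3); after expansion
   the condition is (0xfe - ((3) & 0xff)) == 0xfb. */
#if (0xfe - BAD(3)) == 0xfb
#define SPACED_IF_OK 1
#else
#define SPACED_IF_OK 0
#endif

int spaced_if_ok(void) { return SPACED_IF_OK; }
```

`spaced_if_ok()` returns 1, since 0xfe - 3 is 0xfb.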

Probably pp-number was defined taking into account
the preprocessors that existed at the time. I guess it is time
to remove it, or at least to add a compiler mode or a deprecation warning,
something like this.

Philipp Klaus Krause

Jul 20, 2022, 10:08:27 AM

On 20.07.22 at 14:05, Thiago Adams wrote:

> Probably pp-number was defined taking into account
> existing preprocessors at the time. I guess it is time
> to remove it. At least a compiler mode or warning deprecated warning
> something like this.

Feel free to write a paper for the next SC22WG14 meeting. You are past
the deadline for C2X (likely to become C23), but if WG14 likes the
proposal, it could make it into C2Y.

Philipp

bart c

Jul 21, 2022, 7:38:45 AM

My presumably non-conforming compiler expands your macro as expected.

0xfe-BAD is not a hex floating point constant, it's interpreted as 0xfe - BAD, three tokens.

Hex floating-point tokens use 'p' for the exponent.

This looks to be yet another quirk in the preprocessor.