For the purposes of this token conversion and evaluation, all signed
integer types and all unsigned integer types act as if they have the
same representation as, respectively, the types intmax_t and
uintmax_t defined in the header <stdint.h>.145) This includes
interpreting character constants, ....
I wonder if this blanket change in type semantics was really intended.
There are many references to types in the section about character
constants. For example, the above text would appear to require a
conforming implementation to accept
#if '\x7fffffff'
#endif
because constraint 6.4.4.4p9 is not violated and there is no other
reason to reject the lines. However GCC, Comeau, and others all reject
this with a diagnostic complaining the hexadecimal escape sequence is
out of range.
Are they non-conforming, have I misunderstood the standard, or is
this a defect in the standard?
Neil.
But the standard also states:
This includes interpreting character constants, which may involve converting
escape sequences into execution character set members.
According to my own understanding, the reason why gcc complains about this
is that '\x7fffffff' is beyond the range of the execution character set.
Thanks.
> But the standard also states:
>
> This includes interpreting character constants, which may involve converting
> escape sequences into execution character set members.
>
> According to my own understanding, the reason why gcc complains about this
> is that '\x7fffffff' is beyond the range of the execution character set.
As the co-author of that part of GCC, I can say your understanding is mistaken.
It's complaining about the escape sequence being outside the range
of the target's "unsigned char". However the standard's wording
appears to require that to be at least 64 bits wide, because "unsigned
char" must have the representation of uintmax_t for this pptoken to
token conversion.
Neil.
actually, I would think that the above token is incorrect:
'\xHH', aka, it only takes 2 hex digits;
'\uHHHH', aka, 4 hex digits;
'\UHHHHHHHH', taking 8 hex digits.
if '\x' can take more than 2 hex chars, it is a mystery then how it is
unambiguously parsed in strings?...
for example, "\x27abadexample", ...
as further detail:
the way my compiler deals with character escapes, is that currently strings
are internally assumed to always be UTF-8 ('long' strings are, likewise,
internally UTF-8 until the final output is generated).
I will assume then, that GCC does something different (such as leaving
escapes as escapes until some later stage of the compilation process?...).
or such...
> Neil.
This is a string consisting of 8 characters, { '\x27abade', 'x', 'a',
'm', 'p', 'l', 'e', '\0' }. On implementations where UCHAR_MAX <
0x27abade, this violates a constraint, but does not introduce any parsing
ambiguity any more than a+++++b does.
>if '\x' can take more than 2 hex chars, it is a mystery then how it is
>unambiguously parsed in strings?...
>
>for example, "\x27abadexample", ...
C99 says (6.4.4.4):
Each octal or hexadecimal escape sequence is the longest sequence of
characters that can constitute the escape sequence.
I suppose you could argue that "can constitute" is not completely
unambiguous, but I think the intent is clear.
-- Richard
Right. If you need to stop a hex escape from consuming more than you
want, use multiple adjacent string literals: "\x27" "abadexample".
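
A minimal sketch of that advice, assuming a hosted C99 compiler. The function names are hypothetical and exist only for illustration. Escape sequences are converted before adjacent literals are concatenated, so the split ends the hex escape early:

```c
#include <assert.h>
#include <stddef.h>

/* One hex escape (value 0x41) plus the terminating null: 2 bytes. */
size_t greedy_size(void) { return sizeof("\x41"); }

/* The escape ends at the closing quote of "\x4"; the '1' in the next
   literal is an ordinary character. Escape, '1', null: 3 bytes. */
size_t split_size(void) { return sizeof("\x4" "1"); }

/* The example from this thread, split as suggested: the character 0x27,
   the 11 letters of "abadexample", and the terminating null: 13 bytes. */
size_t thread_example_size(void) { return sizeof("\x27" "abadexample"); }

/* Hypothetical self-check that bundles the three observations. */
void check_split_literals(void)
{
    assert(greedy_size() == 2);
    assert(split_size() == 3);
    assert(thread_example_size() == 13);
}
```

Written as one literal, "\x27abadexample" would instead fold every hex digit up to the 'x' into a single escape.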
-Larry Jones
Right, but the syntax limits octal escape sequences to 3 digits, while
hexadecimal escape sequences can be arbitrarily long:
octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
hexadecimal-escape-sequence:
\x hexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit
So this:
"\00000"
is a null character followed by two digits '0', whereas this:
"\x00000"
is just a null character (followed, in both cases, by another
null character to terminate the string).
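
The difference can be checked with sizeof. This is a small sketch, not taken from any post in this thread, and the function names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Octal escapes stop after 3 digits: '\000', '0', '0', then the
   terminating null, so 4 bytes in total. */
size_t octal_literal_size(void) { return sizeof("\00000"); }

/* Hex escapes consume every following hex digit: one character with
   value 0, then the terminating null, so 2 bytes in total. */
size_t hex_literal_size(void) { return sizeof("\x00000"); }

/* Second element of the octal version is the digit character '0',
   not a null character. */
char octal_second_char(void) { return "\00000"[1]; }
```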
--
Keith Thompson (The_Other_Keith) <ks...@mib.org>
I don't follow the reasoning. Nor why you didn't apply it to
'xFFFFFFFFFFFFFFFFFFFF'
Anyway, I think you understand what the intent really was.
yes, ok.
I had missed these, and as such may need to fix my parser.
I guess I had assumed Java syntax or something, where \x only takes 2 chars
(\u and \U being used for more). since, elsewhere, \u and \U had been
specified, and took the expected number of chars, I had mistakenly assumed
that \x was similar, not having read the C standard "in detail" (my compiler
was written mostly from skimming over the standard and from my personal
experience, which it seems contains many minor errors...).
Your example is a multicharacter constant. Mine is a single-character
constant. That's a big difference. Also mine has a hex escape
sequence, for which there is a constraint that it be in the range
of unsigned char. There is no such clear constraint on multicharacter
constants that don't get into implementation-defined territory.
The preprocessor arithmetic language states that, when converting
pptokens to tokens, all types have the range of [u]intmax_t. Hence
my example would not be violating the hex escape constraint, as it is a
31-bit number and uintmax_t is at least 64 bits.
However, many (all?) compilers have chosen to ignore the standard's
wording here and still treat the constraint as being on the unmodified
type, which presumably is the intended behaviour.
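
For plain integer arithmetic, by contrast, the widening is uncontroversial and easy to observe. The following is only a sketch, assuming a C99 preprocessor; the macro and function names are hypothetical. It says nothing about how character constants are treated.

```c
#include <assert.h>

/* In #if, signed operands act as intmax_t (at least 64 bits), so this
   sum is 2147483648 and the test is true even where int is 32 bits. */
#if 2147483647 + 1 > 0
#define PP_SIGNED_WIDENED 1
#else
#define PP_SIGNED_WIDENED 0
#endif

/* Unsigned operands act as uintmax_t, so the sum does not wrap to 0
   the way 32-bit unsigned int arithmetic would. */
#if 0xffffffffu + 1u != 0
#define PP_UNSIGNED_WIDENED 1
#else
#define PP_UNSIGNED_WIDENED 0
#endif

int pp_signed_widened(void) { return PP_SIGNED_WIDENED; }
int pp_unsigned_widened(void) { return PP_UNSIGNED_WIDENED; }
```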
The wording in the standard is particularly poor here, as it even
singles out character constants saying "yes, this widening rule
really does apply to them", viz:
token conversion and evaluation, all signed integer types and all
unsigned integer types act as if they have the same representation as,
respectively, the types intmax_t and uintmax_t defined in the header
<stdint.h>.145) This includes interpreting character constants...
The point of my post was that, "no, I suspect it only applies to some
parts of character constant conversion, but which parts precisely?".
You will note there are several references to types in 6.4.4.4; only
some of those are probably intended to be "widened".
One cannot, on the one hand, point out that the standard is carefully
and precisely worded, and yes it really does mean what it says, when
dealing with issues related to its meaning and wording (a point I see
a lot), but then, on the other hand, in cases like this say that whilst
it doesn't mean precisely what it says the intent was nevertheless clear.
Neil.
A more telling test is this:
#if !'\x100000000'
#error "preprocessor is not C99 compliant"
#endif
gcc versions 3.4.4 through 4.2.3 give this output:
ppchar.c:1:6: warning: hex escape sequence out of range
ppchar.c:2:2: #error "preprocessor is not C99 compliant"
I'd be surprised to see a compiler accept it ;-)
--
Chqrlie.
Obviously the \ character got lost somewhere during editing.