On Thursday, 8 June 2017 14:16:08 PDT Daniel Fishman wrote:
> The C++14 Standard says in [lex.ccon]/1:
>
> "An ordinary character literal that contains a single c-char representable
> in the execution character set has type char, with value equal to the
> numerical value of the encoding of the c-char in the execution character
> set"
>
> There seems to be a bit of a problem here: as far as I can see, the term
> "representable in the execution character set" is not explicitly defined
> anywhere in the Standard. If the implementation uses UTF-8 as its execution
> character set, then it seems clear that 'CYRILLIC CAPITAL LETTER A', for
> example, is representable in the execution character set, since the letter
> is part of UTF-8. But since the numerical value of the literal's encoding
> is usually larger than the maximum value of a char, its type cannot be
> char.
This wording exists mostly because of old, non-ASCII encodings. You could write
the source code in ASCII and have EBCDIC (for example) as the execution
character set. That means the source could contain characters that are not
representable in the execution character set.

The opposite is impossible: if you can't write a character in the source, then
no source code can contain that construct. Note also that multibyte sequences
are not taken into account: a character literal cannot hold a multibyte
sequence, in either the source encoding or the execution character set. You
need strings for that.
> Wouldn't it be more correct to say something like: "...representable in the
> execution character set and having numerical value of the encoding of the
> c-char in the execution character set representable in a char, has type
> char..."?
I don't think so, because that would be redundant. I disagree that a Cyrillic
letter is representable as a single char in UTF-8: encoding it requires a
multibyte sequence. Therefore, the extra qualification you added is unnecessary.
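
To make the multibyte point concrete, here is a minimal sketch (C++14 or later) that counts how many UTF-8 code units a code point needs. The helper names utf8_length, fits_in_single_char and check_examples are my own for illustration; nothing like them appears in the Standard. It only demonstrates the encoding arithmetic, not how any particular compiler treats the literal.

```cpp
#include <cassert>
#include <cstddef>

// Number of UTF-8 code units (bytes) needed to encode one code point.
// Returns 0 for surrogates and for values above U+10FFFF, which UTF-8
// cannot encode.
constexpr std::size_t utf8_length(char32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; // surrogate range
    if (cp <= 0x7F)     return 1;               // ASCII range, one byte
    if (cp <= 0x7FF)    return 2;
    if (cp <= 0xFFFF)   return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}

// True when the code point is encoded as exactly one code unit,
// i.e. when it can be held by a single char.
constexpr bool fits_in_single_char(char32_t cp)
{
    return utf8_length(cp) == 1;
}

// Hypothetical self-check: U+0041 LATIN CAPITAL LETTER A fits in one char,
// while U+0410 CYRILLIC CAPITAL LETTER A needs two bytes (0xD0 0x90).
inline void check_examples()
{
    assert(fits_in_single_char(0x0041));
    assert(utf8_length(0x0410) == 2);
    assert(!fits_in_single_char(0x0410));
}
```

Under this reading, "representable in the execution character set" for a single c-char already excludes U+0410 when the execution charset is UTF-8, which is why I consider the extra qualification redundant.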
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center