When the standard says "character" in the translation limits, what
does that term refer to? Is it a byte or a "real" character?
C99 5.2.4.1:
- 4095 characters in a logical source line
- 4095 characters in a character string literal or wide string
literal (after concatenation)
I'm modifying my compiler to issue warnings for code that contains a
string or a line whose length exceeds those limits, but the term
"character" bothers me.
My compiler works as follows:
- all source code whose character encoding is not ASCII is converted
to UTF-8 in translation phase 1 (TP1); and
- characters in character constants or (wide) string literals are
converted in TP6 to the encoding the user specified.
Consider a program that contains a non-wide string literal that is
M bytes long when encoded in EUC-KR, where M is the limit specified by
the standard. Note that a Korean character is encoded in two bytes in
EUC-KR and in three in UTF-8, so if the literal consists only of Korean
characters, it contains M/2 "characters" and occupies 3M/2 bytes in
UTF-8.
Now, what should my compiler do for it?
a) a warning should not be emitted because the string literal has
fewer than M+1 bytes in EUC-KR;
b) a warning should be emitted because the string literal has more
than M bytes in UTF-8;
c) a warning should not be emitted because the string literal has
fewer than M+1 Korean "characters" (it has M/2); or
d) any of the above; it depends on how an implementer maps
characters in TP1 and defines the "character" after TP1.
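To make the difference between b) and c) concrete, here is a minimal sketch of how the two counts could be computed on the UTF-8 form of a literal. The names `limit_unit`, `utf8_code_points` and `exceeds_limit` are hypothetical and only for illustration, and the input is assumed to be valid UTF-8. Option a) is not covered, since it would need the byte count taken before the TP1 conversion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy names; they mirror options b) and c) above. */
enum limit_unit { UNIT_UTF8_BYTES, UNIT_CODE_POINTS };

/* Count code points in n bytes of UTF-8 that is assumed to be valid:
   every byte that is not a continuation byte (10xxxxxx) starts one. */
size_t utf8_code_points(const char *s, size_t n)
{
    size_t count = 0;
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    return count;
}

/* Return nonzero if the literal exceeds `limit` under the chosen unit. */
int exceeds_limit(const char *s, size_t n, size_t limit,
                  enum limit_unit unit)
{
    size_t len = (unit == UNIT_UTF8_BYTES) ? n : utf8_code_points(s, n);
    return len > limit;
}
```

For example, a two-syllable Korean literal that is 4 bytes in EUC-KR is 6 bytes in UTF-8, so with a (toy) limit of 4 it would trigger a warning under `UNIT_UTF8_BYTES` but not under `UNIT_CODE_POINTS`.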
And what about wide strings? For a wide string literal whose
multibyte encoding is EUC-KR (which is converted to UTF-8 in TP1) and
whose wide character encoding is UCS-2, what should my compiler do?
a) emit a warning when the number of bytes in the EUC-KR-encoded
string exceeds the limit;
b) emit a warning when the number of bytes in the UTF-8-encoded
string exceeds the limit;
c) emit a warning when the number of "elements" in the wchar_t
array exceeds the limit; or
d) any of the above.
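For option c), the element count can be derived from the UTF-8 form without building the wchar_t array. The following is only a sketch under stated assumptions: the function name `wide_units_from_utf8` is hypothetical, the input is assumed to be valid UTF-8, and wchar_t is assumed to be 16 bits wide. With strict UCS-2 every representable character takes exactly one element; the two-element case below is an assumption that applies only if the implementation actually uses UTF-16 surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>

/* Count the 16-bit elements needed for n bytes of valid UTF-8.
   A 4-byte sequence (lead byte 0xF0 or above) encodes a character
   outside the BMP, which UTF-16 stores as two elements; every other
   lead byte yields one element. */
size_t wide_units_from_utf8(const char *s, size_t n)
{
    size_t units = 0;
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned char b = (unsigned char)s[i];
        if ((b & 0xC0) == 0x80)
            continue;                  /* continuation byte */
        units += (b >= 0xF0) ? 2 : 1;  /* surrogate pair or single unit */
    }
    return units;
}
```

Under this count, a literal made only of Korean characters has the same length in elements as it has "characters", regardless of its size in EUC-KR or UTF-8.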
FYI, gcc chose b) for non-wide string literals and c) for wide string
literals, and my implementation follows it for now.
Does the same interpretation apply to the limit on the number of
characters in a logical line?
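For reference, a logical source line is what remains after translation phase 2 deletes each backslash immediately followed by a new-line. A minimal sketch of measuring one, assuming the buffer is already in its post-TP1 form and counting bytes (the name `logical_line_bytes` is hypothetical, and trigraphs are not handled):

```c
#include <assert.h>
#include <stddef.h>

/* Return the number of bytes in the first logical source line of
   src[0..n), i.e. after splicing physical lines that end in a
   backslash. The terminating new-line is not counted. */
size_t logical_line_bytes(const char *src, size_t n)
{
    size_t len = 0, i = 0;
    assert(src != NULL || n == 0);
    while (i < n && src[i] != '\n') {
        if (src[i] == '\\' && i + 1 < n && src[i + 1] == '\n') {
            i += 2;   /* backslash-newline: splice, contributes nothing */
            continue;
        }
        len++;
        i++;
    }
    return len;
}
```

Whether the result should then be compared to the limit as bytes or as decoded characters is exactly the open question; this sketch only shows the byte-based reading.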
Thanks in advance.