Meaning of longer 'char literals' in C

fir

Apr 30, 2020, 7:43:54 AM

By "char literal" I mean things like 'a' or '2'
(just as by "string literal" I mean "aaa" or "2").

The question is: do those longer char literals like 'aaaaaaaa' have some defined meaning
in C? (I'm mostly thinking of older C, like C89.)

Bonita Montero

Apr 30, 2020, 7:55:42 AM

Multibyte character literals are mapped to an int and have an
implementation-defined mapping according to the byte order of the machine.

fir

Apr 30, 2020, 8:17:23 AM

Implementation-defined?
Is, for example, the value of '%$' implementation-defined?
% is 0x25 and $ is 0x24,

so shouldn't it be 0x2524 everywhere?

fir

Apr 30, 2020, 8:22:54 AM

I checked it in my TDM-GCC:
'ABCD' gives 0x41424344, OK.

Then I checked 'ABCDEF'
(printf("%x", 'ABCDEF');)
and got 0x43444546.

Now that probably seems stupid (or not?).

fir

Apr 30, 2020, 8:24:46 AM

Out of curiosity, 'fir' is 666 972 (in dec).

fir

Apr 30, 2020, 8:29:34 AM

Sorry, that's in hex; in decimal it is 6 711 666.

fir

Apr 30, 2020, 8:44:48 AM

Well, maybe it is OK, as 'ABCDEF' does not fit in 32 bits and 'CDEF' is what remains after the division (i.e. the value modulo 2^32).

It shouldn't, however, depend on the implementation. If it works as a kind of number it's OK: as I once said, I may use it to implement my idea of ad hoc enums:

foo("aaaa", 'slow');
foo("aaaa", 'fast');

void foo(char* txt, int how)
{
if(how=='fast') //...
if(how=='slow') //...

}

I only need it to be guaranteed (to work on all compilers). Is it?

Bart

Apr 30, 2020, 9:00:29 AM

Take this:

#include <stdio.h>
int main(void) {
printf("%08X\n",'ABCD');
}

Some compilers will print 41424344, and others will show 44434241, even
on the same little-endian machine.

So it is implementation-defined, meaning you can't rely on it being a
particular way around unless you specify a certain compiler always to be
used.

C with 32-bit ints also limits the length to 4 characters.

Those compilers showing 41424344 (most of them) will, on a little-endian
machine, use the memory layout 44 43 42 41, which is the reverse of the
layout used for the string "ABCD".

(My C compiler, as well as Tcc, uses a memory layout for 'ABCD' that
matches that of a string "ABCD". So the 'A' in 'ABCD' is at the lowest
address when stored in memory, and is the least significant byte of the
int (for little-endian).

My static language does the same, but also allows character constants of
64 bits and 128 bits, ie. up to 'ABCDEFGHIJKLMNOP'.)

fir

Apr 30, 2020, 9:13:24 AM

I don't get it.

Logically, if you think it through (as I quickly did a minute ago), the proper mapping is simple:

'ABCD' == 0x41424344

Why? Because 'D' is for sure 0x44,
and 'CD' should (also for sure) be 0x4344
(because you shouldn't reverse the digits of a number; as everywhere else, the most significant one is on the left).

It should have nothing to do with big or little endian. In memory it should simply be stored as a number (an int, short, or long, depending on how the compiler wants to store it) according to the machine's endianness, but as a numerical value it is clear and shouldn't vary between implementations.

Are you sure there are compilers that get it wrong? My gcc gets it right.

fir

Apr 30, 2020, 9:17:10 AM

BTW, does anyone know a page where I could look up the characteristics of a given integer (e.g. whether it is prime or belongs to some specific class of numbers)?

I think I saw such an online lookup page years ago, but I can't find it now.

Bart

Apr 30, 2020, 9:42:02 AM

It could be affected by that. So 0x11223344 might be stored in
little-endian as:

db 44 33 22 11

but in big-endian as:

db 11 22 33 44

But think about the string "ABCD", which in memory on either system is:

db 41 42 43 44 00

Now, on little-endian, 41 or 'A' is the least significant byte if this were
an int; on big-endian it's the most significant.

However, the way gcc represents 'ABCD' in memory is like this on my
little-endian Windows machine:

db 44 43 42 41

(Try:
int a = 'ABCD';
printf("%02X\n",*(unsigned char*)&a);
)

> In memory it should simply be stored as a number (an int, short, or long,
> depending on how the compiler wants to store it) according to the machine's
> endianness, but as a numerical value it is clear and shouldn't vary
> between implementations.
>
> Are you sure there are compilers that get it wrong? My gcc gets it right.

My compiler and lccwin (not TCC as I said, which obediently has to do
whatever gcc does) will store 'ABCD' as:

db 41 42 43 44

which you will see exactly matches "ABCD". So why do you say that using
the reverse order is correct? 'ABCD' is not a number where D is units, C
is tens (or 256s) etc.

Anyway, because it is implementation-defined, there is no correct way of
doing this. I chose my method simply because it matches strings, so that
you could do this:

char* s = "BARTSIMPSON";
if (*(int*)s == 'BART') puts("HI BART!");

On my compiler and lccwin, this program prints HI BART!

On gcc and others, it shows nothing.

fir

Apr 30, 2020, 10:18:30 AM

'abcd', or even longer 'abcdefghijkl', are not strings; they have some numerical value attached,
so I think you're going about it the wrong way by thinking about how it is stored; you should think about what number it represents (is equal to).

It is possible to treat it as a chunk of bytes (equal to the chunk of bytes represented by a "" string), but I think that would be a worse decision.

For example, "aaasdas"-style strings have no immediately defined integer value
(you can't cast one to an integer; that's not defined),

unlike 'a', which has this integer correspondence. That's why, in my opinion, it is better to treat this thing (and 'aaaa' things) as something with an integer correspondence. There are reasons why this is generally good; one is that it creates a new kind of entity here (for instance, a kind of text format that is storage-independent). Otherwise, 'aaa' treated like "aaa" without the terminating zero gives nothing; it's kind of redundant and shallow.
(There are possibly more arguments for this.)

My conclusion, as I said, is that 'abcdf' should be treated not as a string but as a value, and if so, you don't reverse the order of the 'digits'. In decimal the most significant digit is on the left; in hex it's the same, and binary and octal are the same, so here too the most significant value should certainly be on the left. There could still be some doubt about what base this thing should use (some could say it need not be base 256): it could be some lower base that covers mostly letters and digits, or some could say it could be a higher base (even an open one) covering Unicode values.

(If the base were open, it would make a new kind of number that you could, for example, compare but not map onto integers ;c )

For me, however, base 256 seems OK.

fir

Apr 30, 2020, 10:31:10 AM

BTW, if some say that it is not 'properly' (or 'as I, fir, would want') defined by the C standard,
maybe someone can at least assure me that it can be only one of two things: 'ab' can be either 0x6162 or 0x6261, but nothing else?

Bart

Apr 30, 2020, 10:49:52 AM

On 30/04/2020 15:18, fir wrote:
> On Thursday, 30 April 2020 15:42:02 UTC+2, Bart wrote:

>> Anyway, because it is implementation-defined, there is no correct way of
>> doing this. I chose my method simply because it matches strings, so that
>> you could do this:
>>
>> char* s = "BARTSIMPSON";
>> if (*(int*)s == 'BART') puts("HI BART!");
>>
>> On my compiler and lccwin, this program prints HI BART!
>>
>> On gcc and others, it shows nothing.
>
> 'abcd', or even longer 'abcdefghijkl', are not strings; they have some numerical value attached,

Not really. When someone writes 'ABCD', they don't care about the
numeric values! Usually it will be something to do with characters or
text or strings.

There will be a numerical value associated with it, because it has
necessarily to be stored in an 'int' type, but usually that will be of
no great interest (what's the meaning of 'ABC'*'DEF'?).

> It is possible to treat it as a chunk of bytes (equal to the chunk of bytes represented by a "" string), but I think that would be a worse decision.

You have to choose between in-memory representation 'ABCD' as either (A
B C D) or (D C B A) (or maybe any other ordering as you can do what you
like); which do you prefer?

Remember C says there is no right answer.

I made my own choice to make 'ABCD' behave like "ABCD", and explained why.

> My conclusion, as I said, is that 'abcdf' should be treated not as a string but as a value, and if so, you don't reverse the order of the 'digits',

It's not digits, otherwise it would be written ABCD not 'ABCD'. Note
that 0xABCD is entirely different, being 0x0000ABCD rather than
0x41424344 or 0x44434241.

Perhaps that's where the confusion is. Maybe my example should have been
'PQRS'.

Bonita Montero

Apr 30, 2020, 10:58:48 AM

>> Multibyte character literals are mapped to an int and have an
>> implementation-defined mapping according to the byte order of the machine.

> Implementation-defined?

The byte order in memory is actually the same for big-endian and
little-endian machines. So if you have this code (assuming a 4-byte int) ...

union
{
int i;
char a[4];
};

... and assign c a '\1\2\3\4', a will get the same bytes on big-endian
and little-endian machines at a[0...3]. It's just how i is treated.

Bonita Montero

Apr 30, 2020, 10:59:27 AM

On 30.04.2020 at 16:58, Bonita Montero wrote:
>>> Multibyte character literals are mapped to an int and have an
>>> implementation-defined mapping according to the byte order of the machine.
>
>> Implementation-defined?
>
> The byte order in memory is actually the same for big-endian and
> little-endian machines. So if you have this code (assuming a 4-byte int) ...
>
> union
> {
> int i;
> char a[4];
> };
>
> ... and assign c a '\1\2\3\4', a will get the same bytes on big-endian

(That is: assign i a '\1\2\3\4', not c.)

Bonita Montero

Apr 30, 2020, 11:14:22 AM

> The byte order in memory is actually the same for big-endian and
> little-endian machines. So if you have this code (assuming a 4-byte int) ...
> union
> {
> int i;
> char a[4];
> };
> ... and assign c a '\1\2\3\4', a will get the same bytes on big-endian
> and little-endian machines at a[0...3]. It's just how i is treated.

I just had the idea that it would have been cleverer to define the
mapping of the characters in the character literal to the integer
constant so that it matches their order in memory. That would have been more useful.

fir

Apr 30, 2020, 11:31:39 AM

On Thursday, 30 April 2020 16:49:52 UTC+2, Bart wrote:
> On 30/04/2020 15:18, fir wrote:
> > On Thursday, 30 April 2020 15:42:02 UTC+2, Bart wrote:
>
> >> Anyway, because it is implementation-defined, there is no correct way of
> >> doing this. I chose my method simply because it matches strings, so that
> >> you could do this:
> >>
> >> char* s = "BARTSIMPSON";
> >> if (*(int*)s == 'BART') puts("HI BART!");
> >>
> >> On my compiler and lccwin, this program prints HI BART!
> >>
> >> On gcc and others, it shows nothing.
> >
> > 'abcd', or even longer 'abcdefghijkl', are not strings; they have some numerical value attached,
>
> Not really. When someone writes 'ABCD', they don't care about the
> numeric values! Usually it will be something to do with characters or
> text or strings.
>

I don't think so, but tell me why (where/when) someone writes 'ABCD' to do something with strings?

fir

Apr 30, 2020, 11:34:04 AM

If so, I wouldn't get 'abcd' as 0x61626364 in my gcc/MinGW, which I do, I think (as 0x61626364 is stored here as "dcba", as far as I know).

James Kuyper

Apr 30, 2020, 11:34:06 AM

On Thursday, April 30, 2020 at 8:17:23 AM UTC-4, fir wrote:
> On Thursday, 30 April 2020 13:55:42 UTC+2, Bonita Montero wrote:
> > On 30.04.2020 at 13:43, fir wrote:
> > > By "char literal" I mean things like 'a' or '2'
> > > (just as by "string literal" I mean "aaa" or "2").
> > >
> > > The question is: do those longer char literals like 'aaaaaaaa' have some defined meaning
> > > in C? (I'm mostly thinking of older C, like C89.)
> >
> > Multibyte character literals are mapped to an int and have an
> > implementation-defined mapping according to the byte order of the machine.
>
> Implementation-defined?

"The value of an integer character constant containing more than one
character (e.g., 'ab'), or containing a character or escape sequence
that does not map to a single-byte execution character, is
implementation-defined." (6.4.4.4.p10)

> Is, for example, the value of '%$' implementation-defined?
> % is 0x25 and $ is 0x24,
>
> so shouldn't it be 0x2524 everywhere?

No. That's one possible way of handling it, there are many others,
different implementors had different ideas about how such constants
should be handled. The standard chose to allow a variety of handlings,
rather than mandating a specific one. Personally, I thought this was a
bad idea, but I wasn't consulted on the matter.

One simple way to implement '%$' is to treat it as equivalent, in modern
C, to

(union {char string[sizeof(int)]; int i;}){"%$"}.i

The value of that expression would depend upon many implementation-
specific details: whether or not '$' is in the extended character set
(it isn't in the basic character set), the encoding used for source
and execution set characters, the byte order, and sizeof(int).
Another way to do it is similar to the above, except that "%$" is padded
with blanks at the beginning to sizeof(int), which would produce an
entirely different set of possible values. There's many other
possibilities.

Spiros Bousbouras

Apr 30, 2020, 11:49:46 AM

On Thu, 30 Apr 2020 07:31:01 -0700 (PDT)
fir <profes...@gmail.com> wrote:
> > > >>>>> By "char literal" I mean things like 'a' or '2'
> > > >>>>> (just as by "string literal" I mean "aaa" or "2").
> > > >>>>>
> > > >>>>> The question is: do those longer char literals like 'aaaaaaaa' have some defined meaning
> > > >>>>> in C? (I'm mostly thinking of older C, like C89.)

[...]
[It would make responding easier if you snipped irrelevant quotations.]

> BTW, if some say that it is not 'properly' (or 'as I, fir, would want')
> defined by the C standard, maybe someone can at least assure me that it can be
> only one of two things: 'ab' can be either 0x6162 or 0x6261, but nothing else?

No, it's implementation-defined. So it could be anything as long as the
implementation documents the choice they made. So if you really need this
(and the examples you have provided do not show a need), you must read
what the documentation of the compiler(s) you use says on the matter. For
example
http://gcc.gnu.org/onlinedocs/cpp/Implementation-defined-behavior.html#Implementation-defined-behavior :

The compiler evaluates a multi-character character constant a character
at a time, shifting the previous value left by the number of bits per
target character, and then or-ing in the bit-pattern of the new
character truncated to the width of a target character. The final
bit-pattern is given type int, and is therefore signed, regardless of
whether single characters are signed or not. If there are more
characters in the constant than would fit in the target int the compiler
issues a warning, and the excess leading characters are ignored.

For example, 'ab' for a target with an 8-bit char would be interpreted as
(int) ((unsigned char) 'a' * 256 + (unsigned char) 'b') ,
and '\234a' as
(int) ((unsigned char) '\234' * 256 + (unsigned char) 'a')

So this is what you want but there's no guarantee that it will be the same in
other compilers.

Personally I would find it perfectly reasonable if the compiler issued a
warning and ignored all the characters beyond the first, so 'ab' would
cause a warning and would generate the same constant as 'a'.

--
vlaho.ninja/prog

David Brown

Apr 30, 2020, 11:49:55 AM

I think a better solution would have been to make it an error, so that
compilers could give a message saying: You wrote 'abcd', but probably
meant "abcd". I can't think of any good reason for allowing
multicharacter literals in the first place - just as I can't think of
any good reason for 'a' being an "int" and not a "char". (Compatibility
with B is not a good reason IMHO.)

Here is an extract from en.cppreference.com that people might find useful:

<https://en.cppreference.com/w/c/language/character_constant>

"""
Multicharacter constants were inherited by C from the B programming
language. Although not specified by the C standard, most compilers (MSVC
is a notable exception) implement multicharacter constants as specified
in B: the values of each char in the constant initialize successive
bytes of the resulting integer, in big-endian zero-padded right-adjusted
order, e.g. the value of '\1' is 0x00000001 and the value of '\1\2\3\4'
is 0x01020304.
"""

fir

Apr 30, 2020, 12:01:45 PM

Thinking about it (partly recapitulating what I wrote, but also adding some points): there is a general question/mystery about what the thing 'ac' actually is. AFAIK only things like 'a' were defined.

'ab' was born somewhere, but (AFAIK) without a clear definition of what it is.

From one point of view, logically, if 'a' is the value of a char, 'ab' should be the value of two chars.

However, there is also the point (AFAIK) that 'ab' is not a string; it is rather treated (and passed to functions, etc.) as a value (an integer, shorter or longer).

If so, if it is not a string, in my opinion the rule that the a in 'ab' ranks higher than the b should hold.

Some other things are not quite clear. For example, I would agree that it shouldn't be a string, but that doesn't necessarily mean it should be an integer value (it could also be some kind of internal vector).

It is also not clear, if it is a value, how it should be mapped/calculated.

One easy way is to make it an ASCII-based value, as this is kind of old-school and easy (but I'm not saying I'm sure; maybe it's too old-school).

fir

Apr 30, 2020, 12:12:55 PM

This is a limiting attitude, which in my opinion C is not about; C is more about flying beyond limits.

Personally, I like to treat things such as 'kot' as a value, not a string; the second question, however, would be how to design such a value
(this could include Unicode if someone used UTF-8 here; for example, 'pąk' in UTF-8 is AFAIK 4 bytes, so this value could also be computed in the previously mentioned base-256 way).

Bonita Montero

Apr 30, 2020, 12:15:36 PM

>> ... and assign c a '\1\2\3\4', a will get the same bytes on big-endian
>> and little-endian machines at a[0...3]. It's just how i is treated.

> If so, I wouldn't get 'abcd' as 0x61626364 in my gcc/MinGW, which I do, I think (as 0x61626364 is stored here as "dcba", as far as I know).

You're right; but nevertheless, it's still machine-dependent.

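#include <stdio.h>

int main(void)
{
union
{
int i;
char c[5];
} u; /* C needs a named union object here */
u.i = 'abcd'; /* gcc: 0x61626364 */
u.c[4] = 0;
printf( "%s\n", u.c ); /* gcc prints "dcba" on little-endian, "abcd" on big-endian */
}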

Bart

Apr 30, 2020, 12:23:24 PM

On 30/04/2020 16:49, David Brown wrote:
> On 30/04/2020 17:33, James Kuyper wrote:

>> The value of that expression would depend upon many implementation-
>> specific details: whether or not '$' is in the extended character set
>> (it isn't in the basic character set), the encoding used for source
>> and execution set characters, the byte order, and sizeof(int).
>> Another way to do it is similar to the above, except that "%$" is padded
>> with blanks at the beginning to sizeof(int), which would produce an
>> entirely different set of possible values. There's many other
>> possibilities.
>>
>
> I think a better solution would have been to make it an error, so that
> compilers could give a message saying: You wrote 'abcd', but probably
> meant "abcd". I can't think of any good reason for allowing
> multicharacter literals in the first place - just as I can't think of
> any good reason for 'a' being an "int" and not a "char".

In C, isn't 'char' just a short short int anyway? It's not as though it
is a totally incompatible type. 'AB' might just be the value of an 'A'
char followed by a 'B' character.

This is a summary of some of my uses, expressed as C fragments (but I
can't use them in C because it can't guarantee the order):

sig = readshort(....);
if (sig == 'MZ') // test marker within a file

if (target == 'X64') ... // ad hoc enum values
else if (target == 'C64')

switch (cmd) { // cmd is a function parameter
case 'CMD1': ....
case 'CMD2': ....

if (op == '<=') ...

if (*(<cast>)s == 'BART') // fast string compare, when s
// is known to be suitable size
*(<cast>)s = 'ABC' // fast short strcpy

d{'Abcdef'}=1 // (not C) fast string key

A['XYZ']=10; // direct indexing with short
// strings (1-3 chars)

I can understand why, if it has never been usable in C because the
results are not well-defined, people would say they are not needed:
they have developed alternatives.

fir

Apr 30, 2020, 12:38:20 PM

I asked you how people treat it as a string in C, not how you would use it (because you seem biased toward treating it as a string, and I seem biased toward treating it as a value). Besides, as far as I can see, above you read a short, not a string, and compare it as a value, so it's rather an example of how people treat it as a value.

Bart

Apr 30, 2020, 12:56:31 PM

They are examples of how to make use of very fast and efficient int
operations, for short strings.

If you can do that with 1-character strings like 'A', then why not 2, 3
or 4-character ones?

C allows this, but what it doesn't do is specify the ordering. So that
'MZ' might match the 'MZ' in a file, or it might not.

This means that this program:

#include <stdio.h>
int main(void) {
FILE* f;
int sig = 0; /* fread fills only 2 of its bytes */

f = fopen("test.exe","rb");

if (f){
fread(&sig,2,1,f);

if (sig == 'MZ') {
puts("EXE file");
} else {
puts("Not an EXE file");
}
fclose(f);
}
}

works properly with bcc or lccwin, but not with gcc or other compilers.
You have to use workarounds instead of the most obvious way.

fir

Apr 30, 2020, 1:04:55 PM

On Thursday, 30 April 2020 18:56:31 UTC+2, Bart wrote:
>
> They are examples of how to make use of very fast and efficient int
> operations, for short strings.
>
> If you can do that with 1-character strings like 'A', then why not 2, 3
> or 4-character ones?
>
> C allows this, but what it doesn't do is specify the ordering. So that
> 'MZ' might match the 'MZ' in a file, or it might not.
>
> This means that this program:
>
> #include <stdio.h>
> int main(void) {
> FILE* f;
> int sig = 0; /* fread fills only 2 of its bytes */
>
> f = fopen("test.exe","rb");
>
> if (f){
> fread(&sig,2,1,f);
>
> if (sig == 'MZ') {
> puts("EXE file");
> } else {
> puts("Not an EXE file");
> }
> fclose(f);
> }
> }
>
> works properly with bcc or lccwin, but not with gcc or other compilers.
> You have to use workarounds instead of the most obvious way.

This is because you fread a string, not a short, and put that string into the memory of a short; if you read a short, then you could compare.

Bart

Apr 30, 2020, 1:28:26 PM

Sure, you can fread two bytes into a 'char sig[3]'. Then put a 0
terminator into sig[2]. Then call strcmp(sig,"MZ") and test whether the
result is zero.

This is what I mean by a workaround. A workaround you wouldn't bother
with if we were only testing for 'M' as the signature.

Ask why we have char constants like 'A' at all. Why not only have string
constants?

fir

Apr 30, 2020, 1:32:03 PM

Besides, note that if the value of 'smth' were defined by the local endianness, it would mean that

int x = 'smth'
would have a different value than
int x = 'smth'
on a machine with the other endianness
(I mean at the source level, not at the encoding level: if you write
{'mz', 1000, 1000, "alakalabaa"}
that source is the same on both computers).
This is mighty bad.

I have no doubt that, if it is decoded in this base-256 way, the 'left is more significant' order is the proper one; I have some doubts whether it should be encoded this way at all (but since gcc already does it like this, it seems OK).

(I much prefer my idea of ad hoc enums, which I invented, at least for myself, about 5 years ago or maybe more (one of my best inventions); it uses the same syntax, 'slow', 'red', 'fast', etc., but doesn't consume so much RAM. But that's a different story.)

fir

Apr 30, 2020, 1:39:51 PM

You shouldn't call strcmp, but rather read it as a short and compare it to a short; you read a string and compare it to a short.

It's as if you wanted to read "\x00\x00\x00\x01" into an int and compare it to 1. (This is a kind of argument for big endian, but a counterargument to what you say. Personally I tend to find big endian more proper than little endian, but I vaguely remember my old considerations, which maybe showed some reasons for finding little endian better, and in turn some human-comfort reasons for finding big endian better... in sum, I would risk the opinion that big endian is better.)

Richard Damon

Apr 30, 2020, 1:48:14 PM

On 4/30/20 11:49 AM, David Brown wrote:

> I think a better solution would have been to make it an error, so that
> compilers could give a message saying: You wrote 'abcd', but probably
> meant "abcd". I can't think of any good reason for allowing
> multicharacter literals in the first place - just as I can't think of
> any good reason for 'a' being an "int" and not a "char". (Compatibility
> with B is not a good reason IMHO.)

'a' was an int because ALL types smaller than int got promoted to int,
and you couldn't take the address of a character literal (&'a') because
early C didn't have const, and being able to change the value of a
literal made no sense.

One big reason 'ab' wasn't just made an error in the standard is that by
the time the standard was being written there were implementations that
allowed it and useful applications using it. Knowing how your
implementation built them, you could do a switch using the
multi-character constants as cases and let the compiler build a simple
command decoder (maybe for just 2 character commands, but Unix loved
that sort of thing).

Bart

Apr 30, 2020, 2:17:13 PM

What's the 'short' version of 'MZ'?

See, this is exactly what such constants are for!

fir

Apr 30, 2020, 2:17:21 PM

On Thursday, 30 April 2020 19:39:51 UTC+2, fir wrote:

>... in sum, I would risk the opinion that big endian is better.)

If so, then the fellow who introduced it (I mean little endian), and then the people who propagated it, did quite a bad job, creating loads of unnecessary work for millions of people (work that could have been spent on something more valuable than overcoming artificial obstacles and incompatibilities). But I don't know what more to say on this.

Bart

Apr 30, 2020, 2:20:43 PM

(1) That's up to the implementation. It could easily choose to make the
ordering independent of the machine's byte-order, just as happens with
"ABCD".

(2) What are the realistic chances of any code running on a big-endian
machine? But if (1) is done properly, then it doesn't matter (100 other
things are more likely to break first).

fir

Apr 30, 2020, 2:24:21 PM

On Thursday, 30 April 2020 20:17:13 UTC+2, Bart wrote:

> On 30/04/2020 18:39, fir wrote:
> >
> > You shouldn't call strcmp, but rather read it as a short and compare it to a short; you read a string and compare it to a short
>
> What's the 'short' version of 'MZ'?
>

19 802 obviously

> See, this is exactly what such constants are for!

?

fir

Apr 30, 2020, 2:29:10 PM

It matters, in my opinion: you could decode and encode
those 'MZ's, but the relation of x to some absolute numerical value is broken.

fir

Apr 30, 2020, 2:38:39 PM

On Thursday, 30 April 2020 13:43:54 UTC+2, fir wrote:
> By "char literal" I mean things like 'a' or '2'
> (just as by "string literal" I mean "aaa" or "2").
>
> The question is: do those longer char literals like 'aaaaaaaa' have some defined meaning
> in C? (I'm mostly thinking of older C, like C89.)

PS: does anyone know how to make 64-bit versions of it work with 32-bit MinGW/gcc?

printf("\n %d %x", 'fir', 'fir');
This int version works, but what about

long long int z = 'firfir';

How do I display it?

fir

Apr 30, 2020, 2:57:44 PM

This seems not to work; maybe it's some kind of bug, as the code

long long int z = 'firfir';

if(z<2*1000*1000*1000)
printf("\n %d something wrong ", sizeof(z));

prints

8 something wrong

'firfir' definitely shouldn't be less than 2G here, AFAIU.

Bart

Apr 30, 2020, 2:58:58 PM

On 30/04/2020 19:24, fir wrote:
> On Thursday, 30 April 2020 20:17:13 UTC+2, Bart wrote:
>> On 30/04/2020 18:39, fir wrote:
>>>
>>> You shouldn't call strcmp, but rather read it as a short and compare it to a short; you read a string and compare it to a short
>>
>> What's the 'short' version of 'MZ'?
>>
>
> 19 802 obviously

Seriously? You want code to look like this:

if (sig == 19802)

?

Even this is bad enough:

if (sig == 'M'+'Z'*256)

or:

#define PACK2(c,d) ((c)+(d)*256)

if (sig == PACK2('M','Z'))

As I said, there are a million workarounds, but the most obvious,
clearest way is to write 'MZ'. And, if the language specified it, the
least error prone.

BTW how did you work out 19802? How do you know you didn't make a mistake?

Because I make 'MZ', AS IT APPEARS AT THE START OF AN EXECUTABLE, the
value 23117.

You get 19802 when 'Z' is first followed by 'M'.

Bonita Montero

Apr 30, 2020, 3:01:25 PM

> if(z<2*1000*1000*1000)
> printf("\n %d something wrong ", sizeof(z));

To printf a sizeof you have to use "%zu", although what you did might
accidentally work because the lower four bytes of the size_t might be
placed where the int is expected.

Bart

Apr 30, 2020, 3:07:25 PM

My C compiler manages it:

printf("%016llx\n",'ABCDEFGH');
printf("%016llx\n",'firfir');

produces:

4847464544434241
0000726966726966

But standard C limits a character literal to 'int', or usually 32 bits.
(My literals become long long int over 4 characters.)

You will have to use a workaround.

fir

Apr 30, 2020, 3:09:02 PM

On Thursday, 30 April 2020 20:58:58 UTC+2, Bart wrote:
> On 30/04/2020 19:24, fir wrote:
> > On Thursday, 30 April 2020 20:17:13 UTC+2, Bart wrote:
> >> On 30/04/2020 18:39, fir wrote:
> >>>
> >>> You shouldn't call strcmp, but rather read it as a short and compare it to a short; you read a string and compare it to a short
> >>
> >> What's the 'short' version of 'MZ'?
> >>
> >
> > 19 802 obviously
>
> Seriously? You want code to look like this:
>
> if (sig == 19802)
>
> ?

No, I wanted
int sig = readShort(f);
if(sig=='MZ') //...

This is proper, and that sig has the same value on both computers.

It wouldn't be in the case of an EXE file, because this signature is not a short (if it were a short, it would be 'MZ' on one machine and 'ZM' on the other) but a 2-byte string; and if it's a stream, you shouldn't fread it into a short and expect it to work.

fir

Apr 30, 2020, 3:10:40 PM

On Thursday, 30 April 2020 20:58:58 UTC+2, Bart wrote:
>

> BTW how did you work out 19802? How do you know you didn't make a mistake?
>

printf("\n %d",'MZ');

Bart

Apr 30, 2020, 3:16:49 PM

On 30/04/2020 20:08, fir wrote:
> On Thursday, 30 April 2020 20:58:58 UTC+2, Bart wrote:
>> On 30/04/2020 19:24, fir wrote:
>>> On Thursday, 30 April 2020 20:17:13 UTC+2, Bart wrote:
>>>> On 30/04/2020 18:39, fir wrote:
>>>>>
>>>>> You shouldn't call strcmp, but rather read it as a short and compare it to a short; you read a string and compare it to a short
>>>>
>>>> What's the 'short' version of 'MZ'?
>>>>
>>>
>>> 19 802 obviously
>>
>> Seriously? You want code to look like this:
>>
>> if (sig == 19802)
>>
>> ?
>
> No, I wanted
> int sig = readShort(f);
> if(sig=='MZ') //...
>
> This is proper, and that sig has the same value on both computers.
>
> It wouldn't be in the case of an EXE file, because this signature is not a short (if it were a short, it would be 'MZ' on one machine and 'ZM' on the other) but a 2-byte string; and if it's a stream, you shouldn't fread it into a short and expect it to work.

Well, /I/ expect it to work. And with my stuff (my languages, my C
compiler) and on all the targets I expect to use, it will work.

fir

Apr 30, 2020, 3:29:53 PM

If you expect it, it means you expect int to be big-endian (and the assumption was that we are talking about the real world; in the real world you can't expect that reading a string into an int will work without proper conceptualisation).

(I think I've said enough on that point. Not that I think talking about this is nonsense, but repeating it doesn't make much sense, and I don't think I have anything interesting to add here.)
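To illustrate the byte-order point: copying the two file bytes straight into a 16-bit integer, which is what fread(&s, 2, 1, f) does, gives a host-dependent value. The sketch below uses hypothetical helper names. It uses memcpy to do the same copy, and it also reports the host's byte order. On a little-endian x86 machine the bytes 'M' 'Z' become 0x5A4D. Under gcc's mapping, that matches 'ZM', not 'MZ'.

```c
#include <stdint.h>
#include <string.h>

/* Copies two bytes straight into a uint16_t, as fread() into a short would.
   The result depends on the host byte order: for the bytes 0x4D 0x5A it is
   0x5A4D on a little-endian host and 0x4D5A on a big-endian one. */
static uint16_t raw_load16(const unsigned char bytes[2])
{
    uint16_t v;
    memcpy(&v, bytes, sizeof v);
    return v;
}

/* Returns 1 if the host stores the low-order byte first. */
static int host_is_little_endian(void)
{
    const uint16_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);    /* look at the byte stored first in memory */
    return first == 1;
}
```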

fir

Apr 30, 2020, 3:41:47 PM

My gcc cuts it:
0000000066726966
It's very bad news, as I could have used it.

Still, I have to decide whether I should use these literals, limited to 4 characters... I could use them, but the consequence is that I would need to explain to numerous people why I use them.

(Still, even so limited, they're better than crappy normal enums, IMO.)

In fact, though, it's weird that I haven't found a need to use an enum for years; I don't know why.

Vir Campestris

Apr 30, 2020, 4:33:02 PM

On 30/04/2020 13:17, fir wrote:
> implementation defined?
> is for example '%$' value implementation defined?
> % is 0x25, $ is 0x24
>
> shouldnt it be 0x2524 anywhere?

One word.

EBCDIC.

You probably haven't met it, but a lot of mainframes still use it.

You write 'AB' and might get 0x4142 or 0x4241 depending on the byte
order chosen. On a mainframe you might well get 0xc1c2.

Full portability is _hard_!

Andy
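Andy's point also affects code that compares against file signatures. A character constant such as 'M' takes its value from the execution character set. On an EBCDIC system (code page 037, for instance), 'M' is 0xD4 and 'Z' is 0xE9. The bytes of an MZ header on disk are still 0x4D 0x5A. The sketch below uses hypothetical names. It spells the signature out as byte values, so the check does not depend on the character set.

```c
#include <string.h>

/* The DOS "MZ" signature is two bytes on disk, 0x4D 0x5A. Writing them in
   hex keeps the comparison correct even where the execution character set
   is not ASCII (on EBCDIC, 'M' is 0xD4 and 'Z' is 0xE9). */
static const unsigned char mz_magic[2] = { 0x4D, 0x5A };

/* Returns nonzero if buf starts with the MZ signature. */
static int has_mz_signature(const unsigned char *buf, size_t len)
{
    return len >= sizeof mz_magic
        && memcmp(buf, mz_magic, sizeof mz_magic) == 0;
}
```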

fir

Apr 30, 2020, 4:44:02 PM

On Thursday, 30 April 2020 22:33:02 UTC+2, Vir Campestris wrote:
> On 30/04/2020 13:17, fir wrote:
> > implementation defined?
> > is for example '%$' value implementation defined?
> > % is 0x25, $ is 0x24
> >
> > shouldnt it be 0x2524 anywhere?
>
> One word.
>
> EBCDIC.
>
> You probably haven't met it, but a lot of mainframes still use it.
>
> You write 'AB' and might get 0x4142 or 0x4241 depending on the byte

As I said, IMO only 0x4142 is right (0x4241 is wrong, as 'AB' is not a string but a value).

> order chosen. On a mainframe you might well get 0xc1c2.
>
> Full portability is _hard_!
>
> Andy

If so, what I say should only apply where ASCII applies (on EBCDIC it could use the EBCDIC codes, but still with the most significant character on the left).

Lew Pitcher

Apr 30, 2020, 4:54:59 PM

A spot check of the various EBCDIC character sets (yes, there's more than
one EBCDIC character set) shows that '%$' would usually translate to 0x6c5b.

> Full portability is _hard_!

Indeed, it is.

--
Lew Pitcher
"In Skills, We Trust"

Keith Thompson

Apr 30, 2020, 5:00:16 PM

Bart <b...@freeuk.com> writes:
[...]
> Take this:
>
> #include <stdio.h>
> int main(void) {
> printf("%08X\n",'ABCD');
> }
>
> Some compilers will print 41424344, and others will show 44434241,
> even on the same little-endian machine.
>
> So it is implementation-defined, meaning you can't rely on it being a
> particular way around unless you specify a certain compiler always to
> be used.

The value is implementation-defined. That's *all* the standard
says about it. There's no mention of endianness or representation.
There is no requirement for the mapping to make sense.

I vaguely recall seeing an implementation where 'AB' has the same
value as 'A', i.e., any characters after the first are ignored.
That's conforming as long as the implementation documents it.

> C with 32-bit ints also limit the length to 4 characters.

No it doesn't. gcc does print a warning for 'ABCDE':

warning: character constant too long for its type

but that's not a required diagnostic, just something that gcc chooses to
warn about. It prints a different warning for *any* multi-character
constant:

warning: multi-character character constant [-Wmultichar]

That's not a required diagnostic either. Both 'ABCD' and 'ABCDE'
are conforming, with implementation-defined (and not necessarily
distinct) values of type int. A compiler that rejects 'ABCDE'
because it won't fit in an int would be non-conforming. (gcc gives
'ABCDE' the same value as 'BCDE'.)

Multi-character constants are not useful in portable code, which
is why some compilers warn about them. They *can* be useful
in non-portable code that depends on a particular compiler.
I've seen them used as an easy way to define distinct values,
assuming that 'THIS' and 'THAT' are distinct. And if the mapping
is straightforward, they can be legible in an annotated hex dump.

In hindsight, it might have been useful to have a similar syntax for
*portable* integer constants composed from byte values -- as long as
that syntax is distinct from character constants to avoid confusion.
(I don't have any good ideas for what that might look like.)
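Without new syntax, one way to get such portable constants today is a macro that does the shifting explicitly. The sketch below is only an illustration, with the following assumptions:
- `TAG4` and `speed_name` are hypothetical names.
- The byte order (first character most significant) is a choice made by the macro.
- The character codes still come from the execution character set.

Because the macro expands to an integer constant expression, it can be used in case labels. That mirrors the 'fast'/'slow' example from earlier in the thread.

```c
#include <stdint.h>

/* Hypothetical macro: a 32-bit value from four characters, first one in the
   most significant byte. The arithmetic is fixed by the code, not by the
   compiler, and the result is an integer constant expression. */
#define TAG4(a, b, c, d)                        \
    (((uint32_t)(unsigned char)(a) << 24) |     \
     ((uint32_t)(unsigned char)(b) << 16) |     \
     ((uint32_t)(unsigned char)(c) << 8)  |     \
      (uint32_t)(unsigned char)(d))

/* Ad hoc "enum" dispatch on four-character tags. */
static const char *speed_name(uint32_t how)
{
    switch (how) {
    case TAG4('f', 'a', 's', 't'): return "fast";
    case TAG4('s', 'l', 'o', 'w'): return "slow";
    default:                       return "unknown";
    }
}
```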

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

Spiros Bousbouras

Apr 30, 2020, 5:16:11 PM

On Thu, 30 Apr 2020 14:00:06 -0700
Keith Thompson <Keith.S.T...@gmail.com> wrote:
> That's not a required diagnostic either. Both 'ABCD' and 'ABCDE'
> are conforming, with implementation-defined (and not necessarily
> distinct) values of type int. A compiler that rejects 'ABCDE'
> because it won't fit in an int would be non-conforming. (gcc gives
> 'ABCDE' the same value as 'BCDE'.)

Why would it be non-conforming ?

Bart

Apr 30, 2020, 5:26:00 PM

On 30/04/2020 22:00, Keith Thompson wrote:

> Bart <b...@freeuk.com> writes:

>> C with 32-bit ints also limit the length to 4 characters.
>
> No it doesn't. gcc does print a warning for 'ABCDE':
>
> warning: character constant too long for its type
>
> but that's not a required diagnostic, just something that gcc chooses to
> warn about. It prints a different warning for *any* multi-character
> constant:
>
> warning: multi-character character constant [-Wmultichar]
>
> That's not a required diagnostic either. Both 'ABCD' and 'ABCDE'
> are conforming, with implementation-defined (and not necessarily
> distinct) values of type int. A compiler that rejects 'ABCDE'
> because it won't fit in an int would be non-conforming. (gcc gives
> 'ABCDE' the same value as 'BCDE'.)

So it limits the length to 4 characters as I said. That it ends up with
'BCDE' rather than 'ABCD' is because the 'E' is the least significant
byte of the int, as gcc does it, and the 'A' would have occupied bit 33
upwards.
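Bart's description matches what the GCC preprocessor manual documents for multi-character constants. The constant is evaluated one character at a time: the previous value is shifted left by the width of a character, and the new character is OR-ed in. The final bit pattern is given type int. Below is a sketch of that rule for 8-bit chars and a 32-bit int. The function name is hypothetical, and it returns uint32_t rather than int to avoid signed overflow. It shows why the leading characters fall off the top.

```c
#include <stdint.h>

/* Sketch of gcc's documented rule, assuming 8-bit chars and a 32-bit int:
   each new character enters at the low end, so with more than four
   characters the earliest ones are shifted out of the top. */
static uint32_t gcc_style_multichar(const char *s)
{
    uint32_t v = 0;
    for (; *s != '\0'; s++)
        v = (v << 8) | (unsigned char)*s;   /* high bits are discarded */
    return v;
}
```

With this rule 'ABCDEF' comes out as 0x43444546 and 'fir' as 6711666, the values reported in the thread. Other compilers may use a different mapping.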

> Multi-character constants are not useful in portable code, which
> is why some compilers warn about them.

I use them in my language; where it ends up targeting C, then it just
writes out a regular integer constant, either 32 or 64 bits, with just
the arrangement I expect.

C could have made their use a little more practical.

For example, when was L'...' introduced? That could have been specified
with a particular ordering, and, for example, S'...' could have been
used for regular char literals, but now with a language-defined
arrangement (and with an overflow to LL just like a normal constant).

> In hindsight, it might have been useful to have a similar syntax for
> *portable* integer constants composed from byte values -- as long as
> that syntax is distinct from character constants to avoid confusion.
> (I don't have any good ideas for what that might look like.)

See above.

Keith Thompson

Apr 30, 2020, 6:22:34 PM

Because 'ABCDE' is a valid character constant with an
implementation-defined value. For that matter, so is 'A'
(though the standard has more to say about its value).

Keith Thompson

Apr 30, 2020, 6:30:52 PM

Bart <b...@freeuk.com> writes:
> On 30/04/2020 22:00, Keith Thompson wrote:
>> Bart <b...@freeuk.com> writes:
>>> C with 32-bit ints also limit the length to 4 characters.
>>
>> No it doesn't. gcc does print a warning for 'ABCDE':
>>
>> warning: character constant too long for its type
>>
>> but that's not a required diagnostic, just something that gcc chooses to
>> warn about. It prints a different warning for *any* multi-character
>> constant:
>>
>> warning: multi-character character constant [-Wmultichar]
>>
>> That's not a required diagnostic either. Both 'ABCD' and 'ABCDE'
>> are conforming, with implementation-defined (and not necessarily
>> distinct) values of type int. A compiler that rejects 'ABCDE'
>> because it won't fit in an int would be non-conforming. (gcc gives
>> 'ABCDE' the same value as 'BCDE'.)
>
> So it limits the length to 4 characters as I said. That it ends up
> with 'BCDE' rather than 'ABCD' is because the 'E' is the least
> significant byte of the int, as gcc does it, and the 'A' would have
> occupied bit 33 upwards.

When you said it limits the length to 4 characters, I took that to mean
that 'ABCDE' would be invalid (and you did show a warning).

There is no specified limit on the number of characters in a
multi-character constant. (A compiler can limit logical source lines to
4095 characters, but is not required to impose any limit at all.)

[...]

> For example, when was L'...' introduced?

ANSI C, 1989 (or likely earlier in some implementations).

[...]

James Kuyper

Apr 30, 2020, 6:36:19 PM

"implementation-defined value
unspecified value where each implementation documents how the choice is
made" (3.19.1)

"unspecified value
valid value of the relevant type where this International Standard
imposes no requirements on which value is chosen in any instance"

If the implementation-defined value of 'ABCDE' won't fit in an int, then
it doesn't qualify as "a valid value of the relevant type", because the
relevant type is "int".

Bart

Apr 30, 2020, 7:01:51 PM

On 30/04/2020 23:30, Keith Thompson wrote:
> Bart <b...@freeuk.com> writes:
>> On 30/04/2020 22:00, Keith Thompson wrote:
>>> Bart <b...@freeuk.com> writes:
>>>> C with 32-bit ints also limit the length to 4 characters.
>>>
>>> No it doesn't. gcc does print a warning for 'ABCDE':
>>>
>>> warning: character constant too long for its type
>>>
>>> but that's not a required diagnostic, just something that gcc chooses to
>>> warn about. It prints a different warning for *any* multi-character
>>> constant:
>>>
>>> warning: multi-character character constant [-Wmultichar]
>>>
>>> That's not a required diagnostic either. Both 'ABCD' and 'ABCDE'
>>> are conforming, with implementation-defined (and not necessarily
>>> distinct) values of type int. A compiler that rejects 'ABCDE'
>>> because it won't fit in an int would be non-conforming. (gcc gives
>>> 'ABCDE' the same value as 'BCDE'.)
>>
>> So it limits the length to 4 characters as I said. That it ends up
>> with 'BCDE' rather than 'ABCD' is because the 'E' is the least
>> significant byte of the int, as gcc does it, and the 'A' would have
>> occupied bit 33 upwards.
>
> When you said it limits the length to 4 characters, I took that to mean
> that 'ABCDE' would be invalid (and you did show a warning).
>
> There is no specified limit on the number of characters in a
> multi-character constant.

I don't agree with that. If the extra characters can't be used and
shouldn't be there, then that's an error in my book. DMC agrees, and
further it explicitly states a maximum of 4 characters.

But then gcc takes the same approach to obviously over-long integer
constants, and we're back into this same situation of needing to run gcc
with specially refined collections of options just to get it to do the
obvious thing.

My compiler, for all its other problems, will at least report over-long
char and integer constants as proper errors. So a program with those
faults cannot be compiled unless they are fixed.

fir

Apr 30, 2020, 7:16:03 PM

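Bad anti-open, anti-no-limit attitude.
As far as I can see, this behaviour of allowing longer literals may give some fun usages,

for example:

#include <stdio.h>

int main(void)
{
    printf("ala");
    'this is some kind of comment';  /* expression statement; the value is discarded */

    /* with gcc, only the last 4 characters ("2020") survive in the value */
    return 'only the last 4 matters so we may write something for fun 2020';
}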

fir

Apr 30, 2020, 7:29:48 PM

Also, I think this could probably be used in tiny C contests, as I've seen integer constants with 5 or even more digits in some of those programs; for example,
x=6711666 could be turned into x='fir', sparing 2 characters.

Keith Thompson

Apr 30, 2020, 8:15:55 PM

Bart <b...@freeuk.com> writes:
> On 30/04/2020 23:30, Keith Thompson wrote:

[...]

>> There is no specified limit on the number of characters in a
>> multi-character constant.
>
> I don't agree with that. If the extra characters can't be used and
> shouldn't be there, then that's an error in my book. DMC agrees, and
> further it explicitly states a maximum of 4 characters.

What exactly do you disagree with?

The C standard does not impose a limit on the number of characters in a
multi-character constant. If you disagree with that, cite the wording
in the standard that imposes such a limit. If you merely think that
it *should* be an error, please make it clearer that you're stating your
personal opinion rather than discussing what the standard says.

DMC is the Digital Mars C compiler, yes? What exactly does it do
with a character constant such as 'ABCDE'? If DMC does not attempt
to be fully conforming by default, feel free to say what it does
by default and what it does in conforming mode.

If it rejects
int n = 'ABCDE';
in a mode in which it claims to conform to the standard, that would be a
bug. (A non-fatal warning would not violate the standard.)

> But then gcc takes the same approach to obviously over-long integer
> constants, and we're back into this same situation of needing to run
> gcc with specially refined collections of options just to get it to do
> the obvious thing.

gcc is not fully conforming by default, and I'm not interested
in getting into that discussion again. The C standard specifies
the type and value of any integer constant, and says that
an integer constant whose value is outside the range of its type
violates a constraint. I'm not aware of anything gcc does in
this area that is non-conforming (or anything that I object to).
What approach are you talking about?

> My compiler, for all its other problems, will at least report
> over-long char and integer constants as proper errors. So a program
> with those faults cannot be compiled unless they are fixed.

Are you referring to your C compiler?

You are not required to make your compiler conform to the C standard,
and you apparently have chosen not to do so.

Bart

Apr 30, 2020, 8:48:09 PM

On 01/05/2020 01:15, Keith Thompson wrote:
> Bart <b...@freeuk.com> writes:
>> On 30/04/2020 23:30, Keith Thompson wrote:
> [...]
>>> There is no specified limit on the number of characters in a
>>> multi-character constant.
>>
>> I don't agree with that. If the extra characters can't be used and
>> shouldn't be there, then that's an error in my book. DMC agrees, and
>> further it explicitly states a maximum of 4 characters.
>
> What exactly do you disagree with?
>
> The C standard does not impose a limit on the number of characters in a
> multi-character constant. If you disagree with that, cite the wording
> in the standard that imposes such a limit.

'6.4.4.4p10 An integer character constant has type int.'

That suggests that you cannot practically have more than sizeof(int)
characters in a character constant. Practically in being able to store
and remember the values.

You're probably thinking of being allowed to write any number of
characters between '...' quotes, subject to receiving or inhibiting a
warning, but you can only store and remember 4 or those, either the
first 4 or the last 4.

My compiler and at least DMC don't allow that, in my case as I can't see
the point. (I do allow 8, as then the type is 'long long int', but that
makes code non-portable outside my compiler, so it's not useful, other
than to show off.)

James Kuyper

Apr 30, 2020, 9:38:33 PM

On Thursday, April 30, 2020 at 8:48:09 PM UTC-4, Bart wrote:
> On 01/05/2020 01:15, Keith Thompson wrote:
> > Bart <b...@freeuk.com> writes:
> >> On 30/04/2020 23:30, Keith Thompson wrote:
> > [...]
> >>> There is no specified limit on the number of characters in a
> >>> multi-character constant.
> >>
> >> I don't agree with that. If the extra characters can't be used and
> >> shouldn't be there, then that's an error in my book. DMC agrees, and
> >> further it explicitly states a maximum of 4 characters.
> >
> > What exactly do you disagree with?
> >
> > The C standard does not impose a limit on the number of characters in a
> > multi-character constant. If you disagree with that, cite the wording
> > in the standard that imposes such a limit.
>
> '6.4.4.4p10 An integer character constant has type int.'
>
> That suggests that you cannot practically have more than sizeof(int)
> characters in a character constant. Practically in being able to store
> and remember the values.

A fully conforming implementation of C is not permitted to reject any
sequence of characters in a multi-character literal by reason of that
sequence being too long. It is required to map each such sequence to a
valid int value, which must be storable. What this means is that the
number of different values such constants can have is MUCH smaller than
the number of different permitted sequences. It does NOT mean that the
sequences are in any way prohibited.

> You're probably thinking of being allowed to write any number of
> characters between '...' quotes, subject to receiving or inhibiting a
> warning, but you can only store and remember 4 of those, either the
> first 4 or the last 4.

Or the middle four, or the 1st, 4th, 9th, and 16th. Or it could take
the first bit of the first character, the second bit of the second
character etc, wrapping around to the first bit of the 9th character. Or
it could use any other mapping an implementation chooses between the
input character sequences and their values. A conforming implementation
could map every sequence that corresponds to the name of a number into
that number: 'twentytwo' could have the value 22.

Note: I don't approve of this - I'd prefer that the mapping be standard-
defined. But you're not fully appreciating how open-ended the C
standard's specification for such constants is, if you think that "the
first 4 or the last 4" are the only possibilities.

Keith Thompson

Apr 30, 2020, 9:55:52 PM

Bart <b...@freeuk.com> writes:
> On 01/05/2020 01:15, Keith Thompson wrote:
>> Bart <b...@freeuk.com> writes:
>>> On 30/04/2020 23:30, Keith Thompson wrote:
>> [...]
>>>> There is no specified limit on the number of characters in a
>>>> multi-character constant.
>>>
>>> I don't agree with that. If the extra characters can't be used and
>>> shouldn't be there, then that's an error in my book. DMC agrees, and
>>> further it explicitly states a maximum of 4 characters.
>>
>> What exactly do you disagree with?
>>
>> The C standard does not impose a limit on the number of characters in a
>> multi-character constant. If you disagree with that, cite the wording
>> in the standard that imposes such a limit.
>
> '6.4.4.4p10 An integer character constant has type int.'
>
> That suggests that you cannot practically have more than sizeof(int)
> characters in a character constant. Practically in being able to store
> and remember the values.

If the C standard were to impose a limit, it wouldn't and shouldn't
do so by some vague suggestion. For there to be an actual limit
on the number of characters in a multi-character constant, there
would have to be either a syntax rule or a constraint that 'ABCDE'
would violate. There isn't.

Imagine an implementation in which the value of '11001001' is 201 (i.e.,
it's binary). That might be silly, but it would be conforming. So
would an implementation where 'three' == 3. It's well within what
"implementation-defined" means, as long as it's documented.

The reason the standard makes the value implementation-defined is
that, when the standard was being written, there were implementations
that did it differently. Requiring the value to be sensible,
say by restricting it to a choice from what actual compilers did,
would have been substantially more effort with no real benefit.
Making the value simply implementation-defined means that (a) you
can use multi-character constants in non-portable code, and (b)
you can't meaningfully use them in portable code. Banning 'ABCDE'
when int is 32 bits would not have changed that.

From a programmer's point of view, the solution is simple: Don't use
multi-character constants. From an implementer's point of view, the
solution is equally simple: Implement the requirements from the
standard, and document how the value is determined.

> You're probably thinking of being allowed to write any number of
> characters between '...' quotes, subject to receiving or inhibiting a
> warning, but you can only store and remember 4 of those, either the
> first 4 or the last 4.

Sure, that's one possibility. But what I was actually thinking of is
simply that the value of a multi-character constant is
implementation-defined. The standard doesn't even suggest that the
value should be computed from the values of the characters. I
understand that you *want* the value to be computed in some sensible way
from the values of the individual characters, but the standard doesn't
require or suggest it.

Just in case you misunderstood, I am *not* suggesting that a character
constant like 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' is useful, merely that the
language standard does not permit a conforming compiler to reject it.
I don't want my compiler to reject code because it thinks it's icky --
unless I've asked it to.

> My compiler and at least DMC don't allow that, in my case as I can't
> see the point. (I do allow 8, as then the type is 'long long int', but
> that makes code non-portable outside my compiler, so it's not useful,
> other than to show off.)

Again, you're under no obligation to make your C compiler conforming.

You snipped my questions about exactly what DMC does. I'm still curious
about that.

Siri Cruise

Apr 30, 2020, 11:13:45 PM

In article <87tv104...@nosuchdomain.example.com>,

Keith Thompson <Keith.S.T...@gmail.com> wrote:

> > I don't agree with that. If the extra characters can't be used and
> > shouldn't be there, then that's an error in my book. DMC agrees, and
> > further it explicitly states a maximum of 4 characters.
>
> What exactly do you disagree with?

4 character literals were important part of the original Mac
system, so they made sure their compiler compiled what they
wanted. Thereafter any C compiler that wanted to be used on a Mac
had to conform to Apple's interpretation.

In a previous life I worked on CDC NOS, and it also used two- or
three-character literals; that meant that when looking at octal dumps
which included the characters, you had the letters right there
rather than an integer you had to look up. The Apple interpretation
makes practical sense, even if you younglings never have to look
at a hex dump.

--
:-<> Siri Seal of Disavowal #000-001. Disavowed. Denied. Deleted. @
'I desire mercy, not sacrifice.' /|\
The first law of discordiamism: The more energy This post / \
to make order is nore energy made into entropy. insults Islam. Mohammed
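The following is a sketch of the serialization side of Siri's point, assuming an ASCII character set. If a four-character tag is stored with its most significant byte first, as on the big-endian 68k Macs, the letters appear in order in a dump. The helper name is hypothetical. On a little-endian host, writing the same uint32_t with memcpy or fwrite would show TXET instead of TEXT.

```c
#include <stdint.h>

/* Store v most significant byte first (big-endian). A tag built with its
   first character in the high byte then reads as letters, in order, in a
   hex/character dump on any host. */
static void store_be32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}
```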

Bart

May 1, 2020, 6:35:32 AM

That's an interesting interpretation.

Does the standard say that two instances of 'ABC' in the source code
must have the same value, whatever that value happens to be?

In other words, must 'ABC'=='ABC' always be true?

If not, then a compiler could just call rand() for the value of 'ABC',
since nothing in the standard says that that would be wrong.

But then, I'm sure this is not what the standard had in mind when it
says it's 'implementation defined'.

All the C compilers I've tried, on my machine and online, will take
either the first 4 or last 4 characters of 'ABCDE', and put the four
character codes into the 4 bytes of the int result, in either order.

ALL of them are discarding excess characters. Most of them do not regard
that as an error. Some of them might allow that behaviour to be changed.

> From a programmer's point of view, the solution is simple: Don't use
> multi-character constants.

But that is not useful. If nobody should be using them, then what's the
point of the language allowing them at all? They should have been
deprecated long ago.

Not many programs use them BECAUSE the results are so poorly defined.

> Just in case you misunderstood, I am *not* suggesting that a character
> constant like 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' is useful, merely that the
> language standard does not permit a conforming compiler to reject it.
> I don't want my compiler to reject code because it thinks it's icky --
> unless I've asked it to.

You've got that backwards. You WANT it to reject the code unless you
asked it to accept it.

>> My compiler and at least DMC don't allow that, in my case as I can't
>> see the point. (I do allow 8, as then the type is 'long long int', but
>> that makes code non-portable outside my compiler, so it's not useful,
>> other than to show off.)
>
> Again, you're under no obligation to make your C compiler conforming.
>
> You snipped my questions about exactly what DMC does. I'm still curious
> about that.

I said it rejects 'ABCDE' because there are more than 4 characters.

I've no idea what mode it's in or how to change it. The following are
the set of its options; any ideas as to which I might try? -A didn't
make any difference.

--------------------------------------
Digital Mars Compiler Version 8.42n
Copyright (C) Digital Mars 2000-2004. All Rights Reserved.
Written by Walter Bright www.digitalmars.com/ctg/sc.html
DMC is a one-step program to compile and link C++, C and ASM files.
Usage ([] means optional, ... means zero or more):

DMC file... [flags...] [@respfile]

file... .CPP, .C or .ASM source, .OBJ object or .LIB library file name
@respfile... pick up arguments from response file or environment variable
flags... one of the following:
-a[1|2|4|8] alignment of struct members -A strict ANSI C/C++
-Aa enable new[] and delete[] -Ab enable bool
-Ae enable exception handling -Ar enable RTTI
-B[e|f|g|j] message language: English, French, German, Japanese
-c skip the link, do compile only -cpp source files are C++
-cod generate .cod (assembly) file -C no inline function expansion
-d generate .dep (make dependency) file
-D #define DEBUG 1 -Dmacro[=text] define macro
-e show results of preprocessor -EC do not elide comments
-EL #line directives not output -f IEEE 754 inline 8087 code
-fd work around FDIV problem -ff fast inline 8087 code
-g generate debug info
-gf disable debug info optimization -gg make static functions global
-gh symbol info for globals -gl debug line numbers only
-gp generate pointer validations -gs debug symbol info only
-gt generate trace prolog/epilog -GTnnnn set data threshold to nnnn
-H use precompiled headers (ph) -HDdirectory use ph from directory
-HF[filename] generate ph to filename -HHfilename read ph from filename
-HIfilename #include "filename" -HO include files only once
-HS only search -I directories -HX automatic precompiled headers

-Ipath #include file search path -j[0|1|2] Asian language characters
0: Japanese 1: Taiwanese and Chinese 2: Korean
-Jm relaxed type checking -Ju char==unsigned char
-Jb no empty base class optimization -J chars are unsigned
-l[listfile] generate list file -L using non-Digital Mars linker
-Llink specify linker to use -L/switch pass /switch to linker
-Masm specify assembler to use -M/switch pass /switch to assembler
-m[tsmclvfnrpxz][do][w][u] set memory model
s: small code and data m: large code, small data
c: small code, large data l: large code and data
v: VCM r: Rational 16 bit DOS Extender
p: Pharlap 32 bit DOS Extender x: DOSX 32 bit DOS Extender
z: ZPM 16 bit DOS Extender f: OS/2 2.0 32 bit
t: .COM file n: Windows 32s/95/98/NT/2000/ME/XP
d: DOS 16 bit o: OS/2 16 bit
w: SS != DS u: reload DS
-Nc function level linking -NL no default library
-Ns place expr strings in code seg -NS new code seg for each function
-NTname set code segment name -NV vtables in far data
-o[-+flag] run optimizer with flag -ooutput output filename
-p turn off autoprototyping -P default to pascal linkage
-Pz default to stdcall linkage -r strict prototyping
-R put switch tables in code seg -s stack overflow checking
-S always generate stack frame -u suppress predefined macros
-v[0|1|2] verbose compile -w suppress all warnings
-wc warn on C style casts
-wn suppress warning number n -wx treat warnings as errors
-W{0123ADabdefmrstuvwx-+} Windows prolog/epilog
-WA Windows EXE
-WD Windows DLL
-x turn off error maximum -XD instantiate templates
-XItemp<type> instantiate template class temp<type>
-XIfunc(type) instantiate template function func(type)
-[0|2|3|4|5|6] 8088/286/386/486/Pentium/P6 code

James Kuyper

May 1, 2020, 8:47:58 AM

No.

> In other words, must 'ABC'=='ABC' always be true?
>
> If not, then a compiler could just call rand() for the value of 'ABC',
> since nothing in the standard says that that would be wrong.
>
> But then, I'm sure this is not what the standard had in mind when it
> says it's 'implementation defined'.

The standard doesn't have a mind. The committee has multiple minds. The
committee issued a Rationale which fails to address why the committee
made this choice, so the best we can do is guess, and I don't see any
point in arguing about my guess vs. your guess. My personal guess is
that the committee felt that the variety of behaviors of existing
implementations was too broad to justify saying anything more specific
than "implementation defined".

> All the C compilers I've tried, on my machine and online, will take
> either the first 4 or last 4 characters of 'ABCDE', and put the four
> character codes into the 4 bytes of the int result, in either order.

Yes, I'm sure that those are the most common ways it is implemented. The
point is that those are not the only ways it is permitted to be
implemented.

> ALL of them are discarding excess characters. Most of them do not regard
> that as an error. Some of them might allow that behaviour to be changed.

A conforming implementation must accept 'ABCDE' as a valid integer
constant and assign it a valid int value; it is also permitted to issue
a warning message. Treating it as an error would render the
implementation non-conforming.

> > From a programmer's point of view, the solution is simple: Don't use
> > multi-character constants.
>
> But that is not useful. If nobody should be using them, then what's the
> point of the language allowing them at all? They should have been
> deprecated long ago.

I'd prefer a standard-defined meaning over deprecation. Being able to
use 'STOP' as an integer constant has advantages similar to those
provided by an enumeration constant named STOP, but with the added
advantage that you can also read 'STOP' in text memory dumps.

> Not many programs use them BECAUSE the results are so poorly defined.

I don't fail to use them because they're poorly defined. I don't use
them because they're implementation-defined. It doesn't matter how well
any particular implementation defines them; it doesn't matter even if
every implementation were to provide a good definition. The fact that it
can be a different definition on different implementations is a show-
stopper for me. Other people who are less concerned with portability
will have different preferences. I gather from other messages in this
thread that such constants were commonly used on Apple systems.

> > Just in case you misunderstood, I am *not* suggesting that a character
> > constant like 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' is useful, merely that the
> > language standard does not permit a conforming compiler to reject it.
> > I don't want my compiler to reject code because it thinks it's icky --
> > unless I've asked it to.
>
> You've got that backwards. You WANT it to reject the code unless you
> asked it to accept it.

How in the world can you claim to know that Keith is wrong when
describing what HE wants?

You should have said "What I want is for it to reject code unless I
asked it to accept it."

> >> My compiler and at least DMC don't allow that, in my case as I can't
> >> see the point. (I do allow 8, as then the type is 'long long int', but
> >> that makes code non-portable outside my compiler, so it's not useful,
> >> other than to show off.)
> >
> > Again, you're under no obligation to make your C compiler conforming.
> >
> > You snipped my questions about exactly what DMC does. I'm still curious
> > about that.
>
> I said it rejects 'ABCDE' because there are more than 4 characters.

That's not quite specific enough. When you say "it rejects 'ABCDE'", do
you mean merely that it generates a warning message? That would be fully
conforming behavior, and is not what I would call rejection, though I
know from past experience that you often think of it as rejection. Or do
you mean that it doesn't produce an object file and the compiler returns
an unsuccessful termination status to the operating system? That's what
I would mean if I used the term "rejects". If that's what DMC does, that
means that DMC fails to conform to the C standard.

fir

May 1, 2020, 9:15:58 AM

On Friday, May 1, 2020 at 14:47:58 UTC+2, James Kuyper wrote:
>
> I'd prefer a standard-defined meaning over deprecation. Being able to
> use 'STOP' as an integer constant has advantages similar to those
> provided by an enumeration constant named STOP, but with the added
> advantage that you can also read 'STOP' in text memory dumps.
>

The main advantage is not memory dumps but the fact that you can use it as an 'ad hoc' constant which you don't need to define,
which is a really tremendous advantage.
It is also very quick to compare, etc.
If defined uniformly, it would be the same across platforms in the whole universe.

It's a bit sad that it's not very storage-efficient,
as only about 26+26+10+some, i.e. roughly 64 to 96 of the 256 byte values, are normally printable (maybe more; I'm not sure how it would look including UTF-8), but that is probably bearable.

The biggest problem here is that 8-letter-long ones don't work.

Bart

May 1, 2020, 10:53:37 AM

On 01/05/2020 13:47, James Kuyper wrote:
> On Friday, May 1, 2020 at 6:35:32 AM UTC-4, Bart wrote:

>> Does the standard say that two instances of 'ABC' in the source code
>> must have the same value, whatever that value happens to be?
>
> No.

>> In other words, must 'ABC'=='ABC' always be true?

Presumably No to that too.

So, if someone uses multi-char constants, without knowing which exact
compiler their code will run on, those constants might as well be
written as random numbers. Even if you know the OS, platform and hardware.

That doesn't sound right. It's effectively a wild-card feature in the
language: someone writes /anything/ inside '...' quotes, and the
implementation can do what it likes (perhaps it could be shell commands
to be executed), except it must yield an integer value, e.g. 0:

'run: "format c:"'; // same as 0;
'this is a comment'; // same as 0;

These two also demonstrate that /different/ contents of the char literal
could have the same value:

if ('ABC' == 'ABC') // could be False (see above)
if ('ABC' == 'XYZ') // could be True

Crazy, but C allows it!

> Yes, I'm sure that those are the most common ways it is implemented. The
> point is that those are not the only ways it is permitted to be
> implemented.

Keith suggested some ways in which such a constant could plausibly have
more than 4 characters. But they don't apply in the examples I tried
where only 4 characters will be significant.

>> But that is not useful. If nobody should be using them, then what's the
>> point of the language allowing them at all? They should have been
>> deprecated long ago.
>
> I'd prefer a standard-defined meaning over deprecation. Being able to
> use 'STOP' as an integer constant has advantages similar to those
> provided by an enumeration constant named STOP, but with the added
> advantage that you can also read 'STOP' in text memory dumps.

I suggested that could have been done at the same time as L"..." was
introduced. You can obviously see the advantages better than David
Brown. (Although, as gcc does it, the memory dump will likely show
'POTS' not 'STOP'!)

>> You've got that backwards. You WANT it to reject the code unless you
>> asked it to accept it.
>
> How in the world can you claim to know that Keith is wrong when
> describing what HE wants?

I meant a generic 'you'. Isn't it better to work in a fail-safe manner
by default, unless told otherwise? The alternative is to be aware of the
million things which a compiler may let through but which could be a
serious error on the part of the coder.

>> I said it rejects 'ABCDE' because there are more than 4 characters.
>
> That's not quite specific enough. When you say "it rejects 'ABCDE'", do
> you mean merely that it generates a warning message? That would be fully
> conforming behavior, and is not what I would call rejection, though I
> know from past experience that you often think of it as rejection. Or do
> you mean that it doesn't produce an object file and the compiler returns
> an unsuccessful termination status to the operating system? That's what
> I would mean if I used the term "rejects". If that's what DMC does, that
> means that DMC fails to conform to the C standard.

It says lexical error (I thought I'd posted it, but maybe I then edited
it out):

Lexical error: max of 4 characters in string exceeded

and doesn't produce an object file. If that's non-conforming then I
don't care; it's good that Walter Bright (the author of DMC and inventor
of D) seems to share my opinion. (However, he appears to have dropped
multi-char literals altogether in D; not so good.)

Richard Damon

May 1, 2020, 12:39:06 PM

On 5/1/20 10:53 AM, Bart wrote:
> On 01/05/2020 13:47, James Kuyper wrote:
>> On Friday, May 1, 2020 at 6:35:32 AM UTC-4, Bart wrote:
>
>>> Does the standard say that two instances of 'ABC' in the source code
>>> must have the same value, whatever that value happens to be?
>>
>> No.
>
>
>>> In other words, must 'ABC'=='ABC' always be true?
>
> Presumably No to that too.
>
> So, if someone uses multi-char constants, without knowing which exact
> compiler their code will run on, those constants might as well be
> written as random numbers. Even if you know the OS, platform and hardware.

That is probably a somewhat reasonable statement, but it is actually true
for a LOT of code. There are many parts of the C language that have
implementation-defined behavior in parts of them, and the use of any of
those, without knowing the implementation behavior, puts you in a
similar situation.

>
> That doesn't sound right. It's effectively a wild-card feature in the
> language: someone writes /anything/ inside '...' quotes, and the
> implementation can do what it likes (perhaps it could be shell commands
> to be executed), except it must yield an integer value, e.g. 0:

The implementation must document the process fully.

>
>     'run: "format c:"';        // same as 0;
>     'this is a comment';       // same as 0;
>
> These two also demonstrate that /different/ contents of the char literal
> could have the same value:
>
>      if ('ABC' == 'ABC')    // could be False (see above)
>      if ('ABC' == 'XYZ')    // could be True

I am not sure if the intent of the clause is to allow the 'same' literal
to end up with different values, but the second case sort of needs to be
allowed, as there are more allowed literals (since their length isn't
constrained) than there are possible int values. I would say it is
unlikely that your particular one would happen (unless it just defined all
'too long' literals to be 0).

My memory was that it used to be possible for a conforming compiler to
reject multi-character literals, but I could be mistaken. It might make
sense to revise the standard to ALLOW a compiler, as part of its
implementation definition for multi-character literals, to define a
point where the literal is considered ill-formed and the program may then
be rejected. After all, having a constant in your code that you can't say
anything about doesn't seem very useful.

If the Standard wanted to make them more useful, perhaps it could require
something like this: for literals with no more than sizeof(int) characters,
each such literal must produce a unique value (I don't think it can
define that value, as we have multiple choices in the wild), and it could
maybe even define some function that takes a 'short' string and produces
the corresponding character literal value, something like

tochar("FOO") == 'FOO'

Keith Thompson

May 1, 2020, 12:51:50 PM

You have a copy of the standard, or at least a recent draft. The
section on character constants is not very long.

No, the standard doesn't say that. An implementation with 'ABC' !=
'ABC' would be perverse but conforming.

> In other words, must 'ABC'=='ABC' always be true?

I don't think so, but again, such an implementation would be perverse.
The standard does not, and cannot, require implementations to behave
sensibly. (I can imagine an implementation doing this deliberately to
discourage the use of multi-character constants. I do not suggest that
would be a good idea.)

> If not, then a compiler could just call rand() for the value of 'ABC',
> since nothing in the standard says that that would be wrong.

Sure. Of course it would have to call rand() at compile time.

> But then, I'm sure this is not what the standard had in mind when it
> says it's 'implementation defined'.

The standard defines what it means by "implementation-defined".

> All the C compilers I've tried, on my machine and online, will take
> either the first 4 or last 4 characters of 'ABCDE', and put the four
> character codes into the 4 bytes of the int result, in either order.
>
> ALL of them are discarding excess characters. Most of them do not
> regard that as an error. Some of them might allow that behaviour to be
> changed.

And those that regard it as an error, assuming that means rejecting the
translation unit, are non-conforming. Why is this so difficult for you
to understand?

>> From a programmer's point of view, the solution is simple: Don't use
>> multi-character constants.
>
> But that is not useful. If nobody should be using them, then what's
> the point of the language allowing them at all? They should have been
> deprecated long ago.

I wouldn't mind if they were deprecated, but it wouldn't make much
difference.

> Not many programs use them BECAUSE the results are so poorly defined.

Implementation defined, not poorly defined. But yes.

They exist in C because they've existed since the first implementations.
The 1974 and 1975 C reference manuals say:

Character constants with more than one character are inherently
machine-dependent and should be avoided.

The B reference manual from 1972 says:

A character constant is represented by ' followed by one or two
characters (possibly escaped) followed by another '. It has an
rvalue equal to the value of the characters packed and right
adjusted.

There's existing code that uses them. That code is not portable, but
it's conforming, and it *works* as long as it's compiled with a compiler
that assigns the expected implementation-defined values. Removing them
from the language would have broken that code.

>> Just in case you misunderstood, I am *not* suggesting that a character
>> constant like 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' is useful, merely that the
>> language standard does not permit a conforming compiler to reject it.
>> I don't want my compiler to reject code because it thinks it's icky --
>> unless I've asked it to.
>
> You've got that backwards. You WANT it to reject the code unless you
> asked it to accept it.

DO NOT presume to tell me what I want.

What I want is a compiler that conforms to the standard. That means not
rejecting valid code. If I want stricter diagnostics, I can ask for
them.

You're free to want whatever you like. If you want a non-conforming
compiler that rejects multi-character constants that are too long,
or even one that rejects *all* multi-character constants, that's
fine. If a conforming compiler provides an option that makes it
non-conforming, that's fine too. If it enables such an option by
default, I dislike it, but I'll accept it as long as there's a way
to disable it.

>>> My compiler and at least DMC don't allow that, in my case as I can't
>>> see the point. (I do allow 8, as then the type is 'long long int', but
>>> that makes code non-portable outside my compiler, so it's not useful,
>>> other than to show off.)
>>
>> Again, you're under no obligation to make your C compiler conforming.
>>
>> You snipped my questions about exactly what DMC does. I'm still curious
>> about that.
>
> I said it rejects 'ABCDE' because there are more than 4 characters.

I don't know what you mean by "rejects". But I have DMC installed
on my Windows laptop (which I wasn't using when I posted previously).
Here's what I got:

$ cat c.c
#include <stdio.h>
int main(void) {
printf("'ABCDE' = %d\n", 'ABCDE');
}
$ dmc c.c
printf("'ABCDE' = %d\n", 'ABCDE');
^
c.c(3) : Lexical error: max of 4 characters in string exceeded
--- errorlevel 1
$

So yes, it does reject a translation unit that contains a multicharacter
constant with more than 4 characters (with a misleading error message.)

I'm not saying this is a serious problem with DMC, but it is
non-conforming behavior.

[...]

Keith Thompson

May 1, 2020, 12:58:08 PM

Bart <b...@freeuk.com> writes:
[...]

> It says lexical error (I thought I'd posted it, but maybe I then
> edited it out):
>
> Lexical error: max of 4 characters in string exceeded
>
> and doesn't produce an object file. If that's non-conforming then I
> don't care; it's good that Walter Bright (the author of DMC and
> inventor of D) seems to share my opinion. (However, he appears to have
> dropped multi-char literals altogether in D; not so good.)

You don't care that it's non-conforming. That's fine. But if you
don't care, don't waste everyone's time arguing with me when I say
that it's non-conforming.

Bart

May 1, 2020, 3:50:58 PM

On 01/05/2020 17:57, Keith Thompson wrote:
> Bart <b...@freeuk.com> writes:
> [...]
>> It says lexical error (I thought I'd posted it, but maybe I then
>> edited it out):
>>
>> Lexical error: max of 4 characters in string exceeded
>>
>> and doesn't produce an object file. If that's non-conforming then I
>> don't care; it's good that Walter Bright (the author of DMC and
>> inventor of D) seems to share my opinion. (However, he appears to have
>> dropped multi-char literals altogether in D; not so good.)
>
> You don't care that it's non-conforming. That's fine. But if you
> don't care, don't waste everyone's time arguing with me when I say
> that it's non-conforming.

I don't think it's me doing the time-wasting.

I said this up-thread:

BC:"C with 32-bit ints also limit the length to 4 characters."

which you decided to dispute by pedantically saying that C allows more
than 4 characters, even though only 4 will be significant.

How will that information be of use to anyone?

C doesn't magically let them write "if (cmd == 'ABCDEF')" in the same
way that the same implementation allows "if(cmd == 'ABCD')", and most
implementations don't even properly point out that the former will never
work as intended.

To repeat: for all practical purposes, 32-bit C compilers that
meaningfully support 'ABC' constants at all, limit them to 'ABCD'.

Keith Thompson

May 1, 2020, 4:11:15 PM

Bart <b...@freeuk.com> writes:
[...]

> I said this up-thread:
>
> BC:"C with 32-bit ints also limit the length to 4 characters."
>
> which you decided to dispute by pedantically saying that C allows more
> than 4 characters, even though only 4 will be significant.
>
> How will that information be of use to anyone?

It is of use *to most people here* to understand what the C standard
actually says. I understand that you don't consider that to be of any
use to you. I don't understand why you insist on discussing something
that you claim not to care about, or why you seem to be unable to deal
with the fact that other people *do* care about it.

And I never said that 4 characters will be significant. It's very
likely that 4 characters will be significant (for an implementation with
32-bit int and 8-bit char), but the standard doesn't guarantee it.

> C doesn't magically let them write "if (cmd == 'ABCDEF')" in the same
> way that the same implementation allows "if(cmd == 'ABCD')", and most
> implementations don't even properly point out that the former will
> never work as intended.
>
> To repeat: for all practical purposes, 32-bit C compilers that
> meaningfully support 'ABC' constants at all, limit them to 'ABCD'.

It depends on what you mean by "limit", something that you never made
clear. Some compilers "limit" them by ignoring the extra characters,
perhaps with a warning. Some "limit" them by rejecting them (which is
non-conforming behavior).

My point is this: A conforming C compiler must accept 'ABCDEF' as a
valid character constant. Equivalently, a C compiler that rejects
'ABCDEF' (meaning that it fails to translate a translation unit
containing it) is non-conforming.

If you don't care about that fact, that's fine.
If you agree with it, that's great.
If you disagree with it, you're wrong.

And if you want to drop this, now would be a good time.

James Kuyper

May 1, 2020, 7:08:44 PM

On Friday, May 1, 2020 at 10:53:37 AM UTC-4, Bart wrote:
> On 01/05/2020 13:47, James Kuyper wrote:
> > On Friday, May 1, 2020 at 6:35:32 AM UTC-4, Bart wrote:
>
> >> Does the standard say that two instances of 'ABC' in the source code
> >> must have the same value, whatever that value happens to be?
> >
> > No.
>
>
> >> In other words, must 'ABC'=='ABC' always be true?
>
> Presumably No to that too.

"too"? That's the same question, as you yourself noted when you wrote
"in other words". I saw no need to repeat the answer.

> So, if someone uses multi-char constants, without knowing which exact
> compiler their code will run on, those constants might as well be
> written as random numbers. Even if you know the OS, platform and hardware.

Correct - what you need to know is the implementation, not the OS,
platform, or hardware (except insofar as the behavior defined by the
implementation depends upon those things).

...

> if ('ABC' == 'ABC') // could be False (see above)
> if ('ABC' == 'XYZ') // could be True
>
> Crazy, but C allows it!

It's stronger than that. For character constants longer than
sizeof(int), it's not merely allowed, it's mandatory. The number of
different possible character constants is larger than the number of
different int values - it's only possible to meet the standard's
requirements by giving the same value to multiple constants.

The C standard also allows an implementation of C to generate code for

int i = 1;
int j = i;

which requires four millennia to execute. It's far more useful to pay
attention to the requirements the C standard does impose than to those
it doesn't impose. For one thing, there are far fewer of them.

...

> > I'd prefer a standard-defined meaning over deprecation. Being able to
> > use 'STOP' as an integer constant has advantages similar to those
> > provided by an enumeration constant named STOP, but with the added
> > advantage that you can also read 'STOP' in text memory dumps.
>
> I suggested that could have been done at the same time as L"..." was
> introduced. You can obviously see the advantages better than David
> Brown. (Although, as gcc does it, the memory dump will likely show
> 'POTS' not 'STOP'!)

It's a minor advantage; I can understand David not considering it to be
sufficient justification for retaining the feature.

> >> You've got that backwards. You WANT it to reject the code unless you
> >> asked it to accept it.
> >
> > How in the world can you claim to know that Keith is wrong when
> > describing what HE wants?
>
> I meant a generic 'you'. Isn't it better to work in a fail-safe manner
> by default, unless told otherwise? The alternative is to be aware of the
> million things which a compiler may let through but which could be a
> serious error on the part of the coder.

When I use a compiler in a mode where it's supposed to conform to the
requirements of a given version of the C standard, that's precisely what
I want to see. I don't mind if it gives warnings about things that seem
dangerous, but if the only reason C code fails to be strictly conforming
is unspecified behavior (implementation-defined behavior is a subset of
unspecified behavior), then the C standard requires (see 4p3) that the
program act in accordance with 5.1.2.3. Rejecting such code does not
qualify as acting in accordance with 5.1.2.3, and is therefore very
definitely not what I want to see.

David Brown

May 2, 2020, 9:22:25 AM

On 01/05/2020 18:51, Keith Thompson wrote:
> Bart <b...@freeuk.com> writes:

>> Does the standard say that two instances of 'ABC' in the source code
>> must have the same value, whatever that value happens to be?
>
> You have a copy of the standard, or at least a recent draft. The
> section on character constants is not very long.
>
> No, the standard doesn't say that. An implementation with 'ABC' !=
> 'ABC' would be perverse but conforming.
>
>> In other words, must 'ABC'=='ABC' always be true?
>
> I don't think so, but again, such an implementation would be perverse.
> The standard does not, and cannot, require implementations to behave
> sensibly. (I can imagine an implementation doing this deliberately to
> discourage the use of multi-character constants. I do not suggest that
> would be a good idea.)
>
>> If not, then a compiler could just call rand() for the value of 'ABC',
>> since nothing in the standard says that that would be wrong.
>
> Sure. Of course it would have to call rand() at compile time.
>
>> But then, I'm sure this is not what the standard had in mind when it
>> says it's 'implementation defined'.
>
> The standard defines what it means by "implementation-defined".
>

I've always assumed that implementation-defined behaviour had to be
consistent - that any two uses of the same code should have the same
results. But after reading this thread, and then re-reading the
definition in the standard, I can see nothing that /requires/
implementation-specific behaviour to be consistent, as long as it is
properly documented.

For many of the implementation-defined behaviours in C, inconsistent
behaviour would make the compiler virtually useless (imagine a compiler
that had varying sizes for "int"!).

Are there any known cases of implementation-defined behaviour of any
sort that is inconsistent in real-world compilers? I can think of a few
that would be outside the control of the compiler (such as "The effect
of program termination in a freestanding environment")

Richard Damon

May 2, 2020, 12:14:01 PM

I've always assumed (but admit the wording is not totally compelling)
that the documentation had to be detailed enough that the programmer
could at least theoretically know what the compiler was going to do.
That means that it can't say things like 'arbitrarily'. This doesn't
mean that each instance of the literal will generate exactly the same
value, but the documentation would need to detail HOW it generates each
of the different values.

In my mind, your comment about what happens as the effect of program
termination in a freestanding environment is controlled by the
implementation, as the final exit goes into the support run time of the
compiler (that thing that called main). It might end just as a jmp $
command, or it might return to some part of the environment, so it could
be documented as jumping to address xxxx or initiating a reset, which
does fully define what it does, until it isn't under the control of the
implementation.

Vir Campestris

May 3, 2020, 4:23:51 PM

On 30/04/2020 21:54, Lew Pitcher wrote:
> A spot check of the various EBCDIC character sets (yes, there's more than
> one EBCDIC character set) shows that '%$' would usually translate to 0x6c5b.

I haven't used EBCDIC in 30 years. I could remember AB, and ICBA to look
the other chars up :)

Andy

Lew Pitcher

May 3, 2020, 4:48:54 PM

On May 3, 2020 16:23, Vir Campestris wrote:

> On 30/04/2020 21:54, Lew Pitcher wrote:
>> A spot check of the various EBCDIC character sets (yes, there's more than
>> one EBCDIC character set) shows that '%$' would usually translate to
>> 0x6c5b.
>
> I haven't used EBCDIC in 30 years.

And, I haven't used it in about 12 years.

> I could remember AB, and ICBA to look
> the other chars up :)

In one of my past responsibilities, I worked on interop software between our
host S390 (and later zSeries) systems and our frontend Windows and Unix
platforms and became an SME (Subject Matter Expert) on the ins and outs of
characterset translation between ASCII/Microsoft-extended-ASCII/Unicode and
EBCDIC (EBCDIC-INT, EBCDIC-US, and some others). I've had to explain to more
than one executive|programmer|analyst why their data didn't transport
"properly" between platforms. I got to be expert at looking up characters
and reporting on /why/ they didn't "look right" on the host systems.

--
Lew Pitcher
"In Skills, We Trust"

Les Cargill

May 4, 2020, 1:13:49 AM

Richard Damon wrote:
> On 5/2/20 9:22 AM, David Brown wrote:
>> On 01/05/2020 18:51, Keith Thompson wrote:
>>> Bart <b...@freeuk.com> writes:
>>
>>>> But then, I'm sure this is not what the standard had in mind when it
>>>> says it's 'implementation defined'.
>>>
>>> The standard defines what it means by "implementation-defined".
>>>
>> I've always assumed that implementation-defined behaviour had to be
>> consistent - that any two uses of the same code should have the same
>> results. But after reading this thread, and then re-reading the
>> definition in the standard, I can see nothing that /requires/
>> implementation-specific behaviour to be consistent, as long as it is
>> properly documented.
>>
>> For many of the implementation-defined behaviours in C, inconsistent
>> behaviour would make the compiler virtually useless (imagine a compiler
>> that had varying sizes for "int"!).
>>
>> Are there any known cases of implementation-defined behaviour of any
>> sort that is inconsistent in real-world compilers? I can think of a few
>> that would be outside the control of the compiler (such as "The effect
>> of program termination in a freestanding environment")
>>
>>
>
> I've always assumed (but admit the wording is not totally compelling)
> that the documentation had to be detailed enough that the programmer
> could at least theoretically know what the compiler was going to do.

There's always reading the generated assembler code.

> That means that it can't say things like 'arbitrarily'. This doesn't
> mean that each instance of the literal will generate exactly the same
> value, but the documentation would need to detail HOW it generates each
> of the different values.
>
> In my mind, your comment about what happens as the effect of program
> termination in a freestanding environment is controlled by the
> implementation, as the final exit goes into the support run time of the
> compiler (that thing that called main). It might end just as a jmp $
> command, or it might return to some part of the environment, so it could
> be documented as jumping to address xxxx or initiating a reset, which
> does fully define what it does, until it isn't under the control of the
> implementation.
>

What needs to happen is very context sensitive.

--
Les Cargill

James Kuyper

May 4, 2020, 10:23:24 AM

On 5/2/20 9:22 AM, David Brown wrote:

...

> I've always assumed that implementation-defined behaviour had to be
> consistent - that any two uses of the same code should have the same
> results. But after reading this thread, and then re-reading the
> definition in the standard, I can see nothing that /requires/
> implementation-specific behaviour to be consistent, as long as it is
> properly documented.

The committee addressed this in a ruling on a DR not long after the
first C standard came out. Implementation-defined behavior is a subset
of unspecified behavior, which is behavior where the standard provides a
set of choices (which might be infinite), and the implementation is free
to choose one. What the committee said is that an implementation is, in
general, permitted to change which choice it makes at any time. It can
be different in two different programs translated by the same
implementation, or in two different translation units of the same
program, or two different lines in the same translation unit, or even
during two different passes through the same line of code.

Now, while this is the general rule, I believe that there are
exceptions. Unspecified behavior is NOT allowed to prevent a program
from behaving as required by 5.1.2.3 (4p3). The C standard supports
breaking a program up into multiple translation units, translating each
one separately and then linking them together later. There are many
unspecified things such as the representation of types that would
prevent such linkage from working if they changed between one execution
of the compiler and the next. So I think those things are required to be
consistent. But I'm having trouble locating the DR where the committee
made that decision, so I'm not sure whether the committee has ever
explicitly addressed that issue.

> For many of the implementation-defined behaviours in C, inconsistent
> behaviour would make the compiler virtually useless (imagine a compiler
> that had varying sizes for "int"!).
>
> Are there any known cases of implementation-defined behaviour of any
> sort that is inconsistent in real-world compilers? I can think of a few
> that would be outside the control of the compiler (such as "The effect
> of program termination in a freestanding environment")

Wherever the standard specifies that the order in which two expressions
are evaluated is unspecified, you're at the mercy of the optimizer which
order is actually chosen. I gather that it is, in fact, quite likely to
be different at different locations in the code depending upon the
context in which the expressions occur.

David Brown

May 4, 2020, 10:48:23 AM

Thanks for trying to find this out.

I suppose in the end it comes down to quality of implementation. The
implementation-dependent behaviour must be documented. So if a compiler
is going to implement multi-character literals by calling "rand()" at
compile-time, it must say so - and people will generally choose a
different compiler.

>> For many of the implementation-defined behaviours in C, inconsistent
>> behaviour would make the compiler virtually useless (imagine a compiler
>> that had varying sizes for "int"!).
>>
>> Are there any known cases of implementation-defined behaviour of any
>> sort that is inconsistent in real-world compilers? I can think of a few
>> that would be outside the control of the compiler (such as "The effect
>> of program termination in a freestanding environment")
>
> Wherever the standard specifies that the order in which two expressions
> are evaluated is unspecified, you're at the mercy of the optimizer which
> order is actually chosen. I gather that it is, in fact, quite likely to
> be different at different locations in the code depending upon the
> context in which the expressions occur.
>

That's fine - that's what "unspecified" means. You should not write
code that depends on any particular unspecified behaviour. But very
often you /do/ need to write code that depends on
implementation-dependent behaviour. You can do that, because it is
documented.
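A sketch of relying on documented implementation-defined behaviour while still checking it. This example is not from the thread, and the names `RSHIFT_IS_ARITHMETIC` and `floor_div_pow2` are hypothetical. Right-shifting a negative `int` is implementation-defined (C11 6.5.7p5). Mainstream compilers document an arithmetic (sign-propagating) shift, and the code uses it when that holds. Otherwise it falls back to portable division.

```c
/* Right-shifting a negative int is implementation-defined (C11 6.5.7p5).
   Test for an arithmetic (sign-propagating) shift instead of assuming it. */
#define RSHIFT_IS_ARITHMETIC ((-1 >> 1) == -1)

/* floor(x / 2^n), assuming 0 <= n < (number of value bits in int). */
int floor_div_pow2(int x, unsigned n) {
    if (RSHIFT_IS_ARITHMETIC || x >= 0)
        return x >> n;              /* documented behaviour relied upon */
    int d = 1 << n;
    int q = x / d;                  /* C99 and later: truncates toward zero */
    if (q * d != x)
        q -= 1;                     /* adjust toward negative infinity */
    return q;
}
```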

James Kuyper

May 4, 2020, 10:52:08 AM

On 5/2/20 12:13 PM, Richard Damon wrote:
...

> I've always assumed (but admit the wording isn't totally compelling)
> that the documentation had to be detailed enough that the programmer
> could at least theoretically know what the compiler was going to do.
> That means that it can't say things like 'arbitrarily'.

There's an old saying that has been discussed in this newsgroup - "The
exception proves the rule". Taken literally, according to the rules of
modern English, that's simply nonsense. However, there's a valid
principle which that saying is intended to convey: "the existence of a
stated exception implies the existence of a rule that it is an exception
to".

The accuracy of floating point operations and C standard library
functions that take floating point arguments or return floating point
values is implementation defined. The standard explicitly states that
"The implementation may state that the accuracy is unknown."
(5.2.4.2.2p6). To me this implies that there's an unwritten rule that
documentation that vague would normally not meet a requirement that the
behavior be documented.

James Kuyper

May 4, 2020, 12:39:34 PM

On 5/4/20 10:48 AM, David Brown wrote:
> On 04/05/2020 16:23, James Kuyper wrote:
>> On 5/2/20 9:22 AM, David Brown wrote:
...

>>> Are there any known cases of implementation-defined behaviour of any
>>> sort that is inconsistent in real-world compilers? I can think of a few
>>> that would be outside the control of the compiler (such as "The effect
>>> of program termination in a freestanding environment")
>>
>> Wherever the standard specifies that the order in which two expressions
>> are evaluated is unspecified, you're at the mercy of the optimizer which
>> order is actually chosen. I gather that it is, in fact, quite likely to
>> be different at different locations in the code depending upon the
>> context in which the expressions occur.
>>
>
> That's fine - that's what "unspecified" means. You should not write
> code that depends on any particular unspecified behaviour. But very
> often you /do/ need to write code that depends on
> implementation-dependent behaviour. You can do that, because it is
> documented.

Sorry - you did specify "implementation-defined", and my response was an
example of "unspecified".

The extent to which the register and inline keywords have any effect
(6.7.1, 6.7.4) is implementation-defined, and I gather that on many
implementations they have no effect (more true of "register" than of
"inline"). However, for implementations where they do have an effect, I
believe it is commonplace for the extent to which they have an effect to
vary depending upon the context.

Tim Rentsch

May 4, 2020, 1:28:07 PM

I will offer my two cents on the two topics under discussion.

Short summary (with more complete statements below): The Digital
Mars compiler has it right. Also, any given character constant
has the same value everywhere it occurs. Other interpretations
expressed in this thread in conflict with those statements are
wrong.

Elaboration I: DMC may conformingly reject a program containing a
multiple-character character constant, provided its documentation
defining the value of said character constant determines a value
for that constant that is not representable as an int (which the
implementation may do while still being conforming).

Elaboration II: within a particular implementation (including the
settings of any options that may affect the particulars of any
implementation-defined behavior), a given character constant will
always have the same value anywhere it is used, subject to the
condition that the documentation-determined value is one that is
representable as an int. (If the aforementioned value is not one
that is representable as an int, that is a constraint violation,
and all bets are off after that, except that a diagnostic must
be produced in such cases.)

Let's take the second topic first. First, in 6.4.4.4 p10, the
Standard says "The value of an integer character constant
containing more than one character [...]". Using the word "the"
implies a single value. Second, all of section 6.4 is about
"constants". Constants don't change. There is no sensible
understanding of the word "constant" where the value of a constant
can vary. (This may not apply to physical "constants", but those
are not constants in the same sense, just measured quantities that
seem not to change over time.) Third, the choice of value for any
given character constant is implementation defined (except we might
say the value of a single-byte character constant is implementation
determined, but we aren't talking about those here). When a choice
is simply implementation defined, the choice is meant to be single
valued for each case. For example, the representation of 'int' may
be ones' complement, two's complement, or sign and magnitude, but
that doesn't mean one 'int' can be represented one way and another
'int' be represented a different way; the same choice must apply
to both (and indeed all int objects). When the Standard wants to
allow variation from case to case, it uses different phrasing. For
example, how decimal floating point constants are rounded does not
have to be done consistently. Discussing how FP constants are
rounded, in 6.4.4.2 p3 the Standard says

[...] the result is either the nearest representable
value, or the larger or smaller representable value
immediately adjacent to the nearest representable
value, /chosen in an implementation-defined manner./
[my emphasis]

The phrase "implementation-defined manner" is used to indicate that
the choice may vary from circumstance to circumstance. The rule
for character constants doesn't say "implementation-defined manner"
but just "implementation-defined". Within a given implementation
the value of any particular character constant must be the same
everywhere.

Now for the first topic. The Standard talks about values for
character constants in 6.4.4.4 p{10,11}. Paragraph 11 gives rules
for wide character constants, which do not concern us here. About
character constants having more than one character, paragraph 10
says this:

An integer character constant has type int. [...] The value
of an integer character constant containing more than one
character (e.g., 'ab'), [...], is implementation-defined.

Note the wording used. The quoted text does not contain the
defined phrase "implementation-defined value". Rather it says
that the value of such constants is implementation-defined.
Although these two phrasings may sound similar, they are not
the same. Consider for example 6.2.5 p3, which says

An object declared as type char is large enough to store any
member of the basic execution character set. If a member of
the basic execution character set is stored in a char object,
its value is guaranteed to be nonnegative. If any other
character is stored in a char object, the resulting value is
implementation-defined but shall be within the range of
values that can be represented in that type.

Note the final sentence. If an implementation-defined value were
the same as a value that is implementation-defined, there would
be no need for the statement that the value must be within the
range of representable values for char. The value of a character
constant (having more than one character) thus may fall outside
the range of values for int. Note also the constraint given at
the start of the section on Constants, in 6.4.4 p2. It says:

Each constant shall have a type and the value of a constant
shall be in the range of representable values for its type.

Why does the constraint mention representable values? It isn't
needed for Integer Constants, which are always representable
unless they fall outside the range of all integer types, in which
case they don't have a type, per 6.4.4.1 p6. It isn't needed for
Floating Constants, which by construction always determine a
representable value. It isn't needed for Enumeration Constants,
which by 6.7.2.2 must have a value representable as an int. The
only reason this clause is there therefore must be for Character
Constants, which ergo must be able to have values outside of what
can be represented as an int.

Besides the above, the idea that all character constants must be
mapped to values that are in the range of what int can represent
doesn't pass the laugh test. Consider an implementation that
has 16-bit ints, and how it might define the value of different
character constants. Suppose it encounters a character constant
for a single universal character name, such as

'\U000FFFFF'

What purpose is served by insisting that all such character
constants be mapped to a 16-bit value? It's absurd. Of course the
Standard could have included a clear and explicit statement that
mandates such absurd behavior. But in this case it doesn't.
In the absence of a clear and explicit statement to the contrary,
the only sensible conclusion is that the Standard does not intend
absurd consequences, and that in this particular case character
constants may have values outside the range of what int can
represent, and which therefore under 6.4.4 p2 must result in a
diagnostic. Moreover, since a constraint has been violated, an
implementation is within its rights to reject the compilation.

For the sake of completeness, I ran tests using several versions
of GNU tools and clang tools, on two different sources:

int x = 'abcde';

and

int y = '\U000FFFFF';

In all cases, with all option settings, one or more diagnostics
were issued; in no case was a source accepted without producing
a diagnostic. In the case of the GNU tools, the diagnostics were
always warnings, even with -pedantic-errors. In the case of the
clang tools, using -pedantic-errors turned warnings into errors,
and clang++ gave errors in all cases. To me this says that these
compilers (DMC, gcc/g++, and clang/clang++) all support the idea
that character constants may have values that are not
representable as int.

So fwiw there is my take on the question(s).
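The value of a constant such as 'ab' is implementation-defined, so code that needs a specific number can compute it explicitly instead. The sketch below uses a hypothetical function, `pack_chars`. It is modelled on the scheme the GCC manual describes for multi-character constants: shift the accumulated value left by one character width, then OR in the next character, keeping only the low-order bits. It assumes 8-bit character values (e.g. ASCII). With a 32-bit result it reproduces the 'ABCDEF' to 0x43444546 result fir reported earlier in the thread. Here, though, the value is fixed by the code rather than by the compiler.

```c
#include <stdint.h>

/* Pack a string's characters into a uint32_t, the first retained
   character in the most significant byte. Bits shifted past the top of
   the uint32_t are discarded, so for strings longer than four characters
   only the last four survive. Assumes 8-bit char values. */
uint32_t pack_chars(const char *s) {
    uint32_t v = 0;
    for (; *s != '\0'; s++)
        v = (v << 8) | (unsigned char)*s;
    return v;
}
```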

Tim Rentsch

May 4, 2020, 1:30:36 PM

Bart <b...@freeuk.com> writes:

> On 30/04/2020 19:57, fir wrote:
>
>> On Thursday, 30 April 2020 20:38:39 UTC+2, user fir wrote:
>>
>>> On Thursday, 30 April 2020 13:43:54 UTC+2, user fir wrote:
>>>
>>>> by char literal I mean things like 'a' '2'
>>>> (like by string literal I mean "aaa" "2")
>>>>
>>>> the question is: do those longer char literals like 'aaaaaaaa'
>>>> have some defined meaning
>>>> in C? (mostly thinking of older C like C89)
>>>
>>> ps: does anyone maybe know how to make 64-bit versions of it work on
>>> mingw/gcc 32-bit?
>>>
>>> printf("\n %d %x", 'fir', 'fir');
>>> this int version works, but what about
>>>
>>> long long int z = 'firfir';
>>>
>>> how to display it?
>>
>> this seems not to work; it looks like some kind of bug, as the code
>>
>> long long int z = 'firfir';
>>
>> if(z<2*1000*1000*1000)
>> printf("\n %d something wrong ", sizeof(z));
>>
>> prints
>>
>> 8 something wrong
>>
>> 'firfir' definitely shouldn't be less than 2G here, afaiu
>
> My C compiler manages it:
>
> printf("%016llx\n",'ABCDEFGH');
> printf("%016llx\n",'firfir');
>
> produces:
>
> 4847464544434241
> 0000726966726966
>
> [...] (My literals become long long int over 4 characters.)

What your compiler does is conforming as long as it issues
a diagnostic for the constraint violation of 6.4.4 p2.
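Bart's output puts the first character in the least significant byte. GCC's documented scheme puts it in the most significant byte instead, and both are permitted. For fir's question about 64-bit values, here is a sketch that builds Bart's layout explicitly in a `uint64_t`. It avoids any multi-character constant, and with it the constraint question. The function name is hypothetical, and the sketch assumes 8-bit character values.

```c
#include <inttypes.h>   /* PRIx64, for printing the result portably */
#include <stdint.h>

/* Place the first character in the least significant byte, the second in
   the next byte, and so on; characters beyond the eighth are ignored.
   Assumes 8-bit char values. */
uint64_t pack_chars64_first_low(const char *s) {
    uint64_t v = 0;
    unsigned shift = 0;
    for (; *s != '\0' && shift < 64; s++, shift += 8)
        v |= (uint64_t)(unsigned char)*s << shift;
    return v;
}
```

Printing it with `printf("%016" PRIx64 "\n", pack_chars64_first_low("firfir"));` (plus `<stdio.h>`) gives `0000726966726966`, the same digits Bart's compiler produced.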

David Brown

May 4, 2020, 1:37:52 PM

(Just to be clear - these keywords have other effects that are not
optional. I assume you are talking about their effect as optimisation
hints.)

Yes, I think you are right here.

Let me refine my question - are there any known cases of inconsistent
implementation of implementation-dependent behaviour that are able to
affect the observable behaviour of the program? (Optimisation affects
its speed, but not its correctness or the possible behaviour.)

James Kuyper

May 4, 2020, 3:03:24 PM

On 5/4/20 1:37 PM, David Brown wrote:
> On 04/05/2020 18:39, James Kuyper wrote:

...

>> The extent to which the register and inline keywords have any effect
>> (6.7.1, 6.7.4) is implementation-defined, and I gather that on many
>> implementations they have no effect (more true of "register" than of
>> "inline"). However, for implementations where they do have an effect, I
>> believe it is commonplace for the extent to which they have an effect to
>> vary depending upon the context.
>>
>
> (Just to be clear - these keywords have other effects that are not
> optional. I assume you are talking about their effect as optimisation
> hints.)
>
> Yes, I think you are right here.
>
> Let me refine my question - are there any known cases of inconsistent
> implementation of implementation-dependent behaviour that are able to
> affect the observable behaviour of the program? (Optimisation affects
> its speed, but not its correctness or the possible behaviour.)

C allows you to provide both an inline definition of a function, and an
external definition, and "It is unspecified whether a call to the
function uses the inline definition or the external definition."
(6.7.4p7). This can be important, because, unlike C++, C doesn't require
that the two definitions match. Therefore, the two definitions can, in
particular, have different observable behavior.

As I understand it, with an inline definition and an external definition
for the same function available, an implementation can inline the inline
definition, or it can call a function - the choice is
implementation-defined. If it calls a function, that function could
either be one that implements the inline definition, or it could be the
external definition; that choice is unspecified. The net result is that
the observable behavior can be different, depending in part upon
implementation-defined behavior.

However, I can't come up with any other reasonable possibilities.

anti...@math.uni.wroc.pl

May 4, 2020, 8:47:47 PM

Keith Thompson <Keith.S.T...@gmail.com> wrote:
>
> If the C standard were to impose a limit, it wouldn't and shouldn't
> do so by some vague suggestion. For there to be an actual limit
> on the number of characters in a multi-character constant, there
> would have to be either a syntax rule or a constraint that 'ABCDE'
> would violate. There isn't.

I think it is the opposite. Unless the standard gives meaning to
it, you have undefined, unspecified or implementation-specific
behaviour. AFAICS each of these means that the program is not strictly
conforming (4.5 in N1570). More precisely, 4.5 talks about output, which
gives a loophole for unreachable or useless code. But if the
implementation maps multibyte characters to integers in one
of the two obvious ways, then for character constants having 3 or
more characters the resulting integer value is bigger than the minimal
environmental limit for integers, and again the program is not strictly
conforming.

Now, could you say where the standard requires that a conforming
implementation accept some non-strictly-conforming program?
In the obvious place (4.6) I see only a requirement about strictly
conforming programs.

--
Waldek Hebisch

Keith Thompson

May 4, 2020, 11:09:07 PM

anti...@math.uni.wroc.pl writes:
> Keith Thompson <Keith.S.T...@gmail.com> wrote:
>> If the C standard were to impose a limit, it wouldn't and shouldn't
>> do so by some vague suggestion. For there to be an actual limit
>> on the number of characters in a multi-character constant, there
>> would have to be either a syntax rule or a constraint that 'ABCDE'
>> would violate. There isn't.
>
> I think it is the opposite. Unless the standard gives meaning to
> it, you have undefined, unspecified or implementation-specific
> behaviour. AFAICS each of these means that the program is not strictly
> conforming (4.5 in N1570). More precisely, 4.5 talks about output, which
> gives a loophole for unreachable or useless code. But if the
> implementation maps multibyte characters to integers in one
> of the two obvious ways, then for character constants having 3 or
> more characters the resulting integer value is bigger than the minimal
> environmental limit for integers, and again the program is not strictly
> conforming.

Very few programs are strictly conforming, and IMHO the concept of
strict conformance isn't all that useful. For example, this program is
not strictly conforming:

#include <stdio.h>
#include <limits.h>
int main(void) {
printf("%d\n", INT_MAX);
}

I don't believe a conforming hosted implementation would be allowed to
reject it.

> Now, could you say where the standard requires that a conforming
> implementation accept some non-strictly-conforming program?
> In the obvious place (4.6) I see only a requirement about strictly
> conforming programs.

4p3:
A program that is correct in all other aspects, operating on correct
data, containing unspecified behavior shall be a correct program and
act in accordance with 5.1.2.3.

James Kuyper

May 4, 2020, 11:52:34 PM

On 5/4/20 8:47 PM, anti...@math.uni.wroc.pl wrote:
> Keith Thompson <Keith.S.T...@gmail.com> wrote:
>>
>> If the C standard were to impose a limit, it wouldn't and shouldn't
>> do so by some vague suggestion. For there to be an actual limit
>> on the number of characters in a multi-character constant, there
>> would have to be either a syntax rule or a constraint that 'ABCDE'
>> would violate. There isn't.
>
> I think it is the opposite. Unless the standard gives meaning to
> it, you have undefined, unspecified or implementation-specific
> behaviour.

Correct. In this case, it's implementation-defined, as has already been
mentioned.

> ... AFAICS each of these means that the program is not strictly
> conforming (4.5 in N1570). More precisely, 4.5 talks about output, which
> gives a loophole for unreachable or useless code. ...

Correct.

> ... But if the
> implementation maps multibyte characters to integers in one
> of the two obvious ways, then for character constants having 3 or
> more characters the resulting integer value is bigger than the minimal
> environmental limit for integers, and again the program is not strictly
> conforming.

If "one of the two obvious ways" maps a character constant with 3 or
more characters to a value that is not representable as an int, then a
conforming implementation is not allowed to use "one of the two obvious
ways" to map that character constant to a value. The standard specifies
no explicit limit on the length of character constant, and you cannot
derive the existence of an implicit limit by referring to a way that
seems "obvious" to you; not unless the standard actually mandates that
way, which it doesn't.

There are other ways to map multi-character character constants to int
values that do meet all of the standard's applicable requirements. For
instance, take the (first/last) sizeof(int) characters of the string,
and pad, if necessary, at the (beginning/end) with null characters to a
length of sizeof(int). The value of the character constant would then be
the same as the value of an int object whose individual bytes had the
same representation as that sequence of characters, in that order.
That's four different ways (first/last and beginning/end) that all
produce conforming values for such constants. Any other approach that
always generates a value representable as an int, regardless of the
length of the character constant, is also allowed.
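One of those four mappings can be sketched with `memcpy`: take the first sizeof(int) characters and pad at the end with null bytes. The function name and the explicit length parameter are hypothetical. The sketch assumes `int` has no padding bits, so every byte pattern is a valid `int` value. Unlike the shift-based schemes, the resulting value depends on the machine's byte order. It is, however, always representable as an int, whatever the length of the constant.

```c
#include <string.h>

/* Take the first sizeof(int) characters of s (which is len characters
   long), pad at the end with null bytes, and reinterpret those bytes as
   an int. Assumes int has no padding bits. */
int first_chars_as_int(const char *s, size_t len) {
    unsigned char bytes[sizeof(int)] = {0};
    size_t n = len < sizeof bytes ? len : sizeof bytes;
    memcpy(bytes, s, n);
    int v;
    memcpy(&v, bytes, sizeof v);
    return v;
}
```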

Note: exceeding the minimum implementation limits for integers does not,
in itself, prevent a program from being strictly conforming. Any program
that involves an int value that is outside of the range -32767 to 32767
exceeds those limits, but a strictly conforming program can do that, so
long as it does so only when compiled on implementations where INT_MIN
or INT_MAX is large enough. So long as the observable behavior does not
depend upon the values of INT_MIN and INT_MAX, it's perfectly fine for
the unobservable part of the program's behavior to depend on those values.
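Here is a sketch of what that looks like in practice, using a hypothetical function. The `int` branch is compiled only where INT_MAX is large enough, so the value 100000 never has to fit in a 16-bit int. The observable result is the same on every implementation.

```c
#include <limits.h>

/* Computes 1000 * 100. Where int can hold 100000 the arithmetic is done
   in int; otherwise it is done in long, which is at least 32 bits. */
long scaled(void) {
#if INT_MAX >= 100000
    int x = 1000;
    return x * 100;
#else
    long x = 1000;
    return x * 100;
#endif
}
```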

> Now, could you say where the standard requires that a conforming
> implementation accept some non-strictly-conforming program?
> In the obvious place (4.6) I see only a requirement about strictly
> conforming programs.

You're right - but if the only reason why a program fails to be strictly
conforming is that it has observable behavior that depends upon code
with unspecified behavior, then it is still considered a correct
program, and is required to behave in accordance with 5.1.2.3 (see 4p3).
You cannot label it an incorrect program just because the character
constant is longer than some arbitrary limit you've decided is applicable.

David Brown

May 5, 2020, 2:33:21 AM

On 04/05/2020 21:03, James Kuyper wrote:
> On 5/4/20 1:37 PM, David Brown wrote:
>> On 04/05/2020 18:39, James Kuyper wrote:
> ...
>>> The extent to which the register and inline keywords have any effect
>>> (6.7.1, 6.7.4) is implementation-defined, and I gather that on many
>>> implementations they have no effect (more true of "register" than of
>>> "inline"). However, for implementations where they do have an effect, I
>>> believe it is commonplace for the extent to which they have an effect to
>>> vary depending upon the context.
>>>
>>
>> (Just to be clear - these keywords have other effects that are not
>> optional. I assume you are talking about their effect as optimisation
>> hints.)
>>
>> Yes, I think you are right here.
>>
>> Let me refine my question - are there any known cases of inconsistent
>> implementation of implementation-dependent behaviour that are able to
>> affect the observable behaviour of the program? (Optimisation affects
>> its speed, but not its correctness or the possible behaviour.)
>
> C allows you to provide both an inline definition of a function, and an
> external definition, and "It is unspecified whether a call to the
> function uses the inline definition or the external definition."
> (6.7.4p7). This can be important, because, unlike C++, C doesn't require
> that the two definitions match. Therefore, the two definitions can, in
> particular, have different observable behavior.

That's unspecified behaviour, not implementation-defined behaviour. The
implementation-defined bit is the extent to which the compiler inlines
or optimises the function - which version it uses is unspecified, not
implementation-defined.

>
> As I understand it, with an inline definition and an external definition
> for the same function available, an implementation can inline the inline
> definition, or it can call a function - the choice is
> implementation-defined. If it calls a function, that function could
> either be one that implements the inline definition, or it could be the
> external definition; that choice is unspecified. The net result is that
> the observable behavior can be different, depending in part upon
> implementation-defined behavior.
>

It is also free to take the inline definition and make it a file-local
(internal linkage) function and call it - or to take the external
definition and compile that inline.

As far as I can tell, the aspects that don't affect observable behaviour
(merely optimisation) are implementation-defined, while the aspects that
/can/ affect observable behaviour (which function is used) are
unspecified behaviour.

James Kuyper

May 5, 2020, 9:47:31 AM

Because I disagree with your interpretation below, the
implementation-defined behavior makes the unspecified behavior a
possibility.

>> As I understand it, with an inline definition and an external definition
>> for the same function available, an implementation can inline the inline
>> definition, or it can call a function - the choice is
>> implementation-defined. If it calls a function, that function could
>> either be one that implements the inline definition, or it could be the
>> external definition; that choice is unspecified. The net result is that
>> the observable behavior can be different, depending in part upon
>> implementation-defined behavior.
>>
>
> It is also free to take the inline definition and make it a file-local
> (internal linkage) function and call it

That's precisely what I was referring to when I said "that function
could ... be one that implements the inline definition".

> - or to take the external
> definition and compile that inline.

I don't think that is permitted, except possibly by the as-if rule. That
is, only if it doesn't change the observable behavior.

David Brown

May 5, 2020, 10:22:14 AM

That's not how I interpret the paragraph in the standard. As I see it,
the implementation-defined behaviour is the extent to which the compiler
uses "inline" as a hint that it should make the function as fast as
possible (such as by using it inline in the calling function). This is
entirely independent of whether the compiler uses the definition given
with the inline keyword in the file, or if it uses the external
definition provided in a different file. The choice of definitions to
use here is unspecified.

(It would have made more sense for later C standards to follow C++ here,
and insist that all definitions of an external linkage inline function
are functionally the same.)

>
>>> As I understand it, with an inline definition and an external definition
>>> for the same function available, an implementation can inline the inline
>>> definition, or it can call a function - the choice is
>>> implementation-defined. If it calls a function, that function could
>>> either be one that implements the inline definition, or it could be the
>>> external definition; that choice is unspecified. The net result is that
>>> the observable behavior can be different, depending in part upon
>>> implementation-defined behavior.
>>>
>>
>> It is also free to take the inline definition and make it a file-local
>> (internal linkage) function and call it
>
> That's precisely what I was referring to when I said "that function
> could ... be one that implements the inline definition".
>
>> - or to take the external
>> definition and compile that inline.
>
> I don't think that is permitted, except possibly by the as-if rule. That
> is, only if it doesn't change the observable behavior.
>

Yes, exactly - the compiler can do any kind of code arrangement like
this as long as it is an optimisation, and does not affect the
observable behaviour.

anti...@math.uni.wroc.pl

May 5, 2020, 1:10:26 PM

Tim Rentsch gave a better argument for that part. But my argument
was different.

> The standard specifies
> no explicit limit on the length of character constant, and you cannot
> derive the existence of an implicit limit by referring to a way that
> seems "obvious" to you; not unless the standard actually mandates that
> way, which it doesn't.

<snip>

> Note: exceeding the minimum implementation limits for integers does not,
> in itself, prevent a program from being strictly conforming. Any program
> that involves an int value that is outside of the range -32767 to 32767
> exceeds those limits, but a strictly conforming program can do that, so
> long as it does so only when compiled on implementations where INT_MIN
> or INT_MAX is large enough. So long as the observable behavior does not
> depend upon the values of INT_MIN and INT_MAX, it's perfectly fine for
> the unobservable part of the program's behavior to depend on those values.

Then why does 4.5 say "minimum implementation limit"? If it were not
meant for other limits, it could say "minimum translation limit" or
simply point to 5.2.4.1. I agree that the consequences do not look
nice, but the wording strongly suggests that 5.2.4.2 was intended
to apply. And clearly "minimum implementation limit" applies
regardless of whether the offending code otherwise affects observable
behaviour. Now, how could 5.2.4.2 apply? Clearly
"minimum implementation limit" is _not_ intended to affect
runtime behaviour (actual limits do this). The natural
trouble spot is numeric literals: trying to use large
literals could exceed a compiler limit. In particular,
it seems that a compiler is allowed to reject at compile
time literals which would otherwise represent legal
runtime values. I admit that this looks undesirable,
but ATM a compiler which implements INT_MAX as an expression
and is unable to handle it as a literal value looks to
me like a poor-quality but legal implementation.

> > Now, could you say where the standard requires that a conforming
> > implementation accept some non-strictly-conforming program?
> > In the obvious place (4.6) I see only a requirement about strictly
> > conforming programs.
>
> You're right - but if the only reason why a program fails to be strictly
> conforming is that it has observable behavior that depends upon code
> with unspecified behavior, then it is still considered a correct
> program, and is required to behave in accordance with 5.1.2.3 (see 4p3).
> You cannot label it an incorrect program just because the character
> constant is longer than some arbitrary limit you've decided is applicable.

4.3 says "...correct in all other aspects, ...". So this means
that merely containing unspecified behavior is not a valid reason
to reject a program. But this does not imply that the choice of an
implementation-defined value has to be made in a way that makes the
program "correct in all other aspects". For example, if an
implementation-defined value is used in arithmetic, the implementation
is _not_ forced to find values that avoid undefined behaviour (or other
trouble). And for me 4.5 and 4.6 together mean that it is the
responsibility of the program (programmer) to avoid exceeding
implementation limits. In other words, when for a given program an
implementation-defined rule leads to behaviour that exceeds the
capability of the given implementation, the implementation is still
conforming: it does not reject the program due to "unspecified
behavior", but because it cannot handle it... Of course, this leads to
the question of whether the implementation limit is reasonable. Clearly,
saying "this implementation is unable to handle unspecified behavior"
is a cheat. But saying "this implementation cannot fit 5 chars into
the space of 4 chars" is a pretty reasonable limit.

--
Waldek Hebisch
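Waldek's "5 chars into 4 chars" limit can be made explicit in portable code. The sketch below uses a hypothetical TAG4 macro that takes exactly four arguments, so a five-character tag simply fails to compile. It expands to an integer constant expression, so it can serve as a case label. That covers ad hoc tags like the 'fast'/'slow' ones fir wanted, without any implementation-defined value. The sketch assumes 8-bit character values.

```c
#include <stdint.h>

/* Pack exactly four characters into a uint32_t, first character in the
   most significant byte. Expands to an integer constant expression, so
   it can be used in case labels. Assumes 8-bit char values. */
#define TAG4(a, b, c, d)                        \
    (((uint32_t)(unsigned char)(a) << 24) |     \
     ((uint32_t)(unsigned char)(b) << 16) |     \
     ((uint32_t)(unsigned char)(c) << 8)  |     \
      (uint32_t)(unsigned char)(d))

/* Hypothetical dispatcher in the spirit of fir's foo(..., 'fast'). */
const char *mode_name(uint32_t how) {
    switch (how) {
    case TAG4('f', 'a', 's', 't'): return "fast";
    case TAG4('s', 'l', 'o', 'w'): return "slow";
    default:                       return "unknown";
    }
}
```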

Keith Thompson

May 5, 2020, 4:41:44 PM

anti...@math.uni.wroc.pl writes:
[...]

> Then why does 4.5 say "minimum implementation limit"? If it were not
> meant for other limits, it could say "minimum translation limit" or
> simply point to 5.2.4.1. I agree that the consequences do not look
> nice, but the wording strongly suggests that 5.2.4.2 was intended
> to apply. And clearly "minimum implementation limit" applies
> regardless of whether the offending code otherwise affects observable
> behaviour. Now, how could 5.2.4.2 apply? Clearly
> "minimum implementation limit" is _not_ intended to affect
> runtime behaviour (actual limits do this). The natural
> trouble spot is numeric literals: trying to use large
> literals could exceed a compiler limit.

Large numeric literals are handled in a well-defined way.

Each constant shall have a type and the value of a constant shall be
in the range of representable values for its type.

An integer or floating constant that's outside the range of any supported
type is a constraint violation.
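C11's `_Generic` offers a quick way to see which type a compiler actually gives a large unsuffixed decimal constant. This is a sketch, and `TYPE_NAME` and `type_of_2_31` are hypothetical names. Under 6.4.4.1p5 (C99 and later), such a constant takes the first of int, long int, and long long int that can represent it. With a 32-bit int, 2147483648 therefore becomes long where long is 64 bits, and long long where long is 32 bits (e.g. 64-bit Windows).

```c
/* Map an expression's type to a name (C11 _Generic). */
#define TYPE_NAME(x) _Generic((x),          \
    int: "int",                             \
    long: "long",                           \
    long long: "long long",                 \
    default: "other")

/* 2147483648 is one more than the largest 32-bit int value. */
const char *type_of_2_31(void) {
    return TYPE_NAME(2147483648);
}
```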

> In particular,
> it seems that a compiler is allowed to reject at compile
> time literals which would otherwise represent legal
> runtime values.

I don't believe that's correct, but I'm not entirely sure what you
mean. Can you provide an example (even a hypothetical one)?

> I admit that this looks undesirable,
> but ATM a compiler which implements INT_MAX as an expression
> and is unable to handle it as a literal value looks to
> me like a poor-quality but legal implementation.

INT_MAX must be a constant expression of type int. N1570 5.2.4.2.1p1:

The values given below shall be replaced by constant expressions
suitable for use in #if preprocessing directives. Moreover,
except for CHAR_BIT and MB_LEN_MAX, the following shall be
replaced by expressions that have the same type as would an
expression that is an object of the corresponding type converted
according to the integer promotions.

[...]

> 4.3 says "...correct in all other aspects, ...". So this means
> that merely containing unspecified behavior is not a valid reason
> to reject a program. But this does not imply that the choice of an
> implementation-defined value has to be made in a way that makes the
> program "correct in all other aspects". For example, if an
> implementation-defined value is used in arithmetic, then the
> implementation is _not_ forced to find values that avoid undefined
> behaviour (or other trouble).
> And for me 4.5 and 4.6 together mean that it is the responsibility
> of the program (programmer) to avoid exceeding implementation limits.
> In other words, when for a given program an implementation-defined rule
> leads to behavior that exceeds the capability of a given implementation,
> the implementation is still conforming: it does not reject the
> program due to "unspecified behavior", but because it cannot
> handle it... Of course, this raises the question of whether the
> implementation limit is reasonable. Clearly, saying "this
> implementation is unable to handle unspecified behavior"
> is a cheat. But saying "this implementation cannot fit
> 5 chars into the space of 4 chars" is a pretty reasonable limit.

Given that character constants are of type int, I'm not at all
convinced by an argument that the implementation can say that
some character constants have values that are outside the range of
type int.

An implementation certainly *can* map a constant like 'ABCDE'
to a value within the range of type int. The question is whether it's
allowed to choose not to.

anti...@math.uni.wroc.pl

May 5, 2020, 6:29:38 PM

Two hypothetical possibilities:

- a "broken" scanner that cannot produce all legal values.
I am not aware of any major compiler with such a problem, but
some toy compilers had it.
Not likely on normal machines, where the scanner must handle
the minimal range of long long, but possible on a machine
with a 128-bit integer type but only 64-bit literals.
- a simple code generator for a 32-bit machine with 16-bit immediates.
Such a code generator may simply embed constant literals as
immediates and reject any integer literal that does not fit
in 16 bits. Assuming no constant folding, the compiler would still
be able to produce the values of 32-bit constant expressions.
The point is that turning a literal into an expression is a burden
on the code generator, and an implementer may be tempted to say
that bigger literals are not implemented. In the past,
compilers had various crazy limits for no better reason.

> > I admit that this looks undesirable,
> > but ATM a compiler which implements INT_MAX as an expression
> > and is unable to handle it as a literal value looks to
> > me like a poor-quality but legal implementation.
>
> INT_MAX must be a constant expression of type int. N1570 5.2.4.2.1p1:
>
> The values given below shall be replaced by constant expressions
> suitable for use in #if preprocessing directives. Moreover,
> except for CHAR_BIT and MB_LEN_MAX, the following shall be
> replaced by expressions that have the same type as would an
> expression that is an object of the corresponding type converted
> according to the integer promotions.

Consider

#define INT_MAX (25575*41*2048 + 2047)

or

#define INT_MAX __int_max_val__

where __int_max_val__ is really a variable, but compiler and
preprocessor magic means that during translation it is treated
as a constant. Neither uses literals bigger than 16 bits,
and the implementation can handle each of them in the way required by
5.2.4.2.1p1.

> [...]
>
> > 4.3 says "...correct in all other aspects, ...". So this means
> > that merely containing unspecified behavior is not a valid reason
> > to reject a program. But this does not imply that the choice of an
> > implementation-defined value has to be made in a way that makes the
> > program "correct in all other aspects". For example, if an
> > implementation-defined value is used in arithmetic, then the
> > implementation is _not_ forced to find values that avoid undefined
> > behaviour (or other trouble).
> > And for me 4.5 and 4.6 together mean that it is the responsibility
> > of the program (programmer) to avoid exceeding implementation limits.
> > In other words, when for a given program an implementation-defined rule
> > leads to behavior that exceeds the capability of a given implementation,
> > the implementation is still conforming: it does not reject the
> > program due to "unspecified behavior", but because it cannot
> > handle it... Of course, this raises the question of whether the
> > implementation limit is reasonable. Clearly, saying "this
> > implementation is unable to handle unspecified behavior"
> > is a cheat. But saying "this implementation cannot fit
> > 5 chars into the space of 4 chars" is a pretty reasonable limit.
>
> Given that character constants are of type int, I'm not at all
> convinced by an argument that the implementation can say that
> some character constants have values that are outside the range of
> type int.

Well, a 3-character constant is representable as an int, but is
outside the minimal limits of the integer type. If you agree with the
previous part, then the implementation can say that it exceeds an
implementation limit. Or, to be more reasonable, it can say that INT_MIN
exceeds its capability (everybody knows that correct handling
of INT_MIN may be problematic). Then it can map anything
bigger than 4 chars to INT_MIN and reject it.
Note: I assume that the limit only applies to the handling of character
constants and that other parts of the implementation have no trouble
with INT_MIN.

Now, why should an implementation which makes the effort to handle
INT_MIN be forced to invent some fake representation of longer constants,
while one rejecting INT_MIN can reject them? It looks
much more reasonable to say that the implementation has a limit,
since the implementation-defined way would otherwise produce
something that is not of the required type.

> An implementation certainly *can* map a constant like 'ABCDE'
> to a value within the range of type int. The question is whether it's
> allowed to choose not to.

--

Waldek Hebisch

Keith Thompson

May 5, 2020, 7:18:23 PM

anti...@math.uni.wroc.pl writes:
> Keith Thompson <Keith.S.T...@gmail.com> wrote:
>> anti...@math.uni.wroc.pl writes:
[...]

>> > In particular,
>> > it seems that a compiler is allowed to reject at compile
>> > time literals which would otherwise represent legal
>> > runtime values.
>>
>> I don't believe that's correct, but I'm not entirely sure what you
>> mean. Can you provide an example (even a hypothetical one)?
>
> Two hypothetical possibilities:
>
> - a "broken" scanner that cannot produce all legal values.
> I am not aware of any major compiler with such a problem, but
> some toy compilers had it.
> Not likely on normal machines, where the scanner must handle
> the minimal range of long long, but possible on a machine
> with a 128-bit integer type but only 64-bit literals.
> - a simple code generator for a 32-bit machine with 16-bit immediates.
> Such a code generator may simply embed constant literals as
> immediates and reject any integer literal that does not fit
> in 16 bits. Assuming no constant folding, the compiler would still
> be able to produce the values of 32-bit constant expressions.
> The point is that turning a literal into an expression is a burden
> on the code generator, and an implementer may be tempted to say
> that bigger literals are not implemented. In the past,
> compilers had various crazy limits for no better reason.

Any such compiler would simply be non-conforming and buggy.
A conforming compiler is not "allowed" to reject constants that are
within the range of long long. Equivalently, a compiler that does
so is not conforming.

>> > I admit that this looks undesirable,
>> > but ATM a compiler which implements INT_MAX as an expression
>> > and is unable to handle it as a literal value looks to
>> > me like a poor-quality but legal implementation.
>>
>> INT_MAX must be a constant expression of type int. N1570 5.2.4.2.1p1:
>>
>> The values given below shall be replaced by constant expressions
>> suitable for use in #if preprocessing directives. Moreover,
>> except for CHAR_BIT and MB_LEN_MAX, the following shall be
>> replaced by expressions that have the same type as would an
>> expression that is an object of the corresponding type converted
>> according to the integer promotions.
>
> Consider
>
> #define INT_MAX (25575*41*2048 + 2047)

OK, that's a constant expression of type int, so that's a conforming but
silly way to define INT_MAX.

> or
>
> #define INT_MAX __int_max_val__
>
> where __int_max_val__ is really a variable, but compiler and
> preprocessor magic means that during translation it is treated
> as a constant.

Then it's a constant, isn't it?

> Neither uses literals bigger than 16 bits,
> and the implementation can handle each of them in the way required by
> 5.2.4.2.1p1.

Again, any conforming implementation must handle the constant 2147483647
correctly. If INT_MAX >= 2147483647 then it's of type int; otherwise
it's of type long. There is simply no wiggle room.

(Unless you're talking about non-conforming or buggy compilers, but if
so I don't see the point.)

Let's assume 16-bit int. How do you conclude that the value of 'ABC' is
outside the range of int? That's the case only if the implementation
*chooses* such a mapping. Since the standard says nothing about what
mapping is to be used (only that it's implementation-defined), and since
it requires the value of a character constant to be of type int, my
argument is that such a mapping is non-conforming. (It's not a 100%
firm argument. It's possible the authors of the standard just didn't
think about that possibility one way or the other.)

> If you agree with the previous
> part, then the implementation can say that it exceeds an implementation
> limit. Or, to be more reasonable, it can say that INT_MIN
> exceeds its capability (everybody knows that correct handling
> of INT_MIN may be problematic).

Since there are no negative integer constants, defining INT_MIN is
*slightly* tricky. With 16-bit int, this:
#define INT_MIN (-32768)
is non-conforming. Solutions are well known, for example:
#define INT_MIN (-32767-1)
which meets the requirements of the standard. What's the problem?

[...]

James Kuyper

May 5, 2020, 8:18:31 PM

On 5/5/20 1:10 PM, anti...@math.uni.wroc.pl wrote:
> James Kuyper <james...@alumni.caltech.edu> wrote:
...

>> If "one of the two obvious ways" maps a character constant with 3 or
>> more characters to a value that is not representable as an int, then an
>> conforming implementation is not allowed to use "one of the two obvious
>> ways" to map that character constant to a value.
>
> For that part Tim Rentsch gave a better argument. But my argument
> was different.
>
>> The standard specifies
>> no explicit limit on the length of character constant, and you cannot
>> derive the existence of an implicit limit by referring to a way that
>> seems "obvious" to you; not unless the standard actually mandates that
>> way, which it doesn't.
> <snip>
>> Note: exceeding the minimum implementation limits for integers does not,
>> in itself, prevent a program from being strictly conforming. Any program

My apologies - that statement directly contradicts 4p5. I apparently
wasn't thinking clearly at the time I wrote that. Having two 5-year-olds
suffering from cabin fever is the best excuse I can come up with. Such
code might not be strictly conforming, but it still can be correct code
(4p3).

> literals could exceed a compiler limit. In particular,
> it seems that a compiler is allowed to reject at compile
> time literals which would otherwise represent legal
> runtime values.

Sort of. Only strictly conforming programs are required to be accepted.
However, a program that fails to be strictly conforming solely by reason
of having unspecified behavior is still a correct program and must act
in accordance with 5.1.2.3 (4p3). How a program can "act in accordance
with 5.1.2.3" without being accepted is a bit of a conundrum. However,
while an implementation may be free to reject a program by reason of
containing unspecified behavior, 4p3 means that you can't describe its
reason for rejection by saying that the program isn't correct.

>>> Now, could you say where the standard requires that a conforming
>>> implementation accept some non-strictly conforming program?
>>> In the obvious place (4.6) I see only a requirement about strictly
>>> conforming programs.
>>
>> You're right - but if the only reason why a program fails to be strictly
>> conforming is that it has observable behavior that depends upon code
>> with unspecified behavior, then it is still considered a correct
>> program, and is required to behave in accordance with 5.1.2.3 (see 4p3).
>> You cannot label it an incorrect program just because the character
>> constant is longer than some arbitrary limit you've decided is applicable.
>
> 4.3 says "...correct in all other aspects, ...". So this means
> that merely containing unspecified behavior is not a valid reason
> to reject a program. But this does not imply that the choice of an
> implementation-defined value has to be made in a way that makes the
> program "correct in all other aspects".

True, but in this particular case, the choice is required to be a value
representable as an int. If the only processing your program performs on
that value is to do things to it that can be safely done on all values
that are representable as an int, then the program can be "correct in
all other aspects". For instance, dividing it by an arbitrary positive
integer, or using printf("%d") to display it.

> For example, if an implementation-defined value
> is used in arithmetic, then the implementation is _not_ forced to
> find values that avoid undefined behaviour (or other trouble).
> And for me 4.5 and 4.6 together mean that it is the responsibility
> of the program (programmer) to avoid exceeding implementation limits.

> In other words, when for a given program an implementation-defined rule
> leads to behavior that exceeds the capability of a given implementation,
> the implementation is still conforming: it does not reject the
> program due to "unspecified behavior", but because it cannot
> handle it... Of course, this raises the question of whether the
> implementation limit is reasonable. Clearly, saying "this
> implementation is unable to handle unspecified behavior"
> is a cheat. But saying "this implementation cannot fit
> 5 chars into the space of 4 chars" is a pretty reasonable limit.

Yes, but the standard requires that 'ABCDE' be given a value that is
representable as an 'int'. Such a value is guaranteed to fit in
sizeof(int) bytes. For an implementation where sizeof(int)==4, this does
NOT imply that it must stuff 5 chars into the space of 4. The standard
imposes no upper limit on the number of characters in a character
literal, and requires that the result map to a representable int value,
and that is all. There are many simple, obvious ways to meet both
requirements. Those requirements can only be met by a mapping that is
many-to-one, which is fine, since the standard says nothing to suggest
otherwise. The two simplest ways to do it would be to map 'ABCDE' to the
same value as either 'ABCD' or 'BCDE'.