[intro.memory]1 The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (2.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation-defined. [...]
[lex.charset]1 The basic source character set consists of 96 characters [...]
[lex.charset]3 The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose representation has all zero bits. [...]
[basic.fundamental]1 Objects declared as characters (char) shall be large enough to store any member of the implementation’s basic character set. [...]
[expr.sizeof]1 [...]. sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1. [...]
[lex.phases]1.1 Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. The set of physical source file characters accepted is implementation-defined. [...] Any source file character not in the basic source character set (2.3) is replaced by the universal-character-name that designates that character. [...]
[lex.phases]1.5 Each source character set member in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set (2.14.3, 2.14.5); if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.
The idea behind all of these bullets is:
The compiler must recognize each character written in the source file. For that, the compiler of course needs to know the encoding of the source file, but how it represents each character internally is implementation-defined. The compiler's internal encoding theoretically has nothing to do with either the input or the execution encoding. Every appearance of a non-basic character must be treated as if it were written using the universal-character-name syntax (`\u` followed by the Unicode code point in hexadecimal). In practice, though, the compiler may represent the characters it finds in source files however it wishes (for example, translating from the input encoding to UTF-8 without ever materializing the universal-character-name syntax, or translating them directly to the execution character set). What matters is that characters are recognized correctly, so that the syntax can be parsed and so that character and string literals can be properly translated to the output (execution) encoding.
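For instance, here is a minimal sketch (assuming the compiler accepts the source file's encoding, e.g. a UTF-8 source file) showing that a non-basic character written directly and the same character written as a universal-character-name must end up with the same execution-encoding representation after phase 5:

```cpp
#include <cassert>
#include <cstring>

int main() {
    // The é typed directly is replaced in phase 1 by \u00E9, so both literals
    // denote the same characters and must encode identically in phase 5.
    const char direct[]  = "café";       // non-basic character written directly
    const char escaped[] = "caf\u00E9";  // same character as a universal-character-name

    assert(sizeof direct == sizeof escaped);
    assert(std::strcmp(direct, escaped) == 0);
}
```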
Non-basic characters cannot appear outside identifiers or character/string literals. Support for non-basic characters inside identifiers is implementation-defined, and most implementations don't provide it, because it can cause problems when linking (different linkers may treat the encoding of non-ASCII symbol names differently), and because the execution encoding likewise has nothing to do with how identifiers are encoded in the executable.
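As a hedged illustration (whether this compiles and links cleanly depends on the implementation and on how the toolchain mangles non-ASCII symbol names), an identifier containing a non-basic character can be spelled with the universal-character-name syntax:

```cpp
// Implementation support for extended identifiers varies; if the compiler
// accepts this at all, both uses below name the same entity.
int caf\u00e9 = 42;    // identifier "café" written with a universal-character-name

int main() {
    return caf\u00e9;  // refers to the same variable
}
```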
So, what matters most is the characters written inside character and string literals. Objects of type `char` must be able to hold at least 256 different values at execution time, because a byte must be large enough to hold the eight-bit code units of UTF-8, i.e. at least 2^8 = 256 distinct values (which easily covers the 96 characters of the basic source character set). But a `char` can hold more values if the compiler chooses to support them; that requires more physical storage than the minimum, for the larger range of values. Regardless of the physical size, `sizeof(char) = 1`.
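A small compile-time check of these guarantees (a sketch; how far CHAR_BIT exceeds the minimum is implementation-specific):

```cpp
#include <climits>

// A C++ byte (a char) is at least 8 bits, so it can hold at least 256 values,
// but sizeof(char) is 1 by definition no matter how wide the byte actually is.
static_assert(CHAR_BIT >= 8, "a byte holds eight-bit UTF-8 code units");
static_assert(sizeof(char) == 1 && sizeof(signed char) == 1 && sizeof(unsigned char) == 1,
              "sizeof of every char variant is 1");
```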
The physical storage length of a `char` is what physically defines the memory unit of C++ (the C++ byte), because `sizeof(unsigned char) = sizeof(signed char) = sizeof(char)`, and the object representation of an object is the sequence of `unsigned char`s occupying its storage. So `sizeof(T)` is the number of `unsigned char`s that an object of type `T` requires for storage, which can of course differ from the number of physical (hardware) bytes needed to store that object if the physical size of a `char` is bigger than one hardware byte.
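For example, the object representation of any object can be inspected as the `sizeof(T)` `unsigned char`s that make it up (a sketch; the byte values printed depend on the implementation's endianness):

```cpp
#include <cstddef>
#include <cstdio>

int main() {
    int value = 0x01020304;

    // sizeof value is the number of C++ bytes (unsigned chars) in the object
    // representation; each of them may correspond to one or more hardware bytes.
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    for (std::size_t i = 0; i < sizeof value; ++i)
        std::printf("byte %zu: %02x\n", i, static_cast<unsigned>(bytes[i]));
}
```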
So, the size of a `char` is not only about characters, but also about memory composition. And then comes the execution character set (the execution encoding).
If the user specifies an execution character set with 2^32 possible values (UTF-32, for instance), each `char` must be able to store at least 2^32 different values, so a `char` (a C++ byte) is 32 bits wide. `sizeof(char)` is still 1, and `sizeof(int) = 1` as well if `int` is physically stored in 4 hardware bytes. And remember that objects of type `char` must merely be "large enough", so the physical size of a `char` could be even bigger (which in practice it never is).
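On a typical implementation the sketch below prints CHAR_BIT = 8 and sizeof(int) = 4; on the hypothetical implementation just described, with a 32-bit `char`, it would print CHAR_BIT = 32 and sizeof(int) = 1, because `sizeof` counts C++ bytes (chars), not hardware bytes:

```cpp
#include <climits>
#include <cstdio>

int main() {
    std::printf("CHAR_BIT = %d, sizeof(int) = %zu, bits in an int = %zu\n",
                CHAR_BIT, sizeof(int), sizeof(int) * CHAR_BIT);
}
```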
Is it a good idea to specify UTF-32 as the execution encoding? No, because `int` is recommended to have the natural (fastest) size for the host machine. If the natural word size of the host machine were 2 hardware bytes (unlikely today in the vast majority of situations), you would be forcing the compiler to work with `int`s of at least 4 physical bytes, because each `int` is composed of a whole number of `unsigned char`s (in this case, at least 1, i.e. at least 32 bits).
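The constraint behind that reasoning can be stated directly: `int` occupies a whole number of C++ bytes, so widening `char` drags the minimum possible `int` up with it:

```cpp
#include <climits>

// int is at least one C++ byte wide (one char), so with a 32-bit char the
// narrowest possible int is already 32 bits, regardless of the machine word.
static_assert(sizeof(int) >= 1, "int occupies at least one char's worth of storage");
static_assert(INT_MAX >= 32767, "int must also cover at least the 16-bit range");
```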