On 24/07/2020 07:51, Juha Nieminen wrote:
> Daniel P <daniel...@gmail.com> wrote:
>> C++ already had two byte types too many - char, signed char,
>> and unsigned char.
>
> I don't know what the motivation was originally to have three distinct char
> types in C (from which C++ "inherited" them), but in a way it actually
> makes sense, perhaps serendipitously.
>
> The type 'char' is supposed to be the most efficient byte type for the
> target platform, either a signed one or an unsigned one. Most usually
> it should be preferred especially when dealing with strings, as long
> as one is aware that it can be either signed or unsigned, and doesn't
> make any assumptions either way.
>
> Curiously, this is not just theoretical. There is a *modern* very
> concrete example where this has bitten many a developer in the posterior,
> with code compiling but working incorrectly, because it wrongly assumes
> that 'char' is signed.
>
> Namely in ARM processors (at least the 32-bit ones) it so happens that
> an unsigned char is more efficient than a signed one, and thus most
> compilers (such as gcc) will use the 'char' type as an unsigned char.
> Most notoriously this happens when compiling for the Raspberry Pi (and
> probably other ARM-based systems).
On most embedded targets, and in most newer ABIs, plain char is unsigned,
because making "char" signed is a meaningless historical artefact from
the days when 7-bit ASCII was the only character set supported by C.
The signedness of plain char is specified by the ABI, not chosen by the
compiler - and nearly every target except 32-bit Windows and a few
embedded microcontrollers has a proper ABI that nearly every compiler
follows. (Though gcc, and some other compilers, may let you override
the signedness of char with a command-line switch.)
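If you want to see what your own target does, a minimal sketch along these lines should work (the names plain_char_is_signed and check_char_limits are just made up for illustration; for gcc the switches I have in mind are -fsigned-char and -funsigned-char):

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// True when plain char is signed on this target. The answer comes from
// the ABI, or from a switch such as gcc's -fsigned-char / -funsigned-char.
constexpr bool plain_char_is_signed = std::is_signed<char>::value;

// <climits> reports the same thing: CHAR_MIN is 0 when char is unsigned.
static_assert(plain_char_is_signed == (CHAR_MIN < 0),
              "<type_traits> and <climits> should agree");

// Whichever way it goes, plain char is still a distinct third type.
static_assert(!std::is_same<char, signed char>::value,
              "char is not the same type as signed char");
static_assert(!std::is_same<char, unsigned char>::value,
              "char is not the same type as unsigned char");

// Hypothetical runtime cross-check: plain char has the same lower limit
// as signed char when it is signed, and a lower limit of 0 otherwise.
inline void check_char_limits() {
    assert(plain_char_is_signed ? CHAR_MIN == SCHAR_MIN : CHAR_MIN == 0);
}
```

The static_asserts hold on any conforming compiler whichever way char goes, so this only reports the choice; it does not depend on it.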
>
> Many a C and C++ program out there doesn't work correctly for the Raspi
> because it wrongly assumes that 'char' is signed. I have encountered
> actual examples.
>
> (Most often this happens because of an if(c < 0), which obviously
> always evaluates to false if char is unsigned.)
>
Any code that makes any assumptions about the signedness of plain char
is broken. If the signedness matters, make it explicit. (Usually
int8_t and uint8_t make vastly more sense in code than "signed char" and
"unsigned char". Or int_least8_t and uint_least8_t for maximal
portability.)
Unfortunately, you are right that people sometimes write such broken code.
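To make the point concrete, here is a rough sketch of the if(c < 0) problem and one way to write it explicitly. The function names are hypothetical, and I am assuming the intent of the original test was "is this byte outside 7-bit ASCII?":

```cpp
#include <cassert>
#include <cstdint>

// Fragile: silently assumes plain char is signed. Where char is unsigned
// the comparison can never be true, so this always returns false.
inline bool is_non_ascii_fragile(char c) {
    return c < 0;
}

// Explicit: treat the byte as unsigned and test the top bit. The result
// is the same whatever the signedness of plain char.
inline bool is_non_ascii(char c) {
    return static_cast<unsigned char>(c) >= 0x80u;
}

// When the value is a small number rather than a character, put the
// signedness in the type so that a test against zero means something.
inline bool is_negative_sample(std::int8_t sample) {
    return sample < 0;
}

// Hypothetical self-check; passes with signed and unsigned plain char.
inline void self_test() {
    assert(!is_non_ascii('A'));
    assert(is_non_ascii(static_cast<char>(0xE9)));
    assert(is_negative_sample(-1));
    assert(!is_negative_sample(0));
}
```

The fragile version is kept only for comparison. With gcc, -Wtype-limits (part of -Wextra) will usually flag it on targets where char is unsigned.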