
CHAR_BIT != 8


Geoff

Apr 17, 2018, 12:43:36 AM

Does the standard guarantee that CHAR_BIT will always be 8?

In other words, does one ever have to defensively write:

#include <limits.h>

#if CHAR_BIT != 8
#error This code assumes CHAR_BIT is 8;
#endif

... for code that assumes the ASCII character set is in effect?

Richard Kettlewell

Apr 17, 2018, 4:07:31 AM

Geoff <ge...@invalid.invalid> writes:
> Does the standard guarantee that CHAR_BIT will always be 8?

No. It guarantees only that CHAR_BIT >= 8.

--
https://www.greenend.org.uk/rjk/
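
On a C11 implementation the same defensive check can also be written as a
static assertion instead of a preprocessor #error; a minimal sketch:

#include <limits.h>

/* Fails to compile on any implementation where char is wider than 8 bits. */
_Static_assert(CHAR_BIT == 8, "this code assumes CHAR_BIT == 8");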

supe...@casperkitty.com

Apr 17, 2018, 10:47:48 AM

On Monday, April 16, 2018 at 11:43:36 PM UTC-5, Geoff wrote:
> Does the standard guarantee that CHAR_BIT will always be 8?

It allows for other sizes, and indeed I've programmed for a system where
CHAR_BIT was 16. Writing networking code on a system where CHAR_BIT is
16 was a bit painful, but performance was much better than it would have
been on that hardware with CHAR_BIT==8, which would have required every
character write to read a 16-bit value from RAM, modify it, and write it
back. And despite the pain of 16-bit characters, using a 16-bit-character
dialect of C was much less painful than doing everything in assembly
language would have been.
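
As a rough illustration of the kind of packing such networking code ends up
doing: with CHAR_BIT == 16 each char can carry two network octets, and every
store becomes a read-modify-write of a 16-bit unit. A sketch under those
assumptions (the helper names and the high-octet-first layout are
illustrative, not taken from the post):

#include <limits.h>
#include <stddef.h>

#if CHAR_BIT == 16
/* Fetch the n-th network octet from a buffer of 16-bit chars,
   assuming the high octet of each char comes first. */
static unsigned get_octet(const unsigned char *buf, size_t n)
{
    unsigned word = buf[n / 2];
    return (n % 2 == 0) ? (word >> 8) & 0xFFu : word & 0xFFu;
}

/* Store the n-th octet, preserving the other half of the 16-bit char
   (the read-modify-write mentioned above). */
static void put_octet(unsigned char *buf, size_t n, unsigned octet)
{
    unsigned word = buf[n / 2];
    if (n % 2 == 0)
        word = (word & 0x00FFu) | ((octet & 0xFFu) << 8);
    else
        word = (word & 0xFF00u) | (octet & 0xFFu);
    buf[n / 2] = (unsigned char)word;
}
#endif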

Lew Pitcher

Apr 17, 2018, 11:04:50 AM

Geoff wrote:

> Does the standard guarantee that CHAR_BIT will always be 8?

No. The standard guarantees only that CHAR_BIT will NOT BE LESS THAN 8. It
defines no explicit upper bound for CHAR_BIT; the practical ceiling is
whatever the preprocessor can represent in a constant-expression.

> In other words, does one ever have to defensively write:
>
> #include <limits.h>
>
> #if CHAR_BIT != 8
> #error This code assumes CHAR_BIT is 8;
> #endif
>
> ... for code that assumes the ASCII character set is in effect?

You mistakenly conflate 8-bit char widths with the ASCII character code.
Your code (above) won't guarantee that "the ASCII character set is in
effect", no matter what the size of CHAR_BIT.

For instance, EBCDIC is an 8-bit character code, and an EBCDIC-based
compiler on an IBM system would probably (knowing IBM's 360/370/zSeries
architecture) have CHAR_BIT == 8. However, the character code in use /would
NOT/ be ASCII.

For that matter, ASCII is a 7-bit character code, and is expressed in C in
/at least/ 8 bits; your code would exclude (say) a PDP-11 (with 9-bit
characters) that uses ASCII as its character code.

--
Lew Pitcher
"In Skills, We Trust"
PGP public key available upon request
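
A compile-time spot check along the lines Lew describes is possible, with one
caveat: the standard permits the preprocessor to evaluate character constants
with values different from those of the execution character set, so this is a
heuristic rather than a guarantee. A minimal sketch (the choice of probe
characters is arbitrary):

#include <limits.h>

/* Heuristic check for an ASCII-compatible execution character set:
   on EBCDIC, for example, 'A' is 0xC1, so this trips. */
#if 'A' != 0x41 || 'a' != 0x61 || '0' != 0x30 || ' ' != 0x20
#error This code assumes an ASCII-compatible character set
#endif

/* Independent check for 8-bit char, per the original question. */
#if CHAR_BIT != 8
#error This code assumes CHAR_BIT is 8
#endif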

Keith Thompson

Apr 17, 2018, 11:43:12 AM

Lew Pitcher <lew.p...@digitalfreehold.ca> writes:
[...]
> For that matter, ASCII is a 7-bit character code, and is expressed in C in
> /at least/ 8 bits; your code would exclude (say) a PDP-11 (with 9-bit
> characters) that uses ASCII as its character code.

Which is fine, since there are no PDP-11s with 9-bit characters (it has
8-bit bytes). But there have been other systems with 9-bit characters.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Lew Pitcher

Apr 17, 2018, 12:00:17 PM

Keith Thompson wrote:

> Lew Pitcher <lew.p...@digitalfreehold.ca> writes:
> [...]
>> For that matter, ASCII is a 7-bit character code, and is expressed in C
>> in /at least/ 8 bits; your code would exclude (say) a PDP-11 (with 9-bit
>> characters) that uses ASCII as its character code.
>
> Which is fine, since there are no PDP-11s with 9-bit characters

My memory was off, I guess :-)

> (it has 8-bit bytes). But there have been other systems with 9-bit
> characters.

I'm likely thinking of the PDP-10 (yes, obsolete and likely without a
conforming C implementation) which had 36-bit words, often subdivided into
four 9-bit "bytes". I /have/ seen tools that pack five ASCII characters into
a PDP-10 word (that would leave 1 bit unused in each word), and I know that
DEC used a "SIX-bit" characterset, packing 6 characters into word.

Nevertheless, the point remains: CHAR_BIT == 8 does not imply ASCII, and
ASCII does not imply CHAR_BIT == 8.

:-)

Keith Thompson

Apr 17, 2018, 1:40:04 PM

Lew Pitcher <lew.p...@digitalfreehold.ca> writes:
> Keith Thompson wrote:
>> Lew Pitcher <lew.p...@digitalfreehold.ca> writes:
>> [...]
>>> For that matter, ASCII is a 7-bit character code, and is expressed in C
>>> in /at least/ 8 bits; your code would exclude (say) a PDP-11 (with 9-bit
>>> characters) that uses ASCII as its character code.
>>
>> Which is fine, since there are no PDP-11s with 9-bit characters
>
> My memory was off, I guess :-)

Just a bit. 8-)}

>> (it has 8-bit bytes). But there have been other systems with 9-bit
>> characters.
>
> I'm likely thinking of the PDP-10 (yes, obsolete and likely without a
> conforming C implementation) which had 36-bit words, often subdivided into
> four 9-bit "bytes". I /have/ seen tools that pack five ASCII characters into
> a PDP-10 word (that would leave 1 bit unused in each word), and I know that
> DEC used a "SIX-bit" characterset, packing 6 characters into word.

CHAR_BIT==6 would have been non-conforming -- but I don't think that was
ever an issue.

> Nevertheless, the point remains; CHAR_BIT == 8 does not imply ASCII, and
> ASCII does not imply CHAR_BIT == 8
>
> :-)

Indeed.

Tim Rentsch

Apr 18, 2018, 10:58:26 AM

Lew Pitcher <lew.p...@digitalfreehold.ca> writes:

> [systems with 9-bit characters.]
>
> I'm likely thinking of the PDP-10 (yes, obsolete and likely
> without a conforming C implementation) which had 36-bit words,
> often subdivided into four 9-bit "bytes". I /have/ seen tools
> that pack five ASCII characters into a PDP-10 word (that would
> leave 1 bit unused in each word),

The PDP-10 routinely packed 7-bit ASCII characters five
to a 36-bit word, with the low bit unused. There were
(if memory serves) two assembler directives, ASCII and
ASCIZ (which added a null terminator), to store string
literals in that format.
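
For anyone curious what that layout looks like in practice, here is a small
sketch that models a 36-bit PDP-10 word with a uint64_t and packs five 7-bit
ASCII characters high to low, leaving bit 0 clear (the 64-bit modelling is an
assumption for illustration, not how a PDP-10 toolchain would do it):

#include <stdint.h>
#include <stdio.h>

/* Pack five 7-bit ASCII characters into the top 35 bits of a
   36-bit word, leftmost character first, low bit left unused. */
static uint64_t pack_ascii7(const char s[5])
{
    uint64_t word = 0;
    for (int i = 0; i < 5; i++)
        word = (word << 7) | (uint64_t)(s[i] & 0x7F);
    return word << 1;   /* bit 0 of the 36-bit word stays clear */
}

int main(void)
{
    /* Print the 36-bit word as twelve octal digits, PDP-10 style. */
    printf("%012llo\n", (unsigned long long)pack_ascii7("HELLO"));
    return 0;
}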