Bits, bytes, chars, sizeof: byte == char?? What is a 'byte' anyway?

John Barclay

Dec 2, 2002, 5:15:20 AM

What is a 'char', and what is a 'byte'? This question was prompted by
a section in the C++ FAQ, which doesn't seem to agree with my reading
of K&R. I don't have any standards docs for C or C++, so clarification
would be appreciated.

K&R seems to be inconsistent about what a 'byte' is, and whether a
'byte' is actually a 'char'. Some quotes from K&R:

1) 2nd edn., p36: "char: a single byte, capable of holding one
character in the local character set".

2) limits.h: a char [note, *not* a byte] must hold at least 8 "bits",
but may hold more

3) A7.4.8: "The sizeof operator yields the number of bytes required to
store an object of the type of its operand... when sizeof is applied
to a char, the result is 1".

However, K&R doesn't, I think, say what a byte is. Is it the minimum
addressable word size of the machine? If it is, and the machine has a
minimum addressable word size of "8 bits", and a compiler on that
machine implements "16-bit chars" as allowed by limits.h, then surely
the sizeof a char on that machine/compiler is 2, not 1? Defining
sizeof char as 1 in this case appears to be both pointless and
incorrect. A7.4.8 practically mandates that a "byte" is 16 bits when a
compiler chooses to implement 16-bit chars on an 8-bit addressable
machine, so it also seems to say that a "byte" is *not* the minimum
addressable word size of the machine.

The C++ FAQ doesn't seem to make this much clearer. It has 6 sections
on this very question, but doesn't address the fundamental question of
what a "byte" is. It does say: "The C++ language guarantees a byte
must always have at least 8 bits", although K&R actually states that a
*char* must have at least 8 bits. The rest of the six sections appear
to go along with the idea that a byte is actually a char, without
actually saying it.

There must be compilers out there that implement 7-bit character sets.
How do they handle this?

The practical reason for this question is that I have some programs
which need to know exactly how many bits are in my own user-defined
types. I generally do this along these lines:

typedef unsigned char bit8;
typedef unsigned short bit16;
typedef unsigned long bit32;
typedef unsigned long long bit64;
....
// fix the typedefs above if any assertions fail
assert(sizeof(bit8) == 1);
assert(sizeof(bit16) == 2);
assert(sizeof(bit32) == 4);
assert(sizeof(bit64) == 8);

Do I need to fix this?

Thanks

John

Attila Feher

Dec 2, 2002, 6:17:46 AM

"John Barclay" wrote:
> However, K&R doesn't, I think, say what a byte is.

It does. The byte in C and C++ is a char.

> Is it the minimum
> addressable word size of the machine?

No. It is a char.

> If it is, and the machine has a
> minimum addressable word size of "8 bits", and a compiler on that
> machine implements "16-bit chars" as allowed by limits.h, then surely
> the sizeof a char on that machine/compiler is 2, not 1?

No. The sizeof(char) is always 1. A byte is not necessarily 8 bits. The thing
with exactly 8 bits is called an octet. Unfortunately, for ages schools taught
people that a byte is 8 bits, because on the machines they knew that was true.
But it isn't. A byte, by the C and C++ standards' definition, is at least 8
bits and is equal to char in size.
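
A minimal sketch of the two guarantees just described, assuming a C11 compiler for _Static_assert (the helper name bits_per_byte is made up for illustration). Both checks should pass on any conforming implementation, whatever the hardware's addressable unit is:

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */

/* Both conditions hold by definition on a conforming implementation. */
_Static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
_Static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

/* Hypothetical helper: number of bits in one C byte (one char). */
static int bits_per_byte(void)
{
    return CHAR_BIT;
}
```

On a machine with 16-bit chars, bits_per_byte() would return 16 while sizeof(char) would still be 1.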

> Defining
> sizeof char as 1 in this case appears to be both pointless and
> incorrect.

It is very correct and to the point. You think a byte is 8 bits. It isn't.
An octet is 8 bits; a byte... well, it is a bite or a chunk of bits - in C and
C++ at least 8 bits.

> A7.4.8 practically mandates that a "byte" is 16 bits when a
> compiler chooses to implement 16-bit chars on an 8-bit addressable
> machine, so it also seems to say that a "byte" is *not* the minimum
> addressable word size of the machine.

And the standard never says that it is.

> The C++ FAQ doesn't seem to make this much clearer.

That is unfortunate if you find it this way.

> It has 6 sections
> on this very question, but doesn't address the fundamental question of
> what a "byte" is. It does say: "The C++ language guarantees a byte
> must always have at least 8 bits", although K&R actually states that a
> *char* must have at least 8 bits.

Since char and byte are the same from a size point of view, these two
statements mean exactly the same thing.

> The rest of the six sections appear
> to go along with the idea that a byte is actually a char, without
> actually saying it.

Cool, since it is not: a byte and a char are the same only size-wise.

> There must be compilers out there that implement 7-bit character sets.
> How do they handle this?

I have no idea. But you have to realize that 7 bits character sets are also
represented on at least 8 bits. The 8th bit may be undefined in its value,
fixed 0 or 1 or some sort of parity etc.

> The practical reason for this question is that I have some programs
> which need to know exactly how many bits are in my own user-defined
> types.

AFAIK you will get all this info from the header <limits.h> in C and from
<limits> in C++.

> assert(sizeof(bit8) == 1);

> Do I need to fix this?

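Yes. The above assertion can never fail when bit8 is a typedef for unsigned
char, because sizeof(char) is always 1, so it tells you nothing about the
number of bits. You need to look into the limits header if you use C++. If
you use C, look at the limits.h - I cannot recall what exactly is there.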
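
One way to make the original assertions meaningful is to compare widths in bits instead of sizes in bytes. A sketch, assuming only <limits.h>; the macro BITS_OF and the function widths_ok are names invented here, and the macro counts storage bits, so any padding bits are included:

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */

/* Hypothetical macro: storage bits occupied by a type (padding included). */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)

typedef unsigned char  bit8;
typedef unsigned short bit16;

/* Returns 1 if the typedefs have the intended storage width, else 0. */
static int widths_ok(void)
{
    return BITS_OF(bit8) == 8 && BITS_OF(bit16) == 16;
}
```

Unlike sizeof(bit8) == 1, the check in widths_ok() would fail on an implementation where CHAR_BIT is 16.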

Attila

Alexander Terekhov

Dec 2, 2002, 6:33:45 AM

Attila Feher wrote:
[...]

> If you use C, look at the limits.h - I cannot recall what exactly is there.

Extensions aside: http://www.opengroup.org/onlinepubs/007904975/basedefs/limits.h.html

regards,
alexander.

P.S. http://groups.google.com/groups?selm=3CC6D78C.CF4E562F%40web.de
(Subject: Re: byte & char ?)

< Kinda P.P.S. >

Brian Rodenborn wrote: [Subject: Re: C++ Standard]
>
> Tom <k...@kk.com> wrote in message news:z_tG9.32486$zX3....@news.indigo.ie...
>
> > Where can I get it? It has to be somewhere on the net. I would've thought
> > yous would have a link to it at your FAQ, that way yous would get ALOT
> > less stupid questions.

That's correct. Folks would then also be able to point others to the formal
standardeez prose *directly* [presuming that it is published on the web
ala ISO/IEC 9945:2002(*)].

>
> The Standard is not public domain, and hence is protected under copyright
> law. Anyone that did have a copy on the web would be subject to legal action
> from the ANSI or ISO groups.

Funny. Here we go: http://std.dkuug.dk/JTC1/SC22/WG21/docs/papers/2001/n1316

regards,
alexander.

(*) http://www.iso.ch/iso/en/commcentre/pressreleases/2002/Ref837.html

Rolf Magnus

Dec 2, 2002, 6:45:00 AM

John Barclay wrote:

> What is a 'char', and what is a 'byte'?

A byte is a unit of storage. A char can hold 1 byte.

> K&R seems to be inconsistent about what a 'byte' is, and whether a
> 'byte' is actually a 'char'. Some quotes from K&R:
>
> 1) 2nd edn., p36: "char: a single byte, capable of holding one
> character in the local character set".
>
> 2) limits.h: a char [note, *not* a byte] must hold at least 8 "bits",
> but may hold more
>
> 3) A7.4.8: "The sizeof operator yields the number of bytes required to
> store an object of the type of its operand... when sizeof is applied
> to a char, the result is 1".

What's inconsistent about that?

> However, K&R doesn't, I think, say what a byte is. Is it the minimum
> addressable word size of the machine?

Not quite. It most of the time is, but it doesn't need to be. It's the
minimum addressable word size of the C or C++ environment.

> If it is, and the machine has a
> minimum addressable word size of "8 bits", and a compiler on that
> machine implements "16-bit chars" as allowed by limits.h, then surely
> the sizeof a char on that machine/compiler is 2, not 1?

No. sizeof(char) is _always_ 1. But a byte is 16 bit in this case.

> Defining
> sizeof char as 1 in this case appears to be both pointless and
> incorrect. A7.4.8 practically mandates that a "byte" is 16 bits when a
> compiler chooses to implement 16-bit chars on an 8-bit addressable
> machine, so it also seems to say that a "byte" is *not* the minimum
> addressable word size of the machine.

Right. A C byte doesn't need to be equal in size with the machine word
size. It's just the smallest unit that you can access in your C
program.

> The C++ FAQ doesn't seem to make this much clearer. It has 6 sections
> on this very question, but doesn't address the fundamental question of
> what a "byte" is. It does say: "The C++ language guarantees a byte
> must always have at least 8 bits", although K&R actually states that a
> *char* must have at least 8 bits.

What's the problem? The size of a char is always 1 byte, so both
statements are equivalent.

> The rest of the six sections appear
> to go along with the idea that a byte is actually a char, without
> actually saying it.
>
> There must be compilers out there that implement 7-bit character sets.
> How do they handle this?

They could use 8-bit bytes and just leave 1 bit unused for characters.
You also shouldn't think of a char as a special variable type to hold
characters. It can do this, but actually, char is just a small integer
type.

> The practical reason for this question is that I have some programs
> which need to know exactly how many bits are in my own user-defined
> types.

May I ask why? Usually, you only need to know exact bit sizes in special
system-specific code, on which you can make assumptions about the sizes
of the different integer types anyway.

> I generally do this along these lines:
>
> typedef unsigned char bit8;
> typedef unsigned short bit16;
> typedef unsigned long bit32;
> typedef unsigned long long bit64;
> ....
> // fix the typedefs above if any assertions fail
> assert(sizeof(bit8) == 1);
> assert(sizeof(bit16) == 2);
> assert(sizeof(bit32) == 4);
> assert(sizeof(bit64) == 8);
>
> Do I need to fix this?

If you want the number of bits of a char, use CHAR_BIT from limits.h:

assert(CHAR_BIT == 8);

OTOH, not all bits might be value bits. Some bits of a type can also be
padding bits. If you need special ranges, use stuff like CHAR_MIN and
CHAR_MAX.
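
To see how many value bits an unsigned type really has, as opposed to storage bits, one option is to count the bits of its maximum value from limits.h. A small sketch; value_bits is a made-up helper name, and it assumes a C99 compiler for unsigned long long:

```c
#include <assert.h>
#include <limits.h>   /* UCHAR_MAX, USHRT_MAX, CHAR_BIT */

/* Count the value bits of an unsigned type, given its maximum value
   (for example UCHAR_MAX or USHRT_MAX). Padding bits are not counted. */
static int value_bits(unsigned long long max)
{
    int n = 0;
    while (max != 0) {
        n++;
        max >>= 1;
    }
    return n;
}
```

For unsigned char the result always equals CHAR_BIT, since that type has no padding bits; for wider types value_bits(TYPE_MAX) may be smaller than sizeof(type) * CHAR_BIT.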

If you can use C99, there are already special types for specific sizes
in stdint.h:

int8_t would be a type with exactly 8 bit, but is only defined if the
machine knows such a type.
int_least8_t would be the smallest type that has at least 8 bit.
int_fast8_t would be the fastest type that has at least 8 bit.
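
A short sketch of how the optional and mandatory types differ in practice, assuming a C11 compiler (for _Static_assert) with <stdint.h>. The helper low8 is a name made up here. The exact-width type can be detected through its limit macro, and if it exists the byte must be exactly 8 bits wide, because uint8_t has no padding and occupies at least one byte:

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* uint_least8_t, UINT8_MAX if uint8_t exists */

#ifdef UINT8_MAX
/* uint8_t exists, so one byte must be exactly 8 bits. */
_Static_assert(CHAR_BIT == 8, "uint8_t implies 8-bit bytes");
#endif

/* uint_least8_t is always available; mask by hand to keep 8 bits. */
static uint_least8_t low8(unsigned v)
{
    return (uint_least8_t)(v & 0xFFu);
}
```

Code written against uint_least8_t plus an explicit mask keeps working where CHAR_BIT is larger than 8, at the cost of the extra masking.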

Dik T. Winter

Dec 2, 2002, 6:48:24 AM

In article <pr9muugsudb9sr17a...@4ax.com> John Barclay <j...@yahoo.nospam.com> writes:
> What is a 'char', and what is a 'byte'? This question was prompted by
> a section in the C++ FAQ, which doesn't seem to agree with my reading
> of K&R. I don't have any standards docs for C or C++, so clarification
> would be appreciated.

In C (and I presume also in C++) 'char' is just another name for 'byte'.

> K&R seems to be inconsistent about what a 'byte' is, and whether a
> 'byte' is actually a 'char'. Some quotes from K&R:
>
> 1) 2nd edn., p36: "char: a single byte, capable of holding one
> character in the local character set".

And here K&R are not inconsistent, they say the same here.

> However, K&R doesn't, I think, say what a byte is. Is it the minimum
> addressable word size of the machine?

No. It is the same as a char and at least 8 bits. There are no
further requirements.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Attila Feher

Dec 2, 2002, 6:50:46 AM

"Alexander Terekhov" wrote:
>
> Attila Feher wrote:
> [...]
> > If you use C, look at the limits.h - I cannot recall what exactly is there.
>
> Extensions aside: http://www.opengroup.org/onlinepubs/007904975/basedefs/limits.h.html

Yah.

I never ever wrote that:

> > The Standard is not public domain, and hence is protected under copyright
> > law. Anyone that did have a copy on the web would be subject to legal action
> > from the ANSI or ISO groups.

Attila

Alexander Terekhov

Dec 2, 2002, 7:06:02 AM

Attila Feher wrote:
[...]

> I never ever wrote that:
> > > The Standard is not public domain, and hence is protected under copyright
> > > law. Anyone that did have a copy on the web would be subject to legal action
> > > from the ANSI or ISO groups.

Well, <http://groups.google.com/groups?group=sci.med.vision>, Wolf. Really.

*I wrote*:

: ....
: < Kinda P.P.S. >
:
: Brian Rodenborn wrote: [Subject: Re: C++ Standard]
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
: ....
: > The Standard is not public domain, and hence is protected under copyright
: > law. Anyone that did have a copy on the web would be subject to legal action
: > from the ANSI or ISO groups.
: ....

regards,
alexander.

Quixote

Dec 2, 2002, 7:11:23 AM

"Alexander Terekhov" <tere...@web.de> wrote in message
news:3DEB4519...@web.de...

> Brian Rodenborn wrote: [Subject: Re: C++ Standard]

> > The Standard is not public domain, and hence is protected under copyright
> > law. Anyone that did have a copy on the web would be subject to legal action
> > from the ANSI or ISO groups.
>
> Funny. Here we go: http://std.dkuug.dk/JTC1/SC22/WG21/docs/papers/2001/n1316
>
> regards,
> alexander.
>

This link appears to be to a draft --- a different matter entirely in terms
of copyright.

--

Quixote
1. To reply to email address, remove donald
2. Don't reply to email address (post here instead)

Dan Pop

Dec 2, 2002, 6:52:03 AM

In <pr9muugsudb9sr17a...@4ax.com> John Barclay <j...@yahoo.nospam.com> writes:

>What is a 'char', and what is a 'byte'? This question was prompted by
>a section in the C++ FAQ, which doesn't seem to agree with my reading
>of K&R. I don't have any standards docs for C or C++, so clarification
>would be appreciated.

The C89 definition of byte is:

* Byte --- the unit of data storage in the execution environment
large enough to hold any member of the basic character set of the
execution environment. It shall be possible to express the address of
each individual byte of an object uniquely. A byte is composed of a
contiguous sequence of bits, the number of which is
implementation-defined. The least significant bit is called the
low-order bit; the most significant bit is called the high-order bit.

Basically, all the character types (char, signed char and unsigned char)
occupy one byte, by definition, but only unsigned char is guaranteed to
actually use all the bits in a byte.

If a byte is wide enough, other integer types may consist of a single
byte. E.g. if a byte has 16 bits, all the integer types up to int
could occupy a single byte. And there are C implementations with 32-bit
bytes, where all the integral types take one byte.
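
Because every byte of an object is individually addressable and unsigned char uses all the bits of a byte, an object can be examined byte by byte through an unsigned char buffer. A minimal sketch; the function name int_bytes is invented for illustration:

```c
#include <assert.h>
#include <stddef.h>   /* size_t */
#include <string.h>   /* memcpy */

/* Copy the object representation of an unsigned int into buf.
   Returns the number of bytes copied, i.e. sizeof(unsigned int),
   or 0 if the buffer is too small. */
static size_t int_bytes(unsigned int value, unsigned char *buf, size_t buflen)
{
    if (buflen < sizeof value)
        return 0;
    memcpy(buf, &value, sizeof value);
    return sizeof value;
}
```

On an implementation with 32-bit bytes the function could return 1; on a common desktop system it returns 4. The order of the bytes in buf is implementation-defined.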

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Dan...@ifh.de

Alexander Terekhov

Dec 2, 2002, 7:48:02 AM

Quixote wrote:
>
> "Alexander Terekhov" <tere...@web.de> wrote in message
> news:3DEB4519...@web.de...
> > Brian Rodenborn wrote: [Subject: Re: C++ Standard]
> > > The Standard is not public domain, and hence is protected under copyright
> > > law. Anyone that did have a copy on the web would be subject to legal action
> > > from the ANSI or ISO groups.
> >
> > Funny. Here we go: http://std.dkuug.dk/JTC1/SC22/WG21/docs/papers/2001/n1316
> >
> > regards,
> > alexander.
> >
>
> This link appears to be to a draft --- a different matter entirely in terms
> of copyright.

<llort>

Oh! Yeah!! Sure!!! ;-)

Well, to make it even more comic than it already is, here's what
I've posted to c.l.c++.mod. [the 2nd one just a few minutes ago].

< 2 x Forward Inline >

-------- Original Message --------
Message-ID: <3DE9131D...@web.de>
Newsgroups: comp.lang.c++.moderated
Subject: Re: Error Handler class and standards questions

Francis Glassborow wrote:
[...]
> >1) POSIX C standard
>
> To the best of my knowledge there is no such thing, though the POSIX
> standard does provide C bindings.

POSIX.1 *extends* standard C99. "POSIX C standard" is this:

http://www.iso.ch/iso/en/commcentre/pressreleases/2002/Ref837.html

regards,
alexander.

--
http://tinyurl.com/342q http://tinyurl.com/342r http://tinyurl.com/342s
(Newsgroup c.l.c++, Subject: Re: Error Handler and standards questions)

-------- Original Message --------
Message-ID: <3DEB49E6...@web.de>
Newsgroups: comp.lang.c++.moderated
Followup-To: misc.int-property
Subject: Re: Error Handler class and standards questions

< Followup-To: misc.int-property >

Alexander Terekhov wrote:
[...]
> http://tinyurl.com/342q http://tinyurl.com/342r http://tinyurl.com/342s
> (Newsgroup c.l.c++, Subject: Re: Error Handler and standards questions)
>
> {Please note that following the chain of URLs provided can expose you to
> abusing intellectual property rights. Published material may only be
> legally used for the purpose for which it is published. -mod/fwg}

Could you please elaborate on this? I mean the risk of exposure
"abusing intellectual property rights" with respect to following
the chain of URLs I've posted?? Please note the followup setting.

TIA.

regards,
alexander.

Richard Bos

Dec 2, 2002, 8:36:57 AM

"Quixote" <donald...@datafast.net.au> wrote:

> "Alexander Terekhov" <tere...@web.de> wrote in message
> news:3DEB4519...@web.de...
> > Brian Rodenborn wrote: [Subject: Re: C++ Standard]
> > > The Standard is not public domain, and hence is protected under
> > > copyright law. Anyone that did have a copy on the web would be
> > > subject to legal action from the ANSI or ISO groups.
> >
> > Funny. Here we go:
> > http://std.dkuug.dk/JTC1/SC22/WG21/docs/papers/2001/n1316
>

> This link appears to be to a draft --- a different matter entirely in terms
> of copyright.

And note that this is a link to the site of the ISO workgroup itself -
the very people who own the copyright. They, of all people, would be all
right if they _did_ have the actual Standard posted.

Richard

Alexander Terekhov

Dec 2, 2002, 8:53:59 AM

Richard Bos wrote:
[...]

> And note that this is a link to the site of the ISO workgroup itself -
> the very people who own the copyright.

Well, take this <http://www.comnets.rwth-aachen.de/doc/c++std>, then.

regards,
alexander.

--
http://www.google.com/search?q=C%2B%2B+standard+draft

Richard Bos

Dec 2, 2002, 9:04:35 AM

Alexander Terekhov <tere...@web.de> wrote:

> Richard Bos wrote:
> [...]
> > And note that this is a link to the site of the ISO workgroup itself -
> > the very people who own the copyright.
>
> Well, take this <http://www.comnets.rwth-aachen.de/doc/c++std>, then.

That's a different Standard. And it's _also_ working draft.

Richard

Alexander Terekhov

Dec 2, 2002, 9:07:12 AM

Richard Bos wrote:
>
> Alexander Terekhov <tere...@web.de> wrote:
>
> > Richard Bos wrote:
> > [...]
> > > And note that this is a link to the site of the ISO workgroup itself -
> > > the very people who own the copyright.
> >
> > Well, take this <http://www.comnets.rwth-aachen.de/doc/c++std>, then.
>
> That's a different Standard.

;-)

> And it's _also_ working draft.

Sure it is. ;-)

regards,
alexander.

John Barclay

Dec 2, 2002, 11:42:17 AM

On Mon, 02 Dec 2002 12:33:45 +0100, Alexander Terekhov
<tere...@web.de> wrote:

> <snipped>

Wow! A byte expert. Thanks for all the links; I've downloaded the
draft/whatever spec.

Can you summarise the Posix situation? If I understand it correctly,
Posix has now stated the obvious, ie. a 'byte' was really 8 bits all
along. Is there an equivalent to sizeof that gives a 'real' size? Does
Posix have any relationship to the C and C++ standards? Does it extend
the standards or it is irrelevant to the standards?

John

John Barclay

Dec 2, 2002, 11:42:21 AM

On Mon, 02 Dec 2002 12:45:00 +0100, Rolf Magnus <rama...@t-online.de>
wrote:

>John Barclay wrote:
>> The practical reason for this question is that I have some programs
>> which need to know exactly how many bits are in my own user-defined
>> types.
>
>May I ask why? Usually, you only need to know exact bit sizes in special
>system-specific code, on which you can make assumptions about the sizes
>of the different integer types anyway.

Sure. I design and simulate hardware, and I need to emulate arithmetic
in known-size registers. This means that I can't rely on the basic
types: I have to know exactly what assumptions the compiler has made.

>If you can use C99, there are already special types for specific sizes
>in stdint.h:

Luckily, I can use whatever I want - thanks, I'll use this and get rid
of the assertions.

John

Default User

Dec 2, 2002, 12:37:03 PM

Alexander Terekhov wrote:

> Funny. Here we go: http://std.dkuug.dk/JTC1/SC22/WG21/docs/papers/2001/n1316

This is a working draft of some other document. I'll take your word for
it that the current ISO C++ standard is embedded in there, but I sure
can't tell that for certain by looking at it.

The document that is available from ANSI or ISO as the C++ standard is a
copyrighted document that is not available freely. The fact that you can
point to some draft of another document doesn't change that fact in the
slightest.

Brian Rodenborn

Alexander Terekhov

Dec 2, 2002, 12:56:33 PM

John Barclay wrote:
[...]

> Can you summarise the Posix situation?

The POSIX Rationale can.

http://www.opengroup.org/onlinepubs/007904975/xrat/xbd_chap03.html#tag_01_03_00_02

"Byte

The restriction that a byte is now exactly eight
bits was a conscious decision by the standard
developers. It came about due to a combination
of factors, primarily the use of the type
int8_t within the networking functions and the
alignment with the ISO/IEC 9899:1999 standard,
where the intN_t types are now defined.

According to the ISO/IEC 9899:1999 standard:

The [u]intN_t types must be two's complement
with no padding bits and no illegal values.

All types (apart from bit fields, which are not
relevant here) must occupy an integral number of
bytes.

If a type with width W occupies B bytes with C
bits per byte (C is the value of {CHAR_BIT}),
then it has P padding bits where P + W = B * C.

Therefore, for int8_t P=0, W=8. Since B>=1, C>=8,
the only solution is B=1, C=8.

The standard developers also felt that this was
not an undue restriction for the current state-of-
the-art for this version of IEEE Std 1003.1, but
recognize that if industry trends continue, a wider
character type may be required in the future."

"Character

The term "character" is used to mean a sequence
of one or more bytes representing a single graphic
symbol. The deviation in the exact text of the
ISO C standard definition for "byte" meets the
intent of the rationale of the ISO C standard
also clears up the ambiguity raised by the term
"basic execution character set". The octet-minimum
requirement is a reflection of the {CHAR_BIT} value."

> If I understand it correctly, Posix has now stated the
> obvious, ie. a 'byte' was really 8 bits all along.

Sort of. ;-)

> Is there an equivalent to sizeof that gives a 'real' size?

What do you mean? The size in {posix} bytes on a {non-posix}
impl that just can't address {posix} bytes? That would be pretty
useless, I'm afraid. Well, but you could probably calculate the
number of bits and divide it by 8, so to speak. ;-) The ISO C99
says: "Values stored in non-bit-field objects of any other
object type consist of n x CHAR_BIT bits, where n is the size of
an object of that type, in bytes". Heck, it's so funny. ;-)

> Does Posix have any relationship to the C and C++ standards?
> Does it extend the standards or it is irrelevant to the standards?

It extends the C99 standard and {currently} has no relationship
whatsoever to the C++ standard other than a bunch of liaisons
[on both sides, I guess], and kinda silly contradiction(s) ala
whether-Std.C-library-functions-can-throw [POSIX:OK vs C++:NAY].

regards,
alexander.

Christian Bau

Dec 2, 2002, 1:16:59 PM

In article <da3nuucomqg9ujr5v...@4ax.com>,
John Barclay <j...@yahoo.nospam.com> wrote:

> On Mon, 02 Dec 2002 12:33:45 +0100, Alexander Terekhov
> <tere...@web.de> wrote:
>
> > <snipped>
>
> Wow! A byte expert. Thanks for all the links; I've downloaded the
> draft/whatever spec.
>
> Can you summarise the Posix situation? If I understand it correctly,
> Posix has now stated the obvious, ie. a 'byte' was really 8 bits all

--------------------------

Why would that be obvious? I have used machines with six bit bytes;
obviously not using C because C requires eight or more bits. It just
happens that at the moment, the majority of implementations use CHAR_BIT
== 8. But then more and more code uses characters that don't fit into
char.

> along. Is there an equivalent to sizeof that gives a 'real' size? Does
> Posix have any relationship to the C and C++ standards? Does it extend
> the standards or it is irrelevant to the standards?

The "equivalent to sizeof" is sizeof. On a POSIX compatible C compiler
or on a compiler with CHAR_BIT == 8, sizeof yields the number of octets.
You could use sizeof (...) * CHAR_BIT / 8, that would give the number of
octets whenever it makes sense. For example, if CHAR_BIT == 9 and sizeof
(long) == 4 then talking about octets doesn't make sense.
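
The conversion described above can be sketched as a small helper. The name octets is hypothetical, the function ignores overflow of the multiplication for very large sizes, and it returns 0 when the bit count is not a multiple of 8, which is the case where talking about octets makes no sense:

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* Number of octets in `bytes` C bytes, or 0 if the total bit count is
   not a multiple of 8 (a size of 0 also yields 0). */
static size_t octets(size_t bytes)
{
    size_t bits = bytes * CHAR_BIT;
    return (bits % 8 == 0) ? bits / 8 : 0;
}
```

With CHAR_BIT == 8, octets(sizeof(long)) is just sizeof(long). With CHAR_BIT == 9 and sizeof(long) == 4 the total is 36 bits, so the helper returns 0.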

Of all the Standard C conforming compilers, only those with CHAR_BIT ==
8 are POSIX conformant, so POSIX would be restricting. On the other
hand, I don't think that all the functions of the Standard C library are
in the POSIX Standard; if they are not and POSIX doesn't require a C
compiler to be a Standard C compiler then a compiler that is not a
Standard C compiler could be a POSIX C compiler.

Rolf Magnus

Dec 2, 2002, 2:39:13 PM

Christian Bau wrote:

>> Can you summarise the Posix situation? If I understand it correctly,
>> Posix has now stated the obvious, ie. a 'byte' was really 8 bits all
> --------------------------
>
> Why would that be obvious? I have used machines with six bit bytes;
> obviously not using C because C requires eight or more bits.

No, C doesn't require that. It is required _for_ a C compiler to have a
char of eight or more bits. If the hardware doesn't provide it, the
compiler has to emulate it somehow.

> It just
> happens that at the moment, the majority of implementations use
> CHAR_BIT == 8. But then more and more code uses characters that don't
> fit into char.

But this isn't usually the basic character set. They are wide
characters.

Tom

Dec 2, 2002, 3:47:28 PM

John Barclay posted:

0 1 2 3 4 5 6 7 8 9

These are the numbers we use. We count like this

0
1
2
3
4
5
6
7
8
9

Then when we've run out of numbers, we roll back to zero and add a digit

9

roll back

10

11
12
13
14
15

Computers have the following digits to work with

0 1

So they count like this

0 ====== 0

1 ======== 1

Roll Back

10 ============= 2

11 ============= 3

RollBack

100 ======== 4

101 ======= 5

110 ======== 6

111 ======= 7

This is called binary, the only digits are 0 and 1. It's the most simple
number system. One digit is called a bit. 8 digits is a byte

01011101

That's a byte; its maximum value is 255

The set of characters we use contains 255 characters. Therefore a sizeof
(char) === 1 Byte. But it would be no problem if say a char was 16 Bit. It
wastes memory but it works.

BTW, there's another character set called UNICODE. It's 16 Bit. It contains
all the characters in pretty much all the languages of the world.

Tom

John Barclay

Dec 3, 2002, 5:05:20 AM

On Mon, 02 Dec 2002 18:56:33 +0100, Alexander Terekhov
<tere...@web.de> wrote:

>John Barclay wrote:
>> Is there an equivalent to sizeof that gives a 'real' size?
>
>What do you mean? The size in {posix} bytes on a {non-posix}
>impl that just can't address {posix} bytes? That would be pretty
>useless, I'm afraid. Well, but you could probably calculate the
>number of bits and divide it by 8, so to speak. ;-) The ISO C99
>says: "Values stored in non-bit-field objects of any other
>object type consist of n x CHAR_BIT bits, where n is the size of
>an object of that type, in bytes". Heck, it's so funny. ;-)

Point taken, but I can think of two reasons why you would need a size
in Posix/real-world bytes, even though the compiler may not know what
they are, and may not be able to address them.

The first is that memory allocation routines reserve an area of memory
defined in {compiler} bytes. This isn't much use, since anyone who
actually owns a computer, and has bought memory for it, has bought
{posix} bytes. They don't necessarily know how many {compiler} bytes
of main memory that they have.

The second reason is directly related to what I do, which is hardware
simulation. I need to create a 'register' which has, say, 24 or 32
bits in it, and I need to do arithmetic on the contents of the
register. I *could* do this in a way which was independent of what the
compiler thought of as a "byte", but it would be incredibly
inefficient. To do this efficiently, I need to know, exactly, what the
compiler can address.

Hence my need for a real-world/Posix sizeof. However, given what
you've already said, I can construct this easily enough anyway, so
there's no fundamental problem.
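
For the register case, one portable possibility is to hold the value in a type that is guaranteed to be wide enough and mask after each operation. This is only a sketch under that assumption, not necessarily what the simulator described above does; reg24, REG24_MASK, reg24_add and reg24_carry are names invented here, and uint_least32_t requires C99:

```c
#include <assert.h>
#include <stdint.h>   /* uint_least32_t */

/* At least 32 bits wide on every C99 implementation, whatever CHAR_BIT is. */
typedef uint_least32_t reg24;

#define REG24_MASK ((reg24)0xFFFFFFul)

/* Add two 24-bit register values, discarding the carry out of bit 23. */
static reg24 reg24_add(reg24 a, reg24 b)
{
    return (a + b) & REG24_MASK;
}

/* Carry out of bit 23 for the same addition (0 or 1). Inputs are
   masked to 24 bits first. */
static int reg24_carry(reg24 a, reg24 b)
{
    return (int)(((a & REG24_MASK) + (b & REG24_MASK)) >> 24);
}
```

The cost is one AND per operation, and the result does not depend on how many bits the compiler's byte has.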

John

those who know me have no need of my name

Dec 3, 2002, 7:11:33 AM

in comp.lang.c i read:

>In article <da3nuucomqg9ujr5v...@4ax.com>,
> John Barclay <j...@yahoo.nospam.com> wrote:

>> Wow! A byte expert. Thanks for all the links; I've downloaded the
>> draft/whatever spec.

cool. will you now spout out-of-date or erroneous information as if it
were current and correct? do yourself a favor, use official standard(s)
rather than drafts, which in the case of c or c++ means spending around
usd 40.

>Of all the Standard C conforming compilers, only those with CHAR_BIT ==
>8 are POSIX conformant, so POSIX would be restricting. On the other
>hand, I don't think that all the functions of the Standard C library are
>in the POSIX Standard; if they are not and POSIX doesn't require a C
>compiler to be a Standard C compiler then a compiler that is not a
>Standard C compiler could be a POSIX C compiler.

yet another reason not to discuss off-topic items. posix/sus completely
embraces c99, though it also extends it or sets further restrictions, e.g.,
fopen is required to set errno on failure and CHAR_BIT must be eight,
respectively. further should be taken to comp.unix.programmer, or
comp.std.unix if about the standard itself.

--
bringing you boring signatures for 17 years

Alexander Terekhov

Dec 3, 2002, 7:34:02 AM

those who know me have no need of my name wrote:
>
> in comp.lang.c i read:
> >In article <da3nuucomqg9ujr5v...@4ax.com>,
> > John Barclay <j...@yahoo.nospam.com> wrote:
>
> >> Wow! A byte expert. Thanks for all the links; I've downloaded the
> >> draft/whatever spec.
>
> cool. will you now spout out-of-date or erroneous information as if it
> were current and correct?

Oh, yeah (except that in this case, it's kinda the other way around,
Mr. "no-need-of-my-name").

regards,
alexander.

those who know me have no need of my name

Dec 3, 2002, 7:55:34 AM

in comp.lang.c i read:

>This is called binary, the only digits are 0 and 1. It's the most simple
>number system. One digit is called a bit. 8 digits is a byte

8 bits per byte is an unwarranted assertion/assumption within the context
appropriate to these groups, the c and c++ languages where the number of
bits in a byte is an implementation detail and need not be 8. 8 is very
common, to the point that people have a hard time with the concept of other
values, but as one's experience grows this becomes less of a concern.

>The set of characters we use contains 255 characters.

we? who are you calling `we'? perhaps you are royalty.

>Therefore a sizeof (char) === 1 Byte. But it would be no problem if say a
>char was 16 Bit. It wastes memory but it works.

when CHAR_BIT is 16 or 32 and the character set is narrow, it is typical to
manually pack several characters into each byte, but only when working with
masses of them. reading, writing and/or using a few at a time is done
without regard to the potentially wasted space.
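
a rough sketch of such packing, two 8-bit character codes per 16-bit unit.
so that it can be tried on an ordinary 8-bit-byte machine, unsigned short
(at least 16 bits everywhere) stands in for the wide byte; cell_t, pack8 and
unpack8 are names invented for this example.

```c
#include <stddef.h>

/* Stand-in for a 16-bit byte: unsigned short has at least 16 bits. */
typedef unsigned short cell_t;

/* Pack n 8-bit character codes from src into dst, two per cell, with the
   first character of each pair in the low 8 bits.
   Returns the number of cells used. */
size_t pack8(cell_t *dst, const unsigned char *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned c = src[i] & 0xFFu;
        if (i % 2 == 0)
            dst[i / 2] = (cell_t)c;
        else
            dst[i / 2] = (cell_t)(dst[i / 2] | (c << 8));
    }
    return (n + 1) / 2;
}

/* Fetch the i-th packed character code. */
unsigned unpack8(const cell_t *cells, size_t i)
{
    unsigned cell = cells[i / 2];

    return (i % 2 == 0) ? (cell & 0xFFu) : ((cell >> 8) & 0xFFu);
}
```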

John Barclay

Dec 3, 2002, 10:59:39 AM

On 03 Dec 2002 12:11:33 GMT, those who know me have no need of my name
<not-a-rea...@usa.net> wrote:

>in comp.lang.c i read:
>>In article <da3nuucomqg9ujr5v...@4ax.com>,
>> John Barclay <j...@yahoo.nospam.com> wrote:
>
>>> Wow! A byte expert. Thanks for all the links; I've downloaded the
>>> draft/whatever spec.
>
>cool. will you now spout out-of-date or erroneous information as if it
>were current and correct?

No thanks. Looks like you've got that one sewn up already.

John

Tom

Dec 6, 2002, 1:39:17 PM

those who know me have no need of my name posted:

faggot

Tom

Kevin Goodsell

Dec 6, 2002, 3:02:05 PM

On Fri, 06 Dec 2002 18:39:17 GMT, Tom <k...@kk.com> wrote:

This message violates your service provider's Acceptable Usage Policy.
So does using their service to illegally download copyrighted
material, as you admitted in another thread. You have been reported
for abuse on both counts.

-Kevin

Micah Cowan

Dec 6, 2002, 4:08:26 PM

Tom <k...@kk.com> writes:

> faggot

You are gone. *plonk*.

--
Now available for part-time and full-time permanent or contract
positions in the Sacramento, CA (USA) area (or telecommute). Yes, it's
my real email address.

CBFalconer

Dec 6, 2002, 5:45:49 PM

Tom wrote:
>
... snip ...
>
> faggot

*PLONK*

--
Chuck F (cbfal...@yahoo.com) (cbfal...@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

David Thompson

Dec 7, 2002, 1:51:48 AM

John Barclay <j...@yahoo.nospam.com> wrote :

> On Mon, 02 Dec 2002 12:45:00 +0100, Rolf Magnus <rama...@t-online.de>
> wrote:

...

> >May I ask why? Usually, you only need to know exact bit sizes in special
> >system-specific code, on which you can make assumptions about the sizes
> >of the different integer types anyway.
>
> Sure. I design and simulate hardware, and I need to emulate arithmetic
> in known-size registers. This means that I can't rely on the basic
> types: I have to know exactly what assumptions the compiler has made.
>

Then you don't really need exact width types, just at-least,
and that's somewhat easier. Even in C89, unsigned char
must be at least 8 bits, unsigned short and int at least 16,
unsigned long at least 32; and in C99 unsigned long long
at least 64. Just mask off the possible extra/garbage bits
in any situation where it matters; and where the mask is
redundant, e.g. ucharval & 0xFF if CHAR_BIT==8 which
it almost always is, a decent compiler will optimize it away.
If you really really want, put the masking in a macro which
you can #define away on compilers too dumb to optimize
(but known to have the exactly right widths). Or, for widths
not exceeding that of unsigned int use struct bitfield(s)
and the masking is effectively done for you on store.
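
A minimal sketch of both approaches, assuming emulated 8-bit and 12-bit
registers. The names REG8, REG16, add8, reg12 and add12 are made up
for illustration:

```c
/* Hypothetical helper macros: truncate a value to an emulated register
   width. Where the type already has exactly that width, the mask is
   redundant and a decent compiler should drop it. */
#define REG8(x)  ((x) & 0xFFu)
#define REG16(x) ((x) & 0xFFFFu)

/* Add two emulated 8-bit registers; the result wraps as 8-bit hardware
   would, whatever the width of unsigned int. */
unsigned add8(unsigned a, unsigned b)
{
    return REG8(REG8(a) + REG8(b));
}

/* Alternative: an unsigned bit-field truncates on every store. */
struct reg12 {
    unsigned value : 12;
};

unsigned add12(unsigned a, unsigned b)
{
    struct reg12 r;

    r.value = a + b;    /* stored modulo 2^12 */
    return r.value;
}
```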

Although equivalent requirements apply for signed types,
use unsigned types for data where you want to control,
manipulate, or rely on the bit patterns. C allows an
implementation to use a representation other than
two's complement for signed integer types, namely
ones' complement or sign-and-magnitude, though
only a tiny handful of machines do so. It also allows
an implementation to trap* on overflow, or on use of the
-ipow(2,n-1) "value" in two's complement, and such
implementations are somewhat less uncommon.
Either can screw up your emulated operations.
Unsigned integer types OTOH are required to give
exactly the same results on all implementations,
for a given width (which you can sufficiently control).
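
As a sketch of what that buys you, here is a 16-bit two's complement
register emulated entirely with unsigned arithmetic, so the host's signed
representation never enters into it. MASK16, neg16 and to_signed16
are illustrative names only:

```c
#define MASK16 0xFFFFuL

/* Two's complement negation of an emulated 16-bit register. Unsigned
   arithmetic is modular, so this gives the same bit pattern on every
   implementation. */
unsigned long neg16(unsigned long r)
{
    return (0uL - (r & MASK16)) & MASK16;
}

/* Interpret a 16-bit pattern as a signed value without relying on how
   the host represents negative numbers. */
long to_signed16(unsigned long r)
{
    r &= MASK16;
    if (r & 0x8000uL)
        return -(long)(MASK16 - r) - 1;
    return (long)r;
}
```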

* Formally, signed overflow or use of a trap representation
is Undefined Behavior and the standard allows *anything*;
the canonical example hereabouts is that it would be legal
to cause demons to fly out of your nose. That option
hasn't actually been implemented; in practice, the worst
that happens is an unhandleable trap, but for your purposes
I'll bet that's quite bad enough. A silently wrong result is
also legal and would presumably be even worse.
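
If you do have to do signed arithmetic, one common way to stay out of
Undefined Behavior is to test the operands before adding. A rough sketch,
with checked_add as a made-up name:

```c
#include <limits.h>

/* Returns 1 and stores a + b in *sum if the addition cannot overflow;
   returns 0 and leaves *sum untouched otherwise. The tests themselves
   never overflow, so no undefined behavior is involved. */
int checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *sum = a + b;
    return 1;
}
```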

--
- David.Thompson 1 now at worldnet.att.net

Joona I Palaste

Dec 7, 2002, 12:43:59 PM

Tom <k...@kk.com> scribbled the following
on comp.lang.c:
> faggot

If you think this makes a valid counterargument, I'm amazed you ever
got past kindergarten.

--
/-- Joona Palaste (pal...@cc.helsinki.fi) ---------------------------\
| Kingpriest of "The Flying Lemon Tree" G++ FR FW+ M- #108 D+ ADA N+++|
| http://www.helsinki.fi/~palaste W++ B OP+ |
\----------------------------------------- Finland rules! ------------/
"'It can be easily shown that' means 'I saw a proof of this once (which I didn't
understand) which I can no longer remember'."
- A maths teacher

Ben Pfaff

Dec 7, 2002, 12:45:41 PM

Joona I Palaste <pal...@cc.helsinki.fi> writes:

> Tom <k...@kk.com> scribbled the following

> > faggot
>
> If you think this makes a valid counterargument, I'm amazed you ever
> got past kindergarten.

This is Usenet, so for all we know he *hasn't* made it past
kindergarten.
--
"...what folly I commit, I dedicate to you."
--William Shakespeare, _Troilus and Cressida_

Dan Pop

Dec 9, 2002, 10:24:38 AM

In <astc0v$fm0$1...@oravannahka.helsinki.fi> Joona I Palaste <pal...@cc.helsinki.fi> writes:

>Tom <k...@kk.com> scribbled the following
>on comp.lang.c:
>> faggot
>
>If you think this makes a valid counterargument, I'm amazed you ever
>got past kindergarten.                                      ^^^^^^^^
 ^^^^^^^^^^^^^^^^^^^^^
How do you know he actually did? ;-)

Tom

Dec 9, 2002, 2:54:51 PM

Dan Pop posted:

another inbred fuck

Joona I Palaste

Dec 9, 2002, 3:17:28 PM

Tom <k...@kk.com> scribbled the following
on comp.lang.c:
> Dan Pop posted:
>> In <astc0v$fm0$1...@oravannahka.helsinki.fi> Joona I Palaste
>> <pal...@cc.helsinki.fi> writes:
>>
>>>Tom <k...@kk.com> scribbled the following on comp.lang.c:
>>>> faggot
>>>
>>>If you think this makes a valid counterargument, I'm amazed you ever
>>>got past kindergarten.                                      ^^^^^^^^
>>>^^^^^^^^^^^^^^^^^^^^^
>> How do you know he actually did? ;-)

> another inbred fuck

Is that all you ever can say? Goodbye, idiot.

*PLONK*

--
/-- Joona Palaste (pal...@cc.helsinki.fi) ---------------------------\
| Kingpriest of "The Flying Lemon Tree" G++ FR FW+ M- #108 D+ ADA N+++|
| http://www.helsinki.fi/~palaste W++ B OP+ |
\----------------------------------------- Finland rules! ------------/

"It sure is cool having money and chicks."
- Beavis and Butt-head