"A char variable is of the natural size to hold a character on a given
machine (typically a byte), and an int variable is of the natural size
for integer arithmetic on a given machine (typically a word)."
Now the last statement (i.e. sizeof(int) typically == a word)
certainly shows the age of the text here. In the meantime, the
"natural" size of an int has grown to a 32-bit DWORD on most machines,
whereas 64-bit int's are becoming more and more common.
But what does this mean for char?? I was always under the assumption
that sizeof(char) is ALWAYS guaranteed to be exactly 1 byte,
especially since there is no C++ "byte" type. As we now have the
wchar_t as an intrinsic data type, wouldn't this cement the fact that
char is always 1 byte?
What does the ANSI standard have to say about this?
Bob Hairgrove
rhairgro...@Pleasebigfoot.com
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std...@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
> In Bjarne Stroustrup's 3rd edition of "The C++ Programming Language",
> there is an interesting passage on page 24:
>
> "A char variable is of the natural size to hold a character
> on a given machine (typically a byte) and an int variable
> is of the natural size for integer arithmetic
> on a given machine (typically a word)."
>
> Now the last statement (i.e. sizeof(int) typically == a word)
> certainly shows the age of the text here.
> In the meantime, the "natural" size of an int
> has grown to a 32-bit DWORD on most machines,
> whereas 64-bit int's are becoming more and more common.
>
> But what does this mean for char?
> I was always under the assumption that sizeof(char)
> is ALWAYS guaranteed to be exactly 1 byte,
> especially since there is no C++ "byte" type.
> As we now have the wchar_t as an intrinsic data type,
> wouldn't this cement the fact that char is always 1 byte?
>
> What does the ANSI standard have to say about this?
A byte is a data size -- not a data type.
A byte is 8 bits on virtually every modern processor
and the memories are almost always byte addressable.
A machine word is as wide as the integer data path
through the Arithmetic and Logic Unit (ALU).
The old Control Data Corporation (CDC) computers
had 60 bit words and were word addressable.
Characters were represented by 60 bit words
or were packed into a word 10 at a time
which means that the CDC character code set
had just 64 distinct codes represented by a 6 bit byte.
> In Bjarne Stroustrup's 3rd edition of "The C++ Programming Language",
> there is an interesting passage on page 24:
>
> "A char variable is of the natural size to hold a character on a given
> machine (typically a byte), and an int variable is of the natural size
> for integer arithmetic on a given machine (typically a word)."
>
> Now the last statement (i.e. sizeof(int) typically == a word)
> certainly shows the age of the text here.
No. You are applying a corruption of the term "word". It does not mean 16
bits. It means the natural size for the machine, typically the register
size. On a 128-bit machine it is 128 bits. On an 8-bit machine it is 8
bits.
> In the meantime, the
> "natural" size of an int has grown to a 32-bit DWORD on most machines,
No it hasn't. Most machines do not have DWORDs. 32-bit machines often have
words and half words.
4-bit machines that became 8-bit machines that became 16-bit machines that
became 32-bit machines have DWORDs. Nobody else has anything half as silly.
>
> whereas 64-bit int's are becoming more and more common.
64-bit registers are becoming more common. Many people who believe in DWORDs
object to 64-bit ints because their religion says that ints are 32 bits.
>
>
> But what does this mean for char?? I was always under the assumption
> that sizeof(char) is ALWAYS guaranteed to be exactly 1 byte,
Your assumption is invalid.
> especially since there is no C++ "byte" type. As we now have the
> wchar_t as an intrinsic data type, wouldn't this cement the fact that
> char is always 1 byte?
No.
The type char can be 16 bits like Unicode or even 32 bits like the ISO
character sets.
>
>
> What does the ANSI standard have to say about this?
Have you read it?
The standard mandates sizeof(char)==1. The only requirements on the size
of an 'int' are those implied by the requirements that INT_MIN<=-32767,
and INT_MAX>=32767 (these limits are incorporated by reference from the
C standard, rather than being specified in the C++ standard itself).
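For concreteness, a minimal sketch (using nothing beyond <climits> and
<cstdio>) that prints the implementation-defined quantities in question:
#include <climits>  // CHAR_BIT, INT_MIN, INT_MAX
#include <cstdio>
int main()
{
    // sizeof(char) is 1 by definition; CHAR_BIT says how many bits that one byte has
    std::printf("CHAR_BIT     = %d\n", CHAR_BIT);
    std::printf("sizeof(char) = %u\n", (unsigned)sizeof(char));    // always 1
    std::printf("sizeof(int)  = %u\n", (unsigned)sizeof(int));     // implementation-defined
    std::printf("INT_MIN = %d, INT_MAX = %d\n", INT_MIN, INT_MAX); // at least +/-32767
    return 0;
}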
Bjarne's statement is technically incorrect, but true to the history of
C, when he identifies "char" more closely with "character" than with
"byte". His statement about "words" is actually more accurate;
traditionally a "word" of memory wasn't a fixed amount of memory, but
varied from machine to machine. On a 32-bit machine, a "word" should
properly be a 32-bit chunk of memory. However, when people are used to
programming only for a limited range of architectures, all of which
share the same word size, they tend to assume that "word" means the same
amount of memory on all machines, that it refers to on the machines
they're used to. If enough people do this, the term may even end up
being redefined; confusing people who still remember the original
definition.
|> Bob Hairgrove wrote:
|> > In Bjarne Stroustrup's 3rd edition of "The C++ Programming
|> > Language", there is an interesting passage on page 24:
|> > "A char variable is of the natural size to hold a character
|> > on a given machine (typically a byte) and an int variable
|> > is of the natural size for integer arithmetic
|> > on a given machine (typically a word)."
|> > Now the last statement (i.e. sizeof(int) typically == a word)
|> > certainly shows the age of the text here. In the meantime, the
|> > "natural" size of an int has grown to a 32-bit DWORD on most
|> > machines, whereas 64-bit int's are becoming more and more common.
Excuse me, but on 32 bit machines (at least the ones I've seen), DWORD
is 64 bits. The "traditional" widths (from IBM, since the 360) are:
BYTE 8 bits
HWORD 16 bits
WORD 32 bits
DWORD 64 bits
The only place I've seen otherwise is on 16 bit machines. Where word is
16 bits. Or, of course, on 36 bit machines, with 36 bit words, or 48
bit machines, with 48 bit words.
|> > But what does this mean for char? I was always under the
|> > assumption that sizeof(char) is ALWAYS guaranteed to be exactly 1
|> > byte, especially since there is no C++ "byte" type. As we now
|> > have the wchar_t as an intrinsic data type, wouldn't this cement
|> > the fact that char is always 1 byte?
The standard defines the results of sizeof as the size in bytes. And
guarantees that sizeof(char) == 1. So by definition, the size of a char
is one byte, even if that char has 32 bits.
|> > What does the ANSI standard have to say about this?
|>
|> A byte is a data size -- not a data type.
|> A byte is 8 bits on virtually every modern processor and the
|> memories are almost always byte addressable.
I'm not so sure. From what I've heard, more than a few DSPs use 32 bit
char's.
|> A machine word is as wide as the integer data path through the
|> Arithmetic and Logic Unit (ALU).
Or as wide as the memory bus?
I'm not sure that there is a real definition of "word". I've used
machines (Interdata 32/7) where the ALU was 16 bits wide, but the native
instruction set favored 32 bits (through judicious microcode), and if I
remember correctly, the memory bus was 32 bits wide (but it has been a
long time, and I could be mistaken).
|> The old Control Data Corporation (CDC) computers had 60 bit words
|> and were word addressable. Characters were represented by 60 bit
|> words or were packed into a word 10 at a time which means that the
|> CDC character code set had just 64 distinct codes represented by a 6
|> bit byte.
This wouldn't be legal in C/C++, since UCHAR_MAX must be at least 255.
A C/C++ implementation on this machine would probably use six 10 bit bytes
to the word. (This is C/C++ specific. The original use of byte was
for a 6 bit chunk of data.)
There have definitely been C implementations on 36 bit machines, normally
with 9 bit bytes, and there are implementations today (for DSPs) with 32
bit bytes. There probably are, and have been, others as well.
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
No, it's not invalid, it is precisely correct.
sizeof (char) is required to be one byte. Note
that byte size can and does vary among platforms,
and that a char (byte) is required by the C standard
to have at least eight bits, but is not prevented
from having more.
> > especially since there is no C++ "byte" type. As we now have the
> > wchar_t as an intrinsic data type, wouldn't this cement the fact that
> > char is always 1 byte?
>
> No.
A char is indeed always one byte, but the definition
of type 'wchar_t' has no influence upon this.
"sizeof(char) == one byte" is mandated by the standard.
>
> The type char can be 16 bits like Unicode or even 32 bits like the ISO
> character sets.
Yes, on machines with 16-bit or 32-bit *bytes*.
On a machine with e.g. 8-bit bytes, type 'char'
cannot represent every Unicode character. Thus
'wchar_t' was invented.
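A minimal sketch of checking this on a given implementation (assuming
<cwchar> provides WCHAR_MAX, as a conforming implementation should):
#include <cstdio>
#include <cwchar>   // WCHAR_MAX
int main()
{
    std::printf("sizeof(char)    = %u\n", (unsigned)sizeof(char));    // 1 by definition
    std::printf("sizeof(wchar_t) = %u\n", (unsigned)sizeof(wchar_t)); // typically 2 or 4
    if (WCHAR_MAX >= 0x10FFFFL)  // highest ISO 10646 code point
        std::printf("wchar_t can hold every ISO 10646 code point\n");
    else
        std::printf("wchar_t cannot hold every ISO 10646 code point\n");
    return 0;
}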
>
> >
> >
> > What does the ANSI standard have to say about this?
>
> Have you read it?
Have *you*? :-)
-Mike
> when people are used to
>programming only for a limited range of architectures, all of which
>share the same word size, they tend to assume that "word" means the same
>amount of memory on all machines, that it refers to on the machines
>they're used to. If enough people do this, the term may even end up
>being redefined; confusing people who still remember the original
>definition.
Yes, and a similar confusion already exists for "byte", which many
people incorrectly assume to mean "8-bit byte".
Genny
|> > In the meantime, the "natural" size of an int has grown to a
|> > 32-bit DWORD on most machines,
|> No it hasn't. Most machines do not have DWORDs. 32-bit machines
|> often have words and half words.
IBM 360's (the prototypical 32 bit machine) certainly have DWORDS. A
DWORD is a 8 byte quantity, often initialized with 16 BCD digits. (The
IBM 360 had machine instructions for all four operations on such
quantities, as well as instructions for 4 bit left and right shifts over
DWORDs. Very useful for Cobol, or other languages that used decimal
arithmetic. We once converted the BCD arithmetic routines in a Basic
interpreter from C to assembler -- something like 150 lines of C became
10 lines of assembler, and ran four or five orders of magnitude faster.)
|> 4-bit machines that became 8-bit machines that became 16-bit
|> machines that became 32-bit machines have DWORDs. Nobody else has
|> anything half as silly.
That's because nobody else has been around half as long:-)? Seriously,
historical reasons lead to all kinds of silliness, where the normal
registers are called extended, and the non-extended registers need a
special instruction prefix to access them.
In the mean time, there are 64 bit machines out there where int is only
32 bits, and you need long to get 64 bits. That sounds pretty silly,
too, until you realize that the vendors have a lot of customers who were
stupid enough to write code which depended on int being exactly 32 bits.
And making your customer feel like an idiot has never been a
particularly successful commercial policy, even if it is sometimes the
truth.
In the good old days (pre-360), of course, no one worried about
compatibility, so a WORD in IBM's assembler could change from one
machine to the next. We didn't get such silliness. But we did have to
rewrite all of our code every time we upgraded the processor.
|> > whereas 64-bit int's are becoming more and more common.
|> 64-bit registers are becoming more common. Many people who believe
|> in DWORDs object to 64-bit ints because their religion says that
|> ints are 32 bits.
|> > But what does this mean for char?? I was always under the
|> > assumption that sizeof(char) is ALWAYS guaranteed to be exactly 1
|> > byte,
|> Your assumption is invalid.
I think you misread something. He said that his assumption was that
sizeof(char) is guaranteed to be exactly one byte. Which is exactly
what the standard says.
|> > especially since there is no C++ "byte" type. As we now have the
|> > wchar_t as an intrinsic data type, wouldn't this cement the fact
|> > that char is always 1 byte?
|> No.
Yes. ISO 14882, 5.3.3 and ISO 9899 6.5.3.4.
|> The type char can be 16 bits like Unicode or even 32 bits like the
|> ISO character sets.
The type char can be 16 bits, or 32 bits. In the past, it has often
been 9 bits, and I think that there have also been 10 bit
implementations.
But the size of char in bytes is always 1.
|> > What does the ANSI standard have to say about this?
|> Have you read it?
Have you?
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
sizeof(char) is guaranteed to be 1. 1 what though? 1 memory allocation
unit. All other types must have sizes which are multiples of sizeof(char).
The standard makes no claim that 1 memory allocation unit == 1 byte. On a
system with a 16-bit "natural character", sizeof(char) and sizeof(wchar_t)
might both be 1, and sizeof(int), though it's 32 bits, would be 2 not 4.
Further, there's no guarantee that you have any access to the smallest
addressable unit of storage, only to storage which is allocated in multiples
of char. For example, on an 8051, the smallest addressable unit is 1 bit,
but char is still 8 bits on 8051 C compilers - those addressable bits are
simply outside the C/C++ memory model on such a system (of course, an 8051
compiler will provide a way to access them, but it will do so by an
extension - nothing in the standard makes it possible).
HTH
-cd
[...]
| Bjarne's statement is technically incorrect, but true to the history of
| C, when he identifies "char" more closely with "character" than with
^^^^^^^^^^
| "byte".
Firstly, note that B. Stroustrup didn't *identify* "char" with
"character"; rather, I quote (from the original poster):
"A char variable is of the natural size to hold a character on a given
machine (typically a byte)
Secondly, it has been the tradition that 'char', in C++, is the
natural type for holding characters, as exemplified by the standard
type std::string and the standard narrow streams.
--
Gabriel Dos Reis, dos...@cmla.ens-cachan.fr
>> But what does this mean for char?? I was always under the assumption
>> that sizeof(char) is ALWAYS guaranteed to be exactly 1 byte,
>
>Your assumption is invalid.
>
Check out Mike Wahler's response ... seems that the standard does
guarantee this (although a byte doesn't have to be 8 bits). That is,
the guarantee seems to be that sizeof(char)==1 under all
circumstances.
>> What does the ANSI standard have to say about this?
>
>Have you read it?
Hmm ... I thought it was more expensive than it is ... Now that I have
gone to www.ansi.org, I was delighted to discover that it is only $18.
I'm sure this will be well worth buying.
Bob Hairgrove
rhairgro...@Pleasebigfoot.com
> Excuse me, but on 32 bit machines (at least the ones I've seen), DWORD
> is 64 bits.
I guess you have not seen Microsoft Windows, then. Just try
#include <windows.h>
#include <stdio.h>
int main()
{
printf("%d\n", (int)sizeof(DWORD));
}
in MSVC++ 6 or so. It prints 4, and it uses 8-bit bytes.
Regards,
Martin
AFAIK that is for backwards compatibility with 16-bit DOS and Windows
3.x. A double word at the assembler level is still 64 bits.
And as we're well off-topic at this point...
Cheers,
Chris
Section 5.3.3: "The sizeof operator yields the number of bytes in the
object representation of its operand."
> system with a 16-bit "natural character", sizeof(char) and sizeof(wchar_t)
> might both be 1, and sizeof(int), though it's 32 bits, would be 2 not 4.
Correct. For instance, that means that on such a system, 'int' is two
16-bit bytes long.
I hadn't looked at that section before this morning. I'm surprised they
worded it that way, since it's patently false given the most common meaning
of 'byte' (8 bits). It would have helped if the standard actually defined
the word byte, or simply not used it at all. As is, the section is
confusing at best.
And yes, I realize that in the past 'byte' was used more flexibly, with
'bytes' being 6, 7, 8, 9, 10, 12, and even 15 bits on various systems.
Surely today, and as surely in 1998, most readers think "8 bits" when they
see the word "byte".
-cd
> On Sun, 14 Apr 2002 07:01:54 GMT, Witless <wit...@attbi.com> wrote:
>
> >> But what does this mean for char?? I was always under the assumption
> >> that sizeof(char) is ALWAYS guaranteed to be exactly 1 byte,
> >
> >Your assumption is invalid.
> >
>
> Check out Mike Wahler's response ... seems that the standard does
> guarantee this (although a byte doesn't have to be 8 bits). That is,
> the guarantee seems to be that sizeof(char)==1 under all
> circumstances.
That's not the issue. The hidden redefinition of "byte" is the issue.
{OT} This sleight of hand is similar to the IRS definition of income.
>
>
> >> What does the ANSI standard have to say about this?
> >
> >Have you read it?
>
> Hmm ... I thought it was more expensive than it is ... Now that I have
> gone to www.ansi.org, I was delighted to discover that it is only $18.
> I'm sure this will be well worth buying.
I wish you good luck with it.
It's not 'redefined', it's defined. And it's not hidden.
>
> {OT} This sleight of hand is similar to the IRS definition of income.
Sleight of hand? I agree with the IRS part, but not that
it applies to the standard.
-Mike
One byte.
> 1 memory allocation
> unit.
No.
> All other types must have sizes which are multiples of sizeof(char).
Right. 'char' and 'byte' are synonymous in C++
> The standard makes no claim that 1 memory allocation unit == 1 byte.
It absolutely does. See my quote of the standard elsethread.
> On a
> system with a 16-bit "natural character",
In this context, 'natural character' == byte.
>sizeof(char) and sizeof(wchar_t)
> might both be 1,
sizeof(char) is *required* to be one byte.
sizeof(wchar_t) is usually larger, typically two
(but it's implementation-defined).
> and sizeof(int), though it's 32 bits, would be 2 not 4.
Absolutely not. sizeof(int) is implementation-defined,
but is still expressed in bytes (i.e. chars). A 32-bit
int's sizeof will be 32 / CHAR_BIT.
>
> Further, there's no guarantee that you have any access to the smallest
> addressable unit of storage,
Yes there is. The byte is specified as smallest addressable unit.
>only to storage which is allocated in multiples
> of char.
Right. 'char' == 'byte'
>For example, on an 8051, the smallest addressable unit is 1 bit,
But not from C++.
> but char is still 8 bits on 8051 C compilers
Which means smallest addressable unit (from C++) is an eight-bit byte.
>- those addressable bits are
> simply outside the C/C++ memory model on such a system
Exactly.
> (of course, an 8051
> compiler will provide a way to access them, but it will do so by an
> extension - nothing in the standard makes it possible).
Right.
-Mike
It does define it, in section 1.7p1: "The fundamental storage unit in
the C++ memory model is the _byte_. A byte is at least large enough to
contain any member of the basic execution character set and is composed
of a contiguous sequence of bits, the number of which is
implementation-defined." The fact that "byte" is italicized, indicates
that this clause should be taken as defining that term. As far as
standardese goes (which isn't very far) you can't get much clearer than
that. In particular, pay special attention to the very last part of
that definition.
> In Bjarne Stroustrup's 3rd edition of "The C++ Programming Language",
> there is an interesting passage on page 24:
>
> "A char variable is of the natural size to hold a character on a given
> machine (typically a byte), and an int variable is of the natural size
> for integer arithmetic on a given machine (typically a word)."
>
> Now the last statement (i.e. sizeof(int) typically == a word)
> certainly shows the age of the text here. In the meantime, the
> "natural" size of an int has grown to a 32-bit DWORD on most machines,
> whereas 64-bit int's are becoming more and more common.
Who's "DWORD"? On a PowerPC, a DWORD is 64 bits, a WORD is 32 bits.
Neither Microsoft nor Intel define C++.
> But what does this mean for char?? I was always under the assumption
> that sizeof(char) is ALWAYS guaranteed to be exactly 1 byte,
> especially since there is no C++ "byte" type. As we now have the
> wchar_t as an intrinsic data type, wouldn't this cement the fact that
> char is always 1 byte?
>
> What does the ANSI standard have to say about this?
sizeof(char) is 1 by definition, always has been in C and C++, and
almost certainly always will be. Changing it would break far too much
existing, properly working, conforming code. So a char is 1 byte,
which contains at least 8 bits or possibly more.
There are now C++ compilers for 32 bit digital signal processors where
char, short, int and long are all 1 byte and share the same
representation. Each of those bytes contains 32 bits.
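One practical consequence: code that tacitly assumes 8-bit bytes can make
the assumption explicit, so a port to such a DSP fails at compile time
instead of misbehaving at run time. A minimal sketch:
#include <climits>
#if CHAR_BIT != 8
#error "This module assumes 8-bit bytes (octets); review it before porting."
#endif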
--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
|> James Kanze <ka...@gabi-soft.de> writes:
|> > Excuse me, but on 32 bit machines (at least the ones I've seen),
|> > DWORD is 64 bits.
|> I guess you have not seen Microsoft Windows, then. Just try
Not directly. I've written a few programs for Windows, but we always
used Java/Corba for the GUI parts, and I wrote the code in pretty much
standard C++. A priori, however, DWORD is an assembler concept, and not
something I'd expect to see in C/C++.
|> #include <windows.h>
|> #include <stdio.h>
|> int main()
|> {
|> printf("%d\n", (int)sizeof(DWORD));
|> }
|> in MSVC++ 6 or so. It prints 4, and it uses 8-bit bytes.
I presume that there are some reasons of backwards compatibility.
Although I'll admit that I don't see what something like DWORD is doing
in a C++, or even a C, interface. Somebody must have seriously muffed
the design, a long time ago.
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
Sure, nevertheless:
"IA-32 Intel® Architecture
Software Developer’s
Manual
Volume 1:
Basic Architecture
....
4.1. FUNDAMENTAL DATA TYPES
The fundamental data types of the IA-32 architecture are bytes,
words, doublewords, quadwords, and double quadwords (see Figure
4-1). A byte is eight bits, a word is 2 bytes (16 bits), a
doubleword is 4 bytes (32 bits), a quadword is 8 bytes (64 bits),
and a double quadword is 16 bytes (128 bits). "
regards,
alexander.
> Bob Hairgrove wrote:
>
> > In Bjarne Stroustrup's 3rd edition of "The C++ Programming Language",
> > there is an interesting passage on page 24:
> >
> > "A char variable is of the natural size to hold a character
> > on a given machine (typically a byte) and an int variable
> > is of the natural size for integer arithmetic
> > on a given machine (typically a word)."
> >
> > Now the last statement (i.e. sizeof(int) typically == a word)
> > certainly shows the age of the text here.
> > In the meantime, the "natural" size of an int
> > has grown to a 32-bit DWORD on most machines,
> > whereas 64-bit int's are becoming more and more common.
> >
> > But what does this mean for char?
> > I was always under the assumption that sizeof(char)
> > is ALWAYS guaranteed to be exactly 1 byte,
> > especially since there is no C++ "byte" type.
> > As we now have the wchar_t as an intrinsic data type,
> > wouldn't this cement the fact that char is always 1 byte?
> >
> > What does the ANSI standard have to say about this?
>
> A byte is a data size -- not a data type.
> A byte is 8 bits on virtually every modern processor
> and the memories are almost always byte addressable.
This is absurd and totally incorrect. Just for example, the Analog
Devices SHARC is a very modern processor. Its byte is 32 bits and
its memory is not octet addressable at all.
--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
---
>> Section 5.3.3: "The sizeof operator yields the number of bytes in the
>> object representation of its operand."
>
>I hadn't looked at that section before this morning. I'm surprised they
>worded it that way, since it's patently false given the most common meaning
>of 'byte' (8 bits).
The standard doesn't rely on the common meaning in fact: it uses the
term as explained in §1.7p1. Note also, to complete the definition,
that what the standard requires is that a byte is uniquely addressable
*within* C++ and not within the hardware architecture: the two "units"
can be different, with the char type either larger or smaller.
As an example of the latter, a machine where the hardware-addressable
unit is 32-bit can still have a C++ compiler with 8-bit chars (the
minimum anyway, remember that CHAR_BIT>=8), even though this requires
that addresses which are not a multiple of the machine-unit contain both the
actual address and a relative offset.
The long and the short of it is that the compiler can perform all kinds
of magic to make things appear that don't exist at the assembly level:
it is a shell, and we are its inhabitants, at least until we go and
start our favourite disassembler and take a look at the world
outside ;)
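To make that concrete, here is a hypothetical sketch of the kind of access
sequence such a compiler might generate for a char load. Memory is simulated
with an array, and the names (Word, load_char) are purely illustrative:
#include <cstdio>
typedef unsigned int Word;               // assume a 32-bit hardware word on the host
static Word memory[1] = { 0x44434241u }; // 'A','B','C','D' (ASCII) packed into one word
unsigned char load_char(unsigned byte_addr)
{
    Word w = memory[byte_addr / 4];      // word-addressed fetch
    return (unsigned char)((w >> (8 * (byte_addr % 4))) & 0xFFu); // extract the 8-bit byte
}
int main()
{
    for (unsigned a = 0; a < 4; ++a)
        std::printf("%c", load_char(a)); // prints "ABCD"
    std::printf("\n");
    return 0;
}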
P.S.: the only thing that leaves me perplexed is the apparent circular
definition constituted by 5.3.3 and 3.9p4. Does anybody know if it is
resolved in another part of the standard?
Genny.
Welcome to the wonderful world of Windows, where everything is a typedef
or a macro.
--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)
|> There are now C++ compilers for 32 bit digital signal processors
|> where char, short, int and long are all 1 byte and share the same
|> representation. Each of those bytes contains 32 bits.
A slightly different issue, but I believe that most, if not all of these
are freestanding implementations. There is some question whether int
and char can be the same size on a hosted implementation, since
functions like fgetc (inherited from C) must return a value in the range
[0...UCHAR_MAX] or EOF, which must be negative.
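The familiar consequence at the source level is that the result of fgetc
must be stored in an int, never in a char, so that EOF stays distinguishable
from every byte value. A minimal sketch (the file name is just an example):
#include <cstdio>
int main()
{
    std::FILE* f = std::fopen("input.txt", "rb");  // example file name
    if (!f)
        return 1;
    int c;                                         // int, not char: must also hold EOF
    while ((c = std::fgetc(f)) != EOF)
        std::putchar(c);
    std::fclose(f);
    return 0;
}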
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
|> Bob Hairgrove wrote:
|> > On Sun, 14 Apr 2002 07:01:54 GMT, Witless <wit...@attbi.com> wrote:
|> > >> But what does this mean for char?? I was always under the
|> > >> assumption that sizeof(char) is ALWAYS guaranteed to be exactly
|> > >> 1 byte,
|> > >Your assumption is invalid.
|> > Check out Mike Wahler's response ... seems that the standard does
|> > guarantee this (although a byte doesn't have to be 8 bits). That
|> > is, the guarantee seems to be that sizeof(char)==1 under all
|> > circumstances.
|> That's not the issue. The hidden redefinition of "byte" is the
|> issue.
What hidden redefinition? The definition for the word as used in the
standard is in 1.7, which is where all of the definitions are. As
usual, the standard uses a somewhat stricter definition that the
"normal" definition. In particular:
- Not all machines have addressable bytes. All C/C++ implementations
must have addressable bytes. This requirement can be met in one of
two ways: declaring machine words to be bytes (typical for DSP's),
or implementing some form of extended addressing, where char* is
larger than int* (typical for general purpose word addressed
machines).
- A byte may be less than 8 bits -- the first use of the word, in
fact, was for six bit entities. The C/C++ standard requires bytes
to have at least eight bits.
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
|> Carl Daniel <cpda...@pacbell.net> wrote in message
|> news:Im5u8.1420$Uf.127...@newssvr21.news.prodigy.com...
|> > "Bob Hairgrove" <rhairgro...@Pleasebigfoot.com> wrote in message
|> > news:3cb820f5...@news.ch.kpnqwest.net...
|> > > But what does this mean for char?? I was always under the
|> > > assumption that sizeof(char) is ALWAYS guaranteed to be exactly
|> > > 1 byte, especially since there is no C++ "byte" type. As we now
|> > > have the wchar_t as an intrinsic data type, wouldn't this cement
|> > > the fact that char is always 1 byte?
|> > sizeof(char) is guaranteed to be 1. 1 what though?
|> One byte.
|> > 1 memory allocation unit.
|> No.
I'd say yes. But the name of that memory allocation unit is "byte".
|> > All other types must have sizes which are multiples of
|> > sizeof(char).
|> Right. 'char' and 'byte' are synonymous in C++
Not quite. A C/C++ program cannot directly access "bytes"; it can only access
"objects", which are sequences of contiguous bytes. On the other hand,
the standard requires that for char and its signed and unsigned
variants, this sequence is exactly one element long, and that these
types (or at least unsigned char) contain no padding. So any
distinction between unsigned char and byte is purely formal.
|> > The standard makes no claim that 1 memory allocation unit == 1
|> > byte.
|> It absolutely does. See my quote of the standard elsethread.
|> > On a
|> > system with a 16-bit "natural character",
|> In this context, 'natural character' == byte.
According to the definition in the standard, at any rate.
|> >sizeof(char) and sizeof(wchar_t) might both be 1,
|> sizeof(char) is *required* to be one byte.
|> sizeof(wchar_t) is usually larger, typically two
|> (but it's implementation-defined).
The most frequent situation, I think, is 8 bit char's and 32 bit
wchar_t's. Anything less than about 21 bits for a wchar_t pretty much
makes them relatively useless, since the only widespread code set with
more than 8 bits is ISO 10646/Unicode, which requires 21 bits. (But of
course, the standard doesn't require wchar_t -- or anything else, for
that matter -- to be useful:-).)
|> > and sizeof(int), though it's 32 bits, would be 2 not 4.
|> Absolutely not. sizeof(int) is implementation-defined, but is still
|> express in bytes (i.e. chars). A 32-bit int's sizeof will be 32 /
|> CHAR_BIT.
Which on some DSPs is 1.
|> > Further, there's no guarantee that you have any access to the
|> > smallest addressable unit of storage,
|> Yes there is. The byte is specified as smallest addressible unit.
Yes and no. Before making any claims, it is important to state what you
are claiming. If the claim involves the smallest unit of addressable
storage at the hardware level, there is no guarantee -- the smallest
unit of addressable storage in C/C++ must be at least 8 bits, and there
exist machines with hardware addressable bits. If the claim refers to
the C/C++ memory model, it is true by definition.
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
|> "James Kuyper Jr." <kuy...@wizard.net> wrote in message
|> news:3CB9CE84...@wizard.net...
|> > Carl Daniel wrote:
|> > ....
|> > > sizeof(char) is guaranteed to be 1. 1 what though? 1 memory
|> > > allocation unit. All other types must have sizes which are
|> > > multiples of sizeof(char). The standard makes no claim that 1
|> > > memory allocation unit == 1 byte. On a
|> > Section 5.3.3: "The sizeof operator yields the number of bytes in
|> > the object representation of its operand."
|> I hadn't looked at that section before this morning. I'm surprised
|> they worded it that way, since it's patently false given the most
|> common meaning of 'byte' (8 bits). It would have helped if the
|> standard actually defined the word byte, or simply not used it at
|> all. As is, the section is confusing at best.
The word "byte" has never meant eight bits. Historically, the word was
invented at IBM to refer to a unit of addressable memory smaller than a
word -- I believe that the first use of the word was for 6 bit units.
The standard, of course, doesn't use byte in this sense -- a word
addressed machine doesn't have bytes, but an implementation of C or C++
on it does. The standard actually uses the word with two different (but
not incompatible) meanings.
The first definition is given in 1.7. The identity of these bytes with
char/unsigned char/signed char is not explicitly stated, but follows
from the fact that sizeof on these types must return 1, and that these
types (or at least unsigned char) cannot contain padding or bits which
don't participate in the value.
The second definition is indirectly given in 17.3.2.1.3.1, which
defines "null-terminated byte string". In this case, a "null-terminated
byte string" is an array of char, signed char or unsigned char, which is
delimited by a 0 sentinel object. In the given context, the implication
is that the "bytes" are actually characters (or parts of multi-byte
characters), and that the sequence doesn't contain the value 0.
|> And yes, I realize that in the past 'byte' was used more flexibly,
|> with 'bytes' being 6, 7, 8, 9, 10, 12, and even 15 bits on various
|> systems. Surely today, and as surely in 1998, most readers think "8
|> bits" when they see the word "byte".
Not just in the past. The fact that machines with bytes of other than 8
bits have become rare doesn't negate the fact that when you do talk of
them, the word "byte" doesn't mean 8 bits. And the distinction is still
relevant -- look at any of the RFC's, for example, and you'll find
that when 8 bits is important, the word used is octet, and not byte.
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
A simpler way to clarify the distinction is to point out that a byte is a
unit used to measure memory, while char is a data type that is defined as
fitting into one byte. As such, 'char' is a much richer concept than
'byte'.
....
> The most frequent situation, I think, is 8 bit char's and 32 bit
> wchar_t's. Anything less than about 21 bits for a wchar_t pretty much
> makes them relatively useless, since the only widespread code set with
> more than 8 bits is ISO 10646/Unicode, which requires 21 bits. (But of
> course, the standard doesn't require wchar_t -- or anything else, for
There's a 16 bit variable of it. While I don't use either version of it
in any of my own programs, from what I'd heard here and on comp.std.c,
I'd gotten the impression that the 16 bit variant was more widely used
than the 32 bit version. Yours is the first mention I've ever seen of a
21 bit version - or are you specifying the number of bits actually used
by the 32 bit version?
Right, this is called "a memory location" or "a memory granule", AFAIK.
> is 32-bit can still have a C++ compiler with 8-bit chars ....
Well, things are getting much more interesting with *threads*
added into play. Just for your information (it is probably
off-topic here; at least in this thread):
http://www.opengroup.org/austin/aardvark/finaltext/xbdbug.txt
(see "Defect in XBD 4.10 Memory Synchronization (rdvk# 26)")
regards,
alexander.
Not with respect to the C++ standard. Section 5.3.3p1 says "The sizeof
operator yields the number of bytes in the object representation of its
operand. ... sizeof(char), sizeof(signed char), and sizeof(unsigned
char) are 1."
> > especially since there is no C++ "byte" type. As we now have the
> > wchar_t as an intrinsic data type, wouldn't this cement the fact that
> > char is always 1 byte?
>
> No.
>
> The type char can be 16 bits like Unicode or even 32 bits like the ISO
> character sets.
Yes, but under the C++ standard, that simply means that a "byte" will
become 16 or 32 bits, respectively. That's what the CHAR_BIT macro is
for.
> sizeof(char) is guaranteed to be 1. 1 what though? 1 memory allocation
> unit. All other types must have sizes which are multiples of sizeof(char).
> The standard makes no claim that 1 memory allocation unit == 1 byte.
It certainly does: 1.7, [intro.memory]/1:
# The fundamental storage unit in the C++ memory model is the byte. A
# byte is at least large enough to contain any member of the basic
# execution character set and is composed of a contiguous sequence of
# bits, the number of which is implementation-defined.
> On a system with a 16-bit "natural character", sizeof(char) and
> sizeof(wchar_t) might both be 1, and sizeof(int), though it's 32
> bits, would be 2 not 4.
On such a system, a byte would have 16 bits.
Regards,
Martin
| Carl Daniel wrote:
| ....
| > sizeof(char) is guaranteed to be 1. 1 what though? 1 memory allocation
| > unit. All other types must have sizes which are multiples of sizeof(char).
| > The standard makes no claim that 1 memory allocation unit == 1 byte. On a
|
| Section 5.3.3: "The sizeof operator yields the number of bytes in the
| object representation of its operand."
Exactly. The question is what you think "byte" means in the C++
standards text.
--
Gabriel Dos Reis, dos...@cmla.ens-cachan.fr
---
> James Kanze <ka...@gabi-soft.de> writes:
>
> > Excuse me, but on 32 bit machines (at least the ones I've seen), DWORD
> > is 64 bits.
>
> I guess you have not seen Microsoft Windows, then. Just try
Microsoft(R) Windows(!tm) is not based on 32 bits but on 16 bits.
>
>
> #include <windows.h>
> #include <stdio.h>
>
> int main()
> {
> printf("%d\n", (int)sizeof(DWORD));
> }
>
> in MSVC++ 6 or so. It prints 4, and it uses 8-bit bytes.
.... which is consistent with its ancestry.
The standard is what *defines* these issues. No
way can it be 'false'.
>given the most common meaning
> of 'byte' (8 bits).
Irrelevant. The size of a byte is only required to be
*at least* eight bits, but is allowed to be larger.
>It would have helped if the standard actually defined
> the word byte,
It does. Smallest addressable storage unit.
The system you describe above's 1-bit addressable
unit does not meet the requirement of at least
eight bits. So from C++, smallest addressable
unit for that machine is whichever larger unit
with at least eight bits is addressable. Perhaps
a 'word'.
> or simply not used it at all.
There has to be defined some point of reference.
It's a byte.
>As is, the section is
> confusing at best.
Yes, "IOS-ese" takes a while to understand.
>
> And yes, I realize that in the past 'byte' was used more flexibly, with
> 'bytes' being 6, 7, 8, 9, 10, 12, and even 15 bits on various systems.
It still is.
> Surely today, and as surely in 1998, most readers think "8 bits" when they
> see the word "byte".
And they're wrong. "Eight bits" == "octet".
-Mike
|> "James Kuyper Jr." <kuy...@wizard.net> writes:
|> [...]
|> | Bjarne's statement is technically incorrect, but true to the history of
|> | C, when he identifies "char" more closely with "character" than with
|> ^^^^^^^^^^
|> | "byte".
|> Firstly, note that B. Stroustrup didn't *identify* "char" with
|> "character"; rather, I quote (from the original poster):
|> "A char variable is of the natural size to hold a character on a
|> given machine (typically a byte)
|> Secondly, it has been the tradition that 'char', in C++, is the
|> natural type for holding characters, as exemplified by the standard
|> type std::string and the standard narrow streams.
For you and me, maybe, but one could argue that we are being
anachronistic. For most modern applications which deal with text, I
suspect that there is no natural type for holding characters -- wchar_t
comes close, but there are systems where it is not sufficiently large to
hold an ISO 10646 character. And it is rarely well supported. The
result is that if I had to write an application dealing with text, I'd
probably end up defining my own character type (which might be a typedef
to wchar_t, if portability weren't a real concern -- all of the
machines I normally deal with define wchar_t as a 32 bit type).
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
Not everyone uses the term correctly, not even (apparently) Intel. I'll
consider an ISO specification more authoritative than a company
specification any day (though it could still be wrong). I'd like to
know; do any of the other ISO standards define the "byte"? If so, which
definition do they use?
| Jack Klein <jack...@spamcop.net> writes:
|
| |> There are now C++ compilers for 32 bit digital signal processors
| |> where char, short, int and long are all 1 byte and share the same
| |> representation. Each of those bytes contains 32 bits.
|
| A slightly different issue, but I believe that most, if not all of these
| are freestanding implementations. There is some question whether int
| and char can be the same size on a hosted implementation, since
| functions like fgetc (inherited from C) must return a value in the range
| [0...UCHAR_MAX] or EOF, which must be negative.
Or you may just look at the requirements imposed by the standard
std::string class in clause 21.
A conforming hosted implementation cannot have
values_set(char) == values_set(int)
because every bit in a char representation participates in the value
representation, i.e. all bits in a char are meaningful.
--
Gabriel Dos Reis, dos...@cmla.ens-cachan.fr
---
However, since there are other ways of detecting file errors and end of
file, than checking for EOF, that doesn't absolutely require that EOF be
outside the range of char values. In fact, I gather that the consensus
of the C committee has been that it doesn't, though I couldn't find any
currently listed DR on the issue - however, the place I searched only
goes back to DR 201.
I discovered that definition in 1.7 too late - too bad it isn't mentioned in
the index or cross-referenced in 5.3.3. I still maintain that it was a poor
choice of word, since the definition the standard uses (and gives) is not
the common one (these days).
-cd
Unicode 2.0 had 40-some-odd thousand characters, so a 16-bit variable
could hold all possible values. Unicode 3.0 has over 90,000 characters,
so a 16-bit variable doesn't work. There's a UTF-16 encoding, but that's
analogous to a multi-byte character string in C and C++: a pain to work
with. Java apologists will tell you that this is no big deal, because
the characters that require two 16-bit values are rarely used. But then,
they're stuck with Java's choice of 16 bits for character types, so
naturally they claim that it doesn't really matter.
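For the curious, this is what the pain looks like: one code point above
U+FFFF becomes two 16-bit units. A minimal sketch of the UTF-16 surrogate
computation (the code point is just an example):
#include <cstdio>
int main()
{
    unsigned long cp = 0x1D11EUL;                         // a character beyond the 16-bit range
    unsigned long v  = cp - 0x10000UL;                    // 20 bits left to encode
    unsigned high = (unsigned)(0xD800u + (v >> 10));      // high (leading) surrogate
    unsigned low  = (unsigned)(0xDC00u + (v & 0x3FFu));   // low (trailing) surrogate
    std::printf("U+%05lX -> %04X %04X\n", cp, high, low); // prints: U+1D11E -> D834 DD1E
    return 0;
}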
--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)
---
It is a Microsoft design, after all. :-)
======================================= MODERATOR'S COMMENT:
I almost bounced this as a flame...
Clause 21 is very large and complicated; it would help if you could be
more specific about what you're referring to.
> A conforming hosted implementation cannot have
>
> values_set(char) == values_set(int)
I'm not clear what you're saying. The standard doesn't define anything
called values_set(), so I presume you're just using it as convenient
shorthand for the set of valid values for the type. However, I can't see
any reason why that would be prohibited.
> because every bit in a char representation participates in the value
> representation, i.e. all bits in a char are meaningful
I don't see how that requirement would be violated by an implementation
which had CHAR_MIN==INT_MIN, and CHAR_MAX==INT_MAX.
"Alexander Terekhov" <tere...@web.de> wrote in message
news:3CBAD3A5...@web.de...
>
> Well, things are getting much more interesting with *threads*
> added into play. Just for your information (it is probably
> off-topic here; at least in this thread):
>
> http://www.opengroup.org/austin/aardvark/finaltext/xbdbug.txt
> (see "Defect in XBD 4.10 Memory Synchronization (rdvk# 26)")
==** BEGIN PASTE **==
Problem:
Defect code : 3. Clarification required
d...@dvv.ru (Dima Volodin) wrote:
....
The standard doesn't provide any definition on memory location [POSIX is
a C API, so it must be done in C terms?]. Also, as per standard C rules,
access to one memory location [byte?] shouldn't have any effect on a
different memory location. POSIX doesn't seem to address this issue, so
the assumption is that the usual C rules apply to multi-threaded
programs. On the other hand, the established industry practices are such
that there is no guarantee of integrity of certain memory locations when
modification of some "closely residing" memory locations is performed.
The standard either has to clarify that access to distinct memory
locations doesn't have to be locked [which, I hope, we all understand,
is not a feasible solution] or incorporate current practices in its
wording providing users with means to guarantee data integrity of
distinct memory locations. "Please advise."
---
http://groups.google.com/groups?hl=en&selm=3B0CEA34.845E7AFF%40compaq.com
Dave Butenhof (David.B...@compaq.com) wrote:
....
POSIX says you cannot have multiple threads using "a memory location"
without explicit synchronization. POSIX does not claim to know, nor
try to specify, what constitutes "a memory location" or access to it,
across all possible system architectures. On systems that don't use
atomic byte access instructions, your program is in violation of the
rules.
==**END PASTE**==
I don't like that answer, as it seems it would be near impossible to write
portable code without some common notion of atomically updatable memory
location. But isn't this actually what type sig_atomic_t (sizeof >= 1) is
intended for?
hys
--
Hillel Y. Sims
hsims AT factset.com
|> James Kanze wrote:
|> > Jack Klein <jack...@spamcop.net> writes:
|> > |> There are now C++ compilers for 32 bit digital signal
|> > |> processors where char, short, int and long are all 1 byte and
|> > |> share the same representation. Each of those bytes contains
|> > |> 32 bits.
|> > A slightly different issue, but I believe that most, if not all of
|> > these are freestanding implementations. There is some question
|> > whether int and char can be the same size on a hosted
|> > implementation, since functions like fgetc (inherited from C) must
|> > return a value in the range [0...UCHAR_MAX] or EOF, which must be
|> > negative.
|> However, since there are other ways of detecting file errors and end
|> of file, than checking for EOF, that doesn't absolutely require that
|> EOF be outside the range of char values. In fact, I gather that the
|> consensus of the C committee has been that it doesn't, though I
|> couldn't find any currently listed DR on the issue - however, the
|> place I searched only goes back to DR 201.
The C standard definitely requires that all characters in the basic
character set be positive, and the EOF be negative.
The open issue is, I think, whether fgetc is required to be able to
return *all* values in the range of 0...UCHAR_MAX. For actual characters,
this is not a problem -- if we have 32 bit char's, it is certain that
some of the values will not be used as a character. (ISO 10646, for
example, only uses values in the range 0...0x10FFFF.) But fgetc can
also be used to read raw "bytes"; what happens then?
What I suspect is that on an implementation using 32 bit char's, fgetc
in fact will return something in the range 0...255, or -1 for EOF.
IMHO, this should be a legal implementation, however, I don't think that
the current C standard is unambiguously clear that this is the case.
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
|> James Kanze <ka...@gabi-soft.de> writes:
|> | Jack Klein <jack...@spamcop.net> writes:
|> | |> There are now C++ compilers for 32 bit digital signal
|> | |> processors where char, short, int and long are all 1 byte and
|> | |> share the same representation. Each of those bytes contains
|> | |> 32 bits.
|> | A slightly different issue, but I believe that most, if not all of
|> | these are freestanding implementations. There is some question
|> | whether int and char can be the same size on a hosted
|> | implementation, since functions like fgetc (inherited from C) must
|> | return a value in the range [0...UCHAR_MAX] or EOF, which must be
|> | negative.
|> Or you may just look at the requirements imposed by the standard
|> std::string class in clause 21.
The advantage of basing the argument on fgetc is that it becomes a C
problem as well, and not something specific to C++.
|> A conforming hosted implementation cannot have
|> values_set(char) == values_set(int)
|> because every bit in a char representation participates in the value
|> representation, i.e. all bits in a char are meaningful.
Where does it say this? (Section 21 is large.)
What are the implications for an implementation which wants to support
ISO 10646 on a 32 bit machine? The smallest type it can declare which
supports ISO 10646 is 32 bits.
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
| Gabriel Dos Reis wrote:
| ....
| > Or you may just look at the requirements imposed by the standard
| > std::string class in clause 21.
|
| Clause 21 is very large and complicated;
Not that complicated; it suffices to look at the first two pages.
21.1.2/2
For a certain character container type char_type, a related
container type INT_T shall be a type or class which can represent
all of the valid characters converted from the corresponding
char_type values, as well as an end-of-file value, eof(). The type
int_type represents a character container type which can hold
end-of-file to be used as a return type of the iostream class member
functions.
The case in interest is when char_type == char and int_type == int.
Now, look at the table 37 (Traits requirements)
X::eof() yields: a value e such that X::eq_int_type(e,X::to_int_type(c))
is false for all values c.
(by 21.1.1/1, c is of type char).
The standard also says that any bit pattern for char represents a
valid char value, therefore eof() can't be in the values-set of char.
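A small sketch of the machinery that argument rests on; on a typical
implementation eof() is -1, which to_int_type(c) never equals for a real
character:
#include <string>   // std::char_traits
#include <cstdio>
int main()
{
    typedef std::char_traits<char> traits;
    traits::int_type e = traits::eof();                // must differ from every converted char
    traits::int_type c = traits::to_int_type('a');     // a genuine character value
    std::printf("eof() = %d, to_int_type('a') = %d, eq_int_type: %d\n",
                e, c, (int)traits::eq_int_type(e, c)); // eq_int_type is false here
    return 0;
}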
[...]
| I don't see how that requirement would be violated by an implementation
| which had CHAR_MIN==INT_MIN, and CHAR_MAX==INT_MAX.
See above.
--
Gabriel Dos Reis, dos...@cmla.ens-cachan.fr
---
[...]
| The open issue is, I think, whether fgetc is required to be able to
| return *all* values in the range of 0...UCHAR_MAX. For actual characters,
| this is not a problem -- if we have 32 bit char's, it is certain that
| some of the values will not be used as a character.
That won't be conforming, since the standard says that any bit pattern
represents a valid value.
--
Gabriel Dos Reis, dos...@cmla.ens-cachan.fr
---
In itself, that would merely mean that 'int' can't be the int_type for
'char'. The clincher is that 21.1.3.1 explicitly specifies that int_type
for char_traits<char> must be 'int'. Therefore, I concede your point.
Someone was keeping a list of C/C++ differences - this should be added
to that list; C makes no such guarantee.
It explicitly restricts that guarantee to unsigned char; char is allowed
to be signed.
Witless wrote:
>
> "Martin v. Löwis" wrote:
>
> > James Kanze <ka...@gabi-soft.de> writes:
> >
> > > Excuse me, but on 32 bit machines (at least the ones I've seen), DWORD
> > > is 64 bits.
> >
> > I guess you have not seen Microsoft Windows, then. Just try
>
> Microsoft(R) Windows(!tm) is not based on 32 bits but on 16 bits.
>
The 32 bit Windows versions (95 and later, NT and later) are 32 bit environments
as far as the app is concerned.
|> > The most frequent situation, I think, is 8 bit char's and 32 bit
|> > wchar_t's. Anything less than about 21 bits for a wchar_t pretty
|> > much makes them relatively useless, since the only widespread code
|> > set with more than 8 bits is ISO 10646/Unicode, which requires 21
|> > bits. (But of course, the standard doesn't require wchar_t -- or
|> > anything else, for
|> There's a 16 bit variable of it. While I don't use either version of
|> it in any of my own programs, from what I'd heard here and on
|> comp.std.c, I'd gotten the impression that the 16 bit variant was
|> more widely used than the 32 bit version. Yours is the first mention
|> I've ever seen of a 21 bit version - or are you specifying the
|> number of bits actually used by the 32 bit version?
The code set occupies values in the range 0...0x10FFFF. That requires
21 bits.
The standard specifies several ways to represent the code set. The most
natural way (the only one which doesn't involve multi-something
encodings) uses 32 bit values, with the code on the lower bits, and the
upper bits 0. There are also variants with 8 and 16 bits; these do NOT
offer the full 32 bits.
Of the machines I've seen (and can remember), wchar_t is most often 32
bits. In fact, the only exception seems to be Windows; all of the
Unixes I can remember (Linux, Solaris, AIX -- and I think HP/UX, but my
memory is a bit weak there) have 32 bits.
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
"James Kuyper Jr." wrote:
>
>
> Yes, but under the C++ standard, that simply means that a "byte" will
> become 16 or 32 bits, respectively. That's what the CHAR_BIT macro is
> for.
>
The problem is that this just isn't practical in most cases. Remember that
char plays double duty both as the native character type and the fundamental
memory unit. Yes, you could have 16 bit chars, but you lose the ability
to address 8 bit sized memory for all practical purposes. You're more
or less doomed (as Windows NT does) to use wchar_t's. The only sad thing
is that C++ doesn't define wchar_t interfaces to everything.
| Gabriel Dos Reis wrote:
| >
| > "James Kuyper Jr." <kuy...@wizard.net> writes:
| ....
| > Now, look at the table 37 (Traits requirements)
| >
| > X::eof() yields: a value e such that X::eq_int_type(e,X::to_int_type(c))
| > is false for all values c.
| >
| > | I don't see how that requirement would be violated by an implementation
| > | which had CHAR_MIN==INT_MIN, and CHAR_MAX==INT_MAX.
| >
| > See above.
|
| In itself, that would merely mean that 'int' can't be the int_type for
| 'char'. The clincher is that 21.1.3.1 explicitly specifies that int_type
| for char_traits<char> must be 'int'. Therefore, I concede your point.
|
| Someone was keeping a list of C/C++ differences - this should be added
| to that list; C makes no such guarantee.
Which "no such guarantee"?
-- Gaby
Well, the real "problem" here is also known as "word-tearing"
(and there is also a somewhat similar/related performance problem
of "false-sharing").
There was even comp.std.c thread on this in the past:
http://groups.google.com/groups?threadm=3B54AB12.7F555834%40dvv.org
(with GRANULARIZE(X) macros, etc ;-))
Personally, I just love this topic! ;-) My view/opinion on this:
http://groups.google.com/groups?as_umsgid=3C3F0C77.CFF9CADC%40web.de
http://groups.google.com/groups?as_umsgid=3C428BC0.1D5F2D90%40web.de
> But isn't this actually what type sig_atomic_t (sizeof >= 1) is
> intended for?
AFAICT, "Nope":
http://groups.google.com/groups?as_umsgid=3B02A7A4.C6FEDC23%40dvv.org
*static volatile sig_atomic_t* vars could only help/work for *single-
threaded* asynchrony w.r.t access to volatile sig_atomic_t STATIC
object(s) in the thread itself and its interrupt/async.signal
handler(s);
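A minimal sketch of the one use the C/C++ standards actually bless for it:
a flag shared between a program and its own signal handler, not between
threads (SIGINT and the self-raise are only for demonstration):
#include <csignal>
#include <cstdio>
static volatile std::sig_atomic_t got_signal = 0;
void handler(int)
{
    got_signal = 1;  // storing to a volatile sig_atomic_t is the portable thing to do here
}
int main()
{
    std::signal(SIGINT, handler);
    std::raise(SIGINT);           // deliver the signal to ourselves; the handler runs now
    if (got_signal)
        std::printf("flag was set by the handler\n");
    return 0;
}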
BTW, I've "collected" some C/C++ "volatile"/sig_atomic_t stuff
in the following article and the "Hardware port" thread:
http://groups.google.com/groups?as_umsgid=3CB1EE1D.5671E923%40web.de
http://groups.google.com/groups?threadm=a8hgtr%24euj%241%40news.hccnet.nl
Also, FYI w.r.t. C/C++ volatiles and threads (I mean "atomicity" and
"visibility", etc):
http://groups.google.com/groups?as_umsgid=L9JR7.478%24BK1.14104%40news.cpqcorp.net
And, finally, FYI on memory "granularity":
http://www.tru64unix.compaq.com/docs/base_doc/DOCUMENTATION/V51_HTML/ARH9RBTE/DOCU0007.HTM#gran_sec
regards,
alexander.
> > 4.1. FUNDAMENTAL DATA TYPES
> > The fundamental data types of the IA-32 architecture are bytes,
> > words, doublewords, quadwords, and double quadwords (see Figure
> > 4-1). A byte is eight bits, a word is 2 bytes (16 bits), a
> > doubleword is 4 bytes (32 bits), a quadword is 8 bytes (64 bits),
> > and a double quadword is 16 bytes (128 bits). "
> Not everyone uses the term correctly, not even (apparently) Intel.
What is wrong with Intel's usage? If a byte means "an 8-bit quantity",
then they're right. If a byte means "the smallest addressable unit of
storage on a particular architecture", then they are still right. What
definition of "byte" makes Intel's usage incorrect?
DS
8-bits <= 1 C++ byte <= 'natural-word'-bits
(where # of bits in a "natural-word" and actual # of bits used for the
"byte" are platform-specific)
("C++ byte" not necessarily equivalent to platform-specific "byte")
It could theoretically be 8, 9, 10, 11, 12, ... 16, ... 32, ... 64, or even
maybe 128 bits on some current graphics processors (guessing), or anything
in between too (theoretically). It even makes sense: there are some machines
(DSPs) where "char" (as in character, as in human-readable text) is not a
very heavily used concept vs efficient 32-bit numerical processing, so they
just define 'char' (1 byte!) to refer to the full 32-bits of machine storage
for efficiency (otherwise they'd probably have to do all sorts of bit
masking arithmetic).
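A trivial way to see what your own implementation chose (a sketch; the output
is of course platform-specific):

    #include <climits>
    #include <iostream>

    int main()
    {
        // CHAR_BIT is the number of bits in one C++ byte, i.e. in one char;
        // the standard only promises that it is at least 8.
        std::cout << "bits per byte:   " << CHAR_BIT << '\n';
        std::cout << "sizeof(char):    " << sizeof(char) << " (always 1)\n";
        std::cout << "sizeof(wchar_t): " << sizeof(wchar_t) << '\n';
        return 0;
    }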
"Mike Wahler" <mkwa...@ix.netcom.com> wrote in message
news:a9cvlo$lj5$1...@slb2.atl.mindspring.net...
> It does. Smallest addressable storage unit.
> The system you describe above's 1-bit addressable
> unit does not meet the requirement of at least
> eight bits. So from C++, smallest addressable
> unit for that machine is whichever larger unit
> with at least eight bits is addressable. Perhaps
> a 'word'.
>
> > or simply not used it at all.
>
> There has to be defined some point of reference.
> It's a byte.
>
>
> > Surely today, and as surely in 1998, most readers think "8 bits" when they
> > see the word "byte".
>
Well not anymore! 1 C++ Byte >= 8-bits! :-)
hys
--
Hillel Y. Sims
hsims AT factset.com
---
Sorry, that got garbled due to lack of sleep. I meant "16 bit version";
"16 bit encoding" would have been even better, but I didn't even think
of that wording.
> > in any of my own programs, from what I'd heard here and on comp.std.c,
> > I'd gotten the impression that the 16 bit variant was more widely used
> > than the 32 bit version. Yours is the first mention I've ever seen of a
> > 21 bit version - or are you specifying the number of bits actually used
> > by the 32 bit version?
> >
>
> Unicode 2.0 had 40-some-odd thousand characters, so a 16-bit variable
> could hold all possible values. Unicode 3.0 has over 90,000 characters,
> so a 16-bit variable doesn't work. There's a UTF-16 encoding, but that's
> analogous to a multi-byte character string in C and C++: a pain to work
> with. Java apologists will tell you that this is no big deal, because
> the characters that require two 16-bit values are rarely used.
I suspect they're correct. A fairly large portion of the C++ world can
get away with 8-bit characters, an even larger portion will never need
to go beyond 16-bits. Of course, the people who need the larger
characters, need them all the time.
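To make the "pain" concrete, here is a rough sketch (not part of any library)
of how a code point above 0xFFFF has to be split into a UTF-16 surrogate pair:

    #include <iostream>

    void print_utf16(unsigned long cp)   // cp is assumed to be a valid code point
    {
        std::cout << std::hex;
        if (cp < 0x10000) {
            std::cout << cp << '\n';                        // one 16-bit unit
        } else {
            unsigned long v = cp - 0x10000;                 // 20 significant bits
            std::cout << (0xD800 + (v >> 10)) << ' '        // high surrogate
                      << (0xDC00 + (v & 0x3FF)) << '\n';    // low surrogate
        }
    }

A 16-bit "character" type therefore only indexes code units, not characters,
which is the multi-byte-string problem all over again.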
| Gabriel Dos Reis <dos...@cmla.ens-cachan.fr> writes:
|
| |> James Kanze <ka...@gabi-soft.de> writes:
|
| |> | Jack Klein <jack...@spamcop.net> writes:
|
| |> | |> There are now C++ compilers for 32 bit digital signal
| |> | |> processors where char, short, int and long are all 1 byte and
| |> | |> share the same representation. Each of those bytes contains
| |> | |> 32 bits.
|
| |> | A slightly different issue, but I believe that most, if not all of
| |> | these are freestanding implementations. There is some question
| |> | whether int and char can be the same size on a hosted
| |> | implementation, since functions like fgetc (inherited from C) must
| |> | return a value in the range [0...UCHAR_MAX] or EOF which must be
| |> | negative.
|
| |> Or you may just look at the requirements imposed by the standard
| |> std::string class in clause 21.
|
| The advantage of basing the argument on fgetc is that it becomes a C
| problem as well, and not something specific to C++.
Yeah, a clever way of getting rid of the problem ;-)
By the wording concerning some functions in <ctype.h>, I gather that
EOF cannot be a valid 'unsigned char' converted to int.
| |> A conforming hosted implementation cannot have
|
| |> values_set(char) == values_set(int)
|
| |> because every bit in a char representation participates in a value
| |> representation, i.e. all bits in a char are meaningful.
|
| Where does it say this. (Section 21 is large.)
Table 37 says that char_traits<char>::eof() -- identical to EOF --
should return a value not equal to any char value (converted to int).
| What are the implications for an implementation which wants to support
| ISO 10646 on a 32 bit machine? The smallest type it can declare which
| supports ISO 10646 is 32 bits.
Then it must make sure values-set(char) is a strict subset of
values-set(int) (for example having a 64-bit int). Or it doesn't ;-)
--
Gabriel Dos Reis, dos...@cmla.ens-cachan.fr
---
See 1.7p1.
| James Kanze wrote:
| >
| > Jack Klein <jack...@spamcop.net> writes:
| >
| > |> There are now C++ compilers for 32 bit digital signal processors
| > |> where char, short, int and long are all 1 byte and share the same
| > |> representation. Each of those bytes contains 32 bits.
| >
| > A slightly different issue, but I believe that most, if not all of these
| > are freestanding implementations. There is some question whether int
| > and char can be the same size on a hosted implementation, since
| > functions like fgetc (inherited from C) must return a value in the range
| > [0...UCHAR_MAX] or EOF which must be negative.
|
| However, since there are other ways of detecting file errors and end of
| file, than checking for EOF, that doesn't absolutely require that EOF be
| outside the range of char values.
Huh?!? The C++ standard requires that all bits in a char participate
in a char value representation. And EOF is not a character.
--
Gabriel Dos Reis, dos...@cmla.ens-cachan.fr
---
> Clause 21 is very large and complicated; it would help if you could be
> more specific about what you're referring to.
He probably means 21.1.2 [lib.char.traits.typedefs], which states that
character traits must have a type or class (int_type) that can represent
all the valid characters converted from the character type, plus an
end-of-file value. It does not state, however, that this type must be
"int".
--
Ray Lischner, author of C++ in a Nutshell (forthcoming, Q4 2002)
http://www.tempest-sw.com/cpp/
If you need 32-bit characters and the language you're using doesn't
support them you've got a problem, regardless of the rationalizing that
language designers may engage in.
--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)
---
| Gabriel Dos Reis wrote:
| >
| > "James Kuyper Jr." <kuy...@wizard.net> writes:
| >
| > | Carl Daniel wrote:
| > | ....
| > | > sizeof(char) is guaranteed to be 1. 1 what though? 1 memory allocation
| > | > unit. All other types must have sizes which are multiples of sizeof(char).
| > | > The standard makes no claim that 1 memory allocation unit == 1 byte. On a
| > |
| > | Section 5.3.3: "The sizeof operator yields the number of bytes in the
| > | object representation of its operand."
| >
| > Exact. The question is what you think "byte" means in the C++
| > standards text.
|
| See 1.7p1.
I know that paragraph very well. Thanks. But from your assertions,
it wasn't clear that you knew of that paragraph.
--
Gabriel Dos Reis, dos...@cmla.ens-cachan.fr
---
What is the correct use? In other contexts a word is the minimum
addressable unit; then a word on all the x86 family would be an octet.
> consider an ISO specification more authoritative than a company
> specification any day (though it could still be wrong).
ISO has no authority to define the universal meaning of a word. No more
than Intel; that is, they can only define the meaning it has in their own documents.
Regards.
| On Monday 15 April 2002 09:14 pm, James Kuyper Jr. wrote:
|
| > Clause 21 is very large and complicated; it would help if you could be
| > more specific about what you're referring to.
|
| He probably means 21.1.2 [lib.char.traits.typedefs], which states that
| character traits must have a type or class (int_type) that can represent
| all the valid characters converted from the character type, plus an
| end-of-file value. It does not state, however, that this type must be
| "int".
Thanks.
I would add that the standard says that when char_type is char then
int_type must be int.
--
Gabriel Dos Reis, dos...@cmla.ens-cachan.fr
---
>Alexander Terekhov wrote:
>....
>> "IA-32 Intel® Architecture
>> Software Developer's
>> Manual
>> Volume 1:
>> Basic Architecture
>> ....
>> 4.1. FUNDAMENTAL DATA TYPES
>>
>> The fundamental data types of the IA-32 architecture are bytes,
>> words, doublewords, quadwords, and double quadwords (see Figure
>> 4-1). A byte is eight bits, a word is 2 bytes (16 bits), a
>> doubleword is 4 bytes (32 bits), a quadword is 8 bytes (64 bits),
>> and a double quadword is 16 bytes (128 bits). "
>
>Not everyone uses the term correctly, not even (apparently) Intel. I'll
>consider an ISO specification more authoritative than a company
>specification any day (though it could still be wrong).
But if such a document doesn't exist, who uses the term "correctly"?
There are many meanings of the same term, and each one is "correct" as
long as it is consistent. A "byte" can be an IA-32 data type, an IDL
type, a C/C++ storage unit, and many other things (yes, different
things, not only different sizes, since I can't identify a data type
with a storage unit)
The above quote would be wrong if it pretended to be a general
definition, but it's OK if (as I believe) it is intended to mean
"Within this specification, the term byte refers to...". In this
respect it's no way different from what the C++ standard does.
Note that I'm not saying that this de facto overloading of the term
(as well as of the terms "word", "dword" and others) isn't annoying.
It is! :)
Genny.
Practicality is an issue for the implementation to worry about. As long
as the standard allows each implementation enough freedom to choose
practical values for those type sizes, it's done its job. If one
implementor decides that the most practical thing for their market is a
16-bit char, that's permitted. If another decides that the most
practical thing for their market is an 8-bit char and a 16-bit wchar_t,
that's permitted. The two implementations might be targeting different
markets, or one of them might be mistaken, but the C++ standard has been
designed to let each of them be conforming, and code that needs to be
portable will be designed to work correctly in either case (which can be
a highly non-trivial exercise in many cases).
The definition I'm familiar with can be paraphrased by saying that if
it's correctly described as a 32-bit machine, then the word size is 32
bits.
> > consider an ISO specification more authoritative than a company
> > specification any day (though it could still be wrong).
>
> ISO has no authority to define the universal meaning of a word. No more
No one has that authority. However, ISO does have the authority to
define the usage within ISO documents, and the usage by anyone who cares
about ISO standards. Which includes me.
I'm not sure why. Could you be a little less laconic, and explain what
your point is?
I'm not very familiar with Java; I got the impression from what you said
earlier that they did support the larger range of characters, they just
supported them inconveniently, using a multi-byte encoding.
|> James Kanze <ka...@gabi-soft.de> writes:
|> | Gabriel Dos Reis <dos...@cmla.ens-cachan.fr> writes:
|> | |> James Kanze <ka...@gabi-soft.de> writes:
|> | |> | Jack Klein <jack...@spamcop.net> writes:
|> | |> | |> There are now C++ compilers for 32 bit digital signal
|> | |> | |> processors where char, short, int and long are all 1
|> | |> | |> byte and share the same representation. Each of those
|> | |> | |> bytes contains 32 bits.
|> | |> | A slightly different issue, but I believe that most, if not
|> | |> | all of these are freestanding implementations. There is
|> | |> | some question whether int and char can be the same size on a
|> | |> | hosted implementation, since functions like fgetc (inherited
|> | |> | from C) must return a value in the range [0...UCHAR_MAX] or EOF
|> | |> | which must be negative.
|> | |> Or you may just look at the requirements imposed by the
|> | |> standard std::string class in clause 21.
|> | The advantage of basing the argument on fgetc is that it becomes a
|> | C problem as well, and not something specific to C++.
|> Yeah, a clever way of getting rid of the problem ;-)
I almost added something to the effect of letting the C committee do the
work:-).
|> By the wording concerning some functions in <ctype.h>, I gather that
|> EOF cannot be a valid 'unsigned char' converted to int.
I'm not sure. The wording says that the functions must work for all
values in the range 0...UCHAR_MAX and EOF. *IF* one of the values in
the range 0...UCHAR_MAX results in EOF when converted to int, I don't
think that it is a problem as long as that value isn't alpha, numeric,
etc., i.e. as long as all functions return 0.
If we suppose that char has 32 bits, and uses ISO 10646, this isn't a
problem, since all of the values greater than 0x10FFFF are invalid
characters, and should return false. (EOF must be negative, which would
mean an unsigned char value of 0x80000000 or greater. Supposing typical
implementations.)
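The usual idiom behind that requirement, for reference (a sketch):

    #include <cctype>

    // The <cctype> functions accept either EOF or a value representable as
    // unsigned char, hence the conventional cast when starting from plain char.
    bool is_letter(char c)
    {
        return std::isalpha(static_cast<unsigned char>(c)) != 0;
    }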
|> | |> A conforming hosted implementation cannot have
|> | |> values_set(char) == values_set(int)
|> | |> because every bit in a char representation participates in a
|> | |> value representation, i.e. all bits in a char are meaningful.
|> | Where does it say this. (Section 21 is large.)
|> Table 37 says that char_traits<char>::eof() -- identical to EOF --
|> should return a value not equal to any char value (converted to
|> int).
Not quite. Table 37 says that char_traits<charT>::eof() must return a
value e for which eq_int_type(e,to_int_type(c)) is false for all c.
For 32 bit ISO 10646, I use char_type == int_type == unsigned int (on a
32 bit machine), with:
int_type to_int_type( char_type c )
{
    return c < 0x110000 ? c : 0 ;
}
I'm not sure, but I believe that this is legal. (At any rate, it seems
the most useful solution.)
|> | What are the implications for an implementation which wants to
|> | support ISO 10646 on a 32 bit machine? The smallest type it can
|> | declare which supports ISO 10646 is 32 bits.
|> Then it must make sure values-set(char) is a strict subset of
|> values-set(int) (for example having a 64-bit int). Or it doesn't
|> ;-)
That is the crux of my question. On the two machines I use (a 32 bit
Sparc under Solaris 2.7 and a PC under Linux), wchar_t is a 32 bit
quantity, there are no integral data types larger than 32 bits. I don't
want wchar_t to be any smaller, since it must be at least 21 bits for
ISO 10646. This means that *if* int_type must be larger than char_type,
I have to define a class type for it. But in practice, I don't need for
it to be larger, since in fact, all legal characters are in the range
0...0x10FFFF.
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
|> > Not everyone uses the term correctly, not even (apparently)
|> > Intel. I'll
|> What is the correct use? In other contexts a word is the minimum
|> addressable unit; then a word on all the x86 family would be an
|> octet.
A word is normally the bus width in the ALU; it is often larger than the
minimum addressable unit. The correct word for the minimal addressable
unit is byte, although this is normally only used if this unit is
smaller than a word (as it is on most modern processors).
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
|> Huh?!? The C++ standard requires that all bits in a char
|> participate in a char value representation. And EOF is not a
|> character.
However, as far as I can see, it doesn't place any constraints with
regards as to what a character can be (except that the characters in the
basic character set must have positive values, even if char is signed).
The requirement that EOF not be a character doesn't mean that it cannot
be a legal char value.
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
As does every programming language, I suppose. The point of wide
characters is to not have to deal with multi-byte encodings. The Java
libraries have a bunch of code that assumes that a single character is
not part of a multi-character sequence.
--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)
---
No need for such a complex description.
Various platforms implement a 'byte' with the number
of bits deemed most 'appropriate'.
A C or C++ program required the host platform to provide
(or perhaps 'emulate') a byte with at least eight bits.
The C and C++ data type representing this 'byte' is
type 'char'. So regardless of the number of bits therein,
sizeof(char) == one byte. Period. Forever and ever, amen. :-)
-Mike
> > Not everyone uses the term correctly, not even (apparently) Intel. I'll
>
> What is the correct use? In other contexts a word is the minimum
> addressable unit; then a word on all the x86 family would be an octet.
>
> > consider an ISO specification more authoritative than a company
> > specification any day (though it could still be wrong).
>
> ISO has no authority to define the universal meaning of a word.
> No more than Intel; that is, they can only define the meaning it has in
> their own documents.
I believe the term in question is "machine word".
This is not a data type any more than "byte" is a data type.
It is simply the width of the typical [integral] "data path"
in a given computer architecture --
the width of a data bus, register, ALU, etc.
The meaning of the term "word" is typically "overloaded"
by computer architects to describe data types which may be interpreted
as characters, integers, fixed or floating point numbers, etc.
These type definitions are only meaningful
within the context of a particular computer architecture.
Intel's word, doubleword, quadword and double quadword types
are all based upon the original 8086 architecture's 16 bit machine word
and have remained fixed as the actual machine word size increased
to 32 then 64 bits with the introduction of new architectures
in the same family.
> |> A machine word is as wide as the integer data path throught the
> |> Arithmetic and Logic Unit (ALU).
>
> Or as wide as the memory bus?
>
> I'm not sure that there is a real definition of "word".
I think it's one of those terms... ;)
The Sony PlayStation2 has 128-bit registers and has 128-bit
busses to the memory and other systems, but then the instruction
set typically works on the low 64 bits of those registers, even
though "int" is 32-bits for some bizzaro reason, and all of the
docs call the 128-bit values "quadwords".
-tom!
The guarantee that you cited from table 37, which (translated into C
terms) says EOF!=(int)c for all values of c.
|> James Kanze <ka...@gabi-soft.de> writes:
|> [...]
|> | The open issue is, I think, whether fgetc is required to be able
|> | to return *all* values in the range of 0...UCHAR_MAX. For actual
|> | characters, this is not a problem -- if we have 32 bit char's, it
|> | is certain that some of the values will not be used as a
|> | character.
|> That won't be conforming, since the standard says that any bit
|> pattern represent a valid value.
But it doesn't require that fgetc actually be able to return all valid
values.
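(For reference, this is the classic idiom the whole fgetc/EOF argument is
about -- the result has to be kept in an int so that EOF remains
distinguishable from character values:)

    #include <cstdio>

    int main()
    {
        int ch;                                  // int, not char
        while ((ch = std::getchar()) != EOF)
            std::putchar(ch);
        return 0;
    }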
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
|> "James Kuyper Jr." <kuy...@wizard.net> writes:
|> | Gabriel Dos Reis wrote:
|> | ....
|> | > Or you may just look at the requirements imposed by the standard
|> | > std::string class in clause 21.
|> | Clause 21 is very large and complicated;
|> Not that complicated;
I'll admit that there are worse. I was just looking for an excuse for
my laziness.
|> it suffices to look at the first two pages.
|> 21.1.2/2
|> For a certain character container type char_type, a related
|> container type INT_T shall be a type or class which can represent
|> all of the valid characters converted from the corresponding
|> char_type values, as well as an end-of-file value, eof(). The type
|> int_type represents a character container type which can hold
|> end-of-file to be used as a return type of the iostream class member
|> functions.
OK. Consider the case of 32 bit char, int and long, using ISO 10646 as
a code set. And read the text *very* carefully. It doesn't say that
INT_T must be able to represent all valid values which can be put in a
char_type. It says that it must be able to represent all valid
*characters* -- in this case, all values in the range 0...0x10FFFF --
plus a singular value for eof (say, 0xFFFFFFFF).
Other constraints mean that such an implementation would have to use
some somewhat particular definitions for some of the other functions,
but I think that such an implementation would be legal. I would feel
better about it if it were more clearly stated somewhere that
"character" doesn't necessarily mean all possible values that can be
stored in a "char_type", but if this isn't what is meant, why use the
word character?
|> The case of interest is when char_type == char and int_type == int.
|> Now, look at the table 37 (Traits requirements)
|> X::eof() yields: a value e such that X::eq_int_type(e,X::to_int_type(c))
|> is false for all values c.
|> (by 21.1.1/1, c is of type char).
X::eof() yields 0xFFFFFFFF.
X::to_int_type( char_type c ) is constrained to always yield a value
less than 0x110000, e.g.:
int_type to_int_type( char_type c )
{
    return c > 0 && c < 0x110000 ? c : 0 ;
}
X::eq_int_type simply uses ==.
Where is the error in this implementation?
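Pulling the pieces together, the traits being defended would look roughly
like this (a sketch only; the struct name is made up, and a 32 bit unsigned
int is assumed):

    struct ucs4_traits
    {
        typedef unsigned int char_type;     // 32 bit ISO 10646 code points
        typedef unsigned int int_type;      // same size, by design

        static int_type to_int_type( char_type c )
        {
            return c < 0x110000 ? c : 0 ;   // every valid character maps to itself
        }
        static bool eq_int_type( int_type a, int_type b )
        {
            return a == b ;
        }
        static int_type eof()
        {
            return 0xFFFFFFFF ;             // never equal to to_int_type(c) for any c
        }
    };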
|> The standard also says that any bit pattern for char represents a
|> valid char value, therefore eof() can't be in the values-set of
|> char.
Table 37 talks of characters, not valid char values. Not all valid char
values need be valid characters.
> The word "byte" has never meant eight bits. Historically...
Ironically, words mean whatever they get used to mean, and as
long as a word has a definition that is understood by the
involved parties, that definition is valid regardless of what is
"official".
Indeed, dictionaries are developed to track the changes in our
language. "Irregardless", for instance, doesn't make any
etymological sense, and yet it is used so often and always has
the same definition every time it's used that it has made its way
into many dictionaries.
Popular usage of the word "byte" does mean "eight bits" or
"octet", regardless of what ISO says and regardless of what IBM
once did 40 or 50 years ago.
Merriam-Webster currently defines a byte to be "a group of eight
binary digits...", and since dictionaries get definitions from
popular usage, we can assume that this definition is what most
people use as their definition of "byte". This does not mean
that the ISO is wrong, of course, it just means that they are
defining byte to be something other than the popular usage.
As an example, a "nice" girl referred to a prostitute in
Victorian England. The meaning of "nice" has morphed over the
years; the only thing defining it was popular usage and
understanding of what the word meant.
> The fact that machines with bytes of other than 8 bits have
> become rare doesn't negate the fact that when you do talk of
> them, the word "byte" doesn't mean 8 bits. And the distinction
> is still relevant. -- look at any of the RFC's, for example, and
> you'll find that when 8 bits is important, the word used is
> octet, and not byte.
Yes; the distinction is still relevant in that they need to
define these words to something other than the popular
definition. This doesn't make the standards and RFCs wrong, just
anachronistic. ;)
-tom!
"the term" I was referring to was "word".
> What is wrong with Intel's usage? If a byte means "an 8-bit quantity",
> then they're right. If a byte means "the smallest addressable unit of
> storage on a particular architecture", then they are still right. What
> definition of "byte" makes Intel's usage incorrect?
If it's properly described as a 32-bit architecture, then "word" should
indicate a 32-bit unit of memory.
|> James Kanze wrote:
|> > The word "byte" has never meant eight bits. Historically...
|> Ironically, words mean whatever they get used to mean, and as long
|> as a word has a definition that is understood by the involved
|> parties, that definition is valid regardless of what is "official".
True, but words are used within distinct communities. Here, we are
talking of a specialized technical community; how the man on the street
uses the word (or if he has even heard of it) is irrelevant: when we use
the word stack, or loop, in this forum, it generally also has a meaning
quite different from that used by the man on the street.
[...]
|> Popular usage of the word "byte" does mean "eight bits" or "octet",
|> regardless of what ISO says and regardless of what IBM once did 40
|> or 50 years ago.
I'm not sure that there is a popular usage of the word "byte". If so,
it is very recent, and probably is 8 bits. But that is separate from
the technical usage, just as the use of stack or loop with regards to
programming is different from other uses.
|> Merriam-Webster currently defines a byte to be "a group of eight
|> binary digits...", and since dictionaries get definitions from
|> popular usage, we can assume that this definition is what most
|> people use as their definition of "byte". This does not mean that
|> the ISO is wrong, of course, it just means that they are defining
|> byte to be something other than the popular usage.
And that Merriam-Webster is giving a general definition, and not a
technical one. IMHO, if they don't mention its use with a meaning
other than 8 bits, they are wrong; the two uses are related, and
presenting one without the other is highly misleading, since the
definition they do give "sounds" technical. They might, of course,
label my usage as "technical", or give some other indication that it is
not the everyday usage.
With regards to the technical meaning, it is significant to note that
technical documents in which the unit must be 8 bits (descriptions of
network protocols, etc.) do NOT use the word byte, but octet.
|> As an example, a "nice" girl referred to a prostitute in Victorian
|> England. The meaning of "nice" has morphed over the years; the only
|> thing defining it was popular usage and understanding of what the
|> word meant.
A good dictionary will still give this meaning, indicating, of course,
that it is archaic.
I would agree that we are in a situation where the word byte is changing
meaning, and 50 years from now, it probably will mean 8 bits. For the
moment, even if many people assume 8 bits, the word is still
occasionally used for other sizes, and still retains to some degree its
older meaning. (This is, of course, *why* it isn't used in protocol
descriptions.)
|> > The fact that machines with bytes of other than 8 bits have become
|> > rare doesn't negate the fact that when you do talk of them, the
|> > word "byte" doesn't mean 8 bits. And the distinction is still
|> > relevant. -- look at any of the RFC's, for example, and you'll
|> > find that when 8 bits is important, the word used is octet, and
|> > not byte.
|> Yes; the distinction is still relevant in that they need to define
|> these words to something other than the popular definition. This
|> doesn't make the standards and RFCs wrong, just anachronistic. ;)
Not even anachronistic. Just more precise and more technical than
everyday usage.
In the case of the C/C++, the use is a bit special, even with regards to
the older meaning. I'd actually favor a different word here, but I
don't have any suggestions.
And what about the use in the library section, where there is a question
of multi-byte characters -- I've never heard anyone use anything else
but "multi-byte characters" when referring to the combining codes in 16
bit Unicode, for example. So at least in this compound word, byte has
retained a more general meaning.
In the case of the RFC's and the various standards for the OSI
protocols, I see no reason to switch from "octet" to "byte". The word
"octet" is well established, and is precise, and makes it 100% clear
that exactly 8 bits are involved. Even if "byte" is generally
understood to be 8 bits, why choose the less precise word?
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
Please allow me to summarize where we stand thus far:
byte#1: one opinion is that when C++98 and C standards refer to "byte", they
are referring to a strictly 8-bit byte
byte#2: one opinion is that when C++98 and C standards refer to "byte", they
are referring to an implementation defined unit somehow related to representing
a minimum-sized char-type on that processor architecture. Some variants of this
opinion would permit a byte (i.e., char) of 6-bits to 15-bits. Other variants
of this opinion would permit a byte (i.e., char) of nearly any size: 16-bits,
32-bits, 64-bits, 128-bits, ad infinitum.
Obviously the C++98 standard can be read in two drastically different and
substantially incongruent ways: byte#1 and byte#2. The C++98 standard does not
explicitly define the term "byte" nor does it normatively reference a standard
which itself in turn explicitly defines the term "byte".
DEFECT: C++98's ambiguous use of the term "byte" without providing an
explicit definition which selects exactly one of the alternative definitions of
"byte" is itself a fundamental defect from which a series of troublesome
alternative interpretations (and thus troublesome alternative compiler
implementations) may flow.
I see at least two ways of resolving this:
resolution#1: Omit any & all mention of the word "byte". In C++0x and in any
C++98 corrigenda, strictly use only the word "octet" instead of C++98's "byte".
resolution#2: Explicitly pick one of the two alternative definitions of
"byte": byte#1 or byte#2. Explicitly definite "byte" in C++0x and in any C++98
corrigenda.
Obviously some people who staunchly subscribe to byte#2 would consider
resolution#1 as moving away from their position. Likewise, if byte#1 were to be
chosen as part of implementing resolution#2, some of those people who staunchly
subscribe to byte#2 would consider resolution#2=byte#1 as moving away from their
position.
Because of this thread's volume of seemingly-endless debate about what the
word "byte" is, I expect to see this defect added to the official C++98 defect
list. I expect to see this defect resolved in a C++98 corrigendum which then is
folded into C++0x. If C++98 meant for "byte" (char) to be an 8-bit byte =
octet, then explicitly define "byte" with such strictness. If C++98 meant for
"byte" (char) to be 6-bits to 15-bits, up to 16-bits, up to 32-bits, up to
64-bits, up to 128-bits, or so forth, then explicitly define "byte" with such
rich semantics.
Note that some byte#2-oriented postings on this thread have been tantamount
to redefining/hijacking C's/C++'s historically 8(ish)-bit char to be
UTF16/UTF32/UCS2/UCS4-capable for non-UTF8 Unicode. Character encoding schemes
composed of value-sets whose size is greater than 255 graphemes (e.g.,
Unicode, ISO/IEC 10646) are the purpose for which wchar_t has always been intended.
Or equivalently, the "supreme court" of C++ needs to normatively decide how
the C++98 "constitution" is to be interpreted regarding how "byte" is to
interpreted regarding char and sizeof(char).
---
IMO that's worth a defect report; any other opinions?
3.9p4:
-----------
The object representation of an object of type T is the sequence of
N unsigned char objects taken up by the object of type T, where N
equals sizeof(T). The value representation of an object is the set
of bits that hold the value of type T. For POD types, the value
representation is a set of bits in the object representation that
determines a value, which is one discrete element of an
implementation-defined set of values.37)
37) The intent is that the memory model of C++ is compatible with
that of ISO/IEC 9899 Programming Language C.
-----------
.... so this definition of "object representation of an object of type T"
relies on "sizeof(T)".
5.3.3
-----------
The sizeof operator yields the number of bytes in the object representation
of its operand. .....
-----------
.... and "sizeof(T)" seems to rely on T's "object representation"
Daniel Miller wrote:
>
> Note that some byte#2-oriented postings on this thread have been tantamount
> to redefining/hijacking C's/C++'s historically 8(ish)-bit char to be
> UTF16/UTF32/UCS2/UCS4-capable for non-UTF8 Unicode. Character encoding schemes
> composed of value-sets whose size is greater than 255 graphemes (e.g.,
> Unicode, ISO/IEC 10646) are the purpose for which wchar_t has always been intended.
You would do better to avoid making inflammatory statements like "hijack" in
proposals that you are trying to push on people.
Frankly, if char's were just designed to hold characters, then allowing them
to be something larger if you were on a native, let's say, UTF16 machine
would be reasonable.
However, char plays double duty as the "minimal addressable memory unit."
As much as one would wish to redefine char to a larger size, one can not
do so without losing the ability to address something smaller.
If you want to fix the terminology to allow the same latitude currently
allowed (16 bit chars, let's say), that's fine. If you want to somehow
restrict chars to 8 bits you have two problems:
1. You then need to fix the fact that wchar_t is not fully supported in C++.
2. You are still deciding that a certain class of machines that have had C/C++
compilers implemented for them are no longer allowed a conforming implementation
because of the infeasibility of exactly 8 bit char size on them.
Then Win32 is not correctly described as a 32-bit machine?
> > ISO has no authority to define the universal meaning of a word. No more
> No one has that authority. However, ISO does have the authority to
> define the usage within ISO documents, and the usage by anyone who cares
> about ISO standards. Which includes me.
In the context of this newsgroup the relevant standard does not define
WORD.
Regards.
Keep in mind that people can only sustain that opinion by ignoring the
explicit definitions provided in those standards.
> byte#2: one opinion is that when C++98 and C standards refer to "byte", they
> are referring to an implementation defined unit somehow related to representing
> a minimum-sized char-type on that processor architecture. ...
Almost correct. It's the minimum-sized addressable unit, and it must be
able to hold every element of the basic execution character set.
However, it needn't be a character type as far as the architecture is
concerned, and it can't be the minimum-sized char-type on that
architecture, if the minimum-sized char-type is less than 8 bits.
For instance, there have been machines with a word size of 36 bits, and
configurable byte sizes; the byte size could be set as low as 5 bits,
allowing 7 bytes per word. That mode allowed only for capital letters
and punctuation - there was no room even for digits, much less lower
case. By your description of byte#2, a C++ implementation on such a
machine would be required to use it in the 5-bit mode. However, what the
standard actually requires is that char must be able to hold at least 96
different values, and that unsigned char have a range which implicitly
requires at least 8 bits per byte. I've never heard anyone indicate
whether there was ever a C implementation for that machine, and there
almost certainly was not a C++ implementation. However, there could
have been. For such an implementation, the mode which put 4 9-bit bytes
in a word would have been the most logical configuration.
> ... Some variants of this
> opinion would permit a byte (i.e., char) of 6-bits to 15-bits. ...
6 or 7 bit bytes would violate requirements specifically laid out by
the standards. Note: the 8-bit limit is not explicit; it's derived from
the requirements on UCHAR_MAX. Those requirements are not explicitly
part of the C++ standard, but are instead incorporated by reference from
the C standard.
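A C++98-flavoured restatement of those derived limits, purely for
illustration:

    #include <climits>

    #if CHAR_BIT < 8
    #error "not a conforming implementation: a byte must have at least 8 bits"
    #endif
    #if UCHAR_MAX < 255
    #error "not a conforming implementation: unsigned char must cover 0..255"
    #endif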
> ... Other variants
> of this opinion would permit a byte (i.e., char) of nearly any size: 16-bits,
> 32-bits, 64-bits, 128-bits, ad infinitum.
Yes; the C and C++ standard explicitly allow for an unspecified, and
therefore arbitrarily large, number of bits per byte.
> Obviously the C++98 standard can be read in two drastically different and
> substantially incongruent ways: byte#1 and byte#2.
No - it cannot be read to match byte#1; the people who support that
point of view have failed to read the relevant clauses. Modulo the
corrections I've given above, byte#2 is pretty much exactly what the
standard actually says.
What's at issue here is not whether the standards mean what they
explicitly say about what a byte is; what's at issue is whether they
should say something different, or use different terminology to say it.
> ... The C++98 standard does not
> explicitly define the term "byte" nor does it normatively reference a standard
> which itself in turn explicitly defines the term "byte".
Completely false. See section 1.7p1 in the C++ standard. In particular,
pay close attention to the last part of the second sentence, which makes
the number of bits in a byte explicitly implementation-defined. See
section 3.6 of the C99 standard. In particular, pay close attention to
Note 2, in 3.6p3. The note is, of course, non-normative, but it
explicitly and correctly points out the absence of a size specification
for a byte in the normative section of the text, making it clear that
this absence was intentional. See section 5.2.4.2.1 of the C99 standard,
for the limits on the valid ranges of character types, which implicitly
require that a char be at least 8 bits. Pay particular attention to
paragraph 2 of that section.
....
> resolution#2: Explicitly pick one of the two alternative definitions of
> "byte": byte#1 or byte#2. Explicitly definite "byte" in C++0x and in any C++98
> corrigenda.
Already achieved, without any change to the standard.
....
> Note that some byte#2-oriented postings on this thread have been tantamount
> to redefining/hijacking C's/C++'s historically 8(ish)-bit char to be
Historically, for as long as there's been a C standard, it's explicitly
defined a byte in a way that allows for it to be larger than 8 bits. The
C++ standard merely continued that tradition.
The comment about sizeof(T) is a side issue; the comment is true, and a
useful thing to know, but does not play a part in defining what the
object representation is, nor how big it is. It might be better to make
that comment non-normative text, since it's redundant with 5.3.3, and as
currently written might give the mistaken impression that N is
determined by sizeof(), rather than simply being reported by it.
> 5.3.3
> -----------
> The sizeof operator yields the number of bytes in the object representation
> of its operand. .....
> -----------
> .... and "sizeof(T)" seems to rely on T's "object representation"
There's no circularity in the actual meaning. The size of a byte is
implementation-defined. The representation of an object takes up an
implementation-defined amount of memory space, which must be a positive
integral number of bytes. sizeof(object) reports how many bytes that is.
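A short sketch of that non-circular reading in practice (the struct and its
values are just for illustration; the output is implementation-specific):

    #include <iostream>

    struct Pod { int i; char c; };

    int main()
    {
        Pod p = { 0x12345678, 'x' };

        // The object representation of p is the sizeof(Pod) unsigned char
        // objects it occupies; sizeof merely reports how many there are.
        const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&p);
        for (unsigned n = 0; n < sizeof p; ++n)
            std::cout << std::hex << static_cast<unsigned>(bytes[n]) << ' ';
        std::cout << '\n';
        return 0;
    }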
Or it's not correctly described as having a word size other than 32
bits. Take your choice.
> > > ISO has no authority to define the universal meaning of a word. No more
> > No one has that authority. However, ISO does have the authority to
> > define the usage within ISO documents, and the usage by anyone who cares
> > about ISO standards. Which includes me.
>
> In the context of this newsgroup the relevant standard does not define
> WORD.
The C++ standard makes no use of the term with this meaning, and
therefore doesn't need to define it; its meaning is indeed
off-topic for this newsgroup. It came up only because of a quotation from
Stroustrup that used the term. However, I'm sure there are other ISO
standards that do define it. Hopefully, all ISO standards that define the
term give it mutually compatible definitions, but I wouldn't be surprised
to hear otherwise.
|> The debate on this thread is resembling a purely academic
|> debating society regarding the ontology versus phenomenology of
|> certain words. Let us pragmatically refocus on identifying
|> defects/ambiguities in the C++98 standard and how to fix them.
|> Please allow me to summarize where we stand thus far:
|> byte#1: one opinion is that when C++98 and C standards refer to
|> "byte", they are referring to a strictly 8-bit byte
|> byte#2: one opinion is that when C++98 and C standards refer to
|> "byte", they are referring to an implementation defined unit somehow
|> related to representing a minimum-sized char-type on that processor
|> architecture. Some variants of this opinion would permit a byte
|> (i.e., char) of 6-bits to 15-bits. Other variants of this opinion
|> would permit a byte (i.e., char) of nearly any size: 16-bits, 32-bits,
|> 64-bits, 128-bits, ad infinitum.
I've not actually seen either of these opinions, except at the very
beginning of the thread. Both the C and the C++ standards explicitly
define what they mean by byte, and both explicitly say that it is NOT
necessarily 8 bits. (See ISO 9899, 3.6 and ISO 14882, 1.7.)
Having established what the C and the C++ standards mean by byte, the
thread has thus drifted to the question of what the word "normally"
means, outside of the standard.
|> Obviously the C++98 standard can be read in two drastically
|> different and substantially incongruent ways: byte#1 and byte#2.
|> The C++98 standard does not explicitly define the term "byte" nor
|> does it normatively reference a standard which itself in turn
|> explicitly defines the term "byte".
This is simply false. See the sections mentioned above. In
particular, from the C++ standard (second sentence of 1.7): "A byte
[...] is a contiguous sequence of bits, the number of which is
implementation-defined." I don't see what could be clearer.
(Elsewhere, the standard states that the sizeof operator returns the
size in bytes, that sizeof(unsigned char) must be 1, and that an
unsigned char must be able to hold all of the values in the range
0...255. These requirements, taken together, mean that a byte must have
at least 8 bits.)
|> DEFECT: C++98's ambiguous use of the term "byte" without
|> providing an explicit definition which selects exactly one of the
|> alternative definitions of "byte" is itself a fundamental defect
|> from which a series of troublesome alternative interpretations (and
|> thus troublesome alternative compiler implementations) may flow.
No defect. You just haven't bothered reading the standard, or even the
preceding posts in this thread.
|> I see at least two ways of resolving this:
|> resolution#1: Omit any & all mention of the word "byte". In C++0x
|> and in any C++98 corrigenda, strictly use only the word "octet"
|> instead of C++98's "byte".
Except that the intention is precisely NOT to require strictly 8 bits,
but 8 or more bits. For whatever reasons, the intent of the C and the
C++ standard is to allow efficient implementations on any conceivable
hardware, including one with 36 bit words and 9 bit bytes (a hardware
which has actually existed).
|> resolution#2: Explicitly pick one of the two alternative
|> definitions of "byte": byte#1 or byte#2. Explicitly definite "byte"
|> in C++0x and in any C++98 corrigenda.
|> Obviously some people who staunchly subscribe to byte#2 would
|> consider resolution#1 as moving away from their position. Likewise,
|> if byte#1 were to be chosen as part of implementing resolution#2,
|> some of those people who staunchly subscribe to byte#2 would
|> consider resolution#2=byte#1 as moving away from their position.
At least within this thread, I don't think that there have been any
arguments that the C/C++ should change so that it would not allow an
effective implementation on a machines which don't directly support 8
bit bytes.
This is a separate argument. It is, IMHO, a reasonable argument -- I
don't think that there are any machines capable of a hosted environment
sold today that have anything but 8 bit bytes. I think that the last
one sold was probably long enough ago that it need not be considered.
(But I am far from sure about this.) On the other hand, there ARE DSP's
today which define the size of a byte as 32 bits (and sizeof(int) as 1);
any change should allow these to continue to exist. And I think that
there would be considerable resistance to a change which made the basic
language requirements different for hosted and free-standing
environments.
|> Because of this thread's volume of seemingly-endless debate about
|> what the word "byte" is, I expect to see this defect added to the
|> official C++98 defect list. I expect to see this defect resolved in
|> a C++98 corrigendum which then is folded into C++0x. If C++98 meant
|> for "byte" (char) to be an 8-bit byte = octet, then explicitly
|> define "byte" with such strictness. If C++98 meant for "byte"
|> (char) to be 6-bits to 15-bits, up to 16-bits, up to 32-bits, up to
|> 64-bits, up to 128-bits, or so forth, then explicitly define "byte"
|> with such rich semantics.
|> Note that some byte#2-oriented postings on this thread have been
|> tantamount to redefining/hijacking C's/C++'s historically 8(ish)-bit
|> char to be UTF16/UTF32/UCS2/UCS4-capable for non-UTF8 Unicode.
|> Character encoding schemes composed of value-sets whose size is
|> greater than 255 graphemes (e.g., Unicode, ISO/IEC 10646) is the
|> purpose for which wchar_t has always been intended.
I'm curious as to where you got the ideas about C's "historically 8-bit
char". In Kernighan and Richie, "The C Programming Language", 1978
(page 34), the authors explicitely state that the sizes of the data
types are not defined by the language, and include a table of some
current implementations which includes a 9 bit byte. In 1978, the word
byte certainly did not have any implications of 8 bits, as there were
still many machines on the market which had other size bytes.
Note that this is all irrelevant to Tom Plunket's points, to which I was
responding. He and I do not disagree about what the standard says, or
should say, but about the state of the *evolution* of the general
meaning of byte. I think we both agree that it didn't originally mean 8
bits, and we both agree that in 50 years or more, it will definitely mean
a unit of 8 bits -- barring some unforeseeable historical quirk. I
think we also both agree that given this evolution, it would be best if
the C/C++ found another term.
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
James, now I too understand the intent of this paragraph; your explanation
was crystal clear!
So one could say ..
"The object representation of an object of type T is the (finite:-) set
of all unsigned char objects taken up by that object of type T".
Some years ago, when I came to 5.3.3 and 3.9p4 after a long round
trip through the document, trying to solve a problem with a stack
of related cross-references almost blasting my head, I realized
this 'circularity' (not really one, as I now know) and aborted my
undertaking with a lot of unkind words leaving my mouth ...
So I would appreciate it if the next version would make this
paragraph clearer in the sense you have explained it.
> > 5.3.3
> > -----------
> > The sizeof operator yields the number of bytes in the object representation
> > of its operand. .....
> > -----------
> > .... and "sizeof(T)" seems to rely on T's "object representation"
>
> There's no circularity in the actual meaning. The size of a byte is
> implementation-defined. The representation of an object takes up an
> implementation-defined amount of memory space, which must be a positive
> integral number of bytes. sizeof(object) reports how many bytes that is.
yes to all
Thanks,
Markus.
| Gabriel Dos Reis <dos...@cmla.ens-cachan.fr> writes:
|
| |> Huh?!? The C++ standard requires that all bits in a char
| |> participate in a char value representation. And EOF is not a
| |> character.
|
| However, as far as I can see, it doesn't place any constraints with
| regards as to what a character can be (except that the characters in the
| basic character set must have positive values, even if char is signed).
Surely, the standard does define what "character" means.
Have a look at 17.1.2.
--
Gabriel Dos Reis, dos...@cmla.ens-cachan.fr
[...]
| |> it suffices to look at the first two pages.
|
| |> 21.1.2/2
| |> For a certain character container type char_type, a related
| |> container type INT_T shall be a type or class which can represent
| |> all of the valid characters converted from the corresponding
| |> char_type values, as well as an end-of-file value, eof(). The type
| |> int_type represents a character container type which can hold
| |> end-of-file to be used as a return type of the iostream class member
| |> functions.
|
| OK. Consider the case of 32 bit char, int and long, using ISO 10646 as
| a code set. And read the text *very* carefully.
I did.
[...]
| |> The standard also says that any bit pattern for char represents a
| |> valid char value, therefore eof() can't be in the values-set of
| |> char.
|
| Table 37 talks of characters, not valid char values. Not all valid char
| values need be valid characters.
Sure, they do.
Let's look at the definitions given at the begining of the library (17.1)
17.1.2 character
in clauses 21, 22, and 27, means any object which, when treated
sequentially, can represent text. The term does *not only mean char
and wchar_t objects*, but *any value* that can be represented by a type
that provides the definitions specified in these clauses.
(Emphasis is mine).
--
Gabriel Dos Reis, dos...@cmla.ens-cachan.fr
---
>Markus Mauhart wrote:
>>
>> "Gennaro Prota" <gennar...@yahoo.com> wrote ...
>> >
>> > P.S.: the only thing that leaves me perplexed is the apparent circular
>> > definition constituted by 5.3.3 and 3.9p4. Does anybody know if it is
>> > resolved in an other part of the standard?
>>
>> IMO that's worth a defect report; any other opinions?
>>
>> 3.9p4:
>> -----------
>> The object representation of an object of type T is the sequence of
>> N unsigned char objects taken up by the object of type T, where N
>> equals sizeof(T).
[...]
>The comment about sizeof(T) is a side issue; the comment is true, and a
>useful thing to know, but does not play a part in defining what the
>object representation is, nor how big it is.
Gulp! This is because you probably know the intended wording, but it's
not what is written there :)
Anyhow, it seems to me that moving the comment to a non-normative part
wouldn't solve another problem: objects of the same type can have
different sizes. Example:
class A {};
class B : public A { public: int i;};
A a;
B b;
The A sub-object in b can occupy 0 bytes, while the complete object a
cannot (1.8p5). Now how do you apply the text from 3.9p4 quoted above?
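For illustration only (typical results; the standard merely permits them):

    #include <iostream>

    class A {};
    class B : public A { public: int i; };

    int main()
    {
        std::cout << sizeof(A) << '\n';   // at least 1: a complete object of an
                                          // empty class never has size zero
        std::cout << sizeof(B) << '\n';   // often just sizeof(int): the empty A
                                          // sub-object may occupy zero bytes
        return 0;
    }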
Moreover you cannot say that sizeof "yields the number of bytes in the
object representation of its operand", since AFAIK what is defined by
the standard is the object representation of an object, not that of a
(parenthesis enclosed name of a) type.
>There's no circularity in the actual meaning. The size of a byte is
>implementation-defined.
You meant "the size of an object", I suppose.
> The representation of an object takes up an
>implementation-defined amount of memory space, which must be a positive
>integral number of bytes. sizeof(object) reports how many bytes that is.
Genny.
I'm afraid that I don't see that. I'm not a fan of the "mind-reading"
school for interpreting the standard. I can see how this wording is
misleading, but not how it's incorrect. 3.9p4 says that N==sizeof(T);
that's perfectly true. It doesn't say that sizeof(T) determines what the
value of N is. It doesn't actually say what it is that determines the
value of N, it just describes some facts involving N. The standard does
not determine what the value of N is; that's up to the implementation.
> Anyhow, it seems to me that moving the comment in a non-normative part
> wouldn't solve another problem: objects of the same type can have
> different sizes. Example:
>
> class A {};
> class B : public A { public: int i;};
>
> A a;
> B b;
>
> The A sub-object in b can occupy 0 bytes, while the complete object a
> cannot (1.8p5). Now how do you apply the text from 3.9p4 quoted above?
Good point; I don't know. I'd recommend filing a DR on that issue.
> Moreover you cannot say that sizeof "yields the number of bytes in the
> object representation of its operand", since AFAIK what is defined by
> the standard is the object representation of an object, not that of a
> (parenthesis enclosed name of a) type.
>
> >There's no circularity in the actual meaning. The size of a byte is
> >implementation-defined.
>
> You meant "the size of an object", I suppose.
No, I meant "the size of a byte". See 1.7p1: "A byte is ... composed of
... bits, the number of which is implementation-defined."
|> James Kanze <ka...@gabi-soft.de> writes:
|> | Gabriel Dos Reis <dos...@cmla.ens-cachan.fr> writes:
|> | |> Huh?!? The C++ standard requires that all bits in a char
|> | |> participate in a char value representation. And EOF is not a
|> | |> character.
|> | However, as far as I can see, it doesn't place any constraints
|> | with regards as to what a character can be (except that the
|> | characters in the basic character set must have positive values,
|> | even if char is signed).
|> Surely, the standard does define what "character" means.
No. We all know what "character" means. And that it has nothing to do
with char, wchar_t, etc. (A "character" is not a numerical value, for
example.)
|> Have a look at 17.1.2.
It is an interesting definition. In particular the "[...] any object
which, when treated sequentially, can represent text" part. I'm not to
sure what that is supposed to mean -- whether an object represents text
or not depends on how it is interpreted, and a char[] doesn't
necessarily represent text, whereas in specific contexts, a double[]
may. (APL basically uses the equivalent of float[] to represent text.
So if I write an APL interpreter in C++...)
About the only way to make sense of it is to suppose that the word "object"
was meant to be taken very literally -- the use of "object" instead of
"type" is intentional, and of course, a 32 bit wchar_t which contains
the value 0x5a5a5a5a is not a character, because there is no way in which,
taking it sequentially (whatever that is supposed to mean -- I suppose
it is an attempt to cover multi-byte characters), it can be taken to
represent text.
Given the wording, I wouldn't read too much into this definition. And I
think some sort of clarification is necessary. I have proposed an
implementation with 32 bit characters, where int_type and char_type are
identical. I think that the standard can be interpreted two ways, one
of which forbids this implementation, and another which allows it. I
would like to know which interpretation is correct.
From a practical point of view, the implementation seems more than
reasonable, and exceedingly useful. So I would like to see it allowed.
But there may be reasons of which I am unaware which argue for
forbidding it.
--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481
---
IMHO Merriam-Webster is a /terrible/ dictionary, so I'm now convinced that
/whatever/ "byte" means it cannot be that, or not only that.
FWIW the Shorter Oxford (which I would rate as a "fairly decent" dictionary)
defines "byte" as: "Computing. A group of binary digits (usu. eight) operated
on as a unit."
That seems to be pretty-much on the mark.
Cheers,
Daniel.