
Why didn't C++ standardize the size of primitives?

Richard Powell
Apr 7, 2007, 1:22:23 PM
It seems to me that a large problem with porting code across multiple
systems is having to worry about what sizes the primitives are.
Almost all code I've seen has some variant on mytypes.h which defines
types for the rest of the system (something like BYTE, WORD, DWORD,
etc).

I'm curious why, when C++ was being standardized, the committee did not
standardize the size of primitives. I can understand the desire to
stay in stride with C, so char, int, and long all follow the
same rules as C, but why not a new set of primitives, char8, int16,
long32, whose size and limits are guaranteed across platforms?

(Unless maybe there is some #include <stdtypes> that I don't know
about...)


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Daniel K. O.
Apr 7, 2007, 6:19:59 PM
Richard Powell wrote:

> (Unless maybe there is some #include <stdtypes> that I don't know
> about...)

See <stdint.h> from C99.

---
Daniel K. O.

Carl Barron
Apr 7, 2007, 6:20:31 PM
Richard Powell <rmpow...@gmail.com> wrote:

> It seems to me that a large problem with porting code across multiple
> systems is having to worry about what sizes the primitives are.
> Almost all code I've seen has some variant on mytypes.h which defines
> types for the rest of the system (something like BYTE, WORD, DWORD,
> etc).
>
> I'm curious why, when C++ was being standardized, the committee did not
> standardize the size of primitives. I can understand the desire to
> stay in stride with C, so char, int, and long all follow the
> same rules as C, but why not a new set of primitives, char8, int16,
> long32, whose size and limits are guaranteed across platforms?
>
> (Unless maybe there is some #include <stdtypes> that I don't know
> about...)

C++98 does not provide any.
TR1 and the current draft provide <cstdint>, which is essentially
C99's <stdint.h>. It provides standard typedefs for ints of exact
common sizes and for ints that are at least a given size, and the same
for unsigned ints. See the C99 standard for details (possibly
n2134.pdf, but I can't check on this machine).
Examples:
  int16_t         a 16-bit integer
  int_least16_t   the smallest integral type holding at least 16 bits.
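
A small sketch of the distinction, assuming a C99 <stdint.h> (or TR1's
<cstdint>) is available; remember that the exact-width typedef is optional,
while the least-width one is always present:

#include <stdint.h>

int main()
{
    int16_t        a = 0;  /* exactly 16 bits; only provided if the hardware has such a type */
    int_least16_t  b = 0;  /* smallest type with at least 16 bits; always provided */
    uint_least16_t c = 0;  /* the unsigned counterpart */
    (void)a; (void)b; (void)c;
    return 0;
}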

Jack Klein
Apr 7, 2007, 6:21:23 PM
On Sat, 7 Apr 2007 11:22:23 CST, "Richard Powell"
<rmpow...@gmail.com> wrote in comp.lang.c++.moderated:

> It seems to me that a large problem with porting code across multiple
> systems is having to worry about what sizes the primitives are.
> Almost all code I've seen has some variant on mytypes.h which defines
> types for the rest of the system (something like BYTE, WORD, DWORD,
> etc).
>
> I'm curious why, when C++ was being standardized, the committee did not
> standardize the size of primitives. I can understand the desire to
> stay in stride with C, so char, int, and long all follow the
> same rules as C, but why not a new set of primitives, char8, int16,
> long32, whose size and limits are guaranteed across platforms?
>
> (Unless maybe there is some #include <stdtypes> that I don't know
> about...)

Compatibility with C was one of the important reasons.

Performance was another, and perhaps even more important reason, as it
is in C.

C++ adds complexity and features to its inherited subset of C, but
still remains true to the philosophy of C: you don't pay for what you
don't use.

Mandating 32-bit ints, for example, as Java has, would have a serious
performance impact on 16-bit platforms, especially now that C++ is
beginning to make inroads into such areas in embedded systems.

And what would you have the standard require for "char8" on platforms
where bytes have more than 8 bits? There are such platforms, you
know, most of them Digital Signal Processors, and there are C++
compilers for at least some of them.

The 1999 major update of the C standard, commonly referred to as C99,
did do something about this, namely the file <stdint.h>, which will
become a part of the next revision of the C++ standard. The first two
hits when you google stdint.h are opengroup.org and Wikipedia, with
good explanations.

Note, however, that the exact width integer types (intx_t and uintx_t)
for 8, 16, 32, and 64 bits are optional in the sense that an
implementation is only required to provide them if it actually has the
corresponding underlying hardware types.

I wrote a packet formatting/parsing routine for a communications
interface a few years ago, using uint_least8_t and a little care. The
same (C) code compiled and executed properly on both sides of the
interface, the 32-bit host end where characters have 8 bits, and the
DSP slave end where characters have 16 bits.
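
Something along these lines, roughly sketched (the function names and the
two-octet field are invented for illustration): mask everything down to
eight value bits, so the code behaves the same whether uint_least8_t really
has 8 bits or, as on that DSP, 16.

#include <stdint.h>

/* Store a 16-bit field as two octets, one per array element, masking to
   8 bits each time so any extra width in uint_least8_t is ignored. */
void pack_u16(uint_least8_t *out, uint_least16_t value)
{
    out[0] = (uint_least8_t)((value >> 8) & 0xFFu);  /* high octet */
    out[1] = (uint_least8_t)(value & 0xFFu);         /* low octet  */
}

uint_least16_t unpack_u16(const uint_least8_t *in)
{
    return (uint_least16_t)((((unsigned)in[0] & 0xFFu) << 8) |
                            ((unsigned)in[1] & 0xFFu));
}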

It is not too hard to write a usable <stdint.h> header for any
conforming C or C++ compiler, at least for the subset of types the
implementation supports. There are quite a few compilers that do not
support the C99 long long int type, which has at least 64 bits.
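
For instance, a hand-rolled subset might look like the sketch below; the
mappings shown assume a platform with 8-bit chars, 16-bit shorts, and
32-bit longs, and would have to be adjusted for each compiler.

/* my_stdint.h -- minimal stand-in for <stdint.h>, illustrative only */
#ifndef MY_STDINT_H
#define MY_STDINT_H

typedef signed char     int8_t;
typedef unsigned char   uint8_t;
typedef short           int16_t;
typedef unsigned short  uint16_t;
typedef long            int32_t;
typedef unsigned long   uint32_t;

typedef signed char     int_least8_t;
typedef short           int_least16_t;
typedef long            int_least32_t;

#endif /* MY_STDINT_H */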

Also, from the Google links, it is quite possible you will find a
usable <stdint.h> already written by somebody for your particular
compiler.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html

Pete Becker
Apr 7, 2007, 6:24:43 PM
Richard Powell wrote:
>
> (Unless maybe there is some #include <stdtypes> that I don't know
> about...)
>

There's <cstdint>, available if you've got an implementation of TR1, and
soon to be available as part of C++0x. There's also a chance that your
compiler has <stdint.h>, which came into C with C99.


--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)

Ron Natalie
Apr 8, 2007, 2:46:59 PM
Richard Powell wrote:
> It seems to me that a large problem with porting code across multiple
> systems is having to worry about what sizes the primitives are.
> Almost all code I've seen has some variant on mytypes.h which defines
> types for the rest of the system (something like BYTE, WORD, DWORD,
> etc).
Which demonstrates its complete and utter stupidity. Microsnot's
WORD type is only a WORD on a processor two generations earlier.
DWORD is really the word size on the next generation. Then
on the 64-bit processors we get abominations like DWORD_PTR, which
is neither a DWORD nor a PTR of any sort.

C++ has a byte type; it's called char, although C++ defines
bytes a little differently than many other people do: a byte
is the minimum unit of addressable storage. Unfortunately,
C++ is defective in that it assumes that's also the size of a character
in the native character set.
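
For reference, a small sketch of what the standard does pin down here:
sizeof counts in bytes, sizeof(char) is 1 by definition, and CHAR_BIT from
<climits> tells you how many bits that byte holds (at least 8).

#include <climits>
#include <iostream>

int main()
{
    // char is the C++ byte, so this always prints 1.
    std::cout << "sizeof(char) = " << sizeof(char) << '\n';
    // Number of bits in that byte: at least 8, but 16 or 32 on some DSPs.
    std::cout << "CHAR_BIT     = " << CHAR_BIT << '\n';
    return 0;
}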

Jeff Koftinoff
Apr 8, 2007, 5:32:38 PM
On Apr 8, 11:46 am, Ron Natalie <r...@spamcop.net> wrote:
>
> C++ has a byte type; it's called char, although C++ defines
> bytes a little differently than many other people do: a byte
> is the minimum unit of addressable storage. Unfortunately,
> C++ is defective in that it assumes that's also the size of a character
> in the native character set.
>

But the char type is not necessarily 8 bits. It is very enlightening
to port a bitwise compression algorithm from a platform using 8 bit
chars to a platform that uses 32 bit chars!
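
A rough sketch of the kind of assumption that bites (helper names invented
for illustration): indexing bits with a hard-coded 8 versus with CHAR_BIT
from <climits>. Even the CHAR_BIT version only fixes in-memory indexing;
the on-the-wire layout of the compressed stream is a separate question.

#include <climits>

// Quietly assumes 8 bits per char: on a 32-bit-char platform only the
// low 8 bits of each element ever get used.
unsigned get_bit_assuming_8(const unsigned char *buf, unsigned long bit)
{
    return (buf[bit / 8] >> (bit % 8)) & 1u;
}

// Uses the real width of a char, so every stored bit is reachable.
unsigned get_bit(const unsigned char *buf, unsigned long bit)
{
    return (buf[bit / CHAR_BIT] >> (bit % CHAR_BIT)) & 1u;
}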

--jeffk++

Ron Natalie
Apr 9, 2007, 9:39:19 AM
Jeff Koftinoff wrote:
> On Apr 8, 11:46 am, Ron Natalie <r...@spamcop.net> wrote:
>> C++ has a byte type; it's called char, although C++ defines
>> bytes a little differently than many other people do: a byte
>> is the minimum unit of addressable storage. Unfortunately,
>> C++ is defective in that it assumes that's also the size of a character
>> in the native character set.
>>
>
> But the char type is not necessarily 8 bits. It is very enlightening
> to port a bitwise compression algorithm from a platform using 8 bit
> chars to a platform that uses 32 bit chars!
>
I never said char is necessarily 8 bits. I said it is expected
to be the minimum addressable storage AND the native character set
size. If your native character set size is 16 bits (or even 32)
but your minimum addressable storage is 8, you've got an ungodly
mess in C++. You can't even use wchar_t effectively, because
a whole slew of functions have no wide overloads.

Thomas Richter
Apr 9, 2007, 4:57:07 PM
Ron Natalie wrote:
> Richard Powell wrote:
>> It seems to me that a large problem with porting code across multiple
>> systems is having to worry about what sizes the primitives are.
>> Almost all code I've seen has some variant on mytypes.h which defines
>> types for the rest of the system (something like BYTE, WORD, DWORD,
>> etc).

> Which demonstrates its complete and utter stupidity. Microsnot's
> WORD type is only a WORD on a processor two generations earlier.
> DWORD is really the word size on the next generation. Then
> on the 64-bit processors we get abominations like DWORD_PTR, which
> is neither a DWORD nor a PTR of any sort.

What do you want to say by that? I don't understand it. A "WORD" is
whatever people like to define as a WORD, and if that is a 16 bit signed
integer, where is the problem? It may no longer be the size of the CPU
data bus, or the native datatype for bus transmissions (that would be
something like 128 or 256 bits nowadays), or the size of the registers of
the CPU (which would be 32 or 64 bits), but the definition of 16 bits =
WORD just remained.

> C++ has a byte type; it's called char.

Huh? What is a "byte" for you? In my understanding, a byte is an octet
of bits by convention, but char need not be 8 bits wide.


> Although C++ defines
> bytes a little differently than many other people do: a byte
> is the minimum unit of addressable storage.

Does C++ really define what a "byte" is? Not that I'm aware of; at least,
it defines what a "char" is.

> Unfortunately
> C++ is defective in that it assumes that's also the size of a character
> in the native character set.

Is this the case? I think it just defines what a "C string" is, namely
an array of char. I don't think anything's defined whether there is any
kind of "native" character set. Probably that's then "wide char", and
printf(), and outstream have to convert to this external native
character set. An array of char is one possible representation for a
string, an array of wide characters (another native type) is another.

So long,
Thomas

Ron Natalie
Apr 10, 2007, 9:53:51 AM
Thomas Richter wrote:

>
> What do you want to say by that? I don't understand it. A "WORD" is
> whatever people like to define as a WORD, and if that is a 16 bit signed
> integer, where is the problem?

Well except for the Microsnot/Intel view of the world, a WORD is the
default operand size of most of the processor instructions. On the
32-bit processors, a word is 32-bits.

>> C++ has a byte type; it's called char.
>
> Huh? What is a "byte" for you? In my understanding, a byte is an octet
> of bits by convention, but char need not be 8 bits wide.
>

In the context of this group, a byte is how the C++ standard defines
it. It is defined to be the minimum addressable storage. A byte
and a char are synonymous in C++. A byte is not necessarily an
OCTET (either in C++ or in any process or practice).


> Does C++ really define what a "byte" is? Not that I'm aware of; at least,
> it defines what a "char" is.
>

Yes it does, and it defines it just how I've explained it.

>> Unfortunately
>> C++ is defective in that it assumes that's also the size of a character
>> in the native character set.
>
> Is this the case?

Yes, it is the case. Why don't you read some C++ reference or the
standard rather than just coming here and saying "You're wrong" to
every single point I've made?

> I think it just defines what a "C string" is, namely
> an array of char.

It doesn't define "C string" at all. It defines null-terminated
char arrays (and wchar_t arrays) as well as the entire basic_string
class and its variants. However, all that is immaterial to what
I am talking about.

> I don't think anything's defined whether there is any
> kind of "native" character set.

You need to read the standard. There are assumed to be native character
sets for both the source code itself and for the representation of chars
at runtime. How on earth could any of the string handling or I/O
functions be written otherwise?

> Probably that's then "wide char", and
> printf(), and outstream have to convert to this external native
> character set. An array of char is one possible representation for a
> string, an array of wide characters (another native type) is another.
>

I have no clue what the above gibberish is trying to say. Printf has
nothing to do with wide chars (it deals specifically with narrow chars).

Let's go over this again:

A char is a byte type in C++, which is the minimum addressable storage
unit (let's say, for example, your narrow view of the world is true and
it is an 8-bit unit).

A char is also defined by the standard to be the basic execution
character set character. Let's say we're using a 16-bit type.

Now we have to make a decision. We have to break one of the above
rules. It's pretty hard to get around the first rule, so we punt
and violate #2. That's almost OK, because C++ defines a type
wchar_t that is an "alternate wider char" implementation. So we
use that and all the related things that use wide chars.

The problem is that things like program arguments, file names,
and a few other things don't have any wchar_t-based interfaces
for them. Therefore, we now have to invent an arbitrary
conversion that maps 16-bit constant-size characters to 8-bit
multibyte characters (it's not sufficient to map one 16-bit
thing into two 8-bit things, because we must ensure that there
are no 8-bit zero bytes that would be inappropriately treated
as terminators for strings when in fact the other half of the
16-bit character is not null).

The latter kludge is what many systems do.
It still sucks. C++ should have really divorced the concept
of CHARACTER and BYTE a long time ago, but it's probably too
late now. At least they could clean up the kludge and
define wchar_t versions of all the interfaces to make
things a bit more normal.
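
A sketch of the narrow/wide squeeze described above, assuming the usual
<cstdlib> wide-to-multibyte conversion and the narrow-only fopen(); the
buffer size and function name are arbitrary.

#include <cstdio>
#include <cstdlib>

// A wide file name must pass through the locale-dependent multibyte
// encoding before fopen() can see it, since there is no standard
// wide-character interface for opening a file by name.
std::FILE *open_wide(const wchar_t *wname)
{
    char narrow[1024];                              // arbitrary size for the sketch
    std::size_t n = std::wcstombs(narrow, wname, sizeof narrow);
    if (n == (std::size_t)-1)   // some wide character had no multibyte mapping
        return 0;
    if (n == sizeof narrow)     // no room left for the terminating '\0'
        return 0;
    return std::fopen(narrow, "rb");
}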

Martin Bonner
Apr 10, 2007, 1:16:41 PM
On Apr 10, 2:53 pm, Ron Natalie <r...@spamcop.net> wrote:
> Thomas Richter wrote:
>
> > What do you want to say by that? I don't understand it. A "WORD" is
> > whatever people like to define as a WORD, and if that is a 16 bit signed
> > integer, where is the problem?
>
> Well except for the Microsnot/Intel view of the world, a WORD is the
> default operand size of most of the processor instructions. On the
> 32-bit processors, a word is 32-bits.

You had better add DEC to the list of the misguided. The VAX
instruction set used "word" to mean 16 bits when the word size was 32
bits. I would argue that "word" now has two meanings in computing -
16 bits, or "the natural word size".


>
> >> C++ has a byte type; it's called char.
>
> > Huh? What is a "byte" for you? In my understanding, a byte is an octet
> > of bits by convention, but char need not be 8 bits wide.
>
> In the context of this group, a byte is how the C++ standard defines
> it. It is defined to be the minimum addressable storage. A byte
> and a char are synonymous in C++. A byte is not necessarily an
> OCTET (either in C++ or in any process or practice).
>
> > Does C++ really define what a "byte" is? Not that I'm aware of; at least,
> > it defines what a "char" is.
>
> Yes it does, and it defines it just how I've explained it.

Specifically (if I recall correctly), the standard refers to sizeof
returning the size in "bytes", and then goes on to say that these are
actually chars.


>
> >> Unfortunately
> >> C++ is defective in that it assumes that's also the size of a character
> >> in the native character set.
>
> > Is this the case?
>
> Yes, it is the case. Why don't you read some C++ reference or the
> standard rather than just coming here and saying "You're wrong" to
> every single point I've made?

I think you might have pointed out to Thomas that "native character
set" is a term from the standard.


>
> > I think it just defines what a "C string" is, namely
> > an array of char.
>
> It doesn't define "C string" at all. It defines null-terminated
> char arrays (and wchar_t arrays) as well as the entire basic_string
> class and its variants. However, all that is immaterial to what
> I am talking about.
>
> > I don't think anything's defined whether there is any
> > kind of "native" character set.

It does. It defines what it means by "native character set".


>
> You need to read the standard. There are assumed to be native character
> sets for both the source code itself and for the representation of chars
> at runtime. How on earth could any of the string handling or I/O
> functions be written otherwise?
>
> > Probably that's then "wide char", and
> > printf(), and outstream have to convert to this external native
> > character set. An array of char is one possible representation for a
> > string, an array of wide characters (another native type) is another.
>
> I have no clue what the above gibberish is trying to say. Printf has
> nothing to do with wide chars (it deals specifically with narrow chars).
>
> Let's go over this again:

[snip a discussion of the meaning of "char" in C++.]
Note that exactly the same discussion applies to C.


> The latter kludge is what many systems do.
> It still sucks. C++ should have really divorced the concept
> of CHARACTER and BYTE a long time ago, but it's probably too
> late now.

I disagree with the last sentence. It's *C* that should have divorced
"byte" and "char". I believe that Douglas Gwyn made such a proposal,
for the original C standard, but it was opposed. (Presumably because
it would have involved adding a new keyword, and they hadn't worked
out the "make ugly name a keyword, and then define a header to provide
the nice name" approach at that point. I think once C decided to have
sizeof(char) = 1, C++ was stuck with it forever (and I don't think it
can be fixed now).


> At least they could clean up the kludge and
> define wchar_t versions of all the interfaces to make
> things a bit more normal.

That would help

Ron Natalie
Apr 10, 2007, 6:10:33 PM
Martin Bonner wrote:

> You had better add DEC to the list of the misguided. The VAX
> instruction set used "word" to mean 16 bits when the word size was 32
> bits. I would argue that "word" now has two meanings in computing -
> 16 bits, or "the natural word size".

For the same stupid reason as Intel's. The VAX was proposed as an
"extension" to the 16-bit architecture (well, it really was a new
processor that had a 16-bit emulation). They didn't want to redefine
the term WORD as they should have, so they invented the "long word".

> I disagree with the last sentence. It's *C* that should have divorced
> "byte" and "char". I believe that Douglas Gwyn made such a proposal,
> for the original C standard, but it was opposed.

Doug Gwyn is a sharp cookie :-)

Yes, it should have been fixed in C, along with making arrays full-fledged
types (back in the day when they fixed structs to work
properly... 1979 or so).

>> At least they could clean up the kludge and
>> define wchar_t versions of all the interfaces to make
>> things a bit more normal.
>
> That would help
>

Unfortunately, I was never able to get any traction in the standards
committee for such a change. They assume that there exists one and
only one MBTOWCS conversion (and vice versa).

Jerry Coffin
Apr 10, 2007, 10:34:00 PM
In article <1175961874.7...@e65g2000hsc.googlegroups.com>,
rmpow...@gmail.com says...

> It seems to me that a large problem with porting code across multiple
> systems is having to worry about what sizes the primitives are.

IME, that's not really true much of the time.

> Almost all code I've seen has some variant on mytypes.h which defines
> types for the rest of the system (something like BYTE, WORD, DWORD,
> etc).

While this is frequently _used_, it's rarely _needed_, at least IME.

> I'm curious why, when C++ was being standardized, the committee did not
> standardize the size of primitives.

Because doing so would run directly contrary to some of the basic ideas
of C++ (and C). In particular, C++ is intended to support system-level
programming.

For example, assume somebody (me, perhaps) designs a 48-bit processor
that sets arbitrary bits in a 48-bit word for the MMU, or something on
that order.

A language that decrees that there can be no such thing as a 48-bit data
type makes that sort of thing rather more difficult at best.

> I can understand the desire to
> stay in stride with C, so char, int, and long all follow the
> same rules as C, but why not a new set of primitives, char8, int16,
> long32, whose size and limits are guaranteed across platforms?

On the (relatively rare) occasion that these are really needed, the
technique currently in use is pretty easy to deal with as a rule. The
C99 standard has also added a header much like you describe, and it's
been adopted into C++ TR1, so there's a high likelihood that it'll also
be present in the next C++ standard.

IMO, this is really a mistake. The C committee went to almost
heroic lengths to avoid it (not just adding types for exact sizes, but
also things like the fastest type of at least a given size), but I think
more often than not the result will be code that becomes unnecessarily
dependent on specific sizes rather than transparently adapting to the best
sizes for the target machine. I'll openly admit there are a _few_ times you
really need to deal with exact sizes -- but I'd say at least 90% of the code
I've seen that includes typedefs like you've mentioned did NOT need
anything of the sort, and would have been far better off without them.

--
Later,
Jerry.

The universe is a figment of its own imagination.

Pete Becker
Apr 11, 2007, 10:24:32 AM
Jerry Coffin wrote:
>
> On the (relatively rare) occasion that these are really needed, the
> technique currently in use is pretty easy to deal with as a rule. The
> C99 standard has also added a header much like you describe, and it's
> been adopted into C++ TR1, so there's a high likelihood that it'll also
> be present in the next C++ standard.
>

Higher, actually. <g> It's also been added to the working draft for C++0x.

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
