
Type sizes (Was:big-endian vs. little-endian...)


Clark S. Cox III

Aug 14, 2001, 11:28:47 AM
Greg Weston <gwesto...@CAPShome.com> wrote:
>
> Clark S. Cox III <clar...@yahoo.com> wrote:
> >
> > Jens Ayton <jAyton_...@nettaxi.com> wrote:
> > >
> > > sizeof(short) and sizeof(long) are also implementation dependent. What
> > > is guaranteed is:
> > >
> > > sizeof(char) == 1 // it's the unit
> > > sizeof(char) <= sizeof(short)
> > > sizeof(short) <= sizeof(int)
> > > sizeof(int) <= sizeof(long)
> > > sizeof(long) <= sizeof(long long) // (in C99)
> >
> > Also, char must be at least 8-bits, short must be at least 16 bits, and
> > long long must be at least 64-bits.
>
> My references disagree. They say char is a single byte (without
> requiring byte to be any specific size...you could have a valid C
> compiler on those weird old machines that had 5- or 7-bit bytes)

That was before C was standardized. Back then, there was no real way
to test if a compiler were valid or not.

> and the only comment on the other types is that given by Jens. If you can
> cite documentation for the size requirements you gave, I'd happily update
> my reference library.

The bit requirements aren't specifically stated; however, a char
must be able to hold +-127. In order to do that, the char must be at
least 8 bits. The same goes for the others. A short must be able to hold
at least +-32767, and a long long must be able to hold at least
+-(2**63-1).
I don't have my copy of the standard with me; however, follow-ups
are set to comp.lang.c, and someone there will surely either back me up or
correct me if I am wrong.

--
Clark S. Cox III
clar...@yahoo.com
http://www.whereismyhead.com/clark/

Micah Cowan

Aug 14, 2001, 1:37:01 PM
clar...@yahoo.com (Clark S. Cox III) writes:

> The bit requirements aren't specifically stated, however, a char
> must be able to hold +-127.
> In order to do that, the char must be at
> least 8-bits. The same goes for the others. A short must be able to hold
> at least +-32767, and a long long must be able to hold at least
> +-(2**63).

Actually, they are stated for char (CHAR_BIT).

> I don't have my copy of the standard with me, however, follow-ups
> are set to comp.lang.c, someone there will surely either back me up, or
> correct me if I am wrong.

Sure. Here's C89 2.2.4.2 (my draft copy):

The values given below shall be replaced by constant expressions
suitable for use in #if preprocessing directives. Their
implementation-defined values shall be equal or greater in magnitude
(absolute value) to those shown, with the same sign.

* maximum number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8

* minimum value for an object of type signed char
SCHAR_MIN -127

* maximum value for an object of type signed char
SCHAR_MAX +127

* maximum value for an object of type unsigned char
UCHAR_MAX 255

* minimum value for an object of type char
CHAR_MIN see below

* maximum value for an object of type char
CHAR_MAX see below

* maximum number of bytes in a multibyte character, for any supported locale
MB_LEN_MAX 1

* minimum value for an object of type short int
SHRT_MIN -32767

* maximum value for an object of type short int
SHRT_MAX +32767

(etc.)

--
"But please remember that the sum of all skills totals up to
some constant in the end. So whenever someone shows all signs of
being a exceptionally skilled C-programmer, chances are that his
social skills are below average ..." --willem veenhoven

Greg Weston

Aug 14, 2001, 9:37:56 PM
In article <1ey4n7b.18fi98dmqaq8iN%clar...@yahoo.com>, Clark S. Cox
III <clar...@yahoo.com> wrote:

> > > Also, char must be at least 8-bits, short must be at least 16 bits, and
> > > long long must be at least 64-bits.
> >
> > My references disagree. They say char is a single byte (without
> > requiring byte to be any specific size...you could have a valid C
> > compiler on those weird old machines that had 5- or 7-bit bytes)
>
> That was before C was standardized. Back then, there was no real way
> to test if a compiler were valid or not.

But you could still _have_ a C90 toolchain on such a machine.

> > and the only comment on the other types is that given by Jens. If you can
> > cite documentation for the size requirements you gave, I'd happily update
> > my reference library.
>
> The bit requirements aren't specifically stated, however, a char
> must be able to hold +-127. In order to do that, the char must be at
> least 8-bits. The same goes for the others. A short must be able to hold
> at least +-32767, and a long long must be able to hold at least
> +-(2**63).
> I don't have my copy of the standard with me, however, follow-ups
> are set to comp.lang.c, someone there will surely either back me up, or
> correct me if I am wrong.

I'd like to see either. I'm not familiar with the existence of the
requirements you've posed. More particularly, I can assert that "char"
does not have to be able to hold [-128,127]. Presuming you've got an
8-bit char, "signed char" would have that requirement, but unmodified
"char" could be either that range or [0,255]. There's an amusing little
clause that says printables must be positive, so if you're using an
environment for which isprint(128) returns non-zero, char should be
equivalent to unsigned char. I don't have a copy of the spec in this
room, but I do have K&R nearby. [flipflipflip]

Okay, here's what I've got from Section 2.2:
1. short and int must be at least 16 bits.
2. long must be at least 32 bits.
3. sizeof(short) <= sizeof(int) <= sizeof(long).
4. sizeof(char) == 1, but the number of bits is explicitly not required.

G

Clark S. Cox III

Aug 14, 2001, 9:49:24 PM
Greg Weston <gwesto...@CAPShome.com> wrote:

> In article <1ey4n7b.18fi98dmqaq8iN%clar...@yahoo.com>, Clark S. Cox
> III <clar...@yahoo.com> wrote:
>
> > > > Also, char must be at least 8-bits, short must be at least 16 bits, and
> > > > long long must be at least 64-bits.
> > >
> > > My references disagree. They say char is a single byte (without
> > > requiring byte to be any specific size...you could have a valid C
> > > compiler on those weird old machines that had 5- or 7-bit bytes)
> >
> > That was before C was standardized. Back then, there was no real way
> > to test if a compiler were valid or not.
>
> But you could still _have_ a C90 toolchain on such a machine.

Not with a 5- or 7-bit char.

>
> > > and the only comment on the other types is that given by Jens. If you can
> > > cite documentation for the size requirements you gave, I'd happily update
> > > my reference library.
> >
> > The bit requirements aren't specifically stated, however, a char
> > must be able to hold +-127. In order to do that, the char must be at
> > least 8-bits. The same goes for the others. A short must be able to hold
> > at least +-32767, and a long long must be able to hold at least
> > +-(2**63).
> > I don't have my copy of the standard with me, however, follow-ups
> > are set to comp.lang.c, someone there will surely either back me up, or
> > correct me if I am wrong.
>
> I'd like to see either. I'm not familiar with the existence of the
> requirements you've posed. More particularly, I can assert that "char"
> does not have to be able to hold [-128,127]. Presuming you've got an
> 8-bit char, "signed char" would have that requirement, but unmodified
> "char" could be either that range or [0,255].

OK, let me revise myself: signed char must be able to hold at least
+-127, and unsigned char must hold at least [0,255].

> There's an amusing little
> clause that says printables must be positive, so if you're using an
> environment for which isprint(128) returns non-zero, char should be
> equivalent to unsigned char. I don't have a copy of the spec in this
> room, but I do have K&R nearby. [flipflipflip]

As posted by Micah Cowan <mi...@cowanbox.com>:

C89 2.2.4.2 (draft copy):

The values given below shall be replaced by constant expressions
suitable for use in #if preprocessing directives. Their
implementation-defined values shall be equal or greater in magnitude
(absolute value) to those shown, with the same sign.

* maximum number of bits for smallest object that is not a bit-field
(byte)
CHAR_BIT 8

* minimum value for an object of type signed char
SCHAR_MIN -127

* maximum value for an object of type signed char
SCHAR_MAX +127

* maximum value for an object of type unsigned char
UCHAR_MAX 255

* minimum value for an object of type char
CHAR_MIN see below

* maximum value for an object of type char
CHAR_MAX see below

* maximum number of bytes in a multibyte character, for any supported
locale
MB_LEN_MAX 1

* minimum value for an object of type short int
SHRT_MIN -32767

* maximum value for an object of type short int
SHRT_MAX +32767


--

Clark S. Cox III
clar...@yahoo.com

http://www.whereismyhead.com/clark/

Kaz Kylheku

Aug 14, 2001, 10:22:49 PM
In article <1ey5gho.1kt9uk61xmklqgN%clar...@yahoo.com>, Clark S. Cox
III wrote:

>Greg Weston <gwesto...@CAPShome.com> wrote:
>> But you could still _have_ a C90 toolchain on such a machine.
>
> Not with a 5- or 7-bit char.

Sure you can. Pair them up and you have 10 bit bytes or 14 bit bytes.
There is no rule which says that the byte has to be the smallest addressable
unit of storage addressable by the hardware, only the smallest addressable
unit of a C object, which is an abstraction. By the way, bitfield
manipulation could be implemented using accesses to the smaller units.

--
Articulable, clever reduction of outstretched nonsense, yet meaningful.

Clark S. Cox III

Aug 14, 2001, 10:30:38 PM
Kaz Kylheku <k...@ashi.footprints.net> wrote:

> In article <1ey5gho.1kt9uk61xmklqgN%clar...@yahoo.com>, Clark S. Cox
> III wrote:
> >Greg Weston <gwesto...@CAPShome.com> wrote:
> >> But you could still _have_ a C90 toolchain on such a machine.
> >
> > Not with a 5- or 7-bit char.
>
> Sure you can. Pair them up and you have 10 bit bytes or 14 bit bytes.

That's what I'm saying. In order to have a standard-compliant
implementation on such a machine, you couldn't use the native
byte size as the size of the char. You would have to use a different
byte size.

> There is no rule which says that the byte has to be the smallest addressable
> unit of storage addressable by the hardware, only the smallest addressable
> unit of a C object, which is an abstraction. By the way, bitfield
> manipulation could be implemented using accesses to the smaller units.


--

Kaz Kylheku

Aug 14, 2001, 11:15:00 PM
In article <1ey5igs.1e9gn1ewfzamoN%clar...@yahoo.com>, Clark S. Cox III wrote:
>Kaz Kylheku <k...@ashi.footprints.net> wrote:
>
>> In article <1ey5gho.1kt9uk61xmklqgN%clar...@yahoo.com>, Clark S. Cox
>> III wrote:
>> >Greg Weston <gwesto...@CAPShome.com> wrote:
>> >> But you could still _have_ a C90 toolchain on such a machine.
>> >
>> > Not with a 5- or 7-bit char.
>>
>> Sure you can. Pair them up and you have 10 bit bytes or 14 bit bytes.
>
> That's what I'm saying. In order to have a standard-compliant
>implementation on such a machine, you couldn't use the native
>byte size as the size of the char. You would have to use a different
>byte size.

Ah, my reading comprehension module just returned from the shop.

Martin Ambuhl

Aug 15, 2001, 12:32:29 AM
"Clark S. Cox III" wrote:
>
> Greg Weston <gwesto...@CAPShome.com> wrote:
>
> > In article <1ey4n7b.18fi98dmqaq8iN%clar...@yahoo.com>, Clark S. Cox
> > III <clar...@yahoo.com> wrote:
> >
> > > > > Also, char must be at least 8-bits, short must be at least 16 bits, and
> > > > > long long must be at least 64-bits.
> > > >
> > > > My references disagree. They say char is a single byte (without
> > > > requiring byte to be any specific size...you could have a valid C
> > > > compiler on those weird old machines that had 5- or 7-bit bytes)
> > >
> > > That was before C was standardized. Back then, there was no real way
> > > to test if a compiler were valid or not.
> >
> > But you could still _have_ a C90 toolchain on such a machine.
>
> Not with a 5- or 7-bit char.

No one mentioned a 5- or 7-bit char. A machine in which the addressable
unit is 5 or 7 bits, perhaps called a "byte" in discussion of the
hardware, can still have a conforming C compiler, for which the C type
"char" (with a C alias of "byte") has at least 8 bits. The easiest way to
do this is to have 10- or 14-bit chars.

Clark S. Cox III

Aug 15, 2001, 12:57:01 AM
Martin Ambuhl <mam...@earthlink.net> wrote:

> "Clark S. Cox III" wrote:
> >
> > Greg Weston <gwesto...@CAPShome.com> wrote:
> >
> > > Clark S. Cox III <clar...@yahoo.com> wrote:
> > >
> > > > > > > Also, char must be at least 8-bits, short must be at least 16
> > > > > > bits, and long long must be at least 64-bits.
> > > > >
> > > > > My references disagree. They say char is a single byte (without
> > > > > requiring byte to be any specific size...you could have a valid C
> > > > > compiler on those weird old machines that had 5- or 7-bit bytes)
> > > >
> > > > That was before C was standardized. Back then, there was no real
> > > > way to test if a compiler were valid or not.
> > >
> > > But you could still _have_ a C90 toolchain on such a machine.
> >
> > Not with a 5- or 7-bit char.
>
> No one mentioned a 5- or 7-bit char.

Yes they did. Look at the above quotes, specifically the parts "char
is a single byte ... 5- or 7-bit bytes"

> A machine in which the addressable
> unit is 5- or 7- bits, perhaps called a "byte" in discussion of the
> hardware, can still have a conforming C compiler, for which the C type
> "char" with a C alias of "byte" had at least 8 bits. The easiest way to
> do this is to have 10- or 14-bit chars.

That's exactly what I was saying. You can have a conforming C
compiler, but, as I said, "Not with a 5- or 7-bit char."

--

Clark S. Cox III
clar...@yahoo.com

http://www.whereismyhead.com/clark/

Bill Godfrey

Aug 15, 2001, 5:46:48 AM
k...@ashi.footprints.net (Kaz Kylheku) writes:

> > Not with a 5- or 7-bit char.

> Sure you can. Pair them up and you have 10 bit bytes or 14 bit bytes.

Even better, have a standard compliant mode which behaves as you describe,
or have an alternative NQC mode (perhaps via a program or command line
option) which implements the lesser, but faster, char types.

Minimum integer sizes are one of those things that are necessary,
but annoying when inconvenient.

I'm surprised that ANSI didn't go with 4 bit minimum for char and 8
minimum for int, or perhaps I've been playing about with 6502 chips
for too long.

Oh yes, NQC="not quite C", the family of languages which are pretty close
to C, but fail in some significant way.

Bill, NQBill.
--
"We quietly obscure around robust cosmetic cheleons."

Richard Bos

Aug 15, 2001, 5:46:24 AM
Greg Weston <gwesto...@CAPShome.com> wrote:

> There's an amusing little
> clause that says printables must be positive, so if you're using an
> environment for which isprint(128) returns non-zero, char should be
> equivalent to unsigned char.

Or CHAR_BIT should be greater than 8.

Richard

Hallvard B Furuseth

Aug 15, 2001, 10:47:29 AM
to nobody
Greg Weston <gwesto...@CAPShome.com> writes:

> I can assert that "char" does not have to be able to hold [-128,127].
> Presuming you've got an 8-bit char, "signed char" would have that

> requirement, (...)

Yup. For that matter, you don't need to presume 8-bit char. Rather,
it's because of that requirement that char must be at least 8-bit.

> There's an amusing little clause that says printables must be
> positive,

No, I don't think so.

There may be a clause that printables _in C's required character set_
(i.e. most characters in ASCII) must be positive; I don't remember.

Or if you refer to <ctype.h>, what it says is that the argument to ctype
functions shall be an `int' in the range of `unsigned char' or EOF --
i.e. just like the return value from getchar(). Otherwise the behaviour
is undefined. Which means that the behaviour of this program is
undefined if it receives a `char' with the 8th bit = 1 on hosts where
`char' is signed and 8-bit:

    #include <ctype.h>
    int main(int argc, char **argv)
    {
        /* undefined behaviour if *argv[1] is negative and not EOF */
        return (argc > 1 && isprint(*argv[1])) ? 0 : 1;
    }

The correct way to write it is `isprint((unsigned char) *argv[1])'.

The reason for that particular insanity is that ctype.h and signed
`char' types are older than 8-bit character sets, so it didn't matter
very much at first, and they introduced EOF as a possible argument.
When the time came to handle 8-bit characters properly, it was too late
to back out the requirement that isprint(EOF) works, and it was too late
to require that no `char' == EOF (i.e. that either `char' is unsigned or
EOF < SCHAR_MIN). So they had to require character arguments to be in
the range of `unsigned char' in order to support the whole character
set, even on hosts where `char' is signed.

--
Hallvard

Micah Cowan

Aug 15, 2001, 3:01:23 PM
Greg Weston <gwesto...@CAPShome.com> writes:

> In article <1ey4n7b.18fi98dmqaq8iN%clar...@yahoo.com>, Clark S. Cox
> III <clar...@yahoo.com> wrote:
>
> > > > Also, char must be at least 8-bits, short must be at least 16 bits, and
> > > > long long must be at least 64-bits.
> > >
> > > My references disagree. They say char is a single byte (without
> > > requiring byte to be any specific size...you could have a valid C
> > > compiler on those weird old machines that had 5- or 7-bit bytes)
> >
> > That was before C was standardized. Back then, there was no real way
> > to test if a compiler were valid or not.
>
> But you could still _have_ a C90 toolchain on such a machine.

No, you could not.

>
> > > and the only comment on the other types is that given by Jens. If you can
> > > cite documentation for the size requirements you gave, I'd happily update
> > > my reference library.
> >
> > The bit requirements aren't specifically stated, however, a char
> > must be able to hold +-127. In order to do that, the char must be at
> > least 8-bits. The same goes for the others. A short must be able to hold
> > at least +-32767, and a long long must be able to hold at least
> > +-(2**63).
> > I don't have my copy of the standard with me, however, follow-ups
> > are set to comp.lang.c, someone there will surely either back me up, or
> > correct me if I am wrong.
>
> I'd like to see either. I'm not familiar with the existence of the
> requirements you've posed. More particularly, I can assert that "char"
> does not have to be able to hold [-128,127]. Presuming you've got an
> 8-bit char, "signed char" would have that requirement, but unmodified
> "char" could be either that range or [0,255]. There's an amusing little
> clause that says printables must be positive, so if you're using an
> environment for which isprint(128) returns non-zero, char should be
> equivalent to unsigned char. I don't have a copy of the spec in this
> room, but I do have K&R nearby. [flipflipflip]

Did you not see my post of a couple days ago? Where I specifically
quoted the Standard?

> Okay, here's what I've got from Section 2.2:
> 1. short and int must be at least 16 bits.
> 2. long must be at least 32 bits.
> 3. sizeof(short) <= sizeof(int) <= sizeof(long).
> 4. sizeof(char) == 1, but the number of bits is explicitly not required.

Well, K&R doesn't define C, and in fact I don't have a copy; but I
doubt this is the case - at least for K&R2.

The Standard (as I quoted earlier, so will not quote again)
specifically requires that SCHAR_MAX, SCHAR_MIN, UCHAR_MAX, and
CHAR_BIT must be at least 127, at most -127, at least 255, and at
least 8, respectively.

Micah

--
"Everytime you declare main() as returning void - somewhere a little
baby cries. So please, do it for the children." -- Daniel Fox

Micah Cowan

Aug 15, 2001, 3:07:55 PM
Hallvard B Furuseth <h.b.fu...@usit.uio.no> writes:

> Greg Weston <gwesto...@CAPShome.com> writes:

> > There's an amusing little clause that says printables must be
> > positive,
>
> No, I don't think so.
>
> There may be a clause that printables _in C's required character set_
> (i.e. most characters in ASCII) must be positive; I don't remember.

Right. The requirement is that members of the "basic execution
character set" must be representable by a positive value in char.
[C99 6.2.5#3].

Micah

--
Computers are basically very fast idiots.

Richard Bos

Aug 16, 2001, 3:38:31 AM
Micah Cowan <mi...@cowanbox.com> wrote:

> Greg Weston <gwesto...@CAPShome.com> writes:
>
> > In article <1ey4n7b.18fi98dmqaq8iN%clar...@yahoo.com>, Clark S. Cox
> > III <clar...@yahoo.com> wrote:
> >
> > > > My references disagree. They say char is a single byte (without
> > > > requiring byte to be any specific size...you could have a valid C
> > > > compiler on those weird old machines that had 5- or 7-bit bytes)
> > >
> > > That was before C was standardized. Back then, there was no real way
> > > to test if a compiler were valid or not.
> >
> > But you could still _have_ a C90 toolchain on such a machine.
>
> No, you could not.

Yes, you could. You just couldn't use the hardware bytes as C's bytes.

Richard

Clark S. Cox III

Aug 29, 2001, 5:15:14 PM
John G. Otto <jo...@nisus.com> wrote:

> > Martin Ambuhl wrote:
> >> "Clark S. Cox III" wrote:

> >>> Greg Weston wrote:
> >>>> Clark S. Cox III wrote:

> >>>>> ????? wrote:


> >>>>>> ?????? wrote:
> >>>>>> Also, char must be at least 8-bits, short must be at least 16 bits, and
> >>>>>> long long must be at least 64-bits.
>

> I thought the rule was only that long had to be at least as long as
> short, long long had to be at least as long as long.

No:
- char must be able to hold +-(2**7-1) (i.e. must >= 16-bit)
- short must be able to hold +-(2**15-1) (i.e. must >= 16-bit)
- int must be at least as large as short
- long must be able to hold +-(2**31-1) ( i.e. must >= 32-bit)
and be at least as large as int
- long long must be able to hold +-(2**63-1) ( i.e. must >= 64-bit)
and be at least as large as long

However, it is entirely possible that they could all be the same
size (>=64-bits).


--

Clark S. Cox III
clar...@yahoo.com

http://www.whereismyhead.com/clark/

Keith Thompson

Aug 29, 2001, 6:47:34 PM
jo...@nisus.com (John G. Otto) writes:
[...]
> 8 bits is now referred to as an "octet" in the standards docs.

Yes.

> bytes, OTOH, could be 6 or 8 or 12 or 46... bits. I used to
> work on systems that used both 6-bit and 12-bit bytes and a 48
> bit int (int, short, or long). And here in _The C Programmer's
> HandBook_ I see that the Honeybucket 6000 used a 9 bit char and
> a 36 bit short.

In modern sloppy usage, a byte is 8 bits. In modern correct usage, a
byte can be nearly any size, though 8 bits is currently the most
common. In C (note that this is posted to comp.lang.c), a byte *must*
be at least 8 bits. (The number of bits in a byte can be determined
by looking at the value of the macro CHAR_BIT.)

--
Keith Thompson (The_Other_Keith) k...@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Cxiuj via bazo apartenas ni.

Kelsey Bjarnason

Aug 30, 2001, 12:39:15 AM
[snips]

"Clark S. Cox III" <clar...@yahoo.com> wrote in message
news:1eywvk2.jxzxbbnjedt8N%clar...@yahoo.com...

> No:
> - char must be able to hold +-(2**7-1) (i.e. must >= 16-bit)

i.e. must be >= *8* bit.


Clark S. Cox III

Aug 30, 2001, 2:35:26 AM
Kelsey Bjarnason <kel...@xx.spamkill.yy.telus.net> wrote:

Ack, copy and paste error :-)

--

Clark S. Cox III
clar...@yahoo.com

http://www.whereismyhead.com/clark/
