short int always 16 bits or not?

Shriramana Sharma

Apr 19, 2013, 8:14:26 PM

Hello. I am reading the C99 standard as available from: http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf

I note that it specifies (on p 34) macros defining the minimum and maximum values of a short int corresponding to a size of 16 bits. However it doesn't explicitly say that short int-s should be of 16 bits size. So can I trust short int-s to be 16 bits size or not?

Also, doesn't prescribing #define-s for integer type min/max values conflict with the general (?) understanding that the size of these types are implementation defined? I mean, is the general understanding wrong? (For instance see: http://en.wikipedia.org/wiki/Short_integer#cnote_b_grp_notesc)

Finally, why would anyone want char to be other than 8 bits? *Is* char on any platform *not* 8 bits?

Thanks.

Barry Schwarz

Apr 19, 2013, 8:40:33 PM

On Fri, 19 Apr 2013 17:14:26 -0700 (PDT), Shriramana Sharma
<sam...@gmail.com> wrote:

>Hello. I am reading the C99 standard as available from: http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf
>
>I note that it specifies (on p 34) macros defining the minimum and maximum values of a short int corresponding to a size of 16 bits. However it doesn't explicitly say that short int-s should be of 16 bits size. So can I trust short int-s to be 16 bits size or not?

You need to go to page 33 and read the last sentence that introduces
the table on page 34.

>Also, doesn't prescribing #define-s for integer type min/max values conflict with the general (?) understanding that the size of these types are implementation defined? I mean, is the general understanding wrong? (For instance see: http://en.wikipedia.org/wiki/Short_integer#cnote_b_grp_notesc)

In what way? Since both the value of the type and the size of the
type are determined by the particular implementation, why would they
be inconsistent? What would be inconsistent is using the values from
one implementation on a different one.

>Finally, why would anyone want char to be other than 8 bits? *Is* char on any platform *not* 8 bits?

Because the real world is not limited to your imagination. (There are
even character encoding schemes other than ASCII.)

Yes.

--
Remove del for email

Eric Sosman

Apr 19, 2013, 9:04:28 PM

On 4/19/2013 8:14 PM, Shriramana Sharma wrote:
> Hello. I am reading the C99 standard as available from: http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf
>
> I note that it specifies (on p 34) macros defining the minimum and maximum values of a short int corresponding to a size of 16 bits. However it doesn't explicitly say that short int-s should be of 16 bits size. So can I trust short int-s to be 16 bits size or not?

From the minimum and maximum values, you can deduce that
`short int' is *at least* sixteen bits wide. But it might be
eighteen bits wide, or twenty-eight, or thirty-two, or ...

> Also, doesn't prescribing #define-s for integer type min/max values conflict with the general (?) understanding that the size of these types are implementation defined? I mean, is the general understanding wrong? (For instance see: http://en.wikipedia.org/wiki/Short_integer#cnote_b_grp_notesc)

Not at all. Something that is "implementation-defined" means
that the implementation must document the definition. Macros like
SHRT_MIN and UINT_MAX are documentation of the implementation's
choices.
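
To see which choices a given implementation has documented, one can simply print the macros. The following is a minimal sketch, assuming a hosted C99 (or later) compiler; the function name `report_short_limits` is made up for illustration and is not part of any library.

```c
#include <limits.h>
#include <stdio.h>

/* The standard only guarantees minimum magnitudes. These checks can
   never fail on a conforming implementation; they just restate the rule. */
#if SHRT_MAX < 32767 || SHRT_MIN > -32767 || USHRT_MAX < 65535
#error "not a conforming implementation"
#endif

/* Print this implementation's documented choices for short.
   Returns the storage size of short in bits (including any padding). */
static int report_short_limits(void)
{
    int storage_bits = (int)(sizeof(short) * CHAR_BIT);

    printf("CHAR_BIT      = %d\n", CHAR_BIT);
    printf("sizeof(short) = %zu\n", sizeof(short));
    printf("SHRT_MIN      = %d\n", SHRT_MIN);
    printf("SHRT_MAX      = %d\n", SHRT_MAX);
    printf("USHRT_MAX     = %u\n", (unsigned)USHRT_MAX);
    return storage_bits;
}
```

Calling `report_short_limits()` from `main` prints whatever the implementation chose; the only thing portable code can rely on is that the result is at least 16.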

> Finally, why would anyone want char to be other than 8 bits? *Is* char on any platform *not* 8 bits?

If I may rephrase your question slightly, you have asked
"لما क्यों किसी को भी चार अन्य 8 बिट होना चाहेगा". In this form, the
question may well answer itself.

--
Eric Sosman
eso...@comcast-dot-net.invalid

Shriramana Sharma

Apr 19, 2013, 10:30:22 PM

Hello and thanks people for your clarifications. My mistake for not reading the introductory text correctly.

BTW despite my name, I am not really that good at Hindi (it's not my mother tongue), so (auto-?)translating to it doesn't really facilitate anything. (Thanks for trying though!)

James Kuyper

Apr 19, 2013, 11:06:13 PM

On 04/19/2013 08:14 PM, Shriramana Sharma wrote:
> Hello. I am reading the C99 standard as available from:
> http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf

You should look for n1570.pdf, which is the latest version of the
standard. Nothing relevant to this discussion has changed, but other
things have.

> I note that it specifies (on p 34) macros defining the minimum and
> maximum values of a short int corresponding to a size of 16 bits.
> However it doesn't explicitly say that short int-s should be of 16
> bits size. So can I trust short int-s to be 16 bits size or not?

That's actually page 22. You should use the page number at the bottom of
the page, rather than the page number shown by your PDF reader. I use
Acrobat Reader, which refers to pages iii-xiv as pages 1-12; the real
page 1 is referred to as page 13 by Acrobat Reader.

It's even better to cite by section rather than by page number. That's
partly because your citation can usually be more specific, but also
because section numbers change less between different versions of the
standard. That is section 5.2.4.2.1p1 in both n1256.pdf and n1570.pdf,
even though it's on page 22 in the first, and on page 27 in the second.

To answer your question, notice that earlier in the same section it
says "Their implementation-defined values shall be equal or greater in
magnitude (absolute value) to those shown, with the same sign." Consider
the implications of "or greater". That means that an implementation is
free to define SHRT_MAX as 65535, in which case 'short' would have to
have 17 value bits, and an unspecified number of padding bits.
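
The "or greater" wording can be probed on a given implementation. SHRT_MAX has the form 2^N - 1, so counting its bits gives the number of value bits N (at least 15), and whatever is left of sizeof(short) * CHAR_BIT after N value bits and one sign bit is padding. A rough sketch, with helper names invented for illustration:

```c
#include <limits.h>

/* Count the value bits of short by counting the bits of SHRT_MAX,
   which has the form 2^N - 1 where N is the number of value bits. */
static int short_value_bits(void)
{
    int bits = 0;
    for (int max = SHRT_MAX; max != 0; max >>= 1)
        bits++;
    return bits;
}

/* Padding bits = storage bits - value bits - 1 sign bit. */
static int short_padding_bits(void)
{
    return (int)(sizeof(short) * CHAR_BIT) - short_value_bits() - 1;
}
```

On most current desktop compilers this would be expected to report 15 value bits and no padding, but nothing in the standard requires that.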

> Also, doesn't prescribing #define-s for integer type min/max values
> conflict with the general (?) understanding that the size of these
> types are implementation defined? ...

Since all that is prescribed is the minimum value for those #defines,
with the actual value being implementation-defined, there is no conflict.

...

> Finally, why would anyone want char to be other than 8 bits? *Is*
> char on any platform *not* 8 bits?

The single most popular alternative to 8-bit char is 16-bit char, which
is the case on many DSPs.
--
James Kuyper

Eric Sosman

Apr 19, 2013, 11:11:35 PM

On 4/19/2013 10:30 PM, Shriramana Sharma wrote:
> Hello and thanks people for your clarifications. My mistake for not reading the introductory text correctly.
>
> BTW despite my name, I am not really that good at Hindi (it's not my mother tongue), so (auto-?)translating to it doesn't really facilitate anything. (Thanks for trying though!)

Your English is much better than my Hindi! ;-) If you like,
auto-translate the question from Hindi to Pinyin, or to Thai, or
to Arabic, or to Russian, or to ... and I think the question about
wanting a wider-than-8-bit `char' will answer itself.

Besides: Fashions come, and fashions go, and this is a fashion-
driven industry. In my first few years of writing programs, I used
systems with 8-bit, 6-bit, 9-bit, and (yes!) 6.644-bit characters.
Some of these were bitextual: A machine with 36-bit words holding
either six 6-bit or four 9-bit characters, another with 48-bit
words that did eight 6's or six 8's. The 8-bit character is very
common today, but ... Πάντα ῥεῖ καὶ οὐδὲν μένει, as the fellow said.

--
Eric Sosman
eso...@comcast-dot-net.invalid

Les Cargill

Apr 20, 2013, 1:24:46 AM

Shriramana Sharma wrote:
> Hello. I am reading the C99 standard as available from:
> http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf
>
> I note that it specifies (on p 34) macros defining the minimum and
> maximum values of a short int corresponding to a size of 16 bits.
> However it doesn't explicitly say that short int-s should be of 16
> bits size. So can I trust short int-s to be 16 bits size or not?
>

No, not really. You have to verify this. This being said,
it's true for a vast majority of platforms.

> Also, doesn't prescribing #define-s for integer type min/max values
> conflict with the general (?) understanding that the size of these
> types are implementation defined? I mean, is the general
> understanding wrong? (For instance see:
> http://en.wikipedia.org/wiki/Short_integer#cnote_b_grp_notesc)
>

Supposedly, said macros would be adapted to be platform specific.

> Finally, why would anyone want char to be other than 8 bits? *Is*
> char on any platform *not* 8 bits?
>

We can't say in general. It's a heck of a founding assumption
to give up, so hopefully there's a good reason.

> Thanks.
>

--
Les Cargill

Keith Thompson

Apr 20, 2013, 1:58:17 AM

James Kuyper <james...@verizon.net> writes:
> On 04/19/2013 08:14 PM, Shriramana Sharma wrote:
>> Hello. I am reading the C99 standard as available from:
>> http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf
>
> You should look for n1570.pdf, which is the latest version of the
> standard. Nothing relevant to this discussion has changed, but other
> things have.

Actually it's a draft of the latest version of the standard. It turns
out there are a few (minor) differences between it and the official
released standard.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson

Apr 20, 2013, 1:59:24 AM

Shriramana Sharma <sam...@gmail.com> writes:
> Hello. I am reading the C99 standard as available from:
> http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf
>
> I note that it specifies (on p 34) macros defining the minimum and
> maximum values of a short int corresponding to a size of 16
> bits. However it doesn't explicitly say that short int-s should be of
> 16 bits size. So can I trust short int-s to be 16 bits size or not?

I've actually worked on a system with 32-bit shorts, and another with
64-bit shorts (both were Cray supercomputers).

Lynn McGuire

Apr 20, 2013, 2:14:25 AM

The first computer that I wrote code on was a Univac
1108. Chars were only six bits and ASCII. So, there
were no lower case characters (lower case is the 7th bit).

Lynn

Malcolm McLean

Apr 20, 2013, 4:49:15 AM

Dennis Ritchie, who designed C, made a mistake by making "char" (a variable that
holds a character in a human-readable language) and "byte" (the smallest
addressable unit of memory) the same thing.
256 characters aren't enough for some purposes. And whilst most computers use
8 bit bytes internally, this isn't universal, particularly on big machines.

So C has a bit of a problem. The solution, which sort of works, is to allow char
to be more than 8 bits on some platforms, to solve the byte issue, and to
introduce wchar_t to solve the big alphabet issue.

As for redefining every basic type, this is often done by people with a limited
understanding of software engineering, who think that they are making the
program more robust by allowing the possibility of redefining the type. In
practice, it's most unlikely that this won't break things, and the introduction
of new types causes more problems than it solves, certainly it makes it hard to
integrate code from two programs.

--
Malcolm's website
http://www.malcolmmclean.site11.com/www

James Kuyper

Apr 20, 2013, 6:45:24 AM

On 04/19/2013 11:11 PM, Eric Sosman wrote:
> On 4/19/2013 10:30 PM, Shriramana Sharma wrote:
>> Hello and thanks people for your clarifications. My mistake for not reading the introductory text correctly.
>>
>> BTW despite my name, I am not really that good at Hindi (it's not my mother tongue), so (auto-?)translating to it doesn't really facilitate anything. (Thanks for trying though!)
>
> Your English is much better than my Hindi! ;-) If you like,
> auto-translate the question from Hindi to Pinyin, or to Thai, or
> to Arabic, or to Russian, or to ... and I think the question about
> wanting a wider-than-8-bit `char' will answer itself.

Not really. Your message containing Hindi text displayed as actual
characters on my system, not the rectangles it shows when it doesn't
know how to display a character. I presume that they were the correct
characters - Google translate translates them back to English as "Why
anyone would want to be لما four other 8-bit", which I presume is close
to what you wanted to say, aside from the character it refused to
translate. However, it arrived at my newsreader with the following headers:

Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

The use of UTF-9, UTF-16, UTF-18, UTF-32, UCS2, or UCS4 would have been
evidence of a need for chars wider than 8 bits, but UTF-8 is actually an
argument against that being necessary.
--
James Kuyper

James Kuyper

Apr 20, 2013, 6:59:19 AM

On 04/20/2013 04:49 AM, Malcolm McLean wrote:
> On Saturday, April 20, 2013 1:14:26 AM UTC+1, Shriramana Sharma wrote:

...

>> Finally, why would anyone want char to be other than 8 bits? *Is* char on any platform *not* 8 bits?
>>
>
> Dennis Ritchie, who designed C, made a mistake by making "char" (a variable that
> holds a character in a human-readable language) and "byte" (the smallest
> addressable unit of memory) the same thing.

Not quite. 'char' is a data type, while 'byte' is a unit for measuring
the amount of memory required to store an object. As a data type, 'char'
has an integer conversion rank, and if signed, it might have either 1's
complement, 2's complement, or sign-magnitude representation. As a unit
for measuring storage, a byte has none of those things. He decided to
make sizeof(char) == 1 byte.

C would arguably have been better if designed from the start with
something similar to the current wchar_t and size-named types,
preferably with different names, rather than with char, short, int, and
long. I'd recommend thinking along those lines when designing a new
language. However, it would break too much legacy code to ever move C in
that direction.
--
James Kuyper

James Kuyper

Apr 20, 2013, 7:04:10 AM

On 04/20/2013 06:45 AM, James Kuyper wrote:
...

> Not really. Your message containing Hindi text displayed as actual
> characters on my system, not the rectangles it shows when it doesn't
> know how to display a character. I presume that they were the correct
> characters - Google translate translates them back to English as "Why
> anyone would want to be لما four other 8-bit", which I presume is close
> to what you wanted to say, aside from the character it refused to
> translate.

I just realized that the Hindi character it refused to translate back to
English is probably the one that was produced to (mis?)translate "char".
--
James Kuyper

Eric Sosman

Apr 20, 2013, 8:48:04 AM

Do you, yourself, ever find yourself wanting something that
isn't strictly necessary? Perhaps because something beyond the
bare minimum might be more convenient, or more pleasant? Those
white spaces in your source code: Are they all necessary?

Multi-byte character encodings are possible, but clumsy.
Library functions like strchr() do not deal well with them,
programmers who must write (or should have written!) calls to
wcstombs() and the like do not deal well with them, fseek()
does not deal well with them, ... Wouldn't It Be Nicer if a
single "atom" of data could encode an entire character, without
relying on surrounding context to allow decoding?
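
A small illustration of that clumsiness, as a sketch only: with UTF-8 stored one byte per char (so this assumes CHAR_BIT == 8 and well-formed input), strlen() counts bytes rather than characters, and counting characters means looking at each byte's role in its sequence. The names `utf8_codepoints` and `greek_abg` are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Count code points in a UTF-8 string by skipping continuation bytes,
   which all have the bit pattern 10xxxxxx. Assumes well-formed input. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Greek alpha, beta, gamma spelled as explicit UTF-8 bytes, so the
   encoding of the source file does not matter. */
static const char greek_abg[] = "\xCE\xB1\xCE\xB2\xCE\xB3";
```

Here strlen(greek_abg) is 6 while utf8_codepoints(greek_abg) is 3, and strchr() will happily return a pointer into the middle of a character if asked to search for a continuation byte.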

I do not offer the existence of rich glyph sets as evidence
that `char' *must* be made wider, only as evidence that someone
might have reason to *want* it wider (the O.P.'s question).

--
Eric Sosman
eso...@comcast-dot-net.invalid

Eric Sosman

Apr 20, 2013, 8:58:01 AM

Also, keep in mind the amount of memory on the machines where
early C and Unix were born. Quoth one DMR:

"During [B's] development, [Thompson] continually struggled
against memory limitations: each language addition inflated
the compiler so it could barely fit, but each rewrite taking
advantage of the feature reduced its size."

In that sort of environment, one hasn't the luxury of adding every
desirable feature.

--
Eric Sosman
eso...@comcast-dot-net.invalid

Malcolm McLean

Apr 20, 2013, 9:05:43 AM

There are problems with big alphabets, however.

One is that keyboards won't enter them. Whilst you can fix this with a virtual
keyboard of some description, it's very difficult.
Then it becomes impossible to provide glyphs for every character, unless you
are a corporation with massive resources.
Another problem is that the vast majority of symbols are meaningless to the
vast majority of programmers. So if there's a spurious squiggly X-like thing,
the programmer doesn't even know the name of the symbol causing the bug,
much less its unicode or what it might represent.

Often it's better to say that computers speak English. If you want another
language, it's built on top of English, e.g. using sequences like &alpha; to
represent non-English letters.

--
Malcolm's website
http://www.malcommclean.site11.com/www

BGB

Apr 20, 2013, 4:01:59 PM

part of the issue with C in these regards is that it aliases characters
and bytes.

having a separate char and byte types could have made more sense.
granted... 'wchar_t'...

in this case, 'char' more typically represents a byte, and it could make
more sense simply to nail down the byte size as 8 bits, and by
extension, 'char'.

many of us have good results using UTF-8 for nearly everything. those
things which don't work well in UTF-8, can typically use UTF-16 or UTF-32.

generally, UTF-32 is often unnecessary:
it is rare to find text using any characters outside the BMP;
it is also rare to find fonts which support it (hard, actually, even to
find fonts which effectively support most of the Unicode BMP);
...

so, while some people may object, naive UCS2 may actually work pretty
well in many cases involving internationalized text.

however, a person can use 32-bit characters, and treat the high bits as
formatting data (text color and style).

for example, in my case I have a tweaked character encoding which uses
32-bits per character:
if the character fits in 16 bits, then the high 16-bits are used as
formatting;
if the character does not, it has 20 bits, loses its background color
(the background color comes from a prior character).

some combinations of formatting options are also assumed to be mutually
exclusive to help save bits (such as
superscript/subscript/strikethrough, ...).

in the source-text form (UTF-8), this information is generally
represented using ANSI-codes (though other options could be possible).
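
a rough sketch of the 16-bit case described above, in C. the exact bit layout is not given in this post, so the one used here (character in the low 16 bits, formatting treated as an opaque 16-bit field in the high bits) and the names `cell_pack`, `cell_char`, and `cell_format` are assumptions for illustration only. the 20-bit case is not covered.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only: bits 0-15 hold a BMP
   code point, bits 16-31 hold opaque formatting flags. */
static uint_least32_t cell_pack(uint_least16_t ch, uint_least16_t fmt)
{
    return ((uint_least32_t)(fmt & 0xFFFFu) << 16)
         | (uint_least32_t)(ch & 0xFFFFu);
}

/* Recover the character part of a packed cell. */
static uint_least16_t cell_char(uint_least32_t cell)
{
    return (uint_least16_t)(cell & 0xFFFFu);
}

/* Recover the formatting part of a packed cell. */
static uint_least16_t cell_format(uint_least32_t cell)
{
    return (uint_least16_t)((cell >> 16) & 0xFFFFu);
}
```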

FWIW (OT):
in my own (scripting) language, it more goes the route of making bytes
and characters semantically different types:
byte/sbyte/ubyte: bytes, defined as always 8 bits (sbyte = signed byte);
char: default character (*1);
cchar: C character, defined as being (by default) 8 bits;
char8: explicit 8-bit character;
char16: explicit 16-bit character;
char32: explicit 32-bit character.

*1: generally, it is 16-bits in storage (arrays or structs), but 32-bits
when in 'working' forms (in variables or function arguments). elsewhere,
it will try to align with 'wchar_t'.

they also differ partly in that they represent different parts of the
numeric tower (byte and friends are part of the integer tower, with
'char' and friends as a partially disjoint character tower, where casts
are used to convert between them).

within the FFI (C <-> BS):
'char' <-> 'cchar';
'unsigned char' <-> 'byte/ubyte';
'signed char' <-> 'sbyte'.
'wchar_t' <-> 'char';
...

note that, sizes of byte/short/int/long/... are explicitly defined as
8/16/32/64 bits. in targets where C differs, they will not line up with
their C name-equivalents (for example, a hypothetical implementation on
a 16-bit target would still use a 32-bit 'int', even if C were using a
16-bit 'int').

...

Nobody

Apr 20, 2013, 5:25:56 PM

On Sat, 20 Apr 2013 15:01:59 -0500, BGB wrote:

> (hard, actually, even to find fonts which effectively support most of the
> Unicode BMP); ...

Technically, it's impossible.

A "font" (or, outside the US, "fount") is a complete set of type in a
particular style and size (the term "scalable font" is an oxymoron; a set
of glyphs in a common style without any particular size is a "typeface",
not a "font").

The inherent differences between scripts mean that it's impossible for
a set of glyphs for the Latin script to have the same "style" as a set of
glyphs for e.g. an Arabic or Han script, so distinct scripts cannot be
part of the same "font".

You might get multiple fonts for multiple scripts in a single TTF file,
but that's not the same thing as a font. It's also not a particularly good
idea, as it requires making completely arbitrary choices as to which
typeface to use for each script. It's normally done as a workaround for
software which expects the user to choose a single "font" for all text
regardless of the scripts which are used.

Edward A. Falk

Apr 20, 2013, 8:06:09 PM

Heh. When I first learned the language, the only thing the spec
guaranteed was that short would not be longer than long.

If you care about word size, use int16_t, etc.
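
A short sketch of what that looks like with C99's <stdint.h>. One caveat worth knowing: int16_t is optional (it exists only where the implementation has a 16-bit two's complement type with no padding bits), while int_least16_t and int_fast16_t are always available. The typedef and function names below are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* int_least16_t always exists in C99: the smallest type with at
   least 16 bits. "sample_t" is a made-up name for illustration. */
typedef int_least16_t sample_t;

#ifdef INT16_MAX
/* The exact-width type is available on this implementation. */
typedef int16_t exact_sample_t;
#endif

/* Width in bits of the storage actually used for a sample_t. */
static int sample_storage_bits(void)
{
    return (int)(sizeof(sample_t) * CHAR_BIT);
}
```

Testing for INT16_MAX with #ifdef is the usual way to find out whether the exact-width type exists before relying on it.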

--
-Ed Falk, fa...@despams.r.us.com
http://thespamdiaries.blogspot.com/

Gordon Burditt

Apr 20, 2013, 8:26:54 PM

>>Finally, why would anyone want char to be other than 8 bits? *Is* char on any platform *not* 8 bits?

There are some signal-processing embedded processors that use 32
bits for char, short, and int. The instruction set addresses memory
as 32-bit words; there are no lower-order address bits to select a
byte in a word. While there are possible ways to work around this,
it seems that doing so was sufficiently painful that they decided
not to. The emphasis for these processors is processing signals,
not text, although they still might use some text for logging or
user interface.
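
Code that has to run on such processors as well as on ordinary ones usually avoids assuming that a char holds exactly 8 bits. A minimal sketch of splitting a 32-bit value into octets, one octet per unsigned char, that should behave the same whether CHAR_BIT is 8 or 32; the function names are invented for the example:

```c
#include <stdint.h>

/* Split a 32-bit value into four octets, most significant first.
   Each octet is stored in its own unsigned char; the masks keep this
   correct even where unsigned char is wider than 8 bits. */
static void put_u32_be(unsigned char out[4], uint_least32_t v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Reassemble the value; masking again guards against stray high bits. */
static uint_least32_t get_u32_be(const unsigned char in[4])
{
    return ((uint_least32_t)(in[0] & 0xFFu) << 24)
         | ((uint_least32_t)(in[1] & 0xFFu) << 16)
         | ((uint_least32_t)(in[2] & 0xFFu) << 8)
         |  (uint_least32_t)(in[3] & 0xFFu);
}
```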

Other posters have talked about Cray supercomputers.

Old (obsolete) systems include the PDP-8, with 12 bit wide memory
and registers, and the GE-635 which used 36-bit-wide memory where
characters were either 6 or 9 bits, selectable by a bit in a tally
word (which now might be called a "fat pointer"). The Baudot
character set used with early Teletype machines used 5 bits (with
one bit of state carried between characters).

For some languages (e.g. Chinese, Japanese, and Korean), it's pushing
it trying to squeeze a character into *16* bits.

Malcolm McLean

Apr 21, 2013, 6:51:33 AM

On Saturday, April 20, 2013 10:25:56 PM UTC+1, Nobody wrote:
> On Sat, 20 Apr 2013 15:01:59 -0500, BGB wrote:
>
> The inherent differences between scripts mean that it's
> impossible for a set of glyphs for the Latin script to have the
> same "style" as a set of glyphs for e.g. an Arabic or Han script,
> so distinct scripts cannot be part of the same "font".
>

Also some languages have multiple scripts for the same alphabet. Hebrew has an archaic paleoHebrew script, which you might still
want for scholarly purposes, the Masoretic script which is used for
modern printed matter and handwritten religious texts, and a
simpler handwritten script which is used for everyday note taking.

In English of course we have upper and lower case letters, but
computers treat those as different "characters".

gpiets...@gmail.com

Apr 21, 2013, 5:50:55 PM

On Friday, April 19, 2013 8:14:26 PM UTC-4, Shriramana Sharma wrote:
> Hello. I am reading the C99 standard as available from: http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf
>
>
>
> I note that it specifies (on p 34) macros defining the minimum and maximum values of a short int corresponding to a size of 16 bits. However it doesn't explicitly say that short int-s should be of 16 bits size. So can I trust short int-s to be 16 bits size or not?

No. On a Cray, all the integers outside of char were 64 bits, and that included short.

>
> Also, doesn't prescribing #define-s for integer type min/max values conflict with the general (?) understanding that the size of these types are implementation defined? I mean, is the general understanding wrong? (For instance see: http://en.wikipedia.org/wiki/Short_integer#cnote_b_grp_notesc)

I don't think there's a conflict.

>
> Finally, why would anyone want char to be other than 8 bits? *Is* char on any platform *not* 8 bits?

Think Unicode.
>
> Thanks.

Keith Thompson

Apr 21, 2013, 7:57:31 PM

gpiets...@gmail.com writes:
> On Friday, April 19, 2013 8:14:26 PM UTC-4, Shriramana Sharma wrote:
>> Hello. I am reading the C99 standard as available from:
>> http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf
>>
>> I note that it specifies (on p 34) macros defining the minimum and
>> maximum values of a short int corresponding to a size of 16
>> bits. However it doesn't explicitly say that short int-s should be of
>> 16 bits size. So can I trust short int-s to be 16 bits size or not?
>
> No. On a Cray, all the integers outside of char were 64 bits, and that
> included short.

Depends on the Cray.

On the vector systems (at least the ones I worked on, J90 and T90),
short, int, and long were 64 bits. On the T3E (which used Alpha CPUs),
short was 32 bits, and int and long were 64 bits. char was 8 bits on
both systems -- which required some extra work on the T90.

Nobody

Apr 22, 2013, 8:57:19 AM

On Sun, 21 Apr 2013 14:50:55 -0700, gpietsch08618 wrote:

>> Finally, why would anyone want char to be other than 8 bits? *Is* char
>> on any platform *not* 8 bits?
>
> Think Unicode.

That's normally supported via wchar_t.

As has been pointed out elsewhere, in spite of the name, a "char" isn't
necessarily a character. The semantics of "char" appear to be designed
around the assumption that it's a byte, i.e. the hardware's unit of
addressable memory.

AFAICT, all general-purpose microprocessors use an 8-bit byte, and thus so
do systems built around them. Some systems whose CPU was built from
discrete logic used other sizes (I'm aware of 6, 10, and 12 bits).
Dedicated DSPs often have 32-bit bytes (i.e. memory is addressed in words).

James Kuyper

Apr 22, 2013, 12:58:37 PM

On 04/19/2013 11:06 PM, James Kuyper wrote:
...
> ... That means that an implementation is
> free to define SHRT_MAX as 65535, in which case 'short' would have to
> have 17 value bits, ...

Correction: 16 value bits and 1 sign bit.

Bill Leary

Apr 22, 2013, 1:00:11 PM

"Nobody" wrote in message news:pan.2013.04.22...@nowhere.com...

> On Sun, 21 Apr 2013 14:50:55 -0700, gpietsch08618 wrote:
>
>>> Finally, why would anyone want char to be other than 8 bits? *Is* char
>>> on any platform *not* 8 bits?
>>
>> Think Unicode.
>
> That's normally supported via wchar_t.
>
> As has been pointed out elsewhere, in spite of the name, a "char"
> isn't necessarily a character. The semantics of "char" appear to be
> designed around the assumption that it's a byte, i.e. the hardware's
> unit of addressable memory.

Your first sentence, yes. But your second sentence, by my recollection of
the period, not so much. From what I recall the data type was meant to
handle characters, in whatever size and by whatever means the machine
supported them. Or, even if it did not support them. At the time, it was
by no means assumed that bytes were eight bits. And quite a number of
machines couldn't directly address bytes or chars. I remember both six and
eight bit chars on machines which did not support addressing data in those
sizes. On some machines int and char pointers are architected and processed
differently.

On Novas, for example, int and long pointers are fifteen bits: 0x00000010
is the sixteenth word, holding the thirty-second and thirty-third bytes, and
0x00000011 is the seventeenth word, holding the thirty-fourth and thirty-fifth
bytes. Char pointers are sixteen bits: 0x00000010 is the sixteenth
byte, in one half of the eighth word, and 0x00000011 is the seventeenth
byte, in the other half of the eighth word. Byte accesses were handled by
a small subroutine or a few inlined instructions or, in later models, byte
instructions, still using different pointer structures.

If you did this:

int *ip;
int i;
char *cp;
char c[10];
. . .
cp = &c[i];
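ip = (int *)cp;   /* char pointer to word pointer: the byte-select bit is dropped */
cp = (char *)ip;  /* word pointer back to char pointer */

After the round trip, cp would still be pointing to c[i] only if i happened
to be even (taking c to start on a word boundary). If i was odd, cp would now
be pointing to the byte before the one it used to point to.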
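
For what it's worth, the current standard makes converting a char pointer to an int pointer undefined when the result is not suitably aligned, so portable code tends to avoid forming such a pointer at all. One common way around it, sketched here with made-up function names, is to copy the bytes with memcpy():

```c
#include <string.h>

/* Read an int stored at an arbitrary byte offset in a char buffer.
   memcpy copies byte by byte, so no int pointer to a possibly
   misaligned address is ever formed. */
static int load_int_at(const char *buf, size_t offset)
{
    int value;
    memcpy(&value, buf + offset, sizeof value);
    return value;
}

/* Store counterpart of load_int_at. */
static void store_int_at(char *buf, size_t offset, int value)
{
    memcpy(buf + offset, &value, sizeof value);
}
```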

- Bill

Edward A. Falk

Apr 22, 2013, 2:22:01 PM

In article <EL2dnQ8NVYLTr-7M...@posted.internetamerica>,

Gordon Burditt <gordon...@burditt.org> wrote:
>
>Old (obsolete) systems include the PDP-8, with 12 bit wide memory
>and registers, and the GE-635 which used 36-bit-wide memory where
>characters were either 6 or 9 bits, selectable by a bit in a tally
>word (which now might be called a "fat pointer")...

And the PDP-10 which packed five 7-bit characters into a 36-bit word.

And don't get started on the sixbit and radix-50 alphabets used
by the pdp-10 and pdp-11 respectively.

glen herrmannsfeldt

Apr 22, 2013, 2:39:15 PM

Edward A. Falk <fa...@rahul.net> wrote:
> In article <EL2dnQ8NVYLTr-7M...@posted.internetamerica>,
> Gordon Burditt <gordon...@burditt.org> wrote:

>>Old (obsolete) systems include the PDP-8, with 12 bit wide memory
>>and registers, and the GE-635 which used 36-bit-wide memory where
>>characters were either 6 or 9 bits, selectable by a bit in a tally
>>word (which now might be called a "fat pointer")...

> And the PDP-10 which packed five 7-bit characters into a 36-bit word.

> And don't get started on the sixbit and radix-50 alphabets used
> by the pdp-10 and pdp-11 respectively.

I used to write programs in PDP-10 Fortran that would read 9 track
tapes with EBCDIC data and convert to ASCII. I believe it reads four
tape bytes into a 36 bit word.

There is also a way to write 36 bit words to tape, and read them back
again, which I believe writes two words to nine tape bytes.

But yes, the normal ASCII form for PDP-10 data files stores five
characters per word, with one bit left over. Line oriented editors
would write five digit line numbers in the first word of the line,
with the low bit set. Compilers would recognize those, ignore the number
as far as program input went, but use the number in error messages.
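
For readers who have not seen that layout, here is a sketch in C of packing five 7-bit characters into a 36-bit word with the low bit left over. It simulates the 36-bit word in a 64-bit integer and assumes an ASCII execution character set; the function names are invented for the example.

```c
#include <stdint.h>

/* Pack five 7-bit characters into the upper 35 bits of a 36-bit word,
   held here in a uint_least64_t. Bit 0 is the leftover bit. */
static uint_least64_t pack5x7(const char c[5])
{
    uint_least64_t word = 0;
    for (int i = 0; i < 5; i++)
        word |= (uint_least64_t)((unsigned char)c[i] & 0x7Fu) << (29 - 7 * i);
    return word;
}

/* Extract character number i (0..4) from a packed word. */
static char unpack7(uint_least64_t word, int i)
{
    return (char)((word >> (29 - 7 * i)) & 0x7Fu);
}
```

Setting the leftover bit (word | 1) is then enough to mark a word as holding a line number rather than text, without disturbing the five characters.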

It would confuse TECO users, though, if you sent them line numbered
files.

-- glen

glen herrmannsfeldt

Apr 22, 2013, 2:44:18 PM

Nobody <nob...@nowhere.com> wrote:
> On Sun, 21 Apr 2013 14:50:55 -0700, gpietsch08618 wrote:

>>> Finally, why would anyone want char to be other than 8 bits?
>>> *Is* char on any platform *not* 8 bits?

(snip)

> As has been pointed out elsewhere, in spite of the name, a "char" isn't
> necessarily a character. The semantics of "char" appear to be designed
> around the assumption that it's a byte, i.e. the hardware's unit of
> addressable memory.

> AFAICT, all general-purpose microprocessors use an 8-bit byte,
> and thus so do systems built around them. Some systems whose CPU
> was built from discrete logic used other sizes (I'm aware of 6,
> 10, and 12 bits).

Not knowing the exact definition of "general purpose microprocessor"
the Intersil IM6100 uses a 12 bit word. That came out about the time
the transition was being made from 8 bit (8080, 6800, 6502) to 16 bit
(8086, 9900, 68000) processors. It didn't get as popular as some other
processors, though.

> Dedicated DSPs often have 32-bit bytes (i.e. memory is addressed in words).

I believe that some DSPs use 24 bit memory words.

-- glen

William Ahern

Apr 23, 2013, 12:55:17 AM

Nobody <nob...@nowhere.com> wrote:
> On Sun, 21 Apr 2013 14:50:55 -0700, gpietsch08618 wrote:

> >> Finally, why would anyone want char to be other than 8 bits? *Is* char
> >> on any platform *not* 8 bits?
> >
> > Think Unicode.

> That's normally supported via wchar_t.

Poorly. What most people consider a "character" is in Unicode a grapheme.
For example, an identifier atom in a programming language. However, even in
the realm of the BMP not all graphemes will map to a single codepoint in any
of the UTF encodings.

Perl 6 addressed this by dynamically mapping graphemes without precomposed
equivalents to a single codepoint at runtime. AFAIK, no other language or
language implementation takes such an approach, even though it's basically
the only way to achieve low-level indexing of Unicode "characters" in C.
Thus, in C iswalpha() cannot work as advertised, even for UTF-32, unless one
plays games with terminology.

Generally, engineers seem to get round this dilemma by waving their hands
and saying that nobody cares about such sequences. This is how the C
standard does it. It specifically disallows composing characters in its
identifiers, presumably because it's impossible to actually support them
using simple interfaces like iswalpha, etc.

But such sleight of hand doesn't fly when you're trying to parse free form
text. And in any event, you usually need a richer set of interfaces to parse
text attributes, such as word boundaries. C's wide character interfaces are
wholly inadequate for the task.

Bart van Ingen Schenau

Apr 23, 2013, 10:38:22 AM

On Sat, 20 Apr 2013 08:48:04 -0400, Eric Sosman wrote:

> Do you, yourself, ever find yourself wanting something that
> isn't strictly necessary? Perhaps because something beyond the bare
> minimum might be more convenient, or more pleasant? Those white spaces
> in your source code: Are they all necessary?
>
> Multi-byte character encodings are possible, but clumsy.
> Library functions like strchr() do not deal well with them, programmers
> who must write (or should have written!) calls to wcstombs() and the
> like do not deal well with them, fseek() does not deal well with them,
> ... Wouldn't It Be Nicer if a single "atom" of data could encode an
> entire character, without relying on surrounding context to allow
> decoding?

Welcome to the world of Unicode, where *all* encodings (even UCS4/UTF-32)
should be considered to be variable-width (which is even worse than multi-
byte) thanks to the existence of combining characters and the like.
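
A small sketch of the point: the same visible character can be one code point or two even in UTF-32. The check below covers only the Combining Diacritical Marks block (U+0300 to U+036F) and is an illustration, not a real grapheme segmenter, which would need the full Unicode tables. All names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Two UTF-32 spellings of what a reader sees as one "e with acute". */
static const uint_least32_t e_precomposed[] = { 0x00E9 };          /* U+00E9 */
static const uint_least32_t e_combining[]   = { 0x0065, 0x0301 };  /* e + combining acute */

/* True for code points in the Combining Diacritical Marks block
   (U+0300..U+036F). Only one block is covered, as an illustration. */
static int is_combining_mark(uint_least32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

/* Count user-visible characters, very roughly: code points that are
   not combining marks. */
static size_t rough_char_count(const uint_least32_t *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (!is_combining_mark(s[i]))
            count++;
    return count;
}
```

Both arrays count as one character under rough_char_count(), yet they have different lengths, so indexing by array element is not indexing by character.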

Bart v Ingen Schenau

Ken Brody

Apr 25, 2013, 9:46:46 AM

On 4/19/2013 10:30 PM, Shriramana Sharma wrote:
> Hello and thanks people for your clarifications. My mistake for not
> reading the introductory text correctly.
>
> BTW despite my name, I am not really that good at Hindi (it's not my
> mother tongue), so (auto-?)translating to it doesn't really facilitate
> anything. (Thanks for trying though!)

I believe his point was that "ا क्यों किसी को भी चार अन्य 8 बिट होना चाहेगा"
doesn't fit well into 8-bit characters, which IMHO facilitated answering
both of your questions:

Vincenzo Mercuri

Apr 27, 2013, 8:43:41 PM

On 25/04/2013 15:46, Ken Brody wrote:
[..]

> *Is* char on any platform *not* 8 bits?
>

From Wikipedia ( http://en.wikipedia.org/wiki/36-bit ):

"The standard C programming language requires that the size of the char
data type be at least 8 bits, and that all data types other than bitfields
have a size that is a multiple of the character size, so standard C
implementations on 36-bit machines would typically use 9-bit chars,
although 12-bit, 18-bit, or 36-bit would also satisfy the requirements
of the standard."

However, the POSIX standard mandates CHAR_BIT to be exactly 8.
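
Code that does rely on 8-bit char can at least say so at compile time instead of failing silently elsewhere. A minimal sketch; the helper function is hypothetical and only there to show CHAR_BIT in use:

```c
#include <limits.h>

/* Refuse to compile where the 8-bit assumption does not hold. */
#if CHAR_BIT != 8
#error "this code assumes 8-bit char (CHAR_BIT == 8)"
#endif

/* Number of bits in the object representation of an unsigned long. */
static int ulong_storage_bits(void)
{
    return (int)(sizeof(unsigned long) * CHAR_BIT);
}
```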

--
Vincenzo Mercuri