
Must casting destroy lvalueness?


NET-RELAY.ARPA>@brl-smoke.arpa

Oct 15, 1986, 6:35:01 AM
I'm relatively new to info-c and don't know if you've had this debate
so just banish me to the archive if you have.

C pedants claim that casting destroys lvalueness. Their argument is
essentially that they can imagine a machine on which casting forces
the use of a temp so lvalueness is gone.

C users, on the other hand, find they have to program real machines not
hypothetical ones and that almost all of these real machines don't use a
temp when casting. For example, a useful and readable way to move a pointer
through a buffer containing a mixture of objects of different sizes is

((OBJECT)pointer)++

This construct is disallowed by Harbison's compiler.
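
(For concreteness, here is the kind of loop I mean, spelled out the long
way that the pedants' compilers insist on; the tag layout is made up:)

    /* Walk a buffer of mixed-size records: a leading tag byte
       says what kind of object follows it. */
    enum { TAG_INT, TAG_DOUBLE, TAG_END };

    void walk(char *pointer)
    {
        while (*pointer != TAG_END) {
            char tag = *pointer++;

            if (tag == TAG_INT)
                pointer += sizeof(int);    /* ((int *)pointer)++, in spirit */
            else
                pointer += sizeof(double);
        }
    }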

I fear that the C standards committee is going to take away such practical
constructs and turn production quality C compilers into academic quality
ones. Who knows, there may develop a brisk business in C compilers that
promise NOT to be standard conforming.

How sayeth the C standard committee? How sayeth the users?

Regards, Scott

"You can hack any formalism so why not have useful formalisms?"

Wayne Throop

Oct 20, 1986, 5:48:45 PM
> NET-RELAY.ARPA>@brl-smoke.ARPA (Scott)

> I'm relatively new to info-c and don't know if you've had this debate
> so just banish me to the archive if you have.

It has come around before.

> C pedants claim that casting destroys lvalueness. Their argument is
> essentially that they can imagine a machine on which casting forces
> the use of a temp so lvalueness is gone.

Not that they can imagine such a machine. That such machines actually
exist. I am editing on one now.

> C users, on the other hand, find they have to program real machines not
> hypothetical ones and that almost all of these real machines don't use a
> temp when casting. For example, a useful and readable way to move a pointer
> through a buffer containing a mixture of objects of different sizes is
>
> ((OBJECT)pointer)++
>
> This construct is disallowed by Harbison's compiler.

Not really very readable. Let's take a similar example. Would you
expect

int i;
((short)i)++;

to do anything sensible? If so, why? If not, why should the pointer
case work sensibly? After all, there *are* machines upon which integers
and shorts "mean the same thing", so the cast and increment could work
sometimes, right?

> I fear that the C standards committee is going to take away such practical
> constructs and turn production quality C compilers into academic quality
> ones. Who knows, there may develop a brisk business in C compilers that
> promise NOT to be standard conforming.

Only among those who don't care about portability or maintainability.

> How sayeth the C standard committee? How sayeth the users?

I sayeth that if thou wishest to taketh an object as a different typeth,
thou mayest do so. However, casts are not the way to do this in C, and
the practice is not portable. If you must take the bits of one pointer
type as being those of another pointer type, use

(*((some_type **)&p))++

or use unions like God intended. Don't try to pervert casts to do
something they weren't intended for. Casts convert, unions take-as.
The take-as operation is inherently non-portable.
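
To see the difference, try something like this (assuming 4-byte floats
and longs and an IEEE machine, which is exactly the sort of assumption
a take-as bakes in):

    #include <stdio.h>

    union bits {
        float f;
        long  l;
    };

    int main(void)
    {
        union bits u;

        u.f = 1.0;
        printf("cast:    %ld\n", (long)u.f);          /* CONVERTS: prints 1 */
        printf("take-as: %lx\n", (unsigned long)u.l); /* raw bits: 3f800000 */
        return 0;
    }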

--
"Thank you for your support."
--
Wayne Throop <the-known-world>!mcnc!rti-sel!dg_rtp!throopw


Stuart D. Gathman

Oct 21, 1986, 11:28:58 AM
In article <46...@brl-smoke.ARPA>, NET-RELAY.ARPA>@brl-smoke.ARPA writes:
> For example, a useful and readable way to move a pointer
> through a buffer containing a mixture of objects of different sizes is

> ((OBJECT)pointer)++

A more correct syntax:

{
char *pointer;
/* . . . */
pointer += sizeof (OBJECT);
/* . . . */
}

And clearer to boot if you ask me.
--
Stuart D. Gathman <..!seismo!{vrdxhq|dgis}!BMS-AT!stuart>

David desJardins

Oct 23, 1986, 7:01:44 AM

In article <46...@brl-smoke.ARPA>, NET-RELAY.ARPA>@brl-smoke.ARPA writes:
> For example, a useful and readable way to move a pointer
> through a buffer containing a mixture of objects of different sizes is
>
> ((OBJECT)pointer)++

In article <2...@BMS-AT.UUCP> stu...@BMS-AT.UUCP (Stuart D. Gathman) writes:
>A more correct syntax:
>
>{
> char *pointer;
> /* . . . */
> pointer += sizeof (OBJECT);
> /* . . . */
>}
>
>And clearer to boot if you ask me.

Wrong if you ask me. First, in the original example it is clear that
OBJECT is a pointer type. And second, your sample code does not work, and
can not readily be fixed, if sizeof (char) != 1.

In article <657@dg_rtp.UUCP> throopw@dg_rtp.UUCP (Wayne Throop) writes:
>I sayeth that if thou wishest to taketh an object as a different typeth,
>thou mayest do so. However, casts are not the way to do this in C, and
>the practice is not portable. If you must take the bits of one pointer
>type as being those of another pointer type, use
>
> (*((some_type **)&p))++

In C, pointers and lvalues are the same thing (there is a bijection given
by & and *). Essentially, = (and the other assignment operators)
automatically take the address of their left-hand arguments, in much the
same way that
setq/set! quote their first arguments. In both cases this is done simply to
enhance readability and reduce mistakes; x = x+1 is clearer and less error-
prone than &x <- x+1 would be [store x+1 in the location &x], just as
(setq x (+ x 1)) is clearer and less error-prone than (set (quote x) (+ x 1)).
Without this syntactic sugar, the situation would be clearer. The C
statement (foo) x = ... can correspond either to &((foo) x) <- ... or to
(foo *) (&x) <- ..., depending on when the implicit address calculation takes
place. Since only the latter makes any real sense, I submit that casting of
lvalues should be interpreted in this way.
So I would conclude that the statement ((OBJECT) pointer)++ should be
interpreted as

* ((OBJECT *) &pointer) = (OBJECT) pointer + 1;

It is not absolutely clear that this is equivalent to Wayne Throop's
alternative formulation (* ((OBJECT *) &pointer))++, which would give

* ((OBJECT *) &pointer) = * ((OBJECT *) &pointer) + 1;

or to the standard C construction for this type of operation, which is

pointer = (char *) ((OBJECT) pointer + 1);

But in any implementation in which casting of pointers works at all, I think
that these should almost certainly give the same result. At any rate they
are all legal C.

>or use unions like God intended. Don't try to pervert casts to do
>something they weren't intended for. Casts convert, unions take-as.
>The take-as operation is inherently non-portable.

But the point is that casts of pointers *don't* convert. Either they
simply take-as, or they are meaningless. So, if the language allows casting
of pointers, then I see no valid reason to complain when the programmer uses
this feature (especially since essentially all C implementations make it
impossible to avoid when using any sort of dynamic memory allocation!).
And if he is allowed to use casting, why force him to write *((foo *) &x) =
when (foo) x = will do? At any rate, I think that the answer to the question
"*Must* casting destroy lvalues?" is clearly "No."

-- David desJardins

ste...@datacube.uucp

Oct 23, 1986, 9:19:00 AM

I generally use or define caddr_t as the type of a generic pointer, and then
use macros to perform the indicated operations, i.e.:

caddr_t p;                      /* the generic pointer */
int a;
double b;

#define DATA( p, type ) (*((type *)(p)))
#define SUCC( p, type ) ((p) += sizeof(type))
#define PRED( p, type ) ((p) -= sizeof(type))

a = DATA(p, int);
SUCC(p, int);
b = DATA(p, double);
SUCC(p, double);

In the case of a compiler where the cast in DATA is invalid, an
alternate formulation can be made.


Stephen Watkins UUCP: ihnp4!datacube!stephen
Datacube Inc.; 4 Dearborn Rd.; Peabody, Ma. 01960; 617-535-6644

David desJardins

Oct 23, 1986, 10:11:12 PM
In article <657@dg_rtp.UUCP> throopw@dg_rtp.UUCP (Wayne Throop) writes:
>Let's take a similar example. Would you expect
>
> int i;
> ((short)i)++;
>
>to do anything sensible? If so, why?

In my opinion this should have the result

* ((short *) &i) = (short) i + 1;

Obviously the result of this operation is machine-dependent (since the effect
of casting int to short is machine-dependent). But on an appropriate machine
this does indeed have not only a sensible but a *useful* effect -- it will
increment the low bytes of i without carrying into the high bytes.
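
For example (assuming a little-endian machine with 16-bit shorts and
32-bit ints, and using unsigned short so the wraparound is well-behaved;
none of this is portable, which is exactly the point):

    #include <stdio.h>

    int main(void)
    {
        int i = 0x0001FFFF;

        *(unsigned short *)&i += 1;    /* bump only the low 16 bits */
        printf("%#x\n", i);            /* prints 0x10000 on such a machine */
        return 0;
    }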

At any rate the casting of int to short performs a fundamentally different
operation than does casting of pointers (in Wayne's terminology, the former
"converts" and the latter "takes-as"), and so it is not necessary for one to
make sense in order for the other to be allowed.

Note also that on most machines the statement

int *p;
((int) p)++;

makes sense (it increments the address referenced by p by the addressing unit
of the machine). In fact this is arguably the correct way to use the value
produced by 'sizeof':

(int) p += sizeof (foo);

makes sense on any machine where 'sizeof' gives results in multiples of the
addressing unit of the machine, whereas the more common alternative

p = (int *) ((char *) p + sizeof (foo));

is both clumsier and makes the unnecessary assumption that sizeof (char) == 1.


Another justification for casting of lvalues is the case of register
variables. In this case Wayne's alternative syntax doesn't work:

register int *p;
(* ((foo **) &p))++; <== ERROR

But the idea of casting (or "taking-as") the pointer p as a pointer to foo is
still perfectly valid, and the proposed syntax

((foo *) p)++;

still makes sense, and can be understood and implemented by a compiler.

-- David desJardins

Bob Larson

Oct 25, 1986, 10:15:44 AM
In article <5...@cartan.Berkeley.EDU> desj@brahms (David desJardins) writes:

[some stuff so wrong that I decided to reply]

>In article <46...@brl-smoke.ARPA>, NET-RELAY.ARPA>@brl-smoke.ARPA writes:
>> For example, a useful and readable way to move a pointer
>> through a buffer containing a mixture of objects of different sizes is
>>
>> ((OBJECT)pointer)++

Here is a major misconception, based on a limited set of machines and a
limited imagination:


> But the point is that casts of pointers *don't* convert.

Repeat until remembered:

CASTS DO ANY NECESSARY CONVERSION.

There are machines with more than one type of pointer. C makes no
assumptions that require there to be only one type of pointer.
"Byte" pointers on a PDP-10 include both the bit position and size of
the object being pointed to. There are quite a few machines with
different "word" and "character" pointers.

To do what the original poster wanted (as portably as possible):

(pointer = (type_of_pointer) ((char *)pointer + sizeof(OBJECT)))

It says what you mean. It is no more ugly than what you are trying to
do. The explicit casts are obvious places to look for portability
problems.
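
If you find yourself writing that incantation often, wrap it up once
(a sketch; the names are invented):

    /* Advance p past one OBJECT; the casts mark the non-portable spots. */
    #define STEP_PAST(p, type_of_pointer, OBJECT) \
        ((p) = (type_of_pointer)((char *)(p) + sizeof(OBJECT)))

    /* e.g. STEP_PAST(pointer, struct thing *, struct thing); */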

[ If casts did not implicitly lose the lvalueness of the expression,
can you tell me what the following code fragment should do? Please
explain in terms the beginning C programmer who meant f += (float)i;
could understand, as well as the non-portable C coder who means
*(int *)&f = i;

float f = 0.1;
int i = 2;
(int)f += i;
]
--
Bob Larson
Arpa: Bla...@Usc-Eclb.Arpa or bla...@usc-oberon.arpa
Uucp: (ihnp4,hplabs,tektronix)!sdcrdcf!usc-oberon!blarson

Wayne Throop

Oct 27, 1986, 4:48:50 PM
David is very misinformed on this question. It is apparently an easy
thing to do, since mis- and dis- information about casts and pointers in
C is so common. Never fear, I'll point out, case by case, where he went
wrong. :-)

> desj@brahms (David desJardins)
>> throopw@dg_rtp.UUCP (Wayne Throop)
>>> Stuart Gathman

>>>> ((OBJECT)pointer)++


>>>A more correct syntax:
>>> char *pointer;

>>> pointer += sizeof (OBJECT);


>>>And clearer to boot if you ask me.
> Wrong if you ask me. First, in the original example it is clear that
> OBJECT is a pointer type. And second, your sample code does not work, and
> can not readily be fixed, if sizeof (char) != 1.

True, true. But ANSI is likely to decide that sizeof(char) MUST ALWAYS
BE one (and I think this is universally true on existing
implementations... if I'm wrong, somebody let me know). The relevant
incantation from the draft standard (3.3.3.4):

The sizeof operator yields the size (in bytes) of its operand, which
may be an expression or the parenthesized name of a type. [...]
When applied to an operand that has type char, unsigned char, or
signed char, the result is 1.

In order to make Stuart's method work (since OBJECT is a pointer type,
as David correctly points out) one must (oddly enough) say:

pointer += sizeof( *((OBJECT)0) );


Now, in his critique of my earlier posting, David falls into *really*
serious error, as follows:

>>I sayeth that if thou wishest to taketh an object as a different typeth,
>>thou mayest do so. However, casts are not the way to do this in C, and
>>the practice is not portable. If you must take the bits of one pointer
>>type as being those of another pointer type, use
>> (*((some_type **)&p))++
> In C, pointers and lvalues are the same thing (there is a bijection given
> by & and *).

False. The "&" operator does not work on bit fields nor register
values, and yet these are lvalues. In fact, close reading of K&R, H&S,
and the draft ANSI standard make it clear that not all legal lvalues map
to legal pointer typed expressions (via &), nor do all legal pointer
typed expressions map to legal lvalues (via *).
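
If you want to watch the claimed bijection fail, feed these to a
compiler (the commented-out lines are the constraint violations):

    struct flags { unsigned ready : 1; };

    void demo(void)
    {
        struct flags fl;
        register int r;

        fl.ready = 1;    /* a bit field is a perfectly good lvalue...  */
        /* &fl.ready; */ /* ...but "&" on it is illegal                */
        r = 2;           /* likewise a register variable...            */
        /* &r; */        /* ...which has no takeable address either    */
    }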

> Essentially, = (and the other assignment operators) automatic-
> ally take the address of their left-hand arguments, in much the same way that
> setq/set! quote their first arguments.

Again, false, and for about the same reasons. The notion of an
assignment operator implicitly quoting the assignee is nice, somewhat
elegant, and a familiar notion in LISP. But don't be fooled, folks. C
is not now and has never been LISP, and C does not have this simple,
elegant, unifying notion. C has "lvalues" instead.

> [expansion of this misunderstanding of lvalue, omitted]


> So I would conclude that the statement ((OBJECT) pointer)++ should be
> interpreted as
> * ((OBJECT *) &pointer) = (OBJECT) pointer + 1;

And you would be wrong.

> [more discussion based on the misunderstanding of lvalue, omitted]


> At any rate they are all legal C.

It is important to note that the original construct, ((OBJECT)pointer)++
is *DEFINITELY* *NOT* legal C. (It is not clear to me whether David is
claiming that it IS legal... in any case it is important to note that it
is NOT.)

>> [...] use unions like God intended. Don't try to pervert casts to do


>>something they weren't intended for. Casts convert, unions take-as.
>>The take-as operation is inherently non-portable.
> But the point is that casts of pointers *don't* convert. Either they
> simply take-as, or they are meaningless.

Absolutely wrong. Casts *ALWAYS* convert. Harbison and Steele say (on
page 152):

The cast causes the operand value to be converted to the type named
within the parentheses. Any permissible conversion may be invoked
by a cast expression.

The draft ANSI document says (3.3.4):

Preceding an expression by a parenthesized type name converts the
value of the expression to the named type.

K&R say similar things in several places. (I'll let y'all look those up
by y'selfs.)

Another problem in the above passage is that David seems to have a
serious case of "pointers is pointers" disease. Pointers to different
types may (and often do) have *COMPLETELY* *DIFFERENT* bit-wise formats.
Thus, the notion of converting an (int *) typed pointer to a (char *)
typed pointer is hardly "meaningless". On the DG MV architecture (to
randomly choose an example I'm modestly familiar with) these two pointer
types have the "ring field" in different locations, and one has an
indirect bit that the other lacks. A pointer to a given word needs to
be *CONVERTED* *TO* *A* *DIFFERENT* *FORMAT* to be a pointer to the
first byte in that word. This is just what casts were intended for
(first and foremost in the arithmetic types, but clearly useful and
necessary for pointers also).

David, I'm not sure who told you "casts of pointers don't convert".
Find whoever told you that base canard, and pummel some sense into the
miscreant, willya? You have been severely misled.

> So, if the language allows casting
> of pointers, then I see no valid reason to complain when the programmer uses
> this feature (especially since essentially all C implementations make it
> impossible to avoid when using any sort of dynamic memory allocation!).

Yes, casting is necessary to write a memory allocator in any even nearly
portable way. *BUT*, casts are *STILL* conversions, *NOT*, *NOT*, *NOT*
taken-ases.
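
The legitimate caller-side case looks like this (1986-style declaration;
the cast is an honest conversion from (char *)):

    char *malloc();

    int *make_table()
    {
        return (int *) malloc( 100 * sizeof(int) );
    }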

> And if he is allowed to use casting, why force him to write *((foo *) &x) =
> when (foo) x = will do?

Granting the implicit hypothetical, no reason, of course.
But (foo)x *won't* do.

> At any rate, I think that the answer to the question
> "*Must* casting destroy lvalues?" is clearly "No."

Well, waxing philosophical, I agree. That is, I rather expect that one
can come up with some consistent set of semantics that will give meaning
to the notion of a cast as an lvalue. But it is well to keep in mind:
(deep breath, all together now) these semantics won't be *C* semantics.

--
"Pwease Mistew Game Wawden, ... can you teww me what season it WEAWWY is?"
"Why SOIT'NY, m'boy! It's BASEBALL season!"
--- (Elmer and Bugs, of course)

Chris Torek

Oct 28, 1986, 5:39:40 PM
>In article <657@dg_rtp.UUCP> throopw@dg_rtp.UUCP (Wayne Throop) writes:
>> (*((some_type **)&p))++

In article <5...@cartan.Berkeley.EDU> desj@brahms (David desJardins) writes:

>... the point is that casts of pointers *don't* convert.

Yes they do, and Wayne Throop has to know it: His machine does
indeed convert. `char *' is 48 bits; `int *' is 32 bits. DG
machines use word pointers, except when dealing with bytes.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: ch...@mimsy.umd.edu

John Gilmore

Oct 29, 1986, 12:41:23 AM
One interesting thing for people who don't understand this issue to do is
to ask themselves what the compiler should do for a construct like:

int i = 17;
((double)i)++;

Again, don't post your favorite answer to the net. Just think about it.

PS: When you're done that one, try:

double i = 17;
((int)i)++;
--
John Gilmore {sun,ptsfa,lll-crg,ihnp4}!hoptoad!gnu jgil...@lll-crg.arpa
Overheard at a funeral: "I know this may be an awkward time, but do
you recall him ever mentioning source code?" -- Charles Addams

David desJardins

Oct 29, 1986, 8:15:04 PM
In article <40...@umcp-cs.UUCP> ch...@umcp-cs.UUCP (Chris Torek) writes:
>In article <5...@cartan.Berkeley.EDU> desj@brahms (David desJardins) writes:
>>... the point is that casts of pointers *don't* convert.
>
>Yes they do, and Wayne Throop has to know it: His machine does
>indeed convert. `char *' is 48 bits; `int *' is 32 bits. DG
>machines use word pointers, except when dealing with bytes.

My apologies; I misspoke. What I meant to say is that casts of pointers
refer to the same physical locations, although the representations of pointers
of different types may be different.

-- David desJardins

John Sambrook

Oct 31, 1986, 12:42:26 PM

Just a small correction. On the MV series both types of pointers are
32 bits. It is also true that character pointers have a different
format than pointers to other types. Casting from one type to the other
on the MV series requires a conversion.

--
John Sambrook Work: (206) 545-7433
University of Washington WD-12 Home: (206) 487-0180
Seattle, Washington 98195 UUCP: uw-beaver!uw-nsr!john

Doug Gwyn

Nov 3, 1986, 12:53:04 PM
In article <663@dg_rtp.UUCP> throopw@dg_rtp.UUCP (Wayne Throop) writes:
>True, true. But ANSI is likely to decide that sizeof(char) MUST ALWAYS
>BE one (and I think this is universally true on existing
>implementations... if I'm wrong, somebody let me know).

X3J11 as it stands requires sizeof(char)==1. I have proposed that
this requirement be removed, to better support applications such as
Asian character sets and bitmap display programming. Along with
this, I proposed a new data type such that sizeof(short char)==1.
It turns out that the current draft proposed standard has to be
changed very little to support this distinction between character
objects (char) and smallest-addressable objects (short char). This
is much better, I think, than a proposal that introduced (long char)
for text characters.

Unfortunately, much existing C code believes that "char" means "byte".
My proposal would allow implementors the freedom to decide whether
supporting this existing practice is more important than the benefits
of making a distinction between the two concepts.

It is possible to write code that doesn't depend on sizeof(char)==1,
and some C programmers are already careful about this. Transition
to the more general scheme would occur gradually (if at all) for
existing C implementations, with only implementors of systems for
the Asian market and of bitmap display architectures initially taking
advantage of the opportunity to make these types different sizes.

Wayne Throop

Nov 3, 1986, 5:21:43 PM
> desj@brahms (David desJardins)
>> throopw@dg_rtp.UUCP (Wayne Throop)

>>Let's take a similar example. Would you expect


>> int i;
>> ((short)i)++;
>>to do anything sensible? If so, why?
> In my opinion this should have the result
> * ((short *) &i) = (short) i + 1;

"Opinion." OK. Fine. But David's opinion clearly and trivially differs
from that of the folks who designed and implemented the C language. In
particular, David's interpretation of casts has them sometimes
converting, and sometimes taking-as. K&R, H&S, and the draft X3J11
standard are all as clear as they can be... casts *ALWAYS* convert.

> Obviously the result of this operation is machine-dependent (since the effect
> of casting int to short is machine-dependent).

Casting is *NOT* machine dependent in anywhere near the same sense that
taking the bits of an integer as if they were a short is. Again, David
has a fundamental misunderstanding of what it is that a cast does. It
*ALWAYS* converts. Casting an int to a short is machine dependent in
the limit, but in the case where the two types share range, the result
is dependable and machine independent. The draft X3J11 standard even
outlines some guarantees on what this range of portable casting is.

> But on an appropriate machine
> this does indeed have not only a sensible but a *useful* effect -- it will
> increment the low bytes of i without carrying into the high bytes.

Oh, please! Just because some illegal construction can be made to do
something useful on some machine-or-other is no reason to attempt to
legislate it so. Especially when this operation can be specified with
legal C constructs.

> At any rate the casting of int to short performs a fundamentally different
> operation than does casting of pointers (in Wayne's terminology, the former
> "converts" and the latter "takes-as"), and so it is not necessary for one to
> make sense in order for the other to be allowed.

False, false, false! Pointers have differing bit-wise formats, just as
arithmetic types do. The common examples are the word-addressed
machines, where the byte or character address format differs from the
"natural" architectural pointer format.

> In fact this is arguably the correct way to use the value
> produced by 'sizeof';
> (int) p += sizeof (foo);

One can argue it. And anybody that did so would be wrong. A pointer
taken-as an integer need not address in sizeof-unit-sized chunks. In
fact, there are many machines where this is not the case. Further,
there are machines where pointers aren't even the same *SIZE* as
integers. Surely these two trivial, well-known facts should point out
some flaws in any such argument?

> Another justification for casting of lvalues is the case of register
> variables. In this case Wayne's alternative syntax doesn't work:
> register int *p;
> (* ((foo **) &p))++; <== ERROR
> But the idea of casting (or "taking-as") the pointer p as a pointer to foo is
> still perfectly valid, and the proposed syntax
> ((foo *) p)++;
> still makes sense, and can be understood and implemented by a compiler.

Rubbish. David just argued in a previous note that casts retaining
lvalueness was justified because lvalues and addresses are "the same
thing". Now he claims that register values further support this
ludicrous position, despite the fact that they are a counterexample to
his previous justification!! Puhleeeze!


I'm trying hard to be civil here, honest I am, but really! If somebody
wants to act like a C guru, it might help to learn some C first. Get
this straight: casts convert. Unions take-as. A cast of a pointer can
take-as the bits it points to, BUT NOT THE POINTER ITSELF! The take-as
operation is completely machine dependent, and code using it is rendered
non-portable by this use. This is trivial, basic stuff folks. You
can't even *BEGIN* to understand C until you get *THIS* straight.

--
"Don't do this at home, kids..."

Wayne Throop

Nov 4, 1986, 11:23:56 AM
> ch...@umcp-cs.UUCP (Chris Torek)
>> desj@brahms (David desJardins)

>>... the point is that casts of pointers *don't* convert.
> Yes they do, and Wayne Throop has to know it: His machine does
> indeed convert. `char *' is 48 bits; `int *' is 32 bits. DG
> machines use word pointers, except when dealing with bytes.

Chris is right in the grand sweep (as usual) but has made a little
mistake with the details (not very usual at all).

DG MV machines indeed have differing formats for (char *) and (int *)
pointers, but both are 32 bits. It's somebody else that has the 48-bit
character pointers (can't call to mind who that is just now). In fact,
there are three architecturally supported pointer types in the DG MV
world, word (16-bit-granular) pointers, byte (8-bit-granular) pointers,
and bit (1-bit-granular) pointers. These pointers indicate locations in
a 4-gigabyte address space of eight rings. The rings are the MV memory
protection method, as well as the method of regulating OS privileges.
The first two pointer formats fit into 32 bits, and are the only formats
that are used as pointer types in C. The third type of pointer requires
64 bits (more than strictly necessary).

For those not already bored, the three formats are like so:

               1         2         3               1         2         3
      12345678901234567890123456789012    12345678901234567890123456789012

word  IRRRWWWWWWWWWWWWWWWWWWWWWWWWWWWW
byte  RRRBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
bit   IRRRWWWWWWWWWWWWWWWWWWWWWWWWWWWW    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

'I' is the indirect bit, 'RRR' is the ring number, 'WWW...' is offset in
words, 'BBB...' is offset in bytes, 'XXX...' is offset in bits.
(Obviously, a lot of redundant offset in that bit pointer. Basically,
it forms a word address first, and then a bit offset from there.)

Clearly, pointer casting on an MV machine *NEEDS* to be a conversion,
not a taken-as. And other word-addressed machines have similar
requirements.

--
"Don't try this at home, kids..."

Guy Harris

Nov 4, 1986, 4:24:05 PM
> X3J11 as it stands requires sizeof(char)==1. I have proposed that
> this requirement be removed, to better support applications such as
> Asian character sets and bitmap display programming. Along with
> this, I proposed a new data type such that sizeof(short char)==1.
> It turns out that the current draft proposed standard has to be
> changed very little to support this distinction between character
> objects (char) and smallest-addressable objects (short char). This
> is much better, I think, than a proposal that introduced (long char)
> for text characters.

Why? If this is the AT&T proposal, it did *not* "introduce (long char) for
text characters"; it introduced (long char) for *long* text characters.
"char" is still to be used when processing text that does not include long
(16-bit) characters. I believe the theory here was that requiring *all*
programs that process text ("cat" doesn't count; it doesn't - or, at least,
shouldn't - process text) to process them in 16-bit blocks might cut their
performance to a degree that customers who would not use the ability to
handle Kanji would find unacceptable. I have seen no data to confirm or
disprove this.

(Changing the meaning of "char" does not directly affect the support of
"bitmap display programming" at all. It only affects applications that
display things like Asian character sets on bitmap displays, but it doesn't
affect them any differently than it affects applications that display them
on "conventional" terminals that support those character sets.)

> Unfortunately, much existing C code believes that "char" means "byte".
> My proposal would allow implementors the freedom to decide whether
> supporting this existing practice is more important than the benefits
> of making a distinction between the two concepts.

Both "short char"/"char" and "char"/"long char" make a distinction between
the two concepts; one may have aesthetic objections with the way the latter
scheme draws the distinction, but that's another matter. (Is 16 bits enough
if you want to give every single character a code of its own?)

> It is possible to write code that doesn't depend on sizeof(char)==1,
> and some C programmers are already careful about this.

It is possible to write *some* code so that it doesn't depend on
sizeof(char)==1. Absent a data type one byte long, other code is difficult
at best to write this way.

> Transition to the more general scheme would occur gradually (if at all) for
> existing C implementations, with only implementors of systems for
> the Asian market and of bitmap display architectures initially taking
> advantage of the opportunity to make these types different sizes.

I think "if at all" is appropriate here. There are a *lot* of interfaces
that think that "char" is a one-byte data type; e.g., "read", "write", etc.
I see no evidence that converting existing code and data structures to use
"short char" would be anything other than highly disruptive.

Adding "long char" would permit new programs to be written to support long
characters, and permit existing programs to be rewritten to support them,
without breaking existing programs; this indicates to me that it would make
it much more likely that "long char" would be widely adopted and used than
that "short char" would. I see no reason why a proposal that would, quite
likely, lead to two different C-language environments existing in parallel
for a long time to come is superior to one that would permit environments to
add on the ability to handle long characters and thus would make it easier
for them to do so and thus more likely that they would. (This is especially
true when you consider that most of the programs in question would have to
be changed quite a bit to support Asian languages *anyway*; just widening
"char" to 16 bits, recompiling them, and linking them with a library with a
brand new standard I/O, etc. would barely begin to make them support those
languages.)
--
Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
g...@sun.com (or g...@sun.arpa)

Daniel R. Levy

Nov 5, 1986, 12:53:48 AM
In article <51...@brl-smoke.ARPA>, gw...@brl-smoke.ARPA (Doug Gwyn ) writes:
>X3J11 as it stands requires sizeof(char)==1. I have proposed that
>this requirement be removed, to better support applications such as
>Asian character sets and bitmap display programming. Along with
>this, I proposed a new data type such that sizeof(short char)==1.
>It turns out that the current draft proposed standard has to be
>changed very little to support this distinction between character
>objects (char) and smallest-addressable objects (short char). This
>is much better, I think, than a proposal that introduced (long char)
>for text characters.
>
>Unfortunately, much existing C code believes that "char" means "byte".
>
>It is possible to write code that doesn't depend on sizeof(char)==1,
>and some C programmers are already careful about this.

A question: what about the jillions of C programs out there which
declare "char *malloc()"? Will they all need to be changed? Common
sense says no, since malloc() is supposed to return a "maximally aligned"
address anyhow, so as far as anyone cares it could be declared float * or
double * or short int * or (anything else)* if malloc() in the malloc() code
itself were declared the same way. So if "char" happened to be a two byte
quantity, no sweat, right? Or was there any particular reason for declaring
malloc() to be a "char *"? And thus, might something break in malloc() or
the usage thereof if char might no longer be the smallest addressable quantity?
--
------------------------------- Disclaimer: The views contained herein are
| dan levy | yvel nad | my own and are not at all those of my em-
| an engihacker @ | ployer or the administrator of any computer
| at&t computer systems division | upon which I may hack.
| skokie, illinois |
-------------------------------- Path: ..!{akgua,homxb,ihnp4,ltuxa,mvuxa,
go for it! allegra,ulysses,vax135}!ttrdc!levy

Jerry Leichter

Nov 5, 1986, 1:35:34 PM

All this has really gotten out of hand. I've found I've wanted to use casts
as lvalues in exactly one situation: Incrementing a pointer by an amount
that is NOT to be taken as a multiple of the size of the type pointed to.
A "common" - NONE of these are what I'd really want to call common, but I
HAVE run into them more than once - case is in a (machine dependent!) storage
allocator that wants to make sure of proper alignment. This doesn't arise
in something like malloc, which wants a char * (void *, in ANSI C) anyway,
but in specialized applications. For example, I need to allocate stuff out
of a large array. The "stuff" will be of a known type - a struct whose last
element is a varying-length array - and will thus be varying in size. The
beginning of anything allocated must be on an even address. So I have a pointer
to some big structure that I'd like to increment by 1. Not 1*size of the
structure, but 1. YES, THIS IS MACHINE DEPENDENT - I got tied down by such
a dependency when I cast the pointer to int to look at the bottom bit!

I can think of no particular use for casting of arbitrary lvalues, but in
situations as above, the following definition for a cast argument to op=
would be handy:

(type)a op= b

shall mean:

a = (type)a op b

(except that a is only evaluated once).

Pre- and post-decrement and increment should work in the obvious way. Note
that the type of (type)a op= b (and of ++(type)a, etc.) is the type of a,
NOT the type being cast to.

I can think of no real uses for this construct where "op" is anything but
"+" or "-".
-- Jerry

Lambert Meertens

Nov 6, 1986, 3:07:22 AM (to rnews@mcvax)
I still don't get what is so fundamentally wrong with the following
*addition* to C, which it seems to me that desj@brahms (David desJardins)
is arguing for:

A cast (T)E in a context where an lvalue is required
stands for *(T *)&(E) and so is lawful if the latter
would be allowed here.

For example, this would then make the following lawful:

int i;
char *p = (char *)&i;
++(int)*p;

At least, in `my' cc (BSD 4.3) everything works as expected if I

#define Lcast(T,E) (*(T*)&(E))

and then use

++Lcast(int, *p);

(I don't know if this is allowed by the ANSI C draft.) Note that I am not
particularly arguing in favour of this extension, the fact being that this
is one of the areas where I can live with current C. But neither do I see
a reason to get upset about it. It allows you to write pretty meaningless
things, but that is the case already for current casts. One thing that is
certain is that it does not invalidate existing code. Also, it seems
consistent to me with the design philosophy of C: start by putting some
very confusing things in (pointers vs. arrays, here pointers vs. lvalues),
then legitimate some of the most obvious unambiguous mistakes.
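
A complete version of the experiment, for anyone who wants to try their
own cc (it assumes p still points at the first byte of i):

    #include <stdio.h>

    #define Lcast(T,E) (*(T*)&(E))

    main()
    {
        int i = 0;
        char *p = (char *)&i;

        ++Lcast(int, *p);     /* i.e. ++*(int *)&*p */
        printf("%d\n", i);    /* prints 1 */
        return 0;
    }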

--

Lambert Meertens, CWI, Amsterdam; lam...@mcvax.UUCP

Mike (Don't have strength to leave) Meyer

Nov 6, 1986, 8:18:38 PM
In article <1...@olamb.UUCP> ki...@olamb.UUCP (Kim Chr. Madsen) writes:
>Why not take the full step and let the datatype char be of variable size,
>like int's and other types. Then invent the datatype ``byte'' which is exactly
>8 bits long.

Ok, so what should those with C compilers on the QM/C (18 bit words,
word addressable) or the C/70 (20 bit words, two 10-bit address units
per word) do, hmmm? And yes, there are C compilers for those two
machines.

Not only is all the world not a VAX, it all isn't even addressable in
eight-bit units!

<mike

Doug Gwyn

Nov 6, 1986, 9:18:34 PM
Guy missed the meaning of my reference to bitmap display programming.
What I really care about in this context is support for direct bit
addressing. I know for a fact that one reason we don't HAVE this on
some current architectures is the lack of access to the facility from
high-level languages. I would like it to be POSSIBLE for some designer
of an architecture likely to be used for bit-mapped systems to decide
to make bits directly addressable. I know I have often wished that I
had bit arrays in C when programming bitmap display applications.

The 8-bit byte was an arbitrary packaging decision (made by IBM for
the System/360 family, by DEC for the PDP-11, and by some others, but
definitely not by EVERY vendor). There are already some 9-, 10-, and
12-bit oriented C implementations; I would like to give implementors
the OPTION of choosing to use 16-bit (char)s even if their machine can
address individual 8-bit bytes or even individual bits.

The idea of a "character" is that of an individually manipulable
primitive unit of text. The idea of "byte" is that of an individually
addressable unit of storage. From one point of view, it doesn't matter
what the two basic types would be called if and when this distinction is
made in the C language. However, in X3J11 practically everything that
now refers to (char) arrays is designed principally for text application,
while practically everything that refers to arbitrary storage uses
(void *), not (char *). (The one exception is strcoll(), which
specifically produces a (char[]) result; Prosser and I discussed this
and agreed that this was acceptable for its intended use. In a good
implementation using my (char)/(short char) distinction, it would be
POSSIBLE to maintain a reasonable default collating sequence for (char)s
so that a kludge like strcoll() would not normally be necessary.)
Using (long char) for genuine text characters would conflict with
existing definitions for text-oriented functions, which is the main
reason I decided that (char) is STILL the proper type for text units.

I realize that many major vendors in the international UNIX market
have already adopted "solutions" to the problem of "international
character sets"; however, each has taken a different approach! There
is nothing in my proposal to preclude an implementor from continuing
to force sizeof(char)==sizeof(short char) and preserving his previous
vendor-specific "solution"; however, what I proposed ALLOWS an
implementor to choose a much cleaner solution if he so desires,
without forcing him to if he prefers other methods, and it also allows
nybble- or bit-addressable architectures to be nicely supported at the
C language level. The trade-off is between more compact storage (as
in AT&T's approach) requiring kludgery to handle individual textual
units, versus a clean, simple model of characters and storage cells
that supports uncomplicated, straightforward programming.

It happens that the text/binary stream distinction of X3J11 fits the
corresponding character/byte distinction very nicely. The only wart
is for systems like UNIX that allow mixing of text-stream operations,
such as scanf(), with binary-stream operations, such as fread(); there
is a potential alignment problem in doing this. (By the way, I also
propose new functions [f]getsc()/[f]putsc() for getting/putting single
(short char)s; this is necessary for the semantic definition of
fread()/fwrite() on binary streams. In my original proposal these
were called [f]getbyte()/[f]putbyte(), but the new names are better.)
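
By analogy with fgetc()/fputc(), I would expect declarations along these
lines (the signatures are my guess, not draft text):

    #include <stdio.h>

    int fgetsc( FILE *stream );          /* next (short char), or EOF      */
    int fputsc( int sc, FILE *stream );  /* write one (short char); sc/EOF */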

ANY C implementation that makes a real distinction between characters
and bytes is going to cause problems for people porting their code
to it. The choices are, first, whether to ever make such a distinction,
and second, if so, how to do so. I believe the distinction is
important, and much prefer a clean solution over one that requires
programmers to convert text data arrays back and forth, or to keep
track of two sets of otherwise identical library functions. As with
function prototypes, a transition period can exist during which (char)
and (short char) have the same size, which is no worse than the current
situation, and implementors could choose when if ever to split these
types apart.

Please note that there is not much impact of my proposal on current
good C coding practice; for example, the following continue to work
no matter what choices the C implementor has made:

struct foo bar[SIZE], barcpy;
unsigned nelements = sizeof bar / sizeof bar[0];
fread( bar, sizeof(struct foo), SIZE, fp );
fread( bar, sizeof bar, 1, fp );
memcpy( &barcpy, &bar[3], sizeof(struct foo) );
/* the above requires casting anyway if prototype not in scope */

char str[] = "text";
printf( "\"%s\" contains %d characters\n", str, strlen( str ) );

While it is POSSIBLE to run into problems, such as in using the
result of strlen() as the length of a memcpy() operation, these
don't arise so often that it is hopeless to make the transition.
One thing for sure, if we don't make the character/byte distinction
POSSIBLE in the formal ANSI C standard, it will be too late to do
it later. The absolute minimum necessary is to remove the
requirement that sizeof(char)==1 from the standard, although this
opens up a hole in the spec that needs plugging by a proposal like
mine (X3J11/86-136, revised to fit the latest draft proposed standard
and to change the names of the primitive byte get/put functions).

Doug Gwyn

Nov 6, 1986, 9:37:23 PM
In article <12...@ttrdc.UUCP> le...@ttrdc.UUCP (Daniel R. Levy) writes:
>A question: what about the jillions of C programs out there which
>declare "char *malloc()"? Will they all need to be changed? Common
>sense says no, since malloc() is supposed to return a "maximally aligned"
>address anyhow, so as far as anyone cares it could be declared float * or
>double * or short int * or (anything else)* if malloc() in the malloc() code
>itself were declared the same way. So if "char" happened to be a two byte
>quantity, no sweat, right? Or was there any particular reason for declaring
>malloc() to be a "char *"? And thus, might something break in malloc() or
>the usage thereof if char might no longer be the smallest addressable quantity?

X3J11 malloc() returns type (void *) anyway, so this is already an issue
independently of the multi-byte (char) issue. The answer is, on most
machines the old (char *) declaration of malloc() will not result in
broken code under X3J11, but it is POSSIBLE that it would break under
some X3J11 implementations (one assumes that the implementer will take
pains to keep this from happening if at all possible).

Under the multi-byte (char) proposal, malloc() still returns (void *)
and is not affected at all by the proposal. sizeof() still returns
the number of primitive storage cells occupied by a data object, which
is still the right information to feed malloc() as a parameter.
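
That is, allocation code written in this style stays right no matter how
big (char) gets (a sketch):

    extern void *malloc();

    char *make_text(n)
    unsigned n;
    {
        /* sizeof counts storage cells, which is what malloc() wants: */
        return (char *) malloc( n * sizeof(char) );
    }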

The X3J11 draft proposed standard as it now stands has actually managed
to enforce a rather clean distinction between (char) data and arbitrary
data. The additional changes to the draft to introduce a separate data
type for the smallest addressable storage unit are really very minor.

Doug Gwyn

Nov 6, 1986, 9:44:10 PM
In article <1...@olamb.UUCP> ki...@olamb.UUCP (Kim Chr. Madsen) writes:
>Why not take the full step and let the datatype char be of variable size,
>like int's and other types. Then invent the datatype ``byte'' which is exactly
>8 bits long.

When fully elaborated to address the related issues, this idea differs
from what I have proposed in only two fundamental ways:
(1) no support for smallest addressable chunk sizes other than
8 bits;
(2) introduction of a new keyword, one likely to be in heavy
use in existing carefully-written C code.

Guy Harris

Nov 7, 1986, 7:21:56 PM
> Guy missed the meaning of my reference to bitmap display programming.
> What I really care about in this context is support for direct bit
> addressing.

I am not at all convinced that anybody *should* care about this, at least
from the standpoint of bitmap display programming. If a vendor permits you
to bang bits on a display, they should provide you with routines to do this;
frame buffers are not all the same, and code that works well on one display
may not work well at all on another. Furthermore, some hardware may do some
bit-banging operations for you; if you approach the display at the right
level of abstraction, this can be done transparently, but not if you just
write into a bit array.

Furthermore, it's not clear that displays should be programmed at the
bit-array level anyway; James Gosling and David Rosenthal have made what I
consider a very good case against doing this (and no, I don't consider it a
good case just because I work at Sun and we're trying to push NeWS).

> I know for a fact that one reason we don't HAVE this on some current
> architectures is the lack of access to the facility from
> high-level languages.

If that is the case, then the architect made a mistake. If it's really
important, they can extend the language. Yes, this means a non-standard
extension; however, the only way to get it to be a standard extension is to
get *every* vendor to adopt it, regardless of whether they support bit
addressing or not. In the case of C, this means longer-than-32-bit "void *"
on lots of *existing* machines; I don't think the chances of this happening
are very good at all.

> I would like it to be POSSIBLE for some designer of an architecture
> likely to be used for bit-mapped systems to decide to make bits directly
> addressable.

It is ALREADY possible to do this. The architect merely has to avoid
thinking "if I can't get at this feature from unextended ANSI C, I shouldn't
put it in." The chances are very slim indeed that there will be a standard
way to do bit addressing in ANSI C, since this would require ANSI C to
mandate that all implementations support it, and would require ANSI C to be
rather more different from current C implementations that most vendors would
like.

> The idea of a "character" is that of an individually manipulable
> primitive unit of text.

As I've already pointed out, it is quite possible that there may be more
than one such notion on a system.

> However, in X3J11 practically everything that now refers to (char)
> arrays is designed principally for text application, while practically
> everything that refers to arbitrary storage uses (void *), not (char *).

However, you're now introducing a *third* type; when you are dealing with
arbitrary storage, sometimes you use "void *" as a pointer to arbitrary
storage and sometimes you use "short char" as an element of arbitrary
storage.

> In a good implementation using my (char)/(short char) distinction, it
> would be POSSIBLE to maintain a reasonable default collating sequence
> for (char)s so that a kludge like strcoll() would not normally be
> necessary.)

This is simply not true, unless the "normally" here is being used as an
escape clause to dismiss many natural languages as abnormal. Some languages
do *not* sort words with a character-by-character comparison (e.g., German).
One *might* give ligatures like "SS" "char" codes of their own - but you'd
have to deal with existing documents with two "S"es in them, and you'd
either have to convert them "on the fly" in standard I/O (in which case
you'd have to have standard I/O know what language the file was in) or
convert them *en bloc* when you brought the document over from a system with
8-bit "char"s. (Oh, yes, you'd still have to have standard I/O handle 8-bit
and 16-bit "char"s, and conversion between them, unless you propose to make
this new whizzy machine require text file conversion when you bring files
from or send files to machines with boring obsolete old 8-bit "char"s.)

Furthermore, I don't know how you sort words in Oriental languages, although
I remember people saying there *is* no unique way of sorting them.

> Using (long char) for genuine text characters would conflict with
> existing definitions for text-oriented functions, which is the main
> reason I decided that (char) is STILL the proper type for text units.

If you're going to internationalize an existing program, changing it to use
"lstrcpy" instead of "strcpy" is the least of your worries. I see no
problem whatsoever with having the existing text-oriented functions handle
8-bit "char"s. Furthermore, since not every implementation that supports
large character sets is going to adopt 16-bit "char"s, you're going to need
two sets of text-oriented functions in the specification anyway.

> The trade-off is between more compact storage (as in AT&T's approach)
> requiring kludgery to handle individual textual units, versus a clean,
> simple model of characters and storage cells that supports uncomplicated,
> straightforward programming.

What is this "kludgery"? You need two classes of string manipulation
routines. Big Deal. You need to convert some encoded representation in a
file to a 16-bit-character representation when you read the file, and
convert it back when you write it back. Big Deal. This would presumably be
handled by library routines. If you're going to read existing text files
without requiring them to be blessed by a conversion utility, you'll have
to do that in your scheme as well. You need to remember to properly declare
"char" and "long char" variables, and arrays and pointers to same. Big Deal.

I am not convinced that the "char"/"long char" scheme is significantly less
"clean", "simple", "uncomplicated", or "straightforward" than the "short
char"/"char" scheme.

> While it is POSSIBLE to run into problems, such as in using the
> result of strlen() as the length of a memcpy() operation, these
> don't arise so often that it is hopeless to make the transition.

Sigh. No, it isn't necessarily HOPELESS; however, you have not provided ANY
evidence that the various problems caused by changing the meaning of "char"
would be preferable to any disruption to the "clean" models caused by adding
"long char". (Frankly, I'd rather keep track of two types of string copy
routines and character types than keep track of all the *existing* code that
would have to have "char"s changed to "short char".)

Doug Gwyn

Nov 8, 1986, 9:02:04 AM
Guy is still missing my point about bitmap display programming;
I have NOT been arguing for a GUARANTEED PORTABLE way to handle
individual bits, but rather for the ability to do so directly
in real C on specific machines/implementations WITH THE FACILITY:
    typedef short char Pixel;   /* one bit for B&W displays */
                                /* fancy color frame buffers wouldn't
                                   use (short char) for this, but an
                                   inexpensive "home" model might */

    typedef struct {
        short x, y;
    } Point;

    typedef struct {
        Point origin, corner;
    } Rectangle;

    typedef struct {
        Pixel    *base;         /* NOT (Word *) */
        unsigned  width;        /* in Bits, not Words */
        Rectangle rect;
        /* obscured-layer chain really goes here */
    } Bitmap;                   /* does this look familiar? */
Direct use of Pixel pointers/arrays tremendously simplifies coding for
such applications as "dmdp", where one has to pick up typically six
bits at a time from a rectangle for each printer byte being assembled
(sometimes none of the six bits are in the same "word", no matter how
bits may have been clumped into words by the architect).

Now, MC68000 and WE32000 architectures do not support this (except for
(short char)s that are multi-bit pixels). But I definitely want the
next generation of desktop processors to support bit addressing. I am
fully aware that programming at this level of detail is non-portable,
but portable graphics programming SUCKS, particularly at the interactive
human interface level. Programmers who try that are doing their users
a disservice. I say this from the perspective of one who is considered
almost obsessively concerned with software portability and who has been
the chief designer of spiffy commercial graphic systems (and who
currently programs DMDs and REAL frame buffers, not Suns).

I'm well aware of the use of packed-bit access macros, thank you. That
is exactly what I want to get away from! The BIT is the basic unit of
information, not the "byte", and there is nothing particularly sacred
about the number 8, either. I agree that if you want to write PORTABLE
bit-accessing code, you'll have to use macros or functions, since SOME
machines/implementations will not directly support one-bit data objects.
That wasn't my concern.

Due to all the confusion, I'm recapitulating my proposal briefly:
ESSENTIAL:
    (1) New type: (short char), signedness as for (char).
    (2) sizeof(short char) == 1.
    (3) sizeof(char) >= sizeof(short char).
    (4) Clean up wording slightly to improve the
        byte (storage cell) vs. character distinction.
RECOMMENDED:
    (5) Fix character \-escapes so that larger numeric
        values are permitted in character/string constants
        on implementations where that is needed. The
        current 9/12 bit limit is a botch anyway.
    (6) Text streams read/write/seek (char)s, and
        binary streams read/write/seek (short char)s.
        This requires addition of fgetsc(), fputsc(),
        which are routines I think most system programmers
        have already invented under names like get_byte().
    (7) Add `b' size modifier for fscanf().

I've previously pointed out that this has very little impact on most
existing code, although I do know of exceptions. (Actually, until the
code is ported to a sizeof(short char) != sizeof(char) environment,
it wouldn't break in this regard. That port is likely to be a painful
one in any case, since it would probably be to a multi-byte character
environment, and SOMEthing would have to be done anyway. The changes
necessary to accommodate this are generally fewer and simpler under my
proposal than under a (long char)/lstrcpy() approach.)

As to whether I think that mapping to/from 16-bit (char) would be done
by the I/O support system rather than the application code, my answer
is: Absolutely! That's where it belongs. (AT&T has said this too,
on at least one occasion, taking it even so far as to suggest that the
device driver should be doing this. I assume they meant a STREAMS
module.)

I won't bother responding in detail on other points, such as use of
reasonable default "DP shop" collating sequences analogous to ASCII
without having to pack/unpack multi-byte strings. (Yes, it's true
that machine collating sequence isn't always appropriate -- but does
that mean that one never encounters computer output that IS ordered by
internal collating sequence? Also note that strcoll() amounts to a
declaration that there IS a natural multibyte collating sequence for
any single environment.) Instead I will simply assure you that I
have indeed thought about all those things (and more), have read the
literature, have talked with people working on internationalization,
and have even been in internationalization working groups. I spent the
seven hours driving back from the Raleigh X3J11 meeting analyzing why
people were finding these issues so complex, and discovered that much
of it was due to the unquestioned assumption that "16-bit" text had to
be considered as made of individual 8-bit (char)s. If one starts to
write out a BNF grammar for what text IS, it becomes obvious very
quickly that that is an unnatural constraint. Before glibly dismissing
this as not well thought out, give it a genuine try and see what it is
like for actual programming; then try ANY alternative approach and see
how IT works in practice.

If you prefer, don't consider my proposal as a panacea for such issues,
but rather as a simple extension that permits some implementers to
choose comparatively straightforward solutions while leaving all others
no worse off than before (proof: if one were to decide to make
sizeof(char) == sizeof(short char), that is precisely where we are now.)
What I DON'T want to see is a klutzy solution FORCED on all implementers,
which is what standardizing a bunch of simultaneous (long char) and (char)
string routines (lstrcpy(), etc.) would amount to. If vendors think it
is necessary to take the (long char) approach, the door is still open
for them to do so under my proposal (without X3J11's blessing), but
vendors who really don't care about 16-bit chars (yes, there are vendors
like that!) are not forced to provide that extra baggage in their
libraries and documentation.

The fact that more future CPU architectures may support tiny data types
directly in standard C than at present is an extra benefit from my
approach to the "multi-byte character" problem; it wasn't my original
motivation, but I'm happy that it turned out that way. (You can bet
that (short char) would be heavily used for Boolean arrays, for example,
if my proposal makes it into the standard; device-specific bitmap
display programming is by no means the only application that could
benefit from availability of a shorter type. I've seen many people
#define TINY for nybble-sized quantities, usually having to use a
larger size (e.g., (char)) than they really wanted.)

From the resistance he's been putting up, I doubt that I will convert
Guy to my point of view, and I'm fairly sure that many people who have
already settled on some strategy to address the multi-byte character
issue are not eager to back out the work they've already put into it.
However, since I've shown that a clean conceptual model for such text
IS workable, there's no excuse for continued claims that explicit
byte-packing and unpacking is the only way to go.
