
ctype.h and sizeof(int) == sizeof(char)


Peter Holzer

Jul 18, 1994, 4:38:05 PM
The `what a character' thread brought me to a new problem:

How do you use the functions in ctype.h in a strictly conforming
program?

Their argument is an int, and the value is restricted to the values
representable in an unsigned char and EOF.

If sizeof (char) < sizeof (int), this is fine, since all values
representable in an unsigned char are also representable in an int. But
if sizeof (char) == sizeof (int), half of the possible values in an
unsigned char cannot be represented in an int, and the conversion might
lead to an overflow.

Do we have to do something like

unsigned char uc;
if (uc <= INT_MAX && isprint (uc))

to determine whether uc is printable? In this case printable characters
above INT_MAX could not be detected.
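
Spelled out as a function, the guarded test might look like this (a
sketch; the function name is mine, and it simply reports characters
above INT_MAX as non-printable, which is exactly the loss described
above):

    #include <ctype.h>
    #include <limits.h>

    /* isprint() for an unsigned char, guarding against values that
     * are not representable in an int. */
    int uc_isprint(unsigned char uc)
    {
    #if UCHAR_MAX > INT_MAX
        if (uc > INT_MAX)
            return 0;   /* not representable in an int: give up */
    #endif
        return isprint(uc);
    }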

hp
--
_ | h...@vmars.tuwien.ac.at | Peter Holzer | TU Vienna | CS/Real-Time Systems
|_|_) |------------------------------------------------------------------------
| | | It's not what we don't know that gets us into trouble, it's
__/ | what we know that ain't so. -- Will Rogers

Norman Diamond

Jul 19, 1994, 12:55:11 AM
In article <30ep7d$e...@news.tuwien.ac.at> h...@vmars.tuwien.ac.at (Peter Holzer) writes:
>How do you use the functions in ctype.h in a strictly conforming program?
>Their argument is an int, and the value is restricted to the values
>representable in an unsigned char and EOF. [...] But if sizeof (char) ==
>sizeof (int), half of the possible values in an unsigned char cannot be
>represented in an int, and the conversion might lead to an overflow.

Not necessarily overflow, but explicitly undefined (page 103 lines 4 to 5
in ANSI Classic).

>Do we have to do something like
>unsigned char uc;
>if (uc <= INT_MAX && isprint (uc))
>to determine whether uc is printable? In this case printable characters
>above INT_MAX could not be detected.

Half-yes (yes you have to do it but...) and yes. I'd say the need for
a defect report is now "unambiguously" clear.
--
<< If this were the company's opinion, I would not be allowed to post it. >>
A program in conformance will not tend to stay in conformance, because even if
it doesn't change, the standard will. Force = program size * destruction.
Every technical corrigendum is met by an equally troublesome new defect report.

Nick Hounsome

Jul 21, 1994, 6:05:33 AM
In article <30ep7d$e...@news.tuwien.ac.at>
h...@vmars.tuwien.ac.at "Peter Holzer" writes:

>The `what a character thread' brought me to a new problem:
>
>How do you use the functions in ctype.h in a strictly conforming
>program?
>
>Their argument is an int, and the value is restricted to the values
>representable in an unsigned char and EOF.
>
>If sizeof (char) < sizeof (int), this is fine, since all values
>representable in an unsigned char are also representable in an int. But
>if sizeof (char) == sizeof (int), half of the possible values in an
>unsigned char cannot be represented in an int, and the conversion might
>lead to an overflow.

sizeof(char) == sizeof(int) does not imply that every int value is also
a char value.

Consider a machine that used ASCII characters but stored them in 16 bits
for performance reasons - there are still only 127 characters.

Note that you should not usually make assumptions about complements -
i.e. don't assume that just because it isn't a control char it must
be a printable char (or any char at all).


>
>Do we have to do something like
>
>unsigned char uc;
>if (uc <= INT_MAX && isprint (uc))
>
>to determine whether uc is printable? In this case printable characters
>above INT_MAX could not be detected.
>
> hp
>--
> _ | h...@vmars.tuwien.ac.at | Peter Holzer | TU Vienna | CS/Real-Time Systems
>|_|_) |------------------------------------------------------------------------
>| | | It's not what we don't know that gets us into trouble, it's
>__/ | what we know that ain't so. -- Will Rogers
>

--
Nick Hounsome

Peter Holzer

Jul 21, 1994, 3:14:22 PM
nhou...@wslint.demon.co.uk (Nick Hounsome) writes:

>In article <30ep7d$e...@news.tuwien.ac.at>
> h...@vmars.tuwien.ac.at "Peter Holzer" writes:

>>The `what a character thread' brought me to a new problem:
>>
>>How do you use the functions in ctype.h in a strictly conforming
>>program?
>>
>>Their argument is an int, and the value is restricted to the values
>>representable in an unsigned char and EOF.
>>
>>If sizeof (char) < sizeof (int), this is fine, since all values
>>representable in an unsigned char are also representable in an int. But
>>if sizeof (char) == sizeof (int), half of the possible values in an
>>unsigned char cannot be represented in an int, and the conversion might
>>lead to an overflow.

>sizeof(char) == sizeof(int) does not imply that every int value is also
>a char value.

>Consider a machine that used ASCII characters but stored them in 16 bits
>for performance reasons - there are still only 127 characters.

127 useful characters. There is nothing which prevents me from storing
60000 in a 16 bit unsigned char. If I want to check whether 60000 is a
printable character I have to convert it to int. The result of this
conversion is implementation-defined, so in order to avoid wrong
results I have to avoid it.

Now consider an implementation which does use 16-bit characters and
has more than 32768 printable characters (such as Unicode). If such a
system has 16-bit ints, there is no strictly conforming way (i.e., one
which does not rely on undefined or implementation-defined behaviour) to
identify all printable characters. Since I cannot find anything in the
standard that would forbid such an implementation, there seems to be
no way to reliably test whether a char is printable or not.

>Note that you should not usualy make assumptions about complements -
>i.e. don't assume that just because it isn't a control char it must
>be a printable char (or any char at all)

Where have I said anything about control characters?

Karl Heuer

Jul 22, 1994, 3:04:15 PM
In article <30ep7d$e...@news.tuwien.ac.at>
h...@vmars.tuwien.ac.at (Peter Holzer) writes:
>If sizeof (char) < sizeof (int), this is fine... [else trouble]

I have long held the opinion that the Standard implicitly requires that
sizeof(char) < sizeof(int), precisely because of such cases where the
bastard type "unsigned char or maybe EOF" is used. I'm astounded that
this seems to be a minority opinion; even if the Committee {did,will}
rule that sizeof(char)==sizeof(int) is legal, I maintain that no sane
vendor will make such an implementation, because it would break one of
the fundamental idioms of the language, namely
int c;
while ((c=getc(fp)) != EOF) ...;
when reading from a binary file that might contain a value that resembles
EOF. (Do you *really* expect everybody to convert such programs to test
feof(fp) instead of comparing with EOF? Have you rewritten any of *your*
code to do so?)
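
For reference, the feof()-style rewrite would look something like this
(a sketch; it assumes getc() may return a valid data value that merely
compares equal to EOF on such an implementation):

    #include <stdio.h>

    /* Read every character of fp, distinguishing a data value that
     * compares equal to EOF from a true end of input. */
    void process(FILE *fp)
    {
        int c;
        for (;;) {
            c = getc(fp);
            if (c == EOF && (feof(fp) || ferror(fp)))
                break;          /* genuine end of file or error */
            /* ... use c ... */
        }
    }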

I wish the Committee would just issue a correction that asserts that char
must be smaller than int. As for Chinese character sets, well, that's
what wchar_t was created for.

Karl W. Z. Heuer (ka...@kelp.boston.ma.us), The Walking Lint

Ajoy KT

Jul 22, 1994, 4:07:00 PM
In article <774785...@wslint.demon.co.uk>, nhou...@wslint.demon.co.uk writes...

>In article <30ep7d$e...@news.tuwien.ac.at>
> h...@vmars.tuwien.ac.at "Peter Holzer" writes:

>>The `what a character thread' brought me to a new problem:
>>How do you use the functions in ctype.h in a strictly conforming
>>program?
>>
>>Their argument is an int, and the value is restricted to the values
>>representable in an unsigned char and EOF.

>>If sizeof (char) < sizeof (int), this is fine, since all values
>>representable in an unsigned char are also representable in an int. But
>>if sizeof (char) == sizeof (int), half of the possible values in an
>>unsigned char cannot be represented in an int, and the conversion might
>>lead to an overflow.

True. But any member of the character set (not necessarily of the
basic set) has to be representable as an int (otherwise the very presence
of the character constant causes undefined behaviour). Even though the
standard is slippery on this issue, we can take it that a DR response would
essentially have to accept this. This means that, as long as the unsigned
char variable holds only character constants, the code works. If you
put arbitrary integers into it, you have problems anyway.

The real problem is that the ctype functions don't accept -ve
ints (the argument should be representable as an *unsigned* char). This
means that, on a machine with default char a signed type, and with some
non-basic character set member -ve, ctype functions are going to be hard
to use.

------
/* "A conclusion is the place where you get tired of thinking".
"I won't post again on this thread" */

/* Ajoy Krishnan T,
Senior S/W Engr., Hughes Software Systems,
New Delhi - 19, India.
(ajoyk%h...@lando.hns.com)*/

Michael Meissner

Jul 24, 1994, 10:11:51 PM
In article <22JUL199...@lando.hns.com> t_akr...@bart.hns.com (Ajoy KT)
writes:

| The real problem is that the ctype functions don't accept -ve
| ints (the argument should be representable as an *unsigned* char). This
| means that, on a machine with default char a signed type, and with some
| non-basic character set member -ve, ctype functions are going to be hard
| to use.

During the standardization process, Bill Plauger called the three character
types as "unsigned char, signed char, and don't char".

Anyways, it is not that hard of a problem -- you never EVER use 'char'. For
stuff that holds real live text, you use 'unsigned char'. For example, as part
of the I18N coding standards within OSF, we have that rule (in fact, in the
early days when we were fixing^H^H^H^H^H^Henhancing the code, somebody did a
global replace of 'char' -> 'uchar_t' on some files and changed the word
'character' in some comments :-)
--
Michael Meissner email: meis...@osf.org phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Old hackers never die, their bugs just increase.

Karl Heuer

Jul 25, 1994, 12:59:29 AM
In article <MEISSNER.94...@pasta.osf.org>

meis...@osf.org (Michael Meissner) writes:
>Anyways, it is not that hard of a problem -- you never EVER use 'char'. For
>stuff that holds real live text, you use 'unsigned char'. For example, as
>part of the I18N coding standards within OSF, we have that rule...

I remember this from my contract stint at OSF. It's a stupid rule.

Most of the Standard library (e.g. strlen) uses plain char. I believe the
workaround was to use a macro, `#define Strlen(s) strlen((char *)s)'. Now
you've defeated the prototype: if you accidentally pass an int instead of a
string, the compiler will happily accept it. I think this increases the risk
rather than decreasing it.

A much cleaner solution is to *always* use plain char for real text. If
char signedness problems are a significant risk, then make a rule that you
can't use getc() or any of the <ctype.h> functions directly. It's simple
to replace them with appropriate functions that deal with plain char and
exclude EOF. (Also you have to agree not to use text characters as if they
were small integers, but that ought to be a rule anyway.)
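
Such replacements are only a few lines each; a sketch (the names are
mine, not from any actual coding standard):

    #include <stdio.h>
    #include <ctype.h>

    /* isprint() for plain char, hiding the unsigned char conversion. */
    int IsPrint(char c)
    {
        return isprint((unsigned char)c);
    }

    /* getc() replacement that keeps EOF out of the data path: returns
     * 1 and stores the character, or 0 at end of file or error. */
    int GetChar(FILE *fp, char *out)
    {
        int c = getc(fp);
        if (c == EOF)
            return 0;
        *out = (char)c;
        return 1;
    }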

Karl W. Z. Heuer (ka...@kelp.boston.ma.us), The Walking Lint

(Actually, I recall noticing that the "don't use char" rule is *not* in the
actual OSF coding document. Perhaps there's hope.)

Norman Diamond

Jul 25, 1994, 2:24:05 AM
In article <MEISSNER.94...@pasta.osf.org> meis...@osf.org (Michael Meissner) writes:
>Anyways, it is not that hard of a problem -- you never EVER use 'char'. For
>stuff that holds real live text, you use 'unsigned char'.

And cast all your arguments to strcmp() etc.
Even if you don't char, your pointers do. :-(

Christopher R. Volpe

Jul 25, 1994, 9:04:31 AM
In article <KARL.94Ju...@ursa-major.spdcc.com>, ka...@kelp.boston.ma.us (Karl Heuer) writes:
>In article <30ep7d$e...@news.tuwien.ac.at>
>h...@vmars.tuwien.ac.at (Peter Holzer) writes:
>>If sizeof (char) < sizeof (int), this is fine... [else trouble]
>
>I have long held the opinion that the Standard implicitly requires that
>sizeof(char) < sizeof(int), precisely because of such cases where the
>bastard type "unsigned char or maybe EOF" is used. I'm astounded that
>this seems to be a minority opinion; even if the Committee {did,will}

I have to side with the majority on this one, then, despite my reluctance to
disagree with the Walking Lint on anything.

>rule that sizeof(char)==sizeof(int) is legal, I maintain that no sane
>vendor will make such an implementation, because it would break one of

They do exist. TI's compiler for their C30 DSP has 32 bit ints and chars.

>the fundamental idioms of the language, namely
> int c;
> while ((c=getc(fp)) != EOF) ...;
>when reading from a binary file that might contain a value that resembles
>EOF. (Do you *really* expect everybody to convert such programs to test

This is not necessarily broken at all. Just because the smallest unit that the
CPU can address is 32 bits doesn't mean that files have to be read in 32 bit
chunks, does it? If external devices read octets, and getc either returns
1) a char with the high 24 bits zero
or 2) a char with all bits set to 1 (EOF)

all the rules are preserved, no?

>feof(fp) instead of comparing with EOF? Have you rewritten any of *your*
>code to do so?)

No need.

>
>I wish the Committee would just issue a correction that asserts that char
>must be smaller than int. As for Chinese character sets, well, that's
>what wchar_t was created for.

Do you really want to make it impossible to implement C on architectures such
as the C30? How about if everything is 32 bits, but CHAR_MIN and CHAR_MAX are
set to -128 and 127, respectively. Would that be enough?

>
>Karl W. Z. Heuer (ka...@kelp.boston.ma.us), The Walking Lint

--

Chris Volpe Phone: (518) 387-7766 (Dial Comm 8*833
GE Corporate R&D Fax: (518) 387-6560
PO Box 8, Schenectady, NY 12301 Email: vol...@crd.ge.com

Philip Homburg

Jul 25, 1994, 8:43:45 PM
In article <CtHyz...@crdnns.crd.ge.com> vo...@ausable.crd.ge.com writes:
%This is not necessarily broken at all. Just because the smallest unit that the
%CPU can address is 32 bits doesn't mean that files have to be read in 32 bit
%chunks, does it? If external devices read octets, and getc either returns
% 1) a char with the high 24 bits zero
%or 2) a char with all bits set to 1 (EOF)
%
%all the rules are preserved, no?
%
%>feof(fp) instead of comparing with EOF? Have you rewritten any of *your*
%>code to do so?)

What about fread/fwrite?

I would hope that

char ch= 'A'; fwrite (&ch, sizeof(ch), 1, stream);

writes 32 bits (and the corresponding fread, read 32 bits). Does the
standard really make such a big distinction between reading bytes and
reading characters? If the fwrite only writes 1 byte, then writing a
replacement for something like:

int i= 65; fwrite(&i, sizeof(i), 1, stream);

would be interesting.

Philip Homburg

Alan Watson

Jul 25, 1994, 11:23:16 PM
In article <CtIvC...@cs.vu.nl>

phi...@cs.vu.nl (Philip Homburg) wrote:
>In article <CtHyz...@crdnns.crd.ge.com> vo...@ausable.crd.ge.com writes:
>%This is not necessarily broken at all. Just because the smallest unit that the
>%CPU can address is 32 bits doesn't mean that files have to be read in 32 bit
>%chunks, does it? [...]

>
>What about fread/fwrite? I would hope that
>
>char ch= 'A'; fwrite (&ch, sizeof(ch), 1, stream);
>
>writes 32 bits (and the corresponding fread, read 32 bits).

If stream was opened in binary mode, I would expect 32 bits to be
written; if it was opened in text mode, I might expect 8 bits to be
written.

--
Alan Watson | C is not a big language, and it is not
al...@oldp.astro.wisc.edu | well served by a big book.
Department of Astronomy | -- Kernighan & Ritchie
University of Wisconsin -- Madison |

Karl Heuer

Jul 26, 1994, 12:10:58 AM
In article <CtHyz...@crdnns.crd.ge.com>
vo...@bart.crd.ge.com (Christopher R. Volpe) writes:
>ka...@kelp.boston.ma.us (Karl Heuer) writes:
>>I maintain that no sane vendor will make such [char==int] implementation

>
>They do exist. TI's compiler for their C30 DSP has 32 bit ints and chars.

Interesting. Is UCHAR_MAX 255 or 4294967295 in that implementation?

>>[it would break the fundamental getc() idiom]
>
>[What if non-EOF getc() always returns a char with the high 24 bits zero?]

It's shaky. Data written to a binary stream can be read back in and shall
compare equal; so if fgetc() returns 8-bit values, fputc() must only accept
8-bit values. Moreover, fwrite() exists, and all output takes place as if
characters were written by successive calls to fputc(). In order for this
to work, I think you have to assume that the way in which fwrite() maps to
fputc() is unspecified; in particular, that fwrite() of a single character
is permitted to map into four calls to fputc(). Perhaps a DR should be
submitted on this issue.

>>I wish the Committee would just issue a correction that asserts that char
>>must be smaller than int.
>

>Do you really want to make it impossible to implement C on architectures such
>as the C30?

I've used implementations where the architecture is not octet-addressable.
The customary way to handle it is to implement word pointers in the "obvious"
way, and implement char pointers with extra bits at one end or the other.

>How about if everything is 32 bits, but CHAR_MIN and CHAR_MAX are
>set to -128 and 127, respectively. Would that be enough?

I have reason to doubt that this would be conforming.

Christopher R. Volpe

Jul 26, 1994, 9:32:39 AM
In article <CtIvC...@cs.vu.nl>, phi...@cs.vu.nl (Philip Homburg) writes:
>In article <CtHyz...@crdnns.crd.ge.com> vo...@ausable.crd.ge.com writes:
>%This is not necessarily broken at all. Just because the smallest unit that the
>%CPU can address is 32 bits doesn't mean that files have to be read in 32 bit
>%chunks, does it? If external devices read octets, and getc either returns
>% 1) a char with the high 24 bits zero
>%or 2) a char with all bits set to 1 (EOF)
>%
>%all the rules are preserved, no?
>%
>%>feof(fp) instead of comparing with EOF? Have you rewritten any of *your*
>%>code to do so?)
>
>What about fread/fwrite?
>
>I would hope that
>
>char ch= 'A'; fwrite (&ch, sizeof(ch), 1, stream);
>
>writes 32 bits (and the corresponding fread, read 32 bits). Does the

I would imagine that the number of bits written to the physical device is
irrelevant, and that all that matters is that your 'A' gets there in some
recoverable way.

>standard really make such a big distinction between reading bytes and

As far as I know, the Standard makes no distinction. "Byte" and "character"
are synonyms.

>reading characters? If the fwrite only writes 1 byte, then writing a

The fwrite writes only 1 byte from the C program's point of view, regardless
of the implementation, because you specified sizeof(ch) and 1. Whether it writes
out an octet or four octets is not relevant, as long as a corresponding fread
results in an 'A' in ch again.

>replacement for something like:
>
>int i= 65; fwrite(&i, sizeof(i), 1, stream);
>
>would be interesting.

Hmmm. This is starting to get tricky. Since it can't tell the difference
between an int and a char, fwrite probably must write out all the bits.
Perhaps it is impossible to implement stdio on such a machine. It's a good
thing TI's compiler for the C30 is only a freestanding implementation :-)

Christopher R. Volpe

Jul 26, 1994, 9:55:27 AM
>In article <CtHyz...@crdnns.crd.ge.com>
>vo...@bart.crd.ge.com (Christopher R. Volpe) writes:
>>ka...@kelp.boston.ma.us (Karl Heuer) writes:
>>>I maintain that no sane vendor will make such [char==int] implementation
>>
>>They do exist. TI's compiler for their C30 DSP has 32 bit ints and chars.

BTW, I abbreviated the name of the chip above because I didn't know the
full name. It's the Texas Instruments TMS320C30.

>
>Interesting. Is UCHAR_MAX 255 or 4294967295 in that implementation?

The latter.

>
>>>[it would break the fundamental getc() idiom]
>>
>>[What if non-EOF getc() always returns a char with the high 24 bits zero?]
>
>It's shaky. Data written to a binary stream can be read back in and shall
>compare equal; so if fgetc() returns 8-bit values, fputc() must only accept
>8-bit values. Moreover, fwrite() exists, and all output takes place as if
>characters were written by successive calls to fputc(). In order for this
>to work, I think you have to assume that the way in which fwrite() maps to
>fputc() is unspecified; in particular, that fwrite() of a single character
>is permitted to map into four calls to fputc(). Perhaps a DR should be
>submitted on this issue.

Yeah, I agree that it's shaky. But the compiler in question is freestanding,
so they avoided the issue.

>
>>>I wish the Committee would just issue a correction that asserts that char
>>>must be smaller than int.
>>
>>Do you really want to make it impossible to implement C on architectures such
>>as the C30?
>
>I've used implementations where the architecture is not octet-addressable.
>The customary way to handle it is to implement word pointers in the "obvious"
>way, and implement char pointers with extra bits at one end or the other.

So the compiler still makes sizeof(int)==4 then?

Peter Holzer

Jul 26, 1994, 2:45:32 PM
ka...@kelp.boston.ma.us (Karl Heuer) writes:

>In article <30ep7d$e...@news.tuwien.ac.at>
>h...@vmars.tuwien.ac.at (Peter Holzer) writes:
>>If sizeof (char) < sizeof (int), this is fine... [else trouble]

>I have long held the opinion that the Standard implicitly requires that
>sizeof(char) < sizeof(int), precisely because of such cases where the
>bastard type "unsigned char or maybe EOF" is used.

As a C user I share your opinion. I think that a hosted implementation
where sizeof (char) == sizeof (int) would break too many programs to be
useful.

As a language lawyer, however, I cannot find any guarantee in the
standard that all values of "unsigned char + EOF" can be represented in
an int. Some responses to DRs (struct hack, Ux_MAX == x_MAX, ...) seem
to indicate that the committee seems to care more about bizarre
architectures than about existing code these days. Therefore I fear
that a DR on that matter would not be answered the way you (and I)
hope.

I will continue to use
while ((c = getc (fp)) != EOF)
or
isprint ((unsigned char)s[i])
however, because I don't expect to run into an implementation where
they will break.

Peter Holzer

Jul 26, 1994, 3:01:15 PM
t_akr...@bart.hns.com (Ajoy KT) writes:

>In article <774785...@wslint.demon.co.uk>, nhou...@wslint.demon.co.uk writes...
>>In article <30ep7d$e...@news.tuwien.ac.at>
>> h...@vmars.tuwien.ac.at "Peter Holzer" writes:

>>>The `what a character thread' brought me to a new problem:
>>>How do you use the functions in ctype.h in a strictly conforming
>>>program?
>>>
>>>Their argument is an int, and the value is restricted to the values
>>>representable in an unsigned char and EOF.

>>>If sizeof (char) < sizeof (int), this is fine, since all values
>>>representable in an unsigned char are also representable in an int. But
>>>if sizeof (char) == sizeof (int), half of the possible values in an
>>>unsigned char cannot be represented in an int, and the conversion might
>>>lead to an overflow.

> True. But any member of the character set (not necessarily of the
>basic set) has to be representable as an int (otherwise the very presence
>of the character constant causes undefined behaviour).

This is true but irrelevant. An unsigned char may still contain values
not representable in an int. This means that only values which can be
represented both in an int and in an unsigned char (and EOF) can be
passed to the ctype functions. On systems where INT_MAX >= UCHAR_MAX
(to avoid problems with spare bits :-) this includes the whole
charset. On other systems a strictly conforming program cannot decide
whether a character is printable or not for all characters.

> The real problem is that the ctype functions don't accept -ve ints
>(the argument should be representable as an *unsigned* char). This
>means that, on a machine with default char a signed type, and with some
>non-basic character set member -ve, ctype functions are going to be
>hard to use.

Such implementations are common. In fact on all compilers I use, char is
signed by default, but there are useful characters with the 8th bit
set. I have already had to fix software that did things like:

    char *p;
    [...]
    while (isspace (*p)) p ++;
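
The fix is to push the value through unsigned char before the call,
e.g.:

    while (isspace ((unsigned char)*p)) p ++;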

Karl Heuer

Jul 26, 1994, 3:12:16 PM
In article <CtJw0...@crdnns.crd.ge.com>

vo...@bart.crd.ge.com (Christopher R. Volpe) writes:
>ka...@kelp.boston.ma.us (Karl Heuer) writes:
>>I've used implementations where the architecture is not octet-addressable.
>>The customary way to handle it is to implement word pointers in the "obvious"
>>way, and implement char pointers with extra bits at one end or the other.
>
>So the compiler still makes sizeof(int)==4 then?

Yes, for 32-bit ints. (I just realized that at least some Cray architectures
are also word-addressable, with 64-bit words, but they likewise use the
"software byte pointer" trick; on these machines sizeof(int)==8. Strangely
enough, they do *not* do the corresponding hack for non-byte pointers, so
sizeof(short)==8 as well, despite USHRT_MAX being 65535. I have no actual
experience with this box, and probably won't be able to provide any details
not contained in this paragraph.)

Karl W. Z. Heuer (ka...@kelp.boston.ma.us), The Walking Lint

Clive D.W. Feather

Jul 27, 1994, 4:46:48 AM
In article <313lkc$d...@news.tuwien.ac.at>,

Peter Holzer <h...@vmars.tuwien.ac.at> wrote:
> Some responses to DRs (struct hack, Ux_MAX == x_MAX, ...) seem
> to indicate that the committee seems to care more about bizarre
> architectures than about existing code these days.

You imply that allowing Ux_MAX == x_MAX would break a significant amount
of existing code. Are you just talking about the issues of unsigned
characters being discussed at the moment, or something more ? Can you
suggest what code would be broken that isn't already assuming
twos-complement ?

--
Clive D.W. Feather | Santa Cruz Operation | If you lie to the compiler,
cl...@sco.com | Croxley Centre | it will get its revenge.
Phone: +44 923 816 344 | Hatters Lane, Watford | - Henry Spencer
Fax: +44 923 210 352 | WD1 8YN, United Kingdom |

Alan Watson

Jul 27, 1994, 2:09:06 PM
In article <CtLC...@scone.london.sco.com>

cl...@sco.com (Clive D.W. Feather) wrote:
>In article <313lkc$d...@news.tuwien.ac.at>,
>Peter Holzer <h...@vmars.tuwien.ac.at> wrote:
>> Some responses to DRs (struct hack, Ux_MAX == x_MAX, ...) seem
>> to indicate that the committee seems to care more about bizarre
>> architectures than about existing code these days.
>
>You imply that allowing Ux_MAX == x_MAX would break a significant amount
>of existing code. Are you just talking about the issues of unsigned
>characters being discussed at the moment, or something more ? Can you
>suggest what code would be broken that isn't already assuming
>twos-complement ?

Any code that assumes that there is a unique and reversible mapping
between the int values {INT_MIN,..,INT_MAX} and the unsigned int values
{0,...,UINT_MAX}. For example, code that performs signed arithmetic
using unsigned types in an explicit two's complement notation to check
for overflow.
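
The idiom in question looks something like this (a sketch, with a name
of my own choosing; it assumes a two's-complement mapping with no
hidden bits, i.e. UINT_MAX == 2u * INT_MAX + 1, which is precisely the
guarantee at issue):

    #include <limits.h>

    /* Detect whether a + b would overflow an int by doing the addition
     * in unsigned arithmetic, which wraps modulo UINT_MAX + 1. */
    int add_overflows(int a, int b)
    {
        unsigned sign = (unsigned)INT_MAX + 1u;    /* the sign bit */
        unsigned ua = (unsigned)a, ub = (unsigned)b;
        unsigned ur = ua + ub;

        /* Overflow iff the operands have the same sign and the
         * result's sign differs from it. */
        return ((ua ^ ub) & sign) == 0 && ((ua ^ ur) & sign) != 0;
    }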

Also, unless all of the integral types have the same size (i.e., 1),
any code that assumes that there are sizeof (unsigned) * CHAR_BIT
`real' bits in an unsigned int (for example, an implementation of a
packed bit array) will have problems.
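
And the bit-array idiom itself, for concreteness (a sketch making
exactly the assumption under discussion, namely sizeof (unsigned) *
CHAR_BIT usable bits per unsigned):

    #include <limits.h>

    #define UINT_BITS (sizeof (unsigned) * CHAR_BIT)

    #define BIT_SET(a, i)  ((a)[(i) / UINT_BITS] |= 1u << ((i) % UINT_BITS))
    #define BIT_TEST(a, i) (((a)[(i) / UINT_BITS] >> ((i) % UINT_BITS)) & 1u)

    /* e.g. one flag per character code: */
    unsigned seen[UCHAR_MAX / UINT_BITS + 1];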

Let me turn the question on its head: can you suggest a real
implementation that would be broken by the requirement that unsigned
types use all of their available bits?

--
Alan Watson | B can be thought of as C without types;
al...@oldp.astro.wisc.edu | more accurately, it is BCPL squeezed into
Department of Astronomy | 8K bytes of memory and filtered through
University of Wisconsin -- Madison | Thompson's brain. -- Dennis Ritchie

Clive D.W. Feather

Jul 28, 1994, 3:10:13 AM
In article <1994Jul27.1...@sal.wisc.edu>,

Alan Watson <al...@sal.wisc.edu> wrote:
>In article <CtLC...@scone.london.sco.com>
>cl...@sco.com (Clive D.W. Feather) wrote:
>> You imply that allowing Ux_MAX == x_MAX would break a significant amount
>> of existing code. Are you just talking about the issues of unsigned
>> characters being discussed at the moment, or something more ? Can you
>> suggest what code would be broken that isn't already assuming
>> twos-complement ?
> Any code that assumes that there is a unique and reversible mapping
> between the int values {INT_MIN,..,INT_MAX} and the unsigned int values
> {0,...,UINT_MAX}. For example, code that performs signed arithmetic
> using unsigned types in an explicit two's complement notation to check
> for overflow.

Such code will already break on one's complement and sign-and-magnitude
systems. Or did you mean "... and a subset of the unsigned int values ..." ?

> any code that assumes that there are sizeof (unsigned) * CHAR_BIT
> `real' bits in an unsigned int (for example, an implementation of a
> packed bit array) will have problems.

That's true. Such code can be protected (though not fixed) fairly easily by
putting a test in of the form

#if 1 << (sizeof (unsigned) * CHAR_BIT - 1) >= UINT_MAX >> 1
#error Missing bits !!!
#endif

> Let me turn the question on its head: can you suggest a real
> implementation that would be broken by the requirement that unsigned
> types use all of their available bits?

From WG14 internal email:

> For example, on some 24-bit machines the "unsigned long" consumes 48 bits
> of storage, yet there are only 47 bits of precision. The high order bit
> of the low order word is inaccessible via normal bit shifting operations.
> (Of course, the bits *could* be accessed via a "char *" and bit shift
> operations.)

Ajoy KT

Jul 28, 1994, 6:52:00 AM
In article <313mhr$d...@news.tuwien.ac.at>, h...@vmars.tuwien.ac.at (Peter Holzer) writes...

>t_akr...@bart.hns.com (Ajoy KT) writes:
>>In article <774785...@wslint.demon.co.uk>, nhou...@wslint.demon.co.uk writes...
>>>In article <30ep7d$e...@news.tuwien.ac.at>
>>> h...@vmars.tuwien.ac.at "Peter Holzer" writes:
>>>How do you use the functions in ctype.h in a strictly conforming
>>>program?
>>>Their argument is an int, and the value is restricted to the values
>>>representable in an unsigned char and EOF.
>>>If sizeof (char) < sizeof (int), this is fine, since all values
>>>representable in an unsigned char are also representable in an int. But
>>>if sizeof (char) == sizeof (int), half of the possible values in an
>>>unsigned char cannot be represented in an int, and the conversion might
>>>lead to an overflow.

>> True. But any member of the character set (not necessarily of the
>>basic set) has to be representable as an int (otherwise the very presence
>>of the character constant causes undefined behaviour).

>This is true but irrelevant. An unsigned char may still contain values
>not representable in an int.

Well, my point is that the ctype functions can be used, without
extra checks, as long as an unsigned char variable contains only a member
of the character set. The problem you specify occurs only if arbitrary
integers are assigned to the unsigned char variable being tested.

Thus:

#include <stdio.h>
#include <ctype.h>

#define ANYCHAR /* Any character constant here */

int main(void)
{
    unsigned char ch = ANYCHAR;

    if (isprint(ch))
        printf("Yeah\n");
    else
        printf("Nai\n");
    return 0;
}

is fully portable (I guess this would come under the category coined
by Chris Volpe - strongly conforming; its output is dependent on an
implementation-defined issue), irrespective of what character constant
ANYCHAR expands to.

> On other systems a strictly conforming program cannot decide
>whether a character is printable or not for all characters.

Huh. A strongly conforming program cannot decide whether an arbitrary
value in an unsigned char object is printable or not. But it can decide
whether a value in an unsigned char object is printable or not, provided
the value is that of a character set member.

>> The real problem is that the ctype functions don't accept -ve ints
>>(the argument should be representable as an *unsigned* char). This
>>means that, on a machine with default char a signed type, and with some
>>non-basic character set member -ve, ctype functions are going to be
>>hard to use.

>Such implementations are common.

Well, I don't see any way out. If they are common, hard luck for all
of us. The committee seems to be more intent on complicating issues
(introducing an unnecessary spare-bit concept, and re-interpreting the
same representation rule) than on cleaning them up.

-----

/* Ajoy Krishnan T,
Senior Software Engineer, Hughes Software Systems,

Peter Holzer

Jul 28, 1994, 2:05:17 PM
cl...@sco.com (Clive D.W. Feather) writes:

>In article <1994Jul27.1...@sal.wisc.edu>,
>Alan Watson <al...@sal.wisc.edu> wrote:
>>In article <CtLC...@scone.london.sco.com>
>>cl...@sco.com (Clive D.W. Feather) wrote:
>>> You imply that allowing Ux_MAX == x_MAX would break a significant amount
>>> of existing code. Are you just talking about the issues of unsigned
>>> characters being discussed at the moment, or something more ?

No, I was thinking about converting signed integers to unsigned to do
bit-operations on them (for example to store them in a binary, but
implementation-independent (file) format) and about using sizeof to get
the number of bits in an unsigned integer. Both are IMHO quite common.

>>> Can you
>>> suggest what code would be broken that isn't already assuming
>>> twos-complement ?
>> Any code that assumes that there is a unique and reversible mapping
>> between the int values {INT_MIN,..,INT_MAX} and the unsigned int values
>> {0,...,UINT_MAX}.

>Such code will already break on one's complement and sign-and-magnitude
>systems. Or did you mean "... and a subset of the unsigned int values ..." ?

Of course I cannot speak for Alan, but I think he meant the latter.

>> any code that assumes that there are sizeof (unsigned) * CHAR_BIT
>> `real' bits in an unsigned int

>That's true. Such code can be protected (though not fixed) fairly easily by
>putting a test in of the form

>#if 1 << (sizeof (unsigned) * CHAR_BIT - 1) >= UINT_MAX >> 1
>#error Missing bits !!!
>#endif

No. You cannot use sizeof in a preprocessor directive.

>> Let me turn the question on its head: can you suggest a real
>> implementation that would be broken by the requirement that unsigned
>> types use all of their available bits?

>From WG14 internal email:

>> For example, on some 24-bit machines the "unsigned long" consumes 48 bits
>> of storage, yet there are only 47 bits of precision. The high order bit
>> of the low order word is inaccessible via normal bit shifting operations.
>> (Of course, the bits *could* be accessed via a "char *" and bit shift
>> operations.)

If this implementation looks the way I imagine it, the signed long type
also has only 47 bits (including the sign bit). This doesn't help the
sizeof problem, of course.
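
Since the test cannot go in the preprocessor, a run-time check has to
count the value bits directly; a sketch:

    #include <assert.h>
    #include <limits.h>

    /* Verify at run time that unsigned int has no hidden bits. */
    void check_bits(void)
    {
        unsigned u = UINT_MAX;
        int bits = 0;

        while (u != 0) {        /* UINT_MAX has every value bit set */
            bits++;
            u >>= 1;
        }
        assert(bits == (int)(sizeof (unsigned) * CHAR_BIT));
    }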

Clive D.W. Feather

Jul 29, 1994, 2:02:44 AM
In article <318s0t$s...@news.tuwien.ac.at>,

Peter Holzer <h...@vmars.tuwien.ac.at> wrote:
>cl...@sco.com (Clive D.W. Feather) writes:
>> #if 1 << (sizeof (unsigned) * CHAR_BIT - 1) >= UINT_MAX >> 1
>> #error Missing bits !!!
>> #endif
> No. You cannot use sizeof in a preprocessor directive.

Oops. And I knew that.
:-(

>>> For example, on some 24-bit machines the "unsigned long" consumes 48 bits
>>> of storage, yet there are only 47 bits of precision. The high order bit
>>> of the low order word is inaccessible via normal bit shifting operations.
>>> (Of course, the bits *could* be accessed via a "char *" and bit shift
>>> operations.)
> If this implementation looks the way I imagine it, the signed long type
> also has only 47 bits (including the sign bit).

I don't know, but that may well be the case. Nevertheless, the number of
bits is not the size of the type * CHAR_BIT.

Clive D.W. Feather

Jul 29, 1994, 2:16:49 AM
In article <1994Jul28.1...@sal.wisc.edu>,
Alan Watson <al...@sal.wisc.edu> wrote:
>In article <Ctn2L...@scone.london.sco.com> cl...@sco.com (Clive D.W.
>Feather) graciously put his head above the parapet and wrote:

>>Alan Watson <al...@sal.wisc.edu> wrote:
>>> Any code that assumes that there is a unique and reversible mapping
>>> between the int values {INT_MIN,..,INT_MAX} and the unsigned int values
>>> {0,...,UINT_MAX}. For example, code that performs signed arithmetic
>>> using unsigned types in an explicit two's complement notation to check
>>> for overflow.

I've never done something like that, but, there again, I've never done
work that worries about overflow to that extent.

If you're going to do that, shouldn't you either:
- use floats to do the approximation (or is that too slow ?), or
- take specific account of signs (which you have to do for multiplication
and division anyway)
?

>>> any code that assumes that there are sizeof (unsigned) * CHAR_BIT
>>> `real' bits in an unsigned int (for example, an implementation of a
>>> packed bit array) will have problems.

Very true.

> I am puzzled that the committee would choose to break such a common and
> important idiom. I thought `existing code is important but existing
> [and by extension future] implementations are not'.

The committee do not "choose to break such a common and important idiom"
(something which has been claimed about the "struct hack" as well). In
fact, as an occasional member, I rather resent that comment.

The committee get asked questions about the exact meaning of the
wording. We/they look at the question, and the words of the Standard,
which has *already* been approved internationally, and attempt to
determine what the Standard says on the topic. Previous decisions,
memory of discussion when the Standard was written [from those members
who have the strength to *still* do this stuff !], and common practice
are all considered.

In this case, there are and were implementations that don't use all the
bits of an integer. The view appears to be [I wasn't at that meeting]
that those implementations were always legitimate, and therefore the
Standard doesn't require all bits to be used. So that disposes of the
packed bit array idiom. Unfortunate, but it *never* was guaranteed to
work.

Given that, there is no wording to suggest that UINT_MAX >= INT_MIN+INT_MAX,
but only that UINT_MAX >= INT_MAX. If you can find a reading to the
contrary, we would all like to know.

[47-bit system]
> Thanks. I'm glad I don't have to use or implement a C compiler on this
> architecture. Does a conforming implementation exist?

Yes.

> C has already made life a pain for certain restricted architectures
> (for example, the requirement that long have 32 bits is somewhat
> inconvenient for implementors of C on 4 bit machines).

But long wasn't expected to be as fast as int, and you *can* do multiple
precision without too much cost. OTOH, doing 48-bit arithmetic on that
47-bit system would be a real pain.

>> Some responses to DRs (struct hack, Ux_MAX == x_MAX, ...) seem
>> to indicate that the committee seems to care more about bizarre
>> architectures than about existing code these days.

> This now seems somewhat difficult to deny.

I deny it. The committee are interpreting a Standard that has been out
for almost 5 years. Every Technical Corrigendum is the result of
discussion and debate; some members still don't want to change a single
word [well, almost; AFAIK no-one objected to adding "successful" to
fsetpos].

Clive D.W. Feather

Jul 29, 1994, 5:52:15 AM
In article <Ctous...@scone.london.sco.com>, I wrote:
> Given that, there is no wording to suggest that UINT_MAX >= INT_MIN+INT_MAX,
> but only that UINT_MAX >= INT_MAX. If you can find a reading to the
> contrary, we would all like to know.

I of course meant: UINT_MAX >= INT_MAX - INT_MIN.
Thanks to Jutta for pointing this out.

Peter Holzer

Jul 29, 1994, 9:26:16 AM
t_akr...@bart.hns.com (Ajoy KT) writes:

>In article <313mhr$d...@news.tuwien.ac.at>, h...@vmars.tuwien.ac.at (Peter Holzer) writes...
>>t_akr...@bart.hns.com (Ajoy KT) writes:

>>> True. But any member of the character set (not necessarily of the
>>>basic set) has to be representable as an int (otherwise the very presence
>>>of the character constant causes undefined behaviour).

>>This is true but irrelevant. An unsigned char may still contain values
>>not representable in an int.

> Well, my point is that the ctype functions can be used, without
>extra checks, as long as an unsigned char variable contains only a member
>of the character set. The problem you specify occurs only if arbitrary
>integers are assigned to the unsigned char variable being tested.

I believe we have some difficulties understanding each other, so I will
use an example. Let us assume an implementation with 16 bit ints
and 16 bit chars. Then assume that the character constant '$' is
negative. Then the value of '$' cannot be represented in an unsigned
char, because that can only hold non-negative values. So, unless '$' ==
EOF, '$' is not a legal argument to isprint in such an implementation.


>>>This means that, on a machine with default char a signed type, and
>>>with some non-basic character set member -ve, ctype functions are
>>>going to be hard to use.

>>Such implementations are common.

> Well, I don't see any way out. If they are common, hard luck for all
>of us. The committee seems to be more intent on complicating issues
>(introducing an unnecessary spare-bit concept, and re-interpreting the
>same representation rule) than on cleaning them up.

The common implementations with negative characters have only 8-bit
characters. This is not a problem. If '$' in any such character set is
negative, isprint ('$') is not legal, but isprint ((unsigned char)'$')
is, because the value of (unsigned char)'$' is still representable in
an int. I have had to fix several programs (including some of my own)
over the last few years, which blindly passed char values to the ctype
functions.

The way out is, of course, to demand that UCHAR_MAX <= INT_MAX for
hosted implementations.

Alan Watson

Jul 28, 1994, 11:52:52 AM
In article <1994Jul28.1...@sal.wisc.edu> I wrote:
>the requirement that longs have 32 bits

Make that `a range not less than that of a signed binary number of two
and thirty bits that maketh good use of all of the bits that God in His
wisdom hath given it, within the restrictions of the pestilent
sign-magnitude representation'.

Anyone care for a King James translation of the standard?

--
Alan Watson | One out of twenty sounds better to
al...@oldp.astro.wisc.edu | doctors than patients, especially when it
Department of Astronomy | refers to sexual ability. -- Charles Mann

Alan Watson

Jul 28, 1994, 11:42:47 AM
In article <Ctn2L...@scone.london.sco.com> cl...@sco.com (Clive D.W.
Feather) graciously put his head above the parapet and wrote:

>In article <1994Jul27.1...@sal.wisc.edu>,
>Alan Watson <al...@sal.wisc.edu> wrote:
>>In article <CtLC...@scone.london.sco.com>
>>cl...@sco.com (Clive D.W. Feather) wrote:
>>> You imply that allowing Ux_MAX == x_MAX would break a significant amount
>>> of existing code. Are you just talking about the issues of unsigned
>>> characters being discussed at the moment, or something more ? Can you
>>> suggest what code would be broken that isn't already assuming
>>> twos-complement ?
>> Any code that assumes that there is a unique and reversible mapping
>> between the int values {INT_MIN,..,INT_MAX} and the unsigned int values
>> {0,...,UINT_MAX}. For example, code that performs signed arithmetic
>> using unsigned types in an explicit two's complement notation to check
>> for overflow.
>
>Such code will already break on one's complement and sign-and-magnitude
>systems. Or did you mean "... and a subset of the unsigned int values ..." ?

Yes, sorry.

>> any code that assumes that there are sizeof (unsigned) * CHAR_BIT
>> `real' bits in an unsigned int (for example, an implementation of a
>> packed bit array) will have problems.
>
>That's true. Such code can be protected (though not fixed) fairly easily by
>putting a test in of the form
>
>#if 1 << (sizeof (unsigned) * CHAR_BIT - 1) >= UINT_MAX >> 1
>#error Missing bits !!!
>#endif

I am puzzled that the committee would choose to break such a common and
important idiom. I thought `existing code is important but existing
[and by extension future] implementations are not'.

>> Let me turn the question on its head: can you suggest a real
>> implementation that would be broken by the requirement that unsigned
>> types use all of their available bits?
>
>From WG14 internal email:
>
>> For example, on some 24-bit machines the "unsigned long" consumes 48 bits
>> of storage, yet there are only 47 bits of precision. The high order bit
>> of the low order word is inaccessible via normal bit shifting operations.
>> (Of course, the bits *could* be accessed via a "char *" and bit shift
>> operations.)

Thanks. I'm glad I don't have to use or implement a C compiler on this
architecture. Does a conforming implementation exist? C has already
made life a pain for certain restricted architectures (for example, the
requirement that long have 32 bits is somewhat inconvenient for
implementors of C on 4 bit machines). Was there something different
about this architecture?

In article <313lkc$d...@news.tuwien.ac.at>,
Peter Holzer <h...@vmars.tuwien.ac.at> wrote:
> Some responses to DRs (struct hack, Ux_MAX == x_MAX, ...) seem
> to indicate that the committee seems to care more about bizarre
> architectures than about existing code these days.

This now seems somewhat difficult to deny.

--
Alan Watson | C is not a big language, and it is not
al...@oldp.astro.wisc.edu | well served by a big book.
Department of Astronomy | -- Kernighan & Ritchie

Alan Watson

Jul 29, 1994, 12:32:42 PM
In article <Ctous...@scone.london.sco.com>

cl...@sco.com (Clive D.W. Feather) wrote:

[Re: use of unsigned to check for overflow.]

>If you're going to do that, shouldn't you either:
>- use floats to do the approximation (or is that too slow ?), or
>- take specific account of signs (which you have to do for multiplication
> and division anyway)
>?

There are other ways to skin this cat; the question is whether this
method `is' or `should be' strictly conforming.

[Re: sizeof (int) * CHAR_BIT]

>> I am puzzled that the committee would choose to break such a common and
>> important idiom. I thought `existing code is important but existing
>> [and by extension future] implementations are not'.
>
>The committee do not "choose to break such a common and important idiom"
>(something which has been claimed about the "struct hack" as well). In
>fact, as an occasional member, I rather resent that comment.

I apologize for the offense. I will attempt to be more specific in
this post.

>The committee get asked questions about the exact meaning of the
>wording. We/they look at the question, and the words of the Standard,
>which has *already* been approved internationally, and attempt to
>determine what the Standard says on the topic. Previous decisions,
>memory of discussion when the Standard was written [from those members
>who have the strength to *still* do this stuff !], and common practice
>are all considered.
>
>In this case, there are and were implementations that don't use all the
>bits of an integer. The view appears to be [I wasn't at that meeting]
>that those implementations were always legitimate, and therefore the
>Standard doesn't require all bits to be used. So that disposes of the
>packed bit array idiom. Unfortunate, but it *never* was guaranteed to
>work.
>
>Given that, there is no wording to suggest that UINT_MAX >= INT_MIN+INT_MAX,
>but only that UINT_MAX >= INT_MAX. If you can find a reading to the
>contrary, we would all like to know.

Perhaps I am missing the essence of the process, and I would be
grateful if you could give some details of this particular DR. If the
committee were asked `what does the standard say', the response `hidden
bits are allowed' is reasonable and probably correct (see my question
on ~ below). If the committee was asked `is the fact that hidden bits
are apparently allowed a defect in the standard', the response `yes' is
IMHO reasonable but probably a mistake.

I thought that one of the purpose of DR was to alert the committee to
apparent defects in the standard and to initiate the process of
correcting such apparent defects, not just to ask for interpretations
of the standard. For example, a few months ago we established that by
the wording of the standard `(void *) (void *) 0' is not a null pointer
constant; clearly this was a simple mistake in the standard and needed
fixing. I see this case as similar, and I am a little puzzled that you
do not.

Now, this seems to me to be a possible defect in the standard. If the
committee rules that this is not a defect, a single implementation
on a (IMHO) bizarre architecture will have been rendered conforming
and a common C idiom will have been rendered non-strictly conforming.
The only rationales for this (perhaps hypothetical) decision that I can
generate are (a) the architecture is not sufficiently bizarre; (b) the
idiom is not sufficiently common; (c) the committee is not able to
change the standard; (d) the committee does not want to change the
standard.

My belief that this is a defect in the standard is bolstered by the
statement in the Rationale `the Committee saw little utility in adding
such macros [CHAR_BIT] for other data types'. Without UINT_BIT, I
cannot construct a constant expression for the number of available bits
in an unsigned type (and see below for unsigned char).

>[47-bit system]

How does it implement the ~ operator in a manner that is consistent
with `each bit in the operand is set if and only if the corresponding
bit in the converted operand is not set' and `the expression is
equivalent to Ux_MAX - E' (from ISO 6.3.3.3). Is the hidden bit
somehow not `in the operand'?

Further questions:

Must there even be CHAR_BIT `real' bits in an unsigned char?

Must Ux_MAX be equal to 2^n - 1 for some value of n? Is there
anything to stop the value of UCHAR_MAX being 1000?

How `pure' must a `pure binary numeration system' be? Can hidden
bits appear in the middle of the value? Can the bits appear out of
order?

--
Alan Watson | Assembly: You try to shoot yourself in
al...@oldp.astro.wisc.edu | the foot only to discover you must first
Department of Astronomy | reinvent the gun, the bullet, and your
University of Wisconsin -- Madison | foot. -- Anon.

Karl Heuer

Jul 31, 1994, 3:45:15 PM
In article <Ctous...@scone.london.sco.com>

cl...@sco.com (Clive D.W. Feather) writes:
>So that disposes of the packed bit array idiom. Unfortunate, but it
>*never* was guaranteed to work.

If I correctly understand the issue, the usual idiom does work provided
you pack 47 bits to an unsigned long, right? That's still not too bad,
but I wonder what other "common but technically non-conforming" idioms
would break on this architecture.

Mark Brader

Jul 31, 1994, 3:51:21 PM
> > ... For example, a few months ago we established that by

> > the wording of the standard `(void *) (void *) 0' is not a null pointer
> > constant; clearly this was a simple mistake ...
>
> Hold on just a cotton-pickin' minute there partner. ...
> *I* happen to think that that's exactly as it should be.

Right, I think that was intentional. The defect was not that the
expression (void*)(void*)0 isn't a null pointer constant, but rather,
that the standard doesn't require its value to be a null pointer.
--
Mark Brader | "It is impractical for the standard to attempt to
m...@sq.com | constrain the behavior of code that does not obey
SoftQuad Inc., Toronto | the constraints of the standard." -- Doug Gwyn

This article is in the public domain.

Ronald F. Guilmette

Jul 31, 1994, 5:46:42 PM
In article <1994Jul29....@sal.wisc.edu> al...@sal.wisc.edu (Alan Watson) writes:
>
>I thought that one of the purpose of DR was to alert the committee to
>apparent defects in the standard and to initiate the process of
>correcting such apparent defects, not just to ask for interpretations
>of the standard...

There *was* a process (some time back) which involved the submission of
so-called ``Requests for Interpretation''.

As I understood it, that process didn't really allow for the possibility
that there might be a real live technical mistake in the standard (which
would be in need of repair).

Anyway, that process has now been replaced by a slightly different system
which involves the submission of ``Defect Reports'' (wherein, it is believed,
people will make assertions along the lines that ``The standard is defective
in that it says... blah, blah, blah... which is obviously wrong/unclear.'')

Under the new system, the committee *can* and *will* create technical
corrections to the standard when and if such corrections seem needed.
(And by the way, I believe that we have P.J. Plauger to thank for the new
system, which really is better than the old system since it allows for
the correction of defects, whereas the old system didn't.)

Anyway, IN PRACTICE, the new system retains some of the character of the
old system, and the committee, while accepting only things which are
(formally) called ``Defect Reports'', continues to do its level best
*both* to give guidance to implementors *and* to correct defects in the
standard via this ``DR'' process.

(I complained a bit about this generous approach at the last X3J11 meeting
in San Jose, and suggested that X3J11 should stop responding to anything
other than ``DRs'' which were clearly and unambiguously asserting the
existance of defects in the standard... as opposed to those which were
merely asking for some `interpretive' guidance. My suggestion went over
like a lead balloon. Thus, for the time being, it seems that X3J11/WG14
will continue to respond to requests for interpretations... as long as they
are labeled as ``Defect Reports''.)

--

-- Ron Guilmette, Sunnyvale, CA ---------- RG Consulting -------------------
---- domain addr: r...@netcom.com ----------- Purveyors of Compiler Test ----
---- uucp addr: ...!uunet!netcom!rfg ------- Suites and Bullet-Proof Shoes -

Stephen Baynes

Aug 1, 1994, 3:56:41 AM
Alan Watson (al...@sal.wisc.edu) wrote:
: Further questions:

: Must there even be CHAR_BIT `real' bits in an unsigned char?

I would take CHAR_BIT as the number of bits visible to the programmer, not
how many are used by the hardware.

: Must Ux_MAX be equal to 2^n - 1 for some value of n? Is there
: anything to stop the value of UCHAR_MAX being 1000?

Allowing UCHAR_MAX of 1000 would make implementations on decimal-based
computers a lot easier.

: How `pure' must a `pure binary numeration system' be? Can hidden
: bits appear in the middle of the value? Can the bits appear out of
: order?

I have always taken the view that the `pure binary numeration system' means
that you can assume things like:
1 << 2 == 4
2 | 4 == 6
6 & 3 == 2
etc. It does not say anything about how the programmer's model is mapped
to the underlying architecture.

The existence of big-endian and little-endian computers already allows a
variation in the order of bits that is visible to programs that use unions
etc (though I think any program that relies on specific results from such
use is non-conforming). So other variations should be permissible.

If you were to take purity to ridiculous extremes you could be requiring
that the computer's data bus is set out in bitwise order with no intervening
wires - just so the bits never get out of order and no hidden bits exist. If I
have a memory board with a parity bit - would that invalidate my compiler?

The only reasonable measure of the order of bits is the effects that logical
(bit) operations have on numerical values. If it is hidden then it is hidden
and you can't detect it. I think some sort of ruling is needed on how many
bits there are in an int etc - is it sizeof(int)*CHAR_BIT or can it be
something else, as that is a visible effect in normal arithmetic expressions.
[I would prefer to see a separate INT_BIT, as requiring sizeof(int)*CHAR_BIT
rules out implementations on some otherwise reasonable embedded
microprocessors.]

--
Stephen Baynes bay...@mulsoc2.serigate.philips.nl
Philips Semiconductors Ltd
Southampton My views are my own.
United Kingdom

Clive D.W. Feather

Aug 1, 1994, 4:06:11 AM

In article <rfgCtt...@netcom.com>,
Ronald F. Guilmette <r...@netcom.com> wrote:
>In article <1994Jul29....@sal.wisc.edu> al...@sal.wisc.edu (Alan Watson) writes:
>> I thought that one of the purposes of DR was to alert the committee to
>> apparent defects in the standard and to initiate the process of
>> correcting such apparent defects, not just to ask for interpretations
>> of the standard...
> Anyway, that process has now been replaced by a slightly different system
> which involves the submission of ``Defect Reports'' (wherein, it is believed,
> people will make assertions along the lines that ``The standard is defective
> in that it says... blah, blah, blah... which is obviously wrong/unclear.'')
> Under the new system, the committee *can* and *will* create technical
> corrections to the standard when and if such corrections seem needed.
> (And by the way, I believe that we have P.J. Plauger to thank for the new
> system, which really is better than the old system since it allows for
> the correction of defects, whereas the old system didn't.)

It's not PJP's invention; the Defect Report process is an integral part
of all ISO standards.

I would also like to quote part of the introduction to Record of Responses #1:
|| As a Record of Responses, this document is *not* normative; rather,
|| it provides guidance on how to interpret the ISO C Standard.
|| That guidance was crafted by technical experts [...]
|| this Record of Responses also records a number of responses that call
|| for normative changes to the ISO C Standard. A separate document,
|| called a Technical Corrigendum, makes those normative changes.

Clive D.W. Feather

Aug 1, 1994, 5:16:30 AM

In article <1994Jul29....@sal.wisc.edu>,
Alan Watson <al...@sal.wisc.edu> wrote:
>In article <Ctous...@scone.london.sco.com>
>cl...@sco.com (Clive D.W. Feather) wrote:
> [Re: sizeof (int) * CHAR_BITS]

> Perhaps I am missing the essence of the process, and I would be
> grateful if you could give some details of this particular DR.

Since I wrote it, I can submit the text:

========
In these items, identifiers lexically identical to those declared in
standard headers refer to the identifiers declared in those standard
headers, whether or not the header is explicitly mentioned.

This collection has been prepared with considerable help from Mark Brader,
Jutta Degener, and a person whose employment conditions require anonymity.
However, opinions expressed or implied should not be assumed to be those
of any person other than myself.


Defect Report 069 - representation of integral types
----------------------------------------------------
Subclause 6.1.2.5 refers to the representation of a value in an integral
type being in a "pure binary numeration system", and defines this
further in footnote 18. On the other hand, the wording of ISO 2382 is:

[In this transcription, words in {...} are in bold in the original, words
in <...> are in italics in the original, and 2^4 means 2 with a superscript
of 4.]

|| 05.03.15
|| {binary (numeration) system}
|| The <fixed radix numeration system> that uses the <digits> 0 and 1 and
|| the <radix> two.
||
|| Example: In this <numeration system>, the numeral 110,01 represents the
|| number "6,25"; that is 1 x 2^2 + 1 x 2^1 + 1 x 2^-2.
||
|| 05.03.11
|| {fixed radix (numeration) system}
|| {fixed radix notation}
|| A <radix numeration system> in which all the <digit places>, except
|| perhaps the one with the highest <weight>, have the same <radix>.
||
|| NOTES
|| 1 The weights of successive digit places are successive integral powers
|| of a single radix, each multiplied by the same factor. Negative integral
|| powers of the radix are used in the representation of fractions.
||
|| 2 A fixed radix numeration system is a particular case of a <mixed radix
|| numeration system>; see also note 2 to 05.03.19.
||
|| 05.03.08
|| {radix}
|| {base} (deprecated in this sense)
|| In a <radix numeration system>, the positive <integer> by which the
|| <weight> of any <digit place> is multiplied to obtain the weight of the
|| digit place with the next higher weight.
||
|| Example: In the <decimal numeration system> the radix of each digit
|| place is 10.
||
|| NOTE - The term base is deprecated in this sense because of its
|| mathematical use (see definition in 05.02.01).
||
|| 05.03.07
|| {radix (numeration) system}
|| {radix notation}
|| A <positional representation system> in which the ratio of the <weight>
|| of any one <digit place> to the weight of the digit place with the next
|| lower weight is a positive <integer>.
||
|| NOTE - The permissible values of the <character> in any digit place
|| range from zero to one less than the <radix> of that digit place.
||
|| 05.03.04
|| {weight}
|| In a <positional representation system>, the factor by which the value
|| represented by a <character> in a <digit place> is multiplied to obtain
|| its additive contribution in the representation of a number.
||
|| 05.03.03
|| {digit place}
|| {digit position}
|| In a <positional representation system>, each site that may be occupied
|| by a <character> and that may be identified by an ordinal number or by
|| an equivalent identifier.
||
|| 05.03.01
|| {positional (representation) system}
|| {positional notation}
|| Any <numeration system> in which a number is represented by an <ordered>
|| set of <characters> in such a way that the value contributed by a
|| character depends upon its position as well as upon its value.

(a) What is the legal force of the footnote, given that it quotes a
definition from a document other than ISO 2382 [see 3] ?

(b) Is the footnote wording correct, seeing that the ISO 2382 definition
does not appear to allow any of the common representations (note the
word "positive" in 05.03.07) ?

(c) Does the Standard require that an implementation appear to use only one
representation for each value of a given type ?

(d) Does the Standard require that all the bits of the value be significant ?

(e) Does the Standard require that all possible bit patterns represent
numbers ?

(f) Do the answers to questions (c), (d), and (e) depend on whether the
type is signed or unsigned, and in the former case, on the sign of the
value ?

(g) If it is permitted for certain bit patterns not to represent values,
is generation of such a value by an application (using bit operators)
undefined behaviour, or is use of such a value strictly conforming
provided that it is not used with arithmetic operators ?

In particular, are the following five implementations allowed ?

(h) Unsigned values are pure binary.
Signed values are represented using ones-complement (in other words,
positive and negative values with the same absolute value differ in
all bits, and zero has two representations). Positive numbers have
a sign bit of 0, and negative numbers a sign bit of 1.
In both cases, all bits are significant.

(i) Unsigned values are pure binary.
Signed values are represented using sign-and-magnitude with a pure
binary magnitude (note that the top bit is not "additive"). Positive
numbers have a sign bit of 0, and negative numbers a sign bit of 1.
In both cases, all bits are significant.

(j) Unsigned values are pure binary, with all bits significant.
Signed values with an MSB (sign bit) of 0 are positive, and the remainder
of the bits are evaluated in pure binary. Signed values with an MSB
of 1 are negative, and the remainder of the bits are evaluated in BCD.
If ints are 20 bits, then INT_MAX is 524287 and INT_MIN is -79999.

(k) Signed values are twos-complement using all bits.
Unsigned values are pure binary, but ignoring the MSB (so each
number has two representations).
In this implementation, SCHAR_MAX == UCHAR_MAX, SHRT_MAX == USHRT_MAX,
INT_MAX == UINT_MAX, and LONG_MAX == ULONG_MAX.

(l) Signed values are twos-complement. Unsigned values are pure binary.
In both cases, the top 3 bits of the value are ignored (and each
number has 8 representations). For signed values, the sign bit is
the fourth bit from the top.

Furthermore:

(m) Does the Standard require that the values of SCHAR_MAX, SHRT_MAX,
INT_MAX, and LONG_MAX in <limits.h> [5.2.4.2.1] all be exactly one
less than a power of 2 ?

(n) If the answer to (m) is "yes", then must the exponent of 2 be
exactly one less than CHAR_BITS * sizeof (T), where T is signed char,
short, int, or long respectively ?

(p) Does the Standard require that the values of UCHAR_MAX, USHRT_MAX,
UINT_MAX, and ULONG_MAX in <limits.h> [5.2.4.2.1] all be exactly one
less than a power of 2 ?

(q) If the answer to (p) is "yes", then must the exponent of 2 be
exactly CHAR_BITS * sizeof (T), where T is unsigned char, unsigned
short, unsigned int, or unsigned long respectively ?

(r) Does the Standard require that the absolute values of SCHAR_MIN,
SHRT_MIN, INT_MIN, and LONG_MIN in <limits.h> [5.2.4.2.1] all be
exactly a power of 2 or exactly one less than a power of 2 ?

(s) If the answer to (r) is "yes", then must the exponent of 2 be
exactly one less than CHAR_BITS * sizeof (T), where T is signed char,
short, int, or long respectively ?

(t) If any of the answers to (m), (p), or (r) is "no", are there any values
for each of these expressions that are permitted by subclause 5.2.4.2
but prohibited by the Standard for other reasons, and if so, what are
they ?

(u) Does the Standard require that the expressions (SCHAR_MIN + SCHAR_MAX),
(SHRT_MIN + SHRT_MAX), (INT_MIN + INT_MAX), and (LONG_MIN + LONG_MAX)
be exactly 0 or -1 ? If not, does it put any restrictions on these
expressions ?
========

> If the
> committee were asked `what does the standard say', the response `hidden
> bits are allowed' is reasonable and probably correct (see my question
> on ~ below).

That's effectively what was asked.

> I thought that one of the purposes of DR was to alert the committee to
> apparent defects in the standard and to initiate the process of
> correcting such apparent defects, not just to ask for interpretations
> of the standard.

That is indeed one purpose, but not the only one. This question was of
the form "I can't understand what the Standard means"; this is a Defect,
in a technical sense ("the Standard is not clear"). The committee has the
right to state that they believe that this has shown a further substantial
defect (e.g. that hidden bits are allowed and shouldn't be), in which
case they will attempt to issue a TC (which can be voted down by the
national members), or they can just issue an informative interpretation.
So far, they are still talking about it, and some of the issues on this
thread have come to their attention.

[...]


> I see this case as similar, and I am a little puzzled that you do not.

Well, I didn't know what the answer was, so asked. Personally, I feel
that hidden bits should be allowed, but there needs to be some
convenient way to find out the actual number of bits used by a type.

> Now, this seems to me to be a possible defect in the standard. If the
> committee rules that this is not a defect, a single implementation
> on a (IMHO) bizarre architecture will have been rendered
> conforming and a common C idiom will have been rendered non-strictly
> conforming.

Hold on. What is the common idiom that was strictly conforming and no
longer is ? If it was the "bit array stored in an integral variable",
then note that:
- if you store no more than 16 bits in an unsigned int or short, and
no more than 32 in an unsigned long, you are still strictly conforming;
- if you stored more than that, you were never strictly conforming;
- if, and only if, you calculated "sizeof(T)*CHAR_BIT", do you have
problems.

I'm not clear how common the last one really is.

> The only rationales for this (perhaps hypothetical)
> decision are that I can generate are (a) the architecture is not
> sufficiently bizarre; (b) the idiom is not sufficiently common; (c) the
> committee is not able to change that standard; (d) the committee does
> not want to change the standard.

I would argue (b). In general, (c) is sometimes true (Normative Addendum
1 might or might not get through), as is (d) (making a change is not an
easy thing to do, or something to do lightly).

Unlike the items in TC1, this is something that can reasonably be read
either way. So any planned change must take into account existing
conforming implementations and strictly conforming applications. It's
not right to just gratuitously break either of these. Yes, I know you'll
say that's what they're doing, but if you can show that the existing
Standard supports you, you will be listened to - I've been able to get DR
responses *completely* reversed in the past.

> My belief that this is a defect in the standard is bolstered by the
> statement in the Rationale `the Committee saw little utility in adding
> such macros [CHAR_BIT] for other data types'. Without UINT_BIT, I
> cannot construct a constant expression for the number of available bits
> in an unsigned type (and see below for unsigned char).

(a) That's support for your position, but says nothing about what the
Standard says.
(b) The Rationale is wrong in several places.
(c) Perhaps the Committee had never seen a program calculating UINT_BIT
(note that CHAR_BIT seems to have been a Committee invention, or at
least not widely used before then).

>> [47-bit system]
> How does it implement the ~ operator in a manner that is consistent
> with `each bit in the operand is set if and only if the corresponding
> bit in the converted operand is not set' and `the expression is
> equivalent to Ux_MAX - E' (from ISO 6.3.3.3). Is the hidden bit
> somehow not `in the operand'?

That seems to be the interpretation, from the discussions on the WG14
mailing list.

> Further questions:
> Must there even be CHAR_BIT `real' bits in an unsigned char?

Good question. From other parts of this thread, and other threads, I
think that if it can't be shown, it should be made explicit in a TC
(that is, that 2^CHAR_BIT == UCHAR_MAX+1).
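That identity can at least be asserted today. A minimal sketch (the
diagnostic wording is mine): because a pure binary unsigned type must have
a maximum of the form 2^n - 1, testing the top candidate bit is enough,
and shifting UCHAR_MAX rather than 1 avoids any overflow when CHAR_BIT
is large.

#include <limits.h>

/* Fails unless unsigned char has CHAR_BIT value bits, i.e. unless
   UCHAR_MAX == 2^CHAR_BIT - 1. */
#if (UCHAR_MAX >> (CHAR_BIT - 1)) != 1
#error "unsigned char appears to have padding bits"
#endif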

> Must Ux_MAX be equal to 2^n - 1 for some value of n? Is there
> anything to stop the value of UCHAR_MAX being 1000?

Yes. Unsigned values are "pure binary".

> How `pure' must a `pure binary numeration system' be? Can hidden
> bits appear in the middle of the value? Can the bits appear out of
> order?

For unsigned types, it's clear that << and >> operate on the represented
number. If you are asking which bits in memory are used, how do you find
out ?

Imagine a system with 8 bit unsigned chars and 16 bit unsigned ints. If
you use an array of 2 unsigned chars to examine how a value is broken
into 2 bytes, there is no rule as to which byte each bit goes into. It
is legitimate for the odd-numbered bits to go into one byte and the
even-numbered ones into the other, for example.
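For concreteness, a minimal sketch of that examination technique (the
output format is arbitrary):

#include <stdio.h>

int main(void)
{
    unsigned u = 1u << 14;
    unsigned char *p = (unsigned char *) &u;
    size_t i;

    /* Print the object representation byte by byte.  Which value bit
       lands in which byte - and where within the byte - is for the
       implementation to choose. */
    for (i = 0; i < sizeof u; i++)
        printf("byte %lu: %#o\n", (unsigned long) i, (unsigned) p[i]);
    return 0;
}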

For signed types, this is still being discussed, but it seems clear that
the sign bit must be the most significant one in terms of << and >>.

The hidden bits can't affect a strictly conforming program at all. For
unsigned types, they can't be accessed except via the char array trick,
which relies on the implementation-defined representation. For signed
types, you can also get at them via << and >>, but this is at least
implementation-defined as well, if not undefined.

Mark Brader

Aug 1, 1994, 6:45:54 PM

Stephen Baynes (bay...@ukpsshp1.serigate.philips.nl) writes:
> I have always taken the view that the `pure binary numeration system' means
> that you can assume things like:
> 1 << 2 == 4
> 2 | 4 == 6
> 6 & 3 == 2
> etc.

I'd just like to point out that if the last two lines above are taken
as C expressions, rather than pseudocode, then they yield the values
2 and 0 respectively, not 1. I understand what Stephen intended, of
course. But C's precedence for bitwise operators is based more on
history than usability, and I thought it worthwhile to remind people.

For those who weren't around then, the historical reason is that the
&& and || operators were originally not in the language, so people
wrote things like

if (c >= -128 & c <= 127)

which works because >= and <= always yield 0 or 1. Note that the
precedence of the bitwise operators is correct for this usage.

An early C compiler optimized & and | to work like the present && and ||
when used in what we might call a boolean context. This behavior came
to be viewed as a feature in its own right, and so && and || were created.
Now the precedence of & and | could have been changed to something more
appropriate for expressions like the ones Stephen used -- but this would
have *broken existing code* such as the example line above and, since
several people at Bell Labs were now using the language, Ritchie didn't
want to do that.
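
For anyone who wants to see it, a short demonstration (the comments give
the values a conforming implementation must print):

#include <stdio.h>

int main(void)
{
    /* & and | bind less tightly than ==, so the unparenthesized forms
       parse as 2 | (4 == 6) and 6 & (3 == 2). */
    printf("%d\n", 2 | 4 == 6);      /* prints 2 */
    printf("%d\n", 6 & 3 == 2);      /* prints 0 */
    printf("%d\n", (2 | 4) == 6);    /* prints 1 */
    printf("%d\n", (6 & 3) == 2);    /* prints 1 */
    return 0;
}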
--
Mark Brader "We did not try to keep writing until things got full."
m...@sq.com, SoftQuad Inc., Toronto -- Dennis Ritchie

Alan Watson

Aug 1, 1994, 9:19:17 PM

In article <Ctun3...@scone.london.sco.com>
cl...@sco.com (Clive D.W. Feather) wrote:

>[Text of DR by Clive Feather, with help from Mark Brader, Jutta
>Degener, and the Caped Crusader deleted.]

Thanks. This is clearly something into which you and the others have
put some considerable thought. Some of the possible consequences
boggle the mind. Has there been an official response, or is it still
under consideration?

Regarding the commonness or rarity of `sizeof (X) * CHAR_BITS', I have
just grepped some more of my code, and I find that I have also used
that idiom in code that implements log2, sqrt, the population counts,
the bit reversal, and so on, of the integral types.

In each of these cases, the code simply requires that the value of the
expression be at least the number of real bits in the unsigned integral
type, but I see no guarantee of this (for example, a system based on
9-bit bytes might use 8 of them for unsigned char and 15 for unsigned
int, yet still define CHAR_BIT as 8).

So, I now have code that will fail if there are more than (packed bit
array) and fewer than (bit-twiddling) `sizeof (X) * CHAR_BIT' real bits
in an unsigned type. I am unhappy.
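
To make the exposure concrete, here is a sketch of the bit-reversal case;
it is correct only if `sizeof (unsigned) * CHAR_BIT' is exactly the number
of value bits:

#include <limits.h>
#include <stddef.h>

/* Reverse the bits of an unsigned int.  With padding bits the result
   comes out shifted left by the number of them; with more value bits
   than sizeof(v) * CHAR_BIT the top ones would be lost.  Either way
   the idiom breaks silently. */
unsigned bit_reverse(unsigned v)
{
    unsigned r = 0;
    size_t i;

    for (i = 0; i < sizeof v * CHAR_BIT; i++)
        r = (r << 1) | ((v >> i) & 1);
    return r;
}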

>Yes, I know you'll
>say that's what they're doing, but if you can show that the existing
>Standard supports you, you will be listened to - I've been able to get DR
>responses *completely* reversed in the past.

I would hang my hat on the word `pure' in the phrase `pure binary
numeration system' and on the double-definition of the result of the ~
operator in terms of bit flipping and subtraction from Ux_MAX.

A representation with hidden bits is `impure'. In what ways could a
representation deviate from a binary numeration system making full use
of all available bits and still remain pure?

However, the bit-flipping line of reasoning becomes yet more murky. ISO
6.3.3.3 defines ~ in terms of bit flipping, and ISO 3.3 and 3.14 seem
to make it clear to me that a bit is defined in terms of storage and
objects. However, quite where this leaves `~0ul' is beyond me, as 0ul
isn't an object, doesn't have any storage, and consequently doesn't
have any bits to flip.

>> Further questions:
>> Must there even be CHAR_BIT `real' bits in an unsigned char?
>
>Good question. From other parts of this thread, and other threads, I
>think that if it can't be shown, it should be made explicit in a TC
>(that is, that 2^CHAR_BIT == UCHAR_MAX+1).

I lost track of the other thread. Did it generate a good reason for
why the Standard should allow hidden bits in unsigned long/int/short
but not in unsigned char? (If the answer is complex, a simple `yes,
read that thread' will do fine.)

>> How `pure' must a `pure binary numeration system' be? Can hidden
>> bits appear in the middle of the value? Can the bits appear out of
>> order?
>
>For unsigned types, it's clear that << and >> operate on the represented
>number. If you are asking which bits in memory are used, how do you find
>out ?
>
>Imagine a system with 8 bit unsigned chars and 16 bit unsigned ints. If
>you use an array of 2 unsigned chars to examine how a value is broken
>into 2 bytes, there is no rule as to which byte each bit goes into. It
>is legitimate for the odd-numbered bits to go into one byte and the
>even-numbered ones into the other, for example.

What I am getting at here is two-fold. 6.1.2.5 states that `the
representations of integral types shall define values by use of a pure
binary numeration system'. I have two questions:

How pure is pure?

Does this restriction on the `representation' constrain the values
returned when an unsigned char * is used to access an object in which
an unsigned integer value has been stored?

For example,

>> Must Ux_MAX be equal to 2^n - 1 for some value of n? Is there
>> anything to stop the value of UCHAR_MAX being 1000?
>
>Yes. Unsigned values are "pure binary".

So all values up to 1000 are allowed and are encoded in `pure
binary'. All arithmetic operations that produce intermediate
results greater than 1000 are reduced modulo 1001.

Can an implementation store unsigned integers as (possibly unpacked)
BCD values?

Can an implementation flip the bits whenever an unsigned integral
type other than unsigned char is accessed? (Consider a load/store
architecture in which arithmetic is performed conventionally in the
registers but all of the bits are flipped whenever an unsigned
long/int/short but not unsigned char is read from or written to
memory. In this case:

#include <stdio.h>
int main ()
{
    unsigned u = 0;
    printf ("%d\n", (* (unsigned char *) &u == 0));
}

would print 0.)

Once you start dealing in terms of the oxymoron of `degrees of purity',
you open a whole can of worms.

--
Alan Watson | The most significant simplification is
al...@oldp.astro.wisc.edu | that BCPL has only one data type -- the
Department of Astronomy | bit pattern.
University of Wisconsin -- Madison | -- Richards & Whitby-Strevens

Scott A Mayo

Aug 2, 1994, 2:46:47 AM

In comp.std.c, al...@sal.wisc.edu (Alan Watson) wrote:

<So, I now have code that will fail if there are more than (packed bit array)
<and fewer than (bit-twiddling) `sizeof (X) * CHAR_BIT' real bits in an
<unsigned type. I am unhappy.

Don't be unhappy. Recognise that The Committee had to defend hardware
that no one in their right mind wants to program on, and that your
code will never, ever be ported to. Construct some sort of compile-time
test to test your assumptions, and #error if the compiler won't meet
them. (Yes, this is flame bait to the #error thread, but then, any
compiler that won't print a diagnostic and >stop< when it sees #error is
part of that whole class of compilers that, ANSI or not, fails to
interest me. More interesting is the construction of compile time tests
for these sorts of assumptions.)

This is a heretical request, but I *really* wish there existed a subset
of ANSI C that was what I'll call real world (for a sufficiently
parochial view of "world"). One in which #error stopped the compiler;
one in which the struct hack worked, one in which sizeofs, maxs and mins
had no surprises; one in which hidden bits were, by definition,
not there; even one in which (oh, forgive me!) a NULL pointer was
a bit pattern of all 0's, so calloc'd structures containing pointers
could be trusted. In other words, one that reflected the usage we all
know works perfectly well on any system we're ever going to touch,
but now feel guilty about using because ANSI has reminded us that,
somewhere out there, there lurks a machine with 47 bit integers (plus three
hidden), with 53 bit unsigned ints, NULL pointers with all bits 1, and
which is one's-complement-little-endian on Thursdays. It seems to me that,
with a great deal of argument, a standard could be created that embodied
*most* processors, even those that don't stick to 8 bit bytes (I
programmed KL-10's for years, don't talk to *me* about bytes), without
reflecting the evil constraints of bizarre microprocessors that should
stay firmly out of the sight of Decent Programmers...

Why are all those Committee members suddenly facing me, wielding knives?
Is it something I said?

Clive D.W. Feather

Aug 2, 1994, 10:55:24 AM

In article <1994Aug2.0...@sal.wisc.edu>,
Alan Watson <al...@sal.wisc.edu> wrote:
>> [Text of DR by Clive Feather, with help from Mark Brader, Jutta
>> Degener, and the Caped Crusader deleted.]
> Thanks. This is clearly something into which you and the others have
> put some considerable thought. Some of the possible consequences
> boggle the mind. Has there been an official response, or is it still
> under consideration?

There hasn't been an official response. There was apparently some discussion
at last week's WG14 meeting, but I haven't assimilated the reports I've
received yet. The response isn't "official" until it appears in a Record
of Responses which has passed ballot. We're a long way from that yet.

> Regarding the commonness or rarity of `sizeof (X) * CHAR_BITS', I have
> just grepped some more of my code, and I find that I have also used
> that idiom in code that implements log2, sqrt, the population counts,
> the bit reversal, and so on, of the integral types.
> In each of these cases, the code simply requires that the value of the
> expression be at least the number of real bits in the unsigned integral
> type, but I see no guarantee of this (for example, a system based on
> 9-bit bytes might use 8 of them for unsigned char and 15 for unsigned
> int, yet still define CHAR_BIT as 8).

So you would be happy if there was an INT_BITS, a UINT_BITS, and so on ?

> So, I now have code that will fail if there are more than (packed bit
> array) and fewer than (bit-twiddling) `sizeof (X) * CHAR_BIT' real bits
> in an unsigned type. I am unhappy.

There can't be more, because there's nowhere to put them. There can
apparently be less, or so the current responses are saying. This issue
*is* under active consideration.

>> Yes, I know you'll
>> say that's what they're doing, but if you can show that the existing
>> Standard supports you, you will be listened to - I've been able to get DR
>> responses *completely* reversed in the past.
> I would hang my hat on the word `pure' in the phrase `pure binary
> numeration system' and on the double-definition of the result of the ~
> operator in terms of bit flipping and subtraction from Ux_MAX.
> A representation with hidden bits is `impure'. In what ways could a
> representation deviate from a binary numeration system making full use
> of all available bits and still remain pure?

Since "pure" isn't defined by either the Standard or ISO 2382, you have
to take common usage of the term. And I don't see how you can claim that
the common usage is that "pure" means "no unused bits". I would rather
think that it meant that you can't use "variants" like Gray code.

I would claim that failing to use some bits of the implementation would
leave it pure. Consider a system with a parity bit attached to each
byte which can be accessed via special opcodes, and where parity
checking can be disabled. Your definition of "pure" would require the C
implementation to use these extra bits, no matter how much extra effort
it requires.

> However, the bit-flipping line of reasoning becomes yet more murky. ISO
> 6.3.3.3 defines ~ in terms of bit flipping, and ISO 3.3 and 3.14 seem
> to make it clear to me that a bit is defined in terms of storage and
> objects. However, quite where this leaves `~0ul' is beyond me, as 0ul
> isn't an object, doesn't have any storage, and consequently doesn't
> have any bits to flip.

An operand doesn't have to be an object. A bit is a unit of data
storage, so objects are made of bits, but that doesn't mean that all
bits are objects.

Also note that 6.3.3.3 implicitly states that hidden bits don't take
part in bit flipping, at least for unsigned types, because it defines
the result in terms of UINT_MAX and ULONG_MAX.

>> Good question. From other parts of this thread, and other threads, I
>> think that if it can't be shown, it should be made explicit in a TC
>> (that is, that 2^CHAR_BIT == UCHAR_MAX+1).
> I lost track of the other thread. Did it generate a good reason for
> why the Standard should allow hidden bits in unsigned long/int/short
> but not in unsigned char?

Yes. Basically, accessing objects via arrays of unsigned chars wouldn't
work otherwise.

>>> How `pure' must a `pure binary numeration system' be? Can hidden
>>> bits appear in the middle of the value? Can the bits appear out of
>>> order?
>> For unsigned types, it's clear that << and >> operate on the represented
>> number. If you are asking which bits in memory are used, how do you find
>> out ?

[...]


> What I am getting at here is two-fold. 6.1.2.5 states that `the
> representations of integral types shall define values by use of a pure
> binary numeration system'. I have two questions:
> How pure is pure?

Can't say, because it's an undefined term.

> Does this restriction on the `representation' constrain the values
> returned when an unsigned char * is used to access an object in which
> an unsigned integer value has been stored?
> For example,
>>> Must Ux_MAX be equal to 2^n - 1 for some value of n? Is there
>>> anything to stop the value of UCHAR_MAX being 1000?
>> Yes. Unsigned values are "pure binary".
> So all values up to 1000 are allowed and are encoded in `pure
> binary'. All arithmetic operations that produce intermediate
> results greater than 1000 are reduced modulo 1001.

The current WG14 view seems to be that "pure" means that all
combinations of the bits that take part in the representation must give
valid values using the normal meaning. So, in your implementation, given
that there must be a 2^9 bit in order to represent 1000, the expression
501 << 1 must evaluate to 1002, because that's what would happen in a
pure binary system that has a 2^9 bit.
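
(Worked through: 501 is 111110101 in binary, and one left shift gives
1111101010, i.e. 1002. A pure binary type with a 2^9 bit can represent
every value up to 2^10 - 1 = 1023, so a ceiling of 1000 is impossible and
Ux_MAX must be of the form 2^n - 1.)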

> Can an implementation store unsigned integers as (possibly unpacked)
> BCD values?

If access via an (unsigned char *) sees the BCD values, then I would say
not. If it is fooled in the same way (for example, CHAR_BIT == 9 and
each byte is stored as a 12-bit BCD value), then yes, because there is
no way a "strongly conforming program" (I think that's the term I want;
will someone remind me what the definition is please ?) can find out
that it's happening.

> Can an implementation flip the bits whenever an unsigned integral
> type other than unsigned char is accessed? (Consider a load/store
> architecture in which arithmetic is performed conventionally in the
> registers but all of the bits are flipped whenever an unsigned
> long/int/short but not unsigned char is read from or written to
> memory.

I'm not sure what you mean, and your example doesn't help.

Let's suppose that sizeof (unsigned) is 2. If there are no hidden bits,
then, given:

unsigned u0 = 0, u1 = 1 << 14, u2 = (1 << 14) + (1 << 2);

I would say that it is required that:

((unsigned char *) &u0) [0] == 0
((unsigned char *) &u0) [1] == 0
that one of
((unsigned char *) &u1) [0]
((unsigned char *) &u1) [1]
is zero and the other has exactly 1 bit set [as determined by a loop
that checks (value&1) and then shifts right by one bit], and that
if you examine:
((unsigned char *) &u2) [0]
((unsigned char *) &u2) [1]
then *either* each will have exactly one bit set, *or* one will have two
bits set and the other will be zero. Moreover, one of the two bits found
will match the one found by the tests on u1.
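
The bit-counting loop mentioned above might be coded like this (a sketch;
the function name is mine):

/* Count the bits set in one byte of an object representation by
   testing (value & 1) and then shifting right by one bit. */
unsigned bits_set(unsigned char byte)
{
    unsigned n = 0;

    while (byte != 0) {
        n += byte & 1;
        byte >>= 1;
    }
    return n;
}

The test on u1 then amounts to checking that
bits_set(((unsigned char *) &u1)[0]) + bits_set(((unsigned char *) &u1)[1])
is exactly 1.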

If there is one hidden bit in unsigned int and none in unsigned char, then
I would say that the two byte accesses *might* find an additional set bit
in each case, but it is impossible to predict; the bit might turn on and
off without apparent cause. The extra bit will be in its own position.
For example, if CHAR_BIT is 10, then UINT_MAX is 2^19 - 1, but the
access as two bytes will randomly see the notional 2^19 bit be set.

Signed types are under the same constraint when the value is non-negative [*],
and the sign bit must then be zero. When the sign bit is set, there seem
to be no restrictions on the value.

[*] Actually, it isn't totally clear whether -0 must be converted to +0
if the two are different, or whether it is permitted for 1&(-0) to equal 1.

Clive D.W. Feather

Aug 2, 1994, 11:12:46 AM

In article <4f.smail...@std.world.com>,
Scott A Mayo <sm...@world.std.com> wrote:
> Don't be unhappy. Recognise that The Committee had to defend hardware
> that no one in their right mind wants to program on, and that your
> code will never, ever be ported to.

How do you know that either of these is the case ?

How about instead recognising that the Committee have to work with the
Standard that is written, and that in most cases, for every person who
says "my way is obviously right and much more common", there is someone
who says that about the opposing view. How about I post some TCs here
and you show me what is the obvious answer ?

> Construct some sort of compile-time
> test to test your assumptions, and #error if the compiler won't meet
> them.

The problem is that, if hidden bits are permitted, this isn't possible
for him. And, unfortunately, adding new symbols to <limits.h> is not a
trivial thing to do. Particularly not in a TC.

> This is a heretical request, but I *really* wish there existed a subset
> of ANSI C that was what I'll call real world (for a sufficiently
> parochial view of "world").

If you stick to one compiler and one machine, then it's easy to get.
Unfortunately, the *real* world isn't like that.

To address your issues:

- One in which #error stopped the compiler;

An implementation may successfully translate an invalid program. That
comes under the heading of "quality of implementation". You are at
liberty to refuse to buy compilers which don't have success/failure
reporting to match your requirements.

- one in which the struct hack worked,

Why are people so wedded to this ? It's easy to avoid *without* needing
two allocations, just by including an extra pointer field. And you can
put more than one variable array in a structure like that. And have the
variable array not at the end.
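
A sketch of what I mean, still in a single allocation (the names and
element type are illustrative, and it assumes the header's alignment
suffices for the element type):

#include <stdlib.h>

struct vec {
    size_t len;
    long *data;             /* points into the same allocation */
};

struct vec *vec_new(size_t n)
{
    /* One malloc covers the header and the array; the extra pointer
       field records where the array part begins. */
    struct vec *p = malloc(sizeof *p + n * sizeof (long));

    if (p != NULL) {
        p->len = n;
        p->data = (long *) (p + 1);   /* array follows the header */
    }
    return p;
}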

- one in which sizeofs, maxs and mins had no surprises;

I'm not sure what you mean by that.

- one in which hidden bits were, by definition, not there;

If the implementation doesn't need to hide bits, it probably won't. If
it does, do you really want your code to run at half speed while it
fakes the non-hidden bits ?

- one in which a NULL pointer was a bit pattern of all 0's so calloc'd
  structures containing pointers could be trusted.

Or because you're too lazy to write a malloc wrapper that initializes
the fields of your structures correctly.
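
A sketch of such a wrapper, which stays correct whatever bit pattern a
null pointer has (the structure is illustrative):

#include <stdlib.h>

struct node {
    struct node *next;
    double weight;
};

/* Allocate a node and initialize every member explicitly, rather than
   trusting calloc()'s all-bits-zero fill to yield null pointers and
   0.0 - neither of which the Standard guarantees. */
struct node *node_new(void)
{
    struct node *p = malloc(sizeof *p);

    if (p != NULL) {
        p->next = NULL;
        p->weight = 0.0;
    }
    return p;
}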

> In other words, one that reflected the usage we all
> know works perfectly well on any system we're ever going to touch,

You must have very little experience of systems. Those restrictions were
put in there for the real world, not for MS-DOS weenies or Unix bigots.

> but now feel guily about using because ANSI has reminded us that,
> somewhere out there, there lurks a machine with

[...]

"machines with various combinations of", you mean. If you want to assume
NULL has all zero bits, go ahead. You'll just be limited to the machines
your code works on.

> It seems to me that,
> with a great deal of argument, a standard could be created that embodied
> *most* processors

That's what we've got. Don't come crying because the authors looked at
more than the few machines you've seen. Yes, I saw you've used a KL-10.
*I've* used an Elliott 803. It had 39-bit integers, 77-bit longs, and 5
or 6 (depending on the character set encoding you used) bytes.

> Why are all those Committee members suddenly facing me, wielding knives?

Tell you what. How about asking X3J11 to let you come to the next WG14
meeting ? Then you can see some *real* Committee members with *real*
knives. And if you think I've been rough ...

Ajoy KT

Aug 2, 1994, 5:57:00 AM

In article <31b01o$g...@news.tuwien.ac.at>, h...@vmars.tuwien.ac.at (Peter Holzer) writes...

>t_akr...@bart.hns.com (Ajoy KT) writes:
>>In article <313mhr$d...@news.tuwien.ac.at>, h...@vmars.tuwien.ac.at (Peter Holzer) writes...
>>>t_akr...@bart.hns.com (Ajoy KT) writes:
>I believe we have some difficulties understanding each other, so I will
>use an example. Let us assume an implementation with 16 bit ints
>and 16 bit chars. Then assume that the character constant '$' is
>negative. Then the value of '$' cannot be represented in an unsigned
>char, because that can only hold non-negative values. So, unless '$' ==
>EOF, '$' is not a legal argument to isprint in such an implementation.

OK; I agree.

>>>>This means that, on a machine with default char a signed type, and
>>>>with some non-basic character set member -ve, ctype functions are
>>>>going to be hard to use.

>>>Such implementations are common.

>> Well, I don't see any way out. If they are common, hard luck for all
>>of us.

>The common implementations with negative characters have only 8-bit
>characters. This is not a problem. If '$' in any such character set is
>negative, isprint ('$') is not legal, but isprint ((unsigned char)'$')
>is,

I am not sure whether (unsigned char)'$' would be interpreted by
isprint() to be the character '$'. In other words, I can see no guarantee that
isprint((unsigned char)'$'); actually tells us whether '$' is a print-character
or not (it just tells us whether (unsigned char)'$' is a printable-character
or not).

>The way out is, of course, to demand that UCHAR_MAX <= INT_MAX for
>hosted implementations.

Programmers typically use isprint(ch); prior art is that way. Even
if UCHAR_MAX <= INT_MAX is guaranteed, this particular usage still remains
non-portable (since any non-basic character constant can be -ve). I believe
the description of the ctype functions has to be modified to take into
account these possibilities (instead of requiring programmers to use
isprint((unsigned char)'$'), the status of which is also unclear).
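
For what it's worth, the defensive spelling under discussion looks like
this; a sketch only, assuming UCHAR_MAX <= INT_MAX (the function name is
mine):

#include <ctype.h>

/* Classify a plain char by first converting it to unsigned char, so a
   negative character value becomes the corresponding value in the
   range 0..UCHAR_MAX, which is what the ctype functions are specified
   to accept (besides EOF). */
int is_printing_char(char c)
{
    return isprint((unsigned char) c);
}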

Another drawback of the current definition is that the default type of
char becomes fixed by the properties of the character set in use, not
necessarily by what is native to the processing system. For example, on any
implementation supporting the EBCDIC set, the peculiarities of ANSI C ensure
that the default char type must be an unsigned type (assuming 8-bit chars)^;
on a processor whose native byte manipulations are signed, this introduces
inefficiency.

^ This is so because EBCDIC has an 8th-bit-set representation for digits;
since '0' to '9' have to be positive, this implies that the default char
type is an unsigned type.
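
The footnote's reasoning can even be checked at translation time. A
minimal sketch, with the caveat that C89 makes it implementation-defined
whether a character constant in an #if has its execution-set value:

#if '0' < 0
/* In EBCDIC, '0' is 0xF0 (240); a signed 8-bit char would make that
   negative, yet members of the basic character set must be
   non-negative, so plain char must then be unsigned. */
#error "plain char is signed, yet '0' has its high bit set"
#endif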

-----
/* "Never again, was what you swore, the time before..." - Depeche Mode */

Alan Watson

Aug 2, 1994, 1:17:58 PM

In article <4f.smail...@std.world.com>

>Construct some sort of compile-time
>test to test your assumptions [that there are no hidden bits in an
>unsigned int], and #error if the compiler won't meet them.

This appears to be impossible since, as Peter Holzer reminded us,
sizeof cannot be used in preprocessor constant expressions.

The best preprocessor diagnostic I can come up with is:

#include <limits.h>

#if UINT_MAX >> 16 == 0
#define UINT_BITS 16
#elif UINT_MAX >> 17 == 0
#define UINT_BITS 17
...
#endif

and then either use UINT_BITS and be done with it or use

#if UINT_BITS % CHAR_BIT != 0
#error Hidden bits!
#endif

However, the point at which the #if chain is terminated is somewhat
arbitrary.
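
If a translation-time constant is not required, the count can be made at
run time instead; a sketch:

#include <limits.h>

/* Count the value bits of unsigned int by shifting UINT_MAX down to
   zero.  Unlike sizeof(unsigned) * CHAR_BIT this counts only the bits
   that take part in the value representation - but it is not a
   constant expression, so it cannot feed #if or an array size. */
unsigned uint_bits(void)
{
    unsigned n = 0;
    unsigned v = UINT_MAX;

    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}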

--
Alan Watson | [volatile's] chief virtue is that nearly
al...@oldp.astro.wisc.edu | everyone can forget about it.
Department of Astronomy | -- Dennis Ritchie

Alan Watson

Aug 3, 1994, 12:36:22 AM

In article <CtwxG...@scone.london.sco.com>
cl...@sco.com (Clive D.W. Feather) wrote:
>So you would be happy if there was an INT_BITS, a UINT_BITS, and so on ?

Yes. I would be happy with Ux_BITS. I'm not sure how x_BITS should be
defined, given the precedent set by CHAR_BITS.

I could modify my code to make it strictly conforming under the
existing standard, but only at a cost in space, speed, and/or
complexity on a common 64-bit machine that lacks load/store byte
operations. I would not be especially happy about this, but I could
live with it. The vendor of this machine might regard this as a
serious issue, but somehow I doubt it.

I am aware that reserving identifiers currently in the user's name
space is a Big Deal. There seem to be three options: (a) outlaw
hidden bits; (b) add Ux_BITS; or (c) accept that there is no
strictly-conforming constant expression for the number of bits in an
unsigned integer type. Although all of these have drawbacks, I think
(a) has the fewest, but then I don't have to persuade a group of my peers
that we should tell a certain vendor and their lawyers that their
compiler no longer conforms.

>> I lost track of the other thread. Did it generate a good reason for
>> why the Standard should allow hidden bits in unsigned long/int/short
>> but not in unsigned char?
>
>Yes. Basically, accessing objects via arrays of unsigned chars wouldn't
>work otherwise.

Well, that's a damn good reason for why unsigned chars should not have
hidden bits, but I can't find that guarantee in the Standard. Have I
missed something?

>[Response to my ravings about the purity of the integral types deleted.]

Okay, you (and I presume you are trying to reflect the Committee's
view, such as it exists) favour allowing freedom in the (fixed) order
in which bits are stored (as seen by an unsigned char *), the number of
bits, and the value of hidden bits, but not such things as an encoding
of those bits. This is reasonable, but not especially explicit in the
Standard. This issue is to some degree addressed by your DR.

(BTW, what I mean by `flipping bits' when an unsigned int was loaded or
stored but not an unsigned char, was that if the value zero was written
to an unsigned int object through a modifiable lvalue with type unsigned
int, all of the bits in that object would appear to be one if they were
accessed via an unsigned char * pointer, and if the value UINT_MAX was
written to the same object, all of the bits in that object would appear
to be zero if they were accessed via an unsigned char pointer. This
ignores hidden bits.)

Let's investigate the value of those potential hidden bits. Should the
following print zero?

#include <stdio.h>
#include <string.h>
int
main()
{
    unsigned u1, u2;
    u1 = u2 = 0;
    printf("%d\n", memcmp(&u1, &u2, sizeof(unsigned)));
    return 0;
}

It would not print zero if two unsigneds contain hidden bits and if the
values of those hidden bits differ. Is there a guarantee in the
Standard that they do indeed have the same values?

--
Alan Watson | On international race courses, the result
al...@oldp.astro.wisc.edu | of the race is much more important than
Department of Astronomy | the equipment. The collision only gives
University of Wisconsin -- Madison | a more accurate indication of a protest
| situation. -- Paul Elvström

Stephen Baynes

Aug 3, 1994, 3:22:01 AM

Clive D.W. Feather (cl...@sco.com) wrote:
: In article <1994Aug2.0...@sal.wisc.edu>,
: Alan Watson <al...@sal.wisc.edu> wrote:
-snip-
: > Regarding the commonness or rarity of `sizeof (X) * CHAR_BITS', I have
: > just grepped some more of my code, and I find that I have also used
: > that idiom in code that implements log2, sqrt, the population counts,
: > the bit reversal, and so on, of the integral types.
: > In each of these cases, the code simply requires that the value of the
: > expression be at least the number of real bits in the unsigned integral
: > type, but I see no guarantee of this (for example, a system based on
: > 9-bit bytes might use 8 of them for unsigned char and 15 for unsigned
: > int, yet still define CHAR_BIT as 8).

: So you would be happy if there was an INT_BITS, a UINT_BITS, and so on ?

: > So, I now have code that will fail if there are more than (packed bit
: > array) and fewer than (bit-twiddling) `sizeof (X) * CHAR_BIT' real bits
: > in an unsigned type. I am unhappy.

: There can't be more, because there's nowhere to put them. There can
: apparently be less, or so the current responses are saying. This issue
: *is* under active consideration.

-snip-

There could be more - consider:
    char is 8 visible bits + 1 hidden parity bit; CHAR_BIT is 8.
    int  is 4 chars, used as 35 visible bits + 1 hidden parity bit;
         INT_BIT would be 35, but sizeof(int) is 4 and 4 * CHAR_BIT is 32.

Clive D.W. Feather

Aug 3, 1994, 4:15:29 AM

In article <1994Aug3.0...@sal.wisc.edu>,
Alan Watson <al...@sal.wisc.edu> wrote:
>In article <CtwxG...@scone.london.sco.com>
>cl...@sco.com (Clive D.W. Feather) wrote:
>> So you would be happy if there was an INT_BITS, a UINT_BITS, and so on ?
> Yes. I would be happy with Ux_BITS. I'm not sure how x_BITS should be
> defined, given the precedent set by CHAR_BITS.

Actually, it's CHAR_BIT. And, looking at stuff that arrived in the last
few days, it seems that character types will not be allowed "padding bits"
[Mark Brader's term, but one I approve of].

BTW, WG14 discussions have come up with the following terms, which make
it somewhat easier to discuss the topic:

Object representation
The value of an object as seen as an array of unsigned chars. For
example, following:
T object;
unsigned char buf [sizeof object];
/* ... */
memcpy (buf, (const void *) &object, sizeof object);
the contents of buf are the object representation of the value
in object.

Value representation
A sequence of bits structured in a conventional way to represent
values.

The mapping from the bits of the value representation to the bits of the
object representation is implementation defined; it could depend, for
example, on byte ordering. Padding bits and bytes are then the bits (and
bytes for non-scalar types) in the object representation but not the value
representation.

All non-integral types have an unspecified value representation. The
value representation of an unsigned integral type with N value bits is
standard binary, holding values from 0 to 2^N-1. For a signed type with
N value bits, non-negative values are the same as if the type was unsigned,
while negative values are still under discussion.

> I am aware that reserving identifiers currently in the user's name
> space is a Big Deal. There seem to be three options: (a) outlaw
> hidden bits; (b) add Ux_BITS; or (c) accept that there is no
> strictly-conforming constant expression for the number of bits in an
> unsigned integer type. Although all of these have drawbacks, I think
> (a) has the fewest, but then I don't have to persuade a group of my peers
> that we should tell a certain vendor and their lawyers that their
> compiler no longer conforms.

The problems are
(a) changes the apparent conformance status of existing implementations
(b) requires all conforming implementations to change
(c) changes the apparent conformance status of existing applications

Doing any of those in a TC is something to be wary of.

>>> I lost track of the other thread. Did it generate a good reason for
>>> why the Standard should allow hidden bits in unsigned long/int/short
>>> but not in unsigned char?
>> Yes. Basically, accessing objects via arrays of unsigned chars wouldn't
>> work otherwise.
> Well, that's a damn good reason for why unsigned chars should not have
> hidden bits, but I can't find that guarantee in the Standard. Have I
> missed something?

It's not spelled out, but 7.9.2 (binary streams) and 7.11.1 (string
functions) provide indirect justification.

> Okay, you (and I presume you are trying to reflect the Committee's
> view, such as it exists) favour allowing freedom in the (fixed) order
> in which bits are stored (as seen by an unsigned char *), the number of
> bits, and the value of hidden bits, but not such things as an encoding
> of those bits. This is reasonable, but not especially explicit in the
> Standard. This issue is to some degree addressed by your DR.

The encoding of the bits of the value representation is stated to be
"a pure binary numeration system". Given this, and the concepts of value
and object representation, I think the rest falls out. At least most of
the response to this DR will be "guidance in interpretation". I will be
pushing to have the "no padding bits in char types" explicitly added.

> (BTW, what I mean by `flipping bits' when an unsigned int was loaded or
> stored but not an unsigned char, was that if the value zero was written
> to an unsigned int object though a modifiable lvalue with type unsigned
> int, all of the bits in that object would appear to be one if they were
> accessed via an unsigned char * pointer

This means that the object and value representations would use different
values for the same bit in the same number. I think this is outlawed.

> Let's investigate the value of those potential hidden bits. Should the
> following print zero?
> #include <stdio.h>
> #include <string.h>
> int
> main()
> {
> unsigned u1, u2;
> u1 = u2 = 0;
> printf("%d\n", memcmp(&u1, &u2, sizeof(unsigned)));
> return 0;
> }
> It would not print zero if two unsigneds contain hidden bits and if the
> values of those hidden bits differ. Is there a guarantee in the
> Standard that they do indeed have the same values?

I don't believe so. However, this is a separate issue, though the DR did
touch on it.

Before claiming that it is unrealistic to have the padding bits be different
in the two zeros, consider:

u1 = 0;
u2 = -1; u2++;

which might put a -0 into u2, or:

u1 = u2 = 0;

do u2++; while (u2 != 0);

which might carry out of the MSB of the value representation into a
padding bit.
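
The two scenarios fold naturally into the memcmp test; a sketch of what an
implementation with padding bits might legitimately show:

#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned u1 = 0;
    unsigned u2 = (unsigned) -1;

    u2++;   /* wraps to the value zero; padding bits, if any, unspecified */

    printf("values equal: %d\n", u1 == u2);     /* always 1 */
    printf("representations equal: %d\n",
           memcmp(&u1, &u2, sizeof u1) == 0);   /* need not be 1 */
    return 0;
}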

Norman Diamond

Aug 3, 1994, 6:08:40 AM

In article <Ctwy9...@scone.london.sco.com> cl...@sco.com (Clive D.W. Feather) writes:
>To address your issues:
>- One in which #error stopped the compiler;
>An implementation may successfully translate an invalid program. That
>comes under the heading of "quality of implementation". You are at
>liberty to refuse to buy compilers which don't have success/failure
>reporting to match your requirements.

That "answer" is a true statement, but it does not address the issue.
The issue is if implementations can refuse to translate valid programs.
--
<< If this were the company's opinion, I would not be allowed to post it. >>
segmentation fault (california dumped)

Scott A Mayo

Aug 3, 1994, 2:24:04 AM

In comp.std.c, cl...@sco.com (Clive D.W. Feather) wrote:

>....Hard, hard words....

Clive, calm down. I was *joking*. A bit of lighthearted humor in the midst of
an otherwise technical discussion. Just a touch of the old ha, ha. Gracious,
man, what do they put in your breakfast cereal?

I really have the highest regard for the standard, or I wouldn't read here.
My copy of the standard (classic, w/ rationale, destined to become a
collector's item, bronze your copy today!) occupies a place of honor on my
desk, and I use it as my runtime reference wherever I go. Because we've
reached the point where I *can* do that.

But in the name of Making A Point, let me defend a comment or two I made.

S>Don't be unhappy. Recognise that The Committee had to defend hardware that
S>no one in their right mind wants to program on, and that your code will
S>never, ever be ported to.
C>How do you know that either of these is the case ?

I don't *know*. I reserve *knowing* things for other newsgroups. But I've
been watching the computer world for the last decade or two, and I've come to
the astonishing conclusion that, as far as at least American business is
concerned - and that *is* the computer world, isn't it? - the rule is 8 Bits
Or Die. Word sizes that are not in fact a positive, integral power of 2 in
width may exist in some machines, but they are either museum pieces or
sufficiently special purpose that I'll probably never again meet a programmer
who is now or will again work on them. I can't quite say that the day of the
funny-width word is ending. We're past that. It ended. It ended, perhaps, on
the day when DEC squished the KL processor, largely because 36/8 wasn't 4.
Memory manufacturers typically look at you funny, and charge a premium, if
you want memory arranged in ways that won't nicely support octets. And I'm sure
it is quite possible to develop a quality C compiler for a 39 bit machine,
but let's face it, lack of competitive pressure, and a basic lack of
programmers who want to muck around with dinosaur machines anyway, will more
or less guarantee poor odds of getting it cheaply. In the meantime, the only
potential competitor to 32 bits is 64. Sizes like 36, 39, 77, and even 16 are
just dead, dead, dead. You can still make a case for 80, though...

Look at the big OS's these days. Macs, OS/2, zillions of unixes, NT, MS-Dos,
even MS-Windows. And look at the platforms they run on. They all
think in terms of the sort of ANSI C subset I describe. No law says they had
to - except the law of demand. Show me a machine that won't yelp when I
accidentally set a pointer to all bits zero and then dereference, and
I'll ask you why this silicon is still permitted to walk the streets
and trouble the dreams of men.

[the struct hack]

<Why are people so wedded to this ? It's easy to avoid *without* needing two
<allocations, just by including an extra pointer field.

I used the struct hack exactly once, when I was first learning C. There were
machine constraints - it was a Z80 with 64K of memory, which I was fast using
up, and the pointer trick cost bytes. In other words, my reasons for wanting
it had nothing to do with portability. They still don't. If I argue for it,
it is because I am arguing for a machine in which memory follows the model of
the parochial, narrow, small-minded, unixy/DOSic/Macish/NTian/OS2oid world
that I am used to - the one that took over the world while you were still
thinking longingly of an Elliott 803, and I was still wondering if I could
remember the bit patterns of the KL instructions. No one here is saying
the struct hack is elegant programming, and I recognise what drove the
committee to break it. But all the folk who want it, want it because it means
that memory makes sense, and want the security of knowing that if some
need drives them to treat memory as undifferentiated bytes, that they can
do it. That is what C used to be about; to quote the Rationale, that was
very much in "the spirit of C." It looks as if C has evolved away from that
set of roots. Of course there's some foot dragging over that.

<>...one in which hidden bits were, by definition, not there;

<If the implementation doesn't need to hide bits, it probably won't. If it
<does, do you really want your code to run at half speed while it fakes the
<non-hidden bits ?

Yes. I'd make that trade ten times out of ten. My product wouldn't run as
fast as possible but I'll have it out to market years before the competition
will - probably years before they've staffed up on programmers who can deal
with the concept of bits that are only sort of there. And that, as we've all
had reason to learn, means I won't *have* competition. If performance isn't
what I want, I'll wait six months for the next version of the processor,
because that, again, is cheaper than touching the code.

<<one in which a NULL pointer was a bit pattern of all 0's so calloc'd
<<structures containing pointers could be trusted.

<Or because you're too lazy to write a malloc wrapper that initializes the
<fields of your structures correctly.

I'm a fanatic about wrapper functions, myself. But there is just something
too elegant for words about x = calloc(1, sizeof *x); for x, any non-void
pointer type. Because, modulo the NULL and float issues, it creates one of
anything in a predictable form. Getting a Universal Constructor is worth
something.

<>In other words, one that reflected the usage we all know works perfectly
<>well on any system we're ever going to touch,

<You must have very little experience of systems. Those restrictions were put
<in there for the real world, not for MS-DOS weenies or Unix bigots.

Now Clive, that simply isn't a reasonable assumption. I've been around. :-)
It's just that the committee, I think, succumbed to a temptation to be all
things to all men, or at least to all machines. And even while they were
succumbing, the real world was changing. I maintain the real world is now
full of octets and everything else is consigned by market forces to the role
of computer museum heaters.

It is just that the needs of the few outweighed the wishes of the
many, and as a member of a democratic nation I sort of wonder if the standard
didn't go a tad too far in supporting the few.

<"machines with various combinations of", you mean. If you want to assume
<NULL has all zero bits, go ahead. You'll just be limited to the machines
<your code works on.

Amen. Isn't that always the way? Glad it will never affect my paycheck.

<>It seems to me that, with a great deal of argument, a standard could be
<>created that embodied *most* processors

<That's what we've got. Don't come crying because the authors looked at more
<than the few machines you've seen. Yes, I saw you've used a KL-10. *I've*
<used an Elliott 803. It had 39-bit integers, 77-bit longs, and 5 or 6
<(depending on the character set encoding you used) bytes.

So, are you looking for sympathy, or something? Let's trade war stories. The
Z80 I used allowed you to use its video memory as store - except that for
some reason, they didn't need all the bits for the video so they didn't wire
them all. If HL pointed into video memory, then LD (HL),A and LD A,(HL)
had the effect of A = A & ~2, if I recall. We all agree that this is sick
and degraded, right? But why should I consider a one's complement machine, or
even a NULL-pointer-nonzero machine, significantly less over the edge?
Machines that recognised that 0 opcodes and 0 pointers were evil things have
been around for quite some time now - with good reason.

So let's polish the standard to perfection. And then, since the standard is
a treaty between programmer and implementor, let's extend the treaty. Let's
create additional defines along the lines of (these names not to be taken seriously)
_WARNING_NULL_ISNT_ALL_BITS_0
_WARNING_DOUBLE_0_ISNT_ALL_BITS_0_EITHER
_WARNING_ONES_COMPLEMENT_SPOKEN_HERE
_WARNING_BITS_HIDDEN_WITHIN
_WARNING_TINY_SEGMENTED_MEMORY_YOUR_MALLOC_OF_10000000_WONT_WORK_HERE
and define them so that the programmer who wants ANSI and some additional
reassurances aside can at least abort a compile of his code that would
lead to useless executables.
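
Usage would be along these lines (names, again, purely illustrative):

#ifdef _WARNING_NULL_ISNT_ALL_BITS_0
#error this program trusts calloc to manufacture null pointers
#endif
#ifdef _WARNING_ONES_COMPLEMENT_SPOKEN_HERE
#error this program assumes twos complement arithmetic
#endif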

<Tell you what. How about asking X3J11 to let you come to the next WG14
<meeting ? Then you can see some *real* Committee members with *real* knives.
<And if you think I've been rough ...

Nah, just a little jumpy. If you want rough, you should have seen the way I
reacted the first time a fresh-from-college programmer thought I was telling
a tall one when I mentioned starting my programming career on a machine with
36 bit words. I've since learned to let my past blend with the mythological
tales that surround the early prehistory of computer science, back when
memory came in sizes below 4M chips, when high speed disks were the ones that
took less than 50ms to seek, when you could shift and subtract faster than do
a hardware integer divide, and no one supported threading, and cavemen walked
the earth, programming in languages that did not support recursion.

Jim Balter

unread,
Aug 3, 1994, 6:59:40 AM8/3/94
to
In article <1994Aug3.0...@sal.wisc.edu>,
Alan Watson <al...@sal.wisc.edu> wrote:
>In article <CtwxG...@scone.london.sco.com>
>cl...@sco.com (Clive D.W. Feather) wrote:
>>So you would be happy if there was an INT_BITS, a UINT_BITS, and so on ?
>
>Yes. I would be happy with Ux_BITS. I'm not sure how x_BITS should be
>defined, given the precedent set by CHAR_BIT.

Heck, I'd be happy with CHAR_BITS, instead of the misshapen 8-character
CHAR_BIT defined by a certain language standard.
--
<J Q B>

Clive D.W. Feather

unread,
Aug 3, 1994, 6:47:51 AM8/3/94
to
In article <Cty74...@ukpsshp1.serigate.philips.nl>,
Stephen Baynes <bay...@ukpsshp1.serigate.philips.nl> wrote:
> char is 8 visible bits + 1 hidden parity bit. CHAR_BIT is 8
> int is 4 chars used as
> 35 visible bits + 1 hidden parity bit. INT_BIT is 35
> but sizeof(int) is 4 and 4 * CHAR_BIT is 32.

Ooo. Clever.

If (see my previous posting) the bits in the value representation are
required to be a strict subset of those in the object representation,
that problem goes away.

Jim Balter

unread,
Aug 3, 1994, 6:02:51 AM8/3/94
to
In article <4f.smail...@std.world.com>,
Scott A Mayo <sm...@world.std.com> wrote:
>This is a heretical request, but I *really* wish there existed a subset
>of ANSI C that was what I'll call real world (for a sufficiently
>parochial view of "world"). One in which #error stopped the compiler;
>one in which the struct hack worked, one in which sizeofs, maxs and mins
>had no surprises; one in which hidden bits were, by definition,
>not there; even one in which (oh, forgive me!) a NULL pointer was
>a bit pattern of all 0's, so calloc'd structures containing pointers
>could be trusted.

I don't think you will get any support for the last one, even from the
staunchest proponents of any of the others (including me). It puts a severe
and unrealistic physical constraint upon implementations, unlike #error and
the struct hack. And one can easily code in such a way that all objects are
explicitly initialized, whereas "hidden bits" may make generic coding
difficult in some cases in the absence of a strictly conforming mechanism for
determining the number of bits in an integral type.
--
<J Q B>

Clive D.W. Feather

unread,
Aug 3, 1994, 10:09:15 AM8/3/94
to
In article <9d.smail...@world.std.com>,
Scott A Mayo <sm...@world.std.com> wrote:
> Clive, calm down. I was *joking*. A bit of lighthearted humor in the midst of
> an otherwise technical discussion. Just a touch of the old ha, ha. Gracious,
> man, what do they put in your breakfast cereal?

You might or might not have been joking. I've heard those sort of words
too often to think of them as a joke anymore, I'm afraid.

S> Don't be unhappy. Recognise that The Committee had to defend hardware that
S> no one in their right mind wants to program on, and that your code will
S> never, ever be ported to.
C> How do you know that either of these is the case ?
> I don't *know*. I reserve *knowing* things for other newsgroups. But I've
> been watching the computer world for the last decade or two, and I've come to
> the astonishing conclusion that, as far as at least American business is
> concerned - and that *is* the computer world, isn't it? - the rule is 8 Bits
> Or Die.

As others have pointed out, this may be true of "mass market" computers,
but not of, e.g., embedded systems.

And, to paraphrase Henry Spencer, whenever you see something weird in
the Standard, check the IBM AS/400.

> Look at the big OS's these days. Macs, OS/2, zillions of unixes, NT, MS-Dos,
> even MS-Windows. And look at the platforms they run on.

Not everyone has an OS on their machine (and I'm not even thinking of
the usual comments about MS-DOS and Windoze here).

> Show me a machine that won't yelp when I
> accidentally set a pointer to all bits zero and then dereference, and
> I'll ask you why this silicon is still permitted to walk the streets
> and trouble the dreams of men.

Well, the Unix linker and addressing model doesn't have any special
treatment of address zero. It should take only minor tweaks to the code
generator and linker to make zero a perfectly normal address, and NULL
be somewhere else.

> it is because I am arguing for a machine in which memory follows the model of
> the parochial, narrow, small-minded, unixy/DOSic/Macish/NTian/OS2oid world
> that I am used to - the one that took over the world while you were still
> thinking longingly of an Elliott 803, and I was still wondering if I could
> remember the bit patterns of the KL instructions.

Just out of interest, do you know how old either the 803 or I am ?

> No one here is saying
> the struct hack is elegant programming, and I recognise what drove the
> committee to break it.

Good.

> That is what C used to be about; to quote the Rationale, that was
> very much in "the spirit of C." It looks as if C has evolved away from that
> set of roots. Of course there's some foot dragging over that.

The trouble is, that should have been sorted out when the Standard was
written. Not when we're trying to read it.

> I'm a fanatic about wrapper functions, myself. But there is just something
> too elegant for words about x = calloc(1, sizeof *x); for x, any non-void
> pointer type. Because, modulo the NULL and float issues, it creates one of
> anything in a predictable form.

Circular argument. It's only "elegant" if NULL is all zeroes, so that's
not a good reason for NULL to be all zeroes.

And some of us actually think that moderately strong typing is a *good*
thing.

> It is just that the needs of the few outweighed the wishes of the
> many, and as a member of a democratic nation I sort of wonder if the standard
> didn't go a tad too far in supporting the few.

"Tyranny of the majority". Not a concept I'm happy with; people die
every day in my country because of it.

> So let's polish the standard to perfection. And then, since the standard is
> a treaty between programmer and implementor, let's extend the treaty. Let's
> create additional defines along the lines of (these names not to be taken seriously)
> _WARNING_NULL_ISNT_ALL_BITS_0
> _WARNING_DOUBLE_0_ISNT_ALL_BITS_0_EITHER
> _WARNING_ONES_COMPLEMENT_SPOKEN_HERE
> _WARNING_BITS_HIDDEN_WITHIN
> _WARNING_TINY_SEGMENTED_MEMORY_YOUR_MALLOC_OF_10000000_WONT_WORK_HERE
> and define them so that the programmer who wants ANSI and some additional
> reassurances aside can at least abort a compile of his code that would
> lead to useless executables.

The Standard is coming up for revision. Some of that might actually be
not unreasonable to propose.

> I've since learned to let my past blend with the mythological
> tales that surround the early prehistory of computer science, back when
> memory came in sizes below 4M chips, when high speed disks were the ones that
> took less than 50ms to seek, when you could shift and subtract faster than do
> a hardware integer divide, and no one supported threading, and cavemen walked
> the earth, programming in languages that did not support recursion.

And furry creatures from Alpha Centauri were real furry creatures from
Alpha Centauri. Eh ?

Clive D.W. Feather

unread,
Aug 3, 1994, 8:18:47 AM8/3/94
to
In article <31nqb8$s...@usenet.pa.dec.com>,
Norman Diamond <dia...@jrd.dec.com> wrote:
>In article <Ctwy9...@scone.london.sco.com> cl...@sco.com (Clive D.W. Feather) writes:
>> To address your issues:
>> - One in which #error stopped the compiler;
>> An implementation may successfully translate an invalid program. That
>> comes under the heading of "quality of implementation". You are at
>> liberty to refuse to buy compilers which don't have success/failure
>> reporting to match your requirements.
> That "answer" is a true statement, but it does not address the issue.
> The issue is if implementations can refuse to translate valid programs.

An implementation must accept a strictly conforming program. Such a
program cannot contain any illegal usage. I agree that it is unclear
whether #error in a translation unit falls into that class or not, and I
think this is sloppy wording rather than anything more.

Clive D.W. Feather

unread,
Aug 3, 1994, 8:28:31 AM8/3/94
to
In article <jqbCty...@netcom.com>, Jim Balter <j...@netcom.com> wrote:
>In article <4f.smail...@std.world.com>,
>Scott A Mayo <sm...@world.std.com> wrote:
>> even one in which (oh, forgive me!) a NULL pointer was
>> a bit pattern of all 0's, so calloc'd structures containing pointers
>> could be trusted.
> I don't think you will get any support for the last one, even from the
> staunchest proponents of any of the others (including me). It puts a severe
> and unrealistic physical constraint upon implementations

And, in addition, at least one type of hardware (the Transputer) puts 0
in the *middle* of the address range; addresses are signed integers, not
unsigned ones. A sensible NULL for that machine is either INT_MIN or
INT_MAX.

Alan Watson

unread,
Aug 3, 1994, 2:03:25 PM8/3/94
to
In article <Cty9L...@scone.london.sco.com>
cl...@sco.com (Clive D.W. Feather) wrote:
> At least most of
> the response to this DR will be "guidance in interpretation". I will be
> pushing to have the "no padding bits in char types" explicitly added.

Thanks. This seems essential.

>> (BTW, what I mean by `flipping bits' when an unsigned int was loaded or
>> stored but not an unsigned char, was that if the value zero was written
>> to an unsigned int object though a modifiable lvalue with type unsigned
>> int, all of the bits in that object would appear to be one if they were
>> accessed via an unsigned char * pointer
>
> This means that the object and value representations would use different
> values for the same bit in the same number. I think this is outlawed.

I still have difficulty accepting that hidden bits in the object
representation are somehow `pure', but more complex encodings of the
value representation in the object representation are not. Perhaps the
Committee should consider stating these restrictions on the object
representation more formally.

>> Is there a guarantee in the
>> Standard that [padding bits in object representations of the same value
>> representation] do indeed have the same values?
>
> I don't believe so. However, this is a separate issue, though the DR did
> touch on it.
>
> Before claiming that it is unrealistic to have the padding bits be different
> in the two zeros, consider:
>
> u1 = 0;
> u2 = -1; u2++;
>
> which might put a -0 into u2, or:
>
> u1 = u2 = 0;
> do u2++; while (u2 != 0);
>
> which might carry out of the MSB of the value representation into a
> padding bit.

Indeed. Padding bits open a whole can of worms. One of the
disadvantages of allowing them is that the Committee needs to be sure
they fully understand and accept their possible consequences.

Can the value of padding bits change at random? Is

memcmp(&u, &u, sizeof(unsigned))

guaranteed to return zero? (Extra credit will be given for an answer
that encompasses padding in structs and unions. You may assume u is
not volatile.)
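
(The question in compilable form, for the record; on an implementation
with padding bits the second printf may legitimately produce 0:)

#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned u1, u2;

    u1 = 0;
    u2 = -1; u2++;      /* value 0, possibly reached via a carry */

    printf("%d\n", u1 == u2);                           /* always 1 */
    printf("%d\n", memcmp(&u1, &u2, sizeof u1) == 0);   /* who knows? */
    return 0;
}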

--
Alan Watson | The programmer is always assumed to know
al...@oldp.astro.wisc.edu | what he is doing and is not hemmed in by
Department of Astronomy | petty restrictions.
University of Wisconsin -- Madison | -- Richards & Whitby-Strevens

Philip Homburg

unread,
Aug 3, 1994, 2:41:11 PM8/3/94
to
In article <Ctypz...@scone.london.sco.com> cl...@sco.com (Clive D.W. Feather) writes:
%As others have pointed out, this may be true of "mass market" computers,
%but not of, e.g., embedded systems.

Can anyone describe what kind of code is ported from 'normal' Unix boxes
or PCs to embedded systems? (e.g. do people write portable TCP/IP
implementations that run on anything from a Cray to a machine with 42 bit
words?)

Is it possible (in the sense of relatively easy) to refine the C standard, just
as POSIX 1003.1 did, to give more guarantees about certain issues?
(POSIX defines, among other things, the meaning of time_t values.)

For instance, in an "octet based std C" with 8, 16 and 32 bit signed and
unsigned types, implementing TCP/IP protocols is relatively easy (TCP is
only an example of a protocol not tied to a particular architecture, but
strongly tied to 8-bit bytes).
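
Concretely, such a profile could guarantee the sort of thing sketched below;
the typedef names are invented here, and the point is precisely that a
36-bit machine could not provide them:

typedef unsigned char  u8;      /* exactly 8 bits in this profile */
typedef unsigned short u16;     /* exactly 16 */
typedef unsigned long  u32;     /* exactly 32 */

/* extract a big-endian TCP port number, independent of host byte
   order and word size */
u16 tcp_src_port(const u8 *hdr)
{
    return (u16) (((u16) hdr[0] << 8) | hdr[1]);
}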

%Well, the Unix linker and addressing model doesn't have any special
%treatment of address zero. It should take only minor tweaks to the code
%generator and linker to make zero a perfectly normal address, and NULL
%be somewhere else.

...and rewrite the library. Some of the system calls on my machine take
null pointers (changing the kernel to use a new null might be a lot of work).

%> That is what C used to be about; to quote the Rationale, that was
%> very much in "the spirit of C." It looks as if C has evolved away from that
%> set of roots. Of course there's some foot dragging over that.
%
%The trouble is, that should have been sorted out when the Standard was
%written. Not when we're trying to read it.
%
%"Tyranny of the majority". Not a concept I'm happy with; people die
%everyday in my country because of it.

But do we really need to alienate the old-time C users? If I were to write
a unix kernel for a PDP-11, what kind of extra guarantees do I need from
my compiler vendor? (Maybe asking for a conforming C compiler is just like
asking for a POSIX compliant O.S. Only toy programs are really compliant,
so you have to maintain a list of extra features you need).

%> So let's polish the standard to perfection. And then, since the standard is
%> a treaty between programmer and implementor, let's extend the treaty. Let's
%> create additional defines along the lines of (these names not to be taken seriously)
%> _WARNING_NULL_ISNT_ALL_BITS_0
%> _WARNING_DOUBLE_0_ISNT_ALL_BITS_0_EITHER
%> _WARNING_ONES_COMPLEMENT_SPOKEN_HERE
%> _WARNING_BITS_HIDDEN_WITHIN
%> _WARNING_TINY_SEGMENTED_MEMORY_YOUR_MALLOC_OF_10000000_WONT_WORK_HERE
%> and define them so that the programmer who wants ANSI and some additional
%> reassurances aside can at least abort a compile of his code that would
%> lead to useless executables.
%
%The Standard is coming up for revision. Some of that might actually be
%not unreasonable to propose.

I would define them the other way around, as extensions to the standard
(usage sketched below):
__STDC__IEEE_DOUBLE
__STDC__STRUCT_HACK
__STDC__LITTLE_ENDIAN or __STDC__BIG_ENDIAN
__STDC__STRICT_ALIGNMENT
__STDC__STRICT_STRUCT_PADDING
__STDC__NULL_0
__STDC__2S_COMPLEMENT
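
Code could then demand what it needs up front, using the names proposed
above:

#if !defined(__STDC__NULL_0) || !defined(__STDC__2S_COMPLEMENT)
#error this module wants zero NULL and twos complement
#endif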

Philip Homburg

Mark Brader

unread,
Aug 3, 1994, 12:49:58 PM8/3/94
to
> > one in which the struct hack worked,
> Why are people so wedded to this ?

In one sentence:

Because it's legal according to the standard, and has been ruled otherwise.

In three paragraphs (it was going to be one, but there are too many
good reasons...):

Because a number of quite competent standard-interpreters have agreed
on a proof that it's legal according to the standard (Chris Volpe
posted a form of the proof in the "struct hack" thread), one that
depends only on language in the standard and not on interpretation.
If this had not been the case, we would have objected at public review
time, and I, at least, am angry at seeing the review process sidestepped
by a ruling that the standard does not mean what it says -- nor what
the Rationale implies (3.5.4.2) that it was intended to say.

The supposed counter-proof, put forward by other normally :-) quite
competent standard-interpreters and *not* supported in the ruling by
any citation from the standard, claims that two pointers of the same
type pointing to the same object can be distinguishable only in that
certain operations on one of them cause undefined behavior. This not
only depends on a strained interpretation (3.1.2.5/6.1.2.5 says only
that a pointer provides "a reference to an object"), but is also a new
and unexplored concept in a C context, which should not be, or have been,
introduced without review outside of the working group. Remember noalias?

And in addition, even if there *are* two plausible interpretations of
the wording in the standard -- which I don't concede, but the list of
people on the Other Side is too weighty for me to fail to mention the
possibility -- then, unless there is a significant cost to implementing
it, the committee should *always* make the choice that allows the
*programmer* the greatest flexibility. Thus in this situation, any
doubtful construct should be declared legal.

> It's easy to avoid ...

And horrible style. You won't catch *me* using it... unless I've profiled
the code enough to know it'll save me significant time over the way you
suggest, which does need an extra indirection. *But that's not the point.*

Why ask? It's not as if we hadn't been through this before. You were there.

I suggest that followups on the legality issue be made in the struct hack
thread rather than this one. I answered this here because it was posted here.
--
Mark Brader | "As penance, I suppose I should read the standard
m...@sq.com | again, but I've already lost as much hair as
SoftQuad Inc., Toronto | I can afford." -- Tom Kelly

Norman Diamond

unread,
Aug 4, 1994, 12:23:24 AM8/4/94
to
In article <CtyKv...@scone.london.sco.com> cl...@sco.com (Clive D.W. Feather) writes:
>In article <31nqb8$s...@usenet.pa.dec.com>,
>Norman Diamond <dia...@jrd.dec.com> wrote:
>>In article <Ctwy9...@scone.london.sco.com> cl...@sco.com (Clive D.W. Feather) writes:
>>> To address your issues:
>>> - One in which #error stopped the compiler;
>>> An implementation may successfully translate an invalid program. That
>>> comes under the heading of "quality of implementation". You are at
>>> liberty to refuse to buy compilers which don't have success/failure
>>> reporting to match your requirements.

>> That "answer" is a true statement, but it does not address the issue.
>> The issue is if implementations can refuse to translate valid programs.

>An implementation must accept a strictly conforming program. Such a
>program cannot contain any illegal usage.

I think we agree on *that*. (And furthermore an implementation must
accept any program that does not contain any illegal usage, even if
the output depends on implementation-defined behavior, though of course
acceptance does not require translation or execution in the presence
of translation limits.)

>I agree that it is unclear whether #error in a translation unit falls
>into that class or not,

Whom do you agree with? The standard very clearly *refuses* to place
#error into that class.

>and I think this is sloppy wording rather than anything more.

If it was intended for #error to stop translation, then I agree.
But regardless of our agreement on the reason, a technical corrigendum
is necessary in order to accomplish this intention.

I would not dare to second-guess the intention when there is no sign
of any consideration for a #warning directive. If you know the intention,
I'll believe you. However, the text still takes priority.

Jim Hill

unread,
Aug 3, 1994, 11:13:29 PM8/3/94
to
In article <Ctypz...@scone.london.sco.com>,
Clive D.W. Feather <cl...@sco.com> wrote:
>
>And, to paraphrase Henry Spencer, whenever you see something weird in
>the Standard, check the IBM AS/400.

Thank you for my first belly laugh this week.

>In article <9d.smail...@world.std.com>,
>Scott A Mayo <sm...@world.std.com> wrote:
>> So let's polish the standard to perfection. And then, since the standard is
>> a treaty between programmer and implementor, let's extend the treaty. Let's
>> create additional defines along the lines of (these names not to be taken seriously)
>> _WARNING_NULL_ISNT_ALL_BITS_0
>> _WARNING_DOUBLE_0_ISNT_ALL_BITS_0_EITHER
>> _WARNING_ONES_COMPLEMENT_SPOKEN_HERE
>> _WARNING_BITS_HIDDEN_WITHIN
>> _WARNING_TINY_SEGMENTED_MEMORY_YOUR_MALLOC_OF_10000000_WONT_WORK_HERE
>> and define them so that the programmer who wants ANSI and some additional
>> reassurances aside can at least abort a compile of his code that would
>> lead to useless executables.
>
>The Standard is coming up for revision. Some of that might actually be
>not unreasonable to propose.

Hey, I vote for the names as written here. The standard's too dry, and the
meanings are completely unmistakeable. The vendors' suits would have apoplexies,
so I know it'll never happen. Sigh.

>In article <9d.smail...@world.std.com>,
>Scott A Mayo <sm...@world.std.com> wrote:
>> a hardware integer divide, and no one supported threading, and cavemen walked

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Grrrrrr. Just what kinds of machines _have_ you programmed for? Threading goes back
to the dark ages, friend. Cavemen speak it as a native tongue, and wonder why all
the fuss.

Jim
--
Jim Hill
jth...@netcom.com.

Scott A Mayo

unread,
Aug 5, 1994, 3:18:58 AM8/5/94
to
In comp.std.c, cl...@sco.com (Clive D.W. Feather) wrote:

<And, to paraphrase Henry Spencer, whenever you see something weird in the
<Standard, check the IBM AS/400.

My most nightmarish of fears is confirmed.

<Just out of interest, do you know how old either the 803 or I am ?

Not a clue in either case. I've never heard of an Elliott 803, and if
it has 77 bit words, I don't really want to learn. I cannot conceive
of a purpose for it. And if there is one, I'd rather cherish my ignorance.

<>I'm a fanatic about wrapper functions, myself. But there is just something
<>too elegant for words about x = calloc(1, sizeof *x); for x, any non-void
<>pointer type. Because, modulo the NULL and float issues, it creates one of
<>anything in a predictable form.

<Circular argument. It's only "elegant" if NULL is all zeroes, so that's not
<a good reason for NULL to be all zeroes.

Not circular. IF the NULL pointer is all zeros, THEN calloc can be
seen as a Universal Constructor. Else, there cannot be a universal
constructor at all. Thus, I wish for NULL to be all bits 0. But I've
had enough mail from folk convincing me that there are enough machines
out there that don't think in those terms for me to abandon hope on
this one.

<And some of us actually think that moderately strong typing is a *good*
<thing.

calloc, by definition, cannot be strongly typed. If you will, it exists
at the level below which typing can be meaningful. In a really strongly
typed language it would not exist; there would be only constructors. But
since calloc exists, it is not wrong to wish that it served a role in the
initialization of types involving pointers. Impractical, but not wrong.

<>So let's polish the standard to perfection. And then, since the standard is
<>a treaty between programmer and implementor, let's extend the treaty. Let's
<>create additional defines along the lines of (these names not to be taken
<>seriously) _WARNING_NULL_ISNT_ALL_BITS_0
<>_WARNING_DOUBLE_0_ISNT_ALL_BITS_0_EITHER
<>_WARNING_ONES_COMPLEMENT_SPOKEN_HERE _WARNING_BITS_HIDDEN_WITHIN
<>_WARNING_TINY_SEGMENTED_MEMORY_YOUR_MALLOC_OF_10000000_WONT_WORK_HERE and
<>define them so that the programmer who wants ANSI and some additional
<>reassurances aside can at least abort a compile of his code that would lead
<>to useless executables.

<The Standard is coming up for revision. Some of that might actually be not
<unreasonable to propose.

Are you in a position to propose them? *Any* additional treaty points that
give the humble C scribe a better chance of defending his code against
the quirks of the environment it lands in are good things. I see a new
header, filled with defines that give the coder a clue.

While you are at it, is there ANY hope that sizeof will someday be understood
by the preprocessor? I know why it isn't, but it hurts nonetheless.

<>I've since learned to let my past blend with the mythological tales that
<>surround the early prehistory of computer science, back when memory came in
<>sizes below 4M chips, when high speed disks were the ones that took less
<>than 50ms to seek, when you could shift and subtract faster than do a
<>hardware integer divide, and no one supported threading, and cavemen walked
<>the earth, programming in languages that did not support recursion.

<And furry creatures from Alpha Centauri were real furry creatures from Alpha
<Centauri. Eh ?

Displaying a working knowledge of Douglas Adams is not sufficient to convince
me that you aren't one of those cavemen who remember life before recursion,
but I will pull in my estimate of your age by ten years. :-)

Ronald F. Guilmette

unread,
Aug 5, 1994, 3:33:38 AM8/5/94
to
In article <9d.smail...@world.std.com> sm...@world.std.com (Scott A Mayo) writes:
>...In the meantime, the only
>potential competitor to 32 bits is 64. Sizes like 36, 39, 77, and even 16 are
>just dead, dead, dead. You can still make a case for 80, though...

Your presumptions regarding the death of 16-bit CPUs are highly premature
(to say the least).

>It is just that the needs of the few outweighed the wishes of the
>many, and as a member of a democratic nation I sort of wonder if the standard
>didn't go a tad too far in supporting the few.

If you think the *C* standard has a problem in this regard, you're just
gonna *love* some of the things that nutty C++ crowd has been doing. (And
you may want to check out my recent diatribe in comp.std.c++ regarding
C++ `digraphs' among other things.)

>So let's polish the standard to perfection. And then, since the standard is
>a treaty between programmer and implementor, let's extend the treaty. Let's
>create additional defines along the lines of (these names not to be taken seriously)
> _WARNING_NULL_ISNT_ALL_BITS_0
> _WARNING_DOUBLE_0_ISNT_ALL_BITS_0_EITHER
> _WARNING_ONES_COMPLEMENT_SPOKEN_HERE
> _WARNING_BITS_HIDDEN_WITHIN
> _WARNING_TINY_SEGMENTED_MEMORY_YOUR_MALLOC_OF_10000000_WONT_WORK_HERE

Actually, I (for one) wouldn't mind a bit if the *next* C standard required
the implementation to provide a header file which contained defines (not
very different from the ones you have suggested) which the programmer could
test to find out about *all* of the characteristics of the implementation.

(I already have a set of programs which *manufacture* just such a file as
part of the installation procedure for my test suites. They don't quite
determine all the important characteristics yet, but at least I find out
things like whether adding two floats together actually gets done in
single or double precision.)

--

-- Ron Guilmette, Sunnyvale, CA ---------- RG Consulting -------------------
---- domain addr: r...@netcom.com ----------- Purveyors of Compiler Test ----
---- uucp addr: ...!uunet!netcom!rfg ------- Suites and Bullet-Proof Shoes -

Ronald F. Guilmette

unread,
Aug 5, 1994, 4:03:54 AM8/5/94
to
In article <1994Aug3.1...@sq.sq.com> m...@sq.sq.com (Mark Brader) writes:
>
>In three paragraphs (it was going to be one, but there are too many
>good reasons...):
>
> Because a number of quite competent standard-interpreters have agreed
> on a proof that it's legal according to the standard (Chris Volpe
> posted a form of the proof in the "struct hack" thread), one that

I never paid that much attention to these ``struct hack'' threads... either
the current one or previous incarnations... but I just *skimmed* Chris
Volpe's posting, and it did seem rather reasonable to me.

> depends only on language in the standard and not on interpretation.
> If this had not been the case, we would have objected at public review
> time, and I, at least, am angry at seeing the review process sidestepped
> by a ruling that the standard does not mean what it says -- nor what
> the Rationale implies (3.5.4.2) that it was intended to say.

Some free advice, Mark: Don't get mad. Get even. :-)

But seriously. Why all the angst?

I get that feeling that there are at least a few people (e.g. you, Chris,
and I don't know who else) who could make a good case to X3J11/WG14 that
they messed up when they decided that the struct hack yields undefined
behavior. So why not just make the case, and wait patiently (and politely)
and hope that they see the errors of their ways and reverse their earlier
decision? I mean why get tense about it?

(I, for one, sure hope that X3J11/WG14 are not above accepting the notion
that they might have missed some important points, here and there, when
reaching this decision or that decision. If they *do* think that
they should never reconsider any decisions they may have reached, then
I, for one, am screwed because there were a bunch of things decided at
the X3J11 meeting in San Jose that I think were not quite right, and that
I have been meaning to file some sort of follow-ups on. I just haven't
gotten the time yet. But I'm not worried. I have every confidence that
once X3J11/WG14 obtains the full benefit of the Pure Light(tm) that I
shall cast upon these issues, they are certain to understand the Truth
of these matters, and recant. :-) :-)

(For the benefit of the humor impaired, let me just clarify that that last
sentence is an attempt at self-deprecating humor. The preceding material
is all 100% serious however.)

Norman Diamond

unread,
Aug 8, 1994, 4:48:22 AM8/8/94
to
In article <2AUG1994...@lando.hns.com> t_akr...@bart.hns.com (Ajoy KT) writes:
>In article <31b01o$g...@news.tuwien.ac.at>, h...@vmars.tuwien.ac.at (Peter Holzer) writes...
>>Let us assume an implementation with 16 bit ints and 16 bit chars.

Well, our latest problem can occur even with 16 bit ints and 8 bit chars,
if plain char is signed. What a strange implementation, eh? :-)

>>Then assume that the character constant '$' is negative.

Yup.

>>Then the value of '$' cannot be represented in an unsigned char, because
>>that can only hold non-negative values. So, unless '$' == EOF, '$' is not
>>a legal argument to isprint in such an implementation.

>OK; I agree.

Me too.

>>The common implementations with negative characters have only 8-bit
>>characters. This is not a problem. If '$' in any such character set is
>>negative, isprint ('$') is not legal, but isprint ((unsigned char)'$') is,

>I am not sure whether (unsigned char)'$' would be interpreted by isprint()
>to be the character '$'. In other words, I can see no guarantee that
>isprint((unsigned char)'$'); actually tells us whether '$' is a print-

>character or not (it just tells us whether (unsigned char)'$' is a
>printable-character or not).

Yup. And I've been troubled by this for a few days too. If '$' is passed
to printf() with a %c format, it is valid, because '$' fits in an int.
But will it print as '$'?

Now, %s format pairs with a pointer to ANY character type. Unsigned char
can contain (unsigned char) '$' and it will print as ... something.
Signed char can contain '$' and it will print as ... something.
And if my implementation, aside from those strange 16-bit ints and
strange signed 8-bit chars, now happens to be one's complement, then
we really are in trouble. The printouts can't both be '$', I think.

>>The way out is, of course, to demand that UCHAR_MAX <= INT_MAX for
>>hosted implementations.

I've heard a rumor that the committee is actually considering this.
But... it is only part-way out. One's complement is still troublesome.

>Programmers typically use isprint(ch); prior art is that way.

Yup... and I find it hard to believe that isprint((unsigned char) ch)
was the prior art.

Clive D.W. Feather

unread,
Aug 8, 1994, 6:31:32 AM8/8/94
to
In article <Ctz2...@cs.vu.nl>, Philip Homburg <phi...@cs.vu.nl> wrote:
>In article <Ctypz...@scone.london.sco.com> cl...@sco.com (Clive D.W. Feather) writes:
>> Well, the Unix linker and addressing model doesn't have any special
>> treatment of address zero. It should take only minor tweaks to the code
>> generator and linker to make zero a perfectly normal address, and NULL
>> be somewhere else.
> ...and rewrite the library. Some of the system calls on my machine take
> null pointers (changing the kernel to use a new null might be a lot of work).

Why ? If the library is well written, it will use "NULL", or "(char *) 0",
or other source code which specifies a null pointer constant *irrespective*
of internal appearance. Recompile the library, perhaps.

Which is the topic I was following up to in the first place.

--
Clive D.W. Feather | Santa Cruz Operation | If you lie to the compiler,
cl...@sco.com | Croxley Centre | it will get its revenge.

Phone: +44 1923 813541 | Hatters Lane, Watford | - Henry Spencer
Fax: +44 1923 813811 | WD1 8YN, United Kingdom | <= NOTE: NEW PHONE NUMBERS

Clive D.W. Feather

unread,
Aug 8, 1994, 8:50:13 AM8/8/94
to
In article <114.smail...@std.world.com>,
Scott A Mayo <sm...@world.std.com> wrote:
>In comp.std.c, cl...@sco.com (Clive D.W. Feather) wrote:
>>> _WARNING_NULL_ISNT_ALL_BITS_0
>>> _WARNING_DOUBLE_0_ISNT_ALL_BITS_0_EITHER
[etc.]

>> The Standard is coming up for revision. Some of that might actually be not
>> unreasonable to propose.
> Are you in a position to propose them? *Any* additional treaty points that
> give the humble C scribe a better chance of defending his code against
> the quirks of the environment it lands in are good things. I see a new
> header, filled with defines that give the coder a clue.

I am in a position, and so are you. Just write up a decent proposal and
pass it to your National Standards Body.

However, this one might not be accepted. To me, it rather smacks of
subsetting of the Standard, which the Pascal experience shows is
probably a bad thing.

But feel free to submit it. With good rationale, it might be accepted.

> While you are at it, is there ANY hope that sizeof will someday be understood
> by the preprocessor? I know why it isn't, but it hurts nonetheless.

I doubt it. It requires too much interaction between the preprocessor
and the compiler proper.

>>> cavemen walked
>>> the earth, programming in languages that did not support recursion.
>> And furry creatures from Alpha Centauri were real furry creatures from Alpha
>> Centauri. Eh ?
> Displaying a working knowledge of Douglas Adams is not sufficient to convince
> me that you aren't one of those cavemen who remember life before recursion,
> but I will pull in my estimate of your age by ten years. :-)

I'd like to know what that estimate is. I'm not the person who invented
the subroutine, but I used to see him most days.

Actually, the first language standard I know of to explicitly allow recursion
predates the Fortran standard by 4 years.

Peter Holzer

unread,
Aug 8, 1994, 11:14:31 AM8/8/94
to
dia...@jrd.dec.com (Norman Diamond) writes:

>In article <2AUG1994...@lando.hns.com> t_akr...@bart.hns.com (Ajoy KT) writes:
>>In article <31b01o$g...@news.tuwien.ac.at>, h...@vmars.tuwien.ac.at (Peter Holzer) writes...

>>>The common implementations with negative characters have only 8-bit
>>>characters. This is not a problem. If '$' in any such character set is
>>>negative, isprint ('$') is not legal, but isprint ((unsigned char)'$') is,

>>I am not sure whether (unsigned char)'$' would be interpreted by isprint()
>>to be the character '$'. In other words, I can see no guarantee that
>>isprint((unsigned char)'$'); actually tells us whether '$' is a print-
>>character or not (it just tells us whether (unsigned char)'$' is a
>>printable-character or not).

>Yup. And I've been troubled by this for a few days too. If '$' is passed
>to printf() with a %c format, it is valid, because '$' fits in an int.
>But will it print as '$'?

Yes. All printing occurs through fputc, and fputc converts its argument
to unsigned char. fputc ('$', fp) and fputc ((unsigned char) '$', fp) must
print the same if the conversion from unsigned char to int is
reversible.

>Now, %s format pairs with a pointer to ANY character type.

Wow, I hadn't thought of that.

>Unsigned char
>can contain (unsigned char) '$' and it will print as ... something.
>Signed char can contain '$' and it will print as ... something.
>And if my implementation, aside from those strange 16-bit ints and
>strange signed 8-bit chars, now happens to be one's complement, then
>we really are in trouble. The printouts can't both be '$', I think.

Yes, you are right. In a one's complement implementation, if you pass a
(signed char *) to printf and it expects an (unsigned char *), all the
negative chars will come out off by one. So a one's complement
implementation cannot have any printable characters outside
[0..SCHAR_MAX].

>>Programmers typically use isprint(ch); prior art is that way.

>Yup... and I find it hard to believe that isprint((unsigned char) ch)
>was the prior art.

Yes, prior art assumes that all characters in a text file are positive.
Prior art was also to use isascii before other ctype functions. But
currently there are many systems where

* the default char type is signed,
* there are more than 128 characters,
* EOF == -1,
* character 255 (== -1) is a useful character,

so you have to cast plain chars to unsigned char before passing them to
any ctype function if you want your program to be portable. If EOF
wasn't a legal argument to ctype functions, they could cast to
(unsigned char) internally. Unfortunately EOF has been a legal argument
to those functions for a long time, so I don't think this will change.
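
The usual defensive style is to bury that cast in one place rather than
scatter it through the code; the macro names here are mine:

#include <ctype.h>

/* test for EOF separately, and first */
#define ISPRINT(c)      isprint((unsigned char)(c))
#define ISSPACE(c)      isspace((unsigned char)(c))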

hp
--
_ | h...@vmars.tuwien.ac.at | Peter Holzer | TU Vienna | CS/Real-Time Systems
|_|_) |------------------------------------------------------------------------
| | | It's not what we don't know that gets us into trouble, it's
__/ | what we know that ain't so. -- Will Rogers

Philip Homburg

unread,
Aug 8, 1994, 1:14:49 PM8/8/94
to
In article <Cu7p8...@scone.london.sco.com> cl...@sco.com (Clive D.W. Feather) writes:
%In article <Ctz2...@cs.vu.nl>, Philip Homburg <phi...@cs.vu.nl> wrote:
%>In article <Ctypz...@scone.london.sco.com> cl...@sco.com (Clive D.W. Feather) writes:
%>> Well, the Unix linker and addressing model doesn't have any special
%>> treatment of address zero. It should take only minor tweaks to the code
%>> generator and linker to make zero a perfectly normal address, and NULL
%>> be somewhere else.
%> ...and rewrite the library. Some of the system calls on my machine take
%> null pointers (changing the kernel to use a new null might be a lot of work).
%
%Why ? If the library is well written, it will use "NULL", or "(char *) 0",
%or other source code which specifies a null pointer constant *irrespective*
%of internal appearance. Recompile the library, perhaps.

time_t time(time_t *tloc)
{
    time_t tmp_time1, tmp_time2;

    if (tloc == 0)
        return syscall1(SYS_TIME, (unsigned)0);
    else if ((unsigned)tloc == 0)
    {
        tmp_time1 = syscall1(SYS_TIME, (unsigned)&tmp_time2);
        if (tmp_time1 != (time_t)-1)
            *tloc = tmp_time2;
        return tmp_time1;
    }
    else
        return syscall1(SYS_TIME, (unsigned)tloc);
}

Another interesting one is

#define SIG_DFL (void (*)())0

I guess that every place where null pointers get passed to the O.S. needs
to be examined. Unless the above implementation of time() is considered
'normal', and the simple

time_t time(time_t *tloc) { return syscall1(SYS_TIME, (unsigned)tloc); }

the exception.

Philip Homburg

Clive D.W. Feather

unread,
Aug 8, 1994, 2:47:15 PM8/8/94
to
In article <Cu87w...@cs.vu.nl>, Philip Homburg <phi...@cs.vu.nl> wrote:
> time_t time(time_t *tloc)
> {
>     time_t tmp_time1, tmp_time2;
>
>     if (tloc == 0)
>         return syscall1(SYS_TIME, (unsigned)0);
>     else if ((unsigned)tloc == 0)
>     {
>         tmp_time1 = syscall1(SYS_TIME, (unsigned)&tmp_time2);
>         if (tmp_time1 != (time_t)-1)
>             *tloc = tmp_time2;
>         return tmp_time1;
>     }
>     else
>         return syscall1(SYS_TIME, (unsigned)tloc);
> }

I presume this means that the kernel is using the same register for
pointer and integer arguments ? Unlike the 680x0, where A0 was used for
the first pointer argument and D1 for the first integer ?

Your code also never sets tmp_time2, but uses it. And if you think NULL
should be all zero bits, how is the inner block ever invoked ?

I suspect that if the first syscall was changed to:

return syscall1(SYS_TIME, (unsigned)(time_t *)0);

and if the kernel code was changed to match, then it would work even for
non-zero NULLs.

Karl Heuer

unread,
Aug 8, 1994, 10:16:22 PM8/8/94
to
In article <325i4n$6...@news.tuwien.ac.at>
h...@vmars.tuwien.ac.at (Peter Holzer) writes:
>If EOF wasn't a legal argument to ctype functions, they could cast to
>(unsigned char) internally. Unfortunately EOF has been a legal argument to
>those functions for a long time, so I don't think this will change.

I never use an expression that might have the value EOF as an argument to
a ctype function. I find it quite annoying that EOF is even in the domain;
as a matter of programming style, I believe that EOF should be the *first*
thing you check for, before worrying about what type of character you have.

If I were rewriting the Standard Library from scratch, I wouldn't use that
bastard type "unsigned char or the integer EOF" at all. It's not hard to
design something that works at least as well as getc(), without requiring
an out-of-band value.
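
One possible shape, sketched on top of getc itself; the name and signature
are mine, not a proposal anyone has reviewed:

#include <stdio.h>

/* Read one character. Returns 1 on success, 0 on end-of-file or error.
   The character always travels as an unsigned char, so no in-band EOF
   value is ever needed. */
int get_ch(FILE *fp, unsigned char *out)
{
    int c = getc(fp);

    if (c == EOF)
        return 0;
    *out = (unsigned char) c;
    return 1;
}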

Karl W. Z. Heuer (ka...@kelp.boston.ma.us), The Walking Lint

Norman Diamond

unread,
Aug 8, 1994, 11:13:59 PM8/8/94
to
In article <Cu7p8...@scone.london.sco.com> cl...@sco.com (Clive D.W. Feather) writes:
>In article <Ctz2...@cs.vu.nl>, Philip Homburg <phi...@cs.vu.nl> wrote:
>>In article <Ctypz...@scone.london.sco.com> cl...@sco.com (Clive D.W. Feather) writes:
>>> It should take only minor tweaks to the code generator and linker to
>>> make zero a perfectly normal address, and NULL be somewhere else.

>> ...and rewrite the library. Some of the system calls on my machine take
>> null pointers (changing the kernel to use a new null might be a lot of work).

>Why ? If the library is well written, it will use "NULL", or "(char *) 0",
>or other source code which specifies a null pointer constant *irrespective*
>of internal appearance. Recompile the library, perhaps.

If the library is well written, it is not written in C. It is necessary
to tweak the code generator, library, and maybe header files.

The kernel generally isn't considered to be part of the C implementation,
just as the machine isn't. You don't talk about re-etching some chips
in order to make a C implementation conform, do you? If you have to write
a bizarre library in order to make a C implementation conform to a weird
architecture and kernel, just as you have to write a bizarre code generator
to work with a weird architecture, then that's what you do (or else you
don't make a conforming C implementation).

Larry Jones

unread,
Aug 8, 1994, 2:10:27 PM8/8/94
to
In article <rfgCu1...@netcom.com>, r...@netcom.com (Ronald F. Guilmette) writes:
> (I, for one, sure hope that X3J11/WG14 are not above accepting the notion
> that they might have missed some important points, here and there, when
> reaching this decision or that decision.

Obviously Ron didn't attend any of the early X3J11 meetings where the
committee reversed itself three or four times during a single week of
meetings as various committee members studied an issue in depth and
discovered hitherto unknown problems with the current state of affairs.
The C Standard is a complex document with lots of little interlocking
tendrils between different parts and it's real hard to keep track of it
all, even for those of us who have been involved since the early days
(and there aren't too many of us left). I think the committee has
always had admirably open minds, and I don't think that's changed -- if
you really think we blew something, make your case and I'm sure you'll
get a fair hearing.
----
Larry Jones, SDRC, 2000 Eastman Dr., Milford, OH 45150-2789 513-576-2070
larry...@sdrc.com
Archaeologists have the most mind-numbing job on the planet. -- Calvin

Peter Holzer

unread,
Aug 9, 1994, 10:47:50 AM8/9/94
to
ka...@kelp.boston.ma.us (Karl Heuer) writes:

>In article <325i4n$6...@news.tuwien.ac.at>
>h...@vmars.tuwien.ac.at (Peter Holzer) writes:
>>If EOF wasn't a legal argument to ctype functions, they could cast to
>>(unsigned char) internally. Unfortunately EOF has been a legal argument to
>>those functions for a long time, so I don't think this will change.

>I never use an expression that might have the value EOF as an argument to
>a ctype function. I find it quite annoying that EOF is even in the domain;
>as a matter of programming style, I believe that EOF should be the *first*
>thing you check for, before worrying about what type of character you have.

I agree absolutely. I don't think I ever saw a program which passed EOF
to a ctype function. But all C implementations I have seen allowed it
(I do not claim to know most C implementations; I only learned C 8 years
ago and all my experience is with Unix and DOS implementations).
Nevertheless, I doubt that ANSI will remove EOF from the range of
values that can be passed to a ctype function: They didn't do it the
first time, when they made a lot of other changes which were
incompatible with at least some implementations. Even if they did, we
would still have to deal with many existing implementations where
passing an uncast char to a ctype function simply didn't work for some
time.

Ronald F. Guilmette

unread,
Aug 10, 1994, 4:04:58 AM8/10/94
to
In article <96...@heimdall.sdrc.com> scj...@thor.sdrc.com (Larry Jones) writes:
>In article <rfgCu1...@netcom.com>, r...@netcom.com (Ronald F. Guilmette) writes:
>> (I, for one, sure hope that X3J11/WG14 are not above accepting the notion
>> that they might have missed some important points, here and there, when
>> reaching this decision or that decision.
>
>... I think the committee has
>always had admirably open minds, and I don't think that's changed -- if
>you really think we blew something, make your case and I'm sure you'll
>get a fair hearing.

Thank you Larry, for confirming my own view.

I would strongly agree that X3J11 (and, by implication WG14 also) have made
all reasonable efforts to be responsive to legitimate concerns (and, in
particular, to defect reports) submitted both by members and by the public at
large, even in those cases where the questions posed (to the committee)
might seem to again traverse old territory.

While I have not always agreed with all of the technical details of all
responses provided by X3J11/WG14 (e.g. to defect reports) I greatly
respect and appreciate the willingness of these committees to consider (and
even reconsider) the issues involved to the best of their ability.

Jim Balter

unread,
Aug 11, 1994, 1:15:22 AM8/11/94
to
In article <Ctypz...@scone.london.sco.com>,
Clive D.W. Feather <cl...@sco.com> wrote:
>> I'm a fanatic about wrapper functions, myself. But there is just something
>> too elegant for words about x = calloc(1, sizeof *x); for x, any non-void
>> pointer type. Because, modulo the NULL and float issues, it creates one of
>> anything in a predictable form.
>
>Circular argument. It's only "elegant" if NULL is all zeroes, so that's
>not a good reason for NULL to be all zeroes.

Whoa. Aside from all other questions, this most certainly is not a circular
argument! Being able to use calloc to initialize objects is a justification
for NULL being all zero bits; NULL being all zero bits is most certainly
not a justification for using calloc to initialize objects; such justification
would be found in consistency, ease of implementation, low implementation
cost, etc. Generally, if an implementation choice leads to elegant algorithms,
then the elegant algorithms justify the implementation choice; there is no
circularity. E.g., the ASCII committee need not have put the digits in numeric
order, but that yields "elegant" algorithms for number conversion (a subtract
vs. a table lookup). By your lights, the subtraction is ``only "elegant"
[actually only *possible*] if [the digits are in order], so that's not a good
reason for [the digits to be in order].'' Likewise all-zero bit representations
for numeric zero and 1- or 2- complement representations (vs., say, gray code
or something entirely random) are chosen precisely because they lead to
"elegant" hardware implementations.

>And some of us actually think that moderately strong typing is a *good*
>thing.

That's a bit of a red herring. Strong typing is no more a reason for not
having a single clear primitive than it is a reason for not having a single
memory allocation primitive ("new" in strongly typed languages such as C++).
If NULL were all zero bits, then an object_init primitive could be implemented
via a single memset, otherwise it would be implemented in a more complicated
manner. The problem is that the committee (and the language designers that
preceded them) didn't recognize that, every time you move away from a direct
hardware mapping and toward more abstraction, you need to provide primitives
to bridge the gap. Thus, if C included a dynamic object initialization
primitive, there would be no temptation to use calloc or memset to initialize
objects, and if there were a typeof primitive, then all sorts of assumptions
about equivalence of type size or representation could more readily be avoided.

The *valid* arguments against requiring all zero bits for NULL are rooted in
the fact that hardware designers have chosen other values to provide
"elegance" (low cost) in other areas, such as address space layout or hardware
detection of a dereference of NULL.


--
<J Q B>

Jim Balter

unread,
Aug 11, 1994, 1:30:48 AM8/11/94
to
In article <Cu87w...@cs.vu.nl>, Philip Homburg <phi...@cs.vu.nl> wrote:
>time_t time(time_t *tloc)
>{
>    time_t tmp_time1, tmp_time2;
>
>    if (tloc == 0)
>        return syscall1(SYS_TIME, (unsigned)0);
>    else if ((unsigned)tloc == 0)
>    {
>        tmp_time1 = syscall1(SYS_TIME, (unsigned)&tmp_time2);
>        if (tmp_time1 != (time_t)-1)
>            *tloc = tmp_time2;
>        return tmp_time1;
>    }
>    else
>        return syscall1(SYS_TIME, (unsigned)tloc);
>}
>
>Another interesting one is
>
>#define SIG_DFL (void (*)())0

POSIX does not require this particular value.

>I guess that every place where null pointers get passed to the O.S. needs
>to be examined. Unless the above implementation of time() is considered
>'normal', and the simple
>
>time_t time(time_t *tloc) { return syscall1(SYS_TIME, (unsigned)tloc); }
>
>the exception.

I don't understand the point. Nothing mandates a catchall routine like
syscall1, but it can readily be implemented as a varargs routine. E.g.,

int syscall1(int operation, ...);

time_t time(time_t *tloc) { return syscall1(SYS_TIME, tloc); }

Presumably implementations with non-zero NULL use this or some other
sensible method. (Of course, since the implementation might not be written
in strictly conforming C, it might use a table lookup to gather the
arguments rather than actually using the stdarg.h mechanism.)
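
A sketch, with the syscall number and the trap primitive invented for the
example:

#include <stdarg.h>
#include <time.h>

#define SYS_TIME 13                     /* invented */
extern int trap1(int op, void *arg);    /* assembly-level trap, invented */

int syscall1(int operation, ...)
{
    va_list ap;
    int r = -1;

    va_start(ap, operation);
    switch (operation) {
    case SYS_TIME:
        /* fetched with its true type: no integer cast, so a non-zero
           NULL representation passes through untouched */
        r = trap1(SYS_TIME, va_arg(ap, time_t *));
        break;
    /* ... other one-argument calls ... */
    }
    va_end(ap);
    return r;
}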

--
<J Q B>

Jim Balter

unread,
Aug 11, 1994, 9:47:42 PM8/11/94
to
In article <CuDIv...@cs.vu.nl>, Philip Homburg <phi...@cs.vu.nl> wrote:
>I took the following statement from Clive D.W. Feather literally:
>-> Well, the Unix linker and addressing model doesn't have any special
>-> treatment of address zero. It should take only minor tweaks to the code
>-> generator and linker to make zero a perfectly normal address, and NULL
>-> be somewhere else.

Sorry, I misunderstood the context. While I agree that neither the linker nor
the addressing model make any assumptions about the representation of NULL,
that is only a subset of the considerations, and in practice no set of
"minor tweaks" will suffice. For example, many programs, including UNIX
kernels, assume that NULL pointers can be initialized by setting them to zero,
possibly through bss initialization. Some UNIX kernels themselves make such
an assumption, and for some that initialization was done by the bootstrap
by setting all memory to all zero bits. In addition, there are an untold
number of cases of type punning via casts or unions that implicitly assume
that NULL is represented by all zero bits. There may even be address
comparisons that depend upon all addresses comparing > NULL.

In the real world, people don't write strictly conforming code, even when
there is no advantage not to.

>I needed some syntax to represent stuffing arguments in registers and trapping
>into the kernel. Varargs routines are not normally used to implement
>the lowest layer of the system call interface since the routine would have
>trouble telling the number of arguments (of course the SPARC and some other
>RISC architectures are exceptions, because of their calling convention)

The point isn't so much the syntax as that you have defined syscall1 as
taking an unsigned arg, and thus forced the caller to convert pointer
args before passing them. That is unnecessary. *Syntactically*, you can
avoid that by either declaring syscall1 to be a varargs routine, or to
have different syscall routines that take different argument types, or
by not implementing time etc. in C at all, and using assembly language
interfaces. That's how it is usually done; there is no need to stuff
arguments into registers because the arguments are already in place as
a result of the call to time. All you need is the appropriate trap
instruction. As most kernels and libraries are implemented, tloc from
the time call has been stored in some appropriate place, unmodified.
If the kernel accesses it with the proper pointer type, it will still
be unmodified. Everything should work just fine, regardless of the
representation of NULL, *unless* you insist upon an explicit cast.

If your API includes a syscall1 interface through which you need to squeeze
all one-argument system calls, then you are kind of stuck if the API also
demands that the argument is of type unsigned int. But, if syscall1 is
declared varargs, then it can be implemented via a table or a switch
statement that accesses the argument via va_arg with the appropriate
type, and then calls the assembly language interface. Unfortunately,
UNIX programmers are traditionally so distant from notions of strong typing
that they *do* define interfaces like "int syscall1(int type, unsigned arg);".
--
<J Q B>

Clive D.W. Feather

unread,
Aug 11, 1994, 4:46:10 PM8/11/94
to
In article <CuDIv...@cs.vu.nl>, Philip Homburg <phi...@cs.vu.nl> wrote:
>I took the following statement from Clive D.W. Feather literally:
>-> Well, the Unix linker and addressing model doesn't have any special
>-> treatment of address zero. It should take only minor tweaks to the code
>-> generator and linker to make zero a perfectly normal address, and NULL
>-> be somewhere else.

That is what I originally said, and you have shown I am wrong.

> I took it that I can take a standard Unix kernel and change the C compiler and
> linker, etc. Later Clive said that he also assumes that the kernel uses
> the same new bit pattern for NULL.

I still maintain that the idea that NULL is all-zero is *not* built into
Unix. If you change the C compiler to believe that NULL is (say)
0xFEDCBA98, then I claim that all that is needed is:
* change the point in the linker that decides what address each segment
is loaded at so as to avoid that address
* recompile the kernel, libraries, and utilities

In other words, the same as if you were changing a kernel parameter.
This is *not* a major recode.

> I think that if you have to change
> the kernel you are not talking about Unix but about an arbitrary system
> where both kernel and applications are written in C.

Unix *is* "an arbitrary system where both kernel and applications
are written in C". What I am saying is that, if the code has been
written sensibly, then it doesn't have any deep requirement that NULL be
zero.

Even your syscall1 function will work if the mapping between integers
and pointers is the obvious one; the non-zero NULL will map to a
non-zero integer in the library, and back again in the kernel.

Remember that "NULL" and "(anytype *) 0" in the source code does *not*
mean all-zero bits in the binary. So, if the kernel code looked like
this:

int syscall1 (unsigned function, unsigned arg1)
{
    switch (function)
    {
    /* ... */
    case SYS_TIME:
        {
            time_t t = get_the_system_clock ();
            time_t *tloc = (time_t *) arg1;

            if (tloc != NULL)
                *tloc = t;
            return t;
        }
    /* ... */
    }
}

it will work no matter what NULL looks like.

Of course, if it looks like this:

    case SYS_TIME:
        {
            time_t t = get_the_system_clock ();

            if (arg1 != 0)
                *(time_t *) arg1 = t;
            return t;
        }

then it won't, but that's bad programming anyway. Note, though, that this:

    case SYS_TIME:
        {
            time_t t = get_the_system_clock ();

            if (arg1 != (unsigned)(time_t *) 0)
                *(time_t *) arg1 = t;
            return t;
        }

will work, but if you're going to do that, you might as well do it right.

See?

Clive D.W. Feather

unread,
Aug 12, 1994, 3:36:45 AM8/12/94
to
In article <jqbCuE...@netcom.com>, Jim Balter <j...@netcom.com> wrote:
> For example, many programs, including UNIX
> kernels, assume that NULL pointers can be initialized by setting them to zero,
> possibly through bss initialization. Some UNIX kernels themselves make such
> an assumption, and for some that initialization was done by the bootstrap
> by setting all memory to all zero bits.

Hmm. I don't *quite* agree with what you've said, but you do bring to
mind a valid point.

The compiler must arrange that most integer, floating-point, and pointer
variables not explicitly initialized are implicitly initialized to 0, 0.0,
and NULL respectively. If all three are all-zero bit patterns, this can
be done with one BSS segment, flooded with zeroes at startup. If not,
then the compiler, either through a startup routine or through kernel
assistance, must support at least three such segments (e.g. BSSI, BSSF,
BSSP) which are flooded with the appropriate patterns at startup. This
*is* a non-trivial change.
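
A rough sketch of what flooding one such segment might look like (the
segment-bound symbols and the NULL pattern are invented for illustration,
and I assume pointers are the size of an unsigned):

    extern char __bssp_start[], __bssp_end[];   /* hypothetical linker symbols */

    #define NULL_PATTERN 0xFEDCBA98u            /* assumed non-zero NULL bits */

    static void flood_bssp(void)
    {
        unsigned *p = (unsigned *) __bssp_start;

        while ((char *) p < __bssp_end)
            *p++ = NULL_PATTERN;                /* every pointer starts NULL */
    }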

And, of course, the kernel must arrange the flooding of its own BSS*
segments at boot time.

> In addition, there are an untold
> number of cases of type punning via casts or unions that implicitly assume
> that NULL is represented by all zero bits. There may even be address
> comparisons that depend upon all addresses comparing > NULL.

This is the sort of thing I described as "bad programming" in other
articles.

Philip Homburg

unread,
Aug 11, 1994, 9:59:33 AM8/11/94
to
In article <jqbCuC...@netcom.com> j...@netcom.com (Jim Balter) writes:
%In article <Cu87w...@cs.vu.nl>, Philip Homburg <phi...@cs.vu.nl> wrote:
%>time_t time(time_t *tloc)
%>{
%>    time_t tmp_time1, tmp_time2;
%>
%>    if (tloc == 0)
%>        return syscall1(SYS_TIME, (unsigned)0);
%>    else if ((unsigned)tloc == 0)
%>    {
%>        tmp_time1= syscall1(SYS_TIME, (unsigned)&tmp_time2);
%>        if (tmp_time1 != (time_t)-1)
%>            *tloc= tmp_time2;
%>        return tmp_time1;
%>    }
%>    else
%>        return syscall1(SYS_TIME, (unsigned)tloc);
%>}
%>
%>Another interesting one is
%>
%>#define SIG_DFL (void (*)())0
%
%POSIX does not require this particular value.

I took the following statement from Clive D.W. Feather literally:

-> Well, the Unix linker and addressing model doesn't have any special
-> treatment of address zero. It should take only minor tweaks to the code
-> generator and linker to make zero a perfectly normal address, and NULL
-> be somewhere else.

I took it that I can take a standard Unix kernel and change the C compiler
and linker, etc. Later Clive said that he also assumes that the kernel uses
the same new bit pattern for NULL. I think that if you have to change the
kernel you are not talking about Unix but about an arbitrary system where
both kernel and applications are written in C.

So back to the complicated case, the kernel is standard, we have something
like a BSD 4.3 system call interface (SVR4 is defined in terms of shared
libraries), and now we want to run an application with a different bit
pattern for NULL.

Binary compatibility requires SIG_DFL to be defined as above.

%>time_t time(time_t *tloc) { return syscall1(SYS_TIME, (unsigned)tloc); }
%
%I don't understand the point. Nothing mandates a catchall routine like
%syscall1, but it can readily be implemented as a varargs routine. E.g.,

I needed some syntax to represent stuffing arguments in registers and trapping
into the kernel. Varargs routines are not normally used to implement
the lowest layer of the system call interface since the routine would have
trouble telling the number of arguments (of course the sparc and some other
risc architectures are exceptions, because of their calling convention).


Philip Homburg

Jim Balter

unread,
Aug 12, 1994, 4:32:28 PM8/12/94
to
In article <CuEvt...@scone.london.sco.com>,

Clive D.W. Feather <cl...@sco.com> wrote:
>In article <jqbCuE...@netcom.com>, Jim Balter <j...@netcom.com> wrote:
>> For example, many programs, including UNIX
>> kernels, assume that NULL pointers can be initialized by setting them to zero,
>> possibly through bss initialization. Some UNIX kernels themselves make such
>> an assumption, and for some that initialization was done by the bootstrap
>> by setting all memory to all zero bits.
>
>Hmm. I don't *quite* agree with what you've said, but you do bring to
>mind a valid point.

You are right in your disagreement (as I imagine it). My mental construct of
default initialization was what most implementations actually do (set memory to
all zero bits), rather than what the Standard requires. It is for this
reason that I tend to avoid default initialization of pointers. There is still
the assumption in existing UNIX code that pointers can be dynamically
initialized via calloc or memset. Once again, I note the absence of an object
initialization primitive in the language.

>The compiler must arrange that most integer, floating-point, and pointer
>variables not explicitly initialized are implicitly initialized to 0, 0.0,
>and NULL respectively. If all three are all-zero bit patterns, this can
>be done with one BSS segment, flooded with zeroes at startup. If not,
>then the compiler, either through a startup routine or through kernel
>assistance, must support at least three such segments (e.g. BSSI, BSSF,
>BSSP) which are flooded with the appropriate patterns at startup. This
>*is* a non-trivial change.

Sorry, this isn't sufficient, if you are talking about traditional contiguous
segments (as flooding implies), since the various types may be intertwined in
structures. Also, the different floating types may need separate
initialization. Initialization can be arbitrarily expensive; e.g.,
    struct {
        int i;
        float f;
        double d;
        long double ld;
        void *p;
    } a[BIGNUM];

Presumably all this can be readily handled via a C++-type static constructor
mechanism. In any case, non-zero-bit ``0'' values cost more, and are thus
less "elegant" (with respect to initialization).

>> In addition, there are an untold
>> number of cases of type punning via casts or unions that implicitly assume
>> that NULL is represented by all zero bits. There may even be address
>> comparisons that depend upon all addresses comparing > NULL.
>
>This is the sort of thing I described as "bad programming" in other
>articles.

Of course, but it is an unpleasant fact that any system will, over time,
accumulate cases of what happens to work (including dereferencing NULL,
overrunning arrays, etc.), as opposed to what is "allowed". This must be
taken into account in any *real* discussion of portability or conversion.
This problem can be alleviated by designing languages with well-defined
abstract semantics and translators with good (default!) error checking.
Unfortunately, C historically encouraged type punning and a byte-level
semantics (cf. the struct hack discussion), and good error checking was
costly on a 28K pdp-11/40. And C++, despite its object orientation, is a
kitchen sink, and encourages coding anything that works (Stroustrup's "The
Design and Evolution of C++" is a very interesting and revealing book,
especially in re the effect of Stroustrup's personal ("pragmatic") philosophy
upon both the nature of the language and its market success).
--
<J Q B>

Karl Heuer

unread,
Aug 13, 1994, 12:29:28 AM8/13/94
to
In article <CuEvt...@scone.london.sco.com>

cl...@sco.com (Clive D.W. Feather) writes:
>The compiler must arrange that most integer, floating-point, and pointer
>variables not explicitly initialized are implicitly initialized to 0, 0.0,
>and NULL respectively.

I agree.

>If [these aren't all zero-bit patterns], then the compiler, either through
>a startup routine or through kernel assistance, must support at least three
>such segments (e.g. BSSI, BSSF, BSSP) which are flooded with the appropriate
>patterns at startup. This *is* a non-trivial change.

I disagree.

It's not sufficient: using separate segments wouldn't provide a way for
the compiler to deal with a struct containing mixed types (whose implicit
initializer is the appropriate aggregate of typed zeroes). It's also not
necessary: we can simply change the compiler to treat uninitialized
static-duration objects as if they had explicit zero initializers; in other
words, get rid of BSS entirely. This *is* a trivial change.

Of course, you could then put back one or more BSS segments of the sort that
you described, but this would be an optimization hack, not a conformance
issue.
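
In code terms, the trivial change is just this equivalence (the struct
name is mine, for illustration):

    struct mixed { int i; double d; char *p; };

    static struct mixed x;           /* implicit: 0, 0.0, and NULL   */
    static struct mixed y = { 0 };   /* explicit: identical contents */

The compiler treats the first declaration exactly as if it had been
written like the second, and emits the initialized image into the data
segment.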

>[Assuming things about null pointers] is the sort of thing I described
>as "bad programming" in other articles.

I agree.

Brian E Rhodefer

unread,
Aug 13, 1994, 2:35:24 PM8/13/94
to
I've been reading the discussions of the storage representation for NULL
with interest, and am (as always) very impressed at the level of thought
the language architects put in their designs.

I'm told machines have been made whose hardware design mysteriously
makes an all-bits-zero representation of NULL impossible, and that it's
for their sake that the language bends over backwards. But I still can't
buy it. I don't have a CS background, but I have watched a lot of hardware
get designed, and I know that hardware designers are a pretty inventive lot.
I *very* much doubt that they'd regard a requirement that NULL be represented
by all-bits-zero as much of an obstacle, if a language standards committee
would only have the gumption to insist on it. I'd rate it as maybe a little
tougher than deciding which side of the cabinets to put power switches on.

Has anyone made a list of the costs and benefits of maintaining the present
dichotomy between the logical and physical representation of (void *) 0?
Seems to me that the "benefits" side of the ledger has maybe two bullet items
on it of the form, "Well, at least all ten programmers for the Honeywell-Gump
MK01 computer can write for it in C", while the "costs" side of the ledger is
an unending stream of incredulity, confusion, and inefficiency. Just look at
how much time is spent on this net alone discussing the absurdity that
"*((int **)calloc(n,m))" can't be depended on to equal NULL .

So how about it? "ANSI C.2 - where nothing *IS* reliable"


Brian Rhodefer

Norman Diamond

unread,
Aug 14, 1994, 7:33:29 PM8/14/94
to
In article <32j3pc$s...@tekadm1.cse.tek.com> bri...@caladan.cse.tek.com (Brian E Rhodefer) writes:
>I've been reading the discussions of the storage representation for NULL
>with interest, and am (as always) very impressed at the level of thought
>the language architects put in their designs.

You might observe that the language architects correctly defined the
BEHAVIOR instead of the REPRESENTATION of null pointers.

Would that they had the same foresight in defining string operations.

You might observe that in other languages where the standard imposes
fewer restrictions on implementations, fancy implementations can do more
optimizing than C permits.

>I'm told machines have been made whose hardware design mysteriously
>makes an all-bits-zero representation of NULL impossible

Someone else gave an example, which I am almost willing to believe.
In general, the hardware doesn't really care, but programmers ought to.
It is customary to address blocks of storage at an offset of 0 from some
location and towards increasing addresses, up to 1 less than the size of
the block of storage. Not necessary, just customary and convenient.

In one case that I am familiar with, the hardware still doesn't care about
all 0-bits, but it cares about another case. If an offset is all 1-bits
and the object being accessed is 2 or more bytes long, then it traps.
Imagine that: if null pointers were all 1-bits, then most unintentional
dereferences of a null pointer could be trapped for free. Unintentional
dereferences of character pointers would still yield garbage or (in cases
of modification) random effects which will eventually cause some other
random failure and be hard to debug, just like now. When implementors
could take advantage of hardware that would catch a useful number of
bugs, but arbitrarily decide to make null pointers all 0-bits instead,
I think that is irresponsible. (Disclaimer: I am not speaking for my
employer.)

As silly as it is to tailor hardware for a single programming language,
users complain even when they get it.

Norman Diamond

unread,
Aug 14, 1994, 8:07:02 PM8/14/94
to
In article <CuE1o...@scone.london.sco.com> cl...@sco.com (Clive D.W. Feather) writes:
>Unix *is* "an arbitrary system where both kernel and applications
>are written in C".

Some computers run the UNIX operating system and have two implementations of
the C language. The two implementations do not have to represent null
pointers in the same way. Would you force the user to reboot the machine
each time he/she wants to execute an application that was translated by
the "other" C implementation?

I've also heard rumors of some implementations of the UNIX operating system
that are partly coded in assembly language. And if we had a more user
friendly substitute for assembly language, maybe we could reduce our
dependency on assembly language, eh? Shall we invent one?????

Karl Heuer

unread,
Aug 15, 1994, 1:27:18 PM8/15/94
to
In article <jqbCuF...@netcom.com>

j...@netcom.com (Jim Balter) writes:
>Of course, but it is an unpleasant fact that any system will, over time,
>accumulate cases of what happens to work (including dereferencing NULL,
>overrunning arrays, etc.), as opposed to what is "allowed". This must be
>taken into account in any *real* discussion of portability or conversion.

Code that dereferences a null pointer is already going to fail on a
significant number of implementations (I won't try to guess the fraction).
I'm not sure what you mean by "overrunning arrays", given the context that
it "happens to work".

As for the NULL-is-zero assumption, I bet that one could shake out most of
that lint without too much work. Assuming the code uses prototypes, you
need to view with suspicion all uses of calloc() and memset(), and zeroes
passed to variadic functions (the exec() family, in particular). And any
explicit pointer/int casts, of course. Do you think this would be a
difficult task?

(Oh, and using an implementation where NULL expands to __builtin_null
might be useful, too.)
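
The exec() family is the classic variadic case; a minimal sketch of what
to look for (the path and arguments are just examples, and the two calls
are shown together only for contrast):

    #include <stddef.h>
    #include <unistd.h>

    int main(void)
    {
        /* Suspicious: the bare 0 goes through the ... as an int, and only
           works where int and char * are the same size and NULL happens
           to be all zero bits. */
        execl("/bin/echo", "echo", "hi", 0);

        /* Clean: the cast builds a genuine null character pointer. */
        execl("/bin/echo", "echo", "hi", (char *) NULL);

        return 1;   /* reached only if exec failed */
    }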

Karl Heuer

unread,
Aug 15, 1994, 1:40:51 PM8/15/94
to
In article <32j3pc$s...@tekadm1.cse.tek.com>
bri...@caladan.cse.tek.com (Brian E Rhodefer) writes:
>I *very* much doubt that [hardware designers would] regard a requirement
>that NULL be represented by all-bits-zero as much of an obstacle, if a
>language standards committee would only have the gumption to insist on it.

Sometimes hardware is designed without the C language in mind at all,
much less the ease of implementing a particular language kludge.

>Has anyone made a list of the costs and benefits of maintaining the present
>dichotomy between the logical and physical representation of (void *) 0?
>Seems to me that the "benefits" side of the ledger has maybe two bullet items
>on it of the form, "Well, at least all ten programmers for the Honeywell-Gump
>MK01 computer can write for it in C", while the "costs" side of the ledger is
>an unending stream of incredulity, confusion, and inefficiency.

I believe that the confusion is primarily caused by the fact that "0" is a
valid spelling of the null pointer constant, together with the fact that
objects used in a boolean context are implicitly compared with a "zero" of
the appropriate type. I wouldn't mind seeing both of those warts removed
from the language, but I doubt it'll ever happen.
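
Both warts in a few lines (a minimal illustration):

    void f(char *p)
    {
        p = 0;        /* this "0" is a null pointer constant, not int zero */
        if (p)        /* implicitly: if (p != (char *) 0)                  */
            *p = 'x'; /* never reached here: p is a null pointer           */
    }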

Stan Friesen

unread,
Aug 15, 1994, 4:15:50 PM8/15/94
to
In article <32mbj6$4...@usenet.pa.dec.com>, dia...@jrd.dec.com (Norman Diamond) writes:
|> Some computers run UNIX operating system and have two implementations of
|> the C language. The two implementations do not have to represent null
|> pointers in the same way. Would you force the user to reboot the machine
|> each time he/she wants to execute an application that was translated by
|> the "other" C implementation?

No, but I *would* require any C implementation to pass pointer parameters to
system calls in the format the kernel expects, regardless of the format used
by the C implementation internally.

Since all implementations I know of use library routine wrappers for system
calls, it is very simple to have these wrappers translate the C pointers
into kernel pointers prior to executing the syscall instruction (trap,
call gate, or what have you).
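
A minimal sketch of such a wrapper (the trap stub and the kernel's
expectation of integer 0 for a null pointer are assumptions here):

    #include <stddef.h>

    extern long _trap(int callno, unsigned long karg);   /* assembly stub */

    static unsigned long to_kernel_ptr(const void *p)
    {
        /* The compiler's NULL, whatever its bit pattern, becomes the 0
           this hypothetical kernel expects; other pointers convert as-is. */
        return (p == NULL) ? 0UL : (unsigned long) p;
    }

    long syscall1p(int callno, void *arg)
    {
        return _trap(callno, to_kernel_ptr(arg));
    }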

The OS kernel imposes constraints on how one can implement
ANSI conforming C.


|>
|> I've also heard rumors of some implementations of UNIX operating system
|> that are partly coded in assembly language. And if we had a more user
|> friendly substitute for assembly language, maybe we could reduce our
|> dependency on assembly language, eh? Shall we invent one?????

Pretty much *all* Unix has a teeny tiny bit of assembly, to do context
switches and to map interrupts and traps into subroutine calls (which is
all that C can understand).

This is pretty much irrelevant.

--
s...@elsegundoca.ncr.com sar...@netcom.com

The peace of God be with you.

Shankar Unni

unread,
Aug 15, 1994, 7:08:03 PM8/15/94
to
Brian E Rhodefer (bri...@caladan.cse.tek.com) wrote:

> I'm told machines have been made whose hardware design mysteriously
> makes an all-bits-zero representation of NULL impossible, and that it's
> for their sake that the language bends over backwards. But I still can't
> buy it. I don't have a CS background, but I have watched a lot of hardware
> get designed, and I know that hardware designers are a pretty inventive lot.
> I *very* much doubt that they'd regard a requirement that NULL be represented
> by all-bits-zero as much of an obstacle, if a language standards committee
> would only have the gumption to insist on it. I'd rate it as maybe a little
> tougher than deciding which side of the cabinets to put power switches on.

Yes, Brian, but a lot of machines still in use out there in the "real
world" (tm) were designed way before C was known outside the confines of
Bell Labs..

Consider, for example, the classic HP3000 (thousands of them still in use
today), first shipped in 1973.

It's a segmented 16-bit stack architecture, where address 0 has a
well-defined meaning (start of data segment for data addresses, start of
code segment for code addresses), and a number of interesting data objects
are stored at small positive and negative data addresses.

(The C compiler on this machine, *not* done by HP, had an interesting hack:
it would reserve address 0 for one of its private globals, and initialize
it to 0, so that NULL was still 0 (as was *NULL), but this was dangerous if
you mixed C and Pascal, for instance; the Pascal compiler would feel free
to use address 0; you might then end up with a 0 data address for valid
data (some Pascal object, say)).

"Byte" and "word" addresses are different represented (byte address = 2 *
word address), so that casting between pointer types involved shifts. You
had to be *really* careful about casting to void * or char * and back if
you were passing pointers around and arbitrarily casting them.

Oh yeah: since you could have multiple text segments in a program, code
addresses were 32 bits, while data addresses were only 16 bits. Forget
about casting a function pointer to a "void *" (a "long" was OK, though).

It also had an extremely bizarre parameter passing convention (the
arguments were laid down in the same direction as the stack grew, so that
arg0 was furthest from the new frame pointer), such that varargs functions
were almost impossible before the ANSI standard came out. You had to put a
pragma (#pragma varargs) on both the function declaration and definition;
forget either, and the program would crash (guaranteed, not just
sporadically).


And if you were thinking along the lines of "oh, I'm thinking of *modern*
machines", remember that the ANSI committee has to cater to the
requirements of even the most bizarre architectures; many of these machines
are *not* likely to be thrown out, even though they may have depreciated to
$0, precisely because they are otherwise wonderful machines which do their
original job quite well..

--
Shankar Unni E-Mail: sha...@sgi.com
Silicon Graphics Inc. Phone: +1-415-390-2072

Jim Balter

unread,
Aug 16, 1994, 1:21:06 AM8/16/94
to
In article <KARL.94Au...@ursa-major.spdcc.com>,

Karl Heuer <ka...@kelp.boston.ma.us> wrote:
>In article <jqbCuF...@netcom.com>
>j...@netcom.com (Jim Balter) writes:
>>Of course, but it is an unpleasant fact that any system will, over time,
>>accumulate cases of what happens to work (including dereferencing NULL,
>>overrunning arrays, etc.), as opposed to what is "allowed". This must be
>>taken into account in any *real* discussion of portability or conversion.
>
>Code that dereferences a null pointer is already going to fail on a
>significant number of implementations (I won't try to guess the fraction).

I think you have missed my point. Projects developed on systems where
dereferences of NULL *do not* fail may contain such dereferences because
they accidentally happen to work. Such errors will persist until those projects
are ported to systems where NULL dereferences *do* fail. For example, the UNIX
program "tbl", developed on the PDP11, was badly coded and often checked
pointers to see if they were NULL *after* dereferencing them. These problems
were not identified until the program was ported to systems where page 0 was
protected. tbl is not unique in this regard.

>I'm not sure what you mean by "overrunning arrays", given the context that
>it "happens to work".

Someone recently posted a program to a Linux development group where the program
had a malloc call that used sizeof(double *) where sizeof(double) was
appropriate. The program happened to work on DOS, where it was originally
developed, but it failed on Linux, leading the naive programmer to think it
was a Linux bug. To reiterate, "any *real* discussion of portability or
conversion" of DOS programs to a UNIX environment should take into account
the fact that such weaknesses are more common in programs developed on DOS
than in programs developed on UNIX, precisely because DOS's lack of memory
protection allows such cases to "happen to work".
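
A reconstruction of that kind of bug (the sizes are illustrative):

    #include <stdlib.h>

    #define N 1024

    double *alloc_samples(void)
    {
        /* The bug: room for N pointers, not N doubles.  With 4-byte
           pointers and 8-byte doubles, half of the array overruns the
           allocation -- and without memory protection it may still
           "happen to work". */
        double *bad = malloc(N * sizeof(double *));

        /* The cure: sizeof *good cannot name the wrong type. */
        double *good = malloc(N * sizeof *good);

        free(bad);
        return good;
    }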

>As for the NULL-is-zero assumption, I bet that one could shake out most of
>that lint without too much work. Assuming the code uses prototypes, you
>need to view with suspicion all uses of calloc() and memset(), and zeroes
>passed to variadic functions (the exec() family, in particular). And any
>explicit pointer/int casts, of course. Do you think this would be a
>difficult task?

Not particularly. However, my attempts to locate and weed out examples of
bad C programming have been complicated in the past by particularly devious
misuses of the language and misunderstanding of good programming principles (and
resistance by some to the very concept).

>(Oh, and using an implementation where NULL expands to __builtin_null
>might be useful, too.)

Yeah, good thing that's an option available to conforming implementations. :-)
--
<J Q B>

Ronald F. Guilmette

unread,
Aug 16, 1994, 5:36:08 AM8/16/94
to
In article <KARL.94Au...@ursa-major.spdcc.com> ka...@kelp.boston.ma.us (Karl Heuer) writes:
>
>Code that dereferences a null pointer is already going to fail on a
>significant number of implementations (I won't try to guess the fraction).

A brief anecdote:

The way I heard it, the earliest ports of SVR4 to the x86 architecture
did the same thing as all other SVR4's did at the time, i.e. upon startup,
processes were NOT given any actual page zero. Thus, all dereferences
of NULL pointers caused immediate segmentation faults (and the processes
which attempted this bombed out immediately).

Apparently however, somewhere along the line USL caught a lot of flak about
this because large numbers of old programs (and old programmers) were being
ported over from DOS (where dereferencing NULL has traditionally been allowed).
So nowadays, I think you'll find that you WILL get a page 0 in your process
address space (and you WILL be able to dereference NULL) on most flavors of
SVR4 running on most flavors of x86.

Is this a good thing or a bad thing?

I happen to think it's a Bad Thing (because it means that nobody will EVER
be forced to fix all that crappy code that dereferences NULL) but then I'm
kind of a purist.

Douglas Rogers

unread,
Aug 16, 1994, 12:19:03 PM8/16/94
to
In article <jqbCuM...@netcom.com>, j...@netcom.com (Jim Balter) writes:
> In article <KARL.94Au...@ursa-major.spdcc.com>,
> Karl Heuer <ka...@kelp.boston.ma.us> wrote:
>
> I think you have missed my point. Projects developed on systems where
> dereferences of NULL *do not* fail may contain such dereferences because
> they accidentally happen to work. Such errors will persist until those projects
> are ported to systems where NULL dereferences *do* fail. For example, the UNIX
> program "tbl", developed on the PDP11, was badly coded and often checked
> pointers to see if they were NULL *after* dereferencing them. These problems
> were not identified until the program was ported to systems where page 0 was
> protected. tbl is not unique in this regard.
>

Yup, good style is the key. Is there a clear defence against this?

> >I'm not sure what you mean by "overrunning arrays", given the context that
> >it "happens to work".
>
> Someone recently posted a program to a Linux develoment group where the program
> had a malloc call that used sizeof(double *) where sizeof(double) was
> appropriate. The program happened to work on DOS, where it was originally
> developed, but it failed on Linux, leading the naive programmer to think it
> was a Linux bug. To reiterate, "any *real* discussion of portability or
> conversion" of DOS programs to a UNIX environment should take into account
> the fact that such weaknesses are more common in programs developed on DOS
> than in programs developed on UNIX, precisely because DOS's lack of memory
> protection allows such cases to "happen to work".
>

Interesting, I have been of the opinion that the following form for malloc
at least partially defends against this:

ptr = (Type *) malloc(sizeof(Type) * array_size);

But then I realised that a macro:

#define talloc(Type, size) ((Type *) malloc(sizeof(Type) * (size)))

would prove better. Has anyone else considered this or used this form?
What other mechanisms if given in the style rules would protect the users
from such problems?

--
Douglas

---
=============================================================================
Douglas Rogers MAIL: d...@dcs.ed.ac.uk Tel: +44 31-650 5172 (direct dial)
Fax: +44 31-667 7209
============================= Mostly harmless ===============================

Dave Mooney

unread,
Aug 16, 1994, 6:32:22 PM8/16/94
to
Douglas Rogers <d...@dcs.ed.ac.uk> wrote:
> Interesting, I have been of the opinion that the following form for malloc
> at least partially defends against this [sort of problem]:

>
> ptr = (Type *) malloc(sizeof(Type) * array_size)

Unfortunately, this style leaves you open for other sorts of trouble if
you neglect to #include <stdlib.h>. Without a prototype in sight, the
compiler assumes that malloc() returns an int. And on a platform where
sizeof(int) != sizeof(Type *), you are almost guaranteed to have your
code fail
spectacularly at runtime. ANSI added a level of type safety into the
language that is rendered powerless by gratuitous use of type casts.

dave
--
Dave Mooney d...@vnet.ibm.com
"Where are the prophets, where are the visionaries?"

Michael M. Rubenstein

unread,
Aug 17, 1994, 12:15:57 PM8/17/94
to
Dave Mooney (da...@neutron.torolab.ibm.com) wrote:
> Douglas Rogers <d...@dcs.ed.ac.uk> wrote:
>> Interesting, I have been of the opinion that the following form for malloc
>> at least partially defends against this [sort of problem]:
>>
>> ptr = (Type *) malloc(sizeof(Type) * array_size)

> Unfortunately, this style leaves you open for other sorts of trouble if
> you neglect to #include <stdlib.h>. Without a prototype in sight,
> malloc() will return an int. And on a platform where sizeof(int) !=
> sizeof(Type *), you are almost guaranteed to have your code fail
> spectacularly at runtime. ANSI added a level of type safety into the
> language that is rendered powerless by gratuitous use of type casts.

Right. The way to get type safety is to code it as

ptr = malloc(sizeof *ptr * array_size);
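
In context (a minimal compilable sketch; the element type and count are
just examples):

    #include <stdlib.h>

    int main(void)
    {
        size_t array_size = 100;
        double *ptr = malloc(sizeof *ptr * array_size);  /* no cast needed in C */

        if (ptr == NULL)
            return 1;        /* allocation failed */
        ptr[0] = 3.14;
        free(ptr);
        return 0;
    }

The sizeof *ptr form keys the size to the object itself rather than to a
separately written type name, so it cannot drift out of date.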

--
Mike Rubenstein

Jonathan de Boyne Pollard

unread,
Aug 17, 1994, 11:40:25 AM8/17/94
to
>Since all implementations I know of use library routine wrappers for system
>calls, it is very simple to have these wrappers translate the C pointers
>into kernel pointers prior to executing the syscall instruction (trap,
>call gate, or what have you).

OS/2 and Windows are the *major* exceptions to this that come immediately
to mind. System calls are function calls made directly to the
kernel API. No wrappers are used.

Relevant to other threads in this discussion: there are 8192 *different*
NULL pointers in the 16:16 memory model, and the kernel relies on a 0
offset representing a NULL pointer in system calls.

¯ JdeBP ®

Mike Redford

unread,
Sep 23, 1994, 4:21:26 PM9/23/94
to
PLEASE MAIL IDEAS DIRECTLY TO ME. THANKS A BUNCH!
The problem that I was having with this segment of the code was that,
on the next pass of the for loop, the strcat appended the second value
(meant for name1) onto name0 as well, and so on.
If I am using a CASE condition, does the strcat see only one case at
a time, or should strcat not be used with case at all? That is, should
I convert all the CASEs into IFs?

fscanf(ioptr,"%d %s %s %s %f\n",&number,&input1[INDEX-1][12],&input2[INDEX-1][12],&input3[INDEX-1][12],&input4);
mymenu=INDEX;
if (INDEX==0)
{
    num1=0.0;
    diff2=input4;
    diff1=diff2;

    // sprintf(strdiff1,"%4.5f",diff2);

    if (!mainhand)
    {
        printf("Sorry can't open the file %s",mainhand);
        return 1;
    }
    else
    {
        switch (mymenu)
        {
        case 0:
            if (mymenu==0 && index==0)
            {
                sprintf(strdiff1,"%4.5f",diff2);
                if (strdiff1)
                {
                    strcat(name0,strdiff1);
                    strcat(name0,",");
                    strdiff1[0]='\0';
                }
            }
            else
            {
                sprintf(strdiff1,"%4.5f",diff2);
                if (strdiff1)
                {
                    strcat(name0,strdiff1);
                    strcat(name0,",");
                    strdiff1[0]='\0';
                }
            }
            break;
        default:
            printf("OK");
        } // switch
        // fprintf(mainhand,"%d %f\n",number,diff2);
    }
} // INDEX==0
else
{
    num1=input4;
    diff2=num1-diff1;
    diff1=input4;

    // sprintf(strdiff1,"%4.5f",diff2);
    if (!mainhand)
    {
        printf("Sorry can't open the file %s",mainhand);
        return 1;
    }
    else
    {
        // mymenu=INDEX;
        switch (mymenu)
        {
        case 1:
            sprintf(strdiff1,"%4.5f",diff2);
            if (strdiff1 && mymenu==1)
            {
                strcat(name1,strdiff1);
                strcat(name1,",");
                strdiff1[0]='\0';
            }
            break;
        case 2:
            sprintf(strdiff1,"%4.5f",diff2);
            if (strdiff1)
            {
                strcat(name2,strdiff1);
                strcat(name2,",");
                strdiff1[0]='\0';
            }
            break;
        case 3:
            sprintf(strdiff1,"%4.5f",diff2);
            strcat(name3,strdiff1);
            strcat(name3,",");
            strdiff1[0]='\0';
            break;
        case 4:
            sprintf(strdiff1,"%4.5f",diff2);
            strcat(name4,strdiff1);
            strcat(name4,",");
            strdiff1[0]='\0';
            break;
        case 5:
            sprintf(strdiff1,"%4.5f",diff2);
            strcat(name5,strdiff1);
            strcat(name5,",");
            strdiff1[0]='\0';
            break;
        case 6:
            sprintf(strdiff1,"%4.5f",diff2);
            strcat(name6,strdiff1);
            strcat(name6,",");
            strdiff1[0]='\0';
            break;
        case 7:
            sprintf(strdiff1,"%4.5f",diff2);
            strcat(name7,strdiff1);
            strcat(name7,",");
            strdiff1[0]='\0';
            break;
        case 8:
            sprintf(strdiff1,"%4.5f",diff2);
            strcat(name8,strdiff1);
            strcat(name8,",");
            strdiff1[0]='\0';
            break;
        default:
            printf(" Done Baby");

Ian Cargill

unread,
Sep 26, 1994, 5:26:05 PM9/26/94
to

> PLEASE MAIL IDEAS DIRECTLY TO ME. THANKS A BUNK!
>

> code for first year assignment (?) deleted.

Which is yet another reason for giving this group a more anonymous name.
I was only half serious in suggesting comp.std.iso9899, but I'm
rapidly beginning to think it is a good idea.

BTW. Is it worth people mailing objections to postmaster@wherever?
Maybe if they get enough complaints, they will tighten up on making
students more responsible for the way they use the net.

--
===================================================================
Ian Cargill CEng MIEE Soliton Software Ltd.
email: i...@soliton.demon.co.uk 54 Windfield, Leatherhead,
tel: +44 (0)1372 37 5529 Surrey, UK KT22 8UQ
-------------------------------------------------------------------
Course Organiser: Association of C and C++ Users
===================================================================

Ronald F. Guilmette

unread,
Sep 28, 1994, 2:55:01 AM9/28/94
to
In article <780614...@soliton.demon.co.uk> i...@soliton.demon.co.uk writes:
>In article <35vdc6$3...@noc2.drexel.edu> mred...@impact.drexel.edu writes:
>
>> PLEASE MAIL IDEAS DIRECTLY TO ME. THANKS A BUNK!
>>
>> code for first year assignment (?) deleted.
>
>Which is yet another reason for giving this group a more anonymous name.
>I was only half serious in suggesting comp.std.iso9899, but I'm
>rapidly beginning to think it is a good idea.
>
>BTW. Is it worth people mailing objections to postmaster@wherever?
>Maybe if they get enough complaints, they will tighten up on making
>students more responsible for the way they use the net.

No. Simple neophyte mistakes are best taken up directly with the
individual. Only blatant cases of spamming are cause for getting
a site administrator involved. Remember to keep your powder dry for
those rare occasions when you really need it.
