
Reading from files and range of char and friends


Spiros Bousbouras

Mar 10, 2011, 11:40:32 AM
If you are reading from a file by successively calling fgetc() is there
any point in storing what you read in anything other than unsigned
char ? If you try to store it in char or signed char then it's possible
that what you read may fall outside the range of the type in which case
you get implementation defined behavior according to 6.3.1.3 p. 3. So
then why doesn't fgets() get unsigned char* as first argument ? It
would make the life of the user simpler and possibly also the life of
the implementor.
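
For concreteness, here is a minimal sketch of the idiom in question;
the file name and buffer size are illustrative:

#include <stdio.h>

int main(void) {
    FILE *f = fopen("input.dat", "rb");  /* illustrative name */
    if (f == NULL)
        return 1;

    unsigned char buf[4096];
    size_t n = 0;
    int c;  /* int, so EOF stays distinguishable from any data byte */

    while (n < sizeof buf && (c = fgetc(f)) != EOF)
        buf[n++] = (unsigned char)c;  /* fgetc() delivered an unsigned char value */

    fclose(f);
    return 0;
}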

--
Pain makes believers.
Wally Jay

Angel

Mar 10, 2011, 11:49:57 AM
On 2011-03-10, Spiros Bousbouras <spi...@gmail.com> wrote:
> If you are reading from a file by successively calling fgetc() is there
> any point in storing what you read in anything other than unsigned
> char ?

Yes, when you read EOF which is not an unsigned char.

"fgetc() reads the next character from stream and returns
it as an unsigned char cast to an int, or EOF on end of file or
error."
(From the Linux man pages.)


--
The natural state of a spammer's website is a smoking crater.

Spiros Bousbouras

Mar 10, 2011, 12:05:48 PM
On 10 Mar 2011 16:49:57 GMT

Angel <angel...@spamcop.net> wrote:
> On 2011-03-10, Spiros Bousbouras <spi...@gmail.com> wrote:
> > If you are reading from a file by successively calling fgetc() is there
> > any point in storing what you read in anything other than unsigned
> > char ?
>
> Yes, when you read EOF which is not an unsigned char.

In my mind I was making a distinction between storing and temporarily
assigning but I guess it wasn't clear. What I had in mind was something
like:

unsigned char arr[some_size] ;
int a ;

while ( (a = fgetc(f)) != EOF) arr[position++] = a ;

Would there be any reason for arr to be something other than
unsigned char ?

Angel

Mar 10, 2011, 3:36:11 PM

No, but you should use a cast there or your compiler might balk because
unsigned char is likely to have fewer bits than int.

fgetc() returns an int because EOF has to have a value that cannot
normally be read from a file. Once you've determined that the read value
is not EOF, it's safe to store it as an unsigned char.

And in C there is no difference between "storing" and "temporarily
assigning". Every assignment lasts until overwritten.

Paul N

Mar 10, 2011, 5:18:05 PM
On Mar 10, 5:05 pm, Spiros Bousbouras <spi...@gmail.com> wrote:
> On 10 Mar 2011 16:49:57 GMT
>

char is normally used for storing characters, and I think that is what
it was designed for. So it seems a bit odd not to use it. If you're
going to use the str* functions to manipulate what you've read in,
then storing it as char seems sensible, and not doing so is likely to
require some nasty casts.

In my view anyway...

Spiros Bousbouras

Mar 10, 2011, 5:40:40 PM
On Thu, 10 Mar 2011 14:18:05 -0800 (PST)
Paul N <gw7...@aol.com> wrote:
> On Mar 10, 5:05 pm, Spiros Bousbouras <spi...@gmail.com> wrote:
> > On 10 Mar 2011 16:49:57 GMT
> >
> > Angel <angel+n...@spamcop.net> wrote:
> > > On 2011-03-10, Spiros Bousbouras <spi...@gmail.com> wrote:
> > > > If you are reading from a file by successively calling fgetc() is there
> > > > any point in storing what you read in anything other than unsigned
> > > > char ?
> >
> > > Yes, when you read EOF which is not an unsigned char.
> >
> > In my mind I was making a distinction between storing and temporarily
> > assigning but I guess it wasn't clear. What I had in mind was something
> > like:
> >
> > unsigned char arr[some_size] ;
> > int a ;
> >
> > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
> >
> > Would there be any reason for arr to be something other than
> > unsigned char ?
>
> char is normally used for storing characters, and I think that is what
> it was designed for. So it seems a bit odd not to use it.

But if arr[] is char how do you avoid the implementation defined
behavior when doing arr[position++] = a ?

Angel

Mar 10, 2011, 5:49:52 PM

Depends on what exactly you are reading. If it's a normal text file
encoded in ASCII, converting the values read by fgetc() should be safe
because ASCII values are only 7 bits and will fit into a char.

If it's a binary file though, you'll have to use unsigned char, and
you should consider using fread instead.
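
A minimal sketch of that approach (the buffer size is illustrative and
f is assumed to be an open binary stream):

unsigned char buf[4096];
size_t got = fread(buf, 1, sizeof buf, f);
if (got < sizeof buf && ferror(f)) {
    /* read error; feof(f) would indicate plain end-of-file instead */
}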

Spiros Bousbouras

Mar 10, 2011, 5:59:09 PM
On 10 Mar 2011 22:49:52 GMT

And what if it's a non ASCII text file ? It could be ISO-8859-1 or
UTF-8. An extra complication is that you may have to read some of the
file in order to determine what kind of information it contains.

Angel

Mar 10, 2011, 6:16:59 PM

fgetc() is guaranteed to return either an unsigned char or EOF, so that
always works. Interpreting the read data is up to your program and will
depend on what exactly you are trying to accomplish.

UTF-8, as the name implies, is 8 bits wide and will fit in an unsigned
char (it will fit in a signed char too, but values >127 will be
converted to negative values), and so does ISO-8859-1. For character
encodings with more bits, there is fgetwc().

Keith Thompson

Mar 10, 2011, 6:37:38 PM

Typically by ignoring the issue. (Well, this doesn't avoid
the implementation defined behavior; it just assumes it's
ok.) On any system where this is a sensible thing to do, the
implementation-defined behavior is almost certain to be what you
want. Assigning a value exceeding CHAR_MAX to a char (assuming
plain char is signed) *could* give you a strange result, or even
raise an implementation-defined signal, but any implementation that
chose to do such a thing would break a lot of existing code.

C uses plain char (which may be signed) for strings, but it reads
characters from files as unsigned char values. IMHO this is a flaw
in the language. A byte read from a file with a representation
of 10101001 (0xa9) is far more likely to mean 169 than -87 (it's
a copyright symbol in Latin-1, 'z' in EBCDIC).
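
A small sketch of that mismatch; the conversion to signed char is
implementation-defined, but on a typical two's complement system with
8-bit char this prints "169 -87":

#include <stdio.h>

int main(void) {
    unsigned char u = 0xA9;
    signed char s = (signed char)u;  /* implementation-defined: 169 > SCHAR_MAX */
    printf("%d %d\n", u, s);
    return 0;
}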

One solution might be to require plain char to be unsigned, but that
causes inefficient code for some operations -- which was more of
issue in the PDP-11 days than it is now, but it's probably still
significant.

Another might be to have fgetc() return an int representing either
a *plain* char value or EOF, but it's too late to change that.

I'm usually a strong advocate for writing code as portably as possible,
but in this case I suspect that working around the unsigned char vs.
plain char mismatch would be more effort than it's worth.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Eric Sosman

Mar 10, 2011, 8:37:09 PM
On 3/10/2011 11:40 AM, Spiros Bousbouras wrote:
> If you are reading from a file by successively calling fgetc() is there
> any point in storing what you read in anything other than unsigned
> char ?

Sure. To see one reason in action, try

unsigned char uchar_password[SIZE];
...
if (strcmp(uchar_password, "SuperSecret") == 0) ...

> If you try to store it in char or signed char then it's possible
> that what you read may fall outside the range of the type in which case
> you get implementation defined behavior according to 6.3.1.3 p. 3.

Yes. This is, IMHO, a weakness in the library design, a weakness
inherited from the pre-Standard days that also gave us gets(). The
practical consequence is that the implementation must define the
behavior "usefully" in order to make the library work as desired.
(The situation is particularly bad for systems with signed-magnitude
or ones' complement notations, where the sign of zero is obliterated
on conversion to unsigned char and thus cannot be recovered again
after getc().)

> then why doesn't fgets() get unsigned char* as first argument ?

Hysterical raisins, I'd guess.

In-band signaling works well in some situations -- NULL for a
failed malloc() or strchr() or getenv(), for example -- but C has
used it in situations where the benefits are not so clear. getc()
is one of those, strtoxxx() is another, and no doubt there are other
situations where the "error return" can be confused with a perfectly
valid value. Even a failed bsearch() could usefully return something
more helpful than NULL, were there an independent channel to indicate
"I didn't find it."

--
Eric Sosman
eso...@ieee-dot-org.invalid

Francois Grieu

Mar 11, 2011, 5:26:11 AM
On 10/03/2011 18:05, Spiros Bousbouras wrote:
> On 10 Mar 2011 16:49:57 GMT
> Angel <angel...@spamcop.net> wrote:
>> On 2011-03-10, Spiros Bousbouras <spi...@gmail.com> wrote:
>>> If you are reading from a file by successively calling fgetc() is there
>>> any point in storing what you read in anything other than unsigned
>>> char ?
>>
>> Yes, when you read EOF which is not an unsigned char.
>
> In my mind I was making a distinction between storing and temporarily
> assigning but I guess it wasn't clear. What I had in mind was something
> like:
>
> unsigned char arr[some_size] ;
> int a ;
>
> while ( (a = fgetc(f)) != EOF) arr[position++] = a ;

Assuming position is initially 0 and a==EOF not needed, try
position = fread(arr,1,some_size,f);
This will not cause UB if the input is too big, and it has
a fair chance to be slightly faster.

> Would there be any reason for arr to be something other than
> unsigned char ?

Usually no (possible exception: dead slow type conversion).
Whenever fgetc(f) does not return EOF (being passed a valid f),
it returns an unsigned char converted to an int, and converting that
int back to unsigned char causes no data loss.

Francois Grieu

Spiros Bousbouras

Mar 11, 2011, 2:39:15 PM
On Thu, 10 Mar 2011 20:37:09 -0500
Eric Sosman <eso...@ieee-dot-org.invalid> wrote:
> On 3/10/2011 11:40 AM, Spiros Bousbouras wrote:
> > If you are reading from a file by successively calling fgetc() is there
> > any point in storing what you read in anything other than unsigned
> > char ?
>
> Sure. To see one reason in action, try
>
> unsigned char uchar_password[SIZE];
> ...
> if (strcmp(uchar_password, "SuperSecret") == 0) ...

Just to be clear , the only thing that can go wrong with this example
is that strcmp() may try to convert the elements of uchar_password to
char thereby causing the implementation defined behavior. The same
issue could arise with any other str* function. Or is there something
specific about your example that I'm missing ?

> > If you try to store it in char or signed char then it's possible
> > that what you read may fall outside the range of the type in which case
> > you get implementation defined behavior according to 6.3.1.3 p. 3.
>
> Yes. This is, IMHO, a weakness in the library design, a weakness
> inherited from the pre-Standard days that also gave us gets(). The
> practical consequence is that the implementation must define the
> behavior "usefully" in order to make the library work as desired.
> (The situation is particularly bad for systems with signed-magnitude
> or ones' complement notations, where the sign of zero is obliterated
> on conversion to unsigned char and thus cannot be recovered again
> after getc().)

If getc() read int's from files instead of unsigned char's would it be
realistically possible that reading from a file would return a negative
zero ? That would be one strange file.

> > then why doesn't fgets() get unsigned char* as first argument ?
>
> Hysterical raisins, I'd guess.

For those who didn't get it , that's historical reasons.

> In-band signaling works well in some situations -- NULL for a
> failed malloc() or strchr() or getenv(), for example -- but C has
> used it in situations where the benefits are not so clear. getc()
> is one of those, strtoxxx() is another, and no doubt there are other
> situations where the "error return" can be confused with a perfectly
> valid value.

I don't see how this can happen with getc(). The only improvement I
can think of is that you could have two different return values to
denote exceptional situations instead of just EOF , one value would
denote end of file and the other error. But the current interface of
getc() could accommodate this just fine , you would only need to make
the 2 exceptional values negative.

> Even a failed bsearch() could usefully return something
> more helpful than NULL, were there an independent channel to indicate
> "I didn't find it."

--
If strings doesn't work, then there's the "Read Microsoft" tool, rm,
which gives you the useful content of Word files that strings can't
extract and helpfully moves the hideous fonts, ugly typography, macro
viruses, and general bloat that make up the rest of this class of Word
files into the bit bucket for you.
Dave Vandervies

Keith Thompson

Mar 11, 2011, 2:53:57 PM
Spiros Bousbouras <spi...@gmail.com> writes:
> On Thu, 10 Mar 2011 20:37:09 -0500
> Eric Sosman <eso...@ieee-dot-org.invalid> wrote:
>> On 3/10/2011 11:40 AM, Spiros Bousbouras wrote:
>> > If you are reading from a file by successively calling fgetc() is there
>> > any point in storing what you read in anything other than unsigned
>> > char ?
>>
>> Sure. To see one reason in action, try
>>
>> unsigned char uchar_password[SIZE];
>> ...
>> if (strcmp(uchar_password, "SuperSecret") == 0) ...
>
> Just to be clear , the only thing that can go wrong with this example
> is that strcmp() may try to convert the elements of uchar_password to
> char thereby causing the implementation defined behavior. The same
> issue could arise with any other str* function. Or is there something
> specific about your example that I'm missing ?

The call to strcmp() violates a constraint. strcmp() expects const
char* (a non-const char* is also ok), but uchar_password, after
the implicit conversion, is of type unsigned char*. Types char*
and unsigned char* are not compatible, and there is no implicit
conversion from one to the other.

If you use an explicit cast, it will *probably* work as expected,
but without the cast the compiler is permitted to reject it.

[...]

> If getc() read int's from files instead of unsigned char's would it be
> realistically possible that reading from a file would return a negative
> zero ? That would be one strange file.

What would be so strange about it? If a file contains a sequence of
ints, stored as binary, and the implementation has a distinct
representation for negative zero, then the file could certainly contain
negative zeros.

[...]

Spiros Bousbouras

Mar 11, 2011, 3:44:02 PM
On Thu, 10 Mar 2011 15:37:38 -0800
Keith Thompson <ks...@mib.org> wrote:
> Spiros Bousbouras <spi...@gmail.com> writes:
> > On Thu, 10 Mar 2011 14:18:05 -0800 (PST)
> > Paul N <gw7...@aol.com> wrote:
> >> On Mar 10, 5:05 pm, Spiros Bousbouras <spi...@gmail.com> wrote:
> >> >
> >> > unsigned char arr[some_size] ;
> >> > int a ;
> >> >
> >> > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
> >> >
> >> > Would there be any reason for arr to be something other than
> >> > unsigned char ?
> >>
> >> char is normally used for storing characters, and I think that is what
> >> it was designed for. So it seems a bit odd not to use it.
> >
> > But if arr[] is char how do you avoid the implementation defined
> > behavior when doing arr[position++] = a ?
>
> Typically by ignoring the issue. (Well, this doesn't avoid
> the implementation defined behavior; it just assumes it's
> ok.) On any system where this is a sensible thing to do, the
> implementation-defined behavior is almost certain to be what you
> want.

Is there a system which has stdio.h but reading from a file and storing
what you read in an array is not a sensible thing to do ?

[...]

> C uses plain char (which may be signed) for strings, but it reads
> characters from files as unsigned char values. IMHO this is a flaw
> in the language. A byte read from a file with a representation
> of 10101001 (0xa9) is far more likely to mean 169 than -87 (it's
> a copyright symbol in Latin-1, 'z' in EBCDIC).

Which makes me wonder if there are any character encodings in use where
some characters get encoded by negative numbers.

> One solution might be to require plain char to be unsigned, but that
> causes inefficient code for some operations -- which was more of
> issue in the PDP-11 days than it is now, but it's probably still
> significant.
>
> Another might be to have fgetc() return an int representing either
> a *plain* char value or EOF, but it's too late to change that.

The standard could say that if an implementation offers stdio.h then
the following function

int foo(unsigned char a) {
    char b = a ;
    unsigned char c = b ;
    return a == c ;
}

always returns 1. This I think would be sufficient to be able to assign
the return value of fgetc() to char (after checking for EOF) without
worries. But does it leave any existing implementations out ? And while
I'm at it , how do existing implementations handle conversion to a
signed integer type if the value doesn't fit ? Anyone has any unusual
examples ?

Another approach would be to have a macro __WBUC2CA (well behaved
unsigned char to char assignment) which will have the value 1 or 0 and
if it has the value 1 then foo() above will be guaranteed to return 1.
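
A sketch of how code might branch on that hypothetical macro; nothing
defines __WBUC2CA today, and arr, uarr and a are illustrative names:

#if __WBUC2CA
    arr[position++] = a;  /* the char round trip preserves the value */
#else
    uarr[position++] = (unsigned char)a;  /* keep the data in unsigned char */
#endif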

> I'm usually a strong advocate for writing code as portably as possible,
> but in this case I suspect that workaround around the unsigned char vs.
> plain char mismatch would be more effort than it's worth.

--
If Larry Wall had instead written a paper describing Perl, it probably
would have been dismissed as a joke.
Kaz Kylheku

Tim Rentsch

Mar 11, 2011, 4:03:33 PM
Eric Sosman <eso...@ieee-dot-org.invalid> writes:

> On 3/10/2011 11:40 AM, Spiros Bousbouras wrote:
>> If you are reading from a file by successively calling fgetc() is there
>> any point in storing what you read in anything other than unsigned
>> char ?
>
> Sure. To see one reason in action, try
>
> unsigned char uchar_password[SIZE];
> ...
> if (strcmp(uchar_password, "SuperSecret") == 0) ...
>
>> If you try to store it in char or signed char then it's possible
>> that what you read may fall outside the range of the type in which case
>> you get implementation defined behavior according to 6.3.1.3 p. 3.
>
> Yes. This is, IMHO, a weakness in the library design, a weakness
> inherited from the pre-Standard days that also gave us gets(). The
> practical consequence is that the implementation must define the
> behavior "usefully" in order to make the library work as desired.
> (The situation is particularly bad for systems with signed-magnitude
> or ones' complement notations, where the sign of zero is obliterated
> on conversion to unsigned char and thus cannot be recovered again

> after getc().) [snip subsequent paragaphs]

Do you mean to say that if a file has a byte with a bit
pattern corresponding to a 'char' negative-zero, and
that byte is read (in binary mode) with getc(), the
result of getc() will be zero? If that's what you're
saying I believe that is wrong.

Tim Rentsch

Mar 11, 2011, 4:08:00 PM
Spiros Bousbouras <spi...@gmail.com> writes:

> If getc() read int's from files instead of unsigned char's would it be
> realistically possible that reading from a file would return a negative
> zero ?

A call to getc() cannot return negative zero. The reason is,
getc() is defined in terms of fgetc(), which returns an
'unsigned char' converted to an 'int', and such conversions
cannot produce negative zeros.

Tim Rentsch

Mar 11, 2011, 4:10:51 PM
Keith Thompson <ks...@mib.org> writes:

>> If getc() read int's from files instead of unsigned char's would it be
>> realistically possible that reading from a file would return a negative
>> zero ? That would be one strange file.
>
> What would be so strange about it? If a file contains a sequence of
> ints, stored as binary, and the implementation has a distinct
> representation for negative zero, then the file could certainly contain
> negative zeros.

I think the question he was asking is something different, which
is, "can the int values produced by getc() ever be (int) negative
zeros?", to which the answer is they cannot.

Tim Rentsch

Mar 11, 2011, 4:25:47 PM
Spiros Bousbouras <spi...@gmail.com> writes:

Assuming: the bits are in the same places for the implementation that
wrote the file and the implementation reading the file; and CHAR_BIT
is also the same; and UCHAR_MAX < INT_MAX; then you could do this:

arr[position++] = a <= CHAR_MAX ? a : a - (UCHAR_MAX+1);

which works for all values that the target machine supports.
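
Placed in the earlier read loop, under those same assumptions (f open
for reading, SOME_SIZE illustrative), that might look like:

#include <limits.h>

char arr[SOME_SIZE];
size_t position = 0;
int a;

while (position < SOME_SIZE && (a = fgetc(f)) != EOF)
    arr[position++] = a <= CHAR_MAX ? a : a - (UCHAR_MAX + 1);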

Tim Rentsch

Mar 11, 2011, 4:29:38 PM
Angel <angel...@spamcop.net> writes:

> [snip]


>
> UTF-8, as the name implies, is 8 bits wide and will fit in an unsigned
> char (it will fit in a signed char too,

It will on most implementations but the Standard does not
require that.

> but values >127 will be converted to negative values),

Again true on most implementations but not Standard-guaranteed.

Spiros Bousbouras

Mar 11, 2011, 4:55:54 PM
On Fri, 11 Mar 2011 11:53:57 -0800
Keith Thompson <ks...@mib.org> wrote:
> Spiros Bousbouras <spi...@gmail.com> writes:
> > On Thu, 10 Mar 2011 20:37:09 -0500
> > Eric Sosman <eso...@ieee-dot-org.invalid> wrote:
> >> On 3/10/2011 11:40 AM, Spiros Bousbouras wrote:
> >> > If you are reading from a file by successively calling fgetc() is there
> >> > any point in storing what you read in anything other than unsigned
> >> > char ?
> >>
> >> Sure. To see one reason in action, try
> >>
> >> unsigned char uchar_password[SIZE];
> >> ...
> >> if (strcmp(uchar_password, "SuperSecret") == 0) ...
> >
> > Just to be clear , the only thing that can go wrong with this example
> > is that strcmp() may try to convert the elements of uchar_password to
> > char thereby causing the implementation defined behavior. The same
> > issue could arise with any other str* function. Or is there something
> > specific about your example that I'm missing ?
>
> The call to strcmp() violates a constraint. strcmp() expects const
> char* (a non-const char* is also ok), but uchar_password, after
> the implicit conversion is of type unsigned char*. Types char*
> and unsigned char* are not compatible, and there is no implicit
> conversion from one to the other.

I see. I assumed that the implicit conversion would be ok because
paragraph 27 of 6.2.5 says "A pointer to void shall have the same
representation and alignment requirements as a pointer to a character
type.39)" and footnote 39 says "The same representation and alignment
requirements are meant to imply interchangeability as arguments to
functions, return values from functions, and members of unions." I
assumed that the relation "same representation and alignment
requirements" is transitive.

On the other hand footnote 35 of paragraph 15 says that char is not
compatible with signed or unsigned char and in 6.7.5.1 we read that
pointers to types are compatible only if the types are compatible. We
must conclude then that the relation "same representation and alignment
requirements" is not transitive. That's a damn poor choice of
terminology then.

> If you use an explicit cast, it will *probably* work as expected,
> but without the cast the compiler is permitted to reject it.

> > If getc() read int's from files instead of unsigned char's would it be


> > realistically possible that reading from a file would return a negative
> > zero ? That would be one strange file.
>
> What would be so strange about it? If a file contains a sequence of
> ints, stored as binary, and the implementation has a distinct
> representation for negative zero, then the file could certainly contain
> negative zeros.

Ok , I guess it could happen. But then I have a different objection. Eric said

(The situation is particularly bad for systems with
signed-magnitude or ones' complement notations, where the
sign of zero is obliterated on conversion to unsigned char
and thus cannot be recovered again after getc().)

It seems to me that an implementation can easily ensure that the sign
of zero does not get obliterated. If by using fgetc() an unsigned char
gets the bit pattern which corresponds to negative zero then the
implementation can assign the negative zero when converting to int .
The standard allows this.

--
Metadiscussion is evil !

Spiros Bousbouras

Mar 11, 2011, 5:09:26 PM

When I said "getc() read int's from files" I meant that also fgetc()
reads int's from files i.e. we're talking about an alternative C where
we don't have the intermediate unsigned char step.

Apart from that , in post

<kfnsjut...@x-alumni2.alumni.caltech.edu>
http://groups.google.com/group/comp.lang.c/msg/1909c5fe30c02e81?dmode=source

you say

Do you mean to say that if a file has a byte with a bit
pattern corresponding to a 'char' negative-zero, and
that byte is read (in binary mode) with getc(), the
result of getc() will be zero? If that's what you're
saying I believe that is wrong.

Assuming actual C (i.e. not the alternative C from above) is it not
possible in the scenario you're describing that int will get negative
zero ?

Spiros Bousbouras

Mar 11, 2011, 5:34:33 PM

A better name would be __WBUC2CC for well behaved unsigned char to char
conversion.

lawrenc...@siemens.com

Mar 11, 2011, 4:57:37 PM
Tim Rentsch <t...@alumni.caltech.edu> wrote:
>
> A call to getc() cannot return negative zero. The reason is,
> getc() is defined in terms of fgetc(), which returns an
> 'unsigned char' converted to an 'int', and such conversions
> cannot produce negative zeros.

They can if char and int are the same size.
--
Larry Jones

I always send Grandma a thank-you note right away. ...Ever since she
sent me that empty box with the sarcastic note saying she was just
checking to see if the Postal Service was still working. -- Calvin

Spiros Bousbouras

Mar 11, 2011, 6:32:08 PM
On 10 Mar 2011 20:36:11 GMT

Angel <angel...@spamcop.net> wrote:
> On 2011-03-10, Spiros Bousbouras <spi...@gmail.com> wrote:
> > assigning but I guess it wasn't clear. What I had in mind was something
> > like:
> >
> > unsigned char arr[some_size] ;
> > int a ;
> >
> > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
> >
> > Would there be any reason for arr to be something other than
> > unsigned char ?
>
> No, but you should use a cast there or your compiler might balk because
> unsigned char is likely to have fewer bits than int.

A cast wouldn't buy you anything in this case because according to
paragraph 2 of 6.5.16.1 a conversion will happen anyway.

Joe Wright

Mar 11, 2011, 6:35:10 PM
Pardon me for jumping in so late. I got interested when someone earlier
thought to store the EOF character. Of course the EOF is a status and need
not be stored.

The return type of fgetc() is int so as to allow the full 0..255 value of a
byte AND a value EOF. When you assign int to char, the char takes the lower
eight bits of the int without change. Try this:

#include <stdio.h>
int main(void) {
    char c;
    unsigned char u;
    int i = 240;
    c = i;
    u = c;
    printf("%d, %d, %d\n", i, c, u);
    return 0;
}

I get: 240, -16, 240 as I expected.

The value of fgetc() being int and being assigned to char is not a problem
and not a 'defect' of the language.

--
Joe Wright
"If you rob Peter to pay Paul you can depend on the support of Paul."

Joe Wright

Mar 11, 2011, 6:59:14 PM

I must be missing your point. What does UTF-8 have to do with the Standard?

Spiros Bousbouras

Mar 11, 2011, 7:01:53 PM
On Fri, 11 Mar 2011 18:35:10 -0500
Joe Wright <joeww...@comcast.net> wrote:

> Pardon me for jumping in so late. I got interested when someone earlier
> thought to store the EOF character. Of course the EOF is a status and need
> not be stored.

I don't recall anyone in the thread saying that.

> The return type of fgetc() is int so as to allow the full 0..255 value of a
> byte AND a value EOF.

A byte in C can have values greater than 255 depending on the
implementation.

> When you assign int to char, the char takes the lower
> eight bits of the int without change.

Where do you get this from ? In the OP I mentioned paragraph 3 of
6.3.1.3. Here's what it says:

Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined
or an implementation-defined signal is raised.

And you do realise that a char is permitted to have more than 8 bits ,
yes ?

> Try this:
>
> #include <stdio.h>
> int main(void) {
> char c;
> unsigned char u;
> int i = 240;
> c = i;
> u = c;
> printf("%d, %d, %d\n", i, c, u);
> return 0;
> }
>
> I get: 240, -16, 240 as I expected.

That is one data point among the hundreds or thousands of C
implementations. Even if a char always had 8 bits and even if the
assignment int to char was guaranteed to copy the lower 8 bits , the
middle number could still be -112 if the implementation uses "sign and
magnitude" to represent negative numbers: 240 is 11110000 in binary,
which read as sign and magnitude is a set sign bit plus a magnitude
of 1110000, i.e. -112.

> The value of fgetc() being int and being assigned to char is not a problem
> and not a 'defect' of the language.

If only.

--
A recent statistic has showed the every 10 minutes
someone somewhere is insulting Seamus MacRae.

Keith Thompson

Mar 11, 2011, 8:47:25 PM
Spiros Bousbouras <spi...@gmail.com> writes:
> On Fri, 11 Mar 2011 13:08:00 -0800
> Tim Rentsch <t...@alumni.caltech.edu> wrote:
>> Spiros Bousbouras <spi...@gmail.com> writes:
>>
>> > If getc() read int's from files instead of unsigned char's would it be
>> > realistically possible that reading from a file would return a negative
>> > zero ?
>>
>> A call to getc() cannot return negative zero. The reason is,
>> getc() is defined in terms of fgetc(), which returns an
>> 'unsigned char' converted to an 'int', and such conversions
>> cannot produce negative zeros.
>
> When I said "getc() read int's from files" I meant that also fgetc()
> reads int's from files i.e. we're talking about an alternative C where
> we don't have the intermediate unsigned char step.

I'm afraid I'm not following you here.

I initially assumed you meant getc and fgetc would be reading
int-sized chunks from the file, rather than (as C currently
specifies) reading bytes, interpreting them as unsigned char,
and converting that to int.

Without the intermediate step, how is the int value determined?

Perhaps you mean getc and fgetc read a byte from the file, interpret
it as *plain* char, and then convert the result to int.

If so, and if plain char is signed and has a distinct representation
for negative zero (this excludes 2's-complement systems), then
could getc() return a negative zero?

I'd say no. Converting a negative zero from char to int does not
yield a negative zero int; 6.2.6.2p3 specifies the operations that
might generate a negative zero, and conversions aren't in the list.

Which means that getc() and fgetc() would be unable to distinguish
between a positive and negative zero in a byte read from a file.
Which is probably part of the reason why the standard specifies
that the value is treated as an unsigned char.

Or the standard could have said specifically that getc and fgetc do
return a negative zero in these cases, but dealing with that in code
would be nasty (and, since most current systems don't have negative
zeros, most programmers wouldn't bother).

(As I've said before, requiring plain char to be unsigned would
avoid a lot of this confusion, but might have other bad effects.)

Keith Thompson

Mar 11, 2011, 9:12:51 PM
Joe Wright <joeww...@comcast.net> writes:
> On 3/11/2011 16:29, Tim Rentsch wrote:
>> Angel<angel...@spamcop.net> writes:
>>
>>> [snip]
>>>
>>> UTF-8, as the name implies, is 8 bits wide and will fit in an unsigned
>>> char (it will fit in a signed char too,
>>
>> It will on most implementations but the Standard does not
>> require that.
>>
>>> but values>127 will be converted to negative values),
>>
>> Again true on most implementations but not Standard-guaranteed.
>
> I must be missing your point. What does UTF-8 have to do with the Standard?

Somebody upthread suggested that the plain char vs. unsigned char
mismatch isn't a problem, because ASCII characters are all in the
range 0-127. UTF-8 is one example of a character encoding where
bytes in a text file can have values exceeding 127. (Latin-1 and
EBCDIC are other examples.)

Eric Sosman

Mar 11, 2011, 9:42:59 PM
On 3/11/2011 2:39 PM, Spiros Bousbouras wrote:
> On Thu, 10 Mar 2011 20:37:09 -0500
> Eric Sosman<eso...@ieee-dot-org.invalid> wrote:
>> On 3/10/2011 11:40 AM, Spiros Bousbouras wrote:
>>> If you are reading from a file by successively calling fgetc() is there
>>> any point in storing what you read in anything other than unsigned
>>> char ?
>>
>> Sure. To see one reason in action, try
>>
>> unsigned char uchar_password[SIZE];
>> ...
>> if (strcmp(uchar_password, "SuperSecret") == 0) ...
>
> Just to be clear , the only thing that can go wrong with this example
> is that strcmp() may try to convert the elements of uchar_password to
> char thereby causing the implementation defined behavior.

True: After issuing the required diagnostic, the implementation
may accept the faulty translation unit anyhow, and may assign it any
meaning it's inclined to, and that meaning may be implementation-
defined.

Alternatively, the implementation may issue the diagnostic and
spit the sorry source back in your face.

> The same
> issue could arise with any other str* function. Or is there something
> specific about your example that I'm missing ?

The required diagnostic, I think. 6.5.2.2p2, plus 6.3.2.3's
omission of any description of the necessary conversion.

>> Yes. This is, IMHO, a weakness in the library design, a weakness
>> inherited from the pre-Standard days that also gave us gets(). The
>> practical consequence is that the implementation must define the
>> behavior "usefully" in order to make the library work as desired.
>> (The situation is particularly bad for systems with signed-magnitude
>> or ones' complement notations, where the sign of zero is obliterated
>> on conversion to unsigned char and thus cannot be recovered again
>> after getc().)
>
> If getc() read int's from files instead of unsigned char's would it be
> realistically possible that reading from a file would return a negative
> zero ? That would be one strange file.

One strange text file, yes. But not so strange for a binary
file, where any bit pattern at all might appear. If a char that looks
like minus zero appears somewhere in the middle of a double, and you
fwrite() that double to a binary stream, the underlying fputc() calls
(a direct requirement; not even an "as if") convert each byte in turn
from unsigned char to int. I think the conversion allows the bits to
be diddled irreversibly -- although on reconsideration it may happen
only when sizeof(int)==1 as well.

>> In-band signaling works well in some situations -- NULL for a
>> failed malloc() or strchr() or getenv(), for example -- but C has
>> used it in situations where the benefits are not so clear. getc()
>> is one of those, strtoxxx() is another, and no doubt there are other
>> situations where the "error return" can be confused with a perfectly
>> valid value.
>
> I don't see how this can happen with getc().

When sizeof(int)==1, there will exist a perfectly valid unsigned
char value whose conversion to int yields EOF. (Or else there will
exist two or more distinct unsigned char values that convert to the
same int value, which is even worse and violates 7.19.2p3.) So
checking the value of getc() against EOF isn't quite enough: Having
found EOF, you also need to call feof() and ferror() before concluding
that it's "condition" rather than "data." More information is being
forced through the return-value channel than the unaided channel
can accommodate.
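
A sketch of the fully defensive loop this implies (only strictly
necessary where sizeof(int) == 1; process() is an illustrative
stand-in for whatever consumes the data):

for (;;) {
    int c = getc(f);
    if (c == EOF && (feof(f) || ferror(f)))
        break;  /* genuine end-of-file or error */
    process((unsigned char)c);  /* data, even if c compares equal to EOF */
}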

--
Eric Sosman
eso...@ieee-dot-org.invalid

Eric Sosman

Mar 11, 2011, 9:53:43 PM
On 3/11/2011 4:55 PM, Spiros Bousbouras wrote:
> [...]

> Ok , I guess it could happen. But then I have a different objection. Eric said
>
> (The situation is particularly bad for systems with
> signed-magnitude or ones' complement notations, where the
> sign of zero is obliterated on conversion to unsigned char
> and thus cannot be recovered again after getc().)
>
> It seems to me that an implementation can easily ensure that the sign
> of zero does not get obliterated. If by using fgetc() an unsigned char
> gets the bit pattern which corresponds to negative zero then the
> implementation can assign the negative zero when converting to int .
> The standard allows this.

Could you indicate where? I'm looking at 6.2.6.2p3, which lists
the operations that can generate a minus zero, and does not list
"conversion" among them.

--
Eric Sosman
eso...@ieee-dot-org.invalid

J. J. Farrell

Mar 11, 2011, 9:58:00 PM

No, a cast would buy you freedom from a warning with some compilers.

Eric Sosman

Mar 11, 2011, 9:59:21 PM
On 3/11/2011 4:57 PM, lawrenc...@siemens.com wrote:
> Tim Rentsch<t...@alumni.caltech.edu> wrote:
>>
>> A call to getc() cannot return negative zero. The reason is,
>> getc() is defined in terms of fgetc(), which returns an
>> 'unsigned char' converted to an 'int', and such conversions
>> cannot produce negative zeros.
>
> They can if char and int are the same size.

Despite 6.2.6.2p3? In ISO/IEC 9899:TC3 (perhaps the wording
has changed in more recent versions), "conversion" is not listed
among the operations that can generate a negative zero. Even if
a negative zero arises, this paragraph says it's unspecified whether
storing it in an object stores a negative or a "normal" zero.

--
Eric Sosman
eso...@ieee-dot-org.invalid

J. J. Farrell

Mar 11, 2011, 10:13:45 PM
Joe Wright wrote:
> ...

>
> The return type of fgetc() is int so as to allow the full 0..255 value
> of a byte AND a value EOF.

... assuming the value range of a byte is limited to 0..255 which it
need not be. In particular, a byte can be the same size as an int.

> When you assign int to char, the char takes
> the lower eight bits of the int without change.

No, no, no, no, no. ISO/IEC 9899:1999 6.3.1.3 Signed and unsigned integers:

"When a value with integer type is converted to another integer type
other than _Bool, if the value can be represented by the new type, it is
unchanged.

Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.

Otherwise, the new type is signed and the value cannot be represented in
it; either the result is implementation-defined or an
implementation-defined signal is raised."
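
For example, assuming UCHAR_MAX is 255, the unsigned rule gives:

unsigned char x = -1;   /* -1 + 256 == 255 */
unsigned char y = 300;  /* 300 - 256 == 44 */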

> Try this:


>
> #include <stdio.h>
> int main(void) {
>     char c;
>     unsigned char u;
>     int i = 240;
>     c = i;
>     u = c;
>     printf("%d, %d, %d\n", i, c, u);
>     return 0;
> }
>
> I get: 240, -16, 240 as I expected.

You had no right to expect it. It's a common implementation, but far
from guaranteed.

Keith Thompson

Mar 11, 2011, 11:14:54 PM

6.3p2:

Conversion of an operand value to a compatible type causes no
change to the value or the representation.

Looks like a mild inconsistency.

pete

Mar 12, 2011, 12:14:20 AM
Keith Thompson wrote:
>
> Eric Sosman <eso...@ieee-dot-org.invalid> writes:
> > On 3/11/2011 4:57 PM, lawrenc...@siemens.com wrote:
> >> Tim Rentsch<t...@alumni.caltech.edu> wrote:
> >>>
> >>> A call to getc() cannot return negative zero. The reason is,
> >>> getc() is defined in terms of fgetc(), which returns an
> >>> 'unsigned char' converted to an 'int', and such conversions
> >>> cannot produce negative zeros.
> >>
> >> They can if char and int are the same size.
> >
> > Despite 6.2.6.2p3? In ISO/IEC 9899:TC3 (perhaps the wording
> > has changed in more recent versions), "conversion" is not listed
> > among the operations that can generate a negative zero. Even if
> > a negative zero arises, this paragraph says it's unspecified whether
> > storing it in an object stores a negative or a "normal" zero.
>
> 6.3p2:
>
> Conversion of an operand value to a compatible type causes no
> change to the value or the representation.
>
> Looks like a mild inconsistency.

I don't see the relevance of that quote,
because it is about compatible type conversion,
and I don't see anything about compatible types
in the above quoted post.

--
pete

Tim Rentsch

Mar 12, 2011, 4:01:19 AM
lawrenc...@siemens.com writes:

> Tim Rentsch <t...@alumni.caltech.edu> wrote:
>>
>> A call to getc() cannot return negative zero. The reason is,
>> getc() is defined in terms of fgetc(), which returns an
>> 'unsigned char' converted to an 'int', and such conversions
>> cannot produce negative zeros.
>
> They can if char and int are the same size.

Yes, implementations that have sizeof(int) == 1 and that
use signed magnitude or one's complement are an exception
to what I said, and I should have mentioned that.

Do any such implementations actually exist? Certainly I'm
not aware of any.

Tim Rentsch

Mar 12, 2011, 4:06:19 AM
Joe Wright <joeww...@comcast.net> writes:

> On 3/11/2011 16:29, Tim Rentsch wrote:
>> Angel<angel...@spamcop.net> writes:
>>
>>> [snip]
>>>
>>> UTF-8, as the name implies, is 8 bits wide and will fit in an unsigned
>>> char (it will fit in a signed char too,
>>
>> It will on most implementations but the Standard does not
>> require that.
>>
>>> but values>127 will be converted to negative values),
>>
>> Again true on most implementations but not Standard-guaranteed.
>
> I must be missing your point. What does UTF-8 have to do with the Standard?

My comment was not about UTF-8 but about 8-bit values (ie
256 distinct non-negative values); these don't necessarily
fit in a 'signed char', etc.

Tim Rentsch

Mar 12, 2011, 4:33:14 AM
Eric Sosman <eso...@ieee-dot-org.invalid> writes:

> On 3/11/2011 4:57 PM, lawrenc...@siemens.com wrote:
>> Tim Rentsch<t...@alumni.caltech.edu> wrote:
>>>
>>> A call to getc() cannot return negative zero. The reason is,
>>> getc() is defined in terms of fgetc(), which returns an
>>> 'unsigned char' converted to an 'int', and such conversions
>>> cannot produce negative zeros.
>>
>> They can if char and int are the same size.
>
> Despite 6.2.6.2p3?

Yes.

> In ISO/IEC 9899:TC3 (perhaps the wording
> has changed in more recent versions),

Still the same as of N1547.

> "conversion" is not listed
> among the operations that can generate a negative zero.

Conversion of an in-range value cannot. Conversion of an
out-of-range value gives an implementation-defined result,
which may be defined to be (not to generate) negative zero
for certain values. Subtle distinction, I admit, but I
believe that is how the committee expects these statements
will be read.

> Even if
> a negative zero arises, this paragraph says it's unspecified whether
> storing it in an object stores a negative or a "normal" zero.

Because the behavior is unspecified, an implementation may
define it to store the negative zero faithfully. 'Unspecified'
doesn't mean outside of the implementation's control, it
just means the implementation isn't obliged to document
what choices it makes.

Tim Rentsch

Mar 12, 2011, 4:49:30 AM
Spiros Bousbouras <spi...@gmail.com> writes:

> On Fri, 11 Mar 2011 13:08:00 -0800
> Tim Rentsch <t...@alumni.caltech.edu> wrote:
>> Spiros Bousbouras <spi...@gmail.com> writes:
>>
>> > If getc() read int's from files instead of unsigned char's would it be
>> > realistically possible that reading from a file would return a negative
>> > zero ?
>>
>> A call to getc() cannot return negative zero. The reason is,
>> getc() is defined in terms of fgetc(), which returns an
>> 'unsigned char' converted to an 'int', and such conversions
>> cannot produce negative zeros.
>
> When I said "getc() read int's from files" I meant that also fgetc()
> reads int's from files i.e. we're talking about an alternative C where
> we don't have the intermediate unsigned char step.

Ahh, I didn't understand that. I don't know what would
happen in alternative C; I don't have any kind of reference
manual or standards document for that language.

> Apart from that , in post
>
> <kfnsjut...@x-alumni2.alumni.caltech.edu>
> http://groups.google.com/group/comp.lang.c/msg/1909c5fe30c02e81?dmode=source
>
> you say
>
> Do you mean to say that if a file has a byte with a bit
> pattern corresponding to a 'char' negative-zero, and
> that byte is read (in binary mode) with getc(), the
> result of getc() will be zero? If that's what you're
> saying I believe that is wrong.
>
> Assuming actual C (i.e. not the alternative C from above) is it not
> possible in the scenario you're describing that int will get negative
> zero ?

As Larry Jones reminded me, it is possible for this to
happen in implementations that have sizeof(int) == 1 (and
that use representations with negative zeros in them). I'm
not aware of any such implementations but the Standard does
allow them. Other than that, it isn't.

Keith Thompson

Mar 12, 2011, 3:38:23 PM

Ah, you're right.

I think it's still a mild inconsistency, but it doesn't apply to
the situation we're discussing. For example, suppose ptrdiff_t is
compatible with long. Then converting (say, via an explicit cast)
a negative zero of type long to ptrdiff_t would yield a negative
zero of type ptrdiff_t -- but a cast is not one of the operations
that can yield a negative zero.

This is probably one of the most obscure corner cases I've ever
run across.

lawrenc...@siemens.com

Mar 12, 2011, 11:27:48 PM
Keith Thompson <ks...@mib.org> wrote:
>
> For example, suppose ptrdiff_t is
> compatible with long. Then converting (say, via an explicit cast)
> a negative zero of type long to ptrdiff_t would yield a negative
> zero of type ptrdiff_t -- but a cast is not one of the operations
> that can yield a negative zero.

The standard lists the operations that can *generate* a negative zero.
One could argue that operations like cast and assignment simply preserve
an existing negative zero rather than generating a new one.
--
Larry Jones

Oh yeah? You just wait! -- Calvin

Eric Sosman

Mar 13, 2011, 8:45:16 AM

Even if a negative zero arises without being "generated," the
Standard does not assure us that negativity is preserved. 6.2.6.2p3:

"It is unspecified [...] whether a negative zero becomes
a normal zero when stored in an object."

--
Eric Sosman
eso...@ieee-dot-org.invalid

Phil Carmody

Mar 13, 2011, 9:30:37 AM

That prevents ``signed char s = -0;'' from making s a negative zero?
Was that really intended?

Phil
--
I find the easiest thing to do is to k/f myself and just troll away
-- David Melville on r.a.s.f1

pete

Mar 13, 2011, 9:50:20 AM
Phil Carmody wrote:

> That prevents ``signed char s = -0;'' from making s a negative zero?
> Was that really intended?

(-0) cannot be a negative zero.

A bitwise operation has to be involved somewhere
in the generation of a negative zero.
That seems to be the intention of the standard to me.

--
pete

Eric Sosman

Mar 13, 2011, 10:16:56 AM
On 3/13/2011 9:30 AM, Phil Carmody wrote:
> Eric Sosman<eso...@ieee-dot-org.invalid> writes:
>> On 3/11/2011 4:55 PM, Spiros Bousbouras wrote:
>>> [...]
>>> Ok , I guess it could happen. But then I have a different objection. Eric said
>>>
>>> (The situation is particularly bad for systems with
>>> signed-magnitude or ones' complement notations, where the
>>> sign of zero is obliterated on conversion to unsigned char
>>> and thus cannot be recovered again after getc().)
>>>
>>> It seems to me that an implementation can easily ensure that the sign
>>> of zero does not get obliterated. If by using fgetc() an unsigned char
>>> gets the bit pattern which corresponds to negative zero then the
>>> implementation can assign the negative zero when converting to int .
>>> The standard allows this.
>>
>> Could you indicate where? I'm looking at 6.2.6.2p3, which lists
>> the operations that can generate a minus zero, and does not list
>> "conversion" among them.
>
> That prevents ``signed char s = -0;'' from making s a negative zero?

Yes.

> Was that really intended?

It certainly *looks* intentional, but I wasn't one of the
authors and can't speak for them.

Note that if you acquire a negative zero somehow, then

int minus_zero = ...whatever...;
int normal_zero = - minus_zero;

... might yield either zero, not necessarily a normal zero.
(That's my reading, anyhow.)

--
Eric Sosman
eso...@ieee-dot-org.invalid

Joe Wright

Mar 13, 2011, 10:27:02 AM
I got into this computer stuff in 1963 at Philco. The other major players
of the time were IBM and Univac. None of us used ones-complement. I learned
it in school but I've never seen it in production. Minus zero is a concept
but not an actuality in my experience.

Are there any current instances of signed-magnitude or ones-complement
systems we might encounter in the 'real world'?

Eric Sosman

Mar 13, 2011, 10:42:58 AM
On 3/13/2011 10:27 AM, Joe Wright wrote:
> [...]

> I got into this computer stuff in 1963 at Philco. The other major
> players of the time were IBM and Univac. None of us used
> ones-complement. I learned it in school but I've never seen it in
> production. Minus zero is a concept but not an actuality in my experience.
>
> Are there any current instances of signed-magnitude or ones-complement
> systems we might encounter in the 'real world'?

Marginal topicality ...

Dunno about "current," but I've seen both conventions on old
systems. The first machine I wrote programs for was an IBM 1620
with signed magnitude decimal arithmetic, and a little later I had
a very brief encounter with a ones' complement Univac system. No
C implementation on either system, though.

If you're interested in finding current examples, I think the
places to look would be among the embedded and special-purpose
processors. The really small ones (hand-held calculators and so
on) probably can't support C, but there might be C-programmable
"exotic" CPU's operating traffic lights and gathering telemetry
and controlling your car's fuel injectors. And burning your toast,
of course.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Tim Rentsch

Mar 13, 2011, 11:23:17 AM
Phil Carmody <thefatphi...@yahoo.co.uk> writes:

> Eric Sosman <eso...@ieee-dot-org.invalid> writes:
>> On 3/11/2011 4:55 PM, Spiros Bousbouras wrote:
>> > [...]
>> > Ok , I guess it could happen. But then I have a different objection. Eric said
>> >
>> > (The situation is particularly bad for systems with
>> > signed-magnitude or ones' complement notations, where the
>> > sign of zero is obliterated on conversion to unsigned char
>> > and thus cannot be recovered again after getc().)
>> >
>> > It seems to me that an implementation can easily ensure that the sign
>> > of zero does not get obliterated. If by using fgetc() an unsigned char
>> > gets the bit pattern which corresponds to negative zero then the
>> > implementation can assign the negative zero when converting to int .
>> > The standard allows this.
>>
>> Could you indicate where? I'm looking at 6.2.6.2p3, which lists
>> the operations that can generate a minus zero, and does not list
>> "conversion" among them.
>
> That prevents ``signed char s = -0;'' from making s a negative zero?

Yes. Surprising but true.

> Was that really intended?

Apparently it was.

Tim Rentsch

Mar 13, 2011, 11:25:41 AM
pete <pfi...@mindspring.com> writes:

Or some kinds of conversions, although one might argue
those could be thought of as bitwise operations also.

Tim Rentsch

Mar 13, 2011, 11:28:13 AM
Eric Sosman <eso...@ieee-dot-org.invalid> writes:

The Standard doesn't, but an implementation can. We are
after all talking about implementation-specific behavior
here.

Joe Wright

Mar 13, 2011, 11:40:55 AM
I'm not really looking for them. I'm curious why the designer of an ALU
would choose other than twos-complement.

Keith Thompson

Mar 13, 2011, 2:21:51 PM

And I think it makes a certain amount of sense. It means that this:

int i = /* something */;
int j = -i;

won't store a negative zero in j, even if the value of i is 0.

Of course, one could argue about whether that's desirable. In any case,
having *some* rules that let you avoid spurious negative zeros in
general calculations seems like a good idea.

(Using two's-complement seems like an even better idea.)

Barry Schwarz

Mar 13, 2011, 3:13:43 PM
On Sun, 13 Mar 2011 10:42:58 -0400, Eric Sosman
<eso...@ieee-dot-org.invalid> wrote:

> Dunno about "current," but I've seen both conventions on old
>systems. The first machine I wrote programs for was an IBM 1620
>with signed magnitude decimal arithmetic, and a little later I had

Still the finest machine ever invented for introducing students to the
fundamentals of computer programming.

--
Remove del for email

Spiros Bousbouras

Mar 16, 2011, 3:24:36 PM
On Fri, 11 Mar 2011 21:42:59 -0500
Eric Sosman <eso...@ieee-dot-org.invalid> wrote:
> On 3/11/2011 2:39 PM, Spiros Bousbouras wrote:
> >
> > If getc() read int's from files instead of unsigned char's would it be
> > realistically possible that reading from a file would return a negative
> > zero ? That would be one strange file.
>
> One strange text file, yes. But not so strange for a binary
> file, where any bit pattern at all might appear. If a char that looks
> like minus zero appears somewhere in the middle of a double, and you
> fwrite() that double to a binary stream, the underlying fputc() calls
> (a direct requirement; not even an "as if") convert each byte in turn
> from unsigned char to int. I think the conversion allows the bits to
> be diddled irreversibly -- although on reconsideration it may happen
> only when sizeof(int)==1 as well.

I see now that a negative zero in a file is a realistic possibility.
But if the bits were diddled in a way that would turn a negative zero
to a regular zero wouldn't that violate 7.19.2p3 ? I guess it depends
on what the "shall compare equal" part exactly means. IMO it should
mean equal as bit patterns in which case no bit diddling is allowed.

Another thing : I don't see why the "as if" rule wouldn't apply
to the specification of fwrite() just as much as it applies to
everything else. Why would the standard force the implementations to
actually repeatedly call fputc() ? Perhaps there is some operating
system specific way to achieve the same result faster without calling
fputc() .Why would the standard forbid that ?

> >> In-band signaling works well in some situations -- NULL for a
> >> failed malloc() or strchr() or getenv(), for example -- but C has
> >> used it in situations where the benefits are not so clear. getc()
> >> is one of those, strtoxxx() is another, and no doubt there are other
> >> situations where the "error return" can be confused with a perfectly
> >> valid value.
> >
> > I don't see how this can happen with getc().
>
> When sizeof(int)==1, there will exist a perfectly valid unsigned
> char value whose conversion to int yields EOF. (Or else there will
> exist two or more distinct unsigned char values that convert to the
> same int value, which is even worse and violates 7.19.2p3.) So
> checking the value of getc() against EOF isn't quite enough: Having
> found EOF, you also need to call feof() and ferror() before concluding
> that it's "condition" rather than "data." More information is being
> forced through the return-value channel than the unaided channel
> can accommodate.

So then this means that the common idiom
int a;
...
while ( (a = fgetc(f)) != EOF ) ...

is actually wrong ! (Unless someone has already checked that
sizeof(int) != 1 but I don't imagine you'll see a lot of code which
does that.) Yikes , I can't believe I've been doing it wrong all
those years. Has anyone seen a book which mentions the issue ?

--
Metadiscussion is evil !

Spiros Bousbouras

Mar 16, 2011, 3:37:45 PM

That's my reading too but the problem is what happens when you read
from a file ? I think the standard would be more clear if it said that
reading from a binary stream can also generate a negative zero.

Spiros Bousbouras

unread,
Mar 16, 2011, 3:43:37 PM3/16/11
to
On Fri, 11 Mar 2011 21:55:54 GMT
Spiros Bousbouras <spi...@gmail.com> wrote:

> On Fri, 11 Mar 2011 11:53:57 -0800
> Keith Thompson <ks...@mib.org> wrote:
> > Spiros Bousbouras <spi...@gmail.com> writes:
> > > On Thu, 10 Mar 2011 20:37:09 -0500
> > > Eric Sosman <eso...@ieee-dot-org.invalid> wrote:
> > >> On 3/10/2011 11:40 AM, Spiros Bousbouras wrote:
> > >> > If you are reading from a file by successively calling fgetc() is there
> > >> > any point in storing what you read in anything other than unsigned
> > >> > char ?
> > >>
> > >> Sure. To see one reason in action, try
> > >>
> > >> unsigned char uchar_password[SIZE];
> > >> ...
> > >> if (strcmp(uchar_password, "SuperSecret") == 0) ...
> > >
> > > Just to be clear , the only thing that can go wrong with this example
> > > is that strcmp() may try to convert the elements of uchar_password to
> > > char thereby causing the implementation defined behavior. The same
> > > issue could arise with any other str* function. Or is there something
> > > specific about your example that I'm missing ?
> >
> > The call to strcmp() violates a constraint. strcmp() expects const
> > char* (a non-const char* is also ok), but uchar_password, after
> > the implicit conversion is of type unsigned char*. Types char*
> > and unsigned char* are not compatible, and there is no implicit
> > conversion from one to the other.
>
> I see. I assumed that the implicit conversion would be ok because
> paragraph 27 of 6.2.5 says "A pointer to void shall have the same
> representation and alignment requirements as a pointer to a character
> type.39)" and footnote 39 says "The same representation and alignment
> requirements are meant to imply interchangeability as arguments to
> functions, return values from functions, and members of unions." I
> assumed that the relation "same representation and alignment
> requirements" is transitive.
>
> On the other hand footnote 35 of paragraph 15 says that char is not
> compatible with signed or unsigned char and in 6.7.5.1 we read that
> pointers to types are compatible only if the types are compatible. We
> must conclude then that the relation "same representation and alignment
> requirements" is not transitive. That's a damn poor choice of
> terminology then.

Actually if the relation "same representation and alignment
requirements" were transitive *and* symmetric then we could conclude
that the implicit conversion would be ok. The word "same" suggests to
me a relation which is transitive and symmetric so I still think it's a
poor choice of terminology.
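
The practical consequence, as a sketch (the function is made up): an
explicit cast satisfies the constraint even though no implicit
conversion exists:

#include <string.h>

int password_matches(unsigned char *uchar_password)
{
    /* Passing uchar_password directly would violate a constraint:
     * unsigned char * and char * are incompatible despite the same
     * representation and alignment requirements. The explicit cast
     * is permitted. */
    return strcmp((char *)uchar_password, "SuperSecret") == 0;
}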

Keith Thompson

unread,
Mar 16, 2011, 3:50:45 PM3/16/11
to
Spiros Bousbouras <spi...@gmail.com> writes:
> On Fri, 11 Mar 2011 21:42:59 -0500
> Eric Sosman <eso...@ieee-dot-org.invalid> wrote:
>> On 3/11/2011 2:39 PM, Spiros Bousbouras wrote:
>> >
>> > If getc() read int's from files instead of unsigned char's would it be
>> > realistically possible that reading from a file would return a negative
>> > zero ? That would be one strange file.
>>
>> One strange text file, yes. But not so strange for a binary
>> file, where any bit pattern at all might appear. If a char that looks
>> like minus zero appears somewhere in the middle of a double, and you
>> fwrite() that double to a binary stream, the underlying fputc() calls
>> (a direct requirement; not even an "as if") convert each byte in turn
>> from unsigned char to int. I think the conversion allows the bits to
>> be diddled irreversibly -- although on reconsideration it may happen
>> only when sizeof(int)==1 as well.
>
> I see now that a negative zero in a file is a realistic possibility.
> But if the bits were diddled in a way that would turn a negative zero
> to a regular zero wouldn't that violate 7.19.2p3 ? I guess it depends
> on what the "shall compare equal" part exactly means. IMO it should
> mean equal as bit patterns in which case no bit diddling is allowed.

I think "shall compare equal" refers to the "==" operator.
It doesn't make sense for it to refer to bit-level representations --
nor is it necessary.

In the hypothetical C-like language you're describing, an input
file some of whose bytes contain negative zeros would indeed cause
problems; it wouldn't necessarily be possible to read data from a
binary file and write it out again without losing information.

Which is why the standard actually specifies that fgetc() reads
unsigned char values, which have a one-to-one mapping to bit-level
representations. There are no two representations that have the
same value, so the problem you're worried about doesn't arise.
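
As a sketch of that guarantee (the function name is made up, and it
assumes the ordinary case UCHAR_MAX <= INT_MAX), every byte value
written to a binary stream must read back unchanged:

#include <stdio.h>
#include <limits.h>

/* Writes every unsigned char value to a temporary binary stream and
 * reads each one back; 7.19.2p3 requires the data to compare equal. */
int roundtrip_ok(void)
{
    FILE *f = tmpfile();
    int v;

    if (f == NULL)
        return 0;
    for (v = 0; v <= UCHAR_MAX; v++)
        fputc(v, f);
    rewind(f);
    for (v = 0; v <= UCHAR_MAX; v++)
        if (fgetc(f) != v) {
            fclose(f);
            return 0;
        }
    fclose(f);
    return 1;
}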

Well, I wouldn't say it's wrong; rather, I'd say it's only 99+% portable
rather than 100% portable. It works just fine *unless* sizeof(int) == 1,
which implies CHAR_BIT >= 16.

As far as I know, all existing hosted C implementations have
CHAR_BIT == 8 and sizeof(int) >= 2 (and non-hosted implementations
aren't even required to support stdio).

If I were worried about the possibility, rather than adding calls
to feof() and ferror(), I'd probably add
#if CHAR_BIT != 8
#error "CHAR_BIT != 8"
#endif
And if I ever see that error message, it almost certainly means
that I forgot to add the "#include <limits.h>"

(Actually, checking that sizeof(int) > 1 would be better, since
the usual EOF check works just fine on a system with 16-bit char
and 32-bit int, but that's a little harder to check at compile time.)
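
One way to get that check at compile time anyway is the old
negative-array-size trick (a sketch; the typedef name is made up):

/* Compiles everywhere the usual EOF test is unambiguous; on a system
 * with sizeof(int) == 1 the array size is -1 and translation fails. */
typedef char int_is_wider_than_char[sizeof(int) > 1 ? 1 : -1];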

Spiros Bousbouras

unread,
Mar 16, 2011, 4:14:58 PM3/16/11
to
On Fri, 11 Mar 2011 17:47:25 -0800
Keith Thompson <ks...@mib.org> wrote:
> Spiros Bousbouras <spi...@gmail.com> writes:
> > On Fri, 11 Mar 2011 13:08:00 -0800
> > Tim Rentsch <t...@alumni.caltech.edu> wrote:
> >> Spiros Bousbouras <spi...@gmail.com> writes:
> >>
> >> > If getc() read int's from files instead of unsigned char's would it be
> >> > realistically possible that reading from a file would return a negative
> >> > zero ?
> >>
> >> A call to getc() cannot return negative zero. The reason is,
> >> getc() is defined in terms of fgetc(), which returns an
> >> 'unsigned char' converted to an 'int', and such conversions
> >> cannot produce negative zeros.
> >
> > When I said "getc() read int's from files" I meant that also fgetc()
> > reads int's from files i.e. we're talking about an alternative C where
> > we don't have the intermediate unsigned char step.
>
> I'm afraid I'm not following you here.
>
> I initially assumed you meant getc and fgetc would be reading
> int-sized chunks from the file, rather than (as C currently
> specifies) reading bytes, interpreting them as unsigned char,
> and converting that to int.
>
> Without the intermediate step, how is the int value determined?

Actually , all this digression about an alternative getc() was unneeded
and it was a very poor way to ask whether a file can contain negative
zeros. But what I had in mind was simply that the implementation reads
a bit pattern which can fit into an int and puts it in an int .

> Perhaps you mean getc and fgetc read a byte from the file, interpret
> is as *plain* char, and then convert the result to int.

No , no conversions at all.

> If so, and if plain char is signed and has a distinct representation
> for negative zero (this excludes 2's-complement systems), then
> could getc() return a negative zero?
>
> I'd say no. Converting a negative zero from char to int does not
> yield a negative zero int; 6.2.6.2p3 specifies the operations that
> might generate a negative zero, and conversions aren't in the list.

I don't think the problem is the conversion. I'm with
<4ntt48...@jones.homeip.net>
http://groups.google.com/group/comp.lang.c/msg/bdfed4e3a92d711c?dmode=source
on this one. But whether the actual reading can generate a negative
zero , I feel the standard could be clearer on this.

> Which means that getc() and fgetc() would be unable to distinguish
> between a positive and negative zero in a byte read from a file.
> Which is probably part of the reason why the standard specifies
> that the value is treated as an unsigned char.

Yes , unsigned char is the type which allows one to deal with arbitrary
bit patterns so it's the appropriate type to use for reading from a
stream which might be binary.

> Or the standard could have said specifically that getc and fgetc do
> return a negative zero in these cases, but dealing with that in code
> would be nasty (and, since most current systems don't have negative
> zeros, most programmers wouldn't bother).

I don't see why any special dealing would be needed. In an
implementation which is one's complement or sign and magnitude but does
not support negative zeros the programmer needs to be careful not to
accidentally create such a pattern but if negative zeros are supported
then it's completely transparent to the programmer.

--
Of course this doesn't mean that the sciences haven't also been tools
of capitalism, imperialism, and all the rest: but the reason chemistry
is a much better tool of imperialist domination than alchemy is that
it's much more true.
Cosma Shalizi

Spiros Bousbouras

unread,
Mar 16, 2011, 4:18:39 PM3/16/11
to
On Sat, 12 Mar 2011 01:49:30 -0800

Tim Rentsch <t...@alumni.caltech.edu> wrote:
> Spiros Bousbouras <spi...@gmail.com> writes:
>
> > On Fri, 11 Mar 2011 13:08:00 -0800
> > Tim Rentsch <t...@alumni.caltech.edu> wrote:
> >> Spiros Bousbouras <spi...@gmail.com> writes:
> >>
> >> > If getc() read int's from files instead of unsigned char's would it be
> >> > realistically possible that reading from a file would return a negative
> >> > zero ?
> >>
> >> A call to getc() cannot return negative zero. The reason is,
> >> getc() is defined in terms of fgetc(), which returns an
> >> 'unsigned char' converted to an 'int', and such conversions
> >> cannot produce negative zeros.
> >
> > When I said "getc() read int's from files" I meant that also fgetc()
> > reads int's from files i.e. we're talking about an alternative C where
> > we don't have the intermediate unsigned char step.
>
> Ahh, I didn't understand that. I don't know what would
> happen in alternative C; I don't have any kind of reference
> manual or standards document for that language.

As I just said in a different post , on this occasion invoking an
alternative C was pointless. But the way it works in general is that
you take the usual C , change some bits and pieces in the standard and
you have your new C.

Spiros Bousbouras

unread,
Mar 16, 2011, 4:28:58 PM3/16/11
to
On 13 Mar 2011 15:30:37 +0200

Why wouldn't it be ? I don't think there's any reason a programmer
would want to create a negative zero even where it's supported so the
standard doesn't give you a way to create one.

Pushkar Prasad

unread,
Mar 16, 2011, 5:01:14 PM3/16/11
to
Thanks everybody.

I will explore further based on your inputs. So far I have tried inline
assembly to LOCK and increment / decrement the global, and used the
InterlockedIncrement() API in the Windows SDK, without any success. For a
small number of threads the modifications to the global seem to be
serialized, but when I spawn thousands of threads then things get messy
due to thread rescheduling.

As I mentioned earlier, I can put a Critical Section around the global and
ensure that the functions acquire the Critical Section for updating the
global, but that will create too much of a contention point in my code. I
was looking for a leaner way of doing it; InterlockedIncrement() and
InterlockedDecrement() looked suitable initially but failed when I
spawned thousands of threads.


Thanks & Regards
Pushkar Prasad

Pushkar Prasad

unread,
Mar 16, 2011, 5:11:31 PM3/16/11
to
My apologies to everybody. The response below was meant for another
thread. Please ignore my response on this thread.

Thanks & Regards
Pushkar Prasad

Spiros Bousbouras

unread,
Mar 16, 2011, 5:19:44 PM3/16/11
to
On Wed, 16 Mar 2011 12:50:45 -0700
Keith Thompson <ks...@mib.org> wrote:
> Spiros Bousbouras <spi...@gmail.com> writes:
> > On Fri, 11 Mar 2011 21:42:59 -0500
> > Eric Sosman <eso...@ieee-dot-org.invalid> wrote:
> >>
> >> When sizeof(int)==1, there will exist a perfectly valid unsigned
> >> char value whose conversion to int yields EOF. (Or else there will
> >> exist two or more distinct unsigned char values that convert to the
> >> same int value, which is even worse and violates 7.19.2p3.) So
> >> checking the value of getc() against EOF isn't quite enough: Having
> >> found EOF, you also need to call feof() and ferror() before concluding
> >> that it's "condition" rather than "data." More information is being
> >> forced through the return-value channel than the unaided channel
> >> can accommodate.
> >
> > So then this means that the common idiom
> > int a;
> > ...
> > while ( (a = fgetc(f)) != EOF ) ...
> >
> > is actually wrong ! (Unless someone has already checked that
> > sizeof(int) != 1 but I don't imagine you'll see a lot of code which
> > does that.) Yikes , I can't believe I've been doing it wrong all
> > those years. Has anyone seen a book which mentions the issue ?
>
> Well, I wouldn't say it's wrong; rather, I'd say it's only 99+% portable
> rather than 100% portable.

It can unexpectedly give wrong results i.e. you'd think that the file
terminated before it actually did. I call that wrong.

> It works just fine *unless* sizeof(int) == 1,
> which implies CHAR_BIT >= 16.
>
> As far as I know, all existing hosted C implementations have
> CHAR_BIT == 8 and sizeof(int) >= 2 (and non-hosted implementations
> aren't even required to support stdio).
>
> If I were worried about the possibility, rather than adding calls
> to feof() and ferror(),

You do need to call one of the two in order to distinguish between
error and end of file.

> I'd probably add
> #if CHAR_BIT != 8
> #error "CHAR_BIT != 8"
> #endif
> And if I ever see that error message, it almost certainly means
> that I forgot to add the "#include <limits.h>"

> (Actually, checking that sizeof(int) > 1 would be better, since
> the usual EOF check works just fine on a system with 16-bit char
> and 32-bit int, but that's a little harder to check at compile time.)

Personally I think it's more straightforward to use feof() and ferror()
which is what I'll be doing from now on.

--
But one must remember that there is such a thing as good bad taste
and bad bad taste.
John Waters

Keith Thompson

unread,
Mar 16, 2011, 7:28:42 PM3/16/11
to

But that wrongness can only occur on a vanishingly small number of
platforms -- quite possibly nonexistent in real life.

fgetc(f) can return EOF under only three circumstances:

1. You've reached the end of the file (very common).

2. You've encountered an error condition (rarer, but certainly worth
worrying about).

3. You're on a system where int cannot represent all values of type
char plus the value EOF without loss of information, *and* you've
just read a byte from the file whose value happens to match EOF
(typically -1). This can only happen if int has no more sign and
value bits than char, which is possible only if CHAR_BIT >= 16.

>> It works just fine *unless* sizeof(int) == 1,
>> which implies CHAR_BIT >= 16.
>>
>> As far as I know, all existing hosted C implementations have
>> CHAR_BIT == 8 and sizeof(int) >= 2 (and non-hosted implementations
>> aren't even required to support stdio).
>>
>> If I were worried about the possibility, rather than adding calls
>> to feof() and ferror(),
>
> You do need to call one of the two in order to distinguish between
> error and end of file.

True.

>> I'd probably add
>> #if CHAR_BIT != 8
>> #error "CHAR_BIT != 8"
>> #endif
>> And if I ever see that error message, it almost certainly means
>> that I forgot to add the "#include <limits.h>"
>
>> (Actually, checking that sizeof(int) > 1 would be better, since
>> the usual EOF check works just fine on a system with 16-bit char
>> and 32-bit int, but that's a little harder to check at compile time.)
>
> Personally I think it's more straightforward to use feof() and ferror()
> which is what I'll be doing from now on.

Suppose your code is running on a system with 16-bit char and 16-bit
int, and it reads a byte with the value 0xffff, which yields -1 when
converted to int (note that even this is implementation-defined), which
happens to be the value of EOF. Since you've added extra checks, you
notice that feof() and ferror() both returned 0, meaning the -1 is a
value read from the file. How sure are you that you can handle that
value correctly? I'm not necessarily saying you can't, but I can
imagine that there might be some subtle problems.

More to the point, how are you going to be able to test your code?
Unless you have access to an exotic system, or unless you replace
fgetc() with your own version, the code that handles this case will
never be executed.
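
One hypothetical way to force that branch on ordinary hardware --
nothing here is a real API, just a test-only stand-in for fgetc():

#include <stdio.h>

static int injected = 0;

/* Returns EOF once without touching the stream's flags, so the
 * caller sees EOF while feof() and ferror() both report 0 -- the
 * situation that otherwise needs exotic hardware. */
static int test_fgetc(FILE *f)
{
    if (!injected) {
        injected = 1;
        return EOF;     /* simulate a data byte that converts to EOF */
    }
    return fgetc(f);
}

Routing the code under test through test_fgetc() in a test build then
exercises the rare branch without an exotic system.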

Peter Nilsson

unread,
Mar 16, 2011, 8:06:49 PM3/16/11
to
Joe Wright <joewwri...@comcast.net> wrote:

> Eric Sosman wrote:
> > If you're interested in finding current examples, I think the
> > places to look would be among the embedded and special-purpose
> > processors.

DSPs are special-purpose, but they're not exactly rare.

> > The really small ones (hand-held calculators and so
> > on) probably can't support C, but there might be C-programmable
> > "exotic" CPU's operating traffic lights and gathering telemetry
> > and controlling your car's fuel injectors. And burning your toast,
> > of course.

And various mobile phones.

> I'm not really looking for them. I'm curious why the designer of
> an ALU would choose other than twos-complement.

I've read they tend to be simpler to manufacture. Note there are many
ones' complement checksum routines. Also, signal processing tends to
be simpler if you don't have to work with a biased range, i.e. -127
to 127 is simpler to work with than -128 to 127.
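
The best-known example is probably the RFC 1071 Internet checksum,
which is defined in terms of ones' complement addition; a minimal
sketch (assuming an even length):

#include <stddef.h>

/* Sums 16-bit words in ones' complement arithmetic: carries out of
 * the top are folded back into the low end, which is exactly what a
 * ones' complement adder gives you for free. */
unsigned inet_checksum(const unsigned char *data, size_t len)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((unsigned long)data[i] << 8) | data[i + 1];
    while (sum >> 16)
        sum = (sum & 0xFFFFUL) + (sum >> 16);
    return (unsigned)(~sum & 0xFFFFUL);  /* ones' complement of the sum */
}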

--
Peter

Eric Sosman

unread,
Mar 16, 2011, 8:26:08 PM3/16/11
to
On 3/16/2011 8:06 PM, Peter Nilsson wrote:
> Joe Wright<joewwri...@comcast.net> wrote:
>> Eric Sosman wrote:
>>> If you're interested in finding current examples, I think the
>>> places to look would be among the embedded and special-purpose
>>> processors.
>
> DSPs are special-purpose, but they're not exactly rare.

Nor did I or anyone else say they were. Not exactly.

>>> The really small ones (hand-held calculators and so
>>> on) probably can't support C, but there might be C-programmable
>>> "exotic" CPU's operating traffic lights and gathering telemetry
>>> and controlling your car's fuel injectors. And burning your toast,
>>> of course.
>
> And various mobile phones.

Your phone does toast? *My* phone got toasted ...

--
Eric Sosman
eso...@ieee-dot-org.invalid

James Kuyper

unread,
Mar 16, 2011, 9:03:50 PM3/16/11
to
Sorry for the late response - I meant to reply to this message promptly,
but lost track of it.

On 03/11/2011 04:55 PM, Spiros Bousbouras wrote:
> On Fri, 11 Mar 2011 11:53:57 -0800
> Keith Thompson<ks...@mib.org> wrote:
...


>> The call to strcmp() violates a constraint. strcmp() expects const
>> char* (a non-const char* is also ok), but uchar_password, after
>> the implicit conversion is of type unsigned char*. Types char*
>> and unsigned char* are not compatible, and there is no implicit
>> conversion from one to the other.
>
> I see. I assumed that the implicit conversion would be ok because
> paragraph 27 of 6.2.5 says "A pointer to void shall have the same
> representation and alignment requirements as a pointer to a character
> type.39)" and footnote 39 says "The same representation and alignment
> requirements are meant to imply interchangeability as arguments to
> functions, return values from functions, and members of unions." I
> assumed that the relation "same representation and alignment
> requirements" is transitive.

It is, but having the "same representation and alignment requirements"
is not sufficient for two types to be compatible with each other, nor to
ensure that there's an implicit conversion from one to the other.

Now, "The same representation and alignment requirements are meant to

imply interchangeability as arguments to functions, return values from

functions, and members of unions", but even interchangeable types are
not the same thing as compatible types.

Also, that footnote is not normative, and while it is "meant to imply
...", it does not actually imply it. Two types, with exactly the same
representation and alignment, could fail to be interchangeable for those
purposes if an implementation chooses to treat them differently, for
instance by passing them in function calls by different mechanisms. The
fact that they are not compatible types is sufficient to allow such a
decision.

--
James Kuyper

Joe Wright

unread,
Mar 16, 2011, 10:29:53 PM3/16/11
to
It will be executed if it is the only code there is. Just to state the
problem I'll be solving..
int c;
while ((c = fgetc(f)) != EOF)
..which doesn't work because char and int are the same width so the value
of EOF cannot be distinguished from valid data. Why not..
int c;
while ((c = fgetc(f)), ! feof(f))
..might work.

lawrenc...@siemens.com

unread,
Mar 17, 2011, 5:43:14 AM3/17/11
to
James Kuyper <james...@verizon.net> wrote:
>
> Also, that footnote is not normative, and while it is "meant to imply
> ...", it does not actually imply it. Two types, with exactly the same
> representation and alignment, could fail to be interchangeable for those
> purposes if an implementation choses to treat them differently, for
> instance by passing them in function calls by different mechanisms. The
> fact that they are not compatible types is sufficient to allow such a
> decision.

But the fact that they are there is sufficient to indicate that such a
decision is exceedingly unwise.
--
Larry Jones

I hate it when they look at me that way. -- Calvin

James Kuyper

unread,
Mar 17, 2011, 8:47:15 AM3/17/11
to
On 03/17/2011 05:43 AM, lawrenc...@siemens.com wrote:
> James Kuyper<james...@verizon.net> wrote:
>>
>> Also, that footnote is not normative, and while it is "meant to imply
>> ...", it does not actually imply it. Two types, with exactly the same
>> representation and alignment, could fail to be interchangeable for those
>> purposes if an implementation choses to treat them differently, for
>> instance by passing them in function calls by different mechanisms. The
>> fact that they are not compatible types is sufficient to allow such a
>> decision.
>
> But the fact that they are there is sufficient to indicate that such a
> decision is exceedingly unwise.

I did not mean to suggest that it would be a good idea to implement
function calls that way. I just object to the wording of the standard,
and I'm bringing up the possibility of such an implementation as a way
of demonstrating what's wrong with that wording.

Interchangeability should either be required or at least recommended; it
should not be incorrectly identified as an implication that can be
derived solely from the requirement of "same representation and
alignment". Saying "is meant to imply" rather than "implies" is simply
weasel-wording; it should have no place in an official standard: if the
committee really feels that the implication is valid, it should say so
explicitly.

--
James Kuyper

Spiros Bousbouras

unread,
Mar 17, 2011, 10:06:27 AM3/17/11
to

The problem with this is that if there is an error reading from the
file then fgetc(f) may keep returning EOF in which case you'll have
an infinite loop. The way I plan to do it from now on is

while (1) {
    c = fgetc(f) ;
    if (ferror(f)) { /* Handle the error */ }
    if (feof(f)) break ; /* Exit the loop possibly after clean-up */
    /* Otherwise use c as a valid character */
}

Even visually I like it better like this than burdening the condition
of the while .

Spiros Bousbouras

unread,
Mar 17, 2011, 10:13:31 AM3/17/11
to
On Wed, 16 Mar 2011 17:06:49 -0700 (PDT)
Peter Nilsson <ai...@acay.com.au> wrote:
> Joe Wright <joewwri...@comcast.net> wrote:
>
> > I'm not really looking for them. I'm curious why the designer of
> > an ALU would choose other than twos-complement.
>
> I've read they tend to be simpler to manufacture. Note there are many
> ones' complement checksum routines. Also, signal processing tends to
> be simpler if you don't have to work with a biased range, i.e. -127
> to 127 is simpler to work with than -128 to 127.

Two's complement is consistent with a symmetrical range according to the
standard ; the extra bit pattern may be a trap representation.

Spiros Bousbouras

unread,
Mar 17, 2011, 10:36:22 AM3/17/11
to
On Wed, 16 Mar 2011 16:28:42 -0700

Possibly but not certainly either now or in the future. But even for
reasons of style I have come now to consider using feof() preferable.

[...]

> >> (Actually, checking that sizeof(int) > 1 would be better, since
> >> the usual EOF check works just fine on a system with 16-bit char
> >> and 32-bit int, but that's a little harder to check at compile time.)
> >
> > Personally I think it's more straightforward to use feof() and ferror()
> > which is what I'll be doing from now on.
>
> Suppose your code is running on a system with 16-bit char and 16-bit
> int, and it reads a byte with the value 0xffff, which yields -1 when
> converted to int (note that even this is implementation-defined), which
> happens to be the value of EOF. Since you've added extra checks, you
> notice that feof() and ferror() both returned 0, meaning the -1 is a
> value read from the file. How sure are you that you can handle that
> value correctly?

That depends on what the code is doing. For a lot of code you wouldn't
need to handle EOF specially. Say you write code which reads lines of
input and only prints those which match some string. Then you just read
lines one by one and pass them to strcmp() . It makes no difference
whether EOF can be one of the characters in the line. But using feof()
and ferror() guarantees that you won't think the input finished before
it actually did.
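
A sketch of that scenario (the names are made up, and lines longer
than the buffer are silently truncated):

#include <stdio.h>
#include <string.h>

/* Prints each line of f that equals target. No byte value needs
 * special treatment: end of input is decided by feof()/ferror(),
 * not by comparing characters against EOF. */
void print_matching_lines(FILE *f, const char *target)
{
    char line[1024];
    size_t n = 0;
    int c;

    for (;;) {
        c = fgetc(f);
        if (feof(f) || ferror(f))
            break;
        if (c == '\n') {
            line[n] = '\0';
            if (strcmp(line, target) == 0)
                puts(line);
            n = 0;
        } else if (n + 1 < sizeof line) {
            line[n++] = (char)c;
        }
    }
}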

Off the top of my head I can't think of examples where you would need
to do something special for EOF.

> I'm not necessarily saying you can't, but I can
> imagine that there might be some subtle problems.

For example ?

> More to the point, how are you going to be able to test your code?
> Unless you have access to an exotic system, or unless you replace
> fgetc() with your own version, the code that handles this case will
> never be executed.

You can trigger the condition by appropriately modifying the source.
The real problem is that if you only have access to systems where EOF
can never be a valid char value then the executed code can't simulate
the exotic systems where EOF may be a valid value for char.

I cannot offer a general methodology for checking the code. But I don't
think it's a problem. If I had to deal with a specific piece of code
rather than talking abstractly I could probably come up with a way to
simulate the situation even on non-exotic hardware. But more
importantly , I think it's very unlikely that EOF will ever have to be
handled specially.

--
This film sucked the big black wazoo!
amazon reader review

Tim Rentsch

unread,
Mar 17, 2011, 6:13:04 PM3/17/11
to
Keith Thompson <ks...@mib.org> writes:

The reason I say it's surprising is that the most natural
hardware implementation will yield negative zero in both
cases (ie, whether a constant zero or an expression with
the value (positive) zero is used).


> Of course, one could argue about whether that's desirable. In any case,
> having *some* rules that let you avoid spurious negative zeros in
> general calculations seems like a good idea.

As long as the integer constant 0 never produces a
negative zero, it's easy to avoid them, eg,

#define AVOID_NEGATIVE_ZERO(x) ((x) ? (x) : 0)
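
A usage sketch (the function name is made up), meaningful only on a
hypothetical ones' complement machine:

#define AVOID_NEGATIVE_ZERO(x) ((x) ? (x) : 0)

int scrub_example(void)
{
    /* On ones' complement, ~0 flips every bit of zero and yields the
     * negative zero representation (6.2.6.2p3); it still compares
     * equal to 0, so the conditional replaces it with the constant 0. */
    int x = ~0;                        /* possibly minus zero */
    int y = AVOID_NEGATIVE_ZERO(x);    /* plain 0, never minus zero */

    return y;
}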


> (Using two's-complement seems like an even better idea.)

Maybe so but it's not really a practical option if you're writing
a C compiler for a machine that doesn't use two's complement in
its hardware.

Tim Rentsch

unread,
Mar 17, 2011, 6:16:21 PM3/17/11
to
Spiros Bousbouras <spi...@gmail.com> writes:

> requirements" were transitive *and* symmetric [snip rest]

The relation "same representation and alignment requirements"
is reflexive, symmetric, and transitive. Probably you are
confusing it with the 'compatible' relation.

Tim Rentsch

unread,
Mar 17, 2011, 6:21:51 PM3/17/11
to
Spiros Bousbouras <spi...@gmail.com> writes:

I guess you missed the point of the comment. What I was trying
to say (perhaps too subtly) is that asking a question about what
would happen in the hypothetical "Alternative C" is kind of a
dumb question, because since you made it up only you can answer
the question, or for that matter care about what the answer is.

Tim Rentsch

unread,
Mar 17, 2011, 6:24:45 PM3/17/11
to
Spiros Bousbouras <spi...@gmail.com> writes:

It can't unless INT_MAX < UCHAR_MAX, in which case it's
obvious that it can because of how unsigned-to-signed
conversions work.

Tim Rentsch

unread,
Mar 17, 2011, 6:33:30 PM3/17/11
to
Keith Thompson <ks...@mib.org> writes:

> Spiros Bousbouras <spi...@gmail.com> writes:
[snip]


>> So then this means that the common idiom
>> int a;
>> ...
>> while ( (a = fgetc(f)) != EOF ) ...
>>
>> is actually wrong ! (Unless someone has already checked that
>> sizeof(int) != 1 but I don't imagine you'll see a lot of code which
>> does that.) Yikes , I can't believe I've been doing it wrong all
>> those years. Has anyone seen a book which mentions the issue ?
>
> Well, I wouldn't say it's wrong; rather, I'd say it's only 99+% portable
> rather than 100% portable. It works just fine *unless* sizeof(int) == 1,
> which implies CHAR_BIT >= 16.
>
> As far as I know, all existing hosted C implementations have
> CHAR_BIT == 8 and sizeof(int) >= 2 (and non-hosted implementations
> aren't even required to support stdio).
>
> If I were worried about the possibility, rather than adding calls
> to feof() and ferror(), I'd probably add
> #if CHAR_BIT != 8
> #error "CHAR_BIT != 8"
> #endif
> And if I ever see that error message, it almost certainly means
> that I forgot to add the "#include <limits.h>"
>
> (Actually, checking that sizeof(int) > 1 would be better, since
> the usual EOF check works just fine on a system with 16-bit char
> and 32-bit int, but that's a little harder to check at compile time.)

It's easy if a better test is used. The test against EOF is
guaranteed to work if

UCHAR_MAX > 0 && UCHAR_MAX <= INT_MAX
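
Dropped into a source file, that test might look like this (a sketch;
the error message is made up):

/* Assumes <limits.h> was included earlier. If it was forgotten, both
 * macros expand to 0 inside #if and the UCHAR_MAX > 0 leg fails,
 * catching the missing include as well. */
#if !(UCHAR_MAX > 0 && UCHAR_MAX <= INT_MAX)
#error "EOF test against fgetc() result is not reliable here"
#endif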

Keith Thompson

unread,
Mar 17, 2011, 6:47:08 PM3/17/11
to
Tim Rentsch <t...@alumni.caltech.edu> writes:
> Keith Thompson <ks...@mib.org> writes:
[...]

>> Of course, one could argue about whether that's desirable. In any case,
>> having *some* rules that let you avoid spurious negative zeros in
>> general calculations seems like a good idea.
>
> As long as the integer constant 0 never produces a
> negative zero, it's easy to avoid them, eg,
>
> #define AVOID_NEGATIVE_ZERO(x) ((x) ? (x) : 0)

Sure, it's easy to do that by adding an invocation of
AVOID_NEGATIVE_ZERO() to every expression where you want to avoid
a negative zero result -- something that is of no benefit unless
your code is actually running on a system that has negative zeros.

>> (Using two's-complement seems like an even better idea.)
>
> Maybe so but it's not really a practical option if you're writing
> a C compiler for a machine that doesn't use two's complement in
> its hardware.

True. Think of it as advice for hardware designers. (Not that
there's any particular reason they should listen to me.)

Keith Thompson

unread,
Mar 17, 2011, 6:49:27 PM3/17/11
to

That's good!

Is the "UCHAR_MAX > 0" test intended to catch forgetting the
#include <limits.h>?

Spiros Bousbouras

unread,
Mar 18, 2011, 4:01:58 PM3/18/11
to
On Thu, 17 Mar 2011 15:21:51 -0700

No , I didn't miss the point. My post you are quoting addresses this
very point by explaining how someone other than myself can answer
questions about this alternative C. Here is a more detailed
explanation: the way they could do that is by taking the current
standard , making the modification I suggested and then they would have
a description of how this alternative C is supposed to work. By
consulting this description they can answer questions about the
language. In fact , at least 1 person in the thread did address my
question which means they cared somewhat about the answer.

Beyond that , contemplating functionality for C different from what the
standard specifies is a fairly common practice both here and on
comp.std.c . For example the thread "All inclusive header files?"
contains discussion of how C would/should behave if it were redesigned
from scratch. So I don't understand why you find my own contemplation
so problematic.

Spiros Bousbouras

unread,
Mar 18, 2011, 4:10:34 PM3/18/11
to
On Thu, 17 Mar 2011 15:16:21 -0700
Tim Rentsch <t...@alumni.caltech.edu> wrote:
>
> The relation "same representation and alignment requirements"
> is reflexive, symmetric, and transitive. Probably you are
> confusing it with the 'compatible' relation.

Not at all as
<KTwep.103188$T_2....@newsfe06.ams2>
http://groups.google.com/group/comp.lang.c/msg/12f4d3ff0e739fdf?dmode=source
clearly demonstrates. But if it is indeed symmetric and transitive
then it follows that using a function argument of unsigned char where
the function prototype says char should be ok. But the compatibility
rules say it's not ok and that's why I believe that the standard is
misleading on this point.

Spiros Bousbouras

unread,
Mar 18, 2011, 4:25:24 PM3/18/11
to

But 6.2.6.2 p3 uses the word "only" and does not list reading from a
file as a way to generate a negative zero.

Tim Rentsch

unread,
Mar 18, 2011, 6:54:37 PM3/18/11
to
Spiros Bousbouras <spi...@gmail.com> writes:

You are confused. The relation "has the same representation and
alignment requirements" is indeed reflexive, symmetric, and
transitive. The comment about function argument/parameter types of
unsigned char and char is irrelevant, because compatibility is not
based (just) on whether two types have the same representation and
alignment requirements.

Tim Rentsch

unread,
Mar 18, 2011, 6:57:27 PM3/18/11
to
Spiros Bousbouras <spi...@gmail.com> writes:

That's too bad, I expect you'll continue to be mystified.

Tim Rentsch

unread,
Mar 18, 2011, 7:05:40 PM3/18/11
to
Spiros Bousbouras <spi...@gmail.com> writes:

That is a true statement. Despite that, it is still obvious that the
Standard allows getc() to return a negative zero, based on the
description of fgetc() and on the rule for how unsigned-to-signed
conversion works.

Tim Rentsch

unread,
Mar 18, 2011, 7:19:23 PM3/18/11
to
Keith Thompson <ks...@mib.org> writes:

> Tim Rentsch <t...@alumni.caltech.edu> writes:
>> Keith Thompson <ks...@mib.org> writes:
> [...]
>>> Of course, one could argue about whether that's desirable. In any case,
>>> having *some* rules that let you avoid spurious negative zeros in
>>> general calculations seems like a good idea.
>>
>> As long as the integer constant 0 never produces a
>> negative zero, it's easy to avoid them, eg,
>>
>> #define AVOID_NEGATIVE_ZERO(x) ((x) ? (x) : 0)
>
> Sure, it's easy to do that by adding an invocation of
> AVOID_NEGATIVE_ZERO() to every expression where you want to avoid
> a negative zero result -- something that is of no benefit unless
> your code is actually running on a system that has negative zeros.

Yes, perhaps my comment was a little bit too glib. Still,
I can't help feeling that the Standard imposes unnecessarily
severe restrictions on implementations, especially since the
restrictions are relevant only in rare circumstances, and
don't absolutely have to be there at all since it would be
easy to deal with negative zeros even in their absence.


>>> (Using two's-complement seems like an even better idea.)
>>
>> Maybe so but it's not really a practical option if you're writing
>> a C compiler for a machine that doesn't use two's complement in
>> its hardware.
>
> True. Think of it as advice for hardware designers. (Not that
> there's any particular reason they should listen to me.)

I agree with the advice, just think it's unlikely to be
efficacious to address comments to hardware designers in
this newsgroup.

Tim Rentsch

unread,
Mar 18, 2011, 7:20:45 PM3/18/11
to
Keith Thompson <ks...@mib.org> writes:

Yes, indeed so.

James Kuyper

unread,
Mar 19, 2011, 9:37:18 AM3/19/11
to
On 03/18/2011 04:10 PM, Spiros Bousbouras wrote:
> On Thu, 17 Mar 2011 15:16:21 -0700
> Tim Rentsch<t...@alumni.caltech.edu> wrote:
>>
>> The relation "same representation and alignment requirements"
>> is reflexive, symmetric, and transitive. Probably you are
>> confusing it with the 'compatible' relation.
>
> Not at all as
> <KTwep.103188$T_2....@newsfe06.ams2>
> http://groups.google.com/group/comp.lang.c/msg/12f4d3ff0e739fdf?dmode=source

Huh? It's precisely that message that left me, too, with the impression
that you were confusing those two things.

> clearly demonstrates. But if it is indeed symmetric and transitive
> then it follows that using a function argument of unsigned char where
> the function prototype says char should be ok.

I don't see how you reach that conclusion. Could you explain in detail
how you apply symmetry and transitivity to reach that conclusion - and
you must do so without confusing "same representation and alignment"
with compatibility. I don't even see how you reach that conclusion by
assuming that they are the same thing, or at least that the "same
representation and alignment" implies "compatible"; I certainly don't
see how you could reach it without making such an incorrect assumption.

> ... But the compatibility
> rules say it's not ok and that's why I believe that the standard is
> misleading on this point.

--
James Kuyper

James Kuyper

unread,
Mar 19, 2011, 9:58:58 AM3/19/11
to
On 03/18/2011 04:01 PM, Spiros Bousbouras wrote:
> On Thu, 17 Mar 2011 15:21:51 -0700
> Tim Rentsch<t...@alumni.caltech.edu> wrote:
>> Spiros Bousbouras<spi...@gmail.com> writes:
>>
>>> On Sat, 12 Mar 2011 01:49:30 -0800
>>> Tim Rentsch<t...@alumni.caltech.edu> wrote:
>>>> Spiros Bousbouras<spi...@gmail.com> writes:
...

>>>>> When I said "getc() read int's from files" I meant that also fgetc()
>>>>> reads int's from files i.e. we're talking about an alternative C where
>>>>> we don't have the intermediate unsigned char step.
...

>> I guess you missed the point of the comment. What I was trying
>> to say (perhaps too subtly) is that asking a question about what
>> would happen in the hypothetical "Alternative C" is kind of a
>> dumb question, because since you made it up only you can answer
>> the question, or for that matter care about what the answer is.
>
> No , I didn't miss the point. My post you are quoting addresses this
> very point by explaining how someone other than myself can answer
> questions about this alternative C. Here is a more detailed
> explanation: the way they could do that is by taking the current
> standard , making the modification I suggested and then they would have
> a description of how this alternative C is supposed to work. By
> consulting this description they can answer questions about the
> language. In fact , at least 1 person in the thread did address my
> question which means they cared somewhat about the answer.

When I combine that explanation with the definitions of fread() and
fwrite(), I do get answers, but I doubt that they're the ones you
intended (if you even gave thought to that issue). The simplest fix
would be to convert references to unsigned char into 'int' in those
definitions, as well. However, that would prohibit the reading or
writing of objects with a size that is not a multiple of sizeof(int).
There's an old saying "you can never change just one thing", and this is a
prime example. You need to add considerable detail to your hypothetical
alternative C before it will be sufficiently well defined to permit
asking such questions.
--
James Kuyper

Spiros Bousbouras

unread,
Mar 19, 2011, 11:59:37 AM3/19/11
to
On Sat, 19 Mar 2011 09:37:18 -0400
James Kuyper <james...@verizon.net> wrote:
> On 03/18/2011 04:10 PM, Spiros Bousbouras wrote:
> > On Thu, 17 Mar 2011 15:16:21 -0700
> > Tim Rentsch<t...@alumni.caltech.edu> wrote:
> >>
> >> The relation "same representation and alignment requirements"
> >> is reflexive, symmetric, and transitive. Probably you are
> >> confusing it with the 'compatible' relation.
> >
> > Not at all as
> > <KTwep.103188$T_2....@newsfe06.ams2>
> > http://groups.google.com/group/comp.lang.c/msg/12f4d3ff0e739fdf?dmode=source
>
> Huh? It's precisely that message that left me, too, with the impression
> that you were confusing those two things.
>
> > clearly demonstrates. But if it is indeed symmetric and transitive
> > then it follows that using a function argument of unsigned char where
> > the function prototype says char should be ok.

Above I should have said "unsigned char*" and "char*". I hope it wasn't
this omission which caused the confusion. If you thought that I did
really mean "unsigned char" and "char" then ignore what follows and
apologies for my mistake.

> I don't see how you reach that conclusion. Could you explain in detail
> how you apply symmetry and transitivity to reach that conclusion - and
> you must do so without confusing "same representation and alignment"
> with compatibility.

From 6.2.5 p27

    A pointer to void shall have the same representation and
    alignment requirements as a pointer to a character type.39)

And footnote 39 says

    The same representation and alignment requirements are
    meant to imply interchangeability as arguments to functions,
    return values from functions, and members of unions.

In what follows SRAR(T1 , T2) will mean that type T1 has the same
representation and alignment requirements as type T2. The syllogism
goes as follows:
1) SRAR(void* , unsigned char*) from 6.2.5 p27
2) Using symmetry and 1) we get that SRAR(unsigned char* , void*)
3) SRAR(void* , char*) from 6.2.5 p27
4) Using 2) , 3) and transitivity we get that
SRAR(unsigned char* , char*)
5) Using footnote 39 and 4) we get that unsigned char* and char* are
interchangeable as arguments to functions. Therefore wherever you
can use unsigned char* as an argument to a function you can also
use a char* and vice versa regardless of the existence of function
prototypes or anything else.

I realise that footnotes are not normative but this particular footnote
is misleading which is what I'm complaining about. If footnote 39 were
extended to say "...provided no compatibility rules are violated" I
wouldn't be complaining.

--
Opinions are the author's and are not necessarily shared by the
University, but they should be.
Bob Park

Spiros Bousbouras

unread,
Mar 19, 2011, 12:07:13 PM3/19/11
to
On Sat, 19 Mar 2011 09:58:58 -0400
James Kuyper <james...@verizon.net> wrote:
> On 03/18/2011 04:01 PM, Spiros Bousbouras wrote:
> > No , I didn't miss the point. My post you are quoting addresses this
> > very point by explaining how someone other than myself can answer
> > questions about this alternative C. Here is a more detailed
> > explanation: the way they could do that is by taking the current
> > standard , making the modification I suggested and then they would have
> > a description of how this alternative C is supposed to work. By
> > consulting this description they can answer questions about the
> > language. In fact , at least 1 person in the thread did address my
> > question which means they cared somewhat about the answer.
>
> When I combine that explanation with the definitions of fread() and
> fwrite(), I do get answers, but I doubt that they're the ones you
> intended (if you even gave thought to that issue).

I don't think I did. But I have already said that my excursion into an
alternative C was misguided so there's no point spending any more time
on this.

James Kuyper

unread,
Mar 19, 2011, 1:39:46 PM3/19/11
to
On 03/19/2011 11:59 AM, Spiros Bousbouras wrote:
> On Sat, 19 Mar 2011 09:37:18 -0400
> James Kuyper<james...@verizon.net> wrote:
>> On 03/18/2011 04:10 PM, Spiros Bousbouras wrote:
...

>>> clearly demonstrates. But if it is indeed symmetric and transitive
>>> then it follows that using a function argument of unsigned char where
>>> the function prototype says char should be ok.
>
> Above I should have said "unsigned char*" and "char*". I hope it wasn't
> this omission which caused the confusion. If you thought that I did
> really mean "unsigned char" and "char" then ignore what follows and
> apologies for my mistake.

I did think you meant unsigned char and char; but since unsigned char*
and char* are also incompatible, I still disagree with your conclusion.

>> I don't see how you reach that conclusion. Could you explain in detail
>> how you apply symmetry and transitivity to reach that conclusion - and
>> you must do so without confusing "same representation and alignment"
>> with compatibility.
>
> From 6.2.5 p27
>
> A pointer to void shall have the same representation and
> alignment requirements as a pointer to a character type.39)
>
> And footnote 39 says
>
> The same representation and alignment requirements are
> meant to imply interchangeability as arguments to functions,
> return values from functions, and members of unions.
>
> In what follows SRAR(T1 , T2) will mean that type T1 has the same
> representation and alignment requirements as type T2. The syllogism
> goes as follows:
> 1) SRAR(void* , unsigned char*) from 6.2.5 p27
> 2) Using symmetry and 1) we get that SRAR(unsigned char* , void*)
> 3) SRAR(void* , char*) from 6.2.5 p27
> 4) Using 2) , 3) and transitivity we get that
> SRAR(unsigned char* , char*)
> 5) Using footnote 39 and 4) we get that unsigned char* and char* are
> interchangeable as arguments to functions.

That's where you went wrong, which is pretty much as I had expected.
The footnote about interchangeability does not impose any actual
requirement that they be interchangeable. The only actual requirement is
SRAR(unsigned char*, char*). The footnote is a non-normative expression
of the belief that this requirement implies interchangeability. It is
perfectly feasible for types with the same representation and alignment
requirements to not be interchangeable, which means that belief is
incorrect. An implementation where they are not, in fact,
interchangeable, could be fully conforming.

> I realise that footnotes are not normative but this particular footnote
> is misleading which is what I'm complaining about. ...

I do agree that it's misleading, and should be changed. My preference
would be to drop the weasel-wording about "is intended to", and change
it into normative text mandating interchangeability, overriding any
contrary conclusions that might otherwise be reached by applying the
compatibility rules.

--
James Kuyper

Spiros Bousbouras

unread,
Mar 20, 2011, 3:56:27 PM3/20/11
to
On Sat, 19 Mar 2011 13:39:46 -0400

Ok but I can't imagine how you concluded from that that I confused SRAR
with compatibility.

> It is
> perfectly feasible for types with the same representation and alignment
> requirements to not be interchangeable, which means that belief is
> incorrect.

Couldn't we instead resolve the contradiction by saying that the belief
is correct but SRAR is not transitive ? What would break with such an
approach ? I'm thinking that the function

int foo(void) {
char c = 0 , *p = &c ;
unsigned char *up = (unsigned char *)p ;
signed char *sp = (signed char *)up ;

return p == (char *)up && p == (char *)sp &&
up == (unsigned char *)sp &&
sp == (signed char *)p ;
}

would no longer be guaranteed to return 1 and that's a problem.

> An implementation where they are not, in fact,
> interchangeable, could be fully conforming.
>
> > I realise that footnotes are not normative but this particular footnote
> > is misleading which is what I'm complaining about. ...
>
> I do agree that it's misleading, and should be changed. My preference
> would be to drop the weasel-wording about "is intended to", and change
> it into normative text mandating interchangeability, overriding any
> contrary conclusions that might otherwise be reached by applying the
> compatibility rules.

I'm not sure I follow. Are you saying that the standard should mandate
that unsigned char* and char* should be interchangeable ? (Or at least
that would be a consequence of what you're suggesting.)

--
This is an instance of a general theorem: for every correct
proof, there exists an incorrect proof that looks the same
to the mathematically incompetent.
Daryl McCullough

James Kuyper

unread,
Mar 20, 2011, 6:44:04 PM3/20/11
to
On 03/20/2011 03:56 PM, Spiros Bousbouras wrote:
> On Sat, 19 Mar 2011 13:39:46 -0400
> James Kuyper<james...@verizon.net> wrote:
>> On 03/19/2011 11:59 AM, Spiros Bousbouras wrote:
...

>>> In what follows SRAR(T1 , T2) will mean that type T1 has the same
>>> representation and alignment requirements as type T2. The syllogism
>>> goes as follows:
>>> 1) SRAR(void* , unsigned char*) from 6.2.5 p27
>>> 2) Using symmetry and 1) we get that SRAR(unsigned char* , void*)
>>> 3) SRAR(void* , char*) from 6.2.5 p27
>>> 4) Using 2) , 3) and transitivity we get that
>>> SRAR(unsigned char* , char*)
>>> 5) Using footnote 39 and 4) we get that unsigned char* and char* are
>>> interchangeable as arguments to functions.
>>
>> That's where you went wrong, which is pretty much as I had expected.
>> The footnote about interchangeability does not impose any actual
>> requirement that they be interchangeable. The only actual requirement is
>> SRAR(unsigned char*, char*). The footnote is a non-normative expression
>> of the belief that this requirement implies interchangeability.
>
> Ok but I can't imagine how you concluded from that that I confused SRAR
> with compatibility.

I didn't know which mistake you made; I couldn't put together any valid
argument for your conclusion; I made a guess as to which mistake you
might have made to reach that conclusion.

>> It is
>> perfectly feasible for types with the same representation and alignment
>> requirements to not be interchangeable, which means that belief is
>> incorrect.
>
> Couldn't we instead resolve the contradiction by saying that the belief
> is correct but SRAR is not transitive ?

There are three problems with that approach:
a) it's relatively straightforward to demonstrate that the belief is
incorrect (for example, the two types could be passed as inputs or
return values from a function using different registers), so assuming
that it's correct would be an error.
b) SRAR can trivially be proved to be transitive, simply by the
definition of the word "same", so assuming that it's not transitive
would be a second error.
c) There is no contradiction, once you understand the issues correctly.
Therefore, using the supposed existence of a contradiction to justify
believing two false things would be a third error.

> ... What would break with such an
> approach ?

Everything. If you start with two contradictory premises (for example,
that "same" has it's ordinary English meaning, and that SRAR is not
transitive), it's possible, through perfectly valid logical operations,
to prove as a conclusion that ANY statement you wish is true - or false
- your choice. That would render the entire C standard pointless.

> I'm thinking that the function
>
> int foo(void) {
> char c = 0 , *p = &c ;
> unsigned char *up = (unsigned char *)p ;
> signed char *sp = (signed char *)up ;
>
> return p == (char *)up && p == (char *)sp &&
> up == (unsigned char *)sp &&
> sp == (signed char *)p ;
> }
>
> would no longer be guaranteed to return 1 and that's a problem.

It would be guaranteed to return 1. It would also be guaranteed to not
return 1. It could also be proven that the result is
implementation-defined. Or, that it's not implementation-defined. I hope
you can see why this would be a bit of a problem.

>>> I realise that footnotes are not normative but this particular footnote
>>> is misleading which is what I'm complaining about. ...
>>
>> I do agree that it's misleading, and should be changed. My preference
>> would be to drop the weasel-wording about "is intended to", and change
>> it into normative text mandating interchangeability, overriding any
>> contrary conclusions that might otherwise be reached by applying the
>> compatibility rules.
>
> I'm not sure I follow. Are you saying that the standard should mandate
> that unsigned char* and char* should be interchangeable ? (Or at least
> that would be a consequence of what you're suggesting.)

That's one possible approach: the standard could be modified to state
that interchangeable types are guaranteed compatible with each other.
Another approach would be to modify every constraint that currently
requires compatibility to allow either compatibility or interchangeability.
--
James Kuyper

Spiros Bousbouras

unread,
Mar 21, 2011, 11:35:23 AM3/21/11
to
On Sun, 20 Mar 2011 18:44:04 -0400

James Kuyper <james...@verizon.net> wrote:
> On 03/20/2011 03:56 PM, Spiros Bousbouras wrote:
> > On Sat, 19 Mar 2011 13:39:46 -0400
> > James Kuyper<james...@verizon.net> wrote:
> >> On 03/19/2011 11:59 AM, Spiros Bousbouras wrote:
> ...
> >>> In what follows SRAR(T1 , T2) will mean that type T1 has the same
> >>> representation and alignment requirements as type T2. The syllogism
> >>> goes as follows:
> >>> 1) SRAR(void* , unsigned char*) from 6.2.5 p27
> >>> 2) Using symmetry and 1) we get that SRAR(unsigned char* , void*)
> >>> 3) SRAR(void* , char*) from 6.2.5 p27
> >>> 4) Using 2) , 3) and transitivity we get that
> >>> SRAR(unsigned char* , char*)
> >>> 5) Using footnote 39 and 4) we get that unsigned char* and char* are
> >>> interchangeable as arguments to functions.
> >>
> >> That's where you went wrong, which is pretty much as I had expected.
> >> The footnote about interchangeability does not impose any actual
> >> requirement that they be interchangeable. The only actual requirement is
> >> SRAR(unsigned char*, char*). The footnote is a non-normative expression
> >> of the belief that this requirement implies interchangeability.

[...]

> >> It is
> >> perfectly feasible for types with the same representation and alignment
> >> requirements to not be interchangeable, which means that belief is
> >> incorrect.
> >
> > Couldn't we instead resolve the contradiction by saying that the belief
> > is correct but SRAR is not transitive ?
>
> There are three problems with that approach:
> a) it's relatively straightforward to demonstrate that the belief is
> incorrect (for example, the two types could be passed as inputs or
> return values from a function using different registers), so assuming
> that it's correct would be an error.

I don't see what that has to do with anything. Even types which are
compatible and therefore interchangeable can be passed using different
registers.

> b) SRAR can trivially be proved to be transitive, simply by the
> definition of the word "same", so assuming that it's not transitive
> would be a second error.

The standard doesn't define "same" and it wouldn't be the first time
the standard uses a word with a different meaning than the established
one. For example "byte" usually means a quantity of 8 bits but that's
not how the standard defines it. It also uses "overflow" in a
specialised manner , the computing common sense meaning of the word is
simply that a quantity exceeds some bounds.

> c) There is no contradiction, once you understand the issues correctly.
> Therefore, using the supposed existence of a contradiction to justify
> believing two false things would be a third error.

If the belief is incorrect then there is a contradiction between the
belief and reality.

> > ... What would break with such an
> > approach ?
>
> Everything. If you start with two contradictory premises (for example,
> that "same" has it's ordinary English meaning, and that SRAR is not
> transitive), it's possible, through perfectly valid logical operations,
> to prove as a conclusion that ANY statement you wish is true - or false
> - your choice. That would render the entire C standard pointless.

If one assumes that "same" has its ordinary English meaning and if one
used formal logic to derive consequences from the standard (which would
be rather hard to do since the standard is not written in a formal
language) and if one chose a logic which contains the rule that from a
contradiction you can derive anything then one could indeed conclude
that any statement is both true and false. But since people use
informal reasoning to draw conclusions from the standard I don't think
that your excursion into logic is relevant to the discussion.

I note also that if the standard is found to have a contradiction it
will not be rendered pointless , just less useful. Even mathematics for
parts of its history had contradictions , and people knew it , but that
didn't make it pointless.

So the question is: does the assumption that SRAR is transitive
provide guarantees useful for C programming which one wouldn't have
without it? I attempted an example with foo() below, where I think it
is useful to know that it will always return 1.
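
A more practical sketch of where the guarantee earns its keep
(hypothetical code of mine, assuming a 256-byte line buffer): bytes
are collected as unsigned char, the natural type for fgetc() results,
and the same storage is then handed to strlen() through a cast.

    #include <stdio.h>
    #include <string.h>

    size_t line_length(FILE *f)
    {
        unsigned char buf[256] ;
        size_t n = 0 ;
        int c ;

        while (n < sizeof buf - 1 && (c = fgetc(f)) != EOF && c != '\n')
            buf[n++] = (unsigned char)c ;
        buf[n] = '\0' ;

        /* unsigned char * -> char * : this round trip is safe
           precisely because the two types have the same
           representation and alignment requirements. */
        return strlen((char *)buf) ;
    }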

> > I'm thinking that the function
> >
> > int foo(void) {
> >     char c = 0 , *p = &c ;
> >     unsigned char *up = (unsigned char *)p ;
> >     signed char *sp = (signed char *)up ;
> >
> >     return p == (char *)up && p == (char *)sp &&
> >            up == (unsigned char *)sp &&
> >            sp == (signed char *)p ;
> > }
> >
> > would no longer be guaranteed to return 1 and that's a problem.
>
> It would be guaranteed to return 1. It would also be guaranteed to not
> return 1. It could also be proven that the result is
> implementation-defined. Or, that it's not implementation-defined. I hope
> you can see why this would be a bit of a problem.

It would indeed be a problem, but that conclusion doesn't follow.

> >>> I realise that footnotes are not normative but this particular footnote
> >>> is misleading which is what I'm complaining about. ...
> >>
> >> I do agree that it's misleading, and should be changed. My preference
> >> would be to drop the weasel-wording about "is intended to", and change
> >> it into normative text mandating interchangeability, overriding any
> >> contrary conclusions that might otherwise be reached by applying the
> >> compatibility rules.
> >
> > I'm not sure I follow. Are you saying that the standard should mandate
> > that unsigned char* and char* should be interchangeable ? (Or at least
> > that would be a consequence of what you're suggesting.)
>
> That's one possible approach: the standard could be modified to state
> that interchangeable types are guaranteed compatible with each other.
> Another approach would be to modify every constraint that currently
> requires compatibility to allow either compatibility or interchangeability.

But with such a modification a compiler would no longer have to warn
you if you wrote, for example, strcmp(p1 , p2) where p1 has type
unsigned char *. Don't you think this might create problems?
Personally I don't have a problem with the constraints in this area,
but I would want footnote 39 to end with the addition "provided no
compatibility rules are violated".

--
I dream of a galaxy where your eyes are the stars and the universe
worships the night.
The Dauphne

Keith Thompson

unread,
Mar 21, 2011, 1:04:13 PM3/21/11
to
Spiros Bousbouras <spi...@gmail.com> writes:
> On Sun, 20 Mar 2011 18:44:04 -0400
> James Kuyper <james...@verizon.net> wrote:
[...]

>> b) SRAR can trivially be proved to be transitive, simply by the
>> definition of the word "same", so assuming that it's not transitive
>> would be a second error.
>
> The standard doesn't define "same", and it wouldn't be the first time
> the standard uses a word with a meaning different from the established
> one. For example, "byte" usually means a quantity of 8 bits, but
> that's not how the standard defines it. It also uses "overflow" in a
> specialised manner; in common computing usage the word simply means
> that a quantity exceeds some bounds.

The standard provides its own definitions for "byte" and "overflow".
If it *doesn't* provide a definition for a word, especially a
common English word with no special technical usage, that implies
that it's used in the common English sense.

[...]

pete

unread,
Mar 22, 2011, 7:00:48 AM3/22/11
to
Keith Thompson wrote:

> The standard provides its own definitions for "byte" and "overflow".
> If it *doesn't* provide a definition for a word, especially a
> common English word with no special technical usage, that implies
> that it's used in the common English sense.

Some technical words, like "portability",
are defined in ISO/IEC 2382-1.

ISO/IEC 2382-1 then refers you to ISO 1087
for more definitions.

--
pete
