
getc() != EOF


FIRTH%...@cmu-cs-c.arpa

May 23, 1984, 6:18:01 PM
In all conscience,

while ( (c = getc()) != EOF )

ought to work. If somebody is to be blamed, it is surely not the
people who wrote the code, but the people who made a C implementation
that broke it.

-------

g...@rlgvax.uucp

May 25, 1984, 9:39:06 PM
> In all conscience,

Assuming you're referring to the case where "c" was declared as "char" and
it didn't work, the code was incorrect. "getc" is documented as returning
an "int". The reason is that it is desirable that it can return all possible
values that fit into a "char" (the manual page says "Getc returns the
next character (i.e., byte)"), but if it returned a "char" there would be
no distinguished value which would indicate EOF. It "ought to work" only
if 1) it is defined not to work except on 7-bit ASCII text files (or, at least,
files not containing the character '\0', or '\377', or '\351', or whatever
your choice for EOF is) or 2) it is defined as returning an "int", so that
in addition to all possible one-byte values it can also return a distinguished
value for EOF. Consider this as possibly a weak vote for languages in
which a procedure (or expression; "getc" is a macro in UNIX) can return a
success/failure indication as well as a value.

Guy Harris
{seismo,ihnp4,allegra}!rlgvax!guy
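
Harris's point can be sketched in present-day C. This is a minimal illustration, not code from the thread: make_stream, count_bytes and count_all_byte_values are hypothetical names, and tmpfile() merely stands in for a real input file. Because c is an int, every byte value comes back non-negative and none of them collides with EOF.

```c
#include <stdio.h>

/* Hypothetical helper: copy n bytes into a temporary file and rewind it.
   Returns NULL on failure. */
static FILE *make_stream(const unsigned char *buf, size_t n)
{
    FILE *fp = tmpfile();

    if (fp == NULL)
        return NULL;
    if (fwrite(buf, 1, n, fp) != n) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);
    return fp;
}

/* Count the bytes in fp. c is an int, so it can hold every byte value
   as well as the distinct, negative EOF. */
long count_bytes(FILE *fp)
{
    int c;
    long n = 0;

    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}

/* Demo: a stream holding each byte value 0..255 once. Returns the number
   of bytes the loop saw, or -1 if the temporary file could not be made. */
long count_all_byte_values(void)
{
    unsigned char all[256];
    FILE *fp;
    long n;
    int i;

    for (i = 0; i < 256; i++)
        all[i] = (unsigned char)i;
    fp = make_stream(all, sizeof all);
    if (fp == NULL)
        return -1;
    n = count_bytes(fp);
    fclose(fp);
    return n;
}
```

Calling count_all_byte_values() should report all 256 bytes, including the 0xFF byte that a "char c" version can mistake for end of file.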

Ed Nather

May 26, 1984, 10:35:34 PM
[]

It will work if "c" is declared "int."
It will not work if "c" is declared "char."

Variable declarations are an essential part of the program, and should be
included in illustrative code fragments, so problems are not concealed.

Grumph.

--
Ed Nather
{allegra,ihnp4}!{ut-sally,noao}!utastro!nather
Astronomy Dept., U. of Texas, Austin

k...@turtlevax.uucp

May 29, 1984, 2:30:42 AM
Beware that an alternative test for end-of-file doesn't seem to work on
4.2bsd like it did on 4.1 and before. I am referring to feof():

while (!feof(stdin)) putchar(getchar());

does not work. It seems that the EOF indicator does not come on until
the EOF marker has been read. Previous versions of the standard I/O
library set the EOF flag if the last character has been read and the
next one will be an EOF. To run correctly on 4.2, one needs to do:

while ((i = getchar()) != EOF) putchar(i);

or

for (i = getchar(); !feof(stdin); i = getchar()) putchar(i);

--
Ken Turkowski @ CADLINC, Palo Alto, CA
UUCP: {amd70,decwrl,flairvax}!turtlevax!ken
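
The difference between the two loop shapes above can be counted. The following is a sketch under stated assumptions: the function names are hypothetical, tmpfile() stands in for stdin, and the behaviour shown is that of standard C as it was later specified (the end-of-file indicator is set only once a read has actually run into the end), which matches the 4.2bsd behaviour described in the post.

```c
#include <stdio.h>

/* Hypothetical helper: a rewound temporary file holding n bytes of buf. */
static FILE *make_stream(const char *buf, size_t n)
{
    FILE *fp = tmpfile();

    if (fp == NULL)
        return NULL;
    if (fwrite(buf, 1, n, fp) != n) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);
    return fp;
}

/* Loop shape that tests feof() before reading. Returns how many reads
   were attempted; the last one returns EOF but is still counted. */
long reads_with_feof_first(FILE *fp)
{
    long reads = 0;

    while (!feof(fp)) {
        (void)getc(fp);
        reads++;
    }
    return reads;
}

/* Loop shape that reads first and tests the value that came back. */
long reads_with_value_test(FILE *fp)
{
    int c;
    long reads = 0;

    while ((c = getc(fp)) != EOF)
        reads++;
    return reads;
}

/* Demo on a 3-byte stream: returns how many extra iterations the
   feof-first loop performs, or -1 if the stream could not be made. */
long extra_iterations(void)
{
    FILE *fp = make_stream("abc", 3);
    long first, second;

    if (fp == NULL)
        return -1;
    first = reads_with_feof_first(fp);
    rewind(fp);                 /* also clears the end-of-file indicator */
    second = reads_with_value_test(fp);
    fclose(fp);
    return first - second;
}
```

With putchar() in the loop body, that one extra iteration is what writes a spurious character at the end of the output.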

Chris Maltby

May 30, 1984, 12:28:01 PM
[]
> >
> >In all conscience,
> >
> > while ( (c = getc()) != EOF )
> >
> >ought to work. If somebody is to be blamed, it is surely not the
> >people who wrote the code, but the people who made a C implementation
> >that broke it.
>
> It will work if "c" is declared "int."
> It will not work if "c" is declared "char."
>
> Variable declarations are an essential part of the program, and should be
> included in illustrative code fragments, so problems are not concealed.
>
> Ed Nather

WRONG! The code above will work if c is int or char.
Char variables are promoted to int in expressions (see C manual)
and a char -1 is IDENTICAL with an int -1. Unsigned char c could be
different (Any C implementors there? (kvm?)).

Chris Maltby
University of Sydney

ShanklandJA

May 30, 1984, 1:03:26 PM
(sigh.)

> > > while ( (c = getc()) != EOF )
> > >
> > >ought to work. If somebody is to be blamed, it is surely not the
> > >people who wrote the code, but the people who made a C implementation
> > >that broke it.
> >
> > It will work if "c" is declared "int."
> > It will not work if "c" is declared "char."
> >
>
> WRONG! The code above will work if c is int or char.
> Char variables are promoted to int in expressions (see C manual)
> and a char -1 is IDENTICAL with an int -1. Unsigned char c could be
> different (Any C implementors there? (kvm?)).
>
> Chris Maltby
> University of Sydney

But it is not defined whether char is a signed or unsigned type in C.
On machines where char is unsigned, c will never have the value -1,
and the comparison with EOF will always fail.

All this is quite clearly described on page 40 of K&R.

Jim Shankland
..!ihnp4!druxy!opus
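
Shankland's claim about unsigned chars can be checked exhaustively. This is a small sketch, not anything from the thread: the function names are hypothetical, and it assumes the usual situation where int is wider than char, so that every unsigned char promotes to a non-negative int.

```c
#include <limits.h>
#include <stdio.h>

/* Count how many unsigned char values compare equal to EOF once they
   have been promoted to int. EOF is negative; the promoted values are
   not (assuming int is wider than char). */
int uchar_values_equal_to_eof(void)
{
    int hits = 0;
    int v;

    for (v = 0; v <= UCHAR_MAX; v++) {
        unsigned char c = (unsigned char)v;
        int promoted = c;       /* the conversion that != would apply */

        if (promoted == EOF)
            hits++;
    }
    return hits;
}

/* Store EOF itself in an unsigned char and report whether the stored
   value still compares equal to EOF. */
int eof_survives_unsigned_char(void)
{
    unsigned char c = (unsigned char)EOF;
    int promoted = c;

    return promoted == EOF;
}
```

Both functions return 0 under that assumption, which is why the loop never terminates where char is unsigned.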

Andrew Koenig

May 30, 1984, 2:39:46 PM

Ed Nather is right here: a char -1 is not identical to an int -1.
C isn't obligated to sign-extend characters when converting to ints,
although it is obligated to refrain from sign-extending unsigned
chars. Getc (and getchar) return ints, not chars, and the result
returned is always non-negative (except EOF), even on those
machines that sign-extend characters. If I write:

char c;

while ((c = getc (file)) != EOF) ...

I will lose on a machine that sign-extends chars as soon as I read
a char with all its bits turned on, but it will work OK if c is an
int.


--Andrew Koenig
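
Koenig's failure case can be reproduced with a three-byte stream whose middle byte has all its bits on. This is a hedged sketch: the helper and function names are hypothetical, "signed char" is spelled out to imitate a sign-extending machine (that keyword came later than this thread), and the early stop assumes the common case where EOF is -1 and converting 255 to signed char yields -1, which the C standard leaves implementation-defined.

```c
#include <stdio.h>

/* Hypothetical helper: a rewound temporary file holding n bytes of buf. */
static FILE *make_stream(const unsigned char *buf, size_t n)
{
    FILE *fp = tmpfile();

    if (fp == NULL)
        return NULL;
    if (fwrite(buf, 1, n, fp) != n) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);
    return fp;
}

/* The losing version: the result of getc() is narrowed to a signed
   char before it is compared with EOF. */
long count_with_signed_char(FILE *fp)
{
    signed char c;
    long n = 0;

    while ((c = (signed char)getc(fp)) != EOF)
        n++;
    return n;
}

/* The working version: c keeps the full int that getc() returned. */
long count_with_int(FILE *fp)
{
    int c;
    long n = 0;

    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}

/* Demo on the bytes 'a', 0xFF, 'b'. Returns the number of bytes the
   chosen loop saw, or -1 if the stream could not be made. */
long bytes_seen(int use_int)
{
    static const unsigned char data[] = { 'a', 0xFF, 'b' };
    FILE *fp = make_stream(data, sizeof data);
    long n;

    if (fp == NULL)
        return -1;
    n = use_int ? count_with_int(fp) : count_with_signed_char(fp);
    fclose(fp);
    return n;
}
```

On such a system bytes_seen(1) sees all three bytes, while bytes_seen(0) stops after the first because the 0xFF byte is taken for EOF.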

Alan S. Driscoll

May 30, 1984, 3:05:43 PM
> ... Char variables are promoted to int in expressions (see C manual)
> and a char -1 is IDENTICAL with an int -1. Unsigned char c could be
> different...

"Whether or not sign-extension occurs for characters is machine
dependent, but it is guaranteed that a member of the standard
character set is non-negative."

-- C Reference Manual, September 1980

--

Alan S. Driscoll
AT&T Bell Laboratories

pe...@smu.uucp

May 30, 1984, 3:32:00 PM
#R:sri-arpa:-113800:smu:18600012:000:185
smu!pedz May 30 14:32:00 1984

Would it still work if c (the variable) were declared to be a
char instead of an int? It seems to me that there would be
(or could be) a truncation/sign-extension problem.

Perry
15884

LaidigTL

May 30, 1984, 4:38:24 PM
> Char variables are promoted to int in expressions (see C manual)
> and a char -1 is IDENTICAL with an int -1. Unsigned char c could be
> different (Any C implementors there? (kvm?)).
>
> Chris Maltby
> University of Sydney

Jim Shankland answered this adequately, but for those who feel that only
the C Reference Manual contains truth, see page 183 (section 6.1,
"Characters and integers"). Note that of the four machines they
describe, only one sign-extends (converts a char whose leftmost bit is
one to a negative int).

Tom Laidig
AT&T Information Systems Laboratories, Denver
...!ihnp4!druxm!toml

ch...@umcp-cs.uucp

May 31, 1984, 2:14:28 AM
I beg to differ. K&R, p. 40:

``There is one subtle point about the conversion of characters
to integers. The language does not specify whether variables
of type {\tt char} are signed or unsigned quantities. When a
{\tt char} is converted to an {\tt int}, can it ever produce a
{\it negative} integer? Unfortunately, this varies from
machine to machine, reflecting differences in architecture.
On some machines ({\csc pdp-11}, for instance), a {\tt char}
whose leftmost bit is 1 will be converted to a negative
integer (``sign extension''). On others, a {\tt char} is
promoted to an {\tt int} by adding zeros at the left end, and
thus is always positive.''

(Now if you state that all {\it civilized} compilers default to signed
characters and allow {\tt unsigned char} datatypes, I will agree.)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP: {seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: chris@maryland
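
Whether plain char is signed on a given implementation can be asked of the compiler directly. The sketch below relies on CHAR_MIN and UCHAR_MAX from <limits.h>, a standard-C header that postdates this discussion; the function names are hypothetical.

```c
#include <limits.h>

/* 1 if plain char is a signed type on this implementation, 0 if it is
   unsigned. CHAR_MIN is 0 exactly when char is unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* The int produced by converting a plain char with all bits set.
   Negative where char sign-extends, UCHAR_MAX where it does not.
   (The narrowing cast is implementation-defined when char is signed.) */
int all_bits_char_as_int(void)
{
    char c = (char)UCHAR_MAX;

    return c;
}
```

On a sign-extending machine like the PDP-11 described in the quotation, the first function would return 1 and the second a negative value; on the others, 0 and a positive value.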

Guy Harris

May 31, 1984, 6:21:34 PM
> (Now if you state that all {\it civilized} compilers default to signed
> characters and allow {\tt unsigned char} datatypes, I will agree.)

Well, on some machines supporting signed characters is painful; if the
machine's byte manipulation instructions don't extend the sign bit, a
program with "char" could involve more instructions than one involving
"unsigned char". (Always using "unsigned char" isn't a fix, either; on
some machines (like the PDP-11), "unsigned char" requires more code than
"char".) (Our machines all have signed characters, by the way.)

Guy Harris
{seismo,ihnp4,allegra}!rlgvax!guy

Guido van Rossum

Jun 1, 1984, 12:06:50 AM
> while (!feof(stdin)) putchar(getchar());
>
>does not work. It seems that the EOF indicator does not come on until
>the EOF marker has been read. Previous versions of the standard I/O
>library set the EOF flag if the last character has been read and the
>next one will be an EOF.

How *could* this ever have worked under UNIX??? Remember that the input
can be a pipe. You only know there's no more data when a READ system
call returns <= 0.

--
Guido van Rossum, "Stamp Out BASIC" Committee, CWI, Amsterdam
guido @ mcvax
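
The pipe argument can be made concrete with the system calls themselves. This is a sketch assuming a POSIX system (pipe, read, write and close from <unistd.h>); drain and drain_pipe_demo are hypothetical names. The reader has no way to learn that the data has ended until a read() actually comes back with nothing.

```c
#include <unistd.h>

/* Read fd until read() reports that no more data is coming.
   Returns the number of bytes read, or -1 on a read error. */
long drain(int fd)
{
    char buf[64];
    long total = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += (long)n;
    return n < 0 ? -1 : total;
}

/* Demo: put 5 bytes into a pipe, close the write end, then drain the
   read end. Returns the byte count, or -1 on failure. */
long drain_pipe_demo(void)
{
    int fds[2];
    long total;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "hello", 5) != 5) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);      /* no writers left: the next empty read returns 0 */
    total = drain(fds[0]);
    close(fds[0]);
    return total;
}
```

Only the final read(), the one returning 0, tells the reader the pipe is finished, so a library cannot raise its EOF flag one character early without attempting that read.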

Brandon Allbery

Jun 3, 1984, 9:10:35 PM

The local "lint" tells me that ((ch = getc ()) != EOF) is illegal on
IBM-based Cs. This fits in with the (assumption) that an IBM/370 C would use
EBCDIC, NOT ASCII, and all 8 bits of the character data are significant, so
the ONLY way to trap an EOF would be the feof () function. (OK, for normal
text files, 0xff is not normally used, but they may have thought they were
stretching it. They don't use 0x0a either, usually, although I've never run
Unix on an IBM.)

--
--------------------------------------------------------------------------------

Brandon Allbery
decvax!cwruecmp!ncoast!bsafw
"...he himself being one universe's prime MCI MAIL: 161-7070
example of utter, rambunctious free will!" USMail (core dump):
6504 Chestnut Road
Independence, OH 44131

Jack Jansen

Jun 4, 1984, 5:45:39 AM
With the PR1ME c-compiler, chars are unsigned, and they have
their parity bit on(!!). This means that
while( (c=getc())!= EOF)
doesn't work, since the (c=getc()) is not sign-extended to
an integer, but just zero padded. As soon as I found this
out I worked myself through the C manual, but this behavior
doesn't seem to violate the standard.....
Jack, {philabs|decvax}!mcvax!vu44!jack

Morris Keesan

Jun 4, 1984, 8:50:07 AM
----------------------------

> > > while ( (c = getc()) != EOF )
> >
> > It will work if "c" is declared "int."
> > It will not work if "c" is declared "char."
> >
> > Ed Nather
>
> WRONG! The code above will work if c is int or char.
> Char variables are promoted to int in expressions (see C manual)
> and a char -1 is IDENTICAL with an int -1. Unsigned char c could be
> different (Any C implementors there? (kvm?)).
>
> Chris Maltby
> University of Sydney

1) When saying things like "See C manual", it would sure help if people would
give references -- preferably section numbers or page numbers.
2) The reference missing above is section 6.6, "Arithmetic conversions", on
page 184 of Kernighan and Ritchie:

A GREAT MANY operators cause conversions . . . called the "usual
arithmetic conversions."

First, any operands of type char . . . are converted to int.

(Emphasis mine -- there are some expressions where the usual arithmetic
conversions don't apply; above, they apply to !=, but NOT to = ).
3) From section 6.1 of the C manual (page 183 of K&R):

Whether or not sign-extension occurs for characters is machine
dependent . . . Of the machines treated by this manual, only the
PDP-11 sign-extends.

On many machines, char and unsigned char are equivalent. On these
machines, (char)-1 and (int)-1 are very different. C provides no
way to specify 'signed char' (a shortcoming of the language).
4) Even on machines that sign-extend characters, the above code is incorrect
if c is declared "char", because it will halt not only on EOF, but also on
(char)-1, which is a valid char.
--
Morris M. Keesan
{decvax,linus,wjh12,ima}!bbncca!keesan
keesan @ BBN-UNIX.ARPA
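
Points 2 and 4 above come down to the order of events in the loop test: the assignment narrows the value to the type of c first, and only then does != widen it back to int. A minimal sketch, with hypothetical function names, "signed char" standing in for a sign-extending char, and the assumption that EOF is -1 and that 255 narrows to -1 (implementation-defined, but the common result):

```c
#include <stdio.h>

/* Would the test  (c = value) != EOF  let the loop continue when c is
   a signed char? The narrowing happens in the assignment; the
   promotion back to int for != comes too late to recover the value. */
int continues_with_signed_char(int value)
{
    signed char c;

    return (c = (signed char)value) != EOF;
}

/* The same test with c declared int: nothing is lost in the assignment. */
int continues_with_int(int value)
{
    int c;

    return (c = value) != EOF;
}
```

Under those assumptions continues_with_int(0xFF) is 1 while continues_with_signed_char(0xFF) is 0, so the char version halts on a valid byte as well as on EOF.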

Ken Turkowski

Jun 4, 1984, 6:23:07 PM
The number 0xff is a legal return value from getc(), and is different from -1.
Therefore, c in (c = getc(file)) should be int.

pa...@ism780.uucp

Jun 6, 1984, 12:09:17 AM
#R:rlgvax:-194900:ism780:14400009:000:632
ism780!paul Jun 4 17:21:00 1984

[Nothing happens till it happens twice.]

All the comments I have seen here on

#define EOF (-1)
char c;


while ( (c = getc()) != EOF )

ignore one possibility:
if chars are signed and the file being read contains a byte equal to -1,
the loop will terminate BEFORE the end-of-file is reached! If, that is,
the compiler implements assignment expressions correctly. The VAX System III
compiler, for one, gets it wrong.

Paul Perkins
...{uscvax|ucla-vax|vortex}!ism780!paul
...decvax!yale-co!ima!ism780!paul
"Any opinions expressed in this message are not necessarily those of any
real person, organization, or computer."

Gordon Moffett

Jun 6, 1984, 10:48:00 PM
From: bs...@ncoast.UUCP Brandon Allbery
Organization: North Coast XENIX, Cleveland


> The local "lint" tells me that ((ch = getc ()) != EOF) is illegal on
> IBM-based Cs. This fits in with the (assumption) that an IBM/370 C would use
> EBCDIC, NOT ASCII, and all 8 bits of the character data are significant, so

No, no, no! It has nothing to do with EBCDIC; the comparison fails
because EOF is explicitly OUTSIDE of the underlying character set,
WHATEVER IT HAPPENS TO BE. It is exactly (int)-1, and not (char)-1.

Amdahl's UTS (v7 and Sys V) runs on 370's and uses ascii anyway, but
the chars are unsigned, and that is why any comparison of a character to
-1 (EOF) is always false ... and that is where this discussion
started most recently (and I think it's been beaten to death now ...).

gu...@mcvax.uucp

Jun 15, 1984, 1:38:16 AM
[This discussion is getting silly. Trying to stamp out one more
misunderstanding...]

Someone suggests that a -1 byte in the file can be promoted to EOF.

No it can't, for the simple reason that getc() is defined as returning
an *int* in the range 0..255. My v7 manual doesn't state this, but just
have a short look at the definition of getc() in /usr/include/stdio.h!

Surely this was intended; the designers of the package were very well
aware of what they were doing: they do warn that the value EOF (-1) returned
by getw() [did you know that it existed?! ever used it?] can be a perfectly
valid integer.

Of course, when assigned to a signed char variable, the values in the range
128..255 become negative; but only then.
