Checking for EOF

Marco Antoniotti

Nov 11, 2002, 11:51:54 AM

Hi

I am writing a simple file/stream utility system that down in its guts
has to test for EOF on a stream. I noticed that we are lacking a

STREAM-END-P stream => T or NIL

in the CLHS. Would something along these lines cut it?

(defun stream-end-p (stream)
  (if (open-stream-p stream)
      (null (peek-char nil stream nil nil))
      t))

(Suggestions for improving the above welcome. I know that the above
may not cut it for binary streams to be read with READ-BYTE).

Please note that I'd like to avoid Gray streams.

Cheers

--
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group tel. +1 - 212 - 998 3488
715 Broadway 10th Floor fax +1 - 212 - 995 4122
New York, NY 10003, USA http://bioinformatics.cat.nyu.edu
"Hello New York! We'll do what we can!"
Bill Murray in `Ghostbusters'.

Nils Goesche

Nov 11, 2002, 12:50:48 PM

Marco Antoniotti <mar...@cs.nyu.edu> writes:

> I am writing a simple file/stream utility system that down in its guts
> has to test for EOF on a stream. I noticed that we are lacking a
>
> STREAM-END-P stream => T or NIL

How about LISTEN?

Regards,
--
Nils Gösche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x0655CFA0

Duane Rettig

Nov 11, 2002, 1:00:01 PM

Marco Antoniotti <mar...@cs.nyu.edu> writes:

> Hi
>
> I am writing a simple file/stream utility system that down in its guts
> has to test for EOF on a stream. I noticed that we are lacking a
>
> STREAM-END-P stream => T or NIL
>
> in the CLHS. Would something along these lines cut it?
>
> (defun stream-end-p (stream)
>   (if (open-stream-p stream)
>       (null (peek-char nil stream nil nil))
>       t))
>
> (Suggestions for improving the above welcome. I know that the above
> may not cut it for binary streams to be read with READ-BYTE).

Besides the caveat you recognized (which could be avoided with
bivalent streams available in some gray implementations, as well
as in simple-streams by definition), there are two others I see
as well:

1. If the stream is not a "raw" stream wrt tty processing, it might
process a Ctrl-D as an EOF once, and then never again, unless the
stream implementation has allowed for the "unreading" of such a
"soft" eof.

2. If the stream is not a file or string stream, the peek-char
might hang indefinitely. Perhaps a combination of read-char-no-hang
and a conditional unread-char would be better. An even cleaner
approach (but only available in gray or simple-streams, and not
allowable in portable CL code) is to use stream-listen (which tends
to return true for EOF conditions) before doing the peek-char.
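The read-char-no-hang plus conditional unread-char idea in point 2 might look roughly like the sketch below. The name STREAM-STATUS and the three keyword results are made up for illustration; they are not part of the standard or of any implementation. It only covers character streams, and it assumes the caller has not just done an unread-char of its own.

```lisp
;; Hypothetical helper, not part of the standard: classify a character
;; input stream without blocking.  Returns :EOF, :DATA or :NO-DATA-YET.
(defun stream-status (stream)
  (if (not (open-stream-p stream))
      :eof                                    ; treat a closed stream as ended
      (let ((char (read-char-no-hang stream nil :eof)))
        (cond ((eq char :eof) :eof)           ; end of file was reached
              ((null char) :no-data-yet)      ; nothing available right now
              (t (unread-char char stream)    ; put the character back
                 :data)))))
```

On a string stream holding "a", this should report :DATA before the character is read and :EOF afterwards. On a socket or terminal the :NO-DATA-YET case is the one that PEEK-CHAR would have blocked on.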

> Please note that I'd like to avoid Gray streams.

Me too :-)

--
Duane Rettig du...@franz.com Franz Inc. http://www.franz.com/
555 12th St., Suite 1450 http://www.555citycenter.com/
Oakland, Ca. 94607 Phone: (510) 452-2000; Fax: (510) 452-0182

Duane Rettig

Nov 11, 2002, 3:00:01 PM

Nils Goesche <car...@cartan.de> writes:

> Marco Antoniotti <mar...@cs.nyu.edu> writes:
>
> > I am writing a simple file/stream utility system that down in its guts
> > has to test for EOF on a stream. I noticed that we are lacking a
> >
> > STREAM-END-P stream => T or NIL
>
> How about LISTEN?

Listen, in a socket or other non-file-like stream, can't distinguish
between the end-of-file situation and the data-not-yet-available
situation.
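A small sketch of why LISTEN alone is ambiguous: on a stream that is at end of file it returns false, which is the same answer it gives when data simply has not arrived yet. READ-CHAR-NO-HANG with an explicit eof-value keeps the two cases apart. The function name LISTEN-VS-EOF is hypothetical, and a string stream is used only because it makes the end-of-file case easy to reproduce.

```lisp
;; Hypothetical demonstration function: compare what LISTEN and
;; READ-CHAR-NO-HANG report for the same string stream.
(defun listen-vs-eof (string)
  (with-input-from-string (s string)
    (list (listen s)                          ; false at EOF and when no data is ready
          (read-char-no-hang s nil :eof))))   ; :EOF only at end of file

;; (listen-vs-eof "")   ; LISTEN is false, READ-CHAR-NO-HANG gives :EOF
;; (listen-vs-eof "x")  ; LISTEN is true,  READ-CHAR-NO-HANG gives #\x
```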

Marco Antoniotti

Nov 11, 2002, 3:40:16 PM

Duane Rettig <du...@franz.com> writes:

> Marco Antoniotti <mar...@cs.nyu.edu> writes:
>
> > Hi
> >
> > I am writing a simple file/stream utility system that down in its guts
> > has to test for EOF on a stream. I noticed that we are lacking a
> >
> > STREAM-END-P stream => T or NIL
> >
> > in the CLHS. Would something along these lines cut it?
> >
> > (defun stream-end-p (stream)
> >   (if (open-stream-p stream)
> >       (null (peek-char nil stream nil nil))
> >       t))
> >
> > (Suggestions for improving the above welcome. I know that the above
> > may not cut it for binary streams to be read with READ-BYTE).
>
> Besides the caveat you recognized (which could be avoided with
> bivalent streams available in some gray implementations, as well
> as in simple-streams by definition), there are two others I see
> as well:
>
> 1. If the stream is not a "raw" stream wrt tty processing, it might
> process a Ctrl-D as an EOF once, and then never again, unless the
> stream implementation has allowed for the "unreading" of such a
> "soft" eof.

I think that is ok. I am willing to live with that. As a matter of
fact, the semantic I have in mind in this case is: once you have seen
a C-d, you have seen the EOF. Period.

> 2. If the stream is not a file or string stream, the peek-char
> might hang indefinitely.

Good point.

> Perhaps a combination of read-char-no-hang
> and a conditional unread-char would be better.

Good point as well.

> An even cleaner
> approach (but only available in gray or simple-streams, and not
> allowable in portable CL code) is to use stream-listen (which tends
> to return true for EOF conditions) before doing the peek-char.

Looks like I'll have to bite the bullet.

>
> > Please note that I'd like to avoid Gray streams.
>
> Me too :-)

(with-grin-on "Why? :)")

Apart from the above suggestions, any other ideas for binary streams?

Erik Naggum

Nov 11, 2002, 4:10:55 PM

* Duane Rettig <du...@franz.com>

| 1. If the stream is not a "raw" stream wrt tty processing, it might
| process a Ctrl-D as an EOF once, and then never again, unless the stream
| implementation has allowed for the "unreading" of such a "soft" eof.

While it may appear that Unix C-d means end of file, it actually means
"push", and acts exactly like a newline in line-oriented input mode,
except that it does not add itself to the end of the input. Many Unix
users are tremendously puzzled by this because they have learned that C-d
means end-of-file in Unix and therefore fail to understand why they need
/two/ C-d's when they want to end input at any other place than after a
newline character. E.g., if you want to stuff some text into the X
selection, you must use

$ xsel -i
[whatever] <C-d> <C-d>

because the final newline that would be entered with C-j C-d is highly
undesirable. The first C-d pushes the input collected so far to the
reading process, so the `read´ system call returns "[whatever]" and 10.
The next `read´ system call returns "" and 0, and /this/ is the clue that
input has ended.

I have held many Unix courses over the years, and in every single one,
the majority of the attendants, even in courses for experienced C
programmers and system admins, have believed C-d to be the Unix end of
file "character". I have found it very useful to teach people about line
input processing in general. Most Unix users are completely clueless
about this fundamental aspect of their interaction with the system.

--
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.

Duane Rettig

Nov 11, 2002, 5:00:01 PM

Marco Antoniotti <mar...@cs.nyu.edu> writes:

Bear in mind that terminal streams do not tend to work this way.
If you assume that the eof is "hard", then your program might assume
that however many times it calls stream-end-p on that stream, it will
return true (which is not the case unless you cache the fact and then
always return true). On the other hand, if you try typing into a
terminal that is so connected, you would not be able to process data
after the Ctrl-D, which is a normal action for a terminal. In other
words, on a terminal, an eof is just like another character, and should
be processed as if it were a character, including its un-read
characteristics.

> > 2. If the stream is not a file or string stream, the peek-char
> > might hang indefinitely.
>
> Good point.
>
> > Perhaps a combination of read-char-no-hang
> > and a conditional unread-char would be better.
>
> Good point as well.
>
> > An even cleaner
> > approach (but only available in gray or simple-streams, and not
> > allowable in portable CL code) is to use stream-listen (which tends
> > to return true for EOF conditions) before doing the peek-char.
>
> Looks like I'll have to bite the bullet.
>
> >
> > > Please note that I'd like to avoid Gray streams.
> >
> > Me too :-)
>
> (with-grin-on "Why? :)")

Call me biased, but I tend to like simple-streams :-)

> Apart from the above suggestions, any other ideas for binary streams?

If you use simple-streams, bivalence comes for free, and you don't have
to worry about whether you are testing for characters or octets; the
same functionality tends to be used for both. Although if there is a
chance that your stream which has a multibyte external-format might
have half-a-character before the eof, then instead of testing for
characters, you might want to test for octet availability;
simple-streams offers a read-no-hang-p function to check _any_
availability of data on the stream, including just one octet, whereas
listen, stream-listen, and read-char-no-hang all try to build a
character, and might end up either giving you a false end-of-file
indication half-a-character early, or else never giving you an eof
(each being consequences of not being able to complete that last
character).
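For plain binary streams in portable code, without simple-streams, one possible approach is a one-octet lookahead kept by hand, since the standard offers READ-BYTE but nothing like PEEK-BYTE or UNREAD-BYTE. The sketch below is only an assumption about how that could be done: OCTET-INPUT and its functions are hypothetical names, and the end test can block exactly as READ-BYTE can.

```lisp
;; Hypothetical wrapper: keeps one octet of lookahead for a binary stream.
(defstruct (octet-input (:constructor make-octet-input (stream)))
  stream
  (lookahead nil))                      ; NIL, a buffered octet, or :EOF

(defun octet-input-end-p (input)
  "True if no more octets can be read.  May block, like READ-BYTE."
  (when (null (octet-input-lookahead input))
    (setf (octet-input-lookahead input)
          (read-byte (octet-input-stream input) nil :eof)))
  (eq (octet-input-lookahead input) :eof))

(defun octet-input-read (input)
  "Return the next octet, or :EOF at end of file."
  (let ((buffered (octet-input-lookahead input)))
    (cond ((eq buffered :eof) :eof)     ; a buffered EOF is kept, not consumed
          (buffered (setf (octet-input-lookahead input) nil)
                    buffered)
          (t (read-byte (octet-input-stream input) nil :eof)))))
```

All reads then have to go through OCTET-INPUT-READ rather than READ-BYTE directly, otherwise the buffered octet is skipped.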

Erik Naggum

Nov 11, 2002, 5:29:28 PM

* Duane Rettig

| On the other hand, if you try typing into a terminal that is so
| connected, you would not be able to process data after the Ctrl-D, which
| is a normal action for a terminal.

Not so. When the C-d is not at the start of the input, it only forces
the terminal handler to send the data collected so far to the caller. If
you monitor a program that calls the `read´ system call until it returns
0 and look at what it actually does with a system-call tracer, you will
see that you can indeed continue to type after a C-d that does not follow
another C-d or C-j that pushed the line with the newline to the caller,
and if the caller waits for a newline before it terminates its loop of
`read´ calls, you should see two major effects for which this is expressly
employed: You can write longer lines than the hard line buffer limit that
is often only 256 characters, and you cannot edit the input line past the
C-d. Under any Linux system, the command `strace cat´ will do just fine.

| In other words, on a terminal, an eof is just like another character, and
| should be processed as if it were a character, including its un-read
| characteristics.

Not in line mode. You never see the C-d in cooked mode any more than you
see C-r or any of the other line-editing characters.

If you set a terminal line to raw mode so that you do see C-d and the
like, there /is/ no out-of-band end-of-file signal, and you have to settle
for an in-band signal, instead.

Duane Rettig

Nov 11, 2002, 6:00:01 PM

Erik Naggum <er...@naggum.no> writes:

> * Duane Rettig <du...@franz.com>
> | 1. If the stream is not a "raw" stream wrt tty processing, it might
> | process a Ctrl-D as an EOF once, and then never again, unless the stream
> | implementation has allowed for the "unreading" of such a "soft" eof.
>
> While it may appear that Unix C-d means end of file,

Correct. C-d's ASCII name is "EOT", which means "End of Transmission",
but its interpretation depends on how it's cooked up by the terminal
device driver. And the standard behavior for cooked C-d behavior
is not something that is generally known thoroughly.

On the other hand, any Unix program can configure its terminal in "raw"
mode, and treat the C-d as whatever the program desires. So C-d is
in fact just a character with no meaning in raw mode; emacs
and vi are examples of usage of this.

[example elided]

> I have held many Unix courses over the years, and in every single one,
> the majority of the attendants, even in courses for experienced C
> programmers and system admins, have believed C-d to be the Unix end of
> file "character". I have found it very useful to teach people about line
> input processing in general. Most Unix users are completely clueless
> about this fundamental aspect of their interaction with the system.

I think that the reason for the confusion is simply a naming issue,
due to the output of "stty -a", which specifically labels the unix
cooking that is normally assigned to C-d as "eof".

Duane Rettig

Nov 11, 2002, 6:00:01 PM

Responding to myself:

Duane Rettig <du...@franz.com> writes:

> Erik Naggum <er...@naggum.no> writes:
>
> > * Duane Rettig <du...@franz.com>
> > | 1. If the stream is not a "raw" stream wrt tty processing, it might
> > | process a Ctrl-D as an EOF once, and then never again, unless the stream
> > | implementation has allowed for the "unreading" of such a "soft" eof.
> >
> > While it may appear that Unix C-d means end of file,
>
> Correct. C-d's ASCII name is "EOT", which means "End of Transmission",
> but its interpretation depends on how it's cooked up by the terminal
> device driver. And the standard behavior for cooked C-d behavior
> is not something that is generally known thoroughly.

=====^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

After having read this I am not satisfied with the above phrase as
being descriptive enough. I probably thought it was descriptive
enough because of your example, which shows the not-generally-known
behavior of the standard cooked C-d action, but eliding your example
removed the necessary context.

In fact, if you call read on stdin, a C-d will cause the read to return;
if characters had been read, then these are returned as part of the read.
But if no characters are read, then a 0 is returned, which usually indicates
an end of transmission situation.

What I meant by "not generally known thoroughly" is that although most
unix wizards know of the eof-like behavior after a newline (i.e. when
no characters have been read in the line), many are not aware of the
other behavior, since so many times people tend to type <Enter> <C-d>
rather than just <C-d>, so it just never comes up for most people.

Duane Rettig

Nov 11, 2002, 7:00:01 PM

Erik Naggum <er...@naggum.no> writes:

> * Duane Rettig
> | On the other hand, if you try typing into a terminal that is so
> | connected, you would not be able to process data after the Ctrl-D, which
> | is a normal action for a terminal.
>
> Not so. When the C-d is not at the start of the input, it only forces
> the terminal handler to send the data collected so far to the caller. If
> you monitor a program that calls the `read´ system call until it returns
> 0 and look at what it actually does with a system-call tracer, you will
> see that you can indeed continue to type after a C-d that does not follow
> another C-d or C-j that pushed the line with the newline to the caller,
> and if the caller waits for a newline before it terminates its loop of
> `read´ calls, you should see two major effects for which this is expressly
> employed: You can write longer lines than the hard line buffer limit that
> is often only 256 characters, and you cannot edit the input line past the
> C-d. Under any Linux system, the command `strace cat´ will do just fine.

I think perhaps you misunderstood my motivations for the above sentence,
and I had taken some shortcuts that might have contributed to confusion.

I was responding to Marco's statement:

| As a matter of
| fact, the semantic I have in mind in this case is: once you have seen
| a C-d, you have seen the EOF. Period.

which might be read as "once my program sees an eof, it will never have to
read from that stream again". This would of course be disastrous, since
if that stream happened to be terminal-like, such a program would then
ignore data from the terminal forever. By "so connected", I meant "a
program which makes such assumptions and doesn't try reading from the
stream after such an eof situation".

I agree that I should have said "eof" instead of "C-d", since the zero-length
read is what gets through in cooked mode, not C-d.

> | In other words, on a terminal, an eof is just like another character, and
> | should be processed as if it were a character, including its un-read
> | characteristics.
>
> Not in line mode. You never see the C-d in cooked mode any more than you
> see C-r or any of the other line-editing characters.
>
> If you set a terminal line to raw mode so that you do see C-d and the
> like, there /is/ no out-of-band end-of-file signal, and you have to settle
> for an in-band signal, instead.

Note that I said "eof" here, and not C-d. I agree with your statements,
of course.

Barry Margolin

Nov 11, 2002, 7:51:57 PM

In article <4u1in4...@beta.franz.com>,

Duane Rettig <du...@franz.com> wrote:
>I was responding to Marco's statement:
>
>| As a matter of
>| fact, the semantic I have in mind in this case is: once you have seen
>| a C-d, you have seen the EOF. Period.
>
>which might be read as "once my program sees an eof, it will never have to
>read from that stream again". This would of course be disastrous, since
>if that stream happened to be terminal-like, such a program would then
>ignore data from the terminal forever. By "so connected", I meant "a
>program which makes such assumptions and doesn't try reading from the
>stream after such an eof situation".

That's pretty normal for a program that performs device-independent input.
In fact, programs that keep reading past the first EOF are pretty unusual.
They're generally designed specifically for interactive use, so his generic
EOF-handling code is not going to be used.

The problem I can see with some of the suggested solutions is that they
*don't* implement his "you've seen EOF, period" semantics. If the
implementation allows you to read past the EOF on terminal streams, then
the solutions that use LISTEN, PEEK-CHAR, or READ-CHAR-NO-HANG won't work
if the stream previously reported EOF. That EOF was a temporary state that
was cleared as soon as it was reported to the caller.
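One way to get the "you've seen EOF, period" behavior regardless of what the implementation does on terminal streams is to cache the fact outside the stream. The following is a sketch under that assumption; STICKY-INPUT and STICKY-END-P are hypothetical names, and the PEEK-CHAR call can still block on streams where no data is available yet.

```lisp
;; Hypothetical wrapper that remembers an EOF once it has been observed.
(defstruct (sticky-input (:constructor make-sticky-input (stream)))
  stream
  (eof-seen nil))

(defun sticky-end-p (input)
  "True once EOF has been seen on the wrapped stream, and forever after."
  (or (sticky-input-eof-seen input)
      (let ((stream (sticky-input-stream input)))
        (when (or (not (open-stream-p stream))
                  (null (peek-char nil stream nil nil)))
          ;; Cache the result so a temporary EOF state cannot be
          ;; "un-seen" by a later call.
          (setf (sticky-input-eof-seen input) t)))))
```

Once STICKY-END-P has returned true, later calls never touch the underlying stream again, so it does not matter whether the implementation cleared its own EOF state.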

--
Barry Margolin, bar...@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Harald Hanche-Olsen

Nov 12, 2002, 5:12:47 PM

+ Erik Naggum <er...@naggum.no>:

| If you monitor a program that calls the `read´ system call until
| it returns 0 and look at what it actually does with a system-call
| tracer, you will see that you can indeed continue to type after a
| C-d that does not follow another C-d or C-j that pushed the line
| with the newline to the caller,

I found this sufficiently fascinating that I devised this little
experiment using CMUCL:

First run tty in a terminal window. Say it returns /dev/ttyp6:

(defparameter *fd* (unix:unix-open "/dev/ttyp6" unix:o_rdonly #o444))
(defvar *a* (alien:make-alien (array (alien:unsigned 8) 256)))
(defvar *aa* (alien:deref *a* 0))
(defun read-a-bit ()
  (let ((count (unix:unix-read *fd* *a* 256)))
    (map 'string #'code-char
         (loop for n below count collect (alien:deref *aa* n)))))

Go back to the terminal window and run "sleep 3600" to avoid any
competition over who gets to read from it. Repeatedly evaluate
(read-a-bit) and type some stuff into the terminal window, including
newlines and control-Ds (or whatever your eof character may be).

I certainly found the result illuminating.

To clean up run: (unix:unix-close *fd*)

--
* Harald Hanche-Olsen <URL:http://www.math.ntnu.no/~hanche/>
- Yes it works in practice - but does it work in theory?

Erik Naggum

Nov 12, 2002, 7:57:12 PM

* Duane Rettig <du...@franz.com>

| I was responding to Marco's statement:
|
| | As a matter of
| | fact, the semantic I have in mind in this case is: once you have seen
| | a C-d, you have seen the EOF. Period.
|
| which might be read as "once my program sees an eof, it will never have to
| read from that stream again". This would of course be disastrous, since
| if that stream happened to be terminal-like, such a program would then
| ignore data from the terminal forever.

But this is how line-oriented terminal input is defined under Unix. You
can actually send an end-of-file with a C-d after a C-j or C-d, as hard an
end-of-file as it gets. The user has unceremoniously communicated his
intent to terminate input to that particular process. It would be wrong
for that process to continue to read from that terminal stream.

However, if you do not use the line-oriented terminal mode under Unix,
and instead go for device-level reads, there is no end-of-file and the
source has to signal his intent to terminate communication with that
process through a small ceremony, like C-x C-c or some other in-band
mechanism, or, in the case of the appropriate hardware, something like
loss of carrier and the like.

In other words, if you accept line mode, you accept the whole protocol.
If you reject line mode, you also reject the whole protocol. If you want
a protocol, you have to implement it yourself on top of the raw stream.
If you manage to do this, but you define some other meaning to end-of-file
or C-d, that is of course your prerogative, but it is no longer the Unix
line-oriented terminal protocol and should be properly identified.

Kaz Kylheku

Nov 13, 2002, 7:08:23 PM

Erik Naggum <er...@naggum.no> wrote in message news:<32461378...@naggum.no>...

> * Duane Rettig <du...@franz.com>
> | I was responding to Marco's statement:
> |
> | | As a matter of
> | | fact, the semantic I have in mind in this case is: once you have seen
> | | a C-d, you have seen the EOF. Period.
> |
> | which might be read as "once my program sees an eof, it will never have to
> | read from that stream again". This would of course be disastrous, since
> | if that stream happened to be terminal-like, such a program would then
> | ignore data from the terminal forever.
>
> But this is how line-oriented terminal input is defined under Unix. You
> can actually send an end-of-file with a C-d after a C-j or C-d, as hard an
> end-of-file as it gets. The user has unceremoniously communicated his
> intent to terminate input to that particular process. It would be wrong
> for that process to continue to read from that terminal stream.

C-d causes the read() system call to return immediately. When this is
done at the beginning of the line, it returns 0 characters read, which
looks like an end-of-file. If it's done in the middle of a line, the
read() returns with an incomplete line---no newline. So you have to
type C-d twice in that case. Try it:

$ cat > foo
asdasdf<C-d>

nothing visible happens

asdasdf<C-d><C-d>

cat terminates.

File foo now contains an incomplete line. See, you don't have to use
pico or xedit to cause incomplete lines; there is official kernel
support for doing it! :)

It's possible to ignore the zero byte read and keep reading. In the
case of reading from a FILE * stream in C, you would just
clearerr(stdin) to reset the stream's end-of-file flag and merrily
keep going. This could be useful for implementing nested command
sessions within one process, allowing the interpretation that the user
really wishes to discontinue sending input to just the present nesting
level.

> However, if you do not use the line-oriented terminal mode under Unix,
> and instead go for device-level reads, there is no end-of-file and the

Line-oriented mode *is* device-level reads, and there really is no
true end-of-file. The line editing behavior is implemented in the
kernel, by a line discipline module that is interposed between the
system calls and the tty driver. So to the applications, it looks like
there really is a piece of hardware that delivers entire lines of
input to the operating system.

There is usually a ``standard'' line discipline that gives you all the
POSIX behaviors, with all the various degrees from raw to cooked, and
input-output translation bells and whistles, timed aggregation and
timeouts, flow control, etc. Things like the inter-byte timer that
continues a read() so long as the delay between successive characters
is less than a specified value, and fewer than N bytes have been
accumulated, etc.

Special protocols like SLIP and PPP and whatnot are also implemented
over tty devices as line disciplines that hook into the networking
code at the same time---a kind of multiple inheritance from ``network
driver'' and ``tty discipline'' pseudo-classes. With these, normal
read() and write() may be disabled completely, or else perhaps provide
a special interface for a user-land daemon to inject its own frames
and receive replies so that link-level protocols such as HDLC, LCP,
CHAP and whatnot can be hacked up out of the kernel.

Sorry about the long off topic rant about arcane obscurities. ;)

Erik Naggum

Nov 13, 2002, 7:37:06 PM

* Kaz Kylheku

| C-d causes the read() system call to return immediately.

Thank you for re-explaining what was obviously impossible to understand
from what I already posted.

| Line-oriented mode *is* device-level reads, and there really is no
| true end-of-file.

Please work harder to get the point.

| So to the applications, it looks like there really is a piece of hardware
| that delivers entire lines of input to the operating system.

Wrong.

| Sorry about the long off topic rant about arcane obscurities. ;)

It would have been OK if you got it right.