
Bidirectional streams

Juanjo

Jan 1, 2009, 5:58:22 AM
Given the following piece of code, which of the behaviors shown below
do you find most correct?

(with-open-file (s "foo.txt" :direction :io :if-exists :supersede)
  (write-line (print "abcdefghijk") s)
  (file-position s 0)
  (let (a)
    (print (list (file-position s)
                 (setf a (read-char s))
                 (file-position s)
                 (unread-char a s)
                 (file-position s)
                 (write-char #\x s)
                 (file-position s)
                 (read-char s)
                 (file-position s)
                 (read-char s))))
  (file-position s 0)
  (print (read-line s)))

Basically the problem relates to the value of file-position and the
actual place where data is written right after an unread-char
operation. This behavior also affects the behavior of an
implementation when reading, for instance, a Return + Linefeed
sequence.

Is this simply undefined? I found the section on streams in the ANSI
specification to be, say, lacking in all aspects, hehe.

This is SBCL 1.0.10, an implementation of ANSI Common Lisp.
"abcdefghijk"
(0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
"abxdefghijk"

ECL (Embeddable Common-Lisp) 8.12.0 (CVS 2008-07-12 18:54)
"abcdefghijk"
(0 #\a 1 NIL 1 #\x 2 #\a 2 #\c)
"axcdefghijk"

Welcome to GNU CLISP 2.45 (2008-04-04) <http://clisp.cons.org/>
"abcdefghijk"
(0 #\a 1 NIL 0 #\x 1 #\a 2 #\c)
"axcdefghijk"

Pascal J. Bourguignon

Jan 1, 2009, 7:43:54 AM
Juanjo <juanjose.g...@googlemail.com> writes:


Well, the note on UNREAD-CHAR says that it's just a Q&D hack to let the
Lisp reader look ahead one character. The Lisp reader doesn't write to the
stream it reads, and doesn't use file-position since it must also work on
streams with no attached file.

So clearly, when mixing file-position and write-char with unread-char,
you're out of bounds of unread-char. This is simply undefined, and
anything goes.

Personally, I would have a preference for this result:

(0 #\a 1 nil 0 #\x 1 #\b 2 #\c)
"xbcdefghijk"


But note that when you mix read-char, write-char, and file-position
(implicitly through unread-char) on streams encoded in UTF-8 (or some other
variable-width character encoding), you may get very surprising results.

What if the stream contained "áeìoü"?
File positions after reading each character may be: 0 2 3 5 6 8,
or you could even have bigger increments with normalized forms.

So when unreading a character, it may be difficult to find the correct file
position.

And then, writing over a character that needs a different number of
bytes should prevent further reading of characters.

So your code is really nasty; you can't complain about the implementation
variability.
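
To make the byte offsets quoted above concrete, here is a minimal sketch in
plain Common Lisp. It assumes that CHAR-CODE returns Unicode code points
(true of most current implementations, but not required by the standard) and
that the text is in precomposed form. The function names are made up for
this illustration.

```lisp
(defun utf-8-octet-count (char)
  "Number of octets CHAR would occupy in UTF-8.
Assumption: CHAR-CODE yields the Unicode code point of CHAR."
  (let ((code (char-code char)))
    (cond ((< code #x80) 1)
          ((< code #x800) 2)
          ((< code #x10000) 3)
          (t 4))))

(defun utf-8-offsets (string)
  "Byte offsets before each character of STRING, plus the offset after the last."
  (let ((offset 0))
    (cons 0 (map 'list
                 (lambda (ch) (incf offset (utf-8-octet-count ch)))
                 string))))

;; "áeìoü" built from code points, so the source file encoding does not matter.
(utf-8-offsets (map 'string #'code-char '(#xE1 #x65 #xEC #x6F #xFC)))
;; => (0 2 3 5 6 8)
```

A byte-oriented FILE-POSITION would report exactly these values, which is why
a position of, say, 1 or 4 would land in the middle of a character.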

--
__Pascal Bourguignon__

Rainer Joswig

Jan 1, 2009, 7:51:15 AM
In article
<bf7996a7-a0f9-468f...@w1g2000prk.googlegroups.com>,
Juanjo <juanjose.g...@googlemail.com> wrote:


I would add FINISH-OUTPUT after writing to a stream,
if you want to see effects. That's a common error
in portable code: Writing something to a (file/window) stream
and assuming the output is done immediately.

--
http://lispm.dyndns.org/

Juanjo

Jan 1, 2009, 8:45:59 AM
On Jan 1, 1:51 pm, Rainer Joswig <jos...@lisp.de> wrote:
> I would add FINISH-OUTPUT after writing to a stream,
> if you want to see effects. That's a common error
> in portable code: Writing something to a (file/window) stream
> and assuming the output is done immediately.

Ok, adding FORCE-OUTPUT or FINISH-OUTPUT has the same effect in the three
implementations. SBCL now seems to get a more sensible result, but
CLISP is really doing something weird: file-position is 1 after
writing #\x.

This is SBCL 1.0.10, an implementation of ANSI Common Lisp.
"abcdefghijk"

(0 #\a 1 NIL 0 #\x 1 #\b 2 #\c)
"xbcdefghijk"

Welcome to GNU CLISP 2.45 (2008-04-04) <http://clisp.cons.org/>

Juanjo

Jan 1, 2009, 8:54:27 AM
On Jan 1, 1:43 pm, p...@informatimago.com (Pascal J. Bourguignon)
wrote:

> So clearly, when mixing file-position and write-char with unread-char,
> you're out of bounds of unread-char.  This is simply undefined, and
> anything goes.
>
> Personally, I would have a preference for this result:
>
> (0 #\a 1 nil 0 #\x 1 #\b 2 #\c)
> "xbcdefghijk"

Seems also the most sensible outcome to me. I would like to get this
in ECL as well.

> But note that when you mix read-char, write-char, and file-position
> (implicitly through unread-char) on streams encoded in UTF-8 (or some other
> variable-width character encoding), you may get very surprising results.

There is no problem using file positions with multibyte encodings. As
long as you pass to FILE-POSITION only values that were returned by
FILE-POSITION itself, the result should be the one you expect. This is the
reason why I think it is important to get consistent results when
querying file positions after READ/UNREAD.
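
A minimal sketch of that usage pattern: remember a value returned by
FILE-POSITION, read ahead, then hand the very same value back. PEEK-LINE is a
hypothetical helper written for this illustration, and it assumes a stream on
which FILE-POSITION is supported (it returns NIL otherwise, in which case no
seek is attempted).

```lisp
(defun peek-line (stream)
  "Read one line from STREAM, then seek back to where reading started.
Only a value previously returned by FILE-POSITION is ever passed back to it,
so it does not matter whether the implementation counts bytes or characters."
  (let ((start (file-position stream)))
    (prog1 (read-line stream nil nil)
      (when start
        (file-position stream start)))))
```

Because the position is treated as an opaque token, the helper never does
arithmetic on it.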

Just in case someone cares, I got to this discussion when implementing
a reader for CR+LF sequences. I faced the problem of finding CR + (not
a LF character). In this case flexi-streams outputs CR and keeps the
next character. It turns out that this cannot be implemented using
UNREAD-CHAR because the user should be able to use UNREAD-CHAR on the
#\Return character. Anyway, I will eventually figure out how to
implement this :-)
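
One possible shape for such a reader, sketched under assumptions: the
lookahead character is kept in a private slot instead of being pushed back
with UNREAD-CHAR, so the stream's single unread slot stays free. This is not
how ECL or flexi-streams implement it; MAKE-NEWLINE-READER is a hypothetical
name, and the sketch relies on the semi-standard characters #\Return and
#\Linefeed being available.

```lisp
(defun make-newline-reader (stream)
  "Return a closure that reads characters from STREAM, mapping CR+LF to Newline.
A lone CR is returned as is; the character after it is kept in PENDING
rather than unread, and is delivered by the next call. Returns NIL at EOF."
  (let ((pending nil))
    (lambda ()
      (let ((ch (or (shiftf pending nil)
                    (read-char stream nil nil))))
        (cond ((and ch (char= ch #\Return))
               (let ((next (read-char stream nil nil)))
                 (cond ((and next (char= next #\Linefeed))
                        #\Newline)
                       (t
                        ;; Not a CR+LF pair: keep NEXT for the following call.
                        (setf pending next)
                        #\Return))))
              (t ch))))))
```

A character held in PENDING has already been consumed from the underlying
stream, so any FILE-POSITION bookkeeping would have to account for it
separately.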

Juanjo

Rainer Joswig

Jan 1, 2009, 9:01:51 AM
In article
<0ded48b3-ffb1-487d...@a26g2000prf.googlegroups.com>,
Juanjo <juanjose.g...@googlemail.com> wrote:


FORCE-OUTPUT and FINISH-OUTPUT are similar, but FORCE-OUTPUT
does not need to wait for completion. If you really want reliable
output (output done and no error) use FINISH-OUTPUT.

--
http://lispm.dyndns.org/

Tobias C. Rittweiler

Jan 1, 2009, 9:21:41 AM
Juanjo <juanjose.g...@googlemail.com> writes:

> There is no problem using file positions with multibyte encodings.

FILE-POSITION may legitimately return a byte-offset, not a
character-offset even for character streams.

See https://bugs.launchpad.net/sbcl/+bug/310079 for more details.

-T.

Juanjo

Jan 2, 2009, 2:21:04 PM
On Jan 1, 2:54 pm, Juanjo <juanjose.garciarip...@googlemail.com>
wrote:

> On Jan 1, 1:43 pm, p...@informatimago.com (Pascal J. Bourguignon)
> wrote:
>
> > So clearly, when mixing file-position and write-char with unread-char,
> > you're out of bounds of unread-char.  This is simply undefined, and
> > anything goes.
>
> > Personally, I would have a preference for this result:
>
> > (0 #\a 1 nil 0 #\x 1 #\b 2 #\c)
> > "xbcdefghijk"
>
> Seems also the most sensible outcome to me. I would like to get this
> in ECL as well.

I just implemented it today. It is available in the git/CVS repositories,
together with support for fixed- and variable-width external formats:
(with-open-file (s "foo.txt" :direction :io :if-exists :supersede)
  (write-line (print "ßçéî") s)
  (file-position s 0)
  (let (a)
    (print (list (file-position s)
                 (setf a (read-char s))
                 (file-position s)
                 (unread-char a s)
                 (file-position s)
                 (prog1 (write-char #\ç s) (finish-output s))
                 (file-position s)
                 (read-char s)
                 (file-position s)
                 (read-char s))))
  (file-position s 0)
  (print (read-line s)))

"ßçéî"
(0 #\U00df 2 NIL 0 #\U00e7 2 #\U00e7 4 #\U00e9)
"ççéî"

d...@scieneer.com

Jan 2, 2009, 8:13:20 PM
On Jan 1, 9:58 pm, Juanjo <juanjose.garciarip...@googlemail.com>
wrote:

The Scieneer CL returns the following:
"abcdefghijk"


(0 #\a 1 nil 0 #\x 1 #\b 2 #\c)
"xbcdefghijk"

Also for a multi-byte encoding (e.g. UTF-16):
"áeìoü"
(0 #\U00E1 1 nil 0 #\x nil 1 #\U00E1 2 #\e)
"xáeìoü"

The Scieneer CL defines the file-position of character streams as the
number of characters read. This allows some transforms of the
underlying file, such as CR-LF translation, or a change to the
character encoding, or file compression or encryption, without a
change to the file positions. It also allows a character file to be
loaded into a CL string and for the file-positions to remain
consistent when reading from a string input stream - file-position is
usable on string input streams in the Scieneer CL. Furthermore, it
supports block-encoded streams, which could not be supported using the
byte position because ANSI CL requires a monotonic increase in the
file position with each character.

d...@scieneer.com

Jan 2, 2009, 8:22:44 PM
On Jan 2, 1:21 am, "Tobias C. Rittweiler" <t...@freebits.de.invalid>
wrote:

Yes, a byte offset may be returned, but only for file encodings that
have a monotonic increase in position for each character. This may
not be the case for block encoded files, such as compressed or block
encrypted files, so this is only a conforming ANSI CL implementation
to the limited extent of the file encodings supported and this
definition of file-position restricts extensions.

Pascal J. Bourguignon

Jan 2, 2009, 8:37:57 PM
d...@scieneer.com writes:

My understanding is that when CLHS says "monotonic increase" it is not a
strict increase, the increase may be 0.

For example, on a file encoded in base-64, I think
(loop repeat 10 do (read-char s) collect (file-position s))
could return (1 1 1 1 4 4 4 4 7 7)

--
__Pascal Bourguignon__

d...@scieneer.com

Jan 2, 2009, 9:09:29 PM
On Jan 3, 12:37 pm, p...@informatimago.com (Pascal J. Bourguignon)
wrote:

> d...@scieneer.com writes:
> > On Jan 2, 1:21 am, "Tobias C. Rittweiler" <t...@freebits.de.invalid>
> > wrote:
> >> Juanjo <juanjose.garciarip...@googlemail.com> writes:
> >> > There is no problem using file positions with multibyte encodings.
>
> >> FILE-POSITION may legitimately return a byte-offset, not a
> >> character-offset even for character streams.
>
> >> See https://bugs.launchpad.net/sbcl/+bug/310079 for more details.
>
> >> -T.
>
> > Yes, a byte offset may be returned, but only for file encodings that
> > have a monotonic increase in position for each character. This may
> > not be the case for block encoded files, such as compressed or block
> > encrypted files, so this is only a conforming ANSI CL implementation
> > to the limited extent of the file encodings supported and this
> > definition of file-position restricts extensions.
>
> My understanding is that when CLHS says "monotonic increase" it is not a
> strict increase, the increase may be 0.
>
> For example, on a file encoded in base-64, I think
> (loop repeat 10 do (read-char s) collect (file-position s))
> could return (1 1 1 1 4 4 4 4 7 7)
>
> --
> __Pascal Bourguignon__

This interpretation would lead to the file position becoming
ambiguous. A conforming implementation can simply return 'nil if it
does not support file-position for a particular encoding, however
there are lots of useful block encodings so this is rather limiting.

Juanjo

Jan 3, 2009, 4:21:29 AM
On Jan 3, 2:13 am, d...@scieneer.com wrote:
> The Scieneer CL defines the file-position of character streams as the
> number of characters read.

How do you handle seeking a file-position for multibyte encodings?
That becomes very inefficient unless you keep track of the positions
at which characters were read -- in other words, it implies scanning
the whole file forwards or backwards.
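
To illustrate the cost being asked about, here is a rough sketch of what
seeking to a character index means for UTF-8 when no index of positions is
kept: every earlier character has to be stepped over. The function name is
hypothetical, the input is assumed to be a vector of valid UTF-8 octets, and
this says nothing about how the Scieneer CL actually does it.

```lisp
(defun char-index->octet-offset (octets index)
  "Return the octet offset at which character number INDEX starts in OCTETS.
Assumes OCTETS holds valid UTF-8 and contains at least INDEX characters.
The scan is linear in INDEX; there is no way to jump directly."
  (let ((offset 0))
    (dotimes (i index offset)
      (declare (ignorable i))
      (let ((lead (aref octets offset)))
        ;; The lead octet alone determines the length of the sequence.
        (incf offset (cond ((< lead #x80) 1)
                           ((< lead #xE0) 2)
                           ((< lead #xF0) 3)
                           (t 4)))))))

;; UTF-8 octets of "áeìoü": character 3 (the "o") starts at octet 5.
(char-index->octet-offset
 (vector #xC3 #xA1 #x65 #xC3 #xAC #x6F #xC3 #xBC) 3)
;; => 5
```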

Juanjo

d...@scieneer.com

Jan 3, 2009, 9:15:51 AM
On Jan 3, 8:21 pm, Juanjo <juanjose.garciarip...@googlemail.com>
wrote:

Yes, variable-length multibyte encodings may require scanning the file
to find a position, and this is less efficient than positioning to a
byte position in a file. However, defining the character file position
as the number of characters read has many advantages and can often
lead to much improved performance. For example, it allows character
streams to be buffered in units of characters; buffering can
significantly improve stream performance, and positioning within the
buffer is fast. The definition also allows a file to be first loaded
into a CL string and then read through a string input stream, within
which positioning is very fast. The Scieneer CL supports encapsulated
character conversion streams, so it is possible to work with the
byte stream, quickly changing the byte position when possible, and
then encapsulate the byte stream with a character stream after positioning.

George Neuner

Jan 3, 2009, 10:25:56 PM


I think the HyperSpec is lacking some detail that would clarify what
is to be expected. Nonetheless ...

(open ... :direction :io) returns a "bidirectional file stream".
Nothing in particular is said about whether a "file stream" has 1 or 2
position markers, but bidirectional streams are definitively said to
be composed of 2 streams. Logically each stream should have its own
position marker and nowhere is it said that the markers are tied when
both streams are pointing to the same data object.

So I would say it's correct for the read and write positions to be
independent - if you want to be certain where you are you need to
track the position and explicitly set when changing between reading
and writing.

This is the way stream IO is handled by many (most?) other languages
as well. It might make some sense to tie the markers in the case of
files, but then the stream abstraction would be broken because files
would behave differently from other data objects.

George

Juanjo

Jan 5, 2009, 4:10:31 AM
On Jan 4, 4:25 am, George Neuner <gneun...@comcast.net> wrote:
> So I would say it's correct for the read and write positions to be
> independent - if you want to be certain where you are you need to
> track the position and explicitly set when changing between reading
> and writing.

This is impossible following the Hyperspec because the FILE-POSITION
function does not allow you to select which position marker (if there
were two) you are updating.

> This is the way stream IO is handled by many (most?) other languages
> as well.

Not the case for C derivatives, AFAIK. Again, there is only one seek
and tell function and it does not allow you to select whether you want
the read or write position.

Juanjo

Juanjo

Jan 5, 2009, 6:42:26 AM
On Jan 4, 4:25 am, George Neuner <gneun...@comcast.net> wrote:
> >ECL(Embeddable Common-Lisp) 8.12.0 (CVS 2008-07-12 18:54)

> >"abcdefghijk"
> >(0 #\a 1 NIL 1 #\x 2 #\a 2 #\c)
> >"axcdefghijk"
>
> >Welcome to GNU CLISP 2.45 (2008-04-04) <http://clisp.cons.org/>
> >"abcdefghijk"
> >(0 #\a 1 NIL 0 #\x 1 #\a 2 #\c)
> >"axcdefghijk"
>
> I think the HyperSpec is lacking some detail that would clarify what
> is to be expected.  Nonetheless ...
>
> (open ... :direction :io) returns a "bidirectional file stream".
> Nothing in particular is said about whether a "file stream" has 1 or 2
> position markers, but bidirectional streams are definitively said to
> be composed of 2 streams.

This statement also needs to be made more precise. A bidirectional file
stream is not made of two streams: it is rather a stream that is both an
input and an output stream:
http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_b.htm#bidirectional

This is different from a two-way-stream, which is indeed a composite
stream and thus keeps two pointers, one for reading and one for
writing. But FILE-POSITION does not work on composite streams.
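
For contrast, a small sketch of a composite stream built with the standard
MAKE-TWO-WAY-STREAM, here over two string streams so that it runs anywhere.
Reads and writes go to separate component streams and do not disturb each
other. TWO-WAY-DEMO is just an illustrative name.

```lisp
(defun two-way-demo ()
  "Write to and read from a two-way stream made of two independent streams."
  (let* ((in (make-string-input-stream "abc"))
         (out (make-string-output-stream))
         (two-way (make-two-way-stream in out)))
    ;; The write goes to OUT only; the read position in IN is unaffected.
    (write-char #\x two-way)
    (list (read-char two-way)
          (read-char two-way)
          (get-output-stream-string out))))

(two-way-demo)
;; => (#\a #\b "x")
```

A stream opened with :direction :io is a single object of a different kind,
which is why the question of one shared position arises there at all.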

Juanjo

George Neuner

Jan 5, 2009, 8:58:32 AM
On Mon, 5 Jan 2009 01:10:31 -0800 (PST), Juanjo
<juanjose.g...@googlemail.com> wrote:

>On Jan 4, 4:25 am, George Neuner <gneun...@comcast.net> wrote:
>> So I would say it's correct for the read and write positions to be
>> independent - if you want to be certain where you are you need to
>> track the position and explicitly set when changing between reading
>> and writing.
>
>This is impossible following the Hyperspec because the FILE-POSITION
>function does not allow you to select which position marker (if there
>were two) you are updating.

You misunderstand.

I'm not claiming that a file has 2 position markers (that's beyond the
spec). What I am saying is that it seems clear that a bidirectional
stream *may* have both read and write position markers and that the 2
markers are not necessarily tied together. That makes sense because a
stream can be bound to things other than files, e.g., sockets, pipes,
IO channels, etc. for which independent read and write makes sense.

In particular the Hyperspec does not say that a bidirectional stream
acts in any way differently from having independent input and output
streams bound to the same file object. It only specifies that both
input and output functionality will be available through a single
stream object.

AFAICS, a bi-stream could just encapsulate separate input and output
streams and transparently change an underlying file marker to allow
reading and writing from different places. Such an implementation is
unlikely but I don't see it being prohibited. The OP's message showed
differing results from various implementations suggesting that either
some are buggy or there is not agreement on how a bi-stream should
behave on a file.

So the only way to be certain where you are is to independently track
position in your application and (re)set it as necessary before each
read or write call.
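
A minimal sketch of that defensive style, under assumptions: the file already
exists, the stream supports FILE-POSITION, and positions are set explicitly
before every change of direction instead of relying on UNREAD-CHAR.
REPLACE-FIRST-CHAR is a hypothetical helper; how any given implementation
behaves remains up to that implementation.

```lisp
(defun replace-first-char (path new-char)
  "Overwrite the first character of the file at PATH with NEW-CHAR.
Returns the old first character and the first line as read back afterwards.
Every switch between reading and writing is preceded by an explicit seek."
  (with-open-file (s path :direction :io :if-exists :overwrite)
    (let ((old (read-char s)))
      (file-position s 0)     ; explicit seek, no UNREAD-CHAR involved
      (write-char new-char s)
      (finish-output s)       ; make sure the write has reached the file
      (file-position s 0)     ; explicit seek before reading again
      (values old (read-line s)))))
```

The intent is that, for a file starting with the line abcdefghijk, calling
(replace-first-char "foo.txt" #\x) overwrites the #\a and reads the modified
line back.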


>> This is the way stream IO is handled by many (most?) other languages
>> as well.
>
>Not the case of C derivatives, AFAIK. Again, there is only one seek
>and tell function and it does not allow you to select whether you want
>the read or write position.

Well C doesn't have streams per se ... it has only buffered files and
the FILE structure has only one position marker.

In C++ seek and tell are specifically guaranteed only to work on
fstreams - i.e., on files - not on any kind of stream. Secondly, if
you look closely at the implementation, fstream is subclassed from one
of istream or ostream (usually ostream), it creates the counterpart
stream and ties the 2 position markers so their view of the underlying
file is unified.

Readers and Writers in C# and Java are handled in much the same way.
The file object creates the Reader or Writer and can control whether
they see a unified position. It's a little more convoluted because
both C# and Java allow composable filters in streams and so the notion
of position may not translate between Reader and Writer, but usually
the end result is similar to C++'s implementation.

And in each of these languages you can easily create your own stream
type that would not tie the positions. In C# and Java, for example,
Reader and Writer streams on a socket are not tied.


Which brings us back to whether Lisp's bidirectional streams are a
unique entity or are similarly just an object encapsulating 2 streams
and what guarantees are there (or should there be).

George

Juanjo

Jan 5, 2009, 9:53:44 AM
On Jan 5, 2:58 pm, George Neuner <gneun...@comcast.net> wrote:
> On Mon, 5 Jan 2009 01:10:31 -0800 (PST), Juanjo
>
> <juanjose.garciarip...@googlemail.com> wrote:
> >On Jan 4, 4:25 am, George Neuner <gneun...@comcast.net> wrote:
> >> So I would say it's correct for the read and write positions to be
> >> independent - if you want to be certain where you are you need to
> >> track the position and explicitly set when changing between reading
> >> and writing.
>
> >This is impossible following the Hyperspec because the FILE-POSITION
> >function does not allow you to select which position marker (if there
> >were two) you are updating.
>
> You misunderstand.  
>
> I'm not claiming that a file has 2 position markers (that's beyond the
> spec).  What I am saying is that it seems clear that a bidirectional
> stream *may* have both read and write position markers and that the 2
> markers are not necessarily tied together.  That makes sense because a
> stream can be bound to things other than files, e.g., sockets, pipes,
> IO channels, etc. for which independent read and write makes sense.

I am afraid this is not the case, at least for binary streams: "For a
binary file, every read-byte or write-byte operation increases the
file position by 1."
(http://www.lispworks.com/documentation/HyperSpec/Body/f_file_p.htm)
In my understanding, that means the read and write position markers
must be tied together, or not make sense at all, which is permitted by
FILE-POSITION returning NIL, as would be the case for sockets.

Juanjo

George Neuner

Jan 5, 2009, 1:28:14 PM

I agree that a cursory reading would tend to suggest that to a
programmer familiar with IO in other languages, but the Hyperspec is
written in English rather than a formal language.

Turning my hat sideways and playing language lawyer, I would say that,
by itself, the sentence "... every read-byte or write-byte operation
increases the file position by 1." only tells you (one of) the effects
of those particular named operations. It says nothing about what a
"file position" is, where it starts from or what happens to it if
operations other than read or write are performed. It does not say
that read and write share the position, nor does it explicitly say
that you can perform multiple successive reads or writes, or that
reads and writes can be alternated.

Of course, some of those issues are covered elsewhere in the Hyperspec
but my point is that you (usually) can't just pick a passage out of
the Hyperspec to prove your point ... it's typically more involved
than that.

George

Raymond Toy

Jan 5, 2009, 1:21:15 PM
>>>>> "George" == George Neuner <gneu...@comcast.net> writes:

George> On Mon, 5 Jan 2009 01:10:31 -0800 (PST), Juanjo
George> <juanjose.g...@googlemail.com> wrote:

>> On Jan 4, 4:25 am, George Neuner <gneun...@comcast.net> wrote:
>>> So I would say it's correct for the read and write positions to be
>>> independent - if you want to be certain where you are you need to
>>> track the position and explicitly set when changing between reading
>>> and writing.
>>
>> This is impossible following the Hyperspec because the FILE-POSITION
>> function does not allow you to select which position marker (if there
>> were two) you are updating.

George> You misunderstand.

George> I'm not claiming that a file has 2 position markers (that's beyond the
George> spec). What I am saying is that it seems clear that a bidirectional
George> stream *may* have both read and write position markers and that the 2
George> markers are not necessarily tied together. That makes sense because a
George> stream can be bound to things other than files, e.g., sockets, pipes,
George> IO channels, etc. for which independent read and write makes sense.

George> In particular the Hyperspec does not say that a bidirectional stream
George> acts in any way differently from having independent input and output
George> streams bound to the same file object. It only specifies that both
George> input and output functionality will be available through a single
George> stream object.

Not sure if this is intentional or not, but this is how CMUCL handles
the test case:

"abcdefghijk"
(0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)

"abcdefghijk"

If you look at the actual file, we find:

abcdefghijk
x

That is, the write-char actually appended to the file.

Ray

George Neuner

Jan 5, 2009, 10:29:39 PM

Another county heard from. So now we have:

SBCL 1.0.10
"abcdefghijk"
(0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
"abxdefghijk"

ECL 8.12.0 (CVS 2008-07-12 18:54)
"abcdefghijk"
(0 #\a 1 NIL 1 #\x 2 #\a 2 #\c)
"axcdefghijk"

GNU CLISP 2.45 (2008-04-04)
"abcdefghijk"
(0 #\a 1 NIL 0 #\x 1 #\a 2 #\c)
"axcdefghijk"

and now CMUCL (you [Raymond] didn't say which version)
"abcdefghijk"
(0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
"abcdefghijk"

and I'll add Corman 3.0.3
"abcdefghijk"
(0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
"abcdefghijk"

Given the OP's code:

(with-open-file (s "foo.txt" :direction :io :if-exists :supersede)
  (write-line (print "abcdefghijk") s)
  (file-position s 0)
  (let (a)
    (print (list (file-position s)
                 (setf a (read-char s))
                 (file-position s)
                 (unread-char a s)
                 (file-position s)
                 (write-char #\x s)
                 (file-position s)
                 (read-char s)
                 (file-position s)
                 (read-char s))))
  (file-position s 0)
  (print (read-line s)))

we have 3 different results from 5 implementations.

ECL and CLISP perform mostly as Juanjo expects they should. I would
argue that they are still wrong because in a perfect world unread-char
would decrement the file position and cause the #\a to be overwritten
instead of the #\b.

Since SBCL does the same but is off-by-1, first guess is it has a bug.

However CMUCL's and Corman's wildly different, yet identical, results
- Corman also appends #\x to the file - show that the issue of
bidirectional stream positioning on a file is far from settled.

FWIW: the code works as desired (overwrites #\a) in most of the Lisps
mentioned (I don't have ECL) iff you call file-position to set the file
marker whenever you change between reading and writing.

(with-open-file (s "foo.txt" :direction :io :if-exists :supersede)
  (write-line (print "abcdefghijk") s)
  (file-position s 0)
  (let (a)
    (print (list
            (file-position s)
            (setf a (read-char s))
            (file-position s)
            (unread-char a s)
            (file-position s (file-position s))
            (write-char #\x s)
            (file-position s (file-position s))
            (read-char s)
            (file-position s)
            (read-char s))))
  (file-position s 0)
  (print (read-line s)))

"abcdefghijk"
(0 #\a 1 NIL 0 #\x 1 #\b 2 #\c)
"xbcdefghijk"


The results seem to show that reading and writing on a bidirectional
stream *do* have different ideas of what the current position is.

George

Madhu

Jan 6, 2009, 3:11:25 AM

* George Neuner <rri5m4hp74qhg6m22...@4ax.com> :
Wrote on Mon, 05 Jan 2009 22:29:39 -0500:

| On Mon, 05 Jan 2009 13:21:15 -0500, Raymond Toy
| <raymo...@ericsson.com> wrote:
|
|> George> I'm not claiming that a file has 2 position markers
|> George> (that's beyond the spec). What I am saying is that it
|> George> seems clear that a bidirectional stream *may* have both
|> George> read and write position markers and that the 2 markers are
|> George> not necessarily tied together. That makes sense because a
|> George> stream can be bound to things other than files, e.g.,
|> George> sockets, pipes, IO channels, etc. for which independent
|> George> read and write makes sense.
|>
|> George> In particular the Hyperspec does not say that a
|> George> bidirectional stream acts in any way differently from
|> George> having independent input and output streams bound to the
|> George> same file object. It only specifies that both input and
|> George> output functionality will be available through a single
|> George> stream object.
|>
|>Not sure if this is intentional or not, but this is how CMUCL handles
|>the test case:
|>
|>"abcdefghijk"
|>(0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
|>"abcdefghijk"
|>
|>If you look at the actual file, we find:
|>
|>abcdefghijk
|>x
|>
|>That is, the write-char actually appended to the file.

I believe this is an artifact of buffering reads and writes. CMUCL does
not flush its buffers after interleaved READ and WRITE operations.

Calling FINISH-OUTPUT after the WRITE-CHAR operation gives the expected
result of replacing the #\a with an #\x.

| Another county heard from. So now we have:
|
| SBCL 1.0.10
| "abcdefghijk"
| (0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
| "abxdefghijk"
|
| ECL 8.12.0 (CVS 2008-07-12 18:54)
| "abcdefghijk"
| (0 #\a 1 NIL 1 #\x 2 #\a 2 #\c)
| "axcdefghijk"
|
| GNU CLISP 2.45 (2008-04-04)
| "abcdefghijk"
| (0 #\a 1 NIL 0 #\x 1 #\a 2 #\c)
| "axcdefghijk"
|
| and now CMUCL (you [Raymond] didn't say which version)
| "abcdefghijk"
| (0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
| "abcdefghijk"
|
| and I'll add Corman 3.0.3
| "abcdefghijk"
| (0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
| "abcdefghijk"

Both Allegro acl81_express and Lispworks personal 5.1.1 behave differently
yet again from all the above [and SCL] (and conform to the expectation
that #\a is overwritten)

Allegro and Lispworks:

"abcdefghijk"
(0 #\a 1 NIL 0 #\x 1 #\b 2 #\c)
"xbcdefghijk"

"xbcdefghijk"

I'm not sure that is the case. I believe there are two issues here. One
is whether UNREAD-CHAR affects the position on an input stream. The
other is how the Lisp implementation implements buffering on the
underlying stream. Both of these can be stated without regard to the
stream being bidirectional or otherwise.

Note that UNREAD-CHAR is specified only for situations where the next
operation on the file is expected to be a READ. It also implies that a
buffer is used.

I suspect CMUCL and Corman use the same buffering strategy.

| FWIW: the code works as desired (overwrites #\a) in most of the Lisps
| mentioned (I dont have ECL) iff you call file-position to set the file
| marker whenever you change between reading and writing.


I believe the same effect would be achieved with conforming code if you
forced output in between read and write operations on the stream via
FINISH-OUTPUT,

i.e. if you replaced
(file-position s (file-position s))
with
(finish-output s)
in the following code:

| (write-line (print "abcdefghijk") s)
| (file-position s 0)
| (let (a)
|   (print (list
|           (file-position s)
|           (setf a (read-char s))
|           (file-position s)
|           (unread-char a s)
|           (file-position s (file-position s))
|           (write-char #\x s)
|           (file-position s (file-position s))
|           (read-char s)
|           (file-position s)
|           (read-char s))))
| (file-position s 0)
| (print (read-line s)))
|
| "abcdefghijk"
| (0 #\a 1 NIL 0 #\x 1 #\b 2 #\c)
| "xbcdefghijk"
|
|
| The results seem to show that reading and writing on a bidirectional
| stream *do* have different ideas of what the current position is.


The spec mentions buffering issues when describing FINISH-OUTPUT,
CLEAR-OUTPUT and FORCE-OUTPUT, but says "The precise actions of these
functions are implementation-dependent."

--
Madhu

du...@franz.com

Jan 6, 2009, 3:26:48 AM
On Jan 5, 7:29 pm, George Neuner <gneun...@comcast.net> wrote:
> On Mon, 05 Jan 2009 13:21:15 -0500, Raymond Toy
>
>
>
> <raymond....@ericsson.com> wrote:

> >>>>>> "George" == George Neuner <gneun...@comcast.net> writes:
>
> >    George> On Mon, 5 Jan 2009 01:10:31 -0800 (PST), Juanjo

Let's make that 4 from 6 ...

Allegro CL's concept is that a file stream is a "single channel" stream,
so that a bidirectional file stream nevertheless has only one file
position, and that file position is reconciled when directions switch
(with any accompanying flushing). Setting the file-position to the
file-position should not be necessary in a file stream.

CL-USER(1): (with-open-file (s "foo.txt" :direction :io :if-exists :supersede)
              (write-line (print "abcdefghijk") s)
              (file-position s 0)
              (let (a)
                (print (list (file-position s)
                             (setf a (read-char s))
                             (file-position s)
                             (unread-char a s)
                             (file-position s)
                             (write-char #\x s)
                             (file-position s)
                             (read-char s)
                             (file-position s)
                             (read-char s))))
              (file-position s 0)
              (print (read-line s)))

"abcdefghijk"


(0 #\a 1 NIL 0 #\x 1 #\b 2 #\c)
"xbcdefghijk"

"xbcdefghijk"
CL-USER(2):


Rob Warnock

Jan 6, 2009, 3:31:51 AM
George Neuner <gneu...@comcast.net> wrote:
+---------------

| Raymond Toy <raymo...@ericsson.com> wrote:
| >Not sure if this is intentional or not, but this is how CMUCL handles
| >the test case:
| >
| >"abcdefghijk"
| >(0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
| >"abcdefghijk"
| >
| >If you look at the actual file, we find:
| >
| >abcdefghijk
| >x
| >
| >That is, the write-char actually appended to the file.
...

| and now CMUCL (you [Raymond] didn't say which version)
| "abcdefghijk"
| (0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
| "abcdefghijk"
+---------------

Doesn't seem to matter: CMUCL-18e, -19a-pre3, -19c, & -19e
all give the same results.


-Rob

-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607

Rainer Joswig

Jan 6, 2009, 3:49:05 AM
In article <rri5m4hp74qhg6m22...@4ax.com>,
George Neuner <gneu...@comcast.net> wrote:

...

> Since SBCL does the same but is off-by-1, first guess is it has a bug.
>
> However CMUCL's and Corman's wildly different, yet identical, results
> - Corman also appends #\x to the file - show that the issue of
> bidirectional stream positioning on a file is far from settled.

That's also a question of buffering.

If you open a stream to a file, write some stuff to the file and now do

(write somestring :stream stream)

What is the file position now?

a) the same as before?

b) incremented by some arbitrary value?

c) incremented by the size of somestring?

d) we are already getting an error when calling WRITE

e) we are getting an error during calling FILE-POSITION

Each of the above is possible, I'd say.

a) when the stream is buffered and somestring still fits into
the output buffer

b) if writing somestring triggers emptying the buffer because
it does not fit into the buffer (because there is still
output in the buffer and it adds up OR somestring is
larger than the buffer). We also don't know what the
increment is, because there might be other output pending.

c) if the string is not buffered or the string just lets
the buffer write its contents, then file position
is incremented by the size of somestring.

d) some parts of the string or the complete string will
be written and there is not enough space
(or some other IO error happens)

e) FILE-POSITION calls FINISH-OUTPUT for some reason and there is not
enough space (or some other IO error happens)

To get a more reliable file position after a WRITE,
I would expect that one needs to call FINISH-OUTPUT,
so that in a buffered stream implementation any
pending output is finished.

--
http://lispm.dyndns.org/

du...@franz.com

Jan 6, 2009, 4:00:09 AM
On Jan 6, 12:11 am, Madhu <enom...@meer.net> wrote:
> * George Neuner <rri5m4hp74qhg6m225t5h2e0pakjrlh...@4ax.com> :
> Wrote on Mon, 05 Jan 2009 22:29:39 -0500:
>
> | However CMUCL's and Corman's wildly different, yet identical, results
> | - Corman also appends #\x to the file - show that the issue of
> | bidirectional stream positioning on a file is far from settled.
>
> I'm not sure that is the case.  I believe there are two issues here. One
> is whether UNREAD-CHAR affects the position on an input stream.  The
> other is how the lisp implementation implements buffering on the
> underlying stream.  Both of these are stated without regard to the
> stream being bidirectional or otherwise.

Agreed. For the answer to the first question, see the glossary
entry for the term "file position", especially the last sentence
about monotonic and increasing. The unread-char doesn't directly
imply a decrement of the file-position, but the read-char that
would normally follow the unread-char does imply an increment
of the file position. Thus unread-char indirectly implies a
file-position decrement (but by an unspecified amount).

> Note UNREAD-CHAR is specified only in situations where the next
> operation on the file is expected to be a READ.

Not really; the only restrictions are what precedes an
unread-char, namely a peek-char or another unread-char
(thus, neither a peek-char nor an unread-char can portably
follow an unread-char).

> It also implies a buffer is used.

No; it would be simple to implement an unbuffered stream with a
single extra-character slot that becomes the character that has
been unread. The reason why implementations don't normally do
this is because all reads would have to first check for the
existence of the unread-character before reading from the stream.
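
As a rough illustration of the single-slot idea (a sketch only; the names
pushback-reader, pb-read-char, pb-unread-char and make-string-source are
made up here and do not come from any implementation), every read has to
look at the slot first, which is exactly the extra check described above:

```lisp
;; Hypothetical sketch; not any implementation's actual stream internals.
(defstruct pushback-reader
  source          ; function of no arguments: returns next character or :eof
  (slot nil))     ; the single unread character, or NIL when empty

(defun pb-read-char (reader)
  "Return the pending unread character if there is one, else read the source."
  (let ((pending (pushback-reader-slot reader)))
    (cond (pending
           ;; Every read pays for this check, even when nothing was unread.
           (setf (pushback-reader-slot reader) nil)
           pending)
          (t (funcall (pushback-reader-source reader))))))

(defun pb-unread-char (char reader)
  "Store CHAR in the slot. This sketch signals an error on a second unread."
  (when (pushback-reader-slot reader)
    (error "Only one character may be unread at a time."))
  (setf (pushback-reader-slot reader) char)
  nil)

(defun make-string-source (string)
  "Return a closure handing out the characters of STRING, then :eof."
  (let ((index 0))
    (lambda ()
      (if (< index (length string))
          (prog1 (char string index) (incf index))
          :eof))))
```

With a reader built as (make-pushback-reader :source (make-string-source "ab")),
reading #\a, unreading it, and reading again yields #\a a second time, and no
buffer is involved anywhere.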

Duane

du...@franz.com

Jan 6, 2009, 4:18:18 AM
On Jan 6, 12:49 am, Rainer Joswig <jos...@lisp.de> wrote:
> In article <rri5m4hp74qhg6m225t5h2e0pakjrlh...@4ax.com>,
>  George Neuner <gneun...@comcast.net> wrote:
>
> ...
>
> > Since SBCL does the same but is off-by-1, first guess is it has a bug.
>
> > However CMUCL's and Corman's wildly different, yet identical, results
> > - Corman also appends #\x to the file - show that the issue of
> > bidirectional stream positioning on a file is far from settled.
>
> That's also a question of buffering.
>
> If you open a stream to a file, write some stuff to the file and now do
>
>   (write somestring :stream stream)
>
> What is the file position now?
>
> a) the same as before?
>
> b) incremented by some arbitrary value?
>
> c) incremented by the size of somestring?
>
> d) we are already getting an error when calling WRITE
>
> e) we are getting an error during calling FILE-POSITION
>
> Each of the above is possible, I'd say.
>
> a) when the stream is buffered and somestring still fits into
>    the output buffer

File-position is not defined in terms that differ based on the
buffering or non-buffering of a stream, but the number of preceding
bytes or characters in the stream (the characters requirement is
looser, but the requirement for monotonic-increasing is still there).
So the file-position of a buffered stream that has just been written
to would be the file-position of the underlying file plus the number
of bytes/characters in the buffer.
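
To make that arithmetic concrete, here is a toy model (purely a sketch; the
buffered-writer structure and the bw-* functions are invented for this
illustration and do not describe how any Lisp implements its streams) in
which the reported position is the flushed count plus whatever is still
sitting in the buffer:

```lisp
;; Hypothetical model of a buffered output stream; names are illustrative.
(defstruct buffered-writer
  (flushed-count 0)   ; characters already handed to the underlying file
  (pending (make-array 0 :element-type 'character
                         :adjustable t :fill-pointer 0)))

(defun bw-write-string (string writer)
  "Append STRING to the buffer without touching the underlying file."
  (loop for ch across string
        do (vector-push-extend ch (buffered-writer-pending writer)))
  string)

(defun bw-flush (writer)
  "Pretend to write the buffer out: fold its length into FLUSHED-COUNT."
  (incf (buffered-writer-flushed-count writer)
        (length (buffered-writer-pending writer)))
  (setf (fill-pointer (buffered-writer-pending writer)) 0)
  writer)

(defun bw-file-position (writer)
  "Logical position: characters flushed plus characters still buffered."
  (+ (buffered-writer-flushed-count writer)
     (length (buffered-writer-pending writer))))
```

In this model bw-file-position gives the same answer before and after
bw-flush, which is the point: the position does not depend on when the
buffer happens to be emptied.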

> b) if writing somestring triggers emptying the buffer because
>    it does not fit into the buffer (because there is still
>    output in the buffer and it adds up OR somestring is
>    larger than the buffer). We also don't know what the
>    increment is, because there might be other output pending.

Based on my answer in (a), this value can be well-known, because the
file-position does not depend on the size of the buffer. Now of
course, it is not possible to determine the increment of the current
read position of the underlying file, because of the possibility that
buffering has delayed the writing of some of the characters. But
since the file-position itself is fairly well defined, it can be
predicted.

> c) if the string is not buffered or the string just lets
>    the buffer write its contents, then file position
>    is incremented by the size of somestring.

Correct; in the buffered case this would be due to the file-position
depending on the total characters written so far, without regard to
how much of the file is in the buffer.

> d) some parts of the string or the complete string will
>    be written and there is not enough space
>    (or some other IO error happens)
>
> e) FILE-POSITION calls FINISH-OUTPUT for some reason and there is not
>    enough space (or some other IO error happens)

Agreed here.

> To get a more reliable file position after a WRITE,
> I would expect that one needs to call FINISH-OUTPUT,
> so that in a buffered stream implementation any
> pending output is finished.

I think that this requirement would be ludicrous, given the relative
weight of unread-char to finish-output - the former should be a very
lightweight operation, and the latter is generally heavyweight.
Read-char should be close enough to the rest of the read/write functions to
know to either perform a pseudo-"finish" or else to adjust things so
that the finish-output is simulated without all that overhead.

Duane

Madhu

Jan 6, 2009, 5:36:00 AM
* du...@franz.com <7a67d467-df75-47d3...@d36g2000prf.googlegroups.com> :
Wrote on Tue, 6 Jan 2009 01:00:09 -0800 (PST):

|> Note UNREAD-CHAR is specified only in situations where the next
|> operation on the file is expected to be a READ.
|
| Not really; the only restrictions are what precedes an unread-char,
| namely a peek-char or another unread-char (thus, neither a peek-char
| nor an unread-char can portably follow an unread-char).

True, there are no restrictions, but UNREAD-CHAR says:

,----
| unread-char places character back onto the front of input-stream so
| that it will again be the next character in input-stream.
`----

I just wanted to stress that this "the next character in input stream"
only makes sense for a read operation. The implication being: if you
wanted to get at that last unread character, your next operation had better
be a read operation (and not, say, a WRITE-CHAR operation, which would
replace the "next character in the input stream" that could be read).

--
Madhu

Raymond Toy

Jan 6, 2009, 9:19:55 AM
>>>>> "Rob" == Rob Warnock <rp...@rpw3.org> writes:

Rob> George Neuner <gneu...@comcast.net> wrote:
Rob> +---------------
Rob> | Raymond Toy <raymo...@ericsson.com> wrote:
Rob> | >Not sure if this is intentional or not, but this is how CMUCL handles
Rob> | >the test case:
Rob> | >
Rob> | >"abcdefghijk"
Rob> | >(0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
Rob> | >"abcdefghijk"
Rob> | >
Rob> | >If you look at the actual file, we find:
Rob> | >
Rob> | >abcdefghijk
Rob> | >x
Rob> | >
Rob> | >That is, the write-char actually appended to the file.
Rob> ...
Rob> | and now CMUCL (you [Raymond] didn't say which version)
Rob> | "abcdefghijk"
Rob> | (0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
Rob> | "abcdefghijk"
Rob> +---------------

Rob> Doesn't seem to matter: CMUCL-18e, -19a-pre3, -19c, & -19e
Rob> all give the same results.

Thanks for testing so I don't have to. :-)

The version I tested was the 2008-12 snapshot. But I would guess the
behavior holds for all versions of cmucl.

Ray

du...@franz.com

Jan 6, 2009, 11:06:02 AM
On Jan 6, 2:36 am, Madhu <enom...@meer.net> wrote:
> * du...@franz.com <7a67d467-df75-47d3-bedd-93173f771...@d36g2000prf.googlegroups.com> :

Is an :io stream an input stream?

The answer is "yes", of course, otherwise an unread-char would not
work on the io stream, because unread-char requires an input stream.

When you open a stream with :io direction the next character in the
input stream is also the next character in the output stream, and
the :if-exists :supersede specification emphasizes that the next
character that would normally have been read will be the next
character to be overwritten if a write operation occurs.

Duane

Madhu

Jan 6, 2009, 11:51:46 AM
* du...@franz.com <88a097a4-e577-4bac...@n33g2000pri.googlegroups.com> :
Wrote on Tue, 6 Jan 2009 08:06:02 -0800 (PST):

| On Jan 6, 2:36 am, Madhu <enometh.@meer.net> wrote:
| Duane

|> Wrote on Tue, 6 Jan 2009 01:00:09 -0800 (PST):
|>
|> |> Note UNREAD-CHAR is specified only in situations where the next
|> |> operation on the file is expected to be a READ.
|> |
|> | Not really; the only restrictions are what precedes an unread-char,
|> | namely a peek-char or another unread-char (thus, neither a peek-char
|> | nor an unread-char can portably follow an unread-char).
|>
|> True, there are no restrictions, but UNREAD-CHAR says:
|>
|> ,----
|> | unread-char places character back onto the front of input-stream so
|> | that it will again be the next character in input-stream.
|> `----
|>
|> I just wanted to stress that this "the next character in input stream"
|> only makes sense for a read operation.  The implication being: if you
|> wanted to get at that last unread character, your next operation had better
|> be a read operation (and not, say, a WRITE-CHAR operation, which would
|> replace the "next character in the input stream" that could be read).
|
| Is an :io stream an input stream?
|
| The answer is "yes", of course, otherwise an unread-char would not
| work on the io stream, because unread-char requires an input stream.

Yes of course. I was NOT trying to say that UNREAD-CHAR does not apply
to IO streams. I just wanted to make the point that if you call
UNREAD-CHAR and put a character back on the stream (input stream, or io
stream), the only way you can get at that character is by doing a
READ. If the next operation is a WRITE to that stream, that character
which was unread is no longer available for reading, so the UNREAD-CHAR
becomes a no-op. [Am I missing something?]

| When you open a stream with :io direction the next character in the
| input stream is also the next character in the output stream, and
| the :if-exists :supersede specification emphasizes that the next
| character that would normally have been read will be the next
| character to be overwritten if a write operation occurs.

Small nit. ITYM :OVERWRITE instead of :SUPERSEDE here.
Using :SUPERSEDE will start without any characters in the io stream.
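
A small way to see the difference (a sketch; first-char-after-reopen is a
helper name invented for this example, and it writes to whatever path it is
given) is to create a file holding "abc", reopen it for :io with each
option, and try to read one character:

```lisp
;; Hypothetical helper for illustration only.
(defun first-char-after-reopen (path if-exists)
  "Write \"abc\" to PATH, reopen it :io with IF-EXISTS, and return the first
character read, or :EOF if the reopened stream has nothing to read."
  ;; Start from a known file containing three characters.
  (with-open-file (out path :direction :output :if-exists :supersede
                            :if-does-not-exist :create)
    (write-string "abc" out))
  ;; Reopen bidirectionally and attempt a single read.
  (with-open-file (s path :direction :io :if-exists if-exists)
    (read-char s nil :eof)))
```

Called with :overwrite the existing contents are there to be read, so the
result is #\a; called with :supersede the stream starts empty and the result
is :eof.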

--
Madhu

Juanjo

Jan 7, 2009, 8:20:29 AM
On Jan 6, 4:29 am, George Neuner <gneun...@comcast.net> wrote:
> Another county heard from.  So now we have:
>
>   SBCL 1.0.10
>   "abcdefghijk"
>   (0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
>   "abxdefghijk"

and ECL 9.1.0 (CVS/git a few days ago)

> ECL and CLisp perform mostly as Juanjo expects they should.  I would
> argue that they are still wrong because in a perfect world unread-char
> would decrement the file position and cause the #\a to be overwritten
> instead of the #\b.

Please do not misrepresent my words. I did not say that the current
behavior of ECL is correct! The behavior in the current release is
plain legacy, and as I have stated before, the newest version (CVS/git
sources) behaves just like SBCL, which is what I consider most
appropriate (after adding the force-output statement mentioned earlier).

Juanjo

Juanjo

Jan 7, 2009, 12:53:09 PM
On Jan 7, 4:14 pm, Madhu <enom...@meer.net> wrote:
> * Juanjo
> Wrote on Wed, 7 Jan 2009 05:20:29 -0800 (PST):

> |>
> |>  SBCL 1.0.10
> |>  "abcdefghijk"
> |>  (0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
> |>  "abxdefghijk"
> |
> | and ECL 9.1.0 (CVS/git a few days ago)
>
> I cannot follow the rationale for choosing this apparently
> bug-compatible behaviour.  Could you explain it by showing the contents
> of the stream after each operation?

Sorry, not that one. I misread the previous line -- reading news with
google is a pita --. When you add FINISH-OUTPUT, or FORCE-OUTPUT, the
buffers in SBCL are flushed and you get what I think is the right
behavior. The line in the post I mistakenly cited was _not_ using
FORCE-OUTPUT, and hence the output of SBCL was, in my opinion,
conformant.

In order to settle the issue, the code to compare implementations

(with-open-file (s "foo.txt" :direction :io :if-exists :supersede)
(write-line (print "abcdefghijk") s)
(file-position s 0)
(let (a)
(print (list (file-position s)
(setf a (read-char s))
(file-position s)
(unread-char a s)
(file-position s)
(prog1 (write-char #\x s) (force-output s))
(file-position s)
(read-char s)
(file-position s)
(read-char s))))
(file-position s 0)
(print (read-line s)))

gives

SBCL 1.0.10 and ECL 9.1.0
"abcdefghijk"
(0 #\a 1 NIL 0 #\x 1 #\b 2 #\c)
"xbcdefghijk"

GNU CLISP 2.45
"abcdefghijk"
(0 #\a 1 NIL 0 #\x 1 #\a 2 #\c)
"axcdefghijk"

ECL 8.12.0
"abcdefghijk"

Rob Warnock

Jan 7, 2009, 9:33:12 PM
Juanjo <juanjose.g...@googlemail.com> wrote:
+---------------

| In order to settle the issue, the code to compare implementations
...[now includes FORCE-OUTPUT]...

| gives
|
| SBCL 1.0.10 and ECL 9.1.0
| "abcdefghijk"
| (0 #\a 1 NIL 0 #\x 1 #\b 2 #\c)
| "xbcdefghijk"
+---------------

CMUCL-19a, -19c, & -19e can be added to that list as well.


-Rob

p.s. The earlier [April 2003] CMUCL-18e requires an additional
:IF-DOES-NOT-EXIST :CREATE option on the WITH-OPEN-FILE
(if "foo.txt" doesn't, in fact, already exist), and also
gives a different result:

"abcdefghijk"
(0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
"abcdefghijk"

But the -18 series is so old I think we can safely ignore it
as a measure of "what's right".
