Given the following piece of code, which behavior do you find more correct of the ones shown below?
    (with-open-file (s "foo.txt" :direction :io :if-exists :supersede)
      (write-line (print "abcdefghijk") s)
      (file-position s 0)
      (let (a)
        (print (list (file-position s)
                     (setf a (read-char s))
                     (file-position s)
                     (unread-char a s)
                     (file-position s)
                     (write-char #\x s)
                     (file-position s)
                     (read-char s)
                     (file-position s)
                     (read-char s))))
      (file-position s 0)
      (print (read-line s)))
Basically the problem relates to the value of file-position and the actual place where data is written right after an unread-char operation. This behavior also affects the behavior of an implementation when reading, for instance, a Return + Linefeed sequence.
Is this simply undefined? I found the section on streams in the ANSI specification to be, say, lacking in all aspects, hehe.
This is SBCL 1.0.10, an implementation of ANSI Common Lisp.
"abcdefghijk"
(0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
"abxdefghijk"
Juanjo <juanjose.garciarip...@googlemail.com> writes:
> Given the following piece of code, which behavior do you find more
> correct of the ones shown below?
Well, the note on UNREAD-CHAR says that it's just a quick-and-dirty hack to let the Lisp reader look ahead one character. The Lisp reader doesn't write to the stream it reads, and doesn't use FILE-POSITION, since it has to work on streams with no attached file.
So clearly, when mixing file-position and write-char with unread-char, you're outside the bounds of unread-char. This is simply undefined, and anything goes.
Personally, I would have a preference for this result:
(0 #\a 1 nil 0 #\x 1 #\b 2 #\c)
"xbcdefghijk"
But note that when you mix read-char, write-char, and file-position (implicitly through unread-char) on streams encoded in UTF-8 (or some other variable-width character encoding), you may get very surprising results.
What if the stream contained "áeìoü"? File positions after reading each character may be: 0 2 3 5 6 8, or you could even have bigger increments with normalized forms.
So after unreading a character, it may be difficult to find the correct file position.
And then, writing over a character that needs a different number of bytes should prevent further reading of characters.
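The byte positions quoted above for "áeìoü" can be reproduced with a minimal sketch. It assumes that CHAR-CODE returns Unicode code points, which the standard does not guarantee; the function names are made up for illustration.

```lisp
;; Assumption: CHAR-CODE yields Unicode code points (true in most current
;; implementations, but not required by ANSI CL).
(defun utf-8-octet-count (char)
  "Number of octets CHAR would occupy when encoded as UTF-8."
  (let ((code (char-code char)))
    (cond ((< code #x80) 1)
          ((< code #x800) 2)
          ((< code #x10000) 3)
          (t 4))))

(defun utf-8-offsets (string)
  "Octet offsets before each character of STRING, plus the offset after the last one."
  (let ((offset 0))
    (cons 0 (map 'list
                 (lambda (char) (incf offset (utf-8-octet-count char)))
                 string))))

;; The string is built from code points so the example does not depend on
;; the encoding of the source file: a-acute, e, i-grave, o, u-diaeresis.
(defparameter *sample* (map 'string #'code-char '(#xE1 #x65 #xEC #x6F #xFC)))
```

Calling (utf-8-offsets *sample*) gives the sequence of offsets a byte-counting FILE-POSITION would report while reading that string character by character.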
So your code is really nasty; you can't complain about the variability between implementations.
Juanjo <juanjose.garciarip...@googlemail.com> wrote:
> Given the following piece of code, which behavior do you find more
> correct of the ones shown below?
I would add FINISH-OUTPUT after writing to a stream, if you want to see effects. That's a common error in portable code: Writing something to a (file/window) stream and assuming the output is done immediately.
On Jan 1, 1:51 pm, Rainer Joswig <jos...@lisp.de> wrote:
> I would add FINISH-OUTPUT after writing to a stream, > if you want to see effects. That's a common error > in portable code: Writing something to a (file/window) stream > and assuming the output is done immediately.
Ok, adding FORCE-OUTPUT or FINISH-OUTPUT does the same in the three implementations. SBCL now seems to get a more sensible result, but CLISP is really doing something weird: file-position is 1 after writing #\x
This is SBCL 1.0.10, an implementation of ANSI Common Lisp.
"abcdefghijk"
(0 #\a 1 NIL 0 #\x 1 #\b 2 #\c)
"xbcdefghijk"

Welcome to GNU CLISP 2.45 (2008-04-04) <http://clisp.cons.org/>
"abcdefghijk"
(0 #\a 1 NIL 0 #\x 1 #\a 2 #\c)
"axcdefghijk"
On Jan 1, 1:43 pm, p...@informatimago.com (Pascal J. Bourguignon) wrote:
> So clearly, when mixin file-position and write-char with unread-char, > you're out of bounds of unread-char. This is simply undefined, and > anything goes.
> Personnaly, I would have a preference for this result:
> (0 #\a 1 nil 0 #\x 1 #\b 2 #\c) > "xbcdefghijk"
Seems also the most sensible outcome to me. I would like to get this in ECL as well.
> But note that when you mix read-char, write-char, and file-position > (implicitly thru unread-char) on streams encoded in UTF-8 (or some other > variable width character encoding), you may get very surprizing results.
There is no problem using file positions with multibyte encodings. As long as you pass to FILE-POSITION only values that were returned by FILE-POSITION itself, the result should be the one you expect. This is the reason why I think it is important to get consistent results when querying file positions after READ/UNREAD.
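A minimal sketch of that usage pattern: the position is treated as an opaque token obtained from FILE-POSITION and only ever handed back to it. The function name and the file it reads are assumptions made for the example.

```lisp
(defun read-line-twice (path)
  "Read the rest of the first line of PATH twice, seeking back in between."
  (with-open-file (s path :direction :input)
    (read-char s)                        ; move off position 0
    (let* ((mark (file-position s))      ; opaque value, never computed by hand
           (first-read (read-line s)))
      (file-position s mark)             ; return to the saved position
      (list first-read (read-line s)))))
```

Whether MARK counts bytes or characters does not matter here, since it is never inspected or adjusted.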
Just in case someone cares, I got to this discussion when implementing a reader for CR+LF sequences. I faced the problem of finding a CR followed by something other than an LF. In this case flexi-streams outputs the CR and keeps the next character. It turns out that this cannot be implemented using UNREAD-CHAR because the user should be able to use UNREAD-CHAR on the #\Return character. Anyway, I will eventually figure out how to implement this :-)
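One way to sketch the "keep the next character" idea without touching UNREAD-CHAR is a private one-character slot in the reader itself. This is only an illustration, not how flexi-streams or ECL do it; MAKE-NEWLINE-READER is a hypothetical name.

```lisp
(defun make-newline-reader (stream)
  "Return a closure that reads one character from STREAM per call,
translating CR+LF into a single newline and returning :EOF at end of file."
  (let ((pending nil))                   ; character read ahead after a lone CR
    (lambda ()
      (let ((ch (or (shiftf pending nil)
                    (read-char stream nil :eof))))
        (cond ((and (characterp ch) (char= ch #\Return))
               (let ((next (read-char stream nil :eof)))
                 (cond ((and (characterp next) (char= next #\Linefeed))
                        #\Newline)       ; CR+LF collapses to one newline
                       (t
                        ;; Lone CR: keep the lookahead for the next call,
                        ;; leaving the stream's own unread slot untouched.
                        (unless (eq next :eof)
                          (setf pending next))
                        #\Return))))
              (t ch))))))
```

Because the lookahead lives in PENDING rather than in the stream, the caller's single UNREAD-CHAR slot stays free, at the cost of FILE-POSITION on the underlying stream running one character ahead after a lone CR.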
Juanjo <juanjose.garciarip...@googlemail.com> wrote:
> On Jan 1, 1:51 pm, Rainer Joswig <jos...@lisp.de> wrote:
> > I would add FINISH-OUTPUT after writing to a stream,
> > if you want to see effects. That's a common error
> > in portable code: Writing something to a (file/window) stream
> > and assuming the output is done immediately.
>
> Ok, adding FORCE-OUTPUT or FINISH-OUTPUT does the same in the three
> implementations. SBCL now seems to get a more sensible result, but
> CLISP is really doing something weird: file-position is 1 after
> writing #\x
>
> This is SBCL 1.0.10, an implementation of ANSI Common Lisp.
> "abcdefghijk"
> (0 #\a 1 NIL 0 #\x 1 #\b 2 #\c)
> "xbcdefghijk"
FORCE-OUTPUT and FINISH-OUTPUT are similar, but FORCE-OUTPUT does not need to wait for completion. If you really want reliable output (output done and no error) use FINISH-OUTPUT.
On Jan 1, 2:54 pm, Juanjo <juanjose.garciarip...@googlemail.com> wrote:
> On Jan 1, 1:43 pm, p...@informatimago.com (Pascal J. Bourguignon) > wrote:
> > So clearly, when mixin file-position and write-char with unread-char, > > you're out of bounds of unread-char. This is simply undefined, and > > anything goes.
> > Personnaly, I would have a preference for this result:
The Scieneer CL returns the following:
"abcdefghijk"
(0 #\a 1 nil 0 #\x 1 #\b 2 #\c)
"xbcdefghijk"

Also for a multi-byte encoding (e.g. UTF-16):
"áeìoü"
(0 #\U00E1 1 nil 0 #\x nil 1 #\U00E1 2 #\e)
"xáeìoü"
The Scieneer CL defines the file-position of character streams as the number of characters read. This allows some transforms of the underlying file, such as CR-LF translation, or a change to the character encoding, or file compression or encryption, without a change to the file positions. It also allows a character file to be loaded into a CL string and for the file-positions to remain consistent when reading from a string input stream - file-position is usable on string input streams in the Scieneer CL. Further it supports block encoded streams which could not be supported using the byte position because ANSI CL does require a monotonic increase in the file position with each character.
Yes, a byte offset may be returned, but only for file encodings that have a monotonic increase in position for each character. This may not be the case for block encoded files, such as compressed or block encrypted files, so this is only a conforming ANSI CL implementation to the limited extent of the file encodings supported and this definition of file-position restricts extensions.
d...@scieneer.com writes:
> On Jan 2, 1:21 am, "Tobias C. Rittweiler" <t...@freebits.de.invalid> wrote:
>> Juanjo <juanjose.garciarip...@googlemail.com> writes:
>> > There is no problem using file positions with multibyte encodings.
>>
>> FILE-POSITION may legitimately return a byte-offset, not a
>> character-offset even for character streams.
>
> Yes, a byte offset may be returned, but only for file encodings that
> have a monotonic increase in position for each character. This may
> not be the case for block encoded files, such as compressed or block
> encrypted files, so this is only a conforming ANSI CL implementation
> to the limited extent of the file encodings supported and this
> definition of file-position restricts extensions.
My understanding is that when CLHS says "monotonic increase" it is not a strict increase, the increase may be 0.
For example, on a file encoded in base-64, I think

    (loop repeat 10 do (read-char s) collect (file-position s))

could return (1 1 1 1 4 4 4 4 7 7)
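The difference between the two readings of "monotonic" can be made concrete with a small sketch; the predicate names are invented for the example, and the sample list is the hypothetical one given above, not output from any implementation.

```lisp
(defun non-decreasing-p (positions)
  "True if each position is >= the previous one (increments of 0 allowed).
POSITIONS must contain at least one number."
  (apply #'<= positions))

(defun strictly-increasing-p (positions)
  "True if each position is > the previous one.
POSITIONS must contain at least one number."
  (apply #'< positions))

;; Hypothetical positions from the base-64 example in the thread.
(defparameter *base-64-positions* '(1 1 1 1 4 4 4 4 7 7))
```

Under the non-strict reading the sample list is acceptable; under the strict reading it is not, which is where the two posters part ways.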
> My understanding is that when CLHS says "monotonic increase" it is not a > strict increase, the increase may be 0.
> For example, on a file encoded in base-64, I think > (loop repeat 10 do (read-char s) collect (file-position s)) > could return (1 1 1 1 4 4 4 4 7 7)
This interpretation would lead to the file position becoming ambiguous. A conforming implementation can simply return 'nil if it does not support file-position for a particular encoding, however there are lots of useful block encodings so this is rather limiting.
> The Scieneer CL defines the file-position of character streams as the > number of characters read.
How do you handle seeking a file-position for multibyte encodings? That becomes very inefficient unless you keep track of the positions at which characters were read -- in other words, it implies scanning the whole file forwards or backwards.
Yes, variable length multibyte encodings may require scanning the file to find a position and this is less efficient than positioning to a byte position in a file. However defining the character file position as the number of characters read has many advantages and can often lead to much improved performance. For example it allows character streams to be buffered in units of characters, buffering can significantly improve stream performance, and positioning within the buffer is fast. The definition allows a file to be firstly loaded into a CL string and a string input stream used within which positioning is very fast. The Scieneer CL supports encapsulated character conversion streams so it is possible to work with both the byte stream, quickly changing the byte position when possible, and encapsulate the byte stream with a character stream after positioning.
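The scanning cost mentioned here can be sketched for UTF-8: turning a character index into an octet offset means stepping over every preceding character. This is only an illustration of the trade-off, not the Scieneer CL's implementation; the function name is hypothetical and the input is assumed to be well-formed UTF-8.

```lisp
(defun octet-offset-of-char (octets char-index)
  "Octet offset at which character number CHAR-INDEX starts in OCTETS,
a vector of UTF-8 encoded bytes. Cost is linear in CHAR-INDEX."
  (let ((offset 0))
    (dotimes (i char-index offset)
      ;; Step over the lead octet of character I ...
      (incf offset)
      ;; ... and then over its continuation octets (bit pattern 10xxxxxx).
      (loop while (and (< offset (length octets))
                       (= (logand (aref octets offset) #xC0) #x80))
            do (incf offset)))))
```

For a fixed-width encoding the same conversion would be a single multiplication, which is why the cost only shows up with variable-length encodings.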
I think the HyperSpec is lacking some detail that would clarify what is to be expected. Nonetheless ...
(open ... :direction :io) returns a "bidirectional file stream". Nothing in particular is said about whether a "file stream" has 1 or 2 position markers, but bidirectional streams are definitively said to be composed of 2 streams. Logically each stream should have its own position marker and nowhere is it said that the markers are tied when both streams are pointing to the same data object.
So I would say it's correct for the read and write positions to be independent - if you want to be certain where you are you need to track the position and explicitly set when changing between reading and writing.
This is the way stream IO is handled by many (most?) other languages as well. It might make some sense to tie the markers in the case of files, but then the stream abstraction would be broken because files would behave differently from other data objects.
On Jan 4, 4:25 am, George Neuner <gneun...@comcast.net> wrote:
> So I would say it's correct for the read and write positions to be > independent - if you want to be certain where you are you need to > track the position and explicitly set when changing between reading > and writing.
This is impossible following the Hyperspec because the FILE-POSITION function does not allow you to select which position marker (if there were two) you are updating.
> This is the way stream IO is handled by many (most?) other languages > as well.
That is not the case in C derivatives, AFAIK. Again, there is only one seek and one tell function, and they do not allow you to select whether you want the read or the write position.
> (open ... :direction :io) returns a "bidirectional file stream". > Nothing in particular is said about whether a "file stream" has 1 or 2 > position markers, but bidirectional streams are definitively said to > be composed of 2 streams.
This is different from a two-way-stream, which is indeed a composite stream and thus keeps two pointers, one for reading and one for writing. But FILE-POSITION does not work on composite streams.
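A small sketch of the distinction: a two-way stream built with MAKE-TWO-WAY-STREAM simply forwards reads to one stream and writes to another, so the two sides never interact. The function name is made up for the example.

```lisp
(defun two-way-demo ()
  "Read from and write to a composite stream, then inspect both sides."
  (let* ((in (make-string-input-stream "abc"))
         (out (make-string-output-stream))
         (both (make-two-way-stream in out)))
    (write-char #\x both)                      ; goes to OUT only
    (list (read-char both)                     ; comes from IN only
          (get-output-stream-string out)       ; what was written
          (typep both 'file-stream))))         ; not a file stream
```

The write does not disturb what is read afterwards, which is the opposite of what the test case in this thread expects from a single :IO file stream.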
<juanjose.garciarip...@googlemail.com> wrote:
>On Jan 4, 4:25 am, George Neuner <gneun...@comcast.net> wrote:
>> So I would say it's correct for the read and write positions to be
>> independent - if you want to be certain where you are you need to
>> track the position and explicitly set when changing between reading
>> and writing.
>
>This is impossible following the Hyperspec because the FILE-POSITION
>function does not allow you to select which position marker (if there
>were two) you are updating.
You misunderstand.
I'm not claiming that a file has 2 position markers (that's beyond the spec). What I am saying is that it seems clear that a bidirectional stream *may* have both read and write position markers and that the 2 markers are not necessarily tied together. That makes sense because a stream can be bound to things other than files, e.g., sockets, pipes, IO channels, etc. for which independent read and write makes sense.
In particular the Hyperspec does not say that a bidirectional stream acts in any way differently from having independent input and output streams bound to the same file object. It only specifies that both input and output functionality will be available through a single stream object.
AFAICS, a bi-stream could just encapsulate separate input and output streams and transparently change an underlying file marker to allow reading and writing from different places. Such an implementation is unlikely but I don't see it being prohibited. The OP's message showed differing results from various implementations suggesting that either some are buggy or there is not agreement on how a bi-stream should behave on a file.
So the only way to be certain where you are is to independently track position in your application and (re)set it as necessary before each read or write call.
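A minimal sketch of that discipline, assuming character positions that the application tracks itself and a stream opened with :direction :io. The helper names are hypothetical, and the FINISH-OUTPUT call follows the advice given earlier in the thread rather than anything the standard requires.

```lisp
(defun read-char-at (stream position)
  "Seek STREAM to POSITION, then read one character."
  (file-position stream position)
  (read-char stream))

(defun write-char-at (stream position char)
  "Seek STREAM to POSITION, write CHAR there, and flush the output buffer."
  (file-position stream position)
  (write-char char stream)
  (finish-output stream)
  char)

(defun overwrite-first-char (path)
  "Create PATH, overwrite its first character with x, and read it back."
  (with-open-file (s path :direction :io :if-exists :supersede
                          :if-does-not-exist :create)
    (write-line "abcdefghijk" s)
    (write-char-at s 0 #\x)
    (read-char-at s 0)))
```

Since every operation states its own position, the result no longer depends on whether the implementation ties the read and write markers together.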
>> This is the way stream IO is handled by many (most?) other languages >> as well.
>Not the case of C derivatives, AFAIK. Again, there is only one seek >and tell function and it does not allow you to select whether you want >the read or write position.
Well C doesn't have streams per se ... it has only buffered files and the FILE structure has only one position marker.
In C++ seek and tell are specifically guaranteed only to work on fstreams, i.e., on files, not on any kind of stream. Secondly, if you look closely at the implementation, fstream is subclassed from one of istream or ostream (usually ostream), it creates the counterpart stream and ties the 2 position markers so their view of the underlying file is unified.
Readers and Writers in C# and Java are handled in much the same way. The file object creates the Reader or Writer and can control whether they see a unified position. It's a little more convoluted because both C# and Java allow composable filters in streams and so the notion of position may not translate between Reader and Writer, but usually the end result is similar to C++'s implementation.
And in each of these languages you can easily create your own stream type that would not tie the positions. In C# and Java, for example, Reader and Writer streams on a socket are not tied.
Which brings us back to whether Lisp's bidirectional streams are a unique entity or are similarly just an object encapsulating 2 streams and what guarantees are there (or should there be).
> I'm not claiming that a file has 2 position markers (that's beyond the > spec). What I am saying is that it seems clear that a bidirectional > stream *may* have both read and write position markers and that the 2 > markers are not necessarily tied together. That makes sense because a > stream can be bound to things other than files, e.g., sockets, pipes, > IO channels, etc. for which independent read and write makes sense.
I am afraid this is not the case, at least for binary streams: "For a binary file, every read-byte or write-byte operation increases the file position by 1." (http://www.lispworks.com/documentation/HyperSpec/Body/f_file_p.htm) In my understanding, that means the read and write position markers must be tied together, or not make sense at all, which is permitted by FILE-POSITION returning NIL, as would be the case for sockets.
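The quoted sentence can be checked with a small probe on a binary :IO stream. This is a sketch with an assumed file name and function name; it seeks explicitly before switching from writing to reading, so it says nothing about the unread-char case.

```lisp
(defun binary-positions (path)
  "Record FILE-POSITION after each WRITE-BYTE and READ-BYTE on PATH."
  (with-open-file (s path :direction :io
                          :element-type '(unsigned-byte 8)
                          :if-exists :supersede
                          :if-does-not-exist :create)
    (let ((positions (list (file-position s))))
      (dolist (byte '(10 20 30))
        (write-byte byte s)
        (push (file-position s) positions))
      (finish-output s)
      (file-position s 0)                ; explicit seek before reading
      (loop repeat 2
            do (read-byte s)
               (push (file-position s) positions))
      (nreverse positions))))
```

If the implementation follows the quoted sentence, each write and each read moves the single reported position forward by one.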
<juanjose.garciarip...@googlemail.com> wrote: >On Jan 5, 2:58 pm, George Neuner <gneun...@comcast.net> wrote: >> On Mon, 5 Jan 2009 01:10:31 -0800 (PST), Juanjo
>I am afraid this is not the case, at least for binary streams: "For a
>binary file, every read-byte or write-byte operation increases the
>file position by 1."
>(http://www.lispworks.com/documentation/HyperSpec/Body/f_file_p.htm)
>In my understanding, that means the read and write
>position markers must be tied together, or not make sense at all, which
>is permitted by FILE-POSITION returning NIL, as would be the case
>for sockets.
I agree that a cursory reading would tend to suggest that to a programmer familiar with IO in other languages, but the Hyperspec is written in English rather than a formal language.
Turning my hat sideways and playing language lawyer, I would say that, by itself, the sentence "... every read-byte or write-byte operation increases the file position by 1." only tells you (one of) the effects of those particular named operations. It says nothing about what a "file position" is, where it starts from or what happens to it if operations other than read or write are performed. It does not say that read and write share the position, nor does it explicitly say that you can perform multiple successive reads or writes, or that reads and writes can be alternated.
Of course, some of those issues are covered elsewhere in the Hyperspec but my point is that you (usually) can't just pick a passage out of the Hyperspec to prove your point ... it's typically more involved than that.
>>>>> "George" == George Neuner <gneun...@comcast.net> writes:
George> On Mon, 5 Jan 2009 01:10:31 -0800 (PST), Juanjo George> <juanjose.garciarip...@googlemail.com> wrote:
George> In particular the Hyperspec does not say that a bidirectional stream George> acts in any way differently from having independent input and output George> streams bound to the same file object. It only specifies that both George> input and output functionality will be available through a single George> stream object.
Not sure if this is intentional or not, but this is how CMUCL handles the test case:
<raymond....@ericsson.com> wrote: >>>>>> "George" == George Neuner <gneun...@comcast.net> writes:
>Not sure if this is intentional or not, but this is how CMUCL handles >the test case:
Given the OP's code:

    (with-open-file (s "foo.txt" :direction :io :if-exists :supersede)
      (write-line (print "abcdefghijk") s)
      (file-position s 0)
      (let (a)
        (print (list (file-position s)
                     (setf a (read-char s))
                     (file-position s)
                     (unread-char a s)
                     (file-position s)
                     (write-char #\x s)
                     (file-position s)
                     (read-char s)
                     (file-position s)
                     (read-char s))))
      (file-position s 0)
      (print (read-line s)))

we have 3 different results from 5 implementations.
ECL and CLISP perform mostly as Juanjo expects they should. I would argue that they are still wrong because in a perfect world unread-char would decrement the file position and cause the #\a to be overwritten instead of the #\b.
Since SBCL does the same but is off-by-1, first guess is it has a bug.
However CMUCL's and Corman's wildly different, yet identical, results - Corman also appends #\x to the file - show that the issue of bidirectional stream positioning on a file is far from settled.
FWIW: the code works as desired (overwrites #\a) in most of the Lisps mentioned (I don't have ECL) iff you call file-position to set the file marker whenever you change between reading and writing.
    (with-open-file (s "foo.txt" :direction :io :if-exists :supersede)
      (write-line (print "abcdefghijk") s)
      (file-position s 0)
      (let (a)
        (print (list (file-position s)
                     (setf a (read-char s))
                     (file-position s)
                     (unread-char a s)
                     (file-position s (file-position s))
                     (write-char #\x s)
                     (file-position s (file-position s))
                     (read-char s)
                     (file-position s)
                     (read-char s))))
      (file-position s 0)
      (print (read-line s)))
|> George> I'm not claiming that a file has 2 position markers
|> George> (that's beyond the spec). What I am saying is that it
|> George> seems clear that a bidirectional stream *may* have both
|> George> read and write position markers and that the 2 markers are
|> George> not necessarily tied together. That makes sense because a
|> George> stream can be bound to things other than files, e.g.,
|> George> sockets, pipes, IO channels, etc. for which independent
|> George> read and write makes sense.
|>
|> George> In particular the Hyperspec does not say that a
|> George> bidirectional stream acts in any way differently from
|> George> having independent input and output streams bound to the
|> George> same file object. It only specifies that both input and
|> George> output functionality will be available through a single
|> George> stream object.
|>
|>Not sure if this is intentional or not, but this is how CMUCL handles
|>the test case:
|>
|>"abcdefghijk"
|>(0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
|>"abcdefghijk"
|>
|>If you look at the actual file, we find:
|>
|>abcdefghijk
|>x
|>
|>That is, the write-char actually appended to the file.
I believe this is an artifact of buffering reads and writes. CMUCL does not flush its buffers after interleaved READ and WRITE operations.
Calling FINISH-OUTPUT after the WRITE-CHAR operation gives the expected result of replacing the #\a with an #\x.
Both Allegro acl81_express and LispWorks Personal 5.1.1 behave differently again from all the above [and SCL] (and conform to the expectation that #\a is overwritten).
| Given the OP's code:
|
| (with-open-file (s "foo.txt" :direction :io :if-exists :supersede)
|   (write-line (print "abcdefghijk") s)
|   (file-position s 0)
|   (let (a)
|     (print (list (file-position s)
|                  (setf a (read-char s))
|                  (file-position s)
|                  (unread-char a s)
|                  (file-position s)
|                  (write-char #\x s)
|                  (file-position s)
|                  (read-char s)
|                  (file-position s)
|                  (read-char s))))
|   (file-position s 0)
|   (print (read-line s)))
|
| we have 3 different results from 5 implementations.
| ECL and CLisp perform mostly as Juango expects they should. I would
| argue that they are still wrong because in a perfect world unread-char
| would decrement the file position and cause the #\a to be overwritten
| instead of the #\b.
| Since SBCL does the same but is off-by-1, first guess is it has a bug.
|
| However CMUCL's and Corman's wildly different, yet identical, results
| - Corman also appends #\x to the file - show that the issue of
| bidirectional stream positioning on a file is far from settled.
I'm not sure that is the case. I believe there are two issues here. One is whether UNREAD-CHAR affects the file position on an input stream. The other is how the Lisp implementation implements buffering on the underlying stream. Both of these can be stated without regard to the stream being bidirectional or otherwise.
Note UNREAD-CHAR is specified only in situations where the next operation on the file is expected to be a READ. It also implies a buffer is used.
I suspect CMUCL and Corman use the same buffering strategy.
| FWIW: the code works as desired (overwrites #\a) in most of the Lisps | mentioned (I dont have ECL) iff you call file-position to set the file | marker whenever you change between reading and writing.
I believe the same effect would be achieved with conforming code if you forced output in between read and write operations on the stream via FINISH-OUTPUT.
i.e. if you replaced (file-position s (file-position s)) with (finish-output s) in the following code
| (write-line (print "abcdefghijk") s)
| (file-position s 0)
| (let (a)
|   (print (list
|     (file-position s)
|     (setf a (read-char s))
|     (file-position s)
|     (unread-char a s)
|     (file-position s (file-position s))
|     (write-char #\x s)
|     (file-position s (file-position s))
|     (read-char s)
|     (file-position s)
|     (read-char s))))
| (file-position s 0)
| (print (read-line s)))
|
| "abcdefghijk"
| (0 #\a 1 NIL 0 #\x 1 #\b 2 #\c)
| "xbcdefghijk"
|
| The results seem to show that reading and writing on a bidirectional
| stream *do* have different ideas of what the current position is.
The spec mentions buffering issues when describing FINISH-OUTPUT, FORCE-OUTPUT, and CLEAR-OUTPUT, but says "The precise actions of these functions are implementation-dependent."
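For anyone who wants to try the substitution, here is a minimal sketch of the test case with (finish-output s) in place of the two (file-position s (file-position s)) calls. The function name FINISH-OUTPUT-VARIANT and its PATH argument are made up for illustration, and READ-CHAR is given an EOF value so the form does not signal if an implementation leaves the read position at end of file. Since the actions of FINISH-OUTPUT are implementation-dependent, no particular result is claimed; the point is only to make the experiment easy to repeat.

```lisp
;; Hypothetical helper, not from the thread: the OP's test case with
;; FINISH-OUTPUT called when switching between reading and writing.
(defun finish-output-variant (path)
  "Run the test on PATH. Returns the trace list and the re-read first line."
  (with-open-file (s path :direction :io :if-exists :supersede)
    (write-line "abcdefghijk" s)
    (file-position s 0)
    (let* ((a nil)
           ;; LIST evaluates its arguments left to right, so the trace
           ;; records the operations in the order written.
           (trace (list (file-position s)
                        (setf a (read-char s))
                        (file-position s)
                        (unread-char a s)
                        (finish-output s)      ; read -> write switch
                        (write-char #\x s)
                        (finish-output s)      ; write -> read switch
                        (read-char s nil :eof) ; :EOF instead of an error
                        (file-position s)
                        (read-char s nil :eof))))
      (file-position s 0)
      (values trace (read-line s nil :eof)))))
```

Calling (finish-output-variant "foo.txt") and comparing the second value with "xbcdefghijk" shows whether a given implementation reconciles the positions this way.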
> > George> On Mon, 5 Jan 2009 01:10:31 -0800 (PST), Juanjo > > George> <juanjose.garciarip...@googlemail.com> wrote:
> > >> On Jan 4, 4:25 am, George Neuner <gneun...@comcast.net> wrote: > > >>> So I would say it's correct for the read and write positions to be > > >>> independent - if you want to be certain where you are you need to > > >>> track the position and explicitly set when changing between reading > > >>> and writing.
> > >> This is impossible following the Hyperspec because the FILE-POSITION > > >> function does not allow you to select which position marker (if there > > >> were two) you are updating.
> > George> You misunderstand.
> > George> I'm not claiming that a file has 2 position markers (that's beyond the > > George> spec). What I am saying is that it seems clear that a bidirectional > > George> stream *may* have both read and write position markers and that the 2 > > George> markers are not necessarily tied together. That makes sense because a > > George> stream can be bound to things other than files, e.g., sockets, pipes, > > George> IO channels, etc. for which independent read and write makes sense.
> > George> In particular the Hyperspec does not say that a bidirectional stream > > George> acts in any way differently from having independent input and output > > George> streams bound to the same file object. It only specifies that both > > George> input and output functionality will be available through a single > > George> stream object.
> >Not sure if this is intentional or not, but this is how CMUCL handles > >the test case:
> ECL and CLisp perform mostly as Juanjo expects they should. I would > argue that they are still wrong because in a perfect world unread-char > would decrement the file position and cause the #\a to be overwritten > instead of the #\b.
> Since SBCL does the same but is off-by-1, first guess is it has a bug.
> However CMUCL's and Corman's wildly different, yet identical, results > - Corman also appends #\x to the file - show that the issue of > bidirectional stream positioning on a file is far from settled.
> FWIW: the code works as desired (overwrites #\a) in most of the Lisps > mentioned (I don't have ECL) iff you call file-position to set the file > marker whenever you change between reading and writing.
> The results seem to show that reading and writing on a bidirectional > stream *do* have different ideas of what the current position is.
> George
Allegro CL's concept is that a file stream is a "single channel" stream, so that a bidirectional file stream nevertheless has only one file position, and that file position is reconciled when directions switch (with any accompanying flushing). Setting the file-position to the file-position should not be necessary in a file stream.
CL-USER(1): (with-open-file (s "foo.txt" :direction :io :if-exists :supersede)
              (write-line (print "abcdefghijk") s)
              (file-position s 0)
              (let (a)
                (print (list (file-position s)
                             (setf a (read-char s))
                             (file-position s)
                             (unread-char a s)
                             (file-position s)
                             (write-char #\x s)
                             (file-position s)
                             (read-char s)
                             (file-position s)
                             (read-char s))))
              (file-position s 0)
              (print (read-line s)))
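For implementations that do not reconcile the position on a direction switch, the workaround quoted above (setting the file position to itself whenever you change between reading and writing) can be wrapped in a small helper. This is only a sketch: RESYNC-POSITION and OVERWRITE-FIRST-CHAR are hypothetical names, not part of any implementation, and the thread reports that the technique works in most, not all, of the Lisps tested.

```lisp
;; Hypothetical helpers, not from the thread or any implementation.
(defun resync-position (stream)
  "Set STREAM's file position to the value FILE-POSITION currently reports.
Returns NIL if the position cannot be determined."
  (let ((pos (file-position stream)))
    (when pos
      (file-position stream pos))))

(defun overwrite-first-char (path char)
  "Write a test line to PATH, then try to overwrite its first character
with CHAR after a READ-CHAR / UNREAD-CHAR pair. Returns the re-read line."
  (with-open-file (s path :direction :io :if-exists :supersede)
    (write-line "abcdefghijk" s)
    (file-position s 0)
    (let ((a (read-char s)))
      (unread-char a s))
    (resync-position s)               ; switching from reading to writing
    (write-char char s)
    (resync-position s)               ; switching from writing to reading
    (file-position s 0)
    (read-line s)))
```

With (overwrite-first-char "foo.txt" #\x), the quoted results suggest most implementations return "xbcdefghijk", but nothing in the spec guarantees it.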
George Neuner <gneun...@comcast.net> wrote:
+---------------
| Raymond Toy <raymond....@ericsson.com> wrote:
| >Not sure if this is intentional or not, but this is how CMUCL handles
| >the test case:
| >
| >"abcdefghijk"
| >(0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
| >"abcdefghijk"
| >
| >If you look at the actual file, we find:
| >
| >abcdefghijk
| >x
| >
| >That is, the write-char actually appended to the file.
...
| and now CMUCL (you [Raymond] didn't say which version)
| "abcdefghijk"
| (0 #\a 1 NIL 0 #\x 1 #\a 2 #\b)
| "abcdefghijk"
+---------------
Doesn't seem to matter: CMUCL-18e, -19a-pre3, -19c, & -19e all give the same results.
-Rob
----- Rob Warnock <r...@rpw3.org> 627 26th Avenue <URL:http://rpw3.org/> San Mateo, CA 94403 (650)572-2607