On 2021-11-14 05:26, dxforth wrote:
> On 3/11/2021 18:48, Ruvim wrote:
>>
>> There's little ground for discussion without your draft for a new
>> specification ;-)
>
> Ok, here's the new spec. An implementation of which I've previously
> posted:
>
>
> https://pastebin.com/BT4UQ1Zu
[...]
[...]
Thank you!
In general, your rationale is reasonable. But there are some weak points
to discuss.
One weak point of this specification is that it allows a line terminator
to be split into two parts across reads. A user who wants to find the
line terminator in subsequent reads would then have to handle this
special case, but the user cannot know whether it has taken place.
For example, with a CRLF line terminator, your specification allows the
following scenario:
s" test.tmp" r/w create-file throw value h
s\" abc\r\l" h write-file throw
0. h reposition-file throw
pad 4 h read-line throw . dup . \ prints "1 4"
pad swap dump
\ prints: 61 62 63 0D
So the CR is read, but the LF is not.
The standard READ-LINE is specified in a way that does not allow this
scenario (or at least seems not to).
Also, the standard specifies that if an error occurs, then 'u2' is the
number of characters read. Your variant doesn't specify this important
clause (and your reference implementation relies on it when it ignores
the ior from 'reposition-file').
Of course, your specification can be corrected somehow in these regards.
>
> The spec is largely backward compatible with Forth-94 READ-LINE with 'n'
> replacing 'flag'. Existing applications should work; the exception being
> those that specifically tested 'flag' = TRUE. Such cases however are
> likely to be rare.
A more critical weak point is backward compatibility.
Besides 'flag', the meaning of 'u2' is changed too.
Previously, a program could compare 'u2' with 'u1' to detect whether a
complete line was read. With your version, such a program no longer
works.
For example, the following word "readout-file-line-resizeable" takes a
buffer and resizes it step by step until a complete line is read.
: readout-file-line-resizeable
    ( addr1 u1 fileid -- addr2 u2 flag ior )
  0 {: buf l0 h pos | flag ior :}
  \ reserve 2 chars for the line terminator; u1 must be at least 3
  l0 dup 2 - to l0  3 u< if buf 0 0 -24 exit then
  begin
    buf dup pos + l0 h read-line ( buf u flag ior )
    to ior to flag  dup pos + to pos ( buf u )
    l0 <> ior or if pos flag ior exit then \ complete line, or error
    pos l0 + 2 + resize dup if ( buf ior )
      pos true rot exit \ resize failed
    then ( buf2 0 ) drop to buf
  again
;
Obviously, this word will not work correctly with your READ-LINE.
So I think the proposed modification is too drastic for the existing
word, and a new name should be used.
The next question is about the returned values and their meaning.
What do you think about the new word using 'ior' to indicate that the
buffer is not sufficient to accept a complete line?
Rationale: if I want to read a file line by line, I probably should not
treat part of a line as a complete line. If I don't want to handle the
special case of an insufficient buffer, I can just throw an exception.
And when this case is handled, it doesn't matter which special value is
checked: 'n' or 'ior'.
I can suggest the following variant.
11.6.1.xxxx READ-FILE-LINE
( c-addr1 u1 fileid -- u2 flag ior )
If 'u1' is less than the length of the implementation-defined line
terminator sequence of characters, then 'u2' is zero, 'flag' is false,
and 'ior' is -80.
Otherwise, read as many characters as possible from the file specified
by 'fileid' into the data space region at the address 'c-addr1', such
that at the end the following conditions are met:
1. No more than 'u1' characters are read from the file.
2. No character is read after a line terminator sequence is read.
3. Either all characters of a line terminator sequence are read, or no
character of a possible line terminator sequence is read.
If anything was read (and the file position was changed), then 'flag'
is true; otherwise 'flag' is false, nothing was read from the file, and
the file position was not changed.
If some underlying I/O operation was not successful, then 'ior' is an
implementation-defined I/O result code, 'u2' is the number of characters
read, and conditions 2 and 3 may not be met.
Otherwise, if a line terminator sequence was read, then 'ior' is zero
and 'u2' is the number of characters read, excluding the line terminator
sequence.
Otherwise, if the end of file was reached before 'u1' characters were
read, then 'ior' is zero and 'u2' is the number of characters read.
Otherwise, 'ior' is -80 and 'u2' is the number of characters read.
The table of the result parameter combinations:
  'ior'  'flag'  'u2'
    0     -1     any   -- a complete line was read
    0      0      0    -- the end of file was reached
  -80      0      0    -- the buffer is shorter than a line terminator
  -80     -1     >0    -- the buffer is insufficient
  <>0      0      0    -- an exception occurred, nothing was read
  <>0     -1     >0    -- an exception occurred during reading
Other combinations are not allowed.
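To illustrate how a caller might dispatch on the table above, here is a
small sketch. The word name 'rfl-case' and the description strings are
hypothetical (not part of the proposal); it assumes -80 as the
insufficient-buffer code, as proposed here.

```forth
\ Hypothetical helper, not part of the proposal: map the results of
\ READ-FILE-LINE to the case names from the table of combinations.
-80 constant ior-insufficient-buffer

: rfl-case ( u2 flag ior -- c-addr u )
  dup 0= if drop                          ( u2 flag )
    if drop s" complete line" exit then
    drop s" end of file" exit
  then
  ior-insufficient-buffer = if            ( u2 flag )
    if drop s" buffer insufficient" exit then
    drop s" buffer shorter than line terminator" exit
  then                                    ( u2 flag )
  if drop s" exception during reading" exit then
  drop s" exception, nothing read"
;
```

A caller that does not want to handle the insufficient-buffer case can
simply pass 'ior' to THROW, as suggested above.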
A lazy implementation (just a proof of concept) follows.
-80 constant ior-insufficient-buffer

: move-file-position ( n fileid -- ior )
  over 0= if drop exit then
  dup >r file-position dup if nip nip nip rdrop exit then drop
  rot s>d d+ r> reposition-file
;

: read-file-line-lt ( addr1 u1 fileid sd.line-term -- u2 flag ior )
  {: buf u1 h lta ltl | u ior :}
  ltl 0= if 0 0 -71 exit then \ an empty line terminator is invalid
  u1 ltl u< if 0 false ior-insufficient-buffer exit then
  buf u1 h read-file to ior to u
  u 0= ior or if u dup 0<> ior exit then
  buf u lta ltl search 0= if nip ( u )
    dup u1 u< if true 0 exit then \ EOF is reached
    \ the last ltl-1 chars may start a terminator: push them back
    ltl 1- tuck - swap ( u2 n.unshift )
    ior-insufficient-buffer to ior
  else ( a3 u3 )
    \ push back everything after the terminator
    ltl - swap buf - swap
  then ( u2 n.unshift )
  negate h move-file-position dup if ( u2 ior )
    nip u true rot exit \ u2 is synced with the file position
  then drop ( u2 ) true ior
;

: read-file-line ( addr1 u1 fileid -- u2 flag ior )
  s\" \n" read-file-line-lt
;
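When no terminator is found in a full buffer, the lazy implementation
always pushes back 'ltl'-1 characters, so it may read fewer characters
than the specification allows. As a hedged sketch (the name 'lt-tail'
is hypothetical and not part of the proposal), the number of characters
to push back could instead be the length of the longest tail of the
buffer that is a proper prefix of the line terminator:

```forth
\ Hypothetical helper: length of the longest tail of the buffer addr u
\ that matches a proper prefix of the line terminator lta ltl.  Such a
\ tail may be the start of a terminator cut by the buffer boundary, so
\ it should be left unread (condition 3).
: lt-tail ( addr u lta ltl -- n )
  1- 0 max  2 pick min               ( addr u lta n )  \ max candidate
  begin dup while
    3 pick 3 pick +  over -  over    ( addr u lta n a.tail n )
    3 pick over compare 0= if        \ tail = terminator prefix?
      nip nip nip exit
    then
    1-                               ( addr u lta n-1 )
  repeat
  nip nip nip                        ( 0 )
;
```

In 'read-file-line-lt', the phrase "ltl 1- tuck - swap" could then be
replaced with "buf over lta ltl lt-tail tuck - swap", which leaves the
same ( u2 n.unshift ) pair on the stack. Since 'u1' is at least 'ltl',
'u2' stays greater than zero, as the table requires.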
Comments are welcome.
--
Ruvim