Proposed new READ-LINE specification

dxforth

Nov 13, 2021, 9:26:55 PM

On 3/11/2021 18:48, Ruvim wrote:
>
> There's little ground for discussion without your draft for a new
> specification ;-)

Ok, here's the new spec. An implementation of which I've previously posted:

https://pastebin.com/BT4UQ1Zu

The spec is largely backward compatible with Forth-94 READ-LINE with 'n'
replacing 'flag'. Existing applications should work; the exception being
those that specifically tested 'flag' = TRUE. Such cases however are
likely to be rare.

The key feature of the spec is the simplified buffer requirements compared
to Forth-94. It also guarantees users can test whether a completed or
partial line was received. Modifying READ-LINE on my system to the new
spec required the addition of one word. I expect it to be equally trivial
for other systems.

11.6.1.xxxx READ-LINE
FILE
( c-addr u1 fileid -- u2 n ior )

Read the next line from the file specified by fileid into memory at the address
c-addr. At most u1 characters are read which may include up to two implementation-
defined line-terminating characters at the end of the line. The line buffer
provided by c-addr shall be at least u1 characters and a minimum of 2.

If the operation succeeded, ior is zero and u2 is the number of characters
actually read, not including the line terminator. n is -1 if a line terminator
was received, or 1 otherwise.

If the operation is initiated when the value returned by FILE-POSITION is equal
to the value returned by FILE-SIZE for the file identified by fileid, n is zero,
ior is zero, and u2 is zero. If ior is non-zero, an exception occurred during
the operation and ior is the implementation-defined I/O result code.

An ambiguous condition exists if the operation is initiated when the value
returned by FILE-POSITION is greater than the value returned by FILE-SIZE for
the file identified by fileid, or if the requested operation attempts to read
portions of the file not written.

At the conclusion of the operation, FILE-POSITION returns the next file position
after the last character read.
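
As a rough illustration of how a caller might use the three-state 'n', here is a minimal sketch. TYPE-LINES is a made-up name, and the code assumes a READ-LINE that behaves as specified above. On a Forth-94 system the same code still compiles and runs, but 'n' is then only ever -1 or 0, so partial lines cannot be told apart.

```forth
\ Copy a text file to the terminal, joining partial reads into whole lines.
\ Assumes the proposed READ-LINE ( c-addr u1 fileid -- u2 n ior ).
: type-lines ( fileid -- )
   >r
   begin
      pad 80 r@ read-line throw   ( u2 n )
   dup while                      \ n = 0 means end of file
      swap pad swap type          ( n )
      0< if cr then               \ n = -1: terminator seen, finish the line
   repeat
   2drop  r> drop ;
```

PAD is used only because the standard guarantees it holds at least 84 characters, which covers the 80-character request under either spec.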

NN

Nov 14, 2021, 7:36:20 AM

Could you do a quick summary of what's different,
just in case I have missed something obvious?

I am not sure I can spot what's changed.

(1) The key feature of the spec is the simplified buffer requirements compared to Forth-94.

I don't follow how it's simplified.

(2) n is -1 if a line terminator was received, or 1 otherwise.

But if it's only 2 values, is it not just a boolean, aka a flag?

dxforth

Nov 14, 2021, 8:42:42 AM

On 14/11/2021 23:36, NN wrote:
>
> Could you do a quick summary of whats different
> just in case I have missed something obvious?
>
> I am not sure I can spot whats changed.
>
> (1) The key feature of the spec is the simplified buffer requirements compared to Forth-94.
>
> I dont follow how its simplified.

It shifts the burden from the user to READ-LINE.

Under ANS, the user had to remember to add 2 to u1 when assigning the buffer size.
If the user forgot or didn't know, the app might work on some systems and fail on
others due to buffer overwrite. In the revised spec u1 (a minimum of 2) is now
the buffer length and READ-LINE must not exceed it.
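
To make the difference concrete, here is a minimal sketch of the Forth-94 convention. MAX-LINE, LINE-BUF and GET-LINE are names made up for illustration; only READ-LINE comes from the standard.

```forth
\ Forth-94: u1 is the longest line wanted; the buffer must hold u1 + 2 chars.
80 constant max-line                        \ hypothetical line limit
create line-buf  max-line 2 + chars allot   \ the easily forgotten "+ 2"

: get-line ( fileid -- u2 flag )
   line-buf max-line rot read-line throw ;

\ Under the revised spec the whole buffer length would be passed instead:
\   line-buf  max-line 2 +  fileid  read-line   ( -- u2 n ior )
```

If the "2 +" is left out of the CREATE line, a Forth-94 system is still allowed to write up to two terminator characters past the end of the buffer, which is the overwrite described above.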

>
> (2) n is -1 if a line terminator was received, or 1 otherwise.
>
> But if its only 2 values, is it not just a boolean aka flag ?

Three values. When end-of-file is reached, n=0.

dxforth

Nov 14, 2021, 10:39:56 AM

On 14/11/2021 23:36, NN wrote:
>

> (2) n is -1 if a line terminator was received, or 1 otherwise.
>
> But if its only 2 values, is it not just a boolean aka flag ?

The second paragraph in the spec could have been clearer e.g.

"If the operation succeeded, ior is zero, n is non-zero and u2 is the number of
characters actually read, not including the line terminator. n is -1 if a line
terminator was received, or 1 if not received."

NN

Nov 14, 2021, 4:45:05 PM

Thank you for that explanation.

I agree it was a pain to remember to add the 2. But surely anyone
writing in Forth would look up the function for guidance.
If someone is not familiar with Forth, why would they attempt to guess?

I also wonder if it should be a minimum of 3 rather than 2,
because you want the ability to read at least 1 char.

Secondly,

( Forth standard suggests - )
flag = true  ==> not reached EOF
flag = false ==> EOF ( end-of-file )
if u1 = u2, there is more to read for that line
if u2 < u1, you read the line up to EOL ( end-of-line )

And your spec shifts that to 'n'
-1 = EOL
0 = EOF
1 = otherwise

They are both accomplishing the same thing, so I am undecided which is better.
Yours does make it clearer when you have reached EOL and EOF.
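
The mapping between the two schemes can be written down as a small word. This is only a sketch: LINE-STATUS is a made-up name, and it assumes the 'u2 < u1' reading of Forth-94, under which a final line that lacks a terminator looks the same as a terminated one.

```forth
\ Map Forth-94 READ-LINE results onto the proposed three-state n.
: line-status ( u1 u2 flag -- n )
   0= if  2drop 0 exit  then    \ flag false: end of file
   u> if  -1  else  1  then ;   \ u2 < u1: whole line; u2 = u1: partial line
```

The need to keep u1 around for the comparison is the extra bookkeeping that returning 'n' directly avoids.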

dxforth

Nov 14, 2021, 9:00:17 PM

On 15/11/2021 08:45, NN wrote:
>
> I agree it was a pain to remember to add the 2. But surely anyone
> writing in forth would look up the function for guidance.
> If someone is not familiar with forth why would they attempt to guess ?

Let's examine it from the user's perspective. It's not a guess to call
a read function with the size of the buffer one has assigned for the
task - it's expected. It's a convention that READ-FILE follows, as does
fgets in C, and presumably every other language.

Before asking users to do something that breaks convention and risks user
screw-ups, they deserve to be told why. ANS offered no explanation, nor
do we know there even was a rationale.

>
> I also wonder if it should be a minimum of 3 rather than 2 because
> because you want the ability to read at least 1 char.
>
> Secondly ,
>
> ( forth standard suggests - )
> flag= true - not reached EOF
> flag=false ==> EOF ( end-of-file)
> if u1=u2 there more to read for that line
> if u2 < u1 you read the line up to EOL ( end-of-line)
>
> And your spec shifts that to 'n'
> -1 = EOL
> 0 = EOF
> 1 = otherwise
>
> They are both accomplishing the same thing so I am undecided which is better.
> Your's does make it clearer when you have reached EOL and EOF.

The ANS 'u2 < u1' statement is open to interpretation. Again, if it were the
ANS intent that it be used to detect EOL, I would have expected them
to explicitly say so. For most users the issue is moot as it's a rarely
used feature.

Which spec is better? That depends on whether one is satisfied with ANS.
I wasn't for all the reasons given above.

dxforth

Nov 15, 2021, 12:34:36 AM

On 15/11/2021 08:45, NN wrote:
>

> I also wonder if it should be a minimum of 3 rather than 2 because
> because you want the ability to read at least 1 char.

It's somewhat academic as few applications will use buffers this small.
The reason I specified 2 rather than 1 is that eol detectors such as the one
in Swiftforth need to look ahead one character if a $0D is encountered.

BTW Swiftforth's READ-LINE always reads u1+1 characters and will overwrite
if a buffer of only u1 chars was supplied.

NN

Nov 15, 2021, 9:50:04 AM

On Monday, 15 November 2021 at 02:00:17 UTC, dxforth wrote:
> On 15/11/2021 08:45, NN wrote:
> >
> > I agree it was a pain to remember to add the 2. But surely anyone
> > writing in forth would look up the function for guidance.
> > If someone is not familiar with forth why would they attempt to guess ?
> Let's examine it from the user's perspective. It's not a guess to call
> a read function with the size of the buffer one has assigned for the
> task - it's expected. It's a convention that READ-FILE follows, as does
> fgets in C, and presumably every other language.
>
> Before asking users to do something that breaks convention and risks users
> screw-ups, they deserve to be told why. ANS offered no explanation, nor
> do we know there even was a rationale.
> >

( -------------------------------------------------------------------------------------------------- )

I don't know why it is the way it is. & I don't see it as breaking convention.
Was it different before forth 94 ?

As far as READ-LINE is concerned, I normally do something like:

-1 value fid
20 value b1-sz
( b1-sz buffer: b1 - in vfx )
variable b1 b1-sz allot

: opf ( -- ) s" test3.f" r/w open-file throw to fid ; ( 0 )

: rln ( -- )
  begin
    b1 b1-sz 2 - fid READ-LINE throw while ( 1 )
    dup >r b1 swap type
    r> b1-sz 2 - < if cr then ( 2 )
  repeat drop ; \ drop the u2 left over when READ-LINE reports EOF

: clf ( -- ) fid close-file throw -1 to fid ;

: start ( -- ) opf rln clf ;

( NOTES )
( 0 ) test3.f was the name of the file, so substitute whatever you use.
( 1 ) b1 b1-sz 2 - fid READ-LINE throw while
-- deduct 2
( 2 ) r> b1-sz 2 - < if cr then
-- deduct 2 for the check.

( -------------------------------------------------------------------------------------------------- )

> > I also wonder if it should be a minimum of 3 rather than 2 because
> > because you want the ability to read at least 1 char.
> >
> > Secondly ,
> >
> > ( forth standard suggests - )
> > flag= true - not reached EOF
> > flag=false ==> EOF ( end-of-file)
> > if u1=u2 there more to read for that line
> > if u2 < u1 you read the line up to EOL ( end-of-line)
> >
> > And your spec shifts that to 'n'
> > -1 = EOL
> > 0 = EOF
> > 1 = otherwise
> >
> > They are both accomplishing the same thing so I am undecided which is better.
> > Your's does make it clearer when you have reached EOL and EOF.

> ANS 'u2 < u1' statement is open to interpretation. Again, if it were

( -------------------------------------------------------------------------------------------------- )

I am not sure I understood this. Can you give an example where it's
open to interpretation?

( -------------------------------------------------------------------------------------------------- )

> ANS intent it should be used to detect EOL, I would have expected them
> to explicitly say so. For most users the issue is moot as it's a rarely
> used feature.
>
> Which spec is better? That depends on whether one is satisfied with ANS.
> I wasn't for all the reasons given above.

( -------------------------------------------------------------------------------------------------- )

ANS has proven adequate for my use.
Others might have a very different opinion.

( -------------------------------------------------------------------------------------------------- )

dxforth

Nov 15, 2021, 10:37:44 AM

On 16/11/2021 01:50, NN wrote:
> ...

> I don't know why it is the way it is. & I don't see it as breaking convention.
> Was it different before forth 94 ?

I don't know why it's the way it is either. It follows no convention that
I'm aware of. All it does is make users jump through hoops inserting 2 +/-
for no good reason anyone can fathom. If the option exists to correct it,
I'll do so. I didn't come to Forth to park my brain at the door.

> As far as READ-LINE is concerned , i normally do something like :
>
> eg
>
> -1 value fid
> 20 value b1-sz
> ( b1-sz buffer: b1 - in vfx )
> variable b1 b1-sz allot
>
> : opf ( -- ) s" test3.f" r/w open-file throw to fid ; ( 0 )
>
> : rln ( -- )
> begin
> b1 b1-sz 2 - fid READ-LINE throw while ( 1 )
> dup >r b1 swap type
> r> b1 2 - < if cr then ( 2 )
> repeat ;
>
> : clf ( -- ) fid close-file throw -1 to fid ;
>
> : start ( -- ) opf rln clf
>

I avoid reading partial lines for the very reason one must join them up.
It rather defeats the purpose of 'READ-A-LINE'.

NN

Nov 15, 2021, 10:42:15 AM

READ-FILE appears to be simpler than READ-LINE.
I have only used it to grab the full file rather than a line at a time.
I might change how I read lines from now on...

This is what I used :

-1 value fid
20 value b1-sz

variable b1 b1-sz allot

: opf ( -- ) s" test3.f" r/w open-file throw to fid ;

: rln ( -- )
  begin
    b1 b1-sz 2 - fid READ-LINE throw while
    dup >r b1 swap type
    r> b1-sz 2 - < if cr then
  repeat drop ; \ drop the u2 left over when READ-LINE reports EOF

: chk? ( -- f ) \ true when the file position has reached the file size
  fid file-size throw
  fid file-position throw
  d= ;

: rfl ( -- )
  begin
    b1 b1-sz fid READ-FILE throw
    b1 swap type
  chk? until ;

: clf ( -- ) fid close-file throw -1 to fid ;

: start1 ( -- ) opf rln clf ;
: start2 ( -- ) opf rfl clf ;

NN

Nov 15, 2021, 11:25:47 AM

The answer to that is simple: if you want to read complete lines, have a
buffer large enough to hold your longest line.

I don't know how often the situation arises where the lines are long, the
buffer is small, and you don't have a choice.

dxforth

Nov 15, 2021, 8:46:41 PM

What presumably every application writer has been doing.

>
> I dont know how often the situation arises where the lines are long , the
> buffer is small, and you dont have a choice.

There's no choice but to write your own read-a-line routine when your
system's READ-LINE doesn't support features necessary for a project.
ANS supported what most people needed, namely the ability to read an
entire line in one call. Given some folks have cited scenarios involving
absurdly long lines or absurdly small buffers, any new spec would need
to accommodate them if only to keep the peace.

Ruvim

Dec 24, 2021, 6:51:55 AM

On 2021-11-14 05:26, dxforth wrote:
> On 3/11/2021 18:48, Ruvim wrote:
>>
>> There's little ground for discussion without your draft for a new
>> specification ;-)
>
> Ok, here's the new spec. An implementation of which I've previously
> posted:
>
> https://pastebin.com/BT4UQ1Zu

[...]

[...]

Thank you!

In general, your rationale is reasonable. But there are some weak points
to discuss.

One weak point of this specification is that it allows a line terminator
to be broken up into two parts, and then a user should handle this
special case, if he wants to find a line terminator by the following
reads. But a user cannot know whether this case takes place or not.

For example, having a CRLF line terminator, the following scenario is
allowed by your specification:

s" test.tmp" r/w create-file throw value h
s\" abc\r\l" h write-file throw
0. h reposition-file throw
pad 4 h read-line throw . dup . \ prints "1 4"
pad swap dump
\ prints: 61 62 63 0D

So, CR is read, but LF is not read.

The standard READ-LINE is specified in a way that doesn't allow this
scenario (or seems so).

Also, the standard specifies that if an error occurs, then 'u2' is the
number of read characters. But your variant doesn't specify this
important clause (and your reference implementation relies on that when
it ignores ior from 'reposition-file').

Of course, your specification can be corrected somehow in these regards.

>
> The spec is largely backward compatible with Forth-94 READ-LINE with 'n'
> replacing 'flag'. Existing applications should work; the exception being
> those that specifically tested 'flag' = TRUE. Such cases however are
> likely to be rare.

A more critical weak point is backward compatibility.
Except 'flag', the meaning of 'u2' is also changed.

Before that a program compared 'u2' with 'u1' to detect whether a
completed line was read. But with your version such a program will not
work any more.

For example, the following word "readout-file-line-resizeable" takes a
buffer and resizes it step by step until a completed line is read.

: readout-file-line-resizeable
  ( addr1 u1 fileid -- addr2 u2 flag ior )
  0 {: buf l0 h pos | flag ior :}
  l0 dup 2- to l0 3 u< if buf 0 0 -24 exit then
  begin
    buf dup pos + l0 h read-line ( buf u flag ior )
    to ior to flag dup pos + to pos ( buf u )
    l0 <> ior or if pos flag ior exit then \ completed line, or err
    pos l0 + 2+ resize dup if ( buf ior )
      pos true rot exit
    then ( buf2 0 ) drop to buf
  again
;

Obviously, this word will work incorrectly with your READ-LINE.

So I think the proposed modification is too drastic for the old word,
and a new name should be used.

The next question is about the returned values and their meaning.

What do you think about the new word using 'ior' to indicate that a
buffer is not sufficient to accept a completed line?

Rationale: if I want to read a file line by line, then I probably should
not handle a part of a line as a completed line. And if I don't want to
handle this special case of insufficient buffer, I can just throw an
exception. But when this case is handled, it doesn't matter what to
check for a special value: 'n' or 'ior'.

I can suggest the following variant.

11.6.1.xxxx READ-FILE-LINE
( c-addr1 u1 fileid -- u2 flag ior )

If 'u1' is less than the length of an implementation defined line
terminator sequence of characters, then 'u2' is zero, 'flag' is false,
'ior' is -80.

Otherwise, try to read as many characters as possible from the file
specified by 'fileid' into the data space region at the address
'c-addr1', meeting the following conditions at the end:
1. Not more than 'u1' characters are read from the file.
2. No character is read after a line terminator sequence is read.
3. Either all characters of a line terminator sequence are read, or no
character of a possible line terminator sequence is read.

If something was read (and the file position was changed), then 'flag'
is true; otherwise 'flag' is false, nothing was read from the file (and
the file position was not changed).

If some underlying I/O operation was not successful, then 'ior' is an
implementation-defined I/O result code, 'u2' is the number of characters
read, and the conditions 2 and 3 may be not met.

Otherwise, if a line terminator sequence was read, then 'ior' is zero,
'u2' is the number of characters read excluding the line terminator
sequence.

Otherwise, if the end of file was reached before 'u1' characters were
read, then 'ior' is zero, 'u2' is the number of characters read.

Otherwise, ior is -80, 'u2' is the number of characters read.

The table of the result parameter combinations:

'ior'  'flag'  'u2'
  0     -1     any  -- a completed line was read
  0      0      0   -- the end of file is reached
-80      0      0   -- the buffer is smaller than a line terminator
-80     -1     >0   -- the buffer is insufficient
<>0      0      0   -- an exception occurred, nothing was read
<>0     -1     >0   -- an exception occurred during reading

Other combinations are not allowed.
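
A caller could fold the table into a single dispatch value along these lines. This is just a sketch: OUTCOME and the four constants are made-up names, the -80 code is the one suggested in this proposal, and both -80 rows are deliberately merged into one result.

```forth
\ Classify a READ-FILE-LINE result from 'flag' and 'ior' (u2 is not needed).
0 constant got-line        1 constant at-eof
2 constant short-buffer    3 constant io-error

: outcome ( flag ior -- n )
   dup 0= if                     \ ior = 0: flag separates line from EOF
      drop  if got-line else at-eof then  exit
   then
   nip -80 = if short-buffer else io-error then ;
```

Since 'flag' still says whether the file position moved, a caller that cares about the difference between the two -80 rows can test it before calling OUTCOME.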

A lazy implementation (just a proof of concept) follows.

-80 constant ior-insufficient-buffer

: move-file-position ( n fileid -- ior )
  over 0= if drop exit then
  dup >r file-position dup if nip nip nip rdrop exit then drop
  rot s>d d+ r> reposition-file
;

: read-file-line-lt ( addr1 u1 fileid sd.line-term -- u2 flag ior )
  {: buf u1 h lta ltl | u ior :}
  ltl 0= if 0 0 -71 exit then
  u1 ltl u< if 0 false ior-insufficient-buffer exit then
  buf u1 h read-file to ior to u
  u 0= ior or if u dup 0<> ior exit then
  buf u lta ltl search 0= if nip ( u )
    dup u1 u< if true 0 exit then \ EOF is reached
    ltl 1- tuck - swap ( u2 n.unshift )
    ior-insufficient-buffer to ior
  else ( a3 u3 )
    ltl - swap buf - swap
  then ( u2 n.unshift )
  negate h move-file-position dup if ( u2 ior )
    nip u true rot exit \ u2 is synced with the file position
  then drop ( u2 ) true ior
;

: read-file-line ( addr1 u1 fileid -- u2 flag ior )
  s\" \n" read-file-line-lt
;

Comments are welcome.

--
Ruvim

dxforth

Dec 25, 2021, 9:15:12 PM

On 24/12/2021 22:51, Ruvim wrote:
> ...

> One weak point of this specification is that it allows a line terminator
> to be broken up into two parts, and then a user should handle this
> special case, if he wants to find a line terminator by the following
> reads. But a user cannot know whether this case takes place or not.
>
> For example, having a CRLF line terminator, the following scenario is
> allowed by your specification:
>
> s" test.tmp" r/w create-file throw value h
> s\" abc\r\l" h write-file throw
> 0. h reposition-file throw
> pad 4 h read-line throw . dup . \ prints "1 4"
> pad swap dump
> \ prints: 61 62 63 0D
>
> So, CR is read, but LF is not read.

The spec states eol shall not be included in the count u2. For
implementations where eol can straddle two reads, either of the
following actions would be considered valid:

u2 n
-- --
3 -1 first read
0 0 second read

u2 n
-- --
3 1 first read
0 -1 second read
0 0 third read

> ...

> Also, the standard specifies that if an error occurs, then 'u2' is the
> number of read characters. But your variant doesn't specify this
> important clause (and your reference implementation relies on that when
> it ignores ior from 'reposition-file').

I can't agree with any of that.

> ...

> A more critical weak point is backward compatibility.
> Except 'flag', the meaning of 'u2' is also changed.
>
> Before that a program compared 'u2' with 'u1' to detect whether a
> completed line was read. But with your version such a program will not
> work any more.

Yes and it's a change for the better. No longer do you need to retain
u1 in order to compare it with u2. No longer do you need to remember to
allocate u1 + 2 bytes for the buffer.

Ruvim

Dec 26, 2021, 2:08:32 AM

On 2021-12-26 05:15, dxforth wrote:
> On 24/12/2021 22:51, Ruvim wrote:
>> ...
>> One weak point of this specification is that it allows a line terminator
>> to be broken up into two parts, and then a user should handle this
>> special case, if he wants to find a line terminator by the following
>> reads. But a user cannot know whether this case takes place or not.
>>
>> For example, having a CRLF line terminator, the following scenario is
>> allowed by your specification:
>>
>>     s" test.tmp" r/w create-file throw value h
>>     s\" abc\r\l" h write-file throw
>>     0. h reposition-file throw
>>     pad 4 h read-line throw . dup . \ prints "1 4"
>>     pad swap dump
>>     \ prints: 61 62 63 0D
>>
>> So, CR is read, but LF is not read.

(1)

>
> The spec states eol shall not be included in the count u2.

0D is not an EOL sequence, so this condition is met.

What character comes next after 0D is unknown without additional reads.

> For implementations where eol can straddle two reads,
> either of the following actions would be considered valid:
>
> u2   n
> -- --
> 3   -1   first read
> 0    0   second read

This variant is not valid, since 5 characters are read (it can be tested
via FILE-POSITION), but the spec says that not more than u1 (that is 4)
characters are read.

> u2   n
> -- --
> 3    1   first read
> 0   -1   second read
> 0    0   third read

Yes, it's valid. But the question is, why (1) is not valid.

>> ...
>> Also, the standard specifies that if an error occurs, then 'u2' is the
>> number of read characters. But your variant doesn't specify this
>> important clause (and your reference implementation relies on that when
>> it ignores ior from 'reposition-file').
>
> I can't agree with any of that.

You can't agree that your spec doesn't specify what u2 means in the case
of error? Could you please quote the relevant part then?

>> ...
>> A more critical weak point is backward compatibility.
>> Except 'flag', the meaning of 'u2' is also changed.
>>
>> Before that a program compared 'u2' with 'u1' to detect whether a
>> completed line was read. But with your version such a program will not
>> work any more.
>
> Yes and it's a change for the better.

I don't object that it's better. My point here is that a new name should
be used, since otherwise a correct program will become incorrect. And I
provided an example of such a program.

> No longer do you need to retain
> u1 in order to compare it with u2. No longer do you need to remember to
> allocate u1 + 2 bytes for the buffer.

--
Ruvim

dxforth

Dec 26, 2021, 7:22:43 AM

On 26/12/2021 18:08, Ruvim wrote:
> On 2021-12-26 05:15, dxforth wrote:
>> On 24/12/2021 22:51, Ruvim wrote:
>>> ...
>>> One weak point of this specification is that it allows a line terminator
>>> to be broken up into two parts, and then a user should handle this
>>> special case, if he wants to find a line terminator by the following
>>> reads. But a user cannot know whether this case takes place or not.
>>>
>>> For example, having a CRLF line terminator, the following scenario is
>>> allowed by your specification:
>>>
>>>     s" test.tmp" r/w create-file throw value h
>>>     s\" abc\r\l" h write-file throw
>>>     0. h reposition-file throw
>>>     pad 4 h read-line throw . dup . \ prints "1 4"
>>>     pad swap dump
>>>     \ prints: 61 62 63 0D
>>>
>>> So, CR is read, but LF is not read.
> (1)
>
>>
>> The spec states eol shall not be included in the count u2.
>
> 0D is not an EOL sequence, so this condition is met.
>
> What character is next after 0D -- is unknown without additional reads.

READ-LINE parses lines of text delimited by eol. Implementers are not
entitled to split an eol and treat it as separate characters. If you
want to parse characters use READ-FILE.

>
>
>> For implementations where eol can straddle two reads,
>> either of the following actions would be considered valid:
>>
>> u2   n
>> -- --
>> 3   -1   first read
>> 0    0   second read
>
> This variant is not valid, since 5 characters are read (it can be tested
> via FILE-POSITION), but the spec says than not more than u1 (that is 4)
> characters are read.

In my spec 'u1' is the size of the line buffer - which READ-LINE fills as
necessary. Your example suggests 4 characters were read - 'abc' and the
beginning of an eol. Having found the beginning of an eol, an implementation
is entitled to say eol was received. That is what is indicated in the
results shown above.

>
>> u2   n
>> -- --
>> 3    1   first read
>> 0   -1   second read
>> 0    0   third read
>
>
> Yes, it's valid. But the question is, why (1) is not valid.
>
>
>
>>> ...
>>> Also, the standard specifies that if an error occurs, then 'u2' is the
>>> number of read characters. But your variant doesn't specify this
>>> important clause (and your reference implementation relies on that when
>>> it ignores ior from 'reposition-file').
>>
>> I can't agree with any of that.
>
> You can't agree that your spec doesn't specify what u2 mean in the case
> of error? Could you please quote the relevant part then?

I see no mention of u2:

"If ior is non-zero, an exception occurred during the operation and ior
is the implementation-defined I/O result code."

>>> ...
>>> A more critical weak point is backward compatibility.
>>> Except 'flag', the meaning of 'u2' is also changed.
>>>
>>> Before that a program compared 'u2' with 'u1' to detect whether a
>>> completed line was read. But with your version such a program will not
>>> work any more.
>>
>> Yes and it's a change for the better.
>
> I don't object that it's better. My point here is that a new name should
> be, since otherwise a correct program will become incorrect. And I
> provided an example of such a program.

No TC has certified those programs are 'correct' and likely never will.
So the decision of supporting them falls back to you.

Ruvim

Dec 27, 2021, 9:38:59 AM

It's unclear from your specification. It should be testable, and it
should explicitly follow from the spec, I think.

> If you want to parse characters use READ-FILE.

It's irrelevant to the problem I pointed to.

>>
>>
>>> For implementations where eol can straddle two reads,
>>> either of the following actions would be considered valid:
>>>
>>>   u2   n
>>>   -- --
>>>   3   -1   first read
>>>   0    0   second read
>>
>> This variant is not valid, since 5 characters are read (it can be tested
>> via FILE-POSITION), but the spec says than not more than u1 (that is 4)
>> characters are read.
>
> In my spec 'u1' is the size of the line buffer - which READ-LINE fills as
> necessary. Your example suggests 4 characters were read - 'abc' and the
> beginning of an eol. Having found the beginning of an eol, an
> implementation is entitled to say eol was received.

It's unclear, in this example, do you mean that 5 characters are read on
the first read?

Or 4 on the first read, and 1 on the second read?

I.e., what position would be returned by FILE-POSITION after each read
(i.e. each call of READ-LINE)?

> That is what is indicated in the results shown above.
>

>>>> ...
>>>> Also, the standard specifies that if an error occurs, then 'u2' is the
>>>> number of read characters. But your variant doesn't specify this
>>>> important clause (and your reference implementation relies on that when
>>>> it ignores ior from 'reposition-file').
>>>
>>> I can't agree with any of that.
>>
>> You can't agree that your spec doesn't specify what u2 mean in the case
>> of error? Could you please quote the relevant part then?
>
> I see no mention of u2:
>
> "If ior is non-zero, an exception occurred during the operation and ior
> is the implementation-defined I/O result code."

You are right, it's my oversight. I actually read it in the spec for
"READ-FILE":

"If an exception occurs, ior is the implementation-defined I/O result
code, and u2 is the number of characters transferred to c-addr without
an exception"

Probably it's an omission in READ-LINE, since at the moment the meaning
of u2 doesn't depend on whether an error occurred or not.

I believe the meaning of u2 in the case of an error should be
explicitly specified for READ-LINE (or its replacement) too.

>>>> ...
>>>> A more critical weak point is backward compatibility.
>>>> Except 'flag', the meaning of 'u2' is also changed.
>>>>
>>>> Before that a program compared 'u2' with 'u1' to detect whether a
>>>> completed line was read. But with your version such a program will not
>>>> work any more.
>>>
>>> Yes and it's a change for the better.
>>
>> I don't object that it's better. My point here is that a new name should
>> be, since otherwise a correct program will become incorrect. And I
>> provided an example of such a program.
>
> No TC has certified those programs are 'correct' and likely never will.

It's irrelevant.

> So the decision of supporting them falls back to you.

You seem to agree that the changes you propose will break backward
compatibility of a system, and some standard programs will stop working.

What is your rationale for preferring this variant (i.e. changing the behavior
of the existing standard word) over introducing a new word?

--
Ruvim

dxforth

Dec 28, 2021, 4:40:27 AM

The first line of ANS states the objective - and by implication - what needs
to be tested:

"Read the next line from the file [...] into memory at the address c-addr."

The definition of "line" being:

"line: A sequence of characters followed by an actual or implied line
terminator."

If one accepts input to READ-LINE consists of 'text separated by line
terminators' then the output [within the size limit of the buffer] cannot
be anything different. Interpretations which result in output different
from the input are, IMO, contrary to ANS' objective.

>
>
>> If you want to parse characters use READ-FILE.
>
> It's irrelevant to the problem I pointed to.

It's relevant given you asked "What character is next after 0D" - which is
to think in terms of a stream of characters as opposed to text separated
by line terminators.

> ...

>>> You can't agree that your spec doesn't specify what u2 mean in the case
>>> of error? Could you please quote the relevant part then?
>>
>> I see no mention of u2:
>>
>> "If ior is non-zero, an exception occurred during the operation and ior
>> is the implementation-defined I/O result code."
>
> You are right, it's my oversight. I actually read it in the spec for
> "READ-FILE":
>
> "If an exception occurs, ior is the implementation-defined I/O result
> code, and u2 is the number of characters transferred to c-addr without
> an exception"
>
> Probably it's an omission in READ-LINE. Since an the moment the meaning
> of u2 doesn't depend whether an error occurred or not.

AFAIK it's only for READ-FILE which has to handle random access files where
one may encounter gaps (aka sparse files). READ-LINE only handles text files
which are sequential and have no gaps.

>
>>>>> ...
>>>>> A more critical weak point is backward compatibility.
>>>>> Except 'flag', the meaning of 'u2' is also changed.
>>>>>
>>>>> Before that a program compared 'u2' with 'u1' to detect whether a
>>>>> completed line was read. But with your version such a program will not
>>>>> work any more.
>>>>
>>>> Yes and it's a change for the better.
>>>
>>> I don't object that it's better. My point here is that a new name should
>>> be used, since otherwise a correct program will become incorrect. And I
>>> provided an example of such a program.
>>
>> No TC has certified those programs are 'correct' and likely never will.
>
> It's irrelevant.
>
>
>> So the decision of supporting them falls back to you.
>
> You seem to agree that the changes you propose will break backward
> compatibility of a system, and some standard programs will stop working.

I'm not convinced ANS supports detection of partial lines. If someone
really needed the feature and wanted it to be portable, it would be
wiser to write the routine from ground up than rely on some implementer's
interpretation of ANS.
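
A minimal sketch of what such a ground-up routine might look like, built
only on READ-FILE. It is not the implementation posted on pastebin. GET-LINE
and the GL- names are hypothetical, the text is assumed to be LF-terminated
(a CR before the LF is kept as data), and the -1/1/0 convention for n is
borrowed from the proposed spec. Unlike the proposed spec, u1 here counts
data characters only.

```forth
\ GET-LINE: hypothetical ground-up line reader using only READ-FILE.
\ n = -1 line terminator seen
\ n =  1 buffer filled, or end of file reached, without a terminator
\ n =  0 end of file and nothing read (also returned with a non-zero ior)
VARIABLE GL-FID                 \ file being read
VARIABLE GL-LEN                 \ characters stored so far
CREATE GL-CH 1 CHARS ALLOT      \ one-character transfer buffer

: GET-LINE ( c-addr u1 fileid -- u2 n ior )
   GL-FID !  0 GL-LEN !                          ( c-addr u1 )
   BEGIN
      DUP GL-LEN @ >                             \ room left in the buffer?
   WHILE
      GL-CH 1 GL-FID @ READ-FILE                 ( c-addr u1 u ior )
      ?DUP IF                                    \ I/O error: report it
         >R DROP 2DROP  GL-LEN @ 0 R> EXIT
      THEN
      0= IF                                      \ nothing read: end of file
         2DROP  GL-LEN @ DUP 0<> 1 AND  0 EXIT
      THEN
      GL-CH C@ 10 = IF                           \ LF ends the line
         2DROP  GL-LEN @ -1 0 EXIT
      THEN
      OVER GL-LEN @ CHARS +  GL-CH C@ SWAP C!    \ store the character
      1 GL-LEN +!
   REPEAT
   2DROP  GL-LEN @ 1 0 ;                         \ buffer full, no terminator yet
```

Reading one character per READ-FILE call is slow; it is used here only to
keep the logic visible. A line of exactly u1 characters is reported as
partial, and the following call returns u2 = 0 with n = -1.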

>
> What is your rationale to prefer this variant (i.e. changing behavior of
> the existing standard word) over introducing a new word?

As with my REPRESENT variant its specs are better than ANS while retaining
essential features and interface. I wrote it for myself and considered it
so good I put it in the public domain. Isn't that what you do?

Ruvim

Jan 5, 2022, 4:09:45 PM

On 2021-12-28 12:40, dxforth wrote:
> On 28/12/2021 01:38, Ruvim wrote:

>> It's unclear from your specification.

[...]

> If one accepts input to READ-LINE consists of 'text separated by line
> terminators' then the output [within the size limit of the buffer] cannot
> be anything different.

I consider a case when a line with a line terminator doesn't fit the
buffer. In this case the buffer cannot contain a line.

Also I asked the following question concerning my example:
What position would be returned by FILE-POSITION after each read
(i.e. each call of READ-LINE)?

This question remains unanswered.

> Interpretations which result in output different
> from the input are, IMO, contrary to ANS' objective.

If you mean to read a file (an input) with READ-LINE and write the text
into another file (an output) via WRITE-FILE and WRITE-LINE, in my
example the output will be the same as the input. So it's irrelevant to
the problem I pointed to.

In any case, one of your complaints about the current specification is
that it allows different interpretations. But then it's strange that you
are reluctant to make your specification more clear.

>>> So the decision of supporting them falls back to you.
>>
>> You seem to agree that the changes you propose will break backward
>> compatibility of a system, and some standard programs will stop working.
>
> I'm not convinced ANS supports detection of partial lines.

And an official clarification (RFI 0001 [1]) is not persuasive enough?

| u2=u1, flag=true, ior=zero
| A partial line was read; the rest would not fit in the buffer,
| and can be acquired by additional calls to READ-LINE.

Then what can change your mind?

[1] ANS Forth RFI 0001: READ-LINE
http://www.complang.tuwien.ac.at/forth/dpans-html/a0001.htm
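
A minimal sketch of the kind of Forth-94 program that relies on this case,
assuming only the behaviour quoted above. LINE-STATUS, SHOW-LINES, CHUNK and
/CHUNK are hypothetical names, not standard words. The buffer is sized
u1 + 2 characters, as Forth-94 READ-LINE requires.

```forth
\ Classify a successful Forth-94 READ-LINE result (ior already checked).
\ n =  0 end of file
\ n =  1 u2 = u1: partial line, the rest is still in the file
\ n = -1 the line ended before the buffer filled
: LINE-STATUS ( u1 u2 flag -- n )
   0= IF 2DROP 0 EXIT THEN
   = IF 1 ELSE -1 THEN ;

80 CONSTANT /CHUNK                       \ hypothetical chunk size (u1)
CREATE CHUNK /CHUNK 2 + CHARS ALLOT      \ u1 + 2 characters

\ Type a text file, reading long lines in several chunks.
: SHOW-LINES ( fileid -- )
   >R
   BEGIN
      CHUNK /CHUNK R@ READ-LINE THROW    ( u2 flag )
   WHILE                                 ( u2 )
      DUP CHUNK SWAP TYPE                \ show what was read
      /CHUNK <> IF CR THEN               \ u2 < u1: the line is complete
   REPEAT
   DROP  R> DROP ;
```

Under the proposed specification the test `/CHUNK <>` would have to be
replaced by a test on n, which is the incompatibility I am pointing to.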

> If someone
> really needed the feature and wanted it to be portable, it would be
> wiser to write the routine from ground up than rely on some implementer's
> interpretation of ANS.

So your assumption is that Forth-systems usually provide a broken
"READ-LINE" in this regard, isn't it?

And your next assumption is that implementers will change their
READ-LINE according to a new specification (even if they don't want to
fix a broken READ-LINE according to the current specification)?

These assumptions are not well founded at the moment, and I can only
expect a far greater mess in implementations.

>>
>> What is your rationale to prefer this variant (i.e. changing behavior of
>> the existing standard word) over introducing a new word?
>
> As with my REPRESENT variant its specs are better than ANS while retaining
> essential features and interface.

I don't know anything concerning your REPRESENT, but your READ-LINE
doesn't retain the essential interface.

> I wrote it for myself and considered it
> so good I put it in the public domain. Isn't that what you do?

--
Ruvim

dxforth

Jan 5, 2022, 6:27:35 PM

On 6/01/2022 08:09, Ruvim wrote:
> On 2021-12-28 12:40, dxforth wrote:
>> On 28/12/2021 01:38, Ruvim wrote:
>
>>> It's unclear from your specification.
> [...]
>
>> If one accepts input to READ-LINE consists of 'text separated by line
>> terminators' then the output [within the size limit of the buffer] cannot
>> be anything different.
>
> I consider a case when a line with a line terminator doesn't fit the
> buffer. In this case the buffer cannot contain a line.

That you don't know whether a line terminator is in the buffer is a
false dichotomy.

>
> Also I asked the following question concerning my example:
> What position would be returned by FILE-POSITION after each read
> (i.e. each call of READ-LINE)?
>
> This question remains unanswered.

Convention informs you. For

a) MAC & UNIX - it's one character after the beginning of the eol
b) DOS/Windows - it's two characters after the beginning of the eol

Needless to say an ambiguous condition exists where the input text
does not conform to convention regarding eol.
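
Anyone wanting to see what their own system does can check with something
like the following sketch. .POSITIONS, LBUF and /LBUF are hypothetical
names, and the buffer is assumed large enough that every line fits whole.

```forth
128 CONSTANT /LBUF                      \ hypothetical line limit (u1)
CREATE LBUF /LBUF 2 + CHARS ALLOT       \ u1 + 2 characters for Forth-94

\ Print the character count and FILE-POSITION after each READ-LINE.
: .POSITIONS ( fileid -- )
   >R
   BEGIN
      LBUF /LBUF R@ READ-LINE THROW     ( u2 flag )
   WHILE                                ( u2 )
      ." read " U.
      R@ FILE-POSITION THROW            ( ud )
      ." chars, position now " <# #S #> TYPE CR
   REPEAT
   DROP  R> DROP ;

\ Usage (file name is a placeholder):
\   S" sample.txt" R/O OPEN-FILE THROW  DUP .POSITIONS  CLOSE-FILE THROW
```

For a UNIX-convention file holding "ab" LF "cde" LF, the convention above
predicts positions 3 and 7 after the two reads.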

>
>> Interpretations which result in output different
>> from the input are, IMO, contrary to ANS' objective.
>
> If you mean to read a file (an input) with READ-LINE and write the text
> into another file (an output) via WRITE-FILE and WRITE-LINE, in my
> example the output will be the same as the input. So it's irrelevant to
> the problem I pointed to.
>
> In any case, one of your complaints about the current specification is
> that it allows different interpretations. But then it's strange that you
> are reluctant to make your specification more clear.

I meant your example - which showed an eol going into READ-LINE and
something else coming out. Such a result is clearly wrong. It's not
the role of a spec to rule out every nonsensical interpretation - it's
the implementer's responsibility to do so assuming he is serious.