Progressing Matthias Trute's recognizer proposal

Alex McDonald

May 27, 2020, 2:15:40 PM

After Matthias' untimely passing, his proposal for recognizers has
stalled and I'm not aware of any discussion since his death. It would be
a shame if the work he did went to waste and wasn't carried forward.

http://amforth.sourceforge.net/pr/Recognizer-rfc-D.html was (I believe)
the last proposal he made, and I would like to take it forward on his
behalf. If there are no objections, I'd like to further refine it and
create another set of RFCs based on my and others' experiences of using
recognizers.

This may not match some of the current implementations in the wild; for
example, decisions made by gforth to do with an "automatic" action for
postpone actions based on the compile action (that is, having two
actions rather than three).

It will also attempt, in no specific order: to reduce the number of
words required to implement the proposal; do some significant bike
shedding around names (for example, removing the ambiguity of RECTYPE
and avoiding words like or containing NULL); remove the requirement for
a fixed name REC-NUM REC-FLOAT; be less prescriptive and more
descriptive to allow greater implementation flexibility; and so on.

Polite comments welcome.

--
Alex

Ruvim

May 27, 2020, 2:29:59 PM

On 2020-05-27 21:15, Alex McDonald wrote:
> After Matthias' untimely passing, his proposal for recognizers has
> stalled and I'm not aware of any discussion since his death.

UlrichHoffmann proposed "Recognizer RfD rephrase" on 2020-02-24

See at https://forth-standard.org/standard/intro#contribution-131

Alex McDonald

May 27, 2020, 2:44:05 PM

On 27-May-20 19:29, Ruvim wrote:
> On 2020-05-27 21:15, Alex McDonald wrote:
>> After Matthias' untimely passing, his proposal for recognizers has
>> stalled and I'm not aware of any discussion since his death.
>
> UlrichHoffmann proposed "Recognizer RfD rephrase" on 2020-02-24
>
> See at https://forth-standard.org/standard/intro#contribution-131

Thank you very much, I missed that. (The website is a great place to
document the standard but a horrible place to propose & discuss as this
kind of effort gets completely lost.)

Does Ulrich Hoffmann frequent clf?

>
>
>> It would be a shame if the work he did went to waste and wasn't
>> carried forward.
>>
>> http://amforth.sourceforge.net/pr/Recognizer-rfc-D.html was (I
>> believe) the last proposal he made, and I would like to take it
>> forward on his behalf. If there are no objections, I'd like to further
>> refine it and create another set of RFCs based on my and others'
>> experiences of using recognizers.
>>
>> This may not match some of the current implementations in the wild;
>> for example, decisions made by gforth to do with an "automatic" action
>> for postpone actions based on the compile action (that is, having two
>> actions rather than three).
>>
>> It will also attempt, in no specific order: to reduce the number of
>> words required to implement the proposal; do some significant bike
>> shedding around names (for example, removing the ambiguity of RECTYPE
>> and avoiding words like or containing NULL); remove the requirement
>> for a fixed name REC-NUM REC-FLOAT; be less prescriptive and more
>> descriptive to allow greater implementation flexibility; and so on.
>>
>> Polite comments welcome.
>>
>

--
Alex

A. K.

May 27, 2020, 3:07:50 PM

Thank you for taking this up, in memory of Matthias and for the sake of
Forth, which is a fascinating, never-boring language.

From where I stand, I welcome making the proposal simpler than it is today,
particularly getting rid of the "postpone stuff". Although I acknowledge
that some people need it for their special field of work, I am also of the
opinion that one should be careful about standardizing rarely used or exotic
things without a broad user base.

Ruvim

May 27, 2020, 3:12:17 PM

On 2020-05-27 21:44, Alex McDonald wrote:
> On 27-May-20 19:29, Ruvim wrote:
>> On 2020-05-27 21:15, Alex McDonald wrote:
>>> After Matthias' untimely passing, his proposal for recognizers has
>>> stalled and I'm not aware of any discussion since his death.
>>
>> UlrichHoffmann proposed "Recognizer RfD rephrase" on 2020-02-24
>>
>> See at https://forth-standard.org/standard/intro#contribution-131
>
> Thank you very much, I missed that. (The website is a great place to
> document the standard but a horrible place to propose & discuss as this
> kind of effort gets completely lost.)

I would suggest (and have suggested in the past) using the GitHub platform
for proposals and discussions of them. I believe that version history,
commits, reviews and comments are very useful in designing a specification.

Regarding forth-standard.org — it supports mail notifications (with some
lag) and an Atom feed at https://forth-standard.org/feeds/contributions

--
Ruvim

Jurgen Pitaske

May 27, 2020, 4:13:34 PM

I sent him an email, copying in the text here;
let's see how busy he is
and when he has time to answer.

Anton Ertl

May 28, 2020, 3:58:46 AM

Alex McDonald <al...@rivadpm.com> writes:
>After Matthias' untimely passing, his proposal for recognizers has
>stalled and I'm not aware of any discussion since his death. It would be
>a shame if the work he did went to waste and wasn't carried forward.
>
>http://amforth.sourceforge.net/pr/Recognizer-rfc-D.html was (I believe)
>the last proposal he made, and I would like to take it forward on his
>behalf.

You are very brave.

Bernd Paysan has been tasked with picking it up
<http://www.forth200x.org/meetings/2019-notes.html#a-recognizers>.
However, AFAIK he has not done anything yet. In any case, it would be
a good idea to contact him.

>It will also attempt, in no specific order: to reduce the number of
>words required to implement the proposal; do some significant bike
>shedding around names (for example, removing the ambiguity of RECTYPE
>and avoiding words like or containing NULL);

Note that the current names are the result of a bikeshedding session
at a Forth 200x meeting. The fact that you are not happy with the
result and that even a committee member who was present has second
thoughts about it shows the pointlessness of discussing names (that's
why it's called bikeshedding).

And the big problem is that it detracts from more substantial issues.

>remove the requirement for
>a fixed name REC-NUM REC-FLOAT

I think that one of the failures of Forth-94 (and 2012) is that there
is no default search order (and that's despite having standard
wordlist names). The problem is that there are cases where a program
wants to change the existing search order, but the result is
system-specific.

I hope that the recognizer proposal will eventually be tight enough to
avoid this problem. I think we need fixed-name standard recognizers
to achieve this.

I am pretty unhappy about REC-NUM, and would rather prefer that
single-cell and double-cell recognizers be separated, but REC-NUM is
the easiest way to transition from NUMBER?; separate recognizers would
require more substantial refactorings in many Forth systems, possibly
affecting more lines than implementing recognizers themselves. Given
how many years it has taken until, e.g., Stephen Pelc has looked at
recognizers, I am not sure we want to add another five years to get
rid of REC-NUM.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2019: http://www.euroforth.org/ef19/papers/

Anton Ertl

May 28, 2020, 4:15:22 AM

Ruvim <ruvim...@gmail.com> writes:
>On 2020-05-27 21:15, Alex McDonald wrote:
>> After Matthias' untimely passing, his proposal for recognizers has
>> stalled and I'm not aware of any discussion since his death.
>
>UlrichHoffmann proposed "Recognizer RfD rephrase" on 2020-02-24
>
>See at https://forth-standard.org/standard/intro#contribution-131

I don't think it's a good idea to rename concepts in the middle of the
discussion. This has been done earlier, and as a result, it is now
difficult to compare different versions of the proposal.

Ulrich Hoffmann thinks that the discussion is at the end (in that case
one might think about renaming; it still means that people will have
difficulty when they try to understand the development of the
proposal), but Alex McDonald apparently does not think so.

Anton Ertl

May 28, 2020, 4:51:04 AM

Alex McDonald <al...@rivadpm.com> writes:
>http://amforth.sourceforge.net/pr/Recognizer-rfc-D.html was (I believe)
>the last proposal he made, and I would like to take it forward on his
>behalf. If there are no objections, I'd like to further refine it and
>create another set of RFCs based on my and others' experiences of using
>recognizers.
>
>This may not match some of the current implementations in the wild; for
>example, decisions made by gforth to do with an "automatic" action for
>postpone actions based on the compile action (that is, having two
>actions rather than three).

Gforth has RECTYPE>INT RECTYPE>COMP RECTYPE>POST, just as described in
<http://amforth.sourceforge.net/pr/Recognizer-rfc-D.html>.

If you just want to recognize literals, you need just one action: The
time-shifting action like POSTPONE LITERAL (that's coming out of
RECTYPE>POST); you don't do anything to the literal (that the
recognizer left, e.g., on the data stack) interpretively, you
time-shift it once for compilation, and you time-shift it twice for
postponing. SwiftForth has a recognizer-like extension mechanism that
provides a single xt: that for the time-shifting action.

If you want a recognizer for normal words, or, e.g., for TO
replacement syntax ->X, you also need a run-time action; for normal
words the run-time action is EXECUTE, for ->X, where X is a VALUE, it
is "!". I.e., after time-shifting the xt (for a word recognizer) or
address (for a ->X recognizer) 0-2 times, you perform the run-time
action (also appropriately time-shifted). In the proposal we have a
one-time-shifted run-time action in the form of RECTYPE>COMP (for
literals the result is the same action as the time-shifting action).

If you want a recognizer that can also handle immediate and combined
words, these two actions are insufficient. For a combined word, the
interpretation semantics is not systematically related to the
compilation semantics, so it needs a separate action. And that's what
RECTYPE>INT provides. For literals it is noop, for ->X it's the
run-time action without time-shift (i.e., "!" for a VALUE), for words
(both normal and combined) it's performing the interpretation
semantics (i.e., for an nt-based word recognizer the action consists
essentially of NAME>INTERPRET EXECUTE); for combined words the
RECTYPE>COMP action performs the compilation semantics.
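
As a minimal sketch of the three-action idea (all names below are
hypothetical, chosen so as not to clash with the words in the proposal or
in Gforth), a token type can be modelled as a table of three xts, with the
literal case filled in:

```forth
\ Sketch only; hypothetical names, not the proposal's words.
: toktype: ( xt-int xt-comp xt-post "name" -- )
  create swap rot , , , ;              \ stored in the order int, comp, post
: toktype>int  ( tt -- xt ) @ ;
: toktype>comp ( tt -- xt ) cell+ @ ;
: toktype>post ( tt -- xt ) cell+ cell+ @ ;

: nothing     ( -- ) ;                    \ interpreting a literal: leave it alone
: compile-lit ( n -- ) postpone literal ; \ time-shift one cell by one level

\ For a literal, the compile action equals the time-shifting action.
' nothing  ' compile-lit  ' compile-lit  toktype: tt-lit

\ A text interpreter could dispatch on STATE roughly like this:
: do-token ( i*x tt -- j*x )
  state @ if toktype>comp else toktype>int then execute ;
```

With this, "7 tt-lit do-token" in interpretation state simply leaves 7 on
the stack; a word or ->X recognizer would supply different xts in the first
two slots.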

Alex McDonald

May 28, 2020, 8:43:15 AM

On 28-May-20 08:25, Anton Ertl wrote:
> Alex McDonald <al...@rivadpm.com> writes:
>> After Matthias' untimely passing, his proposal for recognizers has
>> stalled and I'm not aware of any discussion since his death. It would be
>> a shame if the work he did went to waste and wasn't carried forward.
>>
>> http://amforth.sourceforge.net/pr/Recognizer-rfc-D.html was (I believe)
>> the last proposal he made, and I would like to take it forward on his
>> behalf.
>
> You are very brave.

Not really. I'm a big twitter user. In contrast with that cesspool, clf
is more like a chat with elderly aunts with strong opinions about
everyone else's taste in wallpaper. I'll survive (although my proposal
may not).

>
> Bernd Paysan has been tasked with picking it up
> <http://www.forth200x.org/meetings/2019-notes.html#a-recognizers>.
> However, AFAIK he has not done anything yet. In any case, it would be
> a good idea to contact him.

I will, thanks.

>
>> It will also attempt, in no specific order: to reduce the number of
>> words required to implement the proposal; do some significant bike
>> shedding around names (for example, removing the ambiguity of RECTYPE
>> and avoiding words like or containing NULL);
>
> Note that the current names are the result of a bikeshedding session
> at a Forth 200x meeting. The fact that you are not happy with the
> result and that even a committee member who was present has second
> thoughts about it shows the pointlessness of discussing names (that's
> why it's called bikeshedding).
>
> And the big problem is that it detracts from more substantial issues.

I'm not alone in thinking that describing it as bikeshedding is not
helpful. I would find a word like NON_AGNITA obfuscatory when
discussing unrecognized tokens, much as I do for this description:

RECTYPE-NULL ( -- RECTYPE-NULL ) RECOGNIZER
The null data type id. It is to be used if no other data type id is
applicable but one is needed.

A "null data type id" is close to word salad, and the name RECTYPE-NULL
confirms this. The problem starts with the "data type", is compounded by
"null", it uses RECTYPE as an abbreviation for RECOGNIZER TYPE when many
would read RECORD TYPE, and it runs off downhill from there.

Well chosen words can aid clarity. Within reason and where appropriate
I'll pursue it.

>
>> remove the requirement for
>> a fixed name REC-NUM REC-FLOAT
>
> I think that one of the failures of Forth-94 (and 2012) is that there
> is no default search order (and that's despite having standard
> wordlist names). The problem is that there are cases where a program
> wants to change the existing search order, but the result is
> system-specific.

Given a scenario where I have a recognizer for assembler opcodes that
start with V for vector ops, I might wish to have ALSO ASMVEX use this
recognizer but only when searching ASMVEX. Is this what you mean?

Or is it a more general observation about the search order wordset, a
lack of a default order, and repeating its consequent design shortcomings?

If it's the latter, then yes, it would be possible to specify a default
recognizer order; but that assumes a fixed recognizer nameset if there
is to be only one list. Perhaps two lists are required; a default list
of unnamed recognizers that provide base functionality, and a list of
user defined recognizers which is run through first and can replace the
default set's token recognition.

>
> I hope that the recognizer proposal will eventually be tight enough to
> avoid this problem. I think we need fixed-name standard recognizers
> to achieve this.

OK, but it's possible to write a perfectly adequate specification
without mentioning them by name.

>
> I am pretty unhappy about REC-NUM, and would rather prefer that
> single-cell and double-cell recognizers be separated, but REC-NUM is
> the easiest way to transition from NUMBER?; separate recognizers would
> require more substantial refactorings in many Forth systems, possibly
> affecting more lines than implementing recognizers themselves. Given
> how many years it has taken until, e.g., Stephen Pelc has looked at
> recognizers, I am not sure we want to add another five years to get
> rid of REC-NUM.

There's nothing preventing REC-SNUM and REC-DNUM being factors of
REC-NUM if that's what you wish.

>
> - anton
>

--
Alex

Ruvim

May 28, 2020, 11:02:24 AM

On 2020-05-28 15:43, Alex McDonald wrote:
> On 28-May-20 08:25, Anton Ertl wrote:
>> Alex McDonald <al...@rivadpm.com> writes:
>>> After Matthias' untimely passing, his proposal for recognizers has
>>> stalled and I'm not aware of any discussion since his death. It would be
>>> a shame if the work he did went to waste and wasn't carried forward.
>>>
>>> http://amforth.sourceforge.net/pr/Recognizer-rfc-D.html was (I believe)
>>> the last proposal he made, and I would like to take it forward on his
>>> behalf.

[...]

> Well chosen words can aid clarity. Within reason and where appropriate
> I'll pursue it.

+1

Moreover, they can be defined via REC-NUM:

: rec-snum ( addr len -- n tt-snum | 0 )
  rec-num ( 0 | n tt-snum | d tt-dnum )
  dup tt-snum = if exit then
  dup tt-dnum = if drop 2drop 0 exit then
  dup 0= if exit then  -21 throw
;
: rec-dnum ( addr len -- d tt-dnum | 0 )
  rec-num ( 0 | n tt-snum | d tt-dnum )
  dup tt-dnum = if exit then
  dup tt-snum = if drop drop 0 exit then
  dup 0= if exit then  -21 throw
;

I would suggest distinguishing between a simple recognizer and a compound
recognizer.

A simple recognizer is a recognizer that may return tokens of a single
type only.

A compound recognizer is a recognizer that can return tokens of
different types.

In the example above, rec-snum and rec-dnum are simple recognizers, and
rec-num is a compound recognizer.

Any list of recognizers L is semantically equivalent to some compound
recognizer R.
I.e. the phrase " L RECOGNIZE " is equivalent to the phrase " R ".

NB: for a compound recognizer, the priority of the returned types should be
specified, since in some cases a lexeme can be resolved into tokens of
several different types. That creates a conflict: which type should the
corresponding compound recognizer return? Specifying the priority settles
it.

For simplicity, in the basic specification we can choose either compound
recognizers or lists of simple recognizers. There is no need to have both,
since one is semantically equivalent to the other.

I would suggest taking compound recognizers. I.e., everything about lists
of recognizers can be removed from the basic specification. Lists are
just one approach to creating and managing compound recognizers.

Regarding the default recognizer (which is a compound recognizer):
yes, we should specify the types that it can return and their priority.

One question is: may the default recognizer return a type beyond the
specified types (i.e., as an implementation-defined option)?

If yes, those types should have lower priority than the types specified in
the standard. If no, a Forth system may always provide an option (a switch)
to recognize additional (non-standard) types. Either variant is compatible
with the current Standard (since a lexeme that is neither a word nor a
number is an ambiguous condition).

--
Ruvim

Anton Ertl

May 28, 2020, 1:11:41 PM

Alex McDonald <al...@rivadpm.com> writes:
>On 28-May-20 08:25, Anton Ertl wrote:
>> Alex McDonald <al...@rivadpm.com> writes:
>>> It will also attempt, in no specific order: to reduce the number of
>>> words required to implement the proposal; do some significant bike
>>> shedding around names (for example, removing the ambiguity of RECTYPE
>>> and avoiding words like or containing NULL);
>>
>> Note that the current names are the result of a bikeshedding session
>> at a Forth 200x meeting. The fact that you are not happy with the
>> result and that even a committee member who was present has second
>> thoughts about it shows the pointlessness of discussing names (that's
>> why it's called bikeshedding).
>>
>> And the big problem is that it detracts from more substantial issues.
>
>I'm not alone in thinking that describing it as bikeshedding is not
>helpful. I would find a word like NON_AGNITA obfuscatory when
>discussing unrecognized tokens, much as I do for this description:
>
>RECTYPE-NULL ( -- RECTYPE-NULL ) RECOGNIZER
>The null data type id. It is to be used if no other data type id is
>applicable but one is needed.

Nobody proposed NON_AGNITA. However, I am sure that Matthias Trute
originally did not intend to obfuscate when he called this word
R:FAIL. Neither did the Forth-200x committee when we agreed on
RECTYPE-NULL. What makes you think that you will find a name that
nobody finds obfuscatory? I am sure you will produce another set of
names that lots of people disagree with. And the end result will be
that the whole discussion becomes more and more confusing, because old
contributions to the discussion become incomprehensible.

>A "null data type id" is close to word salad

"data type id" is defined earlier, so no, it is not. I find the
definition of "data type id" suboptimal, however. In any case,
compare with the version before the renaming
<http://amforth.sourceforge.net/pr/Recognizer-rfc-C.html>. IIRC the
committee only produced word names, but Bernd Paysan (who was the
committee's contact with Matthias Trute) may be able to tell you what
he wrote to him after that meeting.

>>> remove the requirement for
>>> a fixed name REC-NUM REC-FLOAT
>>
>> I think that one of the failures of Forth-94 (and 2012) is that there
>> is no default search order (and that's despite having standard
>> wordlist names). The problem is that there are cases where a program
>> wants to change the existing search order, but the result is
>> system-specific.
>
>Given a scenario where I have a recognizer for assembler opcodes that
>start with V for vector ops, I might wish to have ALSO ASMVEX use this
>recognizer but only when searching ASMVEX. Is this what you mean?
>
>Or is it a more general observation about the search order wordset, a
>lack of a default order, and repeating its consequent design shortcomings?

I don't remember a scenario, but I remember that I found this a
hindrance several times over the years.

Your example mixes up recognizers and vocabularies. I meant that this
was already a problem without recognizers.

For recognizers, I expect even more such problems. E.g., if I want to
replace the system's integer recognizer or float recognizer with
something else (e.g., something that does not accept system-specific
extensions), I need to know what the recognizer I want to replace is
called. Or if I want to recreate a recognizer sequence from
recognizers, I need the names of the recognizers.

>If it's the latter, then yes, it would be possible to specify a default
>recognizer order; but that assumes a fixed recognizer nameset if there
>is to be only one list. Perhaps two lists are required; a default list
>of unnamed recognizers that provide base functionality, and a list of
>user defined recognizers which is run through first and can replace the
>default set's token recognition.

That sounds overly complicated.

Currently the standard describes exactly what is recognized: words,
integers (singles and doubles), floats. What's wrong if we specify
named recognizers for that?

>> I hope that the recognizer proposal will eventually be tight enough to
>> avoid this problem. I think we need fixed-name standard recognizers
>> to achieve this.
>
>OK, but it's possible to write a perfectly adequate specification
>without mentioning them by name.

Maybe, but I am doubtful. But anyway, why would you want to avoid
standardizing the names of the recognizers that every Forth system
with recognizers has?

>> I am pretty unhappy about REC-NUM, and would rather prefer that
>> single-cell and double-cell recognizers be separated, but REC-NUM is
>> the easiest way to transition from NUMBER?; separate recognizers would
>> require more substantial refactorings in many Forth systems, possibly
>> affecting more lines than implementing recognizers themselves. Given
>> how many years it has taken until, e.g., Stephen Pelc has looked at
>> recognizers, I am not sure we want to add another five years to get
>> rid of REC-NUM.
>
>There's nothing preventing REC-SNUM and REC-DNUM being factors of
>REC-NUM if that's what you wish.

The question is what is standardized. If we standardize REC-NUM, we
probably will not standardize REC-SNUM and REC-DNUM (because there is
no common practice). If we standardize REC-SNUM and REC-DNUM, there
is hardly any need for REC-NUM. Ok, you might argue that REC-NUM is
agnostic to whether the system supports the double-number wordset or
not, but is there any system that implements Forth-2012 that does not
support the double-number wordset?

Anton Ertl

May 28, 2020, 1:39:29 PM

Ruvim <ruvim...@gmail.com> writes:
>On 2020-05-28 15:43, Alex McDonald wrote:
>> On 28-May-20 08:25, Anton Ertl wrote:
>>> I am pretty unhappy about REC-NUM, and would rather prefer that
>>> single-cell and double-cell recognizers be separated, but REC-NUM is
>>> the easiest way to transition from NUMBER?; separate recognizers would
>>> require more substantial refactorings in many Forth systems, possibly
>>> affecting more lines than implementing recognizers themselves. Given
>>> how many years it has taken until, e.g., Stephen Pelc has looked at
>>> recognizers, I am not sure we want to add another five years to get
>>> rid of REC-NUM.
>>
>> There's nothing preventing REC-SNUM and REC-DNUM being factors of
>> REC-NUM if that's what you wish.
>
>Moreover, they can be defined via REC-NUM
>
> : rec-snum ( addr len -- n tt-snum | 0 )
>   rec-num ( 0 | n tt-snum | d tt-dnum )
>   dup tt-snum = if exit then
>   dup tt-dnum = if drop 2drop 0 exit then
>   dup 0= if exit then  -21 throw
> ;
> : rec-dnum ( addr len -- d tt-dnum | 0 )
>   rec-num ( 0 | n tt-snum | d tt-dnum )
>   dup tt-dnum = if exit then
>   dup tt-snum = if drop drop 0 exit then
>   dup 0= if exit then  -21 throw
> ;

You probably mean:

: rec-snum ( c-addr u -- n RECTYPE-NUM | RECTYPE-NULL )
  rec-num
  \ dup rectype-num = if exit then \ unnecessary
  dup rectype-dnum = if drop 2drop rectype-null exit then
  \ no special handling for rectype-null
;

: rec-dnum ( c-addr u -- d RECTYPE-DNUM | RECTYPE-NULL )
  rec-num
  dup rectype-num = if 2drop rectype-null exit then
  \ no special handling for rectype-dnum and rectype-null
;

>I would suggest to distinct a simple recognizer and a compound recognizer.
>
>A simple recognizer is a recognizer that may return tokens of some
>single type only.
>
>A compound recognizer is a recognizer that can return tokens of the
>different types.
>
>In the example above, rec-snum and rec-dnum are simple recognizers, and
>rec-num is a compound recognizer.

The question is: should we standardize REC-NUM, or REC-SNUM and
REC-DNUM, or all three?

What speaks for only standardizing REC-NUM? You can implement REC-NUM
based on NUMBER? (which exists in many systems) without refactoring
NUMBER?.

What speaks for standardizing REC-SNUM and REC-DNUM? It's
conceptually cleaner.

>NB: for a compound recognizer the priority of returned types should be
>specified.
>Since, in some cases a lexeme can be resolved into tokens of several
>(different) types. So, we have a conflict: what type should be returned
>by the corresponding compound recognizer. For such compound recognizer
>the priority of the returned types should be specified.

If you build the compound recognizer (aka recognizer sequence) from
simple recognizers, you get a priority of simple recognizers based on
the order of recognizers, not of the types. E.g., you could have a
sequence of three recognizers, the first returns RECTYPE-NUM, the
second RECTYPE-DNUM, the third RECTYPE-NUM. The third is only tried
if the first two fail.
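
To illustrate, here is a sketch under simplifying assumptions: token types
are plain nonzero constants, 0 stands in for RECTYPE-NULL, and try-seq,
lexeme and the three demo recognizers are made-up names, not words from the
proposal. The first and third recognizers return the same type, yet the
third is reached only when the first two fail:

```forth
\ Sketch only; hypothetical names and deliberately trivial recognizers.
1 constant tt-num
2 constant tt-dnum

2variable lexeme                 \ the string currently being recognized

: try-seq ( c-addr u seq -- i*x tt | 0 )
  \ seq points at a count cell followed by that many recognizer xts
  >r lexeme 2! r>                ( seq )
  dup cell+ swap @               ( xts n )
  0 ?do                          ( xts )
    dup i cells + @              ( xts xt )
    swap >r                      ( xt ) ( R: xts )
    lexeme 2@ rot execute        ( i*x tt | 0 )
    ?dup if r> drop unloop exit then   \ first success wins
    r>                           ( xts )
  loop
  drop 0 ;

: rec-one ( c-addr u -- 1 tt-num | 0 )
  s" one" compare 0= if 1 tt-num else 0 then ;
: rec-one-dot ( c-addr u -- 1. tt-dnum | 0 )
  s" one." compare 0= if 1 0 tt-dnum else 0 then ;
: rec-length ( c-addr u -- u tt-num )  \ catch-all: the length as a number
  nip tt-num ;

create demo-seq  3 ,  ' rec-one ,  ' rec-one-dot ,  ' rec-length ,
```

Here s" one" demo-seq try-seq is answered by rec-one, although rec-length
would also have produced a tt-num for the same string; the order of the
recognizers decides, not the type they return.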

>A one question is: may the default recognizer return a type beyond the
>specified types? (i.e., as an implementation defined option).

And may it contain other or more accepting recognizers? A proposal
that would forbid that would probably fail to find acceptance.
However, the proposal needs to be tight enough that one does not
always bump into problems when writing portable programs.

Ruvim

May 29, 2020, 11:08:02 AM

On 2020-05-28 20:11, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:
>> On 2020-05-28 15:43, Alex McDonald wrote:

>>> There's nothing preventing REC-SNUM and REC-DNUM being factors of
>>> REC-NUM if that's what you wish.
>>
>> Moreover, they can be defined via REC-NUM
>>
>> : rec-snum ( addr len -- n tt-snum | 0 )
>>   rec-num ( 0 | n tt-snum | d tt-dnum )
>>   dup tt-snum = if exit then
>>   dup tt-dnum = if drop 2drop 0 exit then
>>   dup 0= if exit then  -21 throw
>> ;
>> : rec-dnum ( addr len -- d tt-dnum | 0 )
>>   rec-num ( 0 | n tt-snum | d tt-dnum )
>>   dup tt-dnum = if exit then
>>   dup tt-snum = if drop drop 0 exit then
>>   dup 0= if exit then  -21 throw
>> ;
>
> You probably mean:
>
> : rec-snum ( c-addr u -- n RECTYPE-NUM | RECTYPE-NULL )
> rec-num
> \ dup rectype-num = if exit then \ unnecessary
> dup rectype-dnum = if drop 2drop rectype-null exit then
> \ no special handling for rectype-null
> ;

I am just promoting the idea that RECTYPE-NULL should be equal to 0 and
so doesn't need a distinct name =)

And I use the opaque 'tt-' prefix instead of the confusing 'rectype-'
prefix in the token type notation.

Also, although I didn't show it in the rec-num stack signature, I
assumed that rec-num can also return other types, e.g. a float number.
So, if it returns some other type of unknown size, we can only throw
an exception.

>> I would suggest to distinct a simple recognizer and a compound recognizer.
>>
>> A simple recognizer is a recognizer that may return tokens of some
>> single type only.
>>
>> A compound recognizer is a recognizer that can return tokens of the
>> different types.

Alternative variants for the terms: "single-type recognizer" and
"multi-type recognizer". For me, the former variants sound better.

>> In the example above, rec-snum and rec-dnum are simple recognizers, and
>> rec-num is a compound recognizer.
>
> The question is: should we standardize REC-NUM, or REC-SNUM and
> REC-DNUM, or all three?
>
> What speaks for only standardizing REC-NUM? You can implement REC-NUM
> based on NUMBER? (which exists in many systems) without refactoring
> NUMBER?.

Well, as shown above, you can also implement REC-SNUM and REC-DNUM
based on "NUMBER?".

Also, they can all be implemented based on the standard ">NUMBER" word.
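
A rough sketch of that, assuming tt-snum and tt-dnum are just distinct
nonzero constants and 0 means "not recognized". It handles unsigned digits
in the current BASE only (no sign, no radix prefix), treats a trailing dot
as the double marker, and assumes one character per address unit:

```forth
\ Sketch only; the tt- constants are illustrative assumptions.
1 constant tt-snum
2 constant tt-dnum

: rec-snum ( c-addr u -- n tt-snum | 0 )
  dup 0= if 2drop 0 exit then           \ empty string: not a number
  0 0 2swap >number                     ( ud c-addr' u' )
  nip if 2drop 0 exit then              \ leftover characters: reject
  drop tt-snum ;                        \ keep the low cell only

: rec-dnum ( c-addr u -- d tt-dnum | 0 )
  dup 2 < if 2drop 0 exit then          \ need at least a digit and a dot
  2dup + 1- c@ [char] . <> if 2drop 0 exit then
  1-  0 0 2swap >number                 ( ud c-addr' u' )
  nip if 2drop 0 exit then              \ leftover characters: reject
  tt-dnum ;
```

So s" 123" is accepted by rec-snum and rejected by rec-dnum, and the other
way round for s" 123." (in decimal).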

> What speaks for standardizing REC-SNUM and REC-DNUM? It's
> conceptually cleaner.

The question is: should they understand prefixes (according to 3.4.1.3 Text
interpreter input number conversion)?

Perhaps it would be conceptually cleaner still to have recognizers for numbers
in the simplest form (without radix prefixes, the dot for doubles, etc.).

>> NB: for a compound recognizer the priority of returned types should be
>> specified.
>> Since, in some cases a lexeme can be resolved into tokens of several
>> (different) types. So, we have a conflict: what type should be returned
>> by the corresponding compound recognizer. For such compound recognizer
>> the priority of the returned types should be specified.
>
> If you build the compound recognizer (aka recognizer sequence) from
> simple recognizers, you get a priority of simple recognizers based on
> the order of recognizers, not of the types. E.g., you could have a
> sequence of three recognizers, the first returns RECTYPE-NUM, the
> second RECTYPE-DNUM, the third RECTYPE-NUM. The third is only tried
> if the first two fail.

Yes, it is the same idea in different wording.
But in the general case a "recognizer sequence" is not essential.

For example:

: rec-num ( a u -- i*x tt | 0 ) { d:s }
s rec-snum
dup if exit then
s rec-dnum
;

We don't have a recognizer sequence (in the Recognizer-rfc-D sense) here.

A recognizer for the classic text interpreter (i.e., the default
recognizer) may be defined as:

: rec-word-or-number ( a u -- i*x tt | 0 ) { d:s }
s rec-word
dup if exit then
s rec-number
;

How can *this* recognizer be formally described?
I can suggest something like the following.

Take (c-addr u). Recognize the lexeme identified by (c-addr u). From
the possible tokens select the token of the first matched type from
the following: tt-word, tt-snum, tt-dnum, tt-fnum. If no type
matches, return 0. Otherwise return the selected token and its
type.

(I use the 'tt-' prefix instead of the confusing 'rectype-' prefix in
the token type notation).

So, we have formally specified the relative priority of the returned
token types. E.g. if a lexeme can be resolved as a Forth word, and as a
number, the preferred type is a Forth word.

>> A one question is: may the default recognizer return a type beyond the
>> specified types? (i.e., as an implementation defined option).
>
> And may it contain other or more accepting recognizers?

What do you mean by "more accepting"?

> A proposal
> that would forbid that would probably fail to find acceptance.

> However, the proposal needs to be tight enough that one does not
> always bump into problems when writing portable programs.

Yes. Also we should show examples of such problems in the rationale for
certain choices.

--
Ruvim

Alex McDonald

May 29, 2020, 11:09:56 AM

On 28-May-20 17:25, Anton Ertl wrote:
> Alex McDonald <al...@rivadpm.com> writes:
>> On 28-May-20 08:25, Anton Ertl wrote:
>>> Alex McDonald <al...@rivadpm.com> writes:
>>>> It will also attempt, in no specific order: to reduce the number of
>>>> words required to implement the proposal; do some significant bike
>>>> shedding around names (for example, removing the ambiguity of RECTYPE
>>>> and avoiding words like or containing NULL);
>>>
>>> Note that the current names are the result of a bikeshedding session
>>> at a Forth 200x meeting. The fact that you are not happy with the
>>> result and that even a committee member who was present has second
>>> thoughts about it shows the pointlessness of discussing names (that's
>>> why it's called bikeshedding).
>>>
>>> And the big problem is that it detracts from more substantial issues.
>>
>> I'm not alone in thinking that describing it as bikeshedding is not
>> helpful. I would find a words like NON_AGNITA obfuscatory when
>> discussing unrecognized tokens, much as I do for this description:
>>
>> RECTYPE-NULL ( -- RECTYPE-NULL ) RECOGNIZER
>> The null data type id. It is to be used if no other data type id is
>> applicable but one is needed.
>
> Nobody proposed NON_AGNITA. However, I am sure that Matthias Trute

I didn't say they did.

> originally did not intend to obfuscate when he called this word
> R:FAIL. Neither did the Forth-200x committee when we agreed on
> RECTYPE-NULL.

So although Matthias did not mean to obfuscate with R:FAIL, the
committee decided to change it to RECTYPE-NULL. Was it not clear enough?
Did it cause confusion?

I seem to remember yours was the dissenting voice but others saw fit to
change it. Ergo, if I am taking this forward (see the foot of this post)
then I propose to do the same, and you will continue to point out that
they are bikeshed plans. I'll have to live with it.

> What makes you think that you will find a name that
> nobody finds obfuscatory? I am sure you will produce another set of
> names that lots of people disagree with. And the end result will be
> that the whole discussion becomes more and more confusing, because old
> contributions to the discussion become incomprehensible.

I proposed some time back UNRECOGNIZED for RECTYPE-NULL. Let's try it:

===
A system provided data type information is called RECTYPE-NULL. It is
used if no other one [sic] is applicable.

There is a system provided data type named UNRECOGNIZED. It is returned
by the system if the parsed word is not recognized by any recognizer.

===
RECOGNIZE ( addr len rec-seq-id -- i*x RECTYPE-DATATYPE | RECTYPE-NULL )
RECOGNIZER
Apply the string at "addr/len" to the elements of the recognizer set
identified by rec-seq-id. Terminate the iteration if either a parsing
word returns a data type id that is different from RECTYPE-NULL or the
set is exhausted. In this case return RECTYPE-NULL.

RECOGNIZE ( addr len <rec-seq-id> -- i*x <rectype> | UNRECOGNIZED )
RECOGNIZER
Apply the string at "addr len" to the elements of the recognizer set
identified by <rec-seq-id>. Terminate the iteration if a parsing word
returns a <rectype> that is different from UNRECOGNIZED. If the set
<rec-seq-id> is exhausted, return UNRECOGNIZED.

===

RECTYPE-NULL ( -- RECTYPE-NULL ) RECOGNIZER
The null data type id. It is to be used if no other data type id is
applicable but one is needed. Its associated methods perform system
specific error actions. The actual numeric value is system dependent.

UNRECOGNIZED ( -- <rectype> ) RECOGNIZER
The <rectype> returned to the system by a recognizer when it fails to
recognize the string in the parse area. The UNRECOGNIZED <rectype> is a
system-specific opaque value.

===
REC-NT ( addr len -- NT RECTYPE-NT | RECTYPE-NULL )

REC-NT ( addr len -- NT RECTYPE-NT | UNRECOGNIZED )
===

As you might expect, I find the word UNRECOGNIZED much clearer, as it
says exactly what the outcome has been; the string is unrecognized.

>
>> A "null data type id" is close to word salad
>
> "data type id" is defined earlier, so no, it is not. I find the

"null data type id" IMHO (to be contrasted with your HO) is word salad.
The words are, of course, individually understandable and the phrase
"data type id" gets an explanation. But putting them together does not
ensure that it makes sense as a whole.

> definition of "data type id" suboptimal, however. In any case,
> compare with the version before the renaming
> <http://amforth.sourceforge.net/pr/Recognizer-rfc-C.html>. IIRC the
> committee only produced word names, but Bernd Paysan (who was the
> committee's contact with Matthias Trute) may be able to tell you what
> he wrote to him after that meeting.

I hope Bernd pops up here; it would be useful to know his intentions.

>
>>>> remove the requirement for
>>>> a fixed name REC-NUM REC-FLOAT
>>>
>>> I think that one of the failures of Forth-94 (and 2012) is that there
>>> is no default search order (and that's despite having standard
>>> wordlist names). The problem is that there are cases where a program
>>> wants to change the existing search order, but the result is
>>> system-specific.
>>
>> Given a scenario where I have a recognizer for assembler opcodes that
>> start with V for vector ops, I might wish to have ALSO ASMVEX use this
>> recognizer but only when searching ASMVEX. Is this what you mean?
>>
>> Or is it a more general observation about the search order wordset, a
>> lack of a default order, and repeating its consequent design shortcomings?
>
> I don't remember a scenario, but I remember that I found this a
> hindrance several times over the years.
>
> Your example mixes up recognizers and vocabularies. I meant that this
> was already a problem without recognizers.

I see.

(As an aside, recognizers can do all of what the search order wordset
does, and I remember vaguely a discussion to this effect. I have
associated a recognizer with a wordlist for some experiments, but at
this early stage I can't say whether it works or not. It's certainly not
of interest until we discuss REC-FIND.)

>
> For recognizers, I expect even more such problems. E.g., if I want to
> replace the system's integer recognizer or float recognizer with
> something else (e.g., something that does not accept system-specific
> extensions), I need to know what the recognizer I want to replace is
> called. Or if I want to recreate a recognizer sequence from
> recognizers, I need the names of the recognizers.
>
>> If it's the latter, then yes, it would be possible to specify a default
>> recognizer order; but that assumes a fixed recognizer nameset if there
>> is to be only one list. Perhaps two lists are required; a default list
>> of unnamed recognizers that provide base functionality, and a list of
>> user defined recognizers which is run through first and can replace the
>> default set's token recognition.
>
> That sounds overly complicated.
>
> Currently the standard describes exactly what is recognized: words,
> integers (singles and doubles), floats. What's wrong if we specify
> named recognizers for that?

Will that push us to fixed names for complex numbers (REC-COMPLEX) and
so on?

>
>>> I hope that the recognizer proposal will eventually be tight enough to
>>> avoid this problem. I think we need fixed-name standard recognizers
>>> to achieve this.
>>
>> OK, but it's possible to write a perfectly adequate specification
>> without mentioning them by name.
>
> Maybe, but I am doubtful. But anyway, why would you want to avoid
> standardizing the names of the recognizers that every Forth system
> with recognizers has?

Perhaps the proposal needs to be in two parts; the recognizer itself and
the standard names for system recognizers. It would allow us to focus on
what is important first.

>
>>> I am pretty unhappy about REC-NUM, and would rather prefer that
>>> single-cell and double-cell recognizers be separated, but REC-NUM is
>>> the easiest way to transition from NUMBER?; separate recognizers would
>>> require more substantial refactorings in many Forth systems, possibly
>>> affecting more lines than implementing recognizers themselves. Given
>>> how many years it has taken until, e.g., Stephen Pelc has looked at
>>> recognizers, I am not sure we want to add another five years to get
>>> rid of REC-NUM.
>>
>> There's nothing preventing REC-SNUM and REC-DNUM being factors of
>> REC-NUM if that's what you wish.
>
> The question is what is standardized. If we standardize REC-NUM, we
> probably will not standardize REC-SNUM and REC-DNUM (because there is
> no common practice). If we standardize REC-SNUM and REC-DNUM, there
> is hardly any need for REC-NUM. Ok, you might argue that REC-NUM is
> agnostic to whether the system supports the double-number wordset or
> not, but is there any system that implements Forth-2012 that does not
> support the double-number wordset.

A good argument for splitting the current proposal and making that part
of a second proposal.

>
> - anton
>

As to proposing version 5 (or E) of the recognizer proposal, I'm going
to wait a little while to see if Bernd or Ulrich turns up, but I don't
want to let that stop the discussion.

--
Alex

Ruvim

May 29, 2020, 11:57:05 AM

On 2020-05-29 18:09, Alex McDonald wrote:
[...]

> I proposed some time back UNRECOGNIZED for RECTYPE-NULL. Let's try it:
>
> ===
> A system provided data type information is called RECTYPE-NULL. It is
> used if no other one [sic] is applicable.
>
> There is a system provided data type named UNRECOGNIZED. It is returned
> by the system if the parsed word is not recognized by any recognizer.

NB: this text mixes notation and Forth words, and conflicts with the
language of the Standard.

data type:
An identifier for the set of values that a data object may have.

This "identifier" is not a Forth word, and not a number, but a notation
in the specifications.

See also 3.1 Data types
https://forth-standard.org/standard/usage#usage:data

[...]

> As you might expect, I find the word UNRECOGNIZED much clearer, as it
> says exactly what the outcome has been; the string is unrecognized.

To me, as to you, UNRECOGNIZED sounds better than RECTYPE-NULL. But it
would be far better to eliminate this "data type" altogether.

[...]

>>> A "null data type id" is close to word salad
>>
>> "data type id" is defined earlier, so no, it is not. I find the
>
> "null data type id" IMHO (to be contrasted with your HO) is word salad.

+1

[...]

> Perhaps the proposal needs to be in two parts; the recognizer itself and
> the standard names for system recognizers. It would allow us to focus on
> what is important first.

+1

I use this approach in my proposal too.
A slightly outdated version can be found at
https://github.com/ruv/forth-design-exp/blob/master/docs/resolver-api.md

[...]

>> The question is what is standardized. If we standardize REC-NUM, we
>> probably will not standardize REC-SNUM and REC-DNUM (because there is
>> no common practice). If we standardize REC-SNUM and REC-DNUM, there
>> is hardly any need for REC-NUM. Ok, you might argue that REC-NUM is
>> agnostic to whether the system supports the double-number wordset or
>> not, but is there any system that implements Forth-2012 that does not
>> support the double-number wordset.
>
> A good argument for splitting the current proposal and making that part
> of a second proposal.

+1

We should extract the essential, most basic part.

--
Ruvim

Anton Ertl

May 29, 2020, 1:17:36 PM

Even if we assume that RECTYPE-NULL is 0, your version contains bugs.
In particular, it calls recognizers where it should return rectypes.

>And I use the opaque 'tt-' prefix instead of the confusing 'rectype-'
>prefix in the token type notation.

It does not help the discussion if everyone uses his own favourite
naming and favourite terminology. In particular, if you want to make
point A, and you use your favourite terminology and names (your points
B and C), you will fail to get point A across. And of course, you
will not make progress on points B and C, because everybody else has
their own, different names and terminology, and will ignore this
aspect of your posting anyway.

>Also, although I didn't show it in the rec-num stack signature, I
>supposed that rec-num can also return other types, e.g. a float number.

The proposed REC-NUM does not.

>So, if it returns some other type with unknown size, we can only throw
>an exception.

That would be wrong for implementing a recognizer; it should return
RECTYPE-NULL when the string is not recognized.

>>> I would suggest to distinct a simple recognizer and a compound recognizer.
>>>
>>> A simple recognizer is a recognizer that may return tokens of some
>>> single type only.
>>>
>>> A compound recognizer is a recognizer that can return tokens of the
>>> different types.
>
>
>Alternative variants for the terms: "single-type recognizer" and
>"multi-type recognizer". For me, the former variants sound better.

Certainly, especially because it's not about the types. In
particular, in Gforth there are a number of recognizers that produce
RECTYPE-NUM, e.g., REC-TICK (which produces the xt of a word).

>> What speaks for only standardizing REC-NUM? You can implement REC-NUM
>> based on NUMBER? (which exists in many systems) without refactoring
>> NUMBER?.
>
>Well, as we show above, you can also implement REC-SNUM and REC-DNUM
>based on "NUMBER?".

Yes, by defining it on top of REC-NUM. It would be pretty perverse to
have REC-SNUM and REC-DNUM in the recognizer sequence if they are
implemented in that way.

>Also, they all can be implemented based on standard ">NUMBER" word.

Or on standard C@. >NUMBER is much lower-level than NUMBER? or
REC-NUM.

>> What speaks for standardizing REC-SNUM and REC-DNUM? It's
>> conceptually cleaner.
>
>The question: should they understand prefixes (according to 3.4.1.3 Text
>interpreter input number conversion)?

Yes.

>Perhaps, even more conceptually cleaner to have recognizers for numbers
>in the simplest form (without prefixes for radix, dot for double, etc).

That's also possible. The question is if this results in a good
factoring.

However, I advocate requiring a prefix for doubles (because the
treatment of such numbers as doubles confuses people used to other
languages, where the same syntax means a float). If we have separate
recognizers for prefixless and prefixful doubles, the user could
control whether he wants such doubles or not.

In development Gforth prefixless doubles produce warnings by default,
and this is part of REC-NUM.

>But in the general case a "recognizer sequence" is not essential.
>
>For example:
>
> : rec-num ( a u -- i*x tt | 0 ) { d:s }
> s rec-snum
> dup if exit then
> s rec-dnum
> ;
>
>We don't have a recognizer sequence (in Recognizer-rfc-D notion) here.

It is a hand-coded recognizer sequence. Given that one needs to
construct a new recognizer sequence if you want to use a new
recognizer, it is useful to have support for defining recognizer
sequences.

Bernd Paysan wants to use his dynamically resizable "stacks" abstract
data type for this. I think that allotted recognizer sequences (which
may even be implemented as a colon definition, like you do by hand
above) are good enough.
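
To make "allotted recognizer sequence" concrete, here is a minimal sketch in standard Forth (it assumes the Forth-2012 locals syntax {: ... :}). It is an illustration only, not the mechanism of Recognizer-rfc-D: every name in it (type-snum, type-dnum, digits?, rec-snum-demo, rec-dnum-demo, demo-seq, recognize-demo) is made up, 0 stands in for RECTYPE-NULL, and the two toy recognizers accept only unsigned decimal digits, with a trailing dot marking a double.

```forth
decimal

\ Hypothetical token type ids; 0 plays the role of "not recognized".
1 constant type-snum
2 constant type-dnum

\ Convert a string of digits; flag is true only if every character converted.
: digits? ( c-addr u -- ud flag )
   dup 0= if 2drop 0 0 false exit then
   0 0 2swap >number nip 0= ;

\ Toy single-cell recognizer: digits only, value must fit in one cell.
: rec-snum-demo ( c-addr u -- u1 type-snum | 0 )
   digits? 0= if 2drop 0 exit then     \ some character did not convert
   if drop 0 exit then                 \ high cell non-zero: not a single
   type-snum ;

\ Toy double-cell recognizer: digits followed by a trailing dot.
: rec-dnum-demo ( c-addr u -- ud type-dnum | 0 )
   dup 0= if 2drop 0 exit then
   2dup + 1- c@ [char] . <> if 2drop 0 exit then
   1- digits? if type-dnum else 2drop 0 then ;

\ The allotted sequence: a count cell followed by that many xts.
create demo-seq  2 ,  ' rec-snum-demo ,  ' rec-dnum-demo ,

\ Try each recognizer in order; stop at the first non-zero result.
: recognize-demo ( c-addr u seq -- i*x type | 0 )
   {: c-addr u seq :}
   seq @ 0 ?do
      c-addr u  seq i 1+ cells + @  execute
      ?dup if unloop exit then
   loop 0 ;
```

With these definitions, the string "123" passed with demo-seq to recognize-demo leaves 123 type-snum, "123." leaves the double 123. followed by type-dnum, and "abc" leaves 0. Using a new recognizer means allotting a new table; nothing is resized at run time.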

>>> A one question is: may the default recognizer return a type beyond the
>>> specified types? (i.e., as an implementation defined option).
>>
>> And may it contain other or more accepting recognizers?
>
>What do you mean by "more accepting"?

It accepts more strings. E.g., does it accept "1.0" as double number?
Does it accept "1,"? SwiftForth accepts both, so a standard REC-NUM
or REC-DNUM should be allowed to accept them, too.

>> However, the proposal needs to be tight enough that one does not
>> always bump into problems when writing portable programs.
>
>Yes. Also we should show examples of such problems in the rationale for
>certain choices.

Yes.

Ruvim

May 29, 2020, 3:14:30 PM

On 2020-05-29 19:36, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:
>> On 2020-05-28 20:11, Anton Ertl wrote:
>>> Ruvim <ruvim...@gmail.com> writes:
>>>> On 2020-05-28 15:43, Alex McDonald wrote:
>>
>>>>> There's nothing preventing REC-SNUM and REC-DNUM being factors of
>>>>> REC-NUM if that's what you wish.
>>>>
>>>> Moreover, they can be defined via REC-NUM
>>>>
>>>> : rec-snum ( addr len -- n tt-snum | 0 )
>>>> rec-num ( 0 | n tt-snum | d tt-dnum )
>>>> dup rec-snum = if exit then
>>>> dup rec-dnum = if drop 2drop 0 exit then
>>>> dup 0= if exit then -21 throw
>>>> ;

[...]

>>>
>>> You probably mean:
>>>
>>> : rec-snum ( c-addr u -- n RECTYPE-NUM | RECTYPE-NULL )
>>> rec-num
>>> \ dup rectype-num = if exit then \ unnecessary
>>> dup rectype-dnum = if drop 2drop rectype-null exit then
>>> \ no special handling for rectype-null
>>> ;
>>
>> I just promote the conception that RECTYPE-NULL should be equal to 0 and
>> so it doesn't need to have a distinct name =)
>
> Even if we assume that RECTYPE-NULL is 0, your version contains bugs.
> In particular, it calls recognizers where it should return rectypes.

You are right, my typo. I meant rectype-*

>> And I use the opaque 'tt-' prefix instead of the confusing 'rectype-'
>> prefix in the token type notation.
>
> It does not help the discussion if everyone uses his own favourite
> naming and favourite terminology. In particular, if you want to make
> point A, and you use your favourite terminology and names (your points
> B and C), you will fail to get point A across. And of course, you
> will not make progress on points B and C, because everybody else has
> their own, different names and terminology, and will ignore this
> aspect of your posting anyway.

Well, I agree.

So, how can we fix the issues in the terminology?

I think they should be fixed even before we discuss mechanisms,
semantics and names. Perhaps we should start from scratch, and
gradually add things from the current proposals step by step.

>> Also, although I didn't show it in the rec-num stack signature, I
>> supposed that rec-num can also return other types, e.g. a float number.
>
> The proposed REC-NUM does not.

OK, in that case there is certainly no need for the other checks.

[...]

>>> What speaks for standardizing REC-SNUM and REC-DNUM? It's
>>> conceptually cleaner.
>>
>> The question: should they understand prefixes (according to 3.4.1.3 Text
>> interpreter input number conversion)?
>
> Yes.
>
>> Perhaps, even more conceptually cleaner to have recognizers for numbers
>> in the simplest form (without prefixes for radix, dot for double, etc).
>
> That's also possible. The question is if this results in a good
> factoring.

It seems to me that such factoring is reasonable: not to implement the
default recognizer, but to provide a useful library of recognizers.
I implemented this approach, see https://git.io/JfKcd

What is implemented there are resolvers, but the factoring idea for
recognizers is the same. The default resolver is defined at
https://git.io/JfKCr

> However, I advocate requiring a prefix for doubles (because the
> treatment of such numbers as doubles confuses people used to other
> languages, where the same syntax means a float). If we have separate
> recognizers for prefixless and prefixful doubles, the user could
> control whether he wants such doubles or not.

Yes, it is worthwhile.

>
> In development Gforth prefixless doubles produce warnings by default,
> and this is part of REC-NUM.
>

>> But in the general case a "recognizer sequence" is not essential.
>>
>> For example:
>>
>> : rec-num ( a u -- i*x tt | 0 ) { d:s }
>> s rec-snum
>> dup if exit then

correction: dup if exit then drop

>> s rec-dnum
>> ;
>>
>> We don't have a recognizer sequence (in Recognizer-rfc-D notion) here.
>
> It is a hand-coded recognizer sequence.

But it is shorter than creating a recognizer sequence!
Using the useful "?et" control flow word it becomes:

: rec-num ( a u -- i*x tt ) { d:s }
s rec-snum ?et s rec-dnum
;

VS

2 new-recognizer-sequence value seq-req-num
' rec-dnum ' rec-snum 2 seq-req-num set-recognizer
: rec-num ( a u -- i*x tt ) seq-req-num recognize ;

Yes, we can also have a special defining word that makes things shorter.
Something like

' rec-dnum ' rec-snum 2 create-recognizer rec-num

Or

recognizer: rec-num rec-snum rec-dnum ;

But I'm not convinced that it is worthwhile.
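
For readers who have not met it: ?et is not a standard word. A minimal sketch of it, assuming it means "exit if true" (if the top of the stack is non-zero, leave it there and exit the current definition; otherwise drop it and continue), could look like this. The helper names always-fails, always-seven, first-hit and early-hit are hypothetical stand-ins for recognizers.

```forth
\ Assumed meaning of ?et: compiles  DUP IF EXIT THEN DROP  into the caller.
: ?et ( x -- x | )
   postpone dup postpone if postpone exit postpone then postpone drop
; immediate

\ Stand-ins for two recognizers: each returns a non-zero "type" or 0.
: always-fails ( -- 0 ) 0 ;
: always-seven ( -- 7 ) 7 ;

\ The 0 from always-fails is dropped, so always-seven is tried next.
: first-hit ( -- x ) always-fails ?et always-seven ;

\ The 7 is kept and the definition exits; always-fails is never run.
: early-hit ( -- x ) always-seven ?et always-fails ;
```

Both first-hit and early-hit leave 7 on the stack. This is the same "first non-zero result wins" rule that a recognizer sequence applies.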

> Given that one needs to construct a new recognizer sequence
> if you want to use a new recognizer,
> it is useful to have support for defining recognizer
> sequences.

I don't follow your argument; could you please give an example?

> Bernd Paysan wants to use his dynamically resizable "stacks" abstract
> data type for this. I think that allotted recognizer sequences (which
> may even be implemented as colon definition like you do by hand above)
> are good enough.

I agree with you.

--
Ruvim

Ruvim

May 29, 2020, 4:28:39 PM

On 2020-05-29 22:14, Ruvim wrote:
> On 2020-05-29 19:36, Anton Ertl wrote:

[...]

>> Bernd Paysan wants to use his dynamically resizable "stacks" abstract
>> data type for this. I think that allotted recognizer sequences (which
>> may even be implemented as colon definition like you do by hand above)
>> are good enough.
>
> I agree with you.

By the way, nothing prevents a user from using dynamically resizable
"stacks" to manage his own compound recognizers. He can also incorporate
the system's default recognizer (or other recognizers from the library)
into his stack and set the corresponding compound recognizer as the
system's recognizer. But, IMO, the tools to manage compound recognizers
are simply out of the scope of the Recognizers specification.

--
Ruvim

none albert

May 30, 2020, 5:47:15 AM

In article <rar8gg$r1p$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
<SNIP>

>Perhaps, even more conceptually cleaner to have recognizers for numbers
>in the simplest form (without prefixes for radix, dot for double, etc).

First of all we should scrap the picture from Starting Forth, where
the input stream is separated into words by blank space.
(The handsome fellow with the moustache.)
Instead we have tokens. A token is in the dictionary but it is
recognized without being followed by blank space.
So the guy is replaced by a database engine that looks to the
remaining part of the input stream.

My ideal still is:
0 1 .. 9 are tokens. If found they execute (NUMBER).
They parse to next blank space, *THEMSELVES* .
So the interpreter no longer parses, parsing becomes modular.
If the exponent sign `` _ '' 1) is present it is converted to floating point
else if a `` . '' is present it is converted to double else it is converted
to single.
Loading a floating point extension, also revectors (NUMBER), not
a big deal.

The basic loop becomes easier
BEGIN find-token
immediate-or-executing IF EXECUTE ELSE COMPILE THEN
AGAIN
You can add more of those self-parsing tokens, e.g. " for strings,
' for dictionary entries, % for binary numbers etc.
I have added : for labels in my assembler.

1)
We can't have an exponent sign that could be mistaken for a digit
in this system.
Other advantages:
- not introducing the nasty case sensitivity issue
- hex floats are possible, Hex floats mostly are exact!
Even better is ' allowing a huge value for BASE.
123'0 instead of 123E0 . It looks as good.

>
>--
>Ruvim

Groetjes Albert
--
This is the first day of the end of your life.
It may not kill you, but it does make your weaker.
If you can't beat them, too bad.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

none albert

May 30, 2020, 5:53:08 AM

I hate seeing separate name spaces for recognizers apart from the
wordlist mechanism. If we can't load a facility, recognizers and all,
then use it after putting it in the search order, and stop using it
after removing it from the search order, IMO we are on the wrong trail.
There is a minimal search order (ONLY) which is a perfect place
for default recognizers.
>
>
>--
>Ruvim

Anton Ertl

May 30, 2020, 12:37:58 PM

Alex McDonald <al...@rivadpm.com> writes:
>On 28-May-20 17:25, Anton Ertl wrote:
>> Alex McDonald <al...@rivadpm.com> writes:
>>> On 28-May-20 08:25, Anton Ertl wrote:
>>>> Alex McDonald <al...@rivadpm.com> writes:
>>>>> It will also attempt, in no specific order: to reduce the number of
>>>>> words required to implement the proposal; do some significant bike
>>>>> shedding around names (for example, removing the ambiguity of RECTYPE
>>>>> and avoiding words like or containing NULL);
>>>>
>>>> Note that the current names are the result of a bikeshedding session
>>>> at a Forth 200x meeting. The fact that you are not happy with the
>>>> result and that even a committee member who was present has second
>>>> thoughts about it shows the pointlessness of discussing names (that's
>>>> why it's called bikeshedding).
>>>>
>>>> And the big problem is that it detracts from more substantial issues.
>>>
>>> I'm not alone in thinking that describing it as bikeshedding is not
>>> helpful. I would find a words like NON_AGNITA obfuscatory when
>>> discussing unrecognized tokens, much as I do for this description:
>>>
>>> RECTYPE-NULL ( -- RECTYPE-NULL ) RECOGNIZER
>>> The null data type id. It is to be used if no other data type id is
>>> applicable but one is needed.
>>
>> Nobody proposed NON_AGNITA. However, I am sure that Matthias Trute
>
>I didn't say they did.

So why mention it?

>> originally did not intend to obfuscate when he called this word
>> R:FAIL. Neither did the Forth-200x committee when we agreed on
>> RECTYPE-NULL.
>
>So although Matthias did not mean to obfuscate with R:FAIL, the
>committee decided to change it to RECTYPE-NULL. Was it not clear enough?
>Did it cause confusion?

The committee is as prone to bikeshedding as everybody else. My hope
was that by doing it in that session, this nonsense would be behind
us, and we could focus on the content. Well, it seemed to work for
two years or so.

>I seem to remember yours was the dissenting voice but others saw fit to
>change it.

I don't remember dissenting. I remember relaxing and letting the
unavoidable happen.

And I actually found having both R:FLOAT and REC:FLOAT confusing. Is
it better now?

R:FLOAT became RECTYPE-FLOAT
REC:FLOAT became REC-FLOAT

>I hope Bernd pops up here; it would be useful to know his intentions.

Better send him an email. I don't think he reads Usenet regularly
these days. Neither does Ulrich Hoffmann.

OTOH, you could also lead this discussion on forth-standard.org, which
may increase their willingness to participate.

>(As an aside, recognizers can do all of what the search order wordset
>does, and I remember vaguely a discussion to this effect. I have
>associated a recognizer with a wordlist for some experiments, but at
>this early stage I can't say whether it works or not. It's certainly not
>of interest until we discuss REC-FIND.)

Sure, if I did a Chuck Moore and ignored existing practice, I would
make each wordlist a recognizer, and have a search order or somesuch
for recognizers. However, you want to work on a Forth-200x proposal,
and you have to consider existing practice. And the existing practice
is that we have systems that do not have recognizers in the search
order, and programs that would break if the existing recognizers were
in the search order. The solution in the proposal has been to
separate the sequence of recognizers from the search order; and I
don't see how we could make an integrated solution
backwards-compatible.

If you want to read about some other design considerations,
<http://amforth.sourceforge.net/pr/Recognizer-rfc-C.html> contains a
lot of discussion (much of which was eliminated in
<http://amforth.sourceforge.net/pr/Recognizer-rfc-D.html> after a
complaint by Stephen Pelc about the length. Of course, he then
complained that there is not enough explanatory material in the
proposal).

You can find even more design decisions in
<http://www.euroforth.org/ef16/papers/ertl-recognizers.pdf>

>> Currently the standard describes exactly what is recognized: words,
>> integers (singles and doubles), floats. What's wrong if we specify
>> named recognizers for that?
>
>Will that push us to fixed names for complex numbers (REC-COMPLEX) and
>so on?

If you want to standardize a complex-number recognizer, yes, I would
suggest that you give it a name.

If somebody writes a library for recognizing complex numbers, then
they will pick a name of their choosing.

>Perhaps the proposal needs to be in two parts; the recognizer itself and
>the standard names for system recognizers.

It already is: The proposal contains recognizer and recognizer
extension words, and the various recognizers as well as all rectypes
(except RECTYPE-NULL) are in the recognizer extension words.

>It would allow us to focus on
>what is important first.

You can do that anyway: Identify a number of things that you consider
open questions (Matthias Trute considered the proposal finished), and
then start discussing them one by one.

>> The question is what is standardized. If we standardize REC-NUM, we
>> probably will not standardize REC-SNUM and REC-DNUM (because there is
>> no common practice). If we standardize REC-SNUM and REC-DNUM, there
>> is hardly any need for REC-NUM. Ok, you might argue that REC-NUM is
>> agnostic to whether the system supports the double-number wordset or
>> not, but is there any system that implements Forth-2012 that does not
>> support the double-number wordset.
>
>A good argument for splitting the current proposal and making that part
>of a second proposal.

My experience with trying to split the uncontroversial parts of a
proposal into a separate proposal is that the uncontroversial proposal
found exactly zero support.

Resisted: http://www.forth200x.org/directories.html
Uncontroversial, and unsupported: http://www.forth200x.org/directories1.html

As I wrote, without standard recognizers, the whole proposal is much
less useful, just as the search order is less useful because it lacks
a default search order (but at least it standardized FORTH-WORDLIST
and FORTH).

Why would anybody implement or use a proposal that is incomplete?
Then, why vote for it?

If you don't know whether to choose REC-NUM or REC-SNUM and REC-DNUM,
go for existing practice: Gforth has REC-NUM, not REC-SNUM nor
REC-DNUM.

Alex McDonald

May 30, 2020, 5:32:16 PM

Dear God. You're meta-bikeshedding. Bravo.

I'll address the other more pertinent issues over the next few weeks.

--
Alex

Ruvim

May 30, 2020, 7:51:28 PM

On 2020-05-30, albert wrote:
> In article <rar8gg$r1p$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
> <SNIP>
>> Perhaps, even more conceptually cleaner to have recognizers for numbers
>> in the simplest form (without prefixes for radix, dot for double, etc).
>
> First of all we should scrap the picture of starting Forth where
> the input stream is separated by words, i.e. blank space.

Why do you think that we should? It will not change the fact that Forth
source code consists of words separated by blanks; you only parse (scan)
these words in one place or in another.

> (The handsome fellow with the moustache.)
> Instead we have tokens. A token is in the dictionary but it is
> recognized without being followed by blank space.

You use the term "token" in a manner that conflicts with the language of
the Standard. Are you suggesting that this language be changed?

> So the guy is replaced by a database engine that looks to the
> remaining part of the input stream.

> My ideal still is:
> 0 1 .. 9 are tokens. If found they execute (NUMBER).
> They parse to next blank space, *THEMSELVES* .
> So the interpreter no longer parses, parsing becomes modular.
> If the exponent sign `` _ '' 1) is present it is converted to floating point
> else if a `` . '' is present it is converted to double else it is converted
> to single.
> Loading a floating point extension, also revectors (NUMBER), not
> a big deal.
>
> The basic loop becomes easier
> BEGIN find-token
> immediate-or-executing IF EXECUTE ELSE COMPILE THEN
> AGAIN
> You can add more of those self-parsing tokens, e.g. " for strings,
> ' for dictionary entries, % for binary numbers etc.
> I have added : for labels in my assembler.

1. How can you specify chaining? I.e., if %ABC is not a binary number,
can it be passed into other recognizers?

2. How to handle the formats that don't have a fixed prefix?
E.g. the names partially qualified by word lists as
module::sub-module::word
?

3. Can your mechanism do something that the recognizers mechanism cannot do?

4. Could you give a link to your implementation of the "find-token" and
"immediate-or-executing" words?

--
Ruvim

Ruvim

May 30, 2020, 8:42:10 PM

On 2020-05-30 12:53, albert wrote:
> In article <rarr9l$207$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>> On 2020-05-29 22:14, Ruvim wrote:
>>> On 2020-05-29 19:36, Anton Ertl wrote:
>> [...]
>>>> Bernd Paysan wants to use his dynamically resizable "stacks" abstract
>>>> data type for this.

[...]

>> By the way, nothing prevent a user to use dynamically resizable "stacks"
>> to manage his own compound recognizers. Also, he can incorporate the
>> system's default recognizer (or other recognizers from the library) into
>> his stack and set the corresponding compound recognizer as system's
>> recognizer. But, IMO, the tools to manage compound recognizers just are
>> out of the scope of the Recognizers specification.
>
> I hate seeing separate name spaces for recognizers apart from the
> wordlist mechanism.

What is your rationale?

I have implemented and used a handful of different recognizer
mechanisms, and in my experience it is better not to bind additional
functionality to the word lists.

> If we can't load a facility recognizers and all,
> then using it after putting it in the search order, and then not using it
> after removing it from the search order, IMO we are on the wrong trail.

I don't understand what you mean here.

> There is a minimal search order (ONLY) which is a perfect place
> for default recognizers.

Yes, if recognizers are bound to word lists and the search order.

--
Ruvim

none albert

May 31, 2020, 5:49:09 AM

In article <raurhu$8td$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>On 2020-05-30, albert wrote:
>> In article <rar8gg$r1p$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>> <SNIP>
>>> Perhaps, even more conceptually cleaner to have recognizers for numbers
>>> in the simplest form (without prefixes for radix, dot for double, etc).
>>
>> First of all we should scrap the picture of starting Forth where
>> the input stream is separated by words, i.e. blank space.
>
>Why do you think that we should? It will not change that Forth source
>code is words separated by blanks, and you parse (scan) these words in
>one place or in another place.

It changes the way you look at things.

>
>
>> (The handsome fellow with the moustache.)
>> Instead we have tokens. A token is in the dictionary but it is
>> recognized without being followed by blank space.
>
>You use the "token" term in a manner that conflicts with the language of
>the Standard. Do you suggest to change this language?

The hell I am! Of course. This is a totally different paradigm.
Resulting programs can be quite ISO compatible though,
showing that it is to a large extent an implementation issue.
(Kudos to the writers of the 1994 ISO standard.)

The handsome fellow is quite large but he sits "under the hood".

>> So the guy is replaced by a database engine that looks to the
>> remaining part of the input stream.
>
>
>
>> My ideal still is:
>> 0 1 .. 9 are tokens. If found they execute (NUMBER).
>> They parse to next blank space, *THEMSELVES* .
>> So the interpreter no longer parses, parsing becomes modular.
>> If the exponent sign `` _ '' 1) is present it is converted to floating point
>> else if a `` . '' is present it is converted to double else it is converted
>> to single.
>> Loading a floating point extension, also revectors (NUMBER), not
>> a big deal.
>>
>> The basic loop becomes easier
>> BEGIN find-token
>> immediate-or-executing IF EXECUTE ELSE COMPILE THEN
>> AGAIN
>> You can add more of those self-parsing tokens, e.g. " for strings,
>> ' for dictionary entries, % for binary numbers etc.
>> I have added : for labels in my assembler.
>
>1. How can you specify chaining? I.e., if %ABC is not a binary number,
>can it be passed into other recognizers?

Chaining is impossible. I consider that an advantage.
If you have seen %, and the remainder is not a binary number,
it is an error, full stop. If the word starts with a decimal
digit, and it is not a number, it is an error, full stop.
Maybe I have seen too many other languages, where I got
quite used to that behaviour. Imagine how much
better your life will be, once you're used to it.

What this means is that if you handle reals and insist on
writing 1E0i0E0 instead of 1E0 0E0 i or 1E0 i 0E0 ,
you have to replace (revector) that handling.
We have a mechanism for that: ALSO COMPLEX.
That supposes the recognition mechanism obeys search order,
which is integral to the idea.

Suppose you could change the meaning of the preceding part of
a c-program by inserting #include some-file in the middle of
char aap[] = {
'a' ,
'b' ,
/* insert your favorite include file here */
'c' ,
'd' ,
}
There are some things you don't want to want, or need to need.

>
>2. How to handle the formats that don't have a fixed prefix?
>E.g. the names partially qualified by word lists as
> module::sub-module::word
>?

I doubt one must insist that `` module::sub-module::word '' be
handled by the guy with the moustache as one word.
If module:: is a token that may install a search order where
sub-module:: is a token that is recognized by the database
engine, this could work. Without detail it is guessing.
I want to keep the sequential aspect of Forth.
If the last ::word suddenly changes the meaning of the preceding
module:: etc. I would lose track of what is going on.
Most other people and compilers would also.

An example of what you don't want is Intel's
1234 ..... (wait for it) .... H
where the H changes the preceding 1234 to a hex number.

>
>3. Can your mechanism do something that the recognizers mechanism cannot do?

Never forget, we have leeway to write a program to solve problems.
There are myriads of ways to solve a problem, and the simplest
one is preferable.

So the answer is probably not, but I reject the question.
I can't think of a parsing problem that couldn't be handled.
- It can parse Pascal (I published a demonstration program).
I mean Pascal as defined in a language definition document,
not a Forth adapted syntax.
- I've used it for labels in a two pass assembler.
(My ciasdis can reverse engineer my 64 bit Forth lina64,
generating labels and Forth headers, to a true assembler file.
It has labels where appropriate such that code can be
inserted or deleted anywhere. The assembler is two pass,
naturally. )

>
>4. Could you give a link to your implementation of the "find-token" and
>"immediate-or-executing" words?

This is pseudo code. I'm afraid you have to use your imagination.
You can run lina or wina, and use SEE to inspect code of
an actual implementation, which is more messy.

Thanks for actually looking into my ideas.
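
For readers who want something more concrete than pseudo code, here is a minimal sketch of what a prefix lookup in the spirit of find-token might look like in standard Forth. It is not the ciforth implementation. The names PREFIXES and FIND-PREFIX, the dedicated word list, and the shortest-prefix-first rule are all assumptions made for illustration.

```forth
\ Sketch only: PREFIXES, FIND-PREFIX and the % handler are hypothetical.
wordlist constant prefixes

\ Try the leading 1, 2, ... characters of the lexeme as a name in PREFIXES.
: find-prefix ( c-addr u -- c-addr' u' xt true | c-addr u false )
  dup 0 ?do                              ( c-addr u )
    over i 1+ prefixes search-wordlist   ( c-addr u 0 | c-addr u xt +-1 )
    if                                   ( c-addr u xt )
      i 1+ swap >r /string r>            ( c-addr' u' xt )
      true unloop exit
    then
  loop false ;

\ Define the prefix handler inside PREFIXES, then restore the old
\ compilation word list.
get-current prefixes set-current
: % ( c-addr u -- n )   \ the rest of the lexeme must be binary digits
  base @ >r  2 base !
  0 0 2swap >number  r> base !           ( ud c-addr2 u2 )
  nip abort" not a binary number"        ( ud )
  drop ;
set-current

: interpret-lexeme ( c-addr u -- i*x )
  find-prefix if execute else type ."  ?" then ;

s" %101" interpret-lexeme .   \ prints 5
```

There is no chaining here: once % has matched, a lexeme such as %ABC ends in ABORT" rather than being offered to anything else. The interpreted S" in the last line relies on the File word set.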

m...@iae.nl

Jun 1, 2020, 2:02:14 AM

Thanks. Please do this type of thoughtful explanation
more often.

Given alternatives, I'd always go for the simplest
alternative. In a working implementation unforeseen
problems will make it an ugly mess anyway (at least
it will be under the hood then).

Your way indisputably can be plugged into a working Forth
without running into nasty surprises down the track, while
diminishing complexity already in the kernel. Associating
special linguistic behavior to VOCABULARYs looks wonderfully
natural and Forth-like to me.

Do you have some examples of what Forth source looks like
using this paradigm? (Please keep it basic -- Forth itself
is more than able at creating complexity from very simple
concepts.)

-marcel

Ruvim

Jun 1, 2020, 10:15:49 AM

On 2020-05-31 12:49, albert wrote:
> In article <raurhu$8td$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>> On 2020-05-30, albert wrote:

[...]

>>> My ideal still is:
>>> 0 1 .. 9 are tokens. If found they execute (NUMBER).

>>> They parse to next blank space, *THEMSELVES* .[...]

>> 1. How can you specify chaining? I.e., if %ABC is not a binary number,
>> can it be passed into other recognizers?
>
> Chaining is impossible. I consider that an advantage.
> If you have seen %, and the remainder is not a binary number,
> it is an error, full stop.

> If the word starts with a decimal
> digit, and it is not a number, it is an error, full stop.

How do you handle 2DROP ?

> What this means is that if you handle reals and insist on
> writing 1E0i0E0 instead of 1E0 0E0 i or 1E0 i 0E0 ,
> you have to replace (revector) that handling.
> We have a mechanism for that: ALSO COMPLEX.

So we have to define all 10 prefixes, 0 to 9, in the COMPLEX
vocabulary; do I understand you correctly?

How will 2DROP be handled after ALSO COMPLEX? It seems it will not be
found.

> That supposes the recognition mechanism obeys search order,
> which is integral to the idea.

[...]

>> 2. How to handle the formats that don't have a fixed prefix?
>> E.g. the names partially qualified by word lists as
>> module::sub-module::word
>> ?
> I doubt one must insist that `` module::sub-module::word '' be
> handled by the guy with the moustache as one word.

If we don't have another guy, we have to get this guy to do this.

> If module:: is a token that may install a search order where
> sub-module:: is a token that is recognized by the database
> engine, this could work. Without detail it is guessing.
> I want to keep the sequential aspect of Forth.

> If the last ::word suddenly changes the meaning of the preceding
> module:: etc. I would lose track of what is going on.

Last ::word does not change the meaning of the preceding module::

This concept can also be implemented as parsing words:

module:: sub-module:: word

But it is more cumbersome in both implementation and use.

E.g.

forth-wordlist:: previous

(depending on implementation, it can produce an unexpected state of the
search-order).

Or, to take xt:

module:: sub-module:: 'word

vs

'module::sub-module::word

BTW. Conceptually, in this case (as well as in the case of numbers) we
have a hierarchy of lexemes.

On the level of the Forth text interpreter it is a single lexeme (blank
delimited sequence of non-blank characters):

'module::sub-module::word

But the quote recognizer gets rid of the leading tick and extracts a sub
lexeme:

module::sub-module::word

Further, the recognizer of qualified names extracts from
"module::sub-module::word" three lexemes:

module
sub-module
word

and resolves them as Forth words.
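
The last step of that hierarchy, splitting a qualified name into its parts, can be sketched with the standard SEARCH word. This is only an illustration of the splitting; the names SPLIT:: and .LEXEMES are made up here, and resolving each part against a word list is left out.

```forth
\ Sketch only: SPLIT:: and .LEXEMES are hypothetical names.
\ Cut the string at the first "::" and return the rest below the head.
: split:: ( c-addr u -- c-rest u-rest c-head u-head true | c-addr u false )
  2dup s" ::" search                    ( c-addr u c-addr3 u3 flag )
  0= if 2drop false exit then           ( c-addr u c-addr3 u3 )
  2 /string                             ( c-addr u c-rest u-rest )
  2swap                                 ( c-rest u-rest c-addr u )
  2 pick - 2 -                          ( c-rest u-rest c-head u-head )
  true ;

\ Print every part of a qualified name on its own line.
: .lexemes ( c-addr u -- )
  begin split:: while type cr repeat type cr ;

s" module::sub-module::word" .lexemes
```

On a system with the String word set this should print module, sub-module and word on three lines.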

--
Ruvim

none albert

Jun 1, 2020, 11:45:14 AM

In article <rb32ij$k23$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>On 2020-05-31 12:49, albert wrote:
>> In article <raurhu$8td$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>>> On 2020-05-30, albert wrote:
>[...]
>>>> My ideal still is:
>>>> 0 1 .. 9 are tokens. If found they execute (NUMBER).
>>>> They parse to next blank space, *THEMSELVES* .[...]
>>> 1. How can you specify chaining? I.e., if %ABC is not a binary number,
>>> can it be passed into other recognizers?
>>
>> Chaining is impossible. I consider that an advantage.
>> If you have seen %, and the remainder is not a binary number,
>> it is an error, full stop.
>
>> If the word starts with a decimal
>> digit, and it is not a number, it is an error, full stop.
>
>How do you handle 2DROP ?

2DROP is in the Forth wordlist
2 is in the ONLY wordlist , which is later in the search order.

AMDX86 ciforth 5.3.0
: test ONLY WORDS FORTH ;
OK
test
' & ^ 0 1 2 3 4 5 6 7 8 9 A B C
D E F - + " FORTH OK

100 Euro question: why not just
ONLY WORDS FORTH

If I designed Forth today I would have DDROP D>R DR> DOVER
And I'd have SDSWAP instead of ROT , DSSWAP instead of -ROT.
And above all D, instead of 2, .

Anton Ertl

Jun 1, 2020, 1:19:54 PM

Ruvim <ruvim...@gmail.com> writes:
>On 2020-05-29 19:36, Anton Ertl wrote:
>> Ruvim <ruvim...@gmail.com> writes:

[...]

>> It does not help the discussion if everyone uses his own favourite
>> naming and favourite terminology. In particular, if you want to make
>> point A, and you use your favourite terminology and names (your points
>> B and C), you will fail to get point A across. And of course, you
>> will not make progress on points B and C, because everybody else has
>> their own, different names and terminology, and will ignore this
>> aspect of your posting anyway.
>
>Well, I agree.
>
>So, how can we fix the issues in the terminology?
>
>I think they should be fixed even before discussion about mechanisms,
>semantics and names. Perhaps we should start from the scratch, and
>gradually append things from the current proposals step by step.

I don't think we will get anywhere with this approach: You will make a
proposal with your favourite terminology, which everybody else will
find flawed, so they want to fix the terminology before discussion
about mechanisms, semantics and names; and start from scratch. So the
next one will throw away what you did, just as you want to throw away
what Matthias Trute did, and in the end we will be nowhere.

The bottom line is whether you can live with the existing terminology.
I can. And the Forth-200x committee can (I certainly have not heard
complaints about terminology from them). So IMO "fixing" terminology
is at best a waste of time, at worst (and likely) it will result in
never standardizing recognizers.

>>> Perhaps, even more conceptually cleaner to have recognizers for numbers
>>> in the simplest form (without prefixes for radix, dot for double, etc).
>>
>> That's also possible. The question is if this results in a good
>> factoring.
>
>
>It seems for me that such factoring is reasonable. Not to implement the
>default recognizer, but to provide a useful library of recognizers.
>I implemented this approach, see at https://git.io/JfKcd

Looks nice (apart from using idiosyncratic naming conventions). So
this would be cool for a system that was implemented with recognizers
from the get-go. A system with historical baggage like NUMBER? (that
has to be kept because applications may call it) will still prefer to
go for REC-NUM.

But of course, once we have standardized recognizers, people can
provide and use such libraries.

>Yes, we can also have a special defining word that makes things shorter.
>Something like
>
> ' rec-dnum ' rec-snum 2 create-recognizer rec-num

Looks good. IMO, if we have that, I would not use
NEW-RECOGNIZER-SEQUENCE, or SET-RECOGNIZER. I would still use
something like GET-RECOGNIZER to read the constituent recognizers
(e.g., to construct a new recognizer sequence with some additional
recognizer). A testing word RECOGNIZER-SEQUENCE? ( xt -- f ) is
probably also useful (or GET-RECOGNIZER could return 0 if the xt does
not correspond to a recognizer sequence).

>> Given that one needs to construct a new recognizer sequence
>> if you want to use a new recognizer,
>> it is useful to have support for defining recognizer
>> sequences.
>
>I don't catch your argument, could you please elaborate an example?

I define a new recognizer REC-FOO. How to use it? A simple approach
is to append it at the end of the recognizers:

' REC-FOO FORTH-RECOGNIZER 2 create-recognizer REC-FORTH+FOO
' rec-forth+foo is forth-recognizer

This assumes that instead of rec-seq-ids we use xts of recognizers.
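
To make the idea of a recognizer sequence concrete, here is a rough sketch in standard Forth. It deliberately uses a simplified convention, ( c-addr u -- i*x id | 0 ), instead of the rectype tokens of the RfD, and every name in it (REC-STR, TRY-SEQ, REC-BIN, REC-ANY, MY-SEQ) is hypothetical. It is not the proposed API and not what any existing system does.

```forth
\ Sketch only; simplified convention: a recognizer is
\   ( c-addr u -- i*x id | 0 )   with 0 meaning "not mine".
2variable rec-str          \ lexeme under test (makes TRY-SEQ non-reentrant)

\ A sequence is a cell count followed by that many recognizer xts.
: try-seq ( c-addr u seq -- i*x id | 0 )
  >r rec-str 2! r>                       ( seq )
  dup cell+ swap @ cells over + swap     ( end a )
  begin 2dup u> while
    2>r  rec-str 2@ r@ @ execute         ( i*x id | 0 ) ( R: end a )
    ?dup if 2r> 2drop exit then
    2r> cell+                            ( end a' )
  repeat 2drop 0 ;

: rec-bin ( c-addr u -- n 1 | 0 )        \ "%" followed by binary digits
  dup 2 < if 2drop 0 exit then
  over c@ [char] % <> if 2drop 0 exit then
  1 /string
  base @ >r  2 base !
  0 0 2swap >number  r> base !           ( ud c-addr2 u2 )
  nip if 2drop 0 else drop 1 then ;

: rec-any ( c-addr u -- c-addr u 2 )     \ accepts anything as a string
  2 ;

create my-seq  2 ,  ' rec-bin ,  ' rec-any ,

s" %101" my-seq try-seq . .              \ prints 1 5
s" %ABC" my-seq try-seq . type           \ prints 2 %ABC
```

The second call shows the fall-through: REC-BIN declines %ABC and the next recognizer in the sequence gets its turn.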

Anton Ertl

Jun 1, 2020, 1:23:26 PM

Ruvim <ruvim...@gmail.com> writes:
>But, IMO, the tools to manage compound recognizers just are
>out of the scope of the Recognizers specification.

Given that every Forth system that implements recognizers will have to
deal with recognizer sequences (at least containing REC-WORD and
REC-NUM), I think they are in the scope.

Anton Ertl

Jun 1, 2020, 1:37:48 PM

albert@cherry.(none) (albert) writes:
>There is a minimal search order (ONLY) which is a perfect place
>for default recognizers.

Not in a standard system. A standard system has to recognize numbers
even if there are no word lists in the search order. On a standard
system this must work:

: foo get-order 0 set-order s" 5" evaluate . set-order ;
foo \ prints 5

works on gforth, VFX, and iForth. It produces an "Invalid memory
address" exception in SwiftForth.
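
For anyone who wants to repeat the test on another system without risking the session, a variant wrapped in CATCH (Exception word set) might look like the sketch below. The names EVAL-5 and TRY-EMPTY-ORDER are made up here, and whether a given system turns a memory fault into a THROW is system dependent, so this is an assumption rather than a guarantee.

```forth
\ Sketch only: EVAL-5 and TRY-EMPTY-ORDER are hypothetical names.
: eval-5 ( -- n )  s" 5" evaluate ;

: try-empty-order ( -- )
  get-order  0 set-order      ( widn ... wid1 n )
  ['] eval-5 catch            ( widn ... wid1 n 5 0 | widn ... wid1 n ior )
  ?dup if
    >r set-order r>  ." failed, ior = " .
  else
    >r set-order r>  ." evaluated to " .
  then ;

try-empty-order
```

Either way the saved search order is restored before anything is printed.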

Cecil - k5nwa

Jun 1, 2020, 2:16:26 PM

On 6/1/20 12:24 PM, Anton Ertl wrote:
> albert@cherry.(none) (albert) writes:
>> There is a minimal search order (ONLY) which is a perfect place
>> for default recognizers.
>
> Not in a standard system. A standard system has to recognize numbers
> even if there are no word lists in the search order. On a standard
> system this must work:
>
> : foo get-order 0 set-order s" 5" evaluate . set-order ;
> foo \ prints 5
>
> works on gforth, VFX, and iForth. It produces an "Invalid memory
> address" exception in SwiftForth.
>
> - anton
>

Works on MinForth v3.4 no crashing.

--
Cecil - k5nwa

Anton Ertl

Jun 1, 2020, 6:16:29 PM

an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>albert@cherry.(none) (albert) writes:
>>There is a minimal search order (ONLY) which is a perfect place
>>for default recognizers.
>
>Not in a standard system. A standard system has to recognize numbers
>even if there are no word lists in the search order. On a standard
>system this must work:
>
>: foo get-order 0 set-order s" 5" evaluate . set-order ;
>foo \ prints 5
>
>works on gforth, VFX, and iForth. It produces an "Invalid memory
>address" exception in SwiftForth.

The following works in SwiftForth:

wordlist constant my-wordlist
: foo get-order my-wordlist 1 set-order s" 5" evaluate . set-order ;
foo \ prints 5

Apparently it cannot work with an empty search order, but a search
order consisting of an empty wordlist is no problem.

Ruvim

Jun 1, 2020, 7:44:55 PM

On 2020-06-01 19:40, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:
>> On 2020-05-29 19:36, Anton Ertl wrote:
>>> Ruvim <ruvim...@gmail.com> writes:
> [...]
>>> It does not help the discussion if everyone uses his own favourite
>>> naming and favourite terminology. In particular, if you want to make
>>> point A, and you use your favourite terminology and names (your points
>>> B and C), you will fail to get point A across. And of course, you
>>> will not make progress on points B and C, because everybody else has
>>> their own, different names and terminology, and will ignore this
>>> aspect of your posting anyway.
>>
>> Well, I agree.
>>
>> So, how can we fix the issues in the terminology?

It seems this question of mine was slightly misunderstood because of
its unfortunate context. I was thinking about an issue in terminology
that I mentioned in response to Alex [1]. See below.

[1]. news:rarbcg$fsl$1...@dont-email.me

>> I think they should be fixed even before discussion about mechanisms,
>> semantics and names. Perhaps we should start from the scratch, and
>> gradually append things from the current proposals step by step.

(*)

>
> I don't think we will get anywhere with this approach: You will make a
> proposal with your favourite terminology, which everybody else will
> find flawed, so they want to fix the terminology before discussion
> about mechanisms, semantics and names; and start from scratch. So the
> next one will throw away what you did, just as you want to throw away
> what Matthias Trute did, and in the end we will be nowhere.

I don't want to throw away what Matthias (or anybody else) did.
See above: "should [...] append things from the *current proposals* step
by step".

> The bottom line is whether you can live with the existing terminology.
> I can. And the Forth-200x committee can (I certainly have not heard
> complaints about terminology from them). So IMO "fixing" terminology
> is at best a waste of time, at worst (and likely) it will result in
> never standardizing recognizers.

When I talk about *fixing* (i.e., correcting), it is not about
favourites, and not about anything I promoted (e.g. the examples that
show that the best rectype-null value is 0).
It is about pointing to a problem and insisting that this problem
should be solved. I may also suggest a way of solving it, but I
don't insist on that way. I think we should consider all available
variants, their pros and cons, and find a consensus.

Regarding the terminology. Some problems were already mentioned by Alex,
Ulrich, and also me.

Just one example of a problem in the terminology.

An excerpt from "Recognizer RfD D", XY.3.1
| A data type id is a single cell value that identifies
| a certain data type.

An excerpt from the Standard

| data type: An identifier for the set of values that
| a data object may have.

So, "data type id" identifies an identifier (i.e. an English title!). Is
that the author's intention? Obviously not. It's just an incompatibility
with the language of the Standard. I think a specification cannot be
included in the standard if it conflicts with this standard.

The same issue appears in Ulrich's redaction [2]:
| Recognizer Information Token: An implementation-dependent
| single-cell value that identifies the data type [...]

I.e., "Recognizer Information Token" identifies an identifier.

[2] https://forth-standard.org/standard/intro#contribution-131

(*) I suggest starting the next iteration from the definitions of
terms, i.e., first finding consensus on those definitions, since only
such definitions will allow us to use the same clear terminology, and
taking care from the start that this terminology is compatible with the
standard.

I have the impression that Recognizer is the most complicated
specification among the proposals after 94. One reason is that this
specification is really difficult to formally express at the level of
the Standard.

--
Ruvim

A. K.

Jun 2, 2020, 2:37:22 AM

>
> I have the impression that Recognizer is the most complicated
> specification among the proposals after 94. One reason is that this
> specification is really difficult to formally express at the level of
> the Standard.
>

My impression is that there is too much in the package. AFAIR it started with
recognizing literals, and that was easy to follow, given that a literal
recognizer is not much more than a syntax checker with provision of int/comp
behavior. Also literals are usually not postponed.

Then it has evolved into a big salad of word recognizers in a modified text
interpreter loop. So new recognizers have to allow postponing, macro capability,
etc. and should be able to be used in OO packages, for instance to 'recognize'
common dot syntax patterns to address structure elements or object vars/methods.

Pure literal recognizers are easy to understand and usable even in state-smart
legacy Forth systems. They fit nicely into a classic text interpreter loop as
natural extension of number recognition.

OTOH those word- or super-recognizers are actually new pattern-matching parsers,
modernizing the classic Forth simple whitespace-delimited token scanner,
unfortunately called parser too.

Perhaps it helps to get a small agreement on literal recognizers first, and
focus the other discussion more on parsing techniques. My gut feeling is that
new parsing and the text interpreter loop should better be factored out to reduce
the above-mentioned "salad".

none albert

Jun 2, 2020, 4:11:23 AM

In article <2020Jun...@mips.complang.tuwien.ac.at>,

Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>albert@cherry.(none) (albert) writes:
>>There is a minimal search order (ONLY) which is a perfect place
>>for default recognizers.
>
>Not in a standard system. A standard system has to recognize numbers
>even if there are no word lists in the search order. On a standard
>system this must work:
>
>: foo get-order 0 set-order s" 5" evaluate . set-order ;
>foo \ prints 5
>
>works on gforth, VFX, and iForth. It produces an "Invalid memory
>address" exception in SwiftForth.

What makes you think `` 0 set-order '' would remove ONLY from the search order?
Were I to implement SET-ORDER , I certainly would make it behave like ISO.
The non-standard word .VOCS shows the namespaces:
.VOCS
ENVIRONMENT ONLY FORTH OK
The standard word ORDER works as required
ORDER
FORTH [ FORTH ] OK

I just do away with a separate mechanism for numbers, if a
perfectly suitable mechanism that is already present,
can be reused.

>
>- anton

Groetjes Albert

dxforth

Jun 2, 2020, 6:55:28 AM

On Tuesday, June 2, 2020 at 6:11:23 PM UTC+10, none albert wrote:
> In article <2020Jun...@mips.complang.tuwien.ac.at>,
> Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> >albert@cherry.(none) (albert) writes:
> >>There is a minimal search order (ONLY) which is a perfect place
> >>for default recognizers.
> >
> >Not in a standard system. A standard system has to recognize numbers
> >even if there are no word lists in the search order. On a standard
> >system this must work:
> >
> >: foo get-order 0 set-order s" 5" evaluate . set-order ;
> >foo \ prints 5
> >
> >works on gforth, VFX, and iForth. It produces an "Invalid memory
> >address" exception in SwiftForth.
>
> What makes you think `` 0 set-order '' would remove ONLY from the search order?
> Were I to implement SET-ORDER , I certainly would make it behave like ISO.

> …

But isn't that what 'ISO' says? BTW is there a use for '0 SET-ORDER' ?
It's been so long since I've had to worry about such things that I've forgotten :)

NN

Jun 2, 2020, 8:51:03 AM

0 set-order ==> Sets the context vector to empty.
You can't search for words and thus you can't compile new ones.
A way to stop someone adding new definitions I would guess.
Not that I have ever used it.

dxforth

Jun 2, 2020, 9:09:12 AM

Or anything useful?

> Not that I have ever used it.

Nor apparently has SwiftForth - at least not the specified test. The bug
appears to be in SwiftForth (FIND) - uses DO instead of ?DO . Perhaps
'bug' is too harsh - 'non-compliance with ANS' under 'cruel and unusual'
circumstances might be more apt :)
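
The difference matters exactly in the empty case. With DO, a loop whose limit equals its start index does not skip the body; it runs until the index wraps around. ?DO skips it. A minimal sketch, using a made-up helper and not SwiftForth's actual (FIND):

```forth
\ Sketch only: COUNT-ORDER is a hypothetical helper that consumes the
\ result of GET-ORDER and returns how many word lists it held.
: count-order ( widn ... wid1 n -- n )
  dup >r  0 ?do drop loop  r> ;

get-order count-order .   \ number of word lists in the search order
0 count-order .           \ prints 0; the loop body is skipped
```

Replacing ?DO with DO in COUNT-ORDER would make the second line run the loop body with nothing to drop.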

none albert

Jun 2, 2020, 11:59:35 AM

In article <d068f119-a479-45e7...@googlegroups.com>,

An interesting application is to render gforth inoperable, something
you cannot accomplish by dividing by zero, addressing outside of a
stack or storing something at 0xFFFF,FFFF,FFFF,FFFF

Anton Ertl

Jun 2, 2020, 12:38:30 PM

Ruvim <ruvim...@gmail.com> writes:
>On 2020-06-01 19:40, Anton Ertl wrote:
>> Ruvim <ruvim...@gmail.com> writes:
>>> On 2020-05-29 19:36, Anton Ertl wrote:
>>>> Ruvim <ruvim...@gmail.com> writes:
>> [...]
>>>> It does not help the discussion if everyone uses his own favourite
>>>> naming and favourite terminology. In particular, if you want to make
>>>> point A, and you use your favourite terminology and names (your points
>>>> B and C), you will fail to get point A across. And of course, you
>>>> will not make progress on points B and C, because everybody else has
>>>> their own, different names and terminology, and will ignore this
>>>> aspect of your posting anyway.
>>>
>>> Well, I agree.
>>>
>>> So, how can we fix the issues in the terminology?
>
>It seems this question of mine was slightly misunderstood because of
>its unfortunate context. I was thinking about an issue in terminology
>that I mentioned in response to Alex [1]. See below.
>
>[1]. news:rarbcg$fsl$1...@dont-email.me

I reread this posting, but find it as interesting as a discussion
about the bike shed colour.

>>> I think they should be fixed even before discussion about mechanisms,
>>> semantics and names. Perhaps we should start from the scratch, and
>>> gradually append things from the current proposals step by step.
>(*)
>>
>> I don't think we will get anywhere with this approach: You will make a
>> proposal with your favourite terminology, which everybody else will
>> find flawed, so they want to fix the terminology before discussion
>> about mechanisms, semantics and names; and start from scratch. So the
>> next one will throw away what you did, just as you want to throw away
>> what Matthias Trute did, and in the end we will be nowhere.
>
>I don't want to throw away what Matthias (or anybody else) did.
>See above: "should [...] append things from the *current proposals* step
>by step".

But first you want to start from scratch. Why? And why add things
"step by step"?

>Just one example of a problem in the terminology.
>
>An excerpt from "Recognizer RfD D", XY.3.1
>| A data type id is a single cell value that identifies
>| a certain data type.
>
>An excerpt from the Standard
>| data type: An identifier for the set of values that
>| a data object may have.
>
>So, "data type id" identifies an identifier (i.e. an English title!). Is
>that the author's intention? Obviously not. It's just an incompatibility
>with the language of the Standard. I think a specification cannot be
>included in the standard if it conflicts with this standard.

The actual changes to the standard are rarely as suggested in the
proposal. They have to be revised for fitting with the way the
standard is written. However, this revision is something that should
be done after the proposal has been decided on; it makes no sense to
do all this work for every RfD, just in case this version is accepted
into the standard.

>I have the impression that Recognizer is the most complicated
>specification among the proposals after 94. One reason is that this
>specification is really difficult to formally express at the level of
>the Standard.

Everything is difficult to express at the level of the standard.
That's why you do not do it before the proposal is frozen and
accepted.

The recognizer proposal is one of the more complex proposals. But the
Forth-94 committee managed to describe not just the text interpreter
(which is refactored by the recognizer proposal), but a lot of other
new stuff. So it's not like the recognizer proposal requires
superhuman powers.

Anton Ertl

Jun 2, 2020, 1:25:50 PM

"A. K." <minf...@arcor.de> writes:
>>
>> I have the impression that Recognizer is the most complicated
>> specification among the proposals after 94. One reason is that this
>> specification is really difficult to formally express at the level of
>> the Standard.
>>
>
>My impression is that there is too much in the package. AFAIR it started with
>recognizing literals, and that was easy to follow, given that a literal
>recognizer is not much more than a syntax checker with provision of int/comp
>behavior.

Where can I find the thing that you find easy to follow?

>Also literals are usually not postponed.

Why not? Because until now POSTPONE is only defined for words. This
requires people to postpone the sequence

over 5 +

by writing

postpone over 5 postpone literal postpone +

instead of the more straightforward

postpone over postpone 5 postpone +

And yes, I remember someone asking what to do instead of
POSTPONE <num>.
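
For completeness, the standard workaround written out as a small macro. The names OVER5+ and TEST are made up for this illustration.

```forth
\ Sketch only: OVER5+ and TEST are hypothetical names.
\ Standard Forth: the number has to go through POSTPONE LITERAL.
: over5+ ( compilation: -- ; run-time: x -- x x+5 )
  postpone over  5 postpone literal  postpone + ; immediate

: test ( x -- x x+5 )  over5+ ;

3 test . .   \ prints 8 3
```

With a POSTPONE that accepts numbers, the body of OVER5+ could use the more straightforward form directly.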

>Then it has evolved into a big salad of word recognizers in a modified text
>interpreter loop. So new recognizers have to allow postponing, macro capability,

What is the difference between "postponing" and "macro capability"?

>etc. and should be able to be used in OO packages, for instance to 'recognize'
>common dot syntax patterns to address structure elements or object vars/methods.

That is a problem that Forth applications have. Forth systems (in
particular VFX) have grown hooks in the text interpreter to satisfy
this and other requirements.

A refactoring of the text interpreter, such as the recognizer
proposal, should be able to satisfy such requirements.

>Pure literal recognizers are easy to understand and usable even in state-smart
>legacy Forth systems.

The proposed recognizers are also usable in state-smart Forth systems.

>They fit nicely into a classic text interpreter loop as
>natural extension of number recognition.

The proposed recognizers nicely replace the whole text interpreter,
not just number recognition.

>OTOH those word- or super-recognizers are actually new pattern-matching parsers,
>modernizing the classic Forth simple whitespace-delimited token scanner,
>unfortunately called parser too.

The whitespace-delimited scanner (PARSE-NAME) is one of the things
that is unchanged during the refactoring. Note that a recognizer
expects a string, and that string is normally produced by PARSE-NAME
(or, if you are an old-timer, BL WORD COUNT).

>Perhaps it helps to get a small agreement on literal recognizers first, and
>focus the other discussion more on parsing techniques.

How should that help?

>My gut feeling is that
>new parsing and text interpreter loop should better be factored out to reduce
>the a.m. "salad".

What factoring do you have in mind?

Anton Ertl

Jun 2, 2020, 1:38:45 PM

albert@cherry.(none) (albert) writes:
>In article <2020Jun...@mips.complang.tuwien.ac.at>,
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>albert@cherry.(none) (albert) writes:
>>>There is a minimal search order (ONLY) which is a perfect place
>>>for default recognizers.
>>
>>Not in a standard system. A standard system has to recognize numbers
>>even if there are no word lists in the search order. On a standard
>>system this must work:
>>
>>: foo get-order 0 set-order s" 5" evaluate . set-order ;
>>foo \ prints 5
>>
>>works on gforth, VFX, and iForth. It produces an "Invalid memory
>>address" exception in SwiftForth.
>
>What makes you think `` 0 set-order '' would remove ONLY from the search order?

<https://forth-standard.org/standard/search/SET-ORDER>

In particular:

|If n is zero, empty the search order.

>Were I to implement SET-ORDER , I certainly would make it behave like ISO.
>The non-standard word .VOCS shows the namespaces:
>.VOCS
>ENVIRONMENT ONLY FORTH OK
>The standard word ORDER works as required
>ORDER
>FORTH [ FORTH ] OK
>
>I just do away with a separate mechanism for numbers, if a
>perfectly suitable mechanism that is already present,
>can be reused.

As mentioned, if you are Chuck Moore or albert, and do not have to
keep compatibility with legacy code, having only one order would be nicer
than having a (wordlist) search order and a recognizer order.

But we started out with a text interpreter that contains (hard-coded)
a word recognizer (FIND or FIND-NAME) that uses the search order, and
a number recognizer (e.g., NUMBER?) that is outside the search order.
And, e.g., a user-defined text interpreter that just FINDs has to work
even with recognizers.

As mentioned, I went down this rabbit hole and explored it in
<http://www.euroforth.org/ef16/papers/ertl-recognizers.pdf>.

Ruvim

Jun 3, 2020, 4:27:12 AM

On 2020-06-02 19:12, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:
>> On 2020-06-01 19:40, Anton Ertl wrote:
>>> Ruvim <ruvim...@gmail.com> writes:

[...]

>>>> So, how can we fix the issues in the terminology?

>>>> I think they should be fixed even before discussion about mechanisms,
>>>> semantics and names. Perhaps we should start from the scratch, and
>>>> gradually append things from the current proposals step by step.
>> (*)
>>>
>>> I don't think we will get anywhere with this approach: You will make a
>>> proposal with your favourite terminology, which everybody else will
>>> find flawed, so they want to fix the terminology before discussion
>>> about mechanisms, semantics and names; and start from scratch. So the
>>> next one will throw away what you did, just as you want to throw away
>>> what Matthias Trute did, and in the end we will be nowhere.
>>
>> I don't want to throw away what Matthias (or anybody else) did.
>> See above: "should [...] append things from the *current proposals* step
>> by step".
>
> But first you want to start from scratch. Why? And why add things
> "step by step".

I would like to reach a consensus among the people who are interested in
designing (and implementing) this proposal and are working on it.
At the moment, I see 4-5 people who are suggesting particular wording,
terms, names, signatures, etc., and who are also not happy with the current
proposals (i.e., version D from Matthias, the rephrase from Ulrich
(actually, in progress), the different API from me). It would be a big
deal if we wrote a specification that we can all live with.

Therefore I suggest going step by step: adding each next part once we
reach consensus on the previous part.

First, we should speak the same language: use the same
terminology and understand the terms in the same way. Without that we
will just not be properly understood by each other.

So, we should start with the definitions of the terms.

I have suggested a set of terms (with definitions). You can suggest
another set of terms, or corrections to my set, or agree with this set.

When we reach a consensus on terms, we can find proper names for the
words. I think we want to have a pretty good etymology for the names. But
a good etymology and good names are the result of a clear terminology.

I would not like to have yet more strange historical things like ">IN"
[1]. But there is a risk that "RECTYPE" will become such a strange thing,
with an etymology that is not connected to its actual meaning.

Can we rely on the committee to find good names? As history
shows, we can't. "RECTYPE" is the result of the committee changing names.
Yes, it is a better variant than "R:" or "TABLE", but not
good enough. NB: "RECTYPE" is just a result of the incorrect terminology
that was used. Another example of a very strange name choice that the
committee accepted is the word UNESCAPE [2].

[1] https://forth-standard.org/standard/core/toIN#contribution-110
[2] https://forth-standard.org/standard/string/UNESCAPE#contribution-85

--
Ruvim

Anton Ertl

Jun 3, 2020, 8:02:12 AM

m...@iae.nl writes:
>On Sunday, May 31, 2020 at 11:49:09 AM UTC+2, none albert wrote:
>> In article <raurhu$8td$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>> >1. How can you specify chaining? I.e., if %ABC is not a binary number,
>> >can it be passed into other recognizers?
>>
>> Chaining is impossible. I consider that an advantage.

This means that one denotation word must be able to handle single-cell
and double-cell integers, and FP numbers. Not a good factoring.

And I think a dot-parser is not possible with this approach (except if
you make all printable characters prefixes for a denotation word that
covers regular words, dot-separated compounds, and numbers). Really
bad factoring.

>> >2. How to handle the formats that don't have a fixed prefix?
>> >E.g. the names partially qualified by word lists as
>> > module::sub-module::word
>> >?
>>
>> I doubt one must insist `` handled module::sub-module::word ''
>> handled by the guy with the moustache as one word.
>> If module:: is a token that maybe install a search order where
>> sub-module:: is a token that is recognized by the database
>> engine, this could work. Without detail it is guessing.

The idea of having each component as a separate word is seductive, and
I have been on this mental path for about two decades. I imagined
words with a parse-time action of changing the search order for just
the next word; this is similar to (or the same as) Manfred Mahlow's
preludes. We even implemented a prelude bit in Gforth in 2009.
But I never implemented the actual thing, because there were always
problems for which I had no solution, e.g.:

postpone module:: submodule:: word

Or should it be

module:: submodule:: postpone word

In either case, a straightforward implementation would not work. What
about

module:: ( comment ) submodule:: word

What if I want to use MODULE:: scope for more than one word? Do I
need a separate word for that?

Many of these problems are solved by having MODULE::SUB-MODULE::WORD
as one PARSE-NAME unit: POSTPONE can just work on this unit, and it's
obvious that you cannot insert a comment in the middle.
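
To illustrate the single-unit approach, here is a minimal sketch in
Python (chosen only because it is easy to run; it is not how Gforth or
any other Forth system does it). It assumes that wordlists are modelled
as nested dictionaries and that "::" is the separator; the names
resolve_qualified and WORDLISTS are hypothetical.

```python
from typing import Optional


def resolve_qualified(root: dict, lexeme: str, sep: str = "::") -> Optional[str]:
    """Resolve a whole lexeme such as 'module::sub-module::word'.

    The lexeme is handled as one unit, so there is no place where a
    comment or POSTPONE could end up between the components.
    Returns the stored entry, or None if the lexeme is not recognized.
    """
    parts = lexeme.split(sep)
    # An unqualified name or an empty component is not ours to handle.
    if len(parts) < 2 or "" in parts:
        return None
    scope = root
    for name in parts[:-1]:
        scope = scope.get(name)
        if not isinstance(scope, dict):   # no such (sub-)wordlist
            return None
    target = scope.get(parts[-1])
    # A nested dict is a wordlist, not a word, so it does not count.
    return None if isinstance(target, dict) else target


# Hypothetical sample data: a word nested two wordlists deep.
WORDLISTS = {"module": {"sub-module": {"word": "xt-of-word"}}}
```

Because failure is reported by returning None, a text interpreter could
simply try the next recognizer for lexemes this one rejects.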

There are still some problems remaining: How do we implement C struct
fields in a foreign language interface? C struct fields have the
following problematic properties:

1) Every struct has its own name space.

2) A struct can be an anonymous part of a variable, struct, or array
definition, nested arbitrarily deep.

Given that deeply nested anonymous structs are rare, it's probably
good enough to require using a dot-sequence starting from a named
entity. E.g., if we have

struct {
...
struct { ...; short bar; ... } flip[20];
...
} foo;

Then reading a bar element might look as follows:

7 foo.flip.[].bar.@

Note that the [] and @ parts are not Forth words in the
FORTH-WORDLIST, but words in the wordlists corresponding to
the C name spaces.

Hmm, I originally had thought that one would need to split it in two
at the array index operation, but it seems we can do it without.

Anyway, if we need to split it for some reason, it might go as follows:

7 foo.flip.[] ( addr ) dup foo::flip::bar.@

>> An example of what you don't want is Intel's
>> 1234 ..... (wait for it) .... H
>> where the H changes the preceeding 1234 to a hex number

I don't want it, but if I have these as input, I want a mechanism
that can handle them. Recognizers can.

And then there are cases that I want, like recognizing 2020-06-03 as a
date.
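
As a rough illustration of these two cases, here is a sketch in Python
(not Forth, and not taken from any recognizer implementation). It
assumes that a recognizer is simply a function that receives the whole
lexeme and returns a value, or None when it does not recognize it; the
names rec_date and rec_intel_hex are hypothetical.

```python
import re
from datetime import date

_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")
# Intel-style hex: starts with a decimal digit and ends with H, e.g. 1234H.
_INTEL_HEX = re.compile(r"([0-9][0-9A-Fa-f]*)[Hh]")


def rec_date(lexeme: str):
    """Recognize an ISO-style date such as 2020-06-03, else return None."""
    m = _DATE.fullmatch(lexeme)
    if m is None:
        return None
    try:
        return date(int(m[1]), int(m[2]), int(m[3]))
    except ValueError:          # e.g. month 13
        return None


def rec_intel_hex(lexeme: str):
    """Recognize a number with a trailing H suffix, else return None."""
    m = _INTEL_HEX.fullmatch(lexeme)
    return int(m[1], 16) if m else None
```

Both functions see the complete lexeme before deciding, which is why a
suffix that changes the meaning of the preceding digits is not a problem.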

>Given alternatives, I'd always go for the simplest
>alternative.

I think you balance simplicity against functionality. Even Chuck
Moore does. Recognizers provide a good balance. I don't think
albert's denotations do. Not only are they missing functionality, an
efficient implementation (more efficient than following the chain of
links in a wordlist) is complicated.

>Your way undisputably can be plugged into a working Forth
>without running into nasty surprises down the track, while
>diminishing complexity already in the kernel.

What makes you think so? Has albert written an implementation of his
denotations for a different system than his own? Bernd Paysan
implemented recognizers for Gforth *and* VFX. Matthias Trute built
recognizers for amForth. Stephen Pelc implemented recognizers (AFAIK
from an older recognizer proposal) for VFX.

Stephen Pelc

Jun 3, 2020, 8:12:17 AM

On Wed, 3 Jun 2020 11:27:09 +0300, Ruvim <ruvim...@gmail.com>
wrote:

>Can we rely on the committee that he will find the good names? As shown
>by history, we cant. "RECTYPE" is the result of changing names by the
>committee. Yes, it is a better variant than "R:" or "TABLE", but not
>good enough. NB: "RECTYPE" is just a result of the incorrect terminology
>that was used. Another example of a very strange name choice that the
>committee accepted is UNESCAPE word [2].

UNESCAPE arose at the end of a long discussion and we were all
very tired. We're humans. As a name UNESCAPE is better than
many others. Are you any better than the rest of us?

Instead of just telling us that "stuff" is wrong, try making
a short cogent solution free of all the philosophical
verbiage.

Stephen

--
Stephen Pelc, ste...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, +44 (0)78 0390 3612
web: http://www.mpeforth.com - free VFX Forth downloads

Ruvim

Jun 3, 2020, 9:31:27 AM

On 2020-06-03 15:12, Stephen Pelc wrote:
> On Wed, 3 Jun 2020 11:27:09 +0300, Ruvim <ruvim...@gmail.com>
> wrote:
>
>> Can we rely on the committee that he will find the good names? As shown
>> by history, we cant.

Well, I just wanted to say that we can't put bad names into a proposal
and expect that the committee will change them into good names.
Perhaps I'm wrong.

>> "RECTYPE" is the result of changing names by the
>> committee. Yes, it is a better variant than "R:" or "TABLE", but not
>> good enough. NB: "RECTYPE" is just a result of the incorrect terminology
>> that was used. Another example of a very strange name choice that the
>> committee accepted is UNESCAPE word [2].
>
> UNESCAPE arose at the end of long discussion and we were all
> very tired. We're humans. As a name UNESCAPE is better than
> many others. Are you any better than the rest of us?

We all make mistakes (me too), it's OK. Please do not take it
personally.

What matters is not a mistake, but a reaction to the mistake.

If we all agree that there was an unsuitable name choice, we should just
correct it. E.g., add another name, and retain the old name for backward
compatibility.

> Instead of just telling us that "stuff" is wrong, try making
> a short cogent solution free of all the philosophical
> verbiage.

Agreed. It is just off-topic for this message.

--
Ruvim

Alex McDonald

Jun 3, 2020, 9:54:14 AM

On 27-May-20 19:15, Alex McDonald wrote:
> After Matthias' untimely passing, his proposal for recognizers has
> stalled and I'm not aware of any discussion since his death. It would be
> a shame if the work he did went to waste and wasn't carried forward.
>
> http://amforth.sourceforge.net/pr/Recognizer-rfc-D.html was (I believe)
> the last proposal he made, and I would like to take it forward on his
> behalf. If there are no objections, I'd like to further refine it and
> create another set of RFCs based on my and others' experiences of using
> recognizers.
>
> This may not match some of the current implementations in the wild; for
> example, decisions made by gforth to do with an "automatic" action for
> postpone actions based on the compile action (that is, having two
> actions rather than three).
>
> It will also attempt, in no specific order: to reduce the number of
> words required to implement the proposal; do some significant bike
> shedding around names (for example, removing the ambiguity of RECTYPE
> and avoiding words like or containing NULL); remove the requirement for
> a fixed name REC-NUM REC-FLOAT; be less prescriptive and more
> descriptive to allow greater implementation flexibility; and so on.
>
> Polite comments welcome.
>

I'm not going to be working on this proposal in the near term as I'd
hoped, due to a recent health issue. I'll be lurking until I've sorted
it out.

--
Alex

none albert

Jun 3, 2020, 10:42:03 AM

In article <2020Jun...@mips.complang.tuwien.ac.at>,
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:

>m...@iae.nl writes:
>>On Sunday, May 31, 2020 at 11:49:09 AM UTC+2, none albert wrote:
>>> In article <raurhu$8td$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>>> >1. How can you specify chaining? I.e., if %ABC is not a binary number,
>>> >can it be passed into other recognizers?
>>>
>>> Chaining is impossible. I consider that an advantage.
>
>This means that one denotation word must be able to handle single-cell
>and double-cell integers, and FP numbers. Not a good factoring.
>
>And I think a dot-parser is not possible with this approach (except if
>you make all printable characters prefixes for a denotation word that
>covers regular words, dot-separated compounds, and numbers). Really
>bad factoring.

So at the same time as the floating-point package is loaded,
(NUMBER) is revectored to
: NEW-NUMBER ... 'OLD-NUMBER CATCH ERROR-NOT-A-DIGIT = IF
..... THEN ;
I can't lose any sleep over that.

C is designed by an american, cludgy and irregular.
Why not start with Pascal?
>
<SNIP> Further discussion of C.

>
>>> An example of what you don't want is Intel's
>>> 1234 ..... (wait for it) .... H
>>> where the H changes the preceeding 1234 to a hex number
>
>I don't want it, but if I have these as input, in want a mechanism
>that can handle them. Recognizers can.
>
>And then there are cases that I want, like recognizing 2020-06-03 as a
>date.

Aren't we just talking about implementing input routines here?
Or is the goal a natural language interface?

>
>>Given alternatives, I'd always go for the simplest
>>alternative.
>
>I think you balance simplicity against functionality. Even Chuck
>Moore does. Recognizers provide a good balance. I don't think
>albert's denotations do. Not only are they missing functionality, an
>efficient implementation (more efficient than following the chain of
>links in a wordlist) is complicated.

15 lines for a Pascal parser. Is that lack of functionality?
My problem is I can't feel the urge for those full-blown
recognizers.

Combining prefixes with a dictionary hash is not hard,
as long as you don't store entries that are hidden by other
entries, prefixes or not. Then if you have a miss, you look
for subsequently shorter strings.
My problem is I don't feel the urge to speed up the dictionary
search.
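
The lookup described above (an exact hit first, then progressively
shorter leading strings that must be prefixes) can be sketched roughly
as follows. This is Python rather than Forth, it is not albert's
implementation, and it ignores the point about hidden entries. The
function lookup and the table ENTRIES are hypothetical.

```python
def lookup(entries: dict, token: str):
    """Return (name, rest) for a hit, or None for a miss.

    entries maps a name to {"prefix": bool}. An ordinary word must match
    the whole token; a prefix entry may match a leading part of it.
    """
    if token in entries:                      # exact hit in the hash
        return token, ""
    # Miss: try successively shorter leading substrings.
    for end in range(len(token) - 1, 0, -1):
        head = token[:end]
        entry = entries.get(head)
        if entry is not None and entry["prefix"]:
            return head, token[end:]
    return None


# Hypothetical dictionary: one ordinary word and two prefixes.
ENTRIES = {
    "dup": {"prefix": False},
    "$":   {"prefix": True},
    "0x":  {"prefix": True},
}
```

A token of length n costs at most n hash probes, so the hash still pays
off compared with following a linked wordlist.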

>
>>Your way undisputably can be plugged into a working Forth
>>without running into nasty surprises down the track, while
>>diminishing complexity already in the kernel.
>
>What makes you think so? Has albert written an implementation of his
>denotations for a different system than his own? Bernd Paysan
>implemented recognizers for Gforth *and* VFX. Matthias Trute built
>recognizers for amForth. Stephen Pelc implemented (AFAIK an older
>recognizer proposal) recognizers for VFX.

That is an interesting remark. The PREFIX mechanism cannot be built on
top of a system; it is a modification right in the heart of
a system. What mhx says is that the usage is straightforward,
which I won't dispute, but it is a totally different point.

It is probably impossible to write a module to load onto Gforth.
It is about whether Gforth kills the man with the moustache.
Maybe Gforth already did that, if your "preludes" are similar
to prefixes.
I guess I could modify 5 lines in Gforth to accommodate prefixes,
if they're not already there. This would take me ages though.
mhx could change his 5 lines in iForth in a heartbeat.
The PREFIX facility does no harm; if you don't use it,
it is totally invisible. So I wouldn't be surprised if
mhx is going to implement it.

I find it strange to talk about implementing recognizers.
Once Gforth or iForth supports the PREFIX modifier, all the
rest is just portable Forth code, e.g. the aforementioned
Pascal syntax checker.

>
>- anton

Cecil - k5nwa

Jun 3, 2020, 12:52:39 PM

But the beauty of Forth is that we can change the name ourselves: if you
don't like it, create a SYNONYM for it. Put it at the beginning of a
program so it will work on all versions.

It would be better to choose a "better" name before release, but we are
all so imperfect and different; what you find pleasing I might not, so
correct it to your liking.

--
Cecil - k5nwa

pahihu

Jun 3, 2020, 12:54:44 PM

Hi,

In a personal email exchange with Matthias Trute in 2017 I sent him a link to a page
that discusses the parse step and prefix support in FiCL. FiCL has had both since version 2.05
(around 2001).

Here it is again:

http://ficl.sourceforge.net/parsesteps.html

Good luck!
pahihu

dxforth

Jun 3, 2020, 7:10:14 PM

On Thursday, June 4, 2020 at 12:42:03 AM UTC+10, none albert wrote:
> ...

> C is designed by an american, cludgy and irregular.

So was Forth. Have you noticed how it's the Americans who have the
strangest ideas, religions etc and suffer the worst from them? Must be
something in the water :)

Ruvim

Jun 3, 2020, 9:27:58 PM

On 2020-06-01 19:40, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:
>> On 2020-05-29 19:36, Anton Ertl wrote:
>>> Ruvim <ruvim...@gmail.com> writes:
[...]

>>>> Perhaps, even more conceptually cleaner to have recognizers for numbers
>>>> in the simplest form (without prefixes for radix, dot for double, etc).
>>>
>>> That's also possible. The question is if this results in a good
>>> factoring.
>>
>> It seems for me that such factoring is reasonable. Not to implement the
>> default recognizer, but to provide a useful library of recognizers.
>> I implemented this approach, see at https://git.io/JfKcd
>
> Looks nice (apart from using idiosyncratic naming conventions).

Yes, the names there are too mnemonic.

> So this would be cool for a system that was implemented with recognizers
> from the get-go. A system with historical baggage like NUMBER? (that
> has to be kept because applications may call it) will still prefer to
> go for REC-NUM.
>
> But of course, once we have standardized recognizers, people can
> provide and use such libraries.

[...]

>> Yes, we can also have a special defining word that makes things shorter.
>> Something like
>>
>> ' rec-dnum ' rec-snum 2 create-recognizer rec-num
>
> Looks good.

An important thing is that we still can use a colon definition or even a
quotation as [: {: d:s :} s rec-snum ?et s rec-dnum ;]

Ow. I have just remembered that currently rec-* doesn't return 0 on
fail. So, instead of the general "?et" word we have to use a special
variant:
: ?rec ]] dup unrecognized <> if exit then drop [[ ; immediate
\ exit if a recognizer succeeds

> IMO, if we have that, I would not use
> NEW-RECOGNIZER-SEQUENCE, or SET-RECOGNIZER. I would still use
> something like GET-RECOGNIZER to read the constituent recognizers
> (e.g., to construct a new recognizer sequence with some additional
> recognizer).

As you show below, GET-RECOGNIZER is not used to construct a new
recognizer from several others.

> A testing word RECOGNIZER-SEQUENCE? ( xt -- f ) is
> probably also useful (or GET-RECOGNIZER could return 0 if the xt does
> not correspond to a recognizer sequence).

An author's choice (for both a system and a program) may be to use only
colon definitions to define recognizers. Then GET-RECOGNIZERS (as well as
a testing word) will be utterly useless.

OTOH, an author is free to design their own tool or use a library to get
such capabilities as testing and unrolling (of course, a certain API
must then be used to create recognizers with these capabilities).

But a very simple implementation of create-recognizer (without
additional features) may look like the following:

: create-recognizer ( i*xt i "name" -- )
: "{: d:s :}" evaluate
0 ?do "s" evaluate compile, postpone ?rec loop unrecognized lit,
postpone ;
;

>>> Given that one needs to construct a new recognizer sequence
>>> if you want to use a new recognizer,
>>> it is useful to have support for defining recognizer
>>> sequences.
>>
>> I don't catch your argument, could you please elaborate an example?
>
> I define a new recognizer REC-FOO. How to use it? A simple approach
> is to append it at the end of the recognizers:
>
> ' REC-FOO FORTH-RECOGNIZER 2 create-recognizer REC-FORTH+FOO
> ' rec-forth+foo is forth-recognizer

It seems you mean either "action-of forth-recognizer"
or "to forth-recognizer"

Somebody else might define it just as a colon:

: rec-forth+foo {: d:s :}
s [ action-of forth-recognizer compile, ] ?rec s rec-foo ;

Regarding the use of a DEFER (or VALUE) to set the current recognizer: as
was already mentioned, it is an unsuitable choice. A Forth system might
want to perform additional internal actions when the current
recognizer changes. And that is difficult if the current recognizer is
changed via IS or TO.

Regarding appending another recognizer to the current recognizer.
I think (according to my practice) it is more important to have
standard methods to append a recognizer to the current one, than a
method like CREATE-RECOGNIZER.

If you can get and set the current recognizer, all these methods can be
implemented anyway. But appending to the current recognizer is a far more
frequently used operation than creating a composition from several
recognizers.

I can suggest something like:
enqueue-recognizer ( xt -- ) \ append to the end
preempt-recognizer ( xt -- ) \ append to the head
undo-recognizer ( -- ) \ discard the last appending
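
The intended behaviour of these three words can be modelled roughly as
follows. This is a Python sketch of the semantics only, not a Forth
implementation and not part of any proposal text. It assumes that the
current recognizer is a list of functions which return None on failure;
the class CurrentRecognizer and the two sample recognizers are
hypothetical.

```python
class CurrentRecognizer:
    """Models enqueue-recognizer, preempt-recognizer and undo-recognizer."""

    def __init__(self, initial):
        self._chain = list(initial)
        self._history = []                 # snapshots for undo

    def enqueue(self, rec):                # append to the end
        self._history.append(list(self._chain))
        self._chain.append(rec)

    def preempt(self, rec):                # insert at the head
        self._history.append(list(self._chain))
        self._chain.insert(0, rec)

    def undo(self):                        # discard the last addition
        if self._history:
            self._chain = self._history.pop()

    def recognize(self, lexeme):
        for rec in self._chain:
            result = rec(lexeme)
            if result is not None:         # 0 is a valid result here
                return result
        return None


def rec_int(lexeme):
    """Decimal integer, or None."""
    try:
        return int(lexeme, 10)
    except ValueError:
        return None


def rec_dollar_hex(lexeme):
    """Hex number written as $FF, or None."""
    if not lexeme.startswith("$"):
        return None
    try:
        return int(lexeme[1:], 16)
    except ValueError:
        return None
```

Since every change goes through a method, the system has a natural place
for the additional internal actions that a plain IS or TO would bypass.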

--
Ruvim

Anton Ertl

Jun 4, 2020, 4:14:44 AM

dxforth <dxf...@gmail.com> writes:
>BTW is there a use for '0 SET-ORDER' ?

Exactly what is shown here: Using EVALUATE to convert numbers. There
have been several postings here where people wanted to do such things
with data coming from files. One problem I see is that EVALUATE could
convert the string into a single-cell or double-cell integer, or an FP
value, and the code has to deal with that. But for integers the
highest-level thing that has been standardized is >NUMBER, which is
relatively low-level and may require more code than starting from
EVALUATE (depending on exact requirements).

Anton Ertl

Jun 4, 2020, 4:17:54 AM

dxforth <dxf...@gmail.com> writes:
>On Tuesday, June 2, 2020 at 10:51:03 PM UTC+10, NN wrote:

>> 0 set-order ==> Sets the context vector to empty.
>> You cant search for words and thus you cant compile new ones.

>> A way to stop someone adding new definitions I would guess.
>
>Or anything useful?

In that vein:

app-wordlist 1 set-order

seals the system to only provide access to APP-WORDLIST and numbers.
On albert's system numbers would not work.

Ruvim

Jun 4, 2020, 4:34:19 AM

On 2020-06-01 20:20, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:

>> But, IMO, the tools to manage compound recognizers just are
>> out of the scope of the Recognizers specification.
>
> Given that every Forth system that implements recognizer will have to
> deal with recognizer sequences (at least containing REC-WORD and
> REC-NUM), I think they are in the scope.

The context: before that it was about a dynamically resizable stack that
contains a sequence of recognizers. This stack is equivalent to a
compound recognizer.

When we append something to this stack, we change the corresponding
recognizer. It is the same recognizer (in the sense of xt), but its
behavior is changed. It now returns a different result for the same
lexemes in some cases. Yes, this is due to a change in some internal state.

So, by managing compound recognizers I meant changing their
behavior (and their internal state) in this way. Particular implementations
may differ; there could be lists or even allotted quotations instead of a
dynamic stack. But the functionality that I mean is the same: changing
the behavior of a recognizer.

General tools or an API for exactly this functionality are out of the
proposal's scope, I believe.

Actually, we have at least two such recognizers: REC-WORD (which
depends on the search order) and REC-NUM (which depends on BASE). Each of
them has its own unique mechanism for changing its state.

OTOH, the chain of responsibility [1] is a common pattern that can
exist as a library (even in many libraries with different
requirements). And such a library can be used to create and manage a
compound recognizer.

I used the following API in one such library.

advice-before ( xt -- )
\ append xt at the head of the current chain

advice-after ( xt -- )
\ append xt at the tail of the current chain

perform-chain ( i*x -- j*x flag )
\ execute every item in the current chain starting from the head
\ and up to one that returns a nonzero

perform-chain-next ( i*x -- j*x flag )
\ execute every item starting from the item after the current one
\ and up to one that returns a nonzero

\ xt ( i*x -- j*x 0|x )

Each next item in the chain is executed only if the previous one returns
false (zero).

This example shows one more argument why UNRECOGNIZED should be 0. We
can't use such a common library if recognizers return a special nonzero
value on failure. But if they return 0 on failure, they can be used in
various libraries, since distinguishing zero from nonzero values is a very
common and convenient approach that is used everywhere.
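
The point can be sketched with a generic chain runner. The following is
Python rather than Forth and is not the library mentioned above; it
assumes that every item returns a falsy value (None or 0) on failure and
a truthy value on success. The names perform_chain, rec_word and rec_int
are hypothetical.

```python
def perform_chain(chain, lexeme):
    """Run each item until one returns a truthy result.

    The runner knows nothing about recognizers. It relies only on the
    zero/nonzero convention, which is why a special nonzero failure
    value would not work with it.
    """
    for item in chain:
        result = item(lexeme)
        if result:
            return result
    return None


def rec_word(lexeme, words=("dup", "drop", "swap")):
    """Stand-in for a dictionary search: a tagged result or None."""
    return ("word", lexeme) if lexeme in words else None


def rec_int(lexeme):
    """Decimal integer as a tagged result (always truthy), or None."""
    try:
        return ("int", int(lexeme, 10))
    except ValueError:
        return None
```

The tagged tuples keep a recognized 0 distinguishable from failure,
which is roughly the role a nonzero token plays on the Forth stack.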

[1] https://en.wikipedia.org/wiki/Chain-of-responsibility_pattern

--
Ruvim

none albert

Jun 4, 2020, 8:20:49 AM

In article <69996e69-88df-4d64...@googlegroups.com>,

dxforth <dxf...@gmail.com> wrote:
>On Thursday, June 4, 2020 at 12:42:03 AM UTC+10, none albert wrote:
>> ...
>> C is designed by an american, cludgy and irregular.
>
>So was Forth.

No argument there. I conclude however that Forth is not the
ideal breeding ground for clean information technology concepts.
[Once a concept is developed Forth may be the easiest way to
experiment with it.]

>Have you noticed how it's the Americans who have the
>strangest ideas, religions etc and suffer the worst from them? Must be
>something in the water :)

You mean the "Kerk van Jezus Christus van de Heiligen der Laatste Dagen"
as we call them here? Their nickname is "moron's" IIRMC.

dxforth

Jun 4, 2020, 7:28:46 PM

On Thursday, June 4, 2020 at 10:20:49 PM UTC+10, none albert wrote:
> In article <69996e69-88df-4d64...@googlegroups.com>,
> dxforth <dxf...@gmail.com> wrote:
> >On Thursday, June 4, 2020 at 12:42:03 AM UTC+10, none albert wrote:
> >> ...
> >> C is designed by an american, cludgy and irregular.
> >
> >So was Forth.
>
> No argue with that. I conclude however that Forth is not the
> ideal breeding ground for clean information technology concepts.
> [Once a concept is developed Forth may be the easiest way to
> experiment with it.]
>
> >Have you noticed how it's the Americans who have the
> >strangest ideas, religions etc and suffer the worst from them? Must be
> >something in the water :)
>
> You mean the "Kerk van Jezus Christus van de Heiligen der Laatste Dagen"
> as we call them here? Their nickname is "moron's" IIRMC.

Not specifically. But when presidents are required to put on a show of
religiosity it shows the public preoccupation. If there be a separation
between church and state, it's tenuous at best.

Alex McDonald

Jun 15, 2020, 5:29:53 PM

I am much improved, and much wiser about the electrical workings of the
human heart.

I'm updating version D of the proposal, since it was the last to be
submitted. I'm including this section:

---
Matthias Trute
It was with great sadness that we learned that on 25th March 2020,
Matthias Trute, the original author of this proposal, passed away. His
work on this and other proposals, his own Forth system AmForth, and his
support of the Forth community, is much appreciated. It would seem to be
remiss and careless of us not to carry on with the development of this
proposal, something many consider to be one of the most significant yet
elegantly simple additions to Forth. I am sure Matthias would have
approved. May he rest in peace.
---

We may have to approach Matthias' wife to get the copyright sorted,
unless the document was part of a work item of the committee. What is
the consensus on this? If the copyright is unclear, here is what I propose:

---
The license for the original version of this document is unclear, but
copyright specifically resided with Matthias Trute and possibly with
those mentioned in the Acknowledgements section of this document. No
permission from the authors has been sought. The current author (Alex
McDonald) wishes to license his contribution under the [Creative Commons
Attribution 4.0 International Public License] and will assume all future
contributions are under the same terms.

https://creativecommons.org/licenses/by/4.0/legalcode

This license is summarised here:
https://creativecommons.org/licenses/by/4.0/
---

The acknowledged contributions include

• Bernd Paysan
• Jenny Brien
• Andrew Haley
• Alex McDonald
• Anton Ertl
• Forth 200x Committee

That would require that each contributor acknowledge that

(a) the license is acceptable to you & the Forth committee
(b) anything you may have contributed is under that license
(c) you or anyone else will make future contributions under these terms.

The code appendices should be covered in the same way as code in the
Forth standard and the tester. (The CC BY 4.0 licence isn't suitable for
code.) I'm not sure what that licence is, or even if one is specified.

Thanks.

--
Alex

Ruvim

unread,

Jun 18, 2020, 5:27:35 AM6/18/20

to

On 2020-06-16 00:29, Alex McDonald wrote:
[...]

> The current author (Alex McDonald)
> wishes to license his contribution under the [Creative Commons
> Attribution 4.0 International Public License] and will assume all future
> contributions are under the same terms.
>
> https://creativecommons.org/licenses/by/4.0/legalcode
>
> This license is summarised here:
> https://creativecommons.org/licenses/by/4.0/

Why do you prefer CC BY to CC BY-SA ?

CC BY-SA -- Attribution-ShareAlike
-- https://creativecommons.org/licenses/by-sa/4.0/

The difference is that CC BY-SA has "share alike" point:

| ShareAlike — If you remix, transform, or build upon the material,
| you must distribute your contributions under the same license
| as the original.

--
Ruvim

Alex McDonald

unread,

Jun 18, 2020, 7:16:54 AM6/18/20

to

Thanks for responding. You make a good point. I believe that this could
be resolved by having the work & modifications as a work item of the
Forth technical committee and the copyright assigned to it, but I can't
find any references as to whether that is the case, or under what
copyright terms contributions are made.

Lawyer stuff is incredibly boring, but the real world consequences of it
can be serious, so I'd rather start on the right foot.

https://www.w3.org/Legal/copyright-myths.txt

3) If it's posted to Usenet it's in the public domain.

False. Nothing is in the public domain anymore unless the
owner explicitly puts it in the public domain(*). Explicitly,
as in you have a note from the author/owner saying, "I grant
this to the public domain." Those exact words or words very
much like them.

Some argue that posting to Usenet implicitly grants
permission to everybody to copy the posting within fairly
wide bounds, and others feel that Usenet is an automatic store and
forward network where all the thousands of copies made are
done at the command (rather than the consent) of the
poster. This is a matter of some debate, but even if the
former is true (and in this writer's opinion we should all pray
it isn't true) it simply would suggest posters are implicitly
granting permissions "for the sort of copying one might expect
when one posts to Usenet" and in no case is this a placement
of material into the public domain. Furthermore it is very
difficult for an implicit licence to supersede an explicitly
stated licence that the copier was aware of.

Note that all this assumes the poster had the right to post
the item in the first place. If the poster didn't, then all
the copies are pirate, and no implied licence or theoretical
reduction of the copyright can take place.

(*) It's also usually in the public domain if the creator has
been dead for 50 years. If anybody dead for 50 years is posting
to the net, let me know. There are some other fine points
to this issue -- check more detailed documents for info.

It would also be helpful to have a free patent clause, as they too can
cause pain down the line. An extract from the Apache 2.0 licence

6. Grant of Patent License. Subject to the terms and conditions of this
License, each Contributor hereby grants to You a perpetual, worldwide,
non-exclusive, no-charge, royalty-free, irrevocable (except as stated in
this section) patent license to make, have made, use, offer to sell,
sell, import, and otherwise transfer the Work, where such license
applies only to those patent claims licensable by such Contributor that
are necessarily infringed by their Contribution(s) alone or by
combination of their Contribution(s) with the Work to which such
Contribution(s) was submitted….

--
Alex

NN

unread,

Jun 18, 2020, 9:51:11 PM6/18/20

to

Rules and regulations change, and the email in the link is dated 1995.
Given that 25 years have passed, I am curious to know:

Is the copyright notice (above) interpreted the same way in all countries?

If there was an issue (for whatever reason), would the claimant
have to fight it through the US courts? Or can he/she expect
the same rights in, say, the EU courts?

Alex McDonald

unread,

Jun 19, 2020, 6:47:44 AM6/19/20

to

Please stop cutting out the message you're replying to. I presume you
mean https://www.w3.org/Legal/copyright-myths.txt is 25 years old. That
makes no difference; it still applies.

Copyright as per the Berne Convention applies to a large number of
countries
https://www.wipo.int/treaties/en/ShowResults.jsp?lang=en&search_what=B&bo_id=7

It is rare for copyright cases to make it to the courts. Look at the
number of GPL cases; a handful, and the last two iirc in the German courts.

Most people abide by the rules without being threatened by legal action.

--
Alex

dxforth

unread,

Jun 19, 2020, 7:50:40 AM6/19/20

to

Perhaps it has to do with the consequences of threatening legal
action over the use of 'free software' and falling into disfavour.
Irrational behaviour isn't restricted to 'copyright criminals'.

Ruvim

unread,

Jun 19, 2020, 10:25:07 AM6/19/20

to

On 2020-06-18 14:16, Alex McDonald wrote:
> On 18-Jun-20 10:27, Ruvim wrote:
>> On 2020-06-16 00:29, Alex McDonald wrote: [...]
>>
>>> The current author (Alex McDonald) wishes to license his
>>> contribution under the [Creative Commons Attribution 4.0
>>> International Public License] and will assume all future
>>> contributions are under the same terms.
>>>
>>> https://creativecommons.org/licenses/by/4.0/legalcode
>>>
>>> This license is summarised here:
>>> https://creativecommons.org/licenses/by/4.0/
>>
>> Why do you prefer CC BY to CC BY-SA ?
>>
>> CC BY-SA -- Attribution-ShareAlike --
>> https://creativecommons.org/licenses/by-sa/4.0/
>>
>>
>> The difference is that CC BY-SA has "share alike" point:
>>
>> | ShareAlike — If you remix, transform, or build
>> | upon the material, you must distribute your
>> | contributions under the same license
>> | as the original.

> Thanks for responding. You make a good point. I believe that this could
> be resolved by having the work & modifications as a work item of the
> Forth technical committee and the copyright assigned to it, but I can't
> find any references as to whether that is the case, or under what
> copyright terms contributions are made.

I see. "CC BY" permits the committee to use your work and publish the
result under another more restrictive license.

"CC BY-SA" doesn't permit an essentially more restrictive license for a
derivative work (for example, see W3C below).

But I just had in mind the popular and simple case where inbound =
outbound, i.e. the same license "implicitly serves as both the inbound
(from contributors) and outbound (to other contributors and users)
license" [1]

And in this case, "CC BY-SA" seems more appropriate than "CC BY". Since
"CC BY" does not guarantee that an author (e.g. a committee) has
indisputable permission to incorporate changes from a derivative work:
「 Do you want to be even more open and use CC-by, without the "Share
Alike" requirement? Possibly not, if you want to guarantee that you are
able to incorporate modifications and additions by other people, when
you want to; 」 [2]

For comparison, W3C uses a more restrictive license for their
specifications:
「 the publication of derivative works of this document for use as a
technical specification is expressly prohibited 」 [3]

For that reason WHATWG cannot use "CC BY-SA", but uses "CC BY" for their
publications [4]; this allows W3C to take a work of WHATWG and issue a
derivative work under their more restrictive license. OTOH, for inbound
contributions, WHATWG uses a different license that is even more open
than "CC BY" [5].

[...]

> It would also be helpful to have a free patent clause, as they too can
> cause pain down the line. An extract from the Apache 2.0 licence

Yes, the issue regarding patents should be carefully treated, "CC" does
not resolve this issue [7]. WHATWG Intellectual Property Rights Policy
treats this issue separately [6].

[1] Opensource guide http://tiny.cc/7zp0qz
[2] CC BY-SA vs GFDL http://tiny.cc/j9p0qz
[3] https://www.w3.org/Consortium/Legal/2015/doc-license
[4] https://whatwg.org/ipr-policy#7-publications
[5] https://whatwg.org/ipr-policy#42-inbound-copyright-license-grant
[6] https://whatwg.org/ipr-policy#5-patents
[7] What good is a CC licensed specification? http://tiny.cc/h7s0qz

--
Ruvim

none albert

unread,

Jun 19, 2020, 10:55:01 AM6/19/20

to

In article <7fe3df74-eb65-49f3...@googlegroups.com>,

It is very rational to not try a legal action that is guaranteed to fail.
Richard Stallman had his legal buddies rallied to make the GPL as
unassailable as any small print in the industry.

Alex McDonald

unread,

Jun 19, 2020, 11:16:06 AM6/19/20

to

More to the point, I'm still to see any response from anyone on the
Forth200x committee. Perhaps I'm posting in the wrong place.

I will try the Forth200x mailing list on Yahoo, but there hasn't been
any activity on that since 18-sep-2019.

There's also the forth-standard.org website which only supports linear
conversations. Although they can be categorised as proposals, it is
pretty useless for carrying on multi way discussions, extracting said
proposals into a whole, and lacks even basic version control. I have
zero interest in dealing with it.

--
Alex

JennyB

unread,

Jun 19, 2020, 1:37:44 PM6/19/20

to

On Monday, 15 June 2020 22:29:53 UTC+1, Alex McDonald wrote:

> I am much improved, and much wiser about the electrical workings of the
> human heart.

Glad to hear it!

My own contribution was minimal, and I have no objection to a, b, or c above.

Jenny Brien

Alex McDonald

unread,

Jun 21, 2020, 6:21:10 PM6/21/20

to

Thanks Jenny.

--
Alex

Ruvim

unread,

Jun 23, 2020, 7:08:58 AM6/23/20

to

On 2020-05-27, Alex McDonald wrote:

> http://amforth.sourceforge.net/pr/Recognizer-rfc-D.html

> I'd like to further refine it and create another set of RFCs
> based on my and others' experiences of using recognizers.
>
> This may not match some of the current implementations in the wild; for
> example, decisions made by gforth to do with an "automatic" action for
> postpone actions based on the compile action (that is, having two
> actions rather than three).
>
> It will also attempt, in no specific order: to reduce the number of
> words required to implement the proposal; do some significant bike
> shedding around names (for example, removing the ambiguity of RECTYPE
> and avoiding words like or containing NULL); remove the requirement for
> a fixed name REC-NUM REC-FLOAT; be less prescriptive and more
> descriptive to allow greater implementation flexibility; and so on.
>
> Polite comments welcome.
>

In your view, what is the minimal essential part of the Recognizer API?

It is the part that can serve as a basis for defining the other parts
portably, and that should be standardized because it cannot itself be
implemented portably.

I mean that other parts may still be standardized even though they can be
implemented portably, e.g. because of high demand and reuse.

--
Ruvim

Alex McDonald

unread,

Jun 23, 2020, 11:43:50 AM6/23/20

to

On 23-Jun-20 12:08, Ruvim wrote:
> On 2020-05-27, Alex McDonald wrote:
>
>> http://amforth.sourceforge.net/pr/Recognizer-rfc-D.html
>
>
>> I'd like to further refine it and create another set of RFCs
>> based on my and others' experiences of using recognizers.
>>
>> This may not match some of the current implementations in the wild;
>> for example, decisions made by gforth to do with an "automatic" action
>> for postpone actions based on the compile action (that is, having two
>> actions rather than three).
>>
>> It will also attempt, in no specific order: to reduce the number of
>> words required to implement the proposal; do some significant bike
>> shedding around names (for example, removing the ambiguity of RECTYPE
>> and avoiding words like or containing NULL); remove the requirement
>> for a fixed name REC-NUM REC-FLOAT; be less prescriptive and more
>> descriptive to allow greater implementation flexibility; and so on.
>>
>> Polite comments welcome.
>>
>
>
> In your view, what is the minimal essential part of the Recognizer API?

The following 3 parts;

1. Action-set
A 3 cell sized contiguous area, where each cell is an execution token
for an action. The first is executed during interpretation; the second
during compilation; and the third while POSTPONEing. No word is provided
for the creation or decomposition of the action-set.

2. Recognizer
A string parsing word that returns converted values and an action-set if
successful. A recognizer that requires the string to be part of SOURCE
(for instance, if it refers to or modifies >IN) must document this
requirement, otherwise the string can come from anywhere.

3. Recognizer Sequence
An ordered set of recognizers. A recognizer sequence is identified with
a cell sized opaque number.

(1) An action-set requires no supporting words, since

:noname plit postpone plit ;
' plit
' noop
create ACT-NUM , , ,

is sufficient to describe an action-set. There is a predefined
action-set ACT-FAIL

:noname -13 throw ; dup dup
create ACT-FAIL , , ,

This I originally called UNRECOGNIZED.

(I can't decide whether named action-sets for such things as ACT-NUM
ACT-DNUM or ACT-STRING are really required or should be supplied by the
recognizer writer. Will add-on libraries use any of a default set?
Debatable.)
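
A minimal sketch of the three-cell layout described in (1), for illustration only. PLIT is not defined anywhere in this thread, so the sketch assumes it behaves like POSTPONE LITERAL; the helper names act-noop, lit-compile, lit-postpone, act>int, act>comp, act>post and perform are hypothetical and not part of the proposal.

```forth
\ Sketch only: every helper name here is an assumption, not proposal text.
: act-noop     ( -- ) ;                       \ interpret: n is already on the stack
: lit-compile  ( n -- ) postpone literal ;    \ compile: append n as a literal
: lit-postpone ( n -- )                       \ postpone: compile n, then compile
  lit-compile postpone lit-compile ;          \ a call that will re-compile it later

' lit-postpone  ' lit-compile  ' act-noop
create act-num  , , ,      \ memory order: interpret, compile, postpone

:noname ( -- ) -13 throw ; dup dup
create act-fail , , ,      \ all three actions throw -13 (undefined word)

\ Selectors for the three cells of an action-set.
: act>int  ( act -- xt ) @ ;
: act>comp ( act -- xt ) cell+ @ ;
: act>post ( act -- xt ) 2 cells + @ ;

\ Run the action that matches the current STATE.
: perform ( i*x act -- j*x )
  state @ if act>comp else act>int then execute ;
```

With these definitions, `42 act-num perform` typed at the interpreter simply leaves 42 on the stack, while the same phrase executed by an immediate word during compilation appends 42 as a literal to the current definition.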

(2) A recognizer is a template REC-SOMETHING of the form

REC-SOMETHING ( addr len -- i*x ACT-SOMETYPE )

where ACT-SOMETYPE is one of a number of user supplied action-sets or
the system supplied ACT-FAIL.

(3) A recognizer sequence has

GET-RECSEQ ( rec-seq -- rec-n .. rec-1 n )
SET-RECSEQ ( rec-n .. rec-1 n rec-seq -- )
NEW-RECSEQ ( n -- rec-seq )

To execute through a sequence;

RECOGNIZE ( addr len rec-seq -- i*x ACT-SOMETYPE )
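
A rough sketch of how the sequence words above might be written in standard Forth (it uses Forth-2012 {: :} locals). The storage layout (a count cell followed by the recognizer xts), the choice to try rec-1 first, and the two toy recognizers rec-unum and rec-char are assumptions made for illustration. The action-sets are stand-ins, since RECOGNIZE only compares their addresses.

```forth
\ Sketch only: layout and search order are assumptions, not proposal text.
create act-num  3 cells allot    \ stand-ins: only their addresses matter here
create act-fail 3 cells allot

: new-recseq ( n -- rec-seq )    \ room for n recognizers, initially empty
  align here swap 1+ cells allot  0 over ! ;

: set-recseq ( rec-n .. rec-1 n rec-seq -- )   \ no capacity check
  2dup !  cell+ swap
  0 ?do  tuck !  cell+  loop  drop ;

: get-recseq ( rec-seq -- rec-n .. rec-1 n )
  dup @ >r  r@ cells +                 \ address of the slot holding rec-n
  r@ 0 ?do  dup @ swap 1 cells -  loop
  drop r> ;

: recognize {: c-addr u rec-seq -- i*x act :}
  rec-seq @ 0 ?do
    c-addr u  rec-seq i 1+ cells + @  execute
    dup act-fail <> if unloop exit then     \ first success wins
    drop
  loop
  act-fail ;

\ Toy recognizer: unsigned single-cell number in the current BASE.
: rec-unum ( c-addr u -- n act-num | act-fail )
  dup 0= if 2drop act-fail exit then
  0 0 2swap >number nip
  if 2drop act-fail else drop act-num then ;

\ Toy recognizer: a character literal of the form 'x'.
: rec-char ( c-addr u -- c act-num | act-fail )
  3 <> if drop act-fail exit then
  dup c@ [char] ' <> if drop act-fail exit then
  dup 2 chars + c@ [char] ' <> if drop act-fail exit then
  char+ c@ act-num ;

2 new-recseq constant demo-seq
' rec-char ' rec-unum 2 demo-seq set-recseq   \ rec-unum is rec-1, tried first
```

Given this setup, `s" 123" demo-seq recognize` is handled by rec-unum, `s" 'A'" demo-seq recognize` falls through to rec-char, and anything else comes back as act-fail alone.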

A system, if it provides them, must provide system recognizers named

REC-NUM ( addr len -- n ACT-NUM | d ACT-DNUM | ACT-FAIL )
REC-NAME ( addr len -- nt ACT-NAME | ACT-FAIL )

and action-sets for ACT-NUM ACT-DNUM and ACT-NAME that match the stack
signatures above.

The float wordset can provide, with a corresponding action-set,

REC-FLOAT ( addr len -- ACT-FLOAT | ACT-FAIL ) (F: -- f | )

The string wordset can provide, with a corresponding action-set,

REC-STRING ( addr len -- ACT-STRING | ACT-FAIL )

Other wordsets might reserve certain names; for example the locals
wordset might provide for the parse of locals in {: type: name :}

REC-LOCAL ( addr len -- addr len type ACT-LOCAL | ACT-FAIL )

There's no REC-FIND; why encourage deprecated behaviour?

>
> It is the part that can serve as a basis for defining the other parts
> portably, and that should be standardized because it cannot itself be
> implemented portably.
>
> I mean that other parts may still be standardized even though they can be
> implemented portably, e.g. because of high demand and reuse.
>
>
>
> --
> Ruvim

--
Alex

Ruvim

unread,

Jun 23, 2020, 1:07:46 PM6/23/20

to

Thank you for this.

As far as I can see, the recognizer sequence and the corresponding
RECOGNIZE word can be implemented in a portable way. So, they are not in
the essential part.

Also, all mentioned ACT-* words except ACT-FAIL and ACT-LOCAL can be
implemented in a portable way in accordance to (1). So, they are not in
the essential part.

Also, all mentioned REC-* words except REC-LOCAL can be implemented in a
portable way. So, they are not in the essential part.

What is missing is the method that affects the Forth text interpreter.
It is "TO FORTH-RECOGNIZER" in Recognizer API v4, "SET-RECOGNIZERS" in
Recognizer API rephrase 2020, "SET-PERCEPTOR" in my proposal.

I also see rather fuzzy and inconsistent terminology. I would like us to
use the same, more accurate terminology.

Could you write me back an email?

--
Ruvim

A. K.

unread,

Jun 23, 2020, 4:25:49 PM6/23/20

to

Thinking about macro expansion, i.e. modifying source.
Where does that fit in?

Alex McDonald

unread,

Jun 23, 2020, 5:57:41 PM6/23/20

to

I don't understand what you mean by a Forth macro. Anyhow, the proposal
doesn't address modification of the source. There is no guarantee made
that the string passed to a recognizer is modifiable. SOURCE certainly
isn't.

--
Alex

a...@littlepinkcloud.invalid

unread,

Jun 24, 2020, 5:28:30 AM6/24/20

to

Alex McDonald <al...@rivadpm.com> wrote:
> A recognizer that requires the string to be part of SOURCE
> (for instance, if it refers to or modifies >IN) must document this
> requirement, otherwise the string can come from anywhere.

I take it, then, that there's been no progress on making recognizers
work with e.g. LOCATE without side effects? I haven't been keeping
up.

Andrew.

Ruvim

unread,

Jun 24, 2020, 5:29:18 AM6/24/20

to

E.g.

: [macro1] "long expansion" evaluate ; immediate

But it doesn't modify the source.

A hygienic compiling macro

: [macro2] ['] long compile, ['] expansion compile, ; immediate

A hygienic translating macro

: [macro3] ['] long tt-xt ['] expansion tt-xt ; immediate

Ruvim

unread,

Jun 24, 2020, 5:42:49 AM6/24/20

to

Actually, any recognizer with side effects can be converted into a
recognizer without side effects.

So permitting side effects isn't necessary.

I showed an example of recognizer for string literals:
https://git.io/JfpKX

--
Ruvim

a...@littlepinkcloud.invalid

unread,

Jun 24, 2020, 6:01:16 AM6/24/20

to

How does it work? There's no way AFAICS to tell a recognizer that you
don't want any side effects, you just want to know if that recognizer
would, given the chance, recognize a word. So how do you call it?

"NB: POSTPONE is not applicable to a string literal containing
blanks." is a bit worrying.

Andrew.

m...@iae.nl

unread,

Jun 24, 2020, 6:51:25 AM6/24/20

to

Does that mean 'POSTPONE S" abort" 1 0 ! ABORT" huh"' works,
but 'POSTPONE S" "' does not? Good to know.

-marcel

Alex McDonald

unread,

Jun 24, 2020, 6:56:40 AM6/24/20

to

I haven't made any such progress, no, since I've only picked this up in
the last few weeks. But I will add it to the pile marked "Please
Consider This".

--
Alex

Alex McDonald

unread,

Jun 24, 2020, 7:13:45 AM6/24/20

to

Having a string that terminates on a blank and does not support \
escaped characters is not really what I would expect from a quality
quoted string implementation.

POSTPONE " Give me a break"
POSTPONE "abort\" 1 0 ! abort\" huh"

These should all work, and they do in my system. The restriction lies
with this specific implementation, not the specification.

IMO as to POSTPONE S" PINGPONG" it should not postpone the entire
string; it should POSTPONE S" and fail on PINGPONG" not found.

--
Alex

Ruvim

unread,

Jun 24, 2020, 7:51:59 AM6/24/20

to

1. Side effects of recognizers are either permitted or prohibited by the
specification.

If they are prohibited, and a system or program provides a recognizer
with side effects, that system or program is simply non-standard. And we
don't have to consider non-standard programs at all.

So it is the specification that says whether a recognizer may have side
effects or not.

On the other hand, there is no need to permit side effects for
recognizers.

2. "recognizable?" word was suggested in the Recognizer API v4 comments
https://vee.gg/gVXmN

In the API that I currently suggest, some "recognizable-by" word can be
implemented as:

: recognizable-by ( c-addr u xt-recognizer -- flag )
[: execute dup 0<> throw ;] catch if drop 2drop true then
;

This assumes that recognizers are prohibited from having any side
effects, and that their results are located on the data and
floating-point stacks, so THROW clears these results.
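
For readers without quotations ( [: ... ;] is not part of Forth-2012), the same idea can be sketched with a named helper. This assumes the convention implied by the definition above, in which a recognizer returns ( i*x token ) on success and 0 on failure. The toy recognizer rec-digit, its stand-in token 1, and the helper name try-recognizer are hypothetical.

```forth
\ Sketch only. Assumed convention: recognizer ( c-addr u -- i*x token | 0 )
: rec-digit ( c-addr u -- n 1 | 0 )      \ toy recognizer; token 1 is a stand-in
  1 <> if drop 0 exit then
  c@ [char] 0 -  dup 10 u< if 1 else drop 0 then ;

\ Returns 0 when nothing was recognized; otherwise throws -1, so that
\ CATCH discards whatever the recognizer left on the data stack.
: try-recognizer ( c-addr u xt -- 0 )
  execute dup 0<> throw ;

: recognizable-by ( c-addr u xt -- flag )
  ['] try-recognizer catch       ( 0 0 | x x x -1 )
  if drop 2drop true then ;      \ failure path already leaves 0 as the flag
```

On the failure path CATCH returns 0 on top of the recognizer's own 0, so IF consumes one and the other serves as the false flag. On the success path CATCH restores the stack depth to three (now meaningless) cells, which are dropped before TRUE is pushed.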

> "NB: POSTPONE is not applicable to a string literal containing
> blanks." is a bit worrying.

Standard POSTPONE is applicable to a Forth word only.

Recognizers *can* extend it to any space-delimited lexeme, and even to
multiple such lexemes, or to a lexeme delimited by anything else.

But I think we should stop on a space-delimited lexeme and not further.

For example, we have a construct
foo{ bar }
that works when compiling.

If we know that recognizers don't have side effects (and don't change or
read SOURCE), we know that
POSTPONE foo{
works correctly and appends the compilation semantics for "foo{" to the
current definition.

But if recognizers may do additional parsing, you don't know in advance
what is correct:
POSTPONE foo{
or
POSTPONE foo{ bar }

And what are the compilation semantics for "foo{" in the latter case?

By the Standard, in the glossary entry for POSTPONE
https://forth-standard.org/standard/core/POSTPONE

Skip leading space delimiters. Parse _name_ delimited by a space.
Find _name_. Append the compilation semantics of _name_
to the current definition.

If the Recognizers word set is provided, this specification can be
updated to something like the following:

Skip leading space delimiters. Parse _lexeme_ delimited by a space.
Recognize _lexeme_. Append the compilation semantics for _lexeme_
to the current definition.

It should append the compilation semantics for *the same* _lexeme_ that
was parsed.

Let _lexeme_ be "foo{". Then it should append the compilation semantics
for the lexeme "foo{", not for the lexeme "foo{ bar }" or anything else.
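
A minimal sketch of the "one space-delimited lexeme" rule, assuming the three-cell action-set layout discussed earlier in the thread (postpone action in the third cell) and a single hard-wired number recognizer instead of a full recognizer sequence. The names postpone-lexeme, lit-compile, rec-unum and act>post are hypothetical.

```forth
\ Sketch only: names and layout are assumptions, not proposal text.
: lit-compile ( n -- ) postpone literal ;

:noname ( n -- ) lit-compile postpone lit-compile ;   \ postpone action
create act-num   0 , 0 , ,     \ first two cells are unused placeholders here

:noname ( -- ) -13 throw ;                            \ postpone action on failure
create act-fail  0 , 0 , ,

: act>post ( act -- xt ) 2 cells + @ ;

\ Toy recognizer: unsigned single-cell number in the current BASE.
: rec-unum ( c-addr u -- n act-num | act-fail )
  dup 0= if 2drop act-fail exit then
  0 0 2swap >number nip
  if 2drop act-fail else drop act-num then ;

\ Parse exactly one space-delimited lexeme, recognize it, and run the
\ postpone action. The recognizer never touches >IN or SOURCE.
: postpone-lexeme ( "lexeme" -- )
  parse-name rec-unum act>post execute ; immediate
```

For example, after `: gen postpone-lexeme 42 ; immediate` and `: t gen ;`, executing `t` leaves 42. Because only PARSE-NAME advances the input, the amount of source consumed never depends on which recognizer matched.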

The same goes for string literals.
Code like
POSTPONE "foo bar"
should be incorrect. It should not be an exception to the general rule.

If we want to postpone (compile) fragments of code, the right way is
something like a c{ ... }c construct that works properly for *any* code.

: postpone-my-fancy-code
c{ "foo bar" ( c-addr u ) type }c

c{
foo{ bar } [defined] x [if] x [then]
}c
;

: foo [ c{ "test passed" }c ] type ; foo

https://github.com/ruv/forth-on-forth/blob/master/c-state.readme.txt

--
Ruvim

Ruvim

unread,

Jun 24, 2020, 8:10:53 AM6/24/20

to

On 2020-06-24 14:13, Alex McDonald wrote:
> On 24-Jun-20 11:51, m...@iae.nl wrote:
>> On Wednesday, June 24, 2020 at 12:01:16 PM UTC+2,
>> a...@littlepinkcloud.invalid wrote:
>>> Ruvim <ruvim...@gmail.com> wrote:

[...]

>>>> Actually, any recognizer with side effects can be converted into a
>>>> recognizer without side effect.
>>>>
>>>> So the corresponding permission of side effects isn't necessary.
>>>>
>>>> I showed an example of recognizer for string literals:
>>>> https://git.io/JfpKX

>>> "NB: POSTPONE is not applicable to a string literal containing
>>> blanks." is a bit worrying.
>>
>> Does that mean 'POSTPONE S" abort" 1 0 ! ABORT" huh"' works,
>> but 'POSTPONE S" "' does not? Good to know.
>>
>> -marcel
>>
>
> Having a string that terminates on a blank and does not support \
> escaped characters is not really what I would expect from a quality
> quoted string implementation.
>
> POSTPONE " Give me a break"
> POSTPONE "abort\" 1 0 ! abort\" huh"
>
> These should all work, and they do in my system.

What about
POSTPONE foo{ bar }

Should it work or not? You cannot know in advance; it depends on a
nuance of the specific recognizer implementation. I suggest eliminating
this nuance.

> The restriction lies
> with this specific implementation, not the specification.

This implementation was specially designed to show that the
specification can be tighter.

>
> IMO as to POSTPONE S" PINGPONG" it should not postpone the entire
> string; it should POSTPONE S" and fail on PINGPONG" not found.

Yes.

The problem is that your POSTPONE in some cases parses only one
space-delimited lexeme, in other cases it parses many lexemes, and the
user doesn't know in advance how many lexemes it will parse.

And why does
POSTPONE S"
work in one way (i.e. works), but
POSTPONE "
work in another way (i.e. doesn't work)???

--
Ruvim

none albert

unread,

Jun 24, 2020, 8:29:26 AM6/24/20

to

In article <rcvch7$uqk$1...@dont-email.me>,

Alex McDonald <al...@rivadpm.com> wrote:
>On 24-Jun-20 11:51, m...@iae.nl wrote:
>> On Wednesday, June 24, 2020 at 12:01:16 PM UTC+2, a...@littlepinkcloud.invalid wrote:

<SNIP>

>>>
>>> "NB: POSTPONE is not applicable to a string literal containing
>>> blanks." is a bit worrying.
>>
>> Does that mean 'POSTPONE S" abort" 1 0 ! ABORT" huh"' works,
>> but 'POSTPONE S" "' does not? Good to know.
>>
>> -marcel
>>
>
>Having a string that terminates on a blank and does not support \
>escaped characters is not really what I would expect from a quality
>quoted string implementation.
>
>POSTPONE " Give me a break"
>POSTPONE "abort\" 1 0 ! abort\" huh"
>
>These should all work, and they do in my system. The restriction lies
>with this specific implementation, not the specification.
>
>IMO as to POSTPONE S" PINGPONG" it should not postpone the entire
>string; it should POSTPONE S" and fail on PINGPONG" not found.

I can hardly follow this. POSTPONE is getting way too smart for
my taste. I *hate* smart words.

>
>--
>Alex

Alex McDonald

unread,

Jun 24, 2020, 10:04:08 AM6/24/20

to

It's possible in a specification to be too tight and restrict behaviour
that might be considered poor practice by some and essential by others.
The proposal can and should remain completely silent on this subject.

To suggest that the quoted string " something with \nan escape" should
be eliminated seems obtuse, and flies in the face of current practice.

It is certainly true that POSTPONE S" should not parse and postpone the
rest of the string since that is the current behaviour of POSTPONE S"
and I would not recommend that we consider changing that behaviour. It
may break existing applications.

>
>
>
>> The restriction lies with this specific implementation, not the
>> specification.
>
> This implementation was specially designed to show that the
> specification can be tighter.

It does not need to be.

>
>
>
>>
>> IMO as to POSTPONE S" PINGPONG" it should not postpone the entire
>> string; it should POSTPONE S" and fail on PINGPONG" not found.
>
> Yes.
>
> The problem is that your POSTPONE in some cases parses only one
> space-delimited lexeme, in other cases it parses many lexemes, and the
> user doesn't know in advance how many lexemes it will parse.

Could I ask that you stop calling words "lexemes"? There is enough Forth
specific jargon around words and parsing without needing to introduce more.

In the case of quoted strings, the provider of such a system can
document exactly how such a thing is to be parsed. My implementation
supports UTF-8 and escapes between the quotes, but not multi-line input,
and so my string recognizer allows for (where ]] is the postpone "state")

]] "a string\naon two lines" type [[

This seems reasonable and useful, and I have no idea why you might want
to disallow it.

>
> And why does
> POSTPONE S"
> work in one way (i.e. works), but
> POSTPONE "
> work in another way (i.e. doesn't work)???

There is no word " in the standard, so how it parses is entirely up to
the implementer of the Forth system. It is, to coin a not too helpful
phrase, an ambiguous condition.

>
>
> --
> Ruvim

--
Alex

Alex McDonald

unread,

Jun 24, 2020, 10:06:13 AM6/24/20

to

On 24-Jun-20 13:28, albert wrote:
> In article <rcvch7$uqk$1...@dont-email.me>,
> Alex McDonald <al...@rivadpm.com> wrote:

>>
>> IMO as to POSTPONE S" PINGPONG" it should not postpone the entire
>> string; it should POSTPONE S" and fail on PINGPONG" not found.
>
> I can hardly follow this. POSTPONE is getting way too smart for
> my taste. I *hate* smart words.

This is exactly what any current Forth system should do today when
presented with POSTPONE S" PINGPONG" so I'm not sure what it is you're
not understanding.

--
Alex

Anton Ertl

unread,

Jun 24, 2020, 11:14:52 AM6/24/20

to

a...@littlepinkcloud.invalid writes:
>I take it, then, that there's been no progress on making recognizers
>work with e.g. LOCATE without side effects? I haven't been keeping
>up.

Why should anybody make any progress on that? Various Forth systems
have allowed adding new recognizers in non-standard ways, and adding a
recognizer does not make the things recognized by it LOCATEable (if
they have LOCATE at all). So obviously few or no users of these
systems have asked for LOCATEable recognized things, and consequently
the system implementors did not implement such a feature. Given that,
why worry about LOCATEing things recognized by recognizers with side
effects? "Don't speculate!"

Moreover, LOCATE has not even been proposed for standardization. Why
should anyone worry about the interaction of a proposal with such a
feature?

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2020: https://euro.theforth.net/2020

a...@littlepinkcloud.invalid

unread,

Jun 24, 2020, 12:35:45 PM6/24/20

to

Well, there kind-of is, if S" is to be a recognizer, surely?

> 2. "recognizable?" word was suggested in the Recognizer API v4 comments
> https://vee.gg/gVXmN

Yes. And that's a good idea, unlike much of the proposal. (IMVHO,
YMMV, &c.)

>> "NB: POSTPONE is not applicable to a string literal containing
>> blanks." is a bit worrying.
>
> Standard POSTPONE is applicable to a Forth word only.
>
> Recognizers *can* extend it to any space-delimited lexeme, and even to
> multiple such lexemes, or to a lexeme delimited by anything else.
>
> But I think we should stop on a space-delimited lexeme and not further.

That sounds sensible to me, but I fear it will not satisfy the
Recognizer maximalists.

Andrew.

a...@littlepinkcloud.invalid

unread,

Jun 24, 2020, 12:43:30 PM6/24/20

to

Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> a...@littlepinkcloud.invalid writes:
>>I take it, then, that there's been no progress on making recognizers
>>work with e.g. LOCATE without side effects? I haven't been keeping
>>up.
>
> Why should anybody make any progress on that?

Because, as I have said many times, it's a very important part of
interaction that LOCATE (or its variations) go to the source
code of a word. This will not change.

> So obviously few or no users of these systems have asked for
> LOCATEable recognized things, and consequently the system
> implementors did not implement such a feature.

Well, no. I suspect they've never used a Forth system with LOCATE to
do any actual work on code they didn't write themselves.

> Given that, why worry about LOCATEing things recognized by
> recognizers with side effects? "Don't speculate!"

See above, Para 1.

> Moreover, LOCATE has not even been proposed for standardization. Why
> should anyone worry about the interaction of a proposal with such a
> feature?

See above, Para 1.

Right now, in any sufficiently high-quality implementation you can
LOCATE more or less any word you see in Forth source code and find out
what it does. This isn't true of numeric literals, but they are mostly
obvious. The recognizer proposal breaks this fundamental property.
Therefore, in order for Forth code using recognizers to be reasonably
maintainable in a way that has worked for decades, this is required.

The Recognizer proponents are proposing to break a fundamental
property in a way that cannot be fixed.

Andrew.

Alex McDonald

unread,

Jun 24, 2020, 12:51:01 PM6/24/20

to

On 24-Jun-20 17:35, a...@littlepinkcloud.invalid wrote:
>>> "NB: POSTPONE is not applicable to a string literal containing
>>> blanks." is a bit worrying.
>> Standard POSTPONE is applicable to a Forth word only.
>>
>> Recognizers *can* extend it to any space-delimited lexeme, and even to
>> multiple such lexemes, or to a lexeme delimited by anything else.
>>
>> But I think we should stop on a space-delimited lexeme and not further.
> That sounds sensible to me, but I fear it will not satisfy the
> Recognizer maximalists.

The proposal doesn't need to address it at all, so it doesn't need to be
considered (with the exception of noting that >IN can be modified either
directly or by subsequent parsing words in the recognizer).

It's up to those who wish to provide recognizers to do such a maximalist
thing. Some have implemented them; I have one for quoted strings, and if
I understand correctly, so does gforth with its STRING-RECOGNIZER word.

But if you don't like or want it, there's always the traditional S" and
S\". Just don't expect ]] S" won't work" TYPE [[ to work.

--
Alex

Alex McDonald

Jun 24, 2020, 12:53:30 PM

Well, right now we can't LOCATE 5 ; that part of the existing
interpreter can't be fixed either.

--
Alex

m...@iae.nl

Jun 24, 2020, 12:56:33 PM

On Wednesday, June 24, 2020 at 5:14:52 PM UTC+2, Anton Ertl wrote:
> a...@littlepinkcloud.invalid writes:
> >I take it, then, that there's been no progress on making recognizers
> >can work with e.g. LOCATE without side effects? I haven't been keeping
> >up.
>
> Why should anybody make any progress on that? Various Forth systems
> have allowed adding new recognizers in non-standard ways, and adding a
> recognizer does not make the things recognized by it LOCATEable (if
> they have LOCATE at all). So obviously few or no users of these
> systems have asked for LOCATEable recognized things, and consequently
> the system implementors did not implement such a feature. Given that,
> why worry about LOCATEing things recognized by recognizers with side
> effects? "Don't speculate!"
>
> Moreover, LOCATE has not even been proposed for standardization. Why
> should anyone worry about the interaction of a proposal with such a
> feature?

Why would I want a recognizer to help me with
'LOCATE pneumonoultramicroscopicsilicovolcanoconiosis' ?

-marcel

Anton Ertl

Jun 24, 2020, 12:58:31 PM

Ruvim <ruvim...@gmail.com> writes:
>What about
> POSTPONE foo{ bar }
>
>should it work or not? — you cannot know it in advance.

If

foo{ bar }

is one recognized thing, then yes, POSTPONEing it should work. We can
discuss whether it's a good idea to have a recognizer that recognizes
"foo{ bar }".

>The problem is that your POSTPONE in some cases parses only one
>space-delimited lexeme, in other cases it parses many lexemes, and user
>don't know in advance, how many lexemes it will parse.
>
>And why does
> POSTPONE S"
>work in one way (i.e. works), but
> POSTPONE "
>work in another way (i.e. doesn't work)???

POSTPONE S"

does what the standard describes, but that is often not what the user
wants. If the user wants to postpone S" bla", he has to write

S" bla" POSTPONE SLITERAL

Not very intuitive, but for S", this cannot be changed. For the
string recognizer, if you want to postpone "bla", you write

POSTPONE "bla"

And of course this also works if the string contains a space.

m...@iae.nl

Jun 24, 2020, 1:12:38 PM

This explains the problem, but like '5', '"bla"' is a literal.
These have a special place in Forth in that they can be, but
usually aren't, in the dictionary. It is a matter of
convenience that we expect to POSTPONE such things with the
same magic crackle that makes them findable.

Are there examples of other existing objects that need
this approach?

-marcel

Alex McDonald

Jun 24, 2020, 1:35:04 PM

LOCATE pneumonoultramicroscopicsilicovolcanoconiosis
. ^
Error -19 definition name too long; programs with definition names
longer than 31 characters have an environmental dependency.

--
Alex

a...@littlepinkcloud.invalid

Jun 24, 2020, 2:06:35 PM

Alex McDonald <al...@rivadpm.com> wrote:
>
> Well, right now we can't LOCATE 5 ; that part of the existing
> interpreter can't be fixed either.

Yes, I know, and I even said so in the material you quoted, but
recognizers bring with them some evil properties, in particular the
fact that

locate ->foo

may or may not work. It is not recognizable as a literal to someone
maintaining a program. You, Joe maintenance programmer, really need at
this point to be told that -> is a recognizer prefix. And preferably
need to be delivered to the source code of that recognizer.

The fact that the Recognizer proponents can't (or won't) see this
suggests to me that they're not realistic about maintaining other
people's code or that they've never worked in this way.

Andrew.

Alex McDonald

Jun 24, 2020, 2:29:40 PM

On 24-Jun-20 19:06, a...@littlepinkcloud.invalid wrote:
> Alex McDonald <al...@rivadpm.com> wrote:
>>
>> Well, right now we can't LOCATE 5 ; that part of the existing
>> interpreter can't be fixed either.
>
> Yes, I know, and I even said so in the material you quoted, but

I see that now, apologies.

> recognizers bring with them some evil properties, in particular the
> fact that
>
> locate ->foo
>
> may or may not work. It is not recognizable as a literal to someone
> maintaining a program. You, Joe maintenance programmer, really need at
> this point to be told that -> is a recognizer prefix. And preferably
> need to be delivered to the source code of that recognizer.

For example, an action-set provides interpret, compile and postpone
actions; a fourth action could provide LOCATE information.

>
> The fact that the Recognizer proponents can't (or won't) see this
> suggests to me that they're not realistic about maintaining other
> people's code or that they've never worked in this way.

I'll add it to the "Please Consider This" pile.

>
> Andrew.
>

--
Alex

none albert

Jun 24, 2020, 7:52:13 PM

In article <PI6dnc2YQsy3C27D...@supernews.com>,

I'm and I find it totally unrealistic, that I would not start by
knowing all recognizers by heart, lest the code becomes unreadable.
It is like reading c and not knowing what { is supposed to mean.

>
>Andrew.

Ruvim

Jun 25, 2020, 4:54:58 AM

On 2020-06-24 21:06+03, a...@littlepinkcloud.invalid wrote:
> Alex McDonald <al...@rivadpm.com> wrote:

>> On 2020-06-24 19:43+03, a...@littlepinkcloud.invalid wrote:
[...]

>>> Right now, in any sufficiently high-quality implementation you can
>>> LOCATE more or less any word you see in Forth source code and find out
>>> what it does. This isn't true of numeric literals, but they are mostly
>>> obvious. The recognizer proposal breaks this fundamental property.
>>> Therefore, in order for Forth code using recognizers to be reasonably
>>> maintainable in a way that has worked for decades, this is required.
>>>
>>> The Recognizer proponents are proposing to break a fundamental
>>> property in a way that cannot be fixed.

In my view, the mentioned fundamental property *can* be even better with
recognizers (in a certain variant of the specification).

All of these variants can work properly:

locate {:
locate ]]
locate ->foo
locate foo{
locate bar"
locate "baz
locate forth-wordlist::drop
locate 'dup
locate 'a'
locate $F
locate 5

I.e., even for a number it can show some corresponding information.

Please refer to the terms that I meticulously defined:
https://git.io/JfhaI (comments are welcome)

A recognizer produces a token descriptor.
Every token descriptor is created by a method that a Forth system
provides. So the Forth system can know the exact place (in the source
code) where each descriptor was defined.

For some predefined descriptors a Forth system can show additional
information (e.g. that it is a number, or word and its source location,
etc). And for any descriptor it can show the places where this
descriptor was defined.

A sketch follows.
It assumes that the specification prohibits side effects in
recognizers.

\ take i, drop i items from the data stack
\ ndrop ( i*x i -- )

\ take i, drop i items from the floating-point stack
\ nfdrop ( F: i*f -- ) ( i -- )

\ take xt, execute xt,
\ return changes in the data stack and the floating-point stack
\ execute-balance2 ( ... xt -- ... n-data n-float )

\ show source location for a name token nt
\ located-nt ( nt -- )

\ show source location for an execution token xt
\ located-xt ( xt -- )

\ show source location for a general token descriptor td
\ located-td ( td -- )

\ show some information for ccc
: locate ( "ccc" -- )
  [:
    parse-lexeme recognize-lexeme ( token{k*x} td )
  ;] execute-balance2 ( ... td u-data u-float ) 2>r
  case ( token{k*x} td )
    0 of ." unknown" endof
    dt-lit of ." number: " . endof
    dt-2lit of ." double number: " d. endof
    dt-flit of ." float number: " f. endof
    dt-slit of ." string literal: " type endof
    dt-nt of ." word: " dup name>string type located-nt endof
    dt-xt of find-nt-for-xt ( xt -- xt false | nt true )
      if ." xt for word: " dup name>string type located-nt
      else ." anonymous definition: " dup . located-xt then
    endof
    \ other implementation dependent variants, if any
    \ ...

    ( token{k*x} td ) ( R: u-data u-float )
    \ drop token components
    2r@ rot >r nfdrop 1- ndrop r> ( td )

    \ general case
    ." recognizable lexeme, the descriptor is following: "
    located-td

  0 endcase rdrop rdrop cr
;

--
Ruvim