Hi,
what follows is a summary of the topics discussed so far and
my comments on them. The management summary is simple:
the proposal in the first RFD is unchanged.
In the following text I try to discuss every topic
and the reasons why I think that the proposal is at
least good enough.
The text is copied and pasted from my master document
at
http://amforth.sourceforge.net/pr/Recognizer-rfc-B.pdf
and
http://amforth.sourceforge.net/pr/Recognizer-rfc-B.text
(plain ASCII). The links will remain stable for the foreseeable
future, but until the status is changed to "final" the content
may change.
You may also note that I mention some things not really discussed
yet (e.g. ticking a word). This second version is still a
work in progress, so I hope for and welcome feedback.
Extended Rationale from the discussion of Version 1
There is near consensus that recognizers, if provided by the
system implementer, shall replace the default command
interpreter behaviour. Andrew Haley suggests that
recognizers should be used as a last-resort tool, only if the
standard text interpreter cannot deal with the input data. That
means that the interpreter always handles the dictionary
searches and the number checks itself, and activates the
recognizer stack only if they fail. This leaves the interpreter
untouched but gives up the full flexibility. The final wording
may find a solution for that. The majority questions the
usefulness of such a two-class interpreter design.
Name Tokens
Name Tokens (nt) are part of the Forth 2012 Programming Tools
word set.
The words found in the dictionary with FIND return the
execution token and the immediate flag. Using the Programming
Tools word set, the dictionary lookup can instead be based on
TRAVERSE-WORDLIST, in a word called e.g. REC:NAME ( addr len --
nt R:NAME | R:FAIL ). The major difference to FIND is that all
header information is available to handle the token:
:NONAME NAME>INTERPRET EXECUTE ; ( nt -- ) \ interpret
:NONAME NAME>COMPILE EXECUTE ; ( nt -- ) \ compile
:NONAME NAME>COMPILE SWAP POSTPONE LITERAL COMPILE, ; \ postpone
RECOGNIZER: R:NAME
To handle a set of word lists like the order stack, additional
steps are needed, e.g. a separate recognizing word for each
word list, which in turn get combined in the recognizer stack.
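The lookup step behind a REC:NAME for a single word list could
be sketched as follows. This is only an illustration under
assumptions: nt-match and find-nt are hypothetical helper names,
the comparison is case-sensitive, and the wrapping into R:NAME or
R:FAIL is left out. Only TRAVERSE-WORDLIST, NAME>STRING and the
core words are taken from Forth 2012.

```forth
\ Callback for TRAVERSE-WORDLIST (hypothetical name).
\ The 0 below the nt is a "nothing found yet" placeholder.
: nt-match ( c-addr u 0 nt -- c-addr u nt false | c-addr u 0 true )
  NIP >R                        \ drop placeholder, park nt
  2DUP R@ NAME>STRING COMPARE   \ 0 if both names are identical
  IF   R> DROP 0 TRUE           \ no match: keep traversing
  ELSE R> FALSE                 \ match: leave nt, stop traversing
  THEN ;

\ Search one word list for a name (hypothetical name).
: find-nt ( c-addr u wid -- nt | 0 )
  >R 0 ['] nt-match R> TRAVERSE-WORDLIST
  NIP NIP ;                     \ discard the search string

\ Usage: look up a freshly defined word and run it.
: greet ." hello" ;
S" greet" GET-CURRENT find-nt NAME>INTERPRET EXECUTE
```

A recognizer for the whole search order would call something like
find-nt once per word list, or one such recognizer per word list
would be placed on the recognizer stack.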
Search Order Word Set
A large part of the Search Order word set is close to what
recognizers do for dictionary searches. The Order stack can be
seen as a subset of the recognizer stack. The words handling
the order stack (ALSO, PREVIOUS, FORTH, ONLY etc) may be
extended/changed to handle the recognizer stack too/instead.
On the other hand, ALSO is essentially DUP on a different
stack. ONLY and FORTH set a predefined stack content. With the
GET/SET-RECOGNIZERS words all changes can be prepared on the
data stack with the usual data stack words.
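The remark about ALSO can be illustrated with the standard
GET-ORDER and SET-ORDER. This is a minimal sketch; my-also is a
hypothetical name chosen to avoid redefining ALSO, and the search
order is assumed not to be empty. The same pattern would carry
over to the proposed GET/SET-RECOGNIZERS.

```forth
\ ALSO as "DUP on the order stack": fetch the whole order,
\ duplicate its top entry, bump the count, store it back.
: my-also ( -- )
  GET-ORDER        \ wid-n .. wid-1 n
  OVER SWAP 1+     \ wid-n .. wid-1 wid-1 n+1
  SET-ORDER ;
```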
A further difference between word lists and recognizers is that
their identification tokens are not interchangeable. There is no
relation between a wordlist identifier and a recognizer
identifier (the execution token of a REC: word).
Completely unrelated is SET/GET-CURRENT. Recognizers don't deal
with the place where new words are put. Possible changes here
are not considered part of the recognizer word set proposal.
A complete redesign of the Search Order word set would affect
many programs and deserves an RFD of its own. The common tools
needed to implement both the recognizer and search order word
sets may be useful in their own right.
GET/SET-RECOGNIZERS
An alternative would be words inspired by those that link
the data stack and return stack: >R and R>. Likewise, a
>RECOGNIZER would put the new item on top of the recognizer
stack. Since this element is processed first in DO-RECOGNIZER,
the action prepends to the recognizer stack, which is less
convenient. Having the recognizer loop act the other way
(bottom up) is no less confusing and therefore not an option
either. Furthermore, I expect that most changes to the
recognizer stack take place at its end (bottom), appending a new
recognizer. Since there is no commonly agreed way to access a
stack at its bottom, words like N>R and NR> are needed; these
are in effect the proposed GET/SET-RECOGNIZERS words, and all
changes take place on the data stack. Even more difficult is
the task of inserting or removing a recognizer in the middle.
Again, the standard data stack words are the simplest way to do
it.
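How appending and prepending look with plain data stack words
can be sketched as follows. The storage behind GET-RECOGNIZERS
and SET-RECOGNIZERS here is only a stand-in I made up for the
illustration (a small fixed array without overflow check), and
append-recognizer and prepend-recognizer are hypothetical names.
The stack effects follow the GET-ORDER/SET-ORDER convention, with
rec-1 being the top element that is processed first.

```forth
\ Stand-in storage (assumption): cell 0 holds the count, cells
\ 1..n hold the recognizers, top of stack in cell 1.
CREATE rec-stack 0 , 8 CELLS ALLOT

: SET-RECOGNIZERS ( rec-n .. rec-1 n -- )
  DUP rec-stack !                        \ store the count
  0 ?DO rec-stack I 1+ CELLS + ! LOOP ;  \ rec-1 goes to cell 1

: GET-RECOGNIZERS ( -- rec-n .. rec-1 n )
  rec-stack @ DUP >R
  BEGIN DUP WHILE                        \ walk from cell n to 1
    DUP CELLS rec-stack + @ SWAP 1-
  REPEAT DROP R> ;

\ Append at the bottom: the new item already sits below the
\ old ones, so only the count has to be adjusted.
: append-recognizer ( rec -- )
  GET-RECOGNIZERS 1+ SET-RECOGNIZERS ;

\ Prepend at the top: slip the new item in just below the count.
: prepend-recognizer ( rec -- )
  >R GET-RECOGNIZERS R> SWAP 1+ SET-RECOGNIZERS ;
```

Inserting or removing in the middle works the same way, with
ROLL or similar words applied between the GET and the SET.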
Postpone and '
Adding a POSTPONE method has been seen as overly complex. A big
issue is that POSTPONE is defined for wordlist entries only.
Unless common agreement is found on what POSTPONE means for
other data or other actions, POSTPONE shall be applied to named
entries from wordlists only. All other data types should default
to "-48 THROW" (an ambiguous situation).
Dropping the POSTPONE method is not an option either. Consider a
recognizer that searches a hidden word list when certain
criteria are met (e.g. a prefix is found). These words could be
interpreted and compiled correctly, but could not be postponed
since POSTPONE would not find them. A solution like
: POSTPONE
  PARSE-NAME REC:WORD R:FAIL =
  IF
    \ system specific error action
  ELSE
    \ system specific postpone action
  THEN
; IMMEDIATE
makes little sense unless system specific knowledge is used.
Name tokens and REC:NAME instead of REC:WORD can greatly
simplify this task.
Implementing ' (tick) is a related topic, but much simpler. It
shall use REC:WORD internally to achieve consistent
behaviour.
: ' PARSE-NAME REC:WORD R:FAIL =
  IF
    \ system specific error action if not found
  ELSE
    DROP \ ignore immediate flag
  THEN
;
The name token based version would be similar:
: ' PARSE-NAME REC:NAME R:FAIL =
  IF
    \ system specific error action if not found
  ELSE
    NAME>INTERPRET
  THEN
;
2-Method API
Anton Ertl suggested an alternative implementation of the
recognizer. Basically, all text data is converted into a literal
at parse time. Later the interpreter decides whether to
perform an execute or a compile action on the literal data,
depending on STATE. POSTPONE is a combination of storing the
literal data together with its compile-time action.
interpretation: conv final-action
compilation:    conv literal-like postpone final-action
postpone:       conv literal-like postpone literal-like postpone final-action
The conv-action is what is done inside the DO-RECOGNIZER
action (REC:* words), and the literal-like and final-action set
replaces the proposed three-method set in R:*. It is not yet
clear whether this approach covers the same range of
possibilities as the proposed one. A side effect is that
postponing literals like numbers becomes possible without
further ado.
A complete reference implementation does not yet exist; some
aspects were published on comp.lang.forth by Jenny Brien.
Stateless interpreter
This is an alternative implementation of the interpreter that
works without STATE. For legacy applications a STATE variable
is maintained but not used.
The code depends on DEFER and IS being present. Similar code
can be found in gforth and win32forth.
\ legacy state support
VARIABLE STATE
: on  ( addr -- ) -1 SWAP ! ;
: off ( addr -- )  0 SWAP ! ;
\ the two states of the interpreter
: (interpret-i) _R>INT  EXECUTE ;
: (interpret-c) _R>COMP EXECUTE ;
DEFER (interpret) ' (interpret-i) IS (interpret)
\ switch interpreter modes
: ] STATE on  ['] (interpret-c) IS (interpret) ;
: [ STATE off ['] (interpret-i) IS (interpret) ; IMMEDIATE
: interpret
  BEGIN
    PARSE-NAME DUP  \ get something
  WHILE
    DO-RECOGNIZER   \ analyze it
    (interpret)     \ act on data, maybe leave the loop
    ?stack          \ simple housekeeping
  REPEAT 2DROP
;
R:FAIL and Exceptions
The R:FAIL word has two purposes. One is to signal, as a
boolean, whether a parsing word could deal with a word. The
other is to serve as the method table the interpreter uses to
actually handle the parsed data, in this case by generating a
proper error message and leaving the interpreter. While the
method table simplifies the interpreter loop, the flag
information seems odd. On the other hand, a comparison of
the returned R:* token with the constant R:FAIL can easily be
optimized.
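Both uses of R:FAIL can be shown in a few lines. This is a
sketch under assumptions: the three-cell table layout behind
RECOGNIZER:, the accessor _R>POST, and the words rec:never,
recognized? and interpret-token are made up for the illustration,
and -13 (undefined word) is just one plausible error for the fail
methods.

```forth
\ Assumed layout: three cells holding the interpret, compile
\ and postpone actions, in that order.
: RECOGNIZER: ( xt-int xt-comp xt-post "name" -- )
  CREATE SWAP ROT , , , ;
: _R>INT  ( r:table -- xt ) @ ;
: _R>COMP ( r:table -- xt ) CELL+ @ ;
: _R>POST ( r:table -- xt ) 2 CELLS + @ ;

\ R:FAIL: every method reports "undefined word".
:NONAME -13 THROW ; DUP DUP RECOGNIZER: R:FAIL

\ Use 1, the flag: compare the returned token with R:FAIL.
: rec:never   ( addr len -- r:table ) 2DROP R:FAIL ; \ never matches
: recognized? ( r:table -- flag ) R:FAIL <> ;

\ Use 2, the method table: the interpreter runs the method
\ without checking, and R:FAIL itself raises the error.
: interpret-token ( i*x r:table -- j*x ) _R>INT EXECUTE ;
```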
A completely different approach is using exceptions to deliver
the flag information. Using them requires the Exception word
set, which may not be present on all systems. In addition, an
exception is a somewhat elaborate error handling tool and
usually means that something unexpected has happened. Matching
a string against a sequence of patterns would use exceptions
in the normal flow of compare operations, which is an
unusual way to organize control flow.