Based on the feedback there are again some
changes.
First: I think I've identified the building blocks
that haven't (really) changed over time.
The design of a recognizer as a combination of
a parsing word and 3 data handling methods
associated with the data is one of them.
Similarly the idea to group recognizers as
stacks is now "carved in stone".
What has changed here over time? The POSTPONE
action turned into one that literally follows
the existing spec. It compiles all data necessary to
append the compilation action into the dictionary for
later use. Part of this is generic, part is data
dependent. The words that deal with the recognizer
stacks now have a stack-identifier as an additional
parameter. For system words, a common anchor
FORTH-RECOGNIZER is introduced as a VALUE.
Second: I removed all use cases. I got many
complaints about changing too much (interpreter)
or too little (search order). In fact most
remarks were about such use cases. It is great to
have a tool with such a wide range of possible
uses, but they are not really part of this game.
With the current spec I want to achieve two goals:
Keep the early adopters happy (enough) and invite
all others to at least give recognizers a try
without fearing that their systems will get
conquered.
Third: The rationale section covers the design
decisions and why they were taken. That includes
alternatives as well. It's more or less an excerpt
of 5 years of work and experience. Some of the
use cases moved there to serve as inspiration.
The RFD documents are available via
http://www.forth200x.org/ More information
including some sources is at
http://amforth.sf.net/Recognizers.html
Thanks to all who gave feedback and more. I highly
appreciate your work. If I forgot someone, please
contact me.
--------- Core part of the RFD v3 -----------
XY.6 Glossary
XY.6.1 Recognizer Words
DO-RECOGNIZER ( addr len stack-id -- i*x R:TABLE | R:FAIL )
RECOGNIZER
Apply the string at "addr/len" to the elements of the
recognizer stack identified by stack-id. Terminate the
iteration when either a recognizer returns an information
token that is different from R:FAIL or the stack is
exhausted. If the stack is exhausted, return R:FAIL.
"i*x" is the result of the parsing word. It represents
the data from the string. It may reside in locations
other than the data stack. In this case the stack diagram
should be read accordingly.
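A minimal sketch of this iteration in standard Forth may help. It is not the reference implementation. The stack layout (a count cell followed by the execution tokens, rec-1 first), the placeholder R:FAIL, and the words REC-NEVER, REC-LEN, R:LEN and DEMO-STACK are assumptions made only for illustration.

```forth
\ Assumed layout of a recognizer stack (not prescribed by the RFD):
\   cell 0      number of recognizers n
\   cell 1..n   execution tokens, rec-1 (called first) in cell 1

CREATE R:FAIL  0 , 0 , 0 ,      \ placeholder; a real token holds three xts

: DO-RECOGNIZER ( addr len stack-id -- i*x R:TABLE | R:FAIL )
  DUP CELL+ SWAP @               ( addr len slot n )
  BEGIN DUP WHILE                ( addr len slot n )
    2OVER 2>R                    \ keep a copy of the string
    1- >R  DUP CELL+ >R          ( addr len slot ) ( R: addr len n-1 slot+ )
    @ EXECUTE                    ( i*x token )
    DUP R:FAIL <> IF
      2R> 2DROP 2R> 2DROP EXIT   \ recognized: discard the saved state
    THEN
    DROP                         \ a failing recognizer leaves only R:FAIL
    R> R> 2R> 2SWAP              ( addr len slot+ n-1 )
  REPEAT
  2DROP 2DROP R:FAIL ;           \ stack exhausted

\ Hypothetical parsing words, just to exercise the loop.
CREATE R:LEN  0 , 0 , 0 ,
: REC-NEVER ( addr len -- R:FAIL )    2DROP R:FAIL ;
: REC-LEN   ( addr len -- len R:LEN ) NIP R:LEN ;

CREATE DEMO-STACK  2 ,  ' REC-NEVER ,  ' REC-LEN ,

S" abc" DEMO-STACK DO-RECOGNIZER   \ leaves: 3 R:LEN
```

The string is copied to the return stack before each parsing word runs, because "i*x" has an unknown size and the original "addr/len" could not be found beneath it afterwards.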
FORTH-RECOGNIZER ( -- stack-id ) RECOGNIZER
A system VALUE holding a recognizer stack id.
It can be changed using TO, which assigns a new
recognizer stack id. This change has immediate effect.
The recognizer stack from this stack-id shall be used in
all system level words like EVALUATE, LOAD etc.
GET-RECOGNIZERS ( stack-id -- rec-n .. rec-1 n ) RECOGNIZER
Return the execution tokens rec-1 .. rec-n of the
parsing words in the recognizer stack identified with
stack-id. rec-1 identifies the recognizer that is called
first and rec-n the recognizer that is called last.
The recognizer stack is left unchanged.
R>COMP ( R:TABLE -- XT-COMPILE ) RECOGNIZER
Return the execution token for the compilation action
from the recognizer information token.
R>INT ( R:TABLE -- XT-INTERPRET ) RECOGNIZER
Return the execution token for the interpretation action
from the recognizer information token.
R>POST ( R:TABLE -- XT-POSTPONE ) RECOGNIZER
Return the execution token for the postpone action from
the recognizer information token.
R:FAIL ( -- R:FAIL ) RECOGNIZER
An information token with two uses: First, it signals
that a specific recognizer could not deal with the
string passed to it. Second, it is a predefined
information token whose elements are used when no
recognizer from the recognizer stack could handle the
passed string. Its methods provide the system error
actions.
The actual numeric value is system dependent.
RECOGNIZER ( size -- stack-id ) RECOGNIZER
Create a new recognizer stack with room for size elements.
RECOGNIZER: ( XT-INTERPRET XT-COMPILE XT-POSTPONE
"<spaces>name" -- ) RECOGNIZER
Skip leading space delimiters. Parse name delimited by a
space. Create a recognizer information token "name" with
the three execution tokens.
The words for XT-INTERPRET, XT-COMPILE and XT-POSTPONE
are called with the parsed data that the associated
parsing word of the recognizer returned. The information
token itself is consumed by the caller.
Each of the words XT-INTERPRET, XT-COMPILE and
XT-POSTPONE has the stack effect ( ... i*x -- j*y ). The
words to compile and postpone the data shall consume the
data "i*x". If the data "i*x" resides in other locations
(e.g. floating point numbers), these words shall use
that data.
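To make the relation between RECOGNIZER:, the three accessors and a parsing word concrete, here is a small sketch in standard Forth. It assumes that an information token is the address of three consecutive cells, which the RFD does not prescribe. The recognizer REC-CHAR / R:CHAR for character literals such as 'A' is hypothetical, the R:FAIL shown is only a stand-in, and the postpone action is a placeholder that refuses to work rather than the action the RFD describes.

```forth
\ Assumed representation: an information token is the address of
\ three cells holding the three execution tokens.
: RECOGNIZER: ( xt-interpret xt-compile xt-postpone "<spaces>name" -- )
  CREATE  ROT , SWAP , , ;       \ stored as: interpret, compile, postpone
: R>INT  ( r:table -- xt-interpret )  @ ;
: R>COMP ( r:table -- xt-compile )    CELL+ @ ;
: R>POST ( r:table -- xt-postpone )   CELL+ CELL+ @ ;

\ Stand-in failure token: every action throws -13 (undefined word).
:NONAME ( -- ) -13 THROW ;  DUP DUP  RECOGNIZER: R:FAIL

\ Hypothetical token for character literals.
:NONAME ( char -- char ) ;                \ interpret: leave it on the stack
:NONAME ( char -- ) POSTPONE LITERAL ;    \ compile: lay it down as a literal
:NONAME ( char -- ) -48 THROW ;           \ postpone: placeholder, refuses
RECOGNIZER: R:CHAR

\ Parsing word: accepts exactly three characters of the form 'x'.
: REC-CHAR ( addr len -- char R:CHAR | R:FAIL )
  3 = IF
    DUP C@ [CHAR] ' = IF
      DUP 2 CHARS + C@ [CHAR] ' = IF
        CHAR+ C@ R:CHAR EXIT
      THEN
    THEN
  THEN
  DROP R:FAIL ;

S" 'A'" REC-CHAR R>INT EXECUTE   \ leaves the character code of A
```

Note that the parsing word returns the token on top of the data, so the caller can pick an action with R>INT, R>COMP or R>POST and then EXECUTE it on the remaining "i*x".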
SET-RECOGNIZERS ( rec-n .. rec-1 n stack-id -- ) RECOGNIZER
Set the recognizer stack identified by stack-id to the
recognizers identified by the execution tokens of their
parsing words rec-n .. rec-1. rec-1 will be the parsing
word of the recognizer that is called first, rec-n will
be the last one.
If the existing recognizer stack is too small to hold
all new elements, an ambiguous condition exists.
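The ordering rules of SET-RECOGNIZERS and GET-RECOGNIZERS can be sketched as follows. This is one possible implementation in standard Forth under assumptions of my own: the stack lives in the dictionary, cell 0 holds the current depth, and rec-1 is stored in cell 1. The capacity is not recorded, so an overflow goes undetected here. REC-A, REC-B, REC-C and MY-STACK are empty stand-ins that are never executed.

```forth
: RECOGNIZER ( size -- stack-id )
  ALIGN HERE SWAP                ( stack-id size )
  0 ,  CELLS ALLOT ;             \ depth cell, then room for size xts

: SET-RECOGNIZERS ( rec-n .. rec-1 n stack-id -- )
  2DUP !                         \ store the new depth
  CELL+ SWAP                     ( rec-n .. rec-1 slot n )
  0 ?DO
    TUCK !  CELL+                \ store rec-i, advance to the next cell
  LOOP DROP ;

: GET-RECOGNIZERS ( stack-id -- rec-n .. rec-1 n )
  DUP @ >R                       ( stack-id ) ( R: n )
  R@ CELLS +                     \ address of rec-n
  R@ 0 ?DO
    DUP @ SWAP  1 CELLS -        \ fetch an xt, step back toward rec-1
  LOOP DROP R> ;

\ Stand-ins for parsing words; only their xts are used below.
: REC-A ;  : REC-B ;  : REC-C ;

4 RECOGNIZER CONSTANT MY-STACK
' REC-B ' REC-A 2 MY-STACK SET-RECOGNIZERS     \ REC-A is called first

\ Put REC-C on top, so that it is called before REC-A and REC-B.
MY-STACK GET-RECOGNIZERS  ' REC-C SWAP 1+  MY-STACK SET-RECOGNIZERS
```

Because both words use the same order, a GET-RECOGNIZERS followed by SET-RECOGNIZERS on the same stack-id leaves the recognizer stack as it was.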
--------------