Comments please.
RfD: TRAVERSE-WORDLIST
----------------------
v1 20 January 2012, Alex McDonald
v2 24 January 2012, Alex McDonald
v3 03 February 2012, Alex McDonald
v4 28 February 2012, Alex McDonald
v4.1 03 March 2012, Alex McDonald
v5 04 September 2012, Alex McDonald
v6 25 July 2013, Alex McDonald
Change history:
20120120 First version
20120124 Minor typos, added section "Order of node visits"
Added to section "Remarks"
20120125 Minor typos, cleanup and added remarks
20120203 Changed order of stack for TRAVERSE-WORDLIST
to place the xt rightmost.
Removed return value from TRAVERSE-WORDLIST.
Changed xt-node halt flag to a continue flag.
Note that if xt-node modifies the wordlist, the
results are undefined.
Specify NAME>... supporting words in section 2.2.
Tidy up & correction of 4. Remarks
20120228 Section 2.1 and 2.2 moved and renumbered.
Rename NAME>... to NT>... to more clearly
indicate a name token rather than a name.
Tidy up table in section 3. Experience, and
add note re NT>... names.
Corrected Bruce McFarling's moniker
20120904 Updates based on comments by Anton Ertl, Peter Knaggs,
Bruce McFarling
20130725 Corrected stack signatures & reduce verbage
1. Problem
----------
There are no standard words in Forth for traversing a WORDLIST and
obtaining basic information about each node. While standard Forth
provides a great many features for extensibility of the language
(with CREATE ... DOES> being the classic example), standard Forth
lacks the basic capability of traversing the wordlist as a part of
the specification.
Such a capability is needed to provide some kinds of advanced
programming tools. For example, the programmer may want to determine
all instances of word name overlaps in all of the wordlists in the
current search order; or count, display or modify words contained in
a specific wordlist.
Many systems provide the TOOLS word WORDS that provides human-
readable output from the current wordlists in CONTEXT. TRAVERSE-
WORDLIST is a usable factor for the implementation of WORDS.
Wordlist traversal functionality is available through very similar
words in Win32Forth (VOC-ITERATE), VFX (WalkWordList), iForth
(doWORDS), lxf/ntf (WALK-WORDLIST), and ciforth (FOR-WORDS).
2. Proposal
-----------
TRAVERSE-WORDLIST ( xt wid -- ) "traverse-wordlist" TOOLS-EXT
Remove wid and xt from the stack. Traverse the wordlist wid,
executing xt (a user-specified word) once for every word in
the wordlist.
The invoked xt has the stack diagram ( nt -- u|0 ).
A non-zero value for u will invoke the xt again with a new nt until
all the nodes in the wordlist have been exhausted. Setting u to zero
(FALSE) terminates this traversal, and xt will not be invoked again.
xt will not be invoked if the wordlist wid is empty.
nt is a system-specific name token for the node. The xt can use this
token to display, count, modify or perform any other action on the
node that the system providing nt permits. The format of nt is
opaque, but it is guaranteed to be unique for each word in the
wordlist.
2.1 Opaque NTs
--------------
Since this proposal introduces the concept of an opaque nt (name
token), the following words allow system independent reference to
specific parts of the node.
NT>STRING ( nt -- addr count ) "name-to-string" TOOLS-EXT
NT>STRING returns the string addr count, the definition name of the
word represented by nt. Case is dependent on the case-sensitivity of
the Forth system (see /DPANS 3.3.1.2 Definition names/). For
restrictions see (3.1 NT>STRING Restrictions)
NT>INTERPRET ( nt -- xt ) "name-to-interpret" TOOLS-EXT
NT>INTERPRET returns an xt (execution token) as discussed in
/DPANS 3.4.3 Semantics/. This xt represents the execution semantics
of the definition. It may be zero if there are no execution semantics.
NT>COMPILE ( nt -- xt ) "name-to-compile" TOOLS-EXT
NT>COMPILE returns an xt (execution token) as discussed in
/DPANS 3.4. 3 Semantics/. This xt represents the compilation
semantics of the definition. It may be zero if there are no
compilation semantics.
Extend /DPANS Table 3.1 Data Types/
Symbol Data type Size on stack
nt name token 1 cell
Add section /DPANS 3.1.3.5 Name tokens/
Different definitions have different name tokens, although
information obtainable from the token may be identical for different
name tokens; for instance, NT>STRING for different values of nt may
return the same string.
2.2 Order of node visits
------------------------
TRAVERSE-WORDLIST gives no guarantee as to any specific order of node
visits to each word through the invoked xt with one exception.
If a wordlist contains duplicate entries, SEARCH-WORDLIST and FIND
for a duplicated name will return the execution token of the last
temporally defined name. For instance
: SAMPLE ... ( 1 ) ;
: TEST ... ;
: RETURNS ... ;
: SAMPLE ... ( 2 ) ;
: SAMPLE ... ( 3 ) ;
defines five words, one of which, SAMPLE, is defined three times.
SEARCH-WORDLIST and FIND will only return the last definition (3).
Whether duplicated nodes are visited or not by TRAVERSE-WORDLIST is
undefined. TRAVERSE-WORDLIST is only obliged to visit the nodes TEST,
RETURNS and SAMPLE (3) from the wordlist, and can do so in any order.
Some systems may permit visiting the node for the second and
subsequent definitions of SAMPLE. Exceptionally in this case,
TRAVERSE-WORDLIST *must* visit multiply defined nodes in the order
newest to oldest (but not necessarily consecutively); that is, SAMPLE
(3) must be returned before (2), and (2) before (1).
The remaining unique nodes can appear in any order, and may be
interspersed between the duplicately named and ordered words. If this
ordering is not possible, then TRAVERSE-WORDLIST *must* call xt with
only the FINDable definition of SAMPLE.
TRAVERSE-WORDLIST is affected by the operation of FORGET or MARKER,
and the results of a traversal *must not* return words that are
unFINDable after execution of FORGET or MARKER. Additionally, any
operation during the execution of xt-node that modifies the structure
of the wordlist (that is, defining words such as : (colon) or CREATE,
or execution of FORGET or MARKER) results in undefined behaviour.
3. Rationale
------------
3.1 NT>STRING Restrictions
--------------------------
Implementations may choose to point to the string in the word's header
built when the word is defined: point to a static buffer that is
reused on each call; point to a dynamic buffer that is replaced on
each call: or some other method. Whatever technique is used, the
following are required of the string returned by NT>STRING;
The string is read-only; operations that write to the string are not
permitted.
Reading outside of the bounds (addr) to (addr+count) is not
permitted.
The lifetime of the string is only guaranteed up to the next call of
NT>STRING; subsequent calls to NT>STRING must assume that a different
address, length and contents will be returned.
If the string is needed outside of this lifetime, it must be copied to
another buffer. The alternative - providing a buffer for NT>STRING's
used - has been rejected on the grounds that the buffer would have to
be of an unknown length. Although names of definitions are required
to be at least 31 characters in length (/DPANS 3.3.1.2 Definition
names/) there is no specified upper limit.
3.2 Order of node visits
------------------------
There are a number of orders in which nodes may be visited in a
wordlist, each of which depends on the specific implementation of
wordlists. Although they are often based on hash tables, no
assumptions are made in the ANS Forth specification about internal
implementations.
To conform with this and to simplify implementation of TRAVERSE-
WORDLIST and to allow various techniques to be used for wordlists, no
guarantee is given as to any specific order of node visits to each
word with the exception given in 2.2.
3.3 Use of the stack by TRAVERSE-WORDLIST
-----------------------------------------
Removal of the xt and wid by TRAVERSE-WORDLIST before invoking the xt
ensures that the data stack is not blocked; it is not permitted to
maintain control information for the traversal on the data stack.
This is to allow parameters beyond the nt to be passed to xt.
For instance, the caller may wish to maintain a count of nodes
visited on the stack.
4. Examples
-----------
: WORDS-COUNT ( x nt -- x' u ) DROP 1+ TRUE ;
0 ' WORDS-COUNT WID TRAVERSE-WORDLIST . ( count of nodes visited)
prints x', where x' is a count of the total number of nodes in the
wordlist WID.
: ALL-WORDS NT>STRING CR TYPE TRUE ;
['] ALL-WORDS GET-CURRENT TRAVERSE-WORDLIST
prints the names of words in the current compilation wordlist.
: CONTAINS-STRING NT>STRING 2OVER
SEARCH IF CR TYPE THEN FALSE ;
S" COM" ['] CONTAINS-STRING GET-CURRENT TRAVERSE-WORDLIST
prints the name of a word containing the string "COM", if it exists,
and then terminates.
4. Experience
-------------
The wordlist traversal functionality is available through very
similar words in
Win32Forth VOC-ITERATE
VFX WalkWordList
iForth doWORDS
ciForth FOR-WORDS
lxf/ntf WALK-WORDLIST
Commonly used to implement WORDS, these functions and others are
individual solutions with onflicting stack requirements and names.
However, all provided similar functionality to the proposed TRAVERSE-
WORDLIST, namely, the ability to inspect the individual entries in a
wordlist.
5. Remarks
----------
For this version V5 of the RFD dated 25 July 2013:
None so far.
For previous versions:
Krishna Myneni raised the topic in the comp.lang.forth newsgroup (http:
//
groups.google.com/group/comp.lang.forth/browse_frm/thread/
b5f6ad076607e0b8/0dde8de4c483b9e2#0dde8de4c483b9e2)
Krishna Myneni requested specific orderings. As there are many
possible orderings that may be useful for processing nodes, TRAVERSE-
WORDLIST can be used to support them by, for instance, building a
list during execution of xt-node and sorting the results.
Peter Fälth adds: "My system (lxf/ntf) supports such a word (but with
some differences); my name is WALK-WORDLIST ( xt wid -- ). I have
also a WALK-ALL-WORDLISTS ( xt -- ) that actually is more used I
suggest also to add such a word."
Bruce McFarling requested: that words with multiple definitions are
returned in newest to oldest order. Addressed above, but a use case
and rationale would be useful.
Bruce McFarling and Peter Fälth would prefer that xt-node use a "halt?"
flag; i.e., to halt on a non-zero return value, rather than on zero.
Due to changes in the TRAVERSE-WORDLIST signature, halt on zero has
been reinstated for xt-node.
Peter Fälth, Anton Ertl and others do not like the nt return value
from TRAVERSE-WORDLIST. It has been removed, and xt-node word can use
one of its agreed parameters to return a flag or last nt.
Stephen Pelc suggested that the stack signature be simplified to
TRAVERSE-WORDLIST ( xt-node wid -- ) with xt-node ( nt -- u ). Bruce
McFarling notes that there is a loss of functionality as xt-node
cannot operate on additional information on the stack; since under
the modified signature, the TRAVERSE-WORDLIST word is free to be
using the stack underneath the nt for its own purposes.
Various discussions on return flag values, the position of the xt on
the stack for TRAVERSE-WORDLIST, the return values have been
addressed.
Bruce McFarling proposes NAME>... here described as NT>... words.
Andrew Haley proposes a completely different mechanism for traversal;
although interesting, it has not been fully explored and is not
explained here.