Correct. This is a change.
> Colon definitions will contain a
> list of addresses of the words it is defined of.
Correct. This is normal.
> In code definitions
> the first address is zero in which case it will be followed by an
> address which is to the code for this word.
Correct. This is a change.
> The address in any colon
> definition using this word will be used the same as any other word,
Correct. This is normal.
> but the NEXT code will have to check every colon address for the zero
> escape address, no?
Yes, it will have to check for zero constantly. This adds some
overhead, but that may be compensated for by other changes. See below.
> Is that simpler and faster than using a CFA?
FYI, I've only written one Forth interpreter, and I'm not a Forth
programmer. I coded mine in C.
Simpler? Yes and/or no.
Technically, this only requires a conditional, but NEXT can be combined
with other code, e.g., ENTER. I.e., the code can become more optimized,
especially if C is used. E.g., you might add a switch() to execute
the code for the primitives within the same routine, if byte code is
being used.
Faster? ...
I'm not 100% sure, but I think it is faster, at least for C code.
It might be slower or the same for assembly, since a Forth coded in
assembly has less overhead to begin with. This adds the overhead of a
conditional to NEXT, but it eliminates the C overhead of calling the
CFA, i.e., a C function, for every Forth word. That saving is probably
negligible in assembly, which has neither the parameter stack setup
(prolog) and cleanup (epilog) nor the calling convention code needed
to call a C function.
> I'm not experienced in writing Forth interpreters. I was reading a
> bit on this and there seemed to be a distinction between NEXT and
> ENTER. If so, I suppose the zero check only needs to be done on the
> first word in a definition to see if it is a colon or a code word.
>
FYI, this should all be qualified with AIUI and/or IIRC ...
QUIT is the code for the outer interpreter or text interpreter.
This parses text and looks up words in the dictionary to compile or
execute them.
NEXT is the code for the inner interpreter or address interpreter.
This interprets the address list for ITC (indirect threaded code) or
DTC (direct threaded code). There are also TTC and STC Forths. TTC
(token threaded code) is like the byte code interpreted BASICs of 8-bit
yesteryear. STC (subroutine threaded code) is basically normal
assembly code or a Forth compiler.
ENTER (a.k.a. doCOL) is the CFA (code field address) for colon
definitions. It switches interpretation from the word currently being
interpreted to the next word to be interpreted. It saves the
instruction pointer and adjusts it to the next word so that NEXT moves
to the next word to interpret. I.e., this is the equivalent of calling
another function from the current function in C, but for threaded code.
This could be thought of as a function prolog (or prologue).
EXIT (a.k.a. ;S ) is a word that stops the interpretation for the
current word, and returns to execution of the prior word. I.e., this
is equivalent to returning from a function in C, but for threaded code.
This could be thought of as a function epilog (or epilogue).
A primitive, or code word, or low-level word, is a word whose CFA is
something other than ENTER (or doCOL). I.e., a word which is not
interpreted for an ITC or DTC Forth. In other words, it's a routine
which does something other than interpretation (i.e., ENTER). In most
cases, that's some special low-level operation, e.g., doCON for
constant, doVAR for variable, the CFA for EXIT to return from
INTERPRET to QUIT, the CFA for BYE to exit Forth, or the specific
routines for primitives, like >R R> DUP SWAP DROP OVER ROT etc. If a
Forth is built up from primitives, there will typically be 30 to 60 of
them.
...
So, what happens for threaded code, is that NEXT starts interpretation
of the threaded code (i.e., address list or offset list). NEXT obtains
the CFA to execute from the code address, which is usually ENTER, but
may be other addresses for primitives etc. ENTER saves interpretation
for the prior word and starts interpretation for the current word. EXIT
stops interpretation for the current word, resets interpretation to the
prior word, and returns to NEXT. If the CFA of a word is something
other than ENTER, then the code for that word is executed, i.e., it is
a primitive. The interpretation is moved from word to word repeatedly
via ENTER/EXIT. Nothing is actually executed in threaded code until
NEXT reaches a primitive, or code word, or low-level word, i.e., a word
whose CFA is something other than ENTER. Then, some work is done, e.g.,
by calling doCON, doVAR, or the primitive for R> >R DUP DROP etc.,
instead of calling ENTER. Under normal
circumstances, the interpreter is calling ENTER and EXIT many times,
but the majority of Forth code in an ITC/DTC Forth which is built up
from primitives is threaded code. I.e., migrating interpretation from
word to word by calling ENTER repeatedly is "unnecessary" as most words
are interpreted, i.e., have a CFA of ENTER.
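
To make the ENTER/EXIT/NEXT walkthrough above concrete, here is a
minimal sketch in C of a CFA-based inner interpreter. It is not the
interpreter mentioned earlier in this post. The names Word, do_enter,
do_exit, do_con, itc_run and the two sample colon words are all made up
for illustration, and the layout (a struct with a function pointer as
the code field) is only one assumed way to model a CFA in C.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Word Word;
typedef void (*CodeFn)(Word *self);

struct Word {
    CodeFn       cfa;    /* code field: do_enter for colon words, else a primitive */
    Word *const *body;   /* address list (used by colon words only) */
    long         value;  /* payload for do_con (used by constants only) */
};

enum { STACK_MAX = 64 };
static long         dstack[STACK_MAX];  /* data stack */
static Word *const *rstack[STACK_MAX];  /* return stack of saved ips */
static int          dsp, rsp;
static Word *const *ip;                 /* instruction pointer into an address list */
static int          running;

static void push(long v) { assert(dsp < STACK_MAX); dstack[dsp++] = v; }
static long pop(void)    { assert(dsp > 0); return dstack[--dsp]; }

/* ENTER (doCOL): save the caller's ip, start interpreting this word's list. */
static void do_enter(Word *self) { assert(rsp < STACK_MAX); rstack[rsp++] = ip; ip = self->body; }
/* EXIT: resume the prior word where it left off. */
static void do_exit(Word *self)  { (void)self; assert(rsp > 0); ip = rstack[--rsp]; }
/* doCON and a few primitives: these are where work actually gets done. */
static void do_con(Word *self)   { push(self->value); }
static void do_dup(Word *self)   { (void)self; long v = pop(); push(v); push(v); }
static void do_add(Word *self)   { (void)self; long b = pop(); push(pop() + b); }
static void do_bye(Word *self)   { (void)self; running = 0; }

static Word w_exit  = { do_exit, NULL, 0 };
static Word w_dup   = { do_dup,  NULL, 0 };
static Word w_add   = { do_add,  NULL, 0 };
static Word w_bye   = { do_bye,  NULL, 0 };
static Word w_three = { do_con,  NULL, 3 };   /* 3 CONSTANT THREE */

/* : DOUBLE DUP + ; */
static Word *const double_body[] = { &w_dup, &w_add, &w_exit };
static Word w_double = { do_enter, double_body, 0 };
/* : QUAD DOUBLE DOUBLE ; */
static Word *const quad_body[] = { &w_double, &w_double, &w_exit };
static Word w_quad = { do_enter, quad_body, 0 };

/* Push arg, run one word, return the top of the data stack. */
long itc_run(Word *w, long arg)
{
    Word *const top[] = { w, &w_bye };  /* tiny top-level thread */
    dsp = rsp = 0;
    push(arg);
    ip = top;
    running = 1;
    while (running) {
        Word *cur = *ip++;   /* NEXT: fetch the next word address ... */
        cur->cfa(cur);       /* ... and call through its CFA, for every word */
    }
    return pop();
}
```

With these definitions, itc_run(&w_quad, 5) returns 20. Note that QUAD
and DOUBLE do no work themselves: each one only costs a C function call
to do_enter, which is the overhead being discussed.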
With a CFA-less interpreter, this repetition of calling ENTER is
eliminated. Without the CFA, the threaded code basically becomes a
linked list or tree, with the terminating leaves being primitives and
the nodes being the operations in each Forth word. The
interpreter is modified so that all Forth words are interpreted by
default, i.e., the CFA is removed, so ENTER is merged into NEXT. All
words will be interpreted by default, unless something says to do
something other than interpret words, e.g., a zero escape code. But, as I
mentioned before, removing the CFA may break DOES> code. I.e., you
could mess around with it for Forth, but it seems more useful to me for
creating your own interpreted language, which doesn't require all of
Forth's functionality.
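
A minimal sketch in C of the CFA-less idea, with ENTER merged into NEXT
and a zero escape marking code words. Everything here is an assumption
made for illustration: the Cell union, the cw_* sample words, the
CODE_WORD macro and cfaless_run are hypothetical names. Also, the cell
after the zero holds a small primitive id dispatched by a switch(),
rather than the address of the code as in the quoted description.
DOES> is not handled at all.

```c
#include <assert.h>
#include <stddef.h>

typedef union Cell {
    const union Cell *addr;  /* address of another word; NULL is the zero escape */
    int               prim;  /* primitive id, stored right after a zero escape */
} Cell;

enum { P_EXIT, P_DUP, P_ADD, P_BYE };
enum { CL_STACK_MAX = 64 };

/* A code word is the zero escape followed by a primitive id. */
#define CODE_WORD(id) { { .addr = NULL }, { .prim = (id) } }

static const Cell cw_exit[] = CODE_WORD(P_EXIT);
static const Cell cw_dup[]  = CODE_WORD(P_DUP);
static const Cell cw_add[]  = CODE_WORD(P_ADD);
static const Cell cw_bye[]  = CODE_WORD(P_BYE);

/* A colon word is just its address list: no code field in front. */
/* : DOUBLE DUP + ; */
static const Cell cw_double[] = { { .addr = cw_dup }, { .addr = cw_add }, { .addr = cw_exit } };
/* : QUAD DOUBLE DOUBLE ; */
static const Cell cw_quad[] = { { .addr = cw_double }, { .addr = cw_double }, { .addr = cw_exit } };

/* Push arg, run one word, return the top of the data stack. */
long cfaless_run(const Cell *word, long arg)
{
    long        dstack[CL_STACK_MAX];   /* data stack */
    const Cell *rstack[CL_STACK_MAX];   /* return stack of saved ips */
    int         dsp = 0, rsp = 0;
    const Cell  top[] = { { .addr = word }, { .addr = cw_bye } };
    const Cell *ip = top;

    dstack[dsp++] = arg;
    for (;;) {
        const Cell *w = (ip++)->addr;   /* NEXT: fetch the next word address */
        if (w[0].addr != NULL) {        /* the zero check, done for every word */
            assert(rsp < CL_STACK_MAX);
            rstack[rsp++] = ip;         /* merged ENTER: no function call needed */
            ip = w;
            continue;
        }
        switch (w[1].prim) {            /* zero escape: run the primitive inline */
        case P_EXIT:
            assert(rsp > 0);
            ip = rstack[--rsp];
            break;
        case P_DUP:
            assert(dsp > 0 && dsp < CL_STACK_MAX);
            dstack[dsp] = dstack[dsp - 1];
            dsp++;
            break;
        case P_ADD:
            assert(dsp > 1);
            dstack[dsp - 2] += dstack[dsp - 1];
            dsp--;
            break;
        case P_BYE:
            assert(dsp > 0);
            return dstack[dsp - 1];
        default:
            assert(0);                  /* unknown primitive id */
            return 0;
        }
    }
}
```

Here cfaless_run(cw_quad, 5) returns 20. The whole inner interpreter is
one C function, so entering a colon word costs a compare and a push
instead of a call through a function pointer. Whether that is actually
faster would have to be measured.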