FORTH control structures - A proposal

4 views
Skip to first unread message

Colin Plumb

unread,
Oct 19, 1986, 11:08:28 PM10/19/86
to
: LINE-EATER 80 EXPECT DROP ;

In the July, 1986 issue of _Dr. Dobb's Journal of Software Tools_ (the annual
FORTH issue), George W. Shaw II proposes some extensions to the standard FORTH
control structures. While I think some of his ideas are good, I feel that they
got rather mangled somewhere along the way. (i.e. Aren't implemented well.)
Herein are some of my suggestions.

The main idea in George Shaw's article was that of using linked lists instead
of the stack to store information about forward branches. Since a cell has to
be allocated after BRANCH or ?BRANCH in a definition to store the branch
address, which isn't available until the branch is resolved, it can be put to
use to allow a BEGIN-WHILE loop to have the structure: "BEGIN", any number of
"WHILE"s (including 0), and "REPEAT" or "UNTIL", as in this example:

: FOO BEGIN aaa WHILE bbb WHILE ccc UNTIL ;

where the linked list uses a variable as its head pointer, resulting in this
dictionary organization jusrt before "until" resolves the works:

{header stuff} aaa ?BRANCH ___ bbb ?BRANCH ___ ccc
^ | ^ | ^
STACK ----------+ null <+ +-------------+ +----- LIST-PTR

and, after UNTIL, we have:

+--------------------------------------------+
v |
{header stuff} aaa ?BRANCH ___ bbb ?BRANCH ___ ccc ?BRANCH ___
| | ^
+-------------->+------------------+

which is just as it should be. This can also be used to allow a word such as
THENS, which fixes all the IF aaa IF bbb IF ccc THEN THEN THEN stuff that
you often have to wade through. Before the first THEN, things would look like
this:

{header stuff} ?BRANCH ___ aaa ?BRANCH ___ bbb ?BRANCH ___ ccc
| ^ | ^ | ^
null <-+ +-------------+ +-------------+ +- IF-PTR

and THENS could resolve the whole linked list like so:

{header stuff} ?BRANCH ___ aaa ?BRANCH ___ bbb ?BRANCH ___ ccc
| | | ^
+-------------->+-------------->+------+

as it should be. Just plain THEN would just resolve the last link of the
list. ELSE would change the list from this:

{header stuff} ?BRANCH ___ aaa
| ^
null <-+ +---- IF-PTR

to this:

+-----------------+
| v
{header stuff} ?BRANCH ___ aaa BRANCH ___
| ^
null <-+ +--- IF-PTR

by just patching the last link of the list.

The other control word which can benifit from this is the F-83 standard
LEAVE, which branches _immediately_ to past the end of the loop. Since there
is no constant number of LEAVEs in a DO-LOOP, the stack can't be used to store
forward branch addresses for LOOP to patch. A linked list works admirably int
this application.

Since DO-LOOPs and BEGINs can be nested, it is necessary to put the head
pointer for the outer loop on the stack while the inner loop uses it for
bookkeeping. Since DO and BEGIN already use the stack to store backwards-branch
information, the addition of an extra value there should not upset anyone
else's compile-time use of the stack.

One really wonderful (I modestly think) consequence of this system is that IF,
ELSE, and THEN make _no_ use of the stack whatsoever at compile time, thus
allowing words like this:

: BAR aaa BEGIN bbb IF ccc REPEAT ddd THEN eee ;

That is, looping constructs such as BEGIN and REPEAT can be put inside IF
statements. The most crying need for this is in the use of a LEAVE command.
It's easy to write:

: HUNT
0 1000 DO aaa IF ." Search succeeds for n = " . LEAVE THEN bbb LOOP ;

but how do you add a "Search fails" message, to be printed if the loop ends by reaching 1000 with no result? The answer is:

: HUNT
0 1000 DO aaa IF ." Success! n = " . ELSE bbb LOOP ." Failed" THEN ;

which I think is rather more elegant than putting a flag on the stack and
testing it after the LOOP. The only catch is that a DO-LOOP puts some values
on the return stack, which have to be cleaned up before EXIT sees them and
crashes the system. An ENDLOOP word can be added somewhere between IF and ELSE
to straighten this out:

: ENDLOOP ( --) R> ( save return address to calling word)
R> R> 2DROP ( get rid of index and limit)
>R ( put return address back)
;

This use of intersecting control structures is quite commonly used in
interpreted BASIC (yes, I admit it, that's where I started), and I don't
see why I should have to give up a useful control structure when using a
better language.

Now, the guts of this article: the code.

Note: If you can think up better names for any of the words I use in this
code, I'd loove to hear them. I _do_ use a few clumsy multi-word
names, which isn't great.

******* WARNING: This code is UNTESTED. It is in no *******
******* way guaranteed to work, and in fact probably ******
***** _won't_ work the forst time around. Good luck! *****

( A few things you might not have - F-83 standard and otherwise)

: 0! 0 SWAP ! ; ( fairly obvious)
: >MARK HERE 2 ALLOT ; ( -- addr) ( see F-83 standard)
: >RESOLVE HERE SWAP ! ; ( addr --) ( ditto)
: <MARK HERE ; ( -- addr)
: <RESOLVE , ; ( addr --)

( The new stuff)

VARIABLE IF-LIST
VARIABLE DO-LIST
VARIABLE BEGIN-LIST

: INIT-LISTS IF-LIST 0! DO-LIST 0! BEGIN-LIST 0! ;

: >LISTMARK ( addr --) ( mark forward branch origin using list "addr")
HERE SWAP DUP ( here addr addr)
@ ( here addr list-head)
, ( here addr) ( point this link to last one)
! ; ( ) ( make list-ptr point to this link)

: >LISTRESOLVE ( addr --) ( resolve forward branch)
DUP @ ( addr list-head)
DUP @ ( addr list-head new-list-head)
ROT ! ( list-head) (make list-ptr point to previous link)
>RESOLVE ; ( ) ( do standard resolution stuff)

: >RESOLVELIST ( addr --) ( resolve entire list)
DUP 0! @ ( list-head) ( go down one link and set list-ptr to null)
BEGIN
DUP WHILE ( list-head) ( when the list-head is null, we've reached the end)
DUP @ SWAP ( new-list-head list-head)
>RESOLVE ( new-list-head) ( resolve this link)
REPEAT ( null)
DROP ; ( )

: IF ( --)
COMPILE ?BRANCH ( just like regular IF)
IF-LIST >LISTMARK ; ( mark forward branch)
; IMMEDIATE

: ELSE ( --)
COMPILE BRANCH
IF-LIST >LISTMARK ( These two lines are rather tricky)
IF-LIST @ >LISTRESOLVE ( Follow the algorithm to prove they work)
; IMMEDIATE

: THEN ( --)
IF-LIST >LISTRESOLVE ( does this need explaining?)
; IMMEDIATE

: THENS ( --) ( like THEN THEN THEN...., as often as necessary)
IF-LIST >RESOLVELIST
; IMMEDIATE

: BEGIN ( -- old-list-ptr back-branch-ptr)
BEGIN-LIST DUP @ ( 'list-ptr contents-of-list-ptr)
SWAP 0! ( old-list-ptr)
( The list-ptr now contains null, ready for this nesting level)
<MARK ( old-list-ptr back-branch-ptr)
; IMMEDIATE

: WHILE ( --)
COMPILE ?BRANCH
BEGIN-LIST >LISTMARK
; IMMEDIATE

: REPEAT ( old-list-ptr back-branch-ptr --)
COMPILE BRANCH
<RESOLVE ( old-list-ptr)
BEGIN-LIST DUP >RESOLVELIST ( old-list-ptr 'list-ptr)
! ( )
; IMMEDIATE

: UNTIL ( old-list-ptr back-branch-ptr --)
( The stack comments here are the same as those for REPEAT. These comments on what's going on also apply to REPEAT.)
COMPILE ?BRANCH
<RESOLVE ( fix branch back to BEGIN)
BEGIN-LIST DUP >RESOLVELIST
( Make all the WHILEs point to just past the UNTIL, keeping a copy of the address of BEGIN-LIST...)
! ( ...to restore the old list pointer to)
; IMMEDIATE

: DO ( -- old-list-ptr back-branch-ptr)
DO-LIST DUP @ ( 'list-ptr old-list-ptr) ( save old DO-LIST)
SWAP 0! ( old-list-ptr) ( set DO-LIST to null)
COMPILE (DO) ( old-list-ptr) ( DO run-time part)
<MARK ( old-list-ptr back-branch-ptr)
; IMMEDIATE

: LEAVE ( --)
COMPILE BRANCH
DO-LIST >LISTMARK
; IMMEDIATE

: LOOP ( old-list-ptr back-branch-ptr --) ( see BEGIN and UNTIL for comments)
COMPILE (LOOP)
<RESOLVE
DO-LIST DUP >RESOLVELIST
!
; IMMEDIATE

: +LOOP ( old-list-ptr back-branch-ptr --)
COMPILE (+LOOP)
<RESOLVE
DO-LIST DUP >RESOLVELIST
!
; IMMEDIATE

: ENDLOOP ( --) ( clear LOOP's gunk off the return stack)
R> R> R> 2DROP >R ; ( you've seen the comments already!)

EXIT ( ignore final comments)

Note: I did copy quite a bit of the preceeding code from the DDJ article.
DDJ has a policy of allowing use of any code published therein for
non-commercial use. If you intend to sell this code in a product, you
might want to talk to them (M&T Publishing, Inc., 501 Galveston Dr.,
Redwood City, CA 94063, U.S.A.) first. Myself, I've no objections
*whatsoever* to *any* use of these ideas - I'd find it highly flattering.
I'd like it even better if you'd tell me about it, or even give me a copy.
Since George Shaw is proposing this as a new standard, I don't think he'll
mind anyone spreading it around. (I'm proposing mine as a *better*
standard)

I hope you find this interesting! - Colin Plumb (ccp...@watnot.UUCP)

Quote: : LIFE BEGIN 5 0 DO WORK LOOP SLEEP SLEEP REPEAT ;
( See? With these new control structures you don't need AGAIN or 0 UNTIL!)

Colin Plumb

unread,
Oct 20, 1986, 5:46:55 PM10/20/86
to
: LINE-EATER 80 EXPECT DROP ;

Here are a few things to add to my previous posting:

--It would be a good idea to *execute* INIT-LISTS before starting to use any of
the new control structures. >RESOLVELIST will mess things up *badly* if you
give it phoney pointers. You might also want to add INIT-LISTS to ABORT.

--Depending on how much you trust yourself, and if memory address 0 is vital to
the system, you might want to have >LISTRESOLVE check for a null list pointer,
or perhaps change the value of null from 0 to something else (maybe even an
explicit VARIABLE NULL).

--If there's demand from people who have chronically non-standard systems, I
can post definitions for things like BRANCH, ?BRANCH, (DO), etc., In FORTH.

--You can't use a ... DO ... IF ... LOOP ... ELSE ... THEN ... syntax if you
follow some of the reccomendations I've seen around for defining ?DO.
(If you haven't seen this, ?DO skips the entire loop if the initial index is equal to the limit. It works like 2DUP - IF DO ... LOOP THEN)
A common definition is this:

: ?DO COMPILE 2DUP COMPILE - [COMPILE] IF [COMPILE] DO ; IMMEDIATE
: DO COMPILE 1 COMPILE IF [COMPILE] DO ; IMMEDIATE
: LOOP [COMPILE] LOOP [COMPILE] THEN ; IMMEDIATE
: +LOOP [COMPILE] +LOOP [COMPILE] THEN ; IMMEDIATE

which is patently absurd, since it makes just plain DO-LOOP worse in both size
and speed _at_run_time_. It would be far better to make these changes:

VARIABLE NULL ( Junk variable for THEN to patch)
: ?DO COMPILE 2DUP COMPILE - [COMPILE] IF [COMPILE] DO ; IMMEDIATE
: DO [COMPILE] DO NULL ; IMMEDIATE
(NULL leaves phoney address for LOOP to patch)

which wouldn't change the run-time behaviour of a DO-LOOP _at_all_. (THEN has
no run-time behaviour. It just finds the last IF, and tells it the address to
branch to if the condition fails. It does this by putting the address to branch
to in an address left for it (on the stack). If NULL is left there, it just puts
garbage in NULL, and has no other effect.)

Unfortunately, since both of these approaches use IF, and IFs have to be
properly nested, this doesn't solve the original problem. It's hard to come up
with a _really_ elegant solution to this problem, since LOOP and +LOOP have to
know whether the loop was started by DO or ?DO. Since branching past a loop is
a forward branch, it's easy to use a linked list to store the branch addresses,
but the only way I could think of for LOOP to know whether DO or ?DO started the
loop (and thus whether it needs to patch a forward branch address or not) is to
check the address it's supposed to loop back to, and compare it with the last
location that needs patching. If it finds they differ by 2, LOOP should patch
the address. Here's an example of when LOOP shold patch:

null <--+ +--- ?DO-PTR
| v
... (?DO) ___ ......
^
STACK ------+

and an example of when it shouldn't:

null <--+ +-- ?DO-PTR
| v
... (?DO) ___ ...... (DO) ___ ......
^ ^
| STACK -----+
+-- STACK-1

There are other ways around this problem. Another idea is to check the cell
just before the address to be LOOPed back to. If it contains (DO), don't patch.
If it contains 0 (or some other special value), do patch. The fact that this
makes LOOP's behaviour depend on something other than its input data (the values
on the stack it uses, and a linked list pointer assigned to it), makes it seem
less elegant to me. If anybody out there has any better ideas, please tell me!!
I'd love to hear them.

The code to add this to what's not already in my last posting is here:

VARIABLE ?DO-LIST

: INIT-LISTS IF-LIST 0! BEGIN-LIST 0! DO-LIST 0! ?DO-LIST 0! ;
INIT-LISTS ( Forgot this last time!)

: (?DO) ( limit index --) ( ?DO run-time part)
R> ( limit index addr)
( Save return address. Only necessary in high-level FORTH)
-ROT ( addr limit index) ( : -ROT ROT ROT ;)
2DUP = IF ( addr limit index) ( If we don't want to do the loop...)
2DROP ( addr) ( ... get rid of the unnecessary limit & index,)
@ ( addr') ( and fetch the address of the end of the loop)
( NOTE: A pre-incrementing FORTH is assumed)
ELSE
SWAP >R >R ( addr) ( put limit & index on return stack in usual order)
2+ ( addr') ( skip end-of-loop address)
THEN
>R ; ( ) ( restore modified return address)

: ?DO ( -- old-list-ptr back-branch-addr)
COMPILE (?DO) ( compile run-time part)
?DO-LIST >LISTMARK ( do this end of a forward branch)
DO-LIST DUP @ ( save old DO-LIST)
SWAP 0! ( and get ready for this level of LEAVEs)
<MARK ( leave back-branch-addr for LOOP)
; IMMEDIATE

: LOOP ( old-list-ptr back-branch-addr --)
COMPILE (LOOP) ( compile run-time part)
DUP ( keep one copy of resolution address for reference)
<RESOLVE ( and resolve the other)
?DO-LOOP @ ( get address of last ?DO)
- 2 = ( and check if they're close)
IF ( If so, we want to patch)
?DO-LOOP >LISTRESOLVE ( DUP could have been added above, but that
would have required an ELSE DROP. Either way works.)
THEN ; IMMEDIATE

: +LOOP ( old-list-ptr back-branch-addr --)
COMPILE (+LOOP)
DUP <RESOLVE
?DO-LOOP @
- 2 = IF
?DO-LOOP >LISTRESOLVE
THEN ; IMMEDIATE

EXIT ( Ignore further text)

Like last time, this code hasn't been tested. Use with caution.
(If somebody knows of a FORTH that can run on a VAX under UNIX, I could test
it, but it looks as if I'm gonna have to learn VAX assembler first.)


--One thing I've found useful is a ?IF word, which behaves like ?DUP IF
(You could try : ?IF COMPILE ?DUP [COMPILE] IF ; IMMEDIATE), but
doesn't test the value on the stack twice. Such a word _should_ be written
in assembler for speed (Which is the whole POINT of not just using ?DUP IF),
but a high-level definition is:

: (?IF) ( 0 -- | f -- f)
R> ( f addr) ( save return address)
OVER IF ( f addr) ( check flag)
2+ ( f addr') ( skip branch address)
ELSE ( 0 addr)
@ ( 0 addr') ( take branch)
SWAP DROP ( addr') ( or NIP if you have it in assembler)
THEN ( f addr' | addr' )
>R ; ( f | )

: ?IF ( --)
COMPILE (?IF)
IF-LIST >LISTMARK
; IMMEDIATE

This should be fairly easy to convert to ML. (You only need to convert (?IF).)

Note: All _this_ code is my own work. You're welcome to use it for any purpose
you wish.

I hope you find this useful! -Colin Plumb (ccp...@watnot.UUCP)

Quote : LIFE BEGIN WORK WORK WORK WORK WORK SLEEP SLEEP REPEAT ;
( "WORK" 5 times is 5 cells. "5 0 DO WORK LOOP" is 8, and slower as well.)

Reply all
Reply to author
Forward
0 new messages