Forth in C

Marc de Groot

unread,

Nov 2, 1989, 3:37:36 PM11/2/89

to

I am working on a Forth interpreter in C right now.

Yuck.

I thought C was a good language for writing systems-level code.

I only need one hack to make this work: I need to be able to
take the address of a label, i. e.

label1: /* Forth primitive code here */
.
.
.
forthdict[foo] = &label1;

Why?

Well, Forth (at least fig-Forth) is doubly indirect-threaded. That is to
say that the IP (instruction pointer) is a pointer to an array of pointers
to pointers to code. In other words, IP is a pointer into a parameter
field, which contains pointers to code fields (CFAs). The code fields contain
the addresses of machine code.

The fig-Forth address thread interpreter looks like this (except it's in
assembler):
void ***ip; /* Forth's interpreter pointer */
void **w; /* Temporary register */

next: w = *ip++; /* Fetch the CFA that IP points at */
jmp *w; /* Fetch what the CFA points at & jump there */
/* The code just jumped to will end with jmp next */

The best you can do in C is this:
void (***ip)(); /* Ptr to a ptr to a ptr to a function */
void (**w)();

next: w = *ip++;
(**w)();
goto next;

In fig-Forth, you do two dereferences on a pointer and a branch.
In C, you do two dereferences and a CALL to a C function. That
takes up a lot of unnecessary time.

There's a way around the overhead of a call to a function:
typedef TOKEN short;
TOKEN **ip;
TOKEN *w;

next: w = *ip++;
switch(*w) {
case LIT: ... goto next;
case EXECUTE: ... goto next;
.
.
.
}

In this latest method, w points at a token, which becomes the input to
a huge switch statement. All of the primitives are coded as cases in the
switch. In short, the entire interpreter becomes a single C function.

One advantage to this approach is that ip, w, and some temporary variables
can be declared as registers. Of course, the way most C compilers optimize
stuff, it still won't approach the efficiency of hand-coded assembler. Foo.

So...if I could get the base address of the code in each "case" clause above,
I could drop into assembler for one instruction in my C function and compile
and indirect branch instead of a switch statement.

--
Marc de Groot (KG6KF) These ARE my employer's opinions!
Noe Systems, San Francisco
UUCP: uunet!hoptoad!noe!marc
Internet: ma...@kg6kf.AMPR.ORG

Mitch Bradley

unread,

Nov 6, 1989, 1:34:16 PM11/6/89

to

> I only need one hack to make this work: I need to be able to
> take the address of a label, i. e.

Sorry, you lose. It can't be done in portable C. Labels don't have
visible addresses. Their scope is entirely contained within the enclosing
subroutine.

What you are suggesting is the first thing that I thought of doing when
I was writing C Forth 83, but I had to give up on that implementation
strategy. The two options that I know of are indirect subroutine calls
(relatively slow) and a big switch with an integer selector (which is what
C Forth 83 uses).

Mitch

Marc de Groot

unread,

Nov 17, 1989, 4:48:32 AM11/17/89

to

I wrote, about double indirect threaded Forth in C:

>> I only need one hack to make this work: I need to be able to
>> take the address of a label, i. e.

Mitch Bradley replied:

>What you are suggesting is the first thing that I thought of doing when
>I was writing C Forth 83, but I had to give up on that implementation
>strategy. The two options that I know of are indirect subroutine calls
>(relatively slow) and a big switch with an integer selector (which is what
>C Forth 83 uses).

Here's a third way to do it that I just tried. Thanks to Bob Kelley
(sequent!rjk) who gave me the idea.

I use the asm("..."); pseudo-op to assemble an indirect branch instruction.
This turns into some very efficient code, even with my awful pcc-derived
80286 C compiler. This, of course, is "non-portable" in that it is
implementation-dependent. On the other hand, the actual amount of effort
necessary to port this to another environment is minimal. It's tantalizing...

^Marc

/*
* Prototype Forth interpreter which REALLY uses a double-indirect jump.
* Note that it is assumed that sizeof(cell) == sizeof(void *)
*/
typedef short cell; /* typedef long cell for 32-bit pointers */
#include <setjmp.h>
jmp_buf jbuf;
#define NPRIMS 3 /* Number of primitives */
#define NSTK 50 /* Number of elements in each stack */
#define NDICT 50 /* Number of cells in the dictionary */
#define ONE 0
#define DROP 1
#define BRANCH 2
void *primaddr[NPRIMS]; /* Array of addresses of primitive code */
int naddr; /* Subscript into primaddr[] */
cell stk[NSTK]; /* Parameter stack */
cell rstk[NSTK]; /* Return stack */
cell dict[NDICT]; /* The dictionary */

main()
{
void **ip; /* The interpreter pointer */
void *w; /* A scratch register used by next */
cell *sp; /* The stack pointer */
cell *rp; /* The return stack pointer */

naddr = -1;
setjmp(jbuf); /* entry() inside the loop below calls longjmp() */

/* This loop initializes the array primaddr[] with the addresses
* of primitive code. The switch statement contains all of the code
* for primitives in the system.
*/
while (++naddr < NPRIMS) {
switch(naddr) {
case ONE: entry(); *--sp = 1; goto next;
case DROP: entry(); sp++; goto next;
case BRANCH: entry(); ip += (cell)*ip; goto next;
}
}

/* Now that the prim addrs are initialized, we build the
* equivalent of a colon definition's parameter field
* in the dictionary.
*/
dict[0] = (cell)primaddr[ONE]; /* addr of "1" */
dict[1] = (cell)primaddr[DROP]; /* addr of "drop" */
dict[2] = (cell)primaddr[BRANCH]; /* addr of "branch" */
dict[3] = (cell)(-6); /* branch offset */

/* Now we initialize Forth's registers */
ip = (void **)dict; /* Point IP at the : def we built */
sp = &stk[NSTK]; /* Init param stack pointer */
rp = &rstk[NSTK]; /* Init return stack pointer */

/* Here's the inner interpreter */
next: w = *ip++; /* W <- CFA */
next1: asm(" jmp *%ax"); /* JMP [W] */
}

entry()
{
int i;
primaddr[naddr] = (void *)(&i)[1];
longjmp(jbuf, 1);

Philip Koopman

unread,

Nov 18, 1989, 7:03:04 AM11/18/89

to

In article <7...@noe.UUCP>, ma...@noe.UUCP (Marc de Groot) writes:
> Here's a third way to do it that I just tried. Thanks to Bob Kelley
> (sequent!rjk) who gave me the idea.
>
> I use the asm("..."); pseudo-op to assemble an indirect branch instruction.
> This turns into some very efficient code, even with my awful pcc-derived
> 80286 C compiler. This, of course, is "non-portable" in that it is
> implementation-dependent. On the other hand, the actual amount of effort
> necessary to port this to another environment is minimal. It's tantalizing...

How much faster did it go? The PCC compilers I've seen tend to
use an indirect jump table to implement case statements, which
isn't that far away from what you're doing. Did you measure
enough speed improvement to be worthwhile? Does your C compiler
us jump tables?

-- Phil

Marc de Groot

unread,

Nov 21, 1989, 8:41:57 PM11/21/89

to

In article <70...@pt.cs.cmu.edu> koo...@a.gp.cs.cmu.edu (Philip Koopman) writes:
>In article <7...@noe.UUCP>, ma...@noe.UUCP (Marc de Groot) writes:
>> I use the asm("..."); pseudo-op to assemble an indirect branch instruction.
>> This turns into some very efficient code, even with my awful pcc-derived
>> 80286 C compiler.

>How much faster did it go? The PCC compilers I've seen tend to
>use an indirect jump table to implement case statements, which
>isn't that far away from what you're doing. Did you measure
>enough speed improvement to be worthwhile? Does your C compiler
>us jump tables?

Yes, my C compiler does use jump tables. No, I did not actually time
it. I looked at a disassembly of the code. My token-threaded Forth
has the dictionary in an array of short. That means that every
reference into the dictionary requires a left-shift of the subscript
and adding the dictionary's base address before fetching. Then it
sets up the arguments for the jump through the table. The indirect-
threaded code that I posted does not need to do this and uses
considerably fewer instructions.

^M

>
>-- Phil